90% of retail quants don't have a database. Just files.
The fix costs €0. The bug costs you months of research.
Here’s the setup nobody questions. You download data and you save it as a .csv, maybe a .parquet if you’re “serious.” Then you add a folder, then ten more. That’s not a database, that’s a drawer you throw things into.
And it breaks your research in three ways.
1. No history
Every update overwrites the last one, so your history is whatever your last script run decided it would be. If your data provider silently revised last month’s values, you’ll never know. And the exact backtest you ran in March? You can’t replay it. That data is already gone.
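To make that concrete, here's a minimal sketch (plain Python, made-up file name and values) of what an overwrite-style update does when a provider revises a number: the old value is simply gone.

```python
import csv
import os
import tempfile

# Hypothetical example: a provider "revises" last month's close after the fact.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "prices.csv")

def save(rows):
    # The usual pattern: each update rewrites the file in place.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "close"])
        writer.writerows(rows)

save([("2024-03-01", "100.0")])   # what your March backtest saw
save([("2024-03-01", "98.7")])    # provider silently revises the value

with open(path) as f:
    rows = list(csv.reader(f))
print(rows)  # [['date', 'close'], ['2024-03-01', '98.7']] -- 100.0 is unrecoverable
```

Nothing crashed, nothing warned you. The file is "up to date," and the number your backtest used no longer exists anywhere.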
2. Writes aren’t atomic
Your update script crashes halfway through, and half the data is written while the other half isn't. So you rerun it, and now you've got duplicates. Your base is quietly corrupted, and you won't find out until it's already in your results.
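The standard remedy, which Delta Lake applies at the table level, is to never write in place: write to a temporary file, then atomically swap it in. A minimal stdlib sketch (the helper name is mine):

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    # Write to a temp file in the same directory, then os.replace() it in.
    # os.replace is atomic on POSIX and Windows: readers see either the old
    # complete file or the new complete file, never a half-written one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

target = os.path.join(tempfile.mkdtemp(), "prices.csv")
atomic_write(target, "date,close\n2024-03-01,100.0\n")
with open(target) as f:
    content = f.read()
```

If the script dies mid-write, the half-written bytes stay in the temp file and the real file is untouched. Delta Lake does the equivalent with a transaction log, so a crashed update never leaves a half-written table.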
3. No clean incremental updates
Adding today’s data means rewriting the whole file, or bolting on a fragile append that has no idea what’s already there. New rows, existing rows, your files can’t tell the difference, so every update is a small gamble.
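Here's the difference in miniature (plain Python, toy data): a naive append duplicates rows when a script reruns, while an upsert keyed on date stays idempotent.

```python
# Toy illustration: price rows keyed by date, updated twice with the
# same batch (as happens when you rerun a crashed update script).
existing = [("2024-03-01", 100.0), ("2024-03-02", 101.5)]
batch = [("2024-03-02", 101.5), ("2024-03-03", 99.8)]

# Naive append: the file has no idea what's already there.
appended = existing + batch + batch  # the script ran twice
dup_rows = len(appended) - len({d for d, _ in appended})
print(dup_rows)  # 3 duplicate rows snuck in

# Upsert keyed on date: reruns are harmless.
table = dict(existing)
for _ in range(2):  # run the "update" twice
    table.update(dict(batch))
print(sorted(table.items()))  # 3 rows, no duplicates, no matter how often you rerun
```

An upsert needs to know the existing keys, which is exactly what a bare Parquet file can't tell you and a transaction log can.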
A file is not a database.
The fix isn’t expensive. It costs €0. It’s called Delta Lake: a transactional layer on top of Parquet. ACID writes, time travel, clean incremental updates: the things you actually need.
Next week, I’ll show you how to feed it: crypto and CFDs, live, for €0. That’s QuantLake. More in a few days.


