Quantreo

Max Drawdown Is a Distribution, Not a Number

Lucas — Fri, 12 Jun 2026 14:30:56 GMT

Most track records report a single max drawdown and treat it as a property of the strategy. A risk memo states the max drawdown is 12 percent, and that figure gets used as if the next one cannot be worse without something breaking. It can. That 12 percent is one realization of a path dependent statistic, not a ceiling.

Max drawdown is more fragile than almost any other metric, because it depends on the ordering of returns, not their average. It lives in consecutive loss sequences. Resample that sequence while respecting its structure and the full distribution appears. On the trend strategy below, the observed drawdown sits near the middle of the distribution, and the 95th percentile scenario is roughly 1.5x worse, with nothing changed in the strategy itself.

1. Why max drawdown is the most fragile number in your backtest

Mean, volatility, and Sharpe are averages. Reshuffle your returns and they barely move. Max drawdown is a function of the path: it depends entirely on the order in which losses arrived, because it measures the deepest peak-to-trough decline the equity curve reached.

Same returns, same Sharpe, different ordering, different drawdown. Cluster the losses and it deepens, spread them out and it shrinks. So the observed max drawdown is one realized path out of many the same process could produce, closer to a sample maximum than to a stable parameter. Reporting it as a fixed risk limit treats one lucky or unlucky ordering as the boundary of what the strategy can do.
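A quick way to see this: take one set of returns and order it two ways. The sketch below is illustrative only. The 2 percent moves and the `max_drawdown` helper are assumptions made for the example, not the trend strategy discussed later.

```python
import numpy as np


def max_drawdown(returns):
    """Deepest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    # Prepend the starting capital (1.0) so a loss on the first bar counts as a drawdown
    running_peak = np.maximum.accumulate(np.concatenate(([1.0], equity)))[1:]
    return float(np.max(1.0 - equity / running_peak))


# Made-up returns: six gains of +2% and six losses of -2%
gains, losses = [0.02] * 6, [-0.02] * 6

spread = [r for pair in zip(gains, losses) for r in pair]  # gain, loss, gain, loss, ...
clustered = gains + losses                                  # all gains, then all losses

for name, path in [("spread", spread), ("clustered", clustered)]:
    print(f"{name:>9}: mean={np.mean(path):+.4f}  vol={np.std(path):.4f}  "
          f"max drawdown={max_drawdown(path):.2%}")
```

Mean and volatility are identical by construction. Only the drawdown moves: roughly 2 percent when the losses are spread out, above 11 percent when they arrive together.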

The fix: estimate its distribution by resampling the return stream, while preserving the loss clustering the drawdown lives on. That last constraint is where the bootstrap choice becomes the whole game.

2. Resampling the drawdown

The idea is one sentence: resample the strategy’s return stream in blocks, rebuild the equity curve, take its drawdown, and repeat a few thousand times. The spread of those drawdowns is the distribution the single number was hiding.

Two constraints carry the result. First, blocks, not single points: drawdowns live in consecutive losses, so resampling contiguous chunks preserves the clustering that an iid resample destroys, and destroying it understates the drawdown. Second, each resampled path keeps the original length, since max drawdown grows mechanically with the window. One note on interpretation: this holds the strategy fixed and varies only the ordering, which is exactly the question here: how much could sequencing alone have moved my worst loss?
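Here is a minimal sketch of that procedure, assuming a moving-block bootstrap with a fixed block length. The block length of 20, the function names, and the synthetic returns are all assumptions made for illustration. They are not the settings behind the figures quoted in this post.

```python
import numpy as np


def max_drawdown(returns):
    """Deepest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    running_peak = np.maximum.accumulate(np.concatenate(([1.0], equity)))[1:]
    return float(np.max(1.0 - equity / running_peak))


def bootstrap_max_drawdown(returns, block_len=20, n_paths=2000, seed=0):
    """Moving-block bootstrap of max drawdown; every path keeps the original length."""
    returns = np.asarray(returns, dtype=float)
    n = returns.size
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(returns)")
    rng = np.random.default_rng(seed)
    n_blocks = -(-n // block_len)  # ceiling division
    offsets = np.arange(block_len)
    drawdowns = np.empty(n_paths)
    for i in range(n_paths):
        # Draw block start points, then expand each into a contiguous run of indices
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + offsets).ravel()[:n]  # trim to the original length
        drawdowns[i] = max_drawdown(returns[idx])
    return drawdowns


# Synthetic daily returns, only so the example runs end to end
rng = np.random.default_rng(1)
strategy_returns = rng.normal(0.0004, 0.01, size=1000)

dist = bootstrap_max_drawdown(strategy_returns, block_len=20, n_paths=1000)
print(f"observed : {max_drawdown(strategy_returns):.1%}")
print(f"median   : {np.median(dist):.1%}")
print(f"95th pct : {np.percentile(dist, 95):.1%}")
```

The block length is the knob that matters. Too short and the loss clusters are broken up, too long and every path looks like the original. It is worth rerunning with a few values to see how stable the percentiles are.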

On the trend strategy used here, the observed max drawdown is 17.9 percent. The bootstrap median is 15.7 percent, the 95th percentile is 26.9 percent. The worst loss actually observed sits near the middle of the distribution, and the severe scenario is roughly 1.5x deeper, with nothing changed in the strategy.

The reading is direct. The observed drawdown is a typical draw, not a ceiling. Risk sizing should reference the 95th percentile, not the historical worst: a book you would run at one level of leverage on a 17.9 percent drawdown looks different once a 27 percent drawdown is inside its normal range.

3. One trap, and the takeaway

A tempting variant is to bootstrap the price series instead, rebuild the path, then re-run the strategy on it. It looks equivalent. It is not. For a trend follower with a short block length, resampling prices scrambles the long sequences the signal relies on, so the strategy stops seeing the trend. On the same example, the Sharpe collapses from 0.56 to 0.02. That approach answers a different question, whether the edge survives a reshuffled market, and it is a useful test on its own. It is not a measure of drawdown uncertainty, and the two should not be confused.

The takeaway is the same line we started with. Max drawdown is a draw, not a property. The honest version of a track record is not “the worst loss was 17.9 percent” but “the worst loss was 17.9 percent, and a 27 percent drawdown is well within the normal range of this strategy.” Size the risk on the distribution, not on the single path history happened to hand you.

ML trading or data mining?

Lucas — Fri, 29 May 2026 14:31:25 GMT

Everyone runs ML trading. Almost no one can tell when it quietly turns into data mining.

The code is identical. The tools are the same: XGBoost, cross-validation, feature engineering. Yet one produces alpha, the other produces a statistical illusion that dies in the first week of live trading.

I see this line crossed every day. Not out of incompetence. Out of habit.

1. The line is not technical, it’s methodological

ML trading: you start from an economic hypothesis, you build a model to test it.
Data mining: you start from the data, you search for what works, you rationalize after.

The honest test: can you explain why your feature should work BEFORE seeing the backtest? If the justification shows up after the equity curve, you haven’t found alpha. You’ve found a correlation.

2. The red flags

A few patterns that should trigger an alarm in your research:

Sharpe climbing with every iteration
Hyperparameters tuned on the out-of-sample period
Lookback window chosen through grid search
Holdout set “checked” more than once
Sharpe above 3 on a simple strategy, on a handful of assets
You trade BTC because “it doesn’t work as well on ETH”

Two boxes checked, you’re likely data mining. Three, you definitely are.

3. The validation framework that closes the door

Three tools, no less:

CPCV (Combinatorial Purged Cross-Validation) instead of naive K-fold
Deflated Sharpe Ratio (Bailey and López de Prado), which explicitly adjusts for the number of trials
A holdout set that stays untouched. Truly untouched.

The rule that stings: test 20 strategies with no real edge at a 5% threshold and you should expect one false discovery, with roughly a 64% chance of at least one. If you don’t know how many variants you’ve tested, you’re data mining without knowing it.
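The arithmetic behind that rule fits in a few lines. This is a sketch under a strong assumption: the 20 tests are independent and none of the strategies has a real edge. Correlated variants of the same idea would shift the probability.

```python
def p_at_least_one_false_discovery(n_trials, alpha=0.05):
    """Chance that at least one of n independent null tests passes the threshold."""
    return 1.0 - (1.0 - alpha) ** n_trials


n_trials, alpha = 20, 0.05
expected_false = n_trials * alpha
prob_any = p_at_least_one_false_discovery(n_trials, alpha)

print(f"expected false discoveries: {expected_false:.2f}")
print(f"P(at least one)           : {prob_any:.1%}")
```

One false discovery is the expected count, and the chance of seeing at least one is about 64 percent. Push the number of trials to 100 and that chance is above 99 percent.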

Edge comes from the hypothesis, not from the algorithm.

ML is an alpha extraction tool, not an alpha generator. No library, no model, no feature creates an edge from nothing. If the economic thesis isn’t there before the code, the code won’t make it appear.

The discipline that separates the quant from the data miner: pre-registering hypotheses, the way academic research does. Writing down what you’re looking for before you start looking.

It’s uncomfortable. That’s exactly why so few do it.

Live Crypto Data in 15 Lines of Python

Lucas — Fri, 22 May 2026 14:30:43 GMT

You want live crypto data. No account, no key, no cost. Binance gives it away through a WebSocket, and the whole thing fits in 15 lines of Python.

Here is the minimum that works.

This is a quick tutorial. If you want the full stack (ingestion, storage, time travel, alt-data, weather data, satellite images…), QuantLake is live: https://www.quantreo.com/quantlake-program/

1. What you need

One library.

pip install websockets

That’s it. No Binance account. No API key. Public market streams are open.

2. The connection

Binance exposes one URL per stream. Format is simple:

wss://stream.binance.com:9443/ws/<symbol>@<stream>

symbol in lowercase (btcusdt, ethusdt), stream is what you want to receive (trade, kline_1m, depth, aggTrade).

Open the socket, read messages, parse JSON. Done.

import json
import asyncio
import websockets

async def stream_trades(symbol="btcusdt"):
    url = f"wss://stream.binance.com:9443/ws/{symbol}@trade"
    async with websockets.connect(url) as ws:
        async for msg in ws:
            trade = json.loads(msg)
            price = float(trade["p"])
            qty   = float(trade["q"])
            ts    = trade["T"]
            print(f"{ts}  {price:.2f}  {qty:.4f}")

asyncio.run(stream_trades())

Run it. Every trade on BTC/USDT prints in your terminal, the moment it happens.

3. What the fields mean

The JSON payload for @trade looks like this:

{
  "e": "trade",
  "E": 1716384000000,
  "s": "BTCUSDT",
  "p": "67432.10",
  "q": "0.00125",
  "T": 1716383999998,
  "m": false
}
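The keys are one letter each: e is the event type, E the event time, s the symbol, p the price, q the quantity, T the trade time, and m tells you whether the buyer was the market maker. Times are milliseconds since epoch, and price and quantity arrive as strings. The small parser below is a sketch of how to turn that into a typed record. The `Trade` class and `parse_trade` function are names made up for this example.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Trade:
    symbol: str
    price: float
    qty: float
    trade_time: datetime
    buyer_is_maker: bool


def parse_trade(raw: str) -> Trade:
    """Convert one raw @trade message into a typed record."""
    msg = json.loads(raw)
    return Trade(
        symbol=msg["s"],
        price=float(msg["p"]),  # sent as a string, so convert explicitly
        qty=float(msg["q"]),
        trade_time=datetime.fromtimestamp(msg["T"] / 1000, tz=timezone.utc),
        buyer_is_maker=msg["m"],  # False means the buyer was the aggressor
    )


# The sample payload shown above, as a raw message
sample = (
    '{"e": "trade", "E": 1716384000000, "s": "BTCUSDT", '
    '"p": "67432.10", "q": "0.00125", "T": 1716383999998, "m": false}'
)
trade = parse_trade(sample)
print(trade)
```

The m flag is the one people skip. It is the cheapest way to separate aggressive buys from aggressive sells without touching the order book.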

4. Want 1-minute bars instead?

Same code, one line changes. Replace @trade with @kline_1m:

url = f"wss://stream.binance.com:9443/ws/{symbol}@kline_1m"

Binance pushes the current bar on every update, and marks it closed when it’s done. The field k.x is true only on the final tick of the bar. That’s your signal to persist it.

kline = json.loads(msg)["k"]
if kline["x"]:
    print("closed bar:", kline["t"], kline["o"], kline["h"], kline["l"], kline["c"])

Other useful streams, same pattern: @aggTrade for aggregated trades, @depth for order book diffs, @kline_5m, @kline_1h, etc.

5. The practical rule

This script will run, and it will fail. Sockets disconnect. Binance forces a reset every 24 hours, your network drops, your laptop sleeps. The 15-line version is for learning, not for production.

The real stack needs three things on top: auto-reconnect, persistence to a proper database, and a schema that doesn’t get corrupted on every restart.
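The first of those three, auto-reconnect, can be sketched without any extra library. The wrapper below retries a coroutine with exponential backoff. Everything in it is an assumption made for illustration: the function name, the delays, and the fake stream used for the demo. It is a starting point, not a production supervisor.

```python
import asyncio


async def run_with_reconnect(connect_once, max_attempts=None,
                             base_delay=1.0, max_delay=60.0, sleep=asyncio.sleep):
    """Call connect_once() again every time it ends or fails, backing off on failures."""
    attempt = 0
    failures = 0
    while True:
        attempt += 1
        try:
            await connect_once()
            failures = 0  # clean close (for example the 24-hour reset): reconnect quickly
        except Exception as exc:
            failures += 1
            print(f"attempt {attempt} failed: {exc!r}")
        if max_attempts is not None and attempt >= max_attempts:
            return attempt
        # 1s, 2s, 4s, ... capped at max_delay
        delay = min(max_delay, base_delay * 2 ** max(failures - 1, 0))
        await sleep(delay)


async def _demo():
    """Simulate a stream that drops twice, without touching the network."""
    calls, delays = [], []

    async def flaky_stream():
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("socket dropped")

    async def fake_sleep(seconds):
        delays.append(seconds)

    await run_with_reconnect(flaky_stream, max_attempts=3, sleep=fake_sleep)
    return len(calls), delays


n_calls, delays = asyncio.run(_demo())
print(n_calls, delays)
```

With the streaming function from section 2, the call would be asyncio.run(run_with_reconnect(stream_trades)), leaving max_attempts at None so it keeps going.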

Open the socket today. Watch the data come in. Then think about where it goes next.

90% of retail quants don't have a database. Just files.

Lucas — Fri, 15 May 2026 14:30:59 GMT

Here’s the setup nobody questions. You download data and you save it as a .csv, maybe a .parquet if you’re “serious.” Then you add a folder, then ten more. That’s not a database, that’s a drawer you throw things into.

And it breaks your research in three ways.

1. No history

Every update overwrites the last one, so your history is whatever your last script run decided it would be. If your data provider silently revised last month’s values, you’ll never know. And the exact backtest you ran in March? You can’t replay it. That data is already gone.

2. Writes aren’t atomic

Your update script crashes halfway through, and half the data is written while the other half isn't. So you rerun it, and now you've got duplicates. Your base is quietly corrupted, and you won't find out until it's already in your results.

3. No clean incremental updates

Adding today’s data means rewriting the whole file, or bolting on a fragile append that has no idea what’s already there. New rows, existing rows, your files can’t tell the difference, so every update is a small gamble.

A file is not a database.

The fix isn’t expensive. It costs €0. It’s called Delta Lake: a transactional layer on top of Parquet. ACID writes, time travel, clean incremental updates, the things you actually need.

Next week, I’ll show you how to feed it: crypto and CFDs, live, for €0. That’s QuantLake. More in a few days.

The Clip Effect: Is Your Backtest Real, or Just Lucky?

Lucas — Fri, 08 May 2026 14:31:02 GMT

You ran a grid search. Lookback = 20 gives you a Sharpe of 2.1. You smile. You start writing the strategy doc.

Then you check 19 and 21. Both at 0.4.

Welcome to the clip effect.

1. What it actually means

When you optimize a strategy, you scan parameters across a range. If your “best” parameter sits as an isolated peak surrounded by mediocre or losing values, the result is almost certainly noise. A real edge produces a plateau, not a pixel.

A backtest is a sample of one path. With enough parameter combinations, some of them will look great by accident. The clip effect is the visual signature of that accident.

The vocabulary varies in the literature (parameter stability, neighborhood robustness), but the test is the same: does the strategy survive small changes in its inputs?

2. A concrete example

Basic moving average crossover on EUR/USD, daily bars, 2015 to 2024. Fast MA fixed at 5, slow MA varied from 18 to 22. Two strategies, both peaking at lookback = 20.

Same optimum. Opposite stories.

The red line is the trap. Sharpe 2.07 at lookback 20, but 0.42 at 19 and 0.38 at 21. If you stop at “Sharpe 2 on lookback 20”, you have learned nothing about the market. You have learned that a specific number aligned with a specific sequence of trades. Move one step left or right, the alpha is gone.

The blue line is what real structure looks like. Lower peak (1.78), but 1.61 and 1.65 right next to it. The signal exists before, at, and after the optimum.

Compare the peaks in isolation, the red strategy wins. Compare the surfaces, the choice is obvious.

3. Going 2D: the heatmap test

One parameter is rarely enough. Most strategies have at least two: a feature window and a target horizon, two moving averages, a volatility filter and a signal threshold. The proper version of the clip test is a 2D heatmap of Sharpe across the parameter grid.

What you want to see: a connected zone of green, smooth gradients, the optimum sitting inside a region of similar values.

What kills the strategy: scattered green pixels in a sea of red, or a single bright square with nothing around it.

In 2D, the eye does the work. With five parameters and ten possible slices, you need a quantitative metric: average Sharpe in the ±1 neighborhood, peak-to-mean ratio, or fraction of the grid above a threshold. Visualization is a starting point, not the test itself.
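The three metrics named above are easy to compute once the Sharpe grid sits in an array. This is a sketch: the two grids are made-up numbers, and the function name and the threshold of 1.0 are assumptions chosen for the example.

```python
import numpy as np


def neighborhood_stats(sharpe_grid, threshold=1.0):
    """Robustness metrics around the best cell of a 2D Sharpe grid."""
    grid = np.asarray(sharpe_grid, dtype=float)
    i, j = np.unravel_index(np.argmax(grid), grid.shape)
    peak = grid[i, j]
    # The +/-1 window around the optimum, clipped at the grid edges
    window = grid[max(i - 1, 0): i + 2, max(j - 1, 0): j + 2]
    neighbors_mean = (window.sum() - peak) / (window.size - 1)
    return {
        "peak": float(peak),
        "neighbors_mean": float(neighbors_mean),
        "peak_to_grid_mean": float(peak / grid.mean()),  # only meaningful if the mean is positive
        "fraction_above": float((grid > threshold).mean()),
    }


# Made-up grids: a plateau and an isolated spike
plateau = [[1.2, 1.4, 1.3],
           [1.5, 1.8, 1.6],
           [1.3, 1.6, 1.4]]
spike = [[0.1, -0.2, 0.3],
         [0.2, 2.1, 0.4],
         [-0.1, 0.3, 0.0]]

stats_plateau = neighborhood_stats(plateau)
stats_spike = neighborhood_stats(spike)
print("plateau:", stats_plateau)
print("spike  :", stats_spike)
```

The spike has the higher peak and loses on every other line. That is the whole test in numbers.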

4. The practical rule

Before you trust any optimum, ask one question:

Does Sharpe degrade by more than 30 to 40% when I move ±1 on each parameter?

If yes, drop the strategy. If no, you have passed the first filter, not the last one.
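Applied to the two strategies from the example above, the rule looks like this. The 35 percent cutoff is an assumption, picked as the midpoint of the 30 to 40 percent range, and the function name is made up for the sketch.

```python
def clip_test(sharpes, max_drop=0.35):
    """Check how much Sharpe is lost one step either side of the best parameter.

    sharpes: Sharpe ratios ordered by parameter value.
    Returns (passed, worst relative drop among the immediate neighbours).
    """
    best = max(range(len(sharpes)), key=lambda k: sharpes[k])
    peak = sharpes[best]
    neighbours = [sharpes[k] for k in (best - 1, best + 1) if 0 <= k < len(sharpes)]
    if peak <= 0 or not neighbours:
        return False, float("nan")
    worst_drop = max((peak - s) / peak for s in neighbours)
    return worst_drop <= max_drop, worst_drop


# Sharpe at lookback 19, 20, 21 for the two strategies in the example
red = [0.42, 2.07, 0.38]
blue = [1.61, 1.78, 1.65]

red_ok, red_drop = clip_test(red)
blue_ok, blue_drop = clip_test(blue)
print(f"red : passed={red_ok}  worst drop={red_drop:.0%}")
print(f"blue: passed={blue_ok}  worst drop={blue_drop:.0%}")
```

Red loses about 82 percent of its Sharpe one step away from the optimum. Blue loses about 10 percent.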

The clip test will not save you from every form of overfitting. Scanning 50 parameter pairs and selecting the best plateau is still cherry picking, just better dressed. But this is the cheapest, most visual robustness check you can run, and it kills the majority of bad backtests in seconds.

Run it on every strategy you build. You will discard more than you keep.

That is the point.

→ Speaking of robustness, the next bottleneck most retail quants hit is data. Next week I am opening up QuantLake, the program I have been building to create a quant database for $0: weather, crypto alt-data, CFDs, spot markets. Stay tuned!

Building a Clean Research Environment with Oryon

Lucas — Fri, 01 May 2026 14:30:39 GMT

Exploratory data analysis in quant often becomes messy much faster than expected.

Features are computed in one place, targets in another, and then everything has to be aligned manually before any real analysis can begin. That usually means extra joins, shifting logic, duplicated code, and a higher risk of silent mistakes.

The issue is not computing features or targets. The issue is creating a research dataset where both live together cleanly and can be analyzed immediately.

1. The Oryon Approach

With Oryon, the process stays simple. You build a FeaturePipeline, a TargetPipeline, run both on the same market data, and join the outputs into a single dataframe.

from oryon.datasets import load_sample_bars
from oryon import FeaturePipeline
from oryon.features import Sma, ParkinsonVolatility, Correlation, ShannonEntropy, Adf
from oryon.scalers import RollingZScore
from oryon.adapters import run_features_pipeline_pandas


# Import sample data (OHLCV bars)
df = load_sample_bars()

# Create the features list
features_list = [
    Correlation(inputs=["close", "volume"], window=30, outputs=["close_volume_corr_30"]),
    ParkinsonVolatility(inputs=["high","low"], window=50, outputs=["close_pvol_50"]),
    Adf(inputs=["close"], window=100, outputs=["close_adf_100", "close_adf_pval_100"])
]

# Create the scalers list to apply z-score to all the wanted features
scalers_list = [RollingZScore(inputs=[col], window=2000, outputs=[f"z_{col}"]) for col in [
    "close_volume_corr_30", "close_pvol_50", "close_adf_pval_100"
]]

# Combine both lists
features_list.extend(scalers_list)

# Create the pipeline object (the same object can also run in live trading)
pipe = FeaturePipeline(features_list, input_columns=["close", "high", "low", "volume"])

# Run the pipeline on the sample data
df_features = run_features_pipeline_pandas(pipe, df)

from oryon import TargetPipeline
from oryon.adapters import run_targets_pipeline_pandas
from oryon.targets import FutureReturn, FutureLinearSlope

# Time index, essential for the FutureLinearSlope (x=time, y=price)
df["constant_time"] = list(range(len(df)))

# Create the targets list
targets_list = [
    FutureLinearSlope(inputs=["constant_time", "close"], horizon=20, outputs=["future_slope_20", "r2_future_slope_20"]),
    FutureLinearSlope(inputs=["constant_time", "close"], horizon=50, outputs=["future_slope_50", "r2_future_slope_50"]),
    FutureReturn(inputs=["close"], horizon=10, outputs=["future_returns_10"])
]

# Create the target pipeline object
target_pipe = TargetPipeline(targets_list, input_columns=["constant_time", "close"])

# Run the pipeline
df_targets = run_targets_pipeline_pandas(target_pipe, df)

# Ready-to-use dataframe
df_research = df_features.join(df_targets)

The result is a single research table where engineered features and forward-looking targets are already aligned and ready to explore.

2. Why This Matters

Once everything is in the same dataframe, EDA becomes much easier.

You can immediately start testing simple relationships between inputs and outputs, whether through correlation, mutual information, or more advanced selection methods. Instead of spending time preparing the dataset, you can focus on extracting useful structure from it.
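A first pass can be as small as a rank-correlation table between features and targets. The sketch below uses a synthetic stand-in for the research dataframe so it runs on its own. The data and the relationship planted in it are assumptions, and the prefix-based column selection assumes scaled features start with z_ and targets with future_, as in the naming used above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_research, only so the example is self-contained
rng = np.random.default_rng(42)
n = 500
signal = rng.normal(size=n)
df_research = pd.DataFrame({
    "z_close_pvol_50": signal,
    "z_close_volume_corr_30": rng.normal(size=n),
    "future_slope_20": 0.3 * signal + rng.normal(size=n),  # planted relationship
    "future_slope_50": rng.normal(size=n),
})

feature_cols = [c for c in df_research.columns if c.startswith("z_")]
target_cols = [c for c in df_research.columns if c.startswith("future_")]

# Drop warm-up rows (rolling windows) and tail rows (forward horizons) before measuring anything
clean = df_research.dropna()

# Spearman rank correlation: rows are features, columns are targets
ic_table = clean.corr(method="spearman").loc[feature_cols, target_cols]
print(ic_table.round(3))
```

Forward-looking targets overlap from one row to the next, so these correlations look more significant than they are. Treat the table as a screening tool, not as evidence.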

That is the real value of a clean research environment: less plumbing, more analysis.

If you want to take this further and connect feature pipelines with AI agents, you can take a look at AI Trading Lab.

Oryon is now available in beta.

If you want to take a closer look at the library and its design, you can explore it directly on GitHub.

And if you find it useful, consider adding a star. It helps more than it seems.

Standardizing Features Without Breaking Your Pipeline

Lucas — Fri, 24 Apr 2026 14:31:05 GMT

In quantitative trading, standardizing features is almost a reflex.

It helps stabilize distributions, improves model convergence, and makes signals comparable across time. Whether you work with returns, volatility, or engineered indicators, scaling is everywhere.

In practice, most pipelines rely on StandardScaler from scikit-learn.

That is perfectly acceptable in research. But as soon as a pipeline becomes more serious, this setup starts to create unnecessary friction.

1. The Real Issue Is Not the Math

The problem is not standardization itself. The problem is fragmentation.

A typical quant stack quickly becomes a mix of unrelated components. Features may come from one library, transformations from another, and scaling from yet another. You might compute signals with Oryon, apply an operator from SciPy, then standardize everything with scikit-learn.

Each piece works in isolation. But the overall system becomes harder to reason about.

You end up mixing different APIs, different update models, and different assumptions about how computations should behave through time. In a notebook, this may still feel manageable. In a real feature pipeline, it becomes cumbersome.

2. A More Coherent Design

The idea behind the scalers added in Oryon is straightforward: scaling should behave exactly like any other feature. Instead of treating it as a separate preprocessing step, it becomes part of the pipeline itself, built on the same assumptions, with the same interface.

This means:

same update logic
same state management
same behavior in research and live

A rolling standard deviation, for instance, is no longer something you compute “outside” your pipeline. It is a feature, updated incrementally.
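To make "updated incrementally" concrete, here is a pure-Python sketch of a rolling z-score that keeps only a window of values and two running sums. It is not Oryon's implementation. The class name, the use of population variance, and the NaN warm-up are assumptions made for the example.

```python
import math
from collections import deque


class RollingZScoreSketch:
    """Rolling z-score with O(1) updates: a window of values plus two running sums."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.values = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x):
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.window:
            old = self.values.popleft()  # forget the observation leaving the window
            self.total -= old
            self.total_sq -= old * old
        if len(self.values) < self.window:
            return math.nan  # warm-up: not enough data yet
        mean = self.total / self.window
        # Population variance; clamp tiny negatives caused by rounding
        var = max(self.total_sq / self.window - mean * mean, 0.0)
        if var == 0.0:
            return math.nan  # flat window: z-score undefined
        return (x - mean) / math.sqrt(var)


scaler = RollingZScoreSketch(window=3)
z_values = [scaler.update(x) for x in (1.0, 2.0, 3.0, 4.0)]
print(z_values)
```

Running sums are fast but can drift numerically over very long streams. A production version would typically use a more stable update scheme.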

3. From Raw Prices to Standardized Signal

Below is a pipeline that computes several market features and standardizes them directly inside Oryon using a rolling z-score.

from oryon.datasets import load_sample_bars
from oryon import FeaturePipeline
from oryon.features import Sma, ParkinsonVolatility, Correlation, ShannonEntropy, Adf
from oryon.scalers import RollingZScore
from oryon.adapters import run_features_pipeline_pandas


# Import sample data (OHLCV bars)
df = load_sample_bars()

# Create the features list
features_list = [
    Correlation(inputs=["close", "volume"], window=30, outputs=["close_volume_corr_30"]),
    ParkinsonVolatility(inputs=["high","low"], window=50, outputs=["close_pvol_50"]),
    Adf(inputs=["close"], window=100, outputs=["close_adf_100", "close_adf_pval_100"])
]

# Create the scalers list to apply z-score to all the wanted features
scalers_list = [RollingZScore(inputs=[col], window=2000, outputs=[f"z_{col}"]) for col in [
    "close_volume_corr_30", "close_pvol_50", "close_adf_pval_100"
]]

# Combine both lists
features_list.extend(scalers_list)

# Create the pipeline object (the same object can also run in live trading)
pipe = FeaturePipeline(features_list, input_columns=["close", "high", "low", "volume"])

# Run the pipeline on the sample data
df_features = run_features_pipeline_pandas(pipe, df)

What matters here is not only the rolling normalization itself. It is the fact that the scaler is defined exactly like any other component in the pipeline.

There is no handoff to another library, no separate preprocessing object to maintain, and no conceptual break between feature computation and feature standardization.

The chart says it all: before scaling, the variables do not really live in the same space. After scaling, they do.

4. Why This Matters

A quant pipeline should not feel like an assembly of disconnected utilities.

As soon as features, operators, and scalers obey different conventions, every extension becomes slightly more fragile. The code becomes harder to read, harder to debug, and harder to move from research to something more robust.

Keeping standardization inside Oryon solves a very practical problem: it removes one more external dependency from the feature workflow and replaces it with a component that behaves exactly like the rest of the system.

The gain is not only speed, even if the implementation is also significantly faster. The deeper gain is consistency.

If you are interested in building cleaner research pipelines, with feature engineering components that are designed to work together rather than coexist by accident, Oryon is exactly the kind of framework worth exploring.

Build once. Run live.

Lucas — Fri, 17 Apr 2026 14:30:51 GMT

Most pipelines break when they leave research.

Not because they are incorrect, but because they were never designed to run incrementally. They assume full history, they assume recomputation, and they assume a static environment where everything can be rebuilt at each step.

Live systems operate under very different constraints. Data arrives sequentially, latency accumulates, and every inefficiency becomes persistent. In that context, the feature pipeline is no longer a preprocessing step. It becomes part of the system itself.

1. A pipeline that works in both worlds

With Oryon, the same pipeline object can be used across both research and production environments without modification.

It can be applied on full historical datasets in batch mode, and it can continue updating in a streaming setting using the exact same logic. There is no distinction between the two modes, no hidden adaptation layer, and no change in behavior.

This is not a convenience feature. It is a design constraint. The pipeline is built to behave identically regardless of how it is fed.

To keep the example simple, we can start from one of the sample datasets available directly in Oryon.

from oryon.datasets import load_sample_bars

df = load_sample_bars()
historical_bars, live_bars = df.iloc[:-10, :], df.iloc[-10:, :]

For the rest of this example, the idea is straightforward: we use the historical portion to initialize and validate the pipeline in research conditions, then we use the last ten bars to simulate a live stream and observe the updates one step at a time.

2. Defining a feature pipeline

We start by defining a set of features as a structured pipeline. Each component is stateful and designed to update incrementally.

from oryon import FeaturePipeline
from oryon.features import Sma, ParkinsonVolatility, Correlation, ShannonEntropy, Adf


features_list = [
    Sma(inputs=["close"], window=10, outputs=["close_sma_10"]),
    Sma(inputs=["close"], window=50, outputs=["close_sma_50"]),
    ParkinsonVolatility(inputs=["high", "low"], window=20, outputs=["close_pvol_20"]),
    ParkinsonVolatility(inputs=["high", "low"], window=100, outputs=["close_pvol_100"]),
    Correlation(inputs=["close", "volume"], window=30, outputs=["close_volume_corr_30"]),
    ShannonEntropy(inputs=["close"], window=50, outputs=["close_entropy_50"]),
    Adf(inputs=["close"], window=100, outputs=["close_adf_100", "close_adf_pval_100"])
]

pipe = FeaturePipeline(features_list, input_columns=["high", "low", "close", "volume"])

This is not a collection of independent indicators. It is a coherent system that maintains its own internal state and evolves as new data arrives.

3. Applying the pipeline in research

Once the pipeline is defined, we can apply it on historical data to inspect the outputs in research conditions.

from oryon.adapters import run_features_pipeline_pandas
df_features = run_features_pipeline_pandas(pipe, historical_bars)

Under the hood, the pipeline is designed to consume data as a list of lists rather than as a DataFrame directly. This may look slightly less convenient at first, but it comes from the same architectural choice that makes the system efficient in live trading: the update path is built around a minimal, streaming-oriented input structure.

In practice, this means the research interface stays aligned with the production one. Instead of introducing a separate batch-only abstraction, Oryon keeps the same core design and provides adapters to make research workflows easier to use.

That is exactly what run_features_pipeline_pandas does here. It converts the DataFrame into the internal list-of-lists format and applies the pipeline sequentially, so the historical execution remains fully consistent with the live update model.
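The shape of such an adapter is easy to picture. The sketch below is not Oryon's code: it uses a tiny hypothetical stand-in object with the same update(list) contract, and a helper whose name and arguments are made up for the example.

```python
import math

import pandas as pd


class LastTwoMean:
    """Hypothetical stand-in for a pipeline: one stateful feature, one update(list) method."""

    def __init__(self):
        self.prev = math.nan

    def update(self, row):
        close = row[0]
        out = [(self.prev + close) / 2]  # NaN until two observations have been seen
        self.prev = close
        return out


def run_pipeline_on_frame(pipe, df, input_columns, output_columns):
    """Feed a DataFrame row by row through pipe.update, exactly as a live loop would."""
    rows = df[input_columns].to_numpy().tolist()  # the list-of-lists format
    outputs = [pipe.update(row) for row in rows]  # sequential, one bar at a time
    return pd.DataFrame(outputs, index=df.index, columns=output_columns)


bars = pd.DataFrame({"close": [100.0, 102.0, 101.0]})
features = run_pipeline_on_frame(LastTwoMean(), bars, ["close"], ["close_mean_2"])
print(features)
```

Because the batch run is just the live loop applied to history, the object leaves the adapter with its state already warmed up, which is what makes the switch to streaming in the next section seamless.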

The same idea is also available for Polars, which makes it easy to keep the same workflow regardless of the dataframe engine used upstream.

4. Switching to streaming

Once the historical pass is complete, the same pipeline can continue operating in streaming mode.

for bar in live_bars.itertuples():
    # The pipeline expects a plain list, in the same order as input_columns
    input_values = [bar.high, bar.low, bar.close, bar.volume]
    output = pipe.update(input_values)
    print(output)

# [102.03397795639191, 102.41105040279119, 0.001373007598669601, 0.0018930289783861594, -0.21026851276116704, 0.948176332443655, -1.800248586774662, 0.380344996295653]
# ...
# [102.03397795639194, 102.25925749878635, 0.001373007598669601, 0.0018768715423311845, -0.1494113110563763, 0.8844888723968832, -4.77931213498186, 6.212151160126344e-05]

5. Why this matters

In most trading systems, the pipeline is effectively implemented twice. Once in research, often using batch-oriented tools, and once again in production, using a different stack optimized for streaming.

This duplication introduces divergence. Subtle differences appear, edge cases behave differently, and the system becomes harder to validate as a whole.

Oryon removes that duplication entirely. The pipeline is defined once, validated once, and then used as-is in production.

From Quantreo to Oryon

Lucas — Fri, 10 Apr 2026 14:30:48 GMT

Most trading systems do not lose performance at execution time. They lose it upstream, in the way features are computed, updated, and maintained. In research, full recomputation is acceptable. In production, it becomes a structural inefficiency.

Every unnecessary pass over the data, every oversized state, every hidden dependency in the pipeline introduces latency that compounds over time. Not always visible, but always present.

Oryon was designed to remove that layer entirely.

Not by optimizing it. By redefining it.

1. Rethinking Feature Computation

Oryon is the beta 2 of Quantreo, but more importantly, it introduces a different abstraction. Features are no longer batch transformations. They are stateful, incremental objects.

Each update processes only the newest observation, maintains only the strictly required state, and returns the current value immediately.

from oryon.features import Sma

sma = Sma(["close"], window=3, outputs=["sma_3"])

sma.update([100.0])  # → [nan]
sma.update([101.0])  # → [nan]
sma.update([102.0])  # → [101.0]
sma.update([103.0])  # → [102.0]

The important part is not the indicator itself. A 3-period moving average stores exactly three values. No history, no recomputation, no hidden overhead.
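For intuition, the same behavior can be reproduced in a few lines of pure Python. This is a sketch of the idea, not Oryon's Rust implementation, and the class name is made up for the example.

```python
import math
from collections import deque


class SmaSketch:
    """Incremental moving average: stores exactly `window` values and one running sum."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)
        self.total = 0.0

    def update(self, x):
        if len(self.buffer) == self.window:
            self.total -= self.buffer[0]  # value about to be pushed out of the window
        self.buffer.append(x)
        self.total += x
        if len(self.buffer) < self.window:
            return math.nan  # warm-up
        return self.total / self.window


sma = SmaSketch(window=3)
outputs = [sma.update(x) for x in (100.0, 101.0, 102.0, 103.0)]
print(outputs)  # [nan, nan, 101.0, 102.0]
```

Each update is one subtraction, one addition, and one division, whatever the window length.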

This same design extends consistently across the library:

entropy-based features
ADF-style statistical tests
multiple volatility estimators
rolling and dynamic correlations
operators, transformations, and scalers

All built around the same constraint:

→ Compute once, update incrementally, stay minimal in memory.

2. Performance where it actually matters

Oryon is built on a Rust-backed core, exposed through a minimal Python interface.

This is not an implementation detail. It is a design requirement.

Under production conditions:

fast paths run below 1 microsecond
the slowest feature update remains under 19 microseconds
a full feature state can be refreshed in less than 1 millisecond

This is where the difference becomes tangible.

Not in isolated benchmarks, but in continuous pipelines running live, where every microsecond accumulates.

3. One pipeline, from research to production

The same abstraction is used everywhere. The same objects. The same update logic. The same behavior. There is no distinction between a research pipeline and a production pipeline. No reimplementation, no hidden mismatch, no translation layer.

What you build is what you run. This removes an entire class of errors that typically appears when moving from notebooks to live systems.

And it forces a discipline. Features must be correct, incremental, and production-ready from the start.

⇒ In the next newsletter, we’ll build a research pipeline that is natively streaming-ready and can be deployed as-is in production.

Why intra-bar features matter

Lucas — Fri, 03 Apr 2026 14:30:46 GMT

When we work with hourly or daily bars, we usually keep only OHLCV.

But two hourly bars with the same OHLCV can hide very different internal behaviors.

One may trend smoothly.

Another may be noisy, mean-reverting, or highly asymmetric inside the bar.

That is exactly where intra-bar features become useful.

Instead of treating each bar as a simple summary, we can rebuild it from a lower timeframe and compute additional metrics inside the bar, such as slope, skewness, kurtosis, or other microstructural statistics. Quantreo’s bar metrics are designed for that workflow through the `additional_metrics` mechanism, which adds custom columns computed from the data inside each bar.

1. The Core Idea

The idea is simple.

If you trade on a higher timeframe, for example 1H, you can use a lower timeframe, for example 1 minute, to reconstruct each hourly bar and compute what happened inside it.

This gives you extra information that standard OHLCV cannot capture.

In Quantreo, these bar-level metrics are meant to enrich traditional bars with statistical and microstructural information extracted from the data inside each bar.

The documentation explicitly mentions examples such as skewness, kurtosis, Hurst exponent, volume-profile-type features, and other custom indicators.

2. Why This Matters

A bar is a compression of information.

And sometimes, that compression removes exactly the behavior you care about.

With intra-bar metrics, you can start distinguishing:

- smooth vs noisy bars
- directional vs unstable bars
- balanced vs asymmetric internal price action
- ordinary bars vs bars with unusual internal structure

This is often where extra signal can appear, especially when many strategies rely only on the final OHLCV snapshot.

3. Examples Of Useful Intra-Bar Features

Some useful examples are:

- linear slope inside the bar
- skewness
- kurtosis
- volume concentration features
- custom metrics derived from price or price-volume distributions

Quantreo already documents built-in bar metrics such as `skewness`, `kurtosis`, `volume_profile_features`, and `max_traded_volume`, and also allows fully custom metrics through `additional_metrics`.

4. The Practical Workflow

A clean workflow looks like this:

- choose your trading timeframe, for example 1H
- take a lower timeframe or raw tick data
- rebuild the higher-timeframe bars
- compute intra-bar metrics during aggregation
- use these new columns as additional features in your research pipeline

In Quantreo, each metric is attached through an `additional_metrics` tuple that specifies the function, the input source (`price`, `volume`, or `price_volume`), and the output column names.
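To make the workflow concrete, here is a minimal sketch in plain pandas. It does not use Quantreo's `additional_metrics` API: the helper names (`intrabar_slope`, `build_bars_with_metrics`) and the output columns are hypothetical, and the data is synthetic. It rebuilds 1H bars from 1-minute closes and attaches a slope and a return skewness computed inside each bar.

```python
import numpy as np
import pandas as pd


def intrabar_slope(prices: pd.Series) -> float:
    """OLS slope of price against the observation index inside one bar."""
    if len(prices) < 2:
        return float("nan")
    x = np.arange(len(prices), dtype=float)
    return float(np.polyfit(x, prices.to_numpy(dtype=float), 1)[0])


def build_bars_with_metrics(minute_close: pd.Series, freq: str = "60min") -> pd.DataFrame:
    """Aggregate lower-timeframe closes into bars and add intra-bar columns."""
    grouped = minute_close.resample(freq)
    bars = grouped.ohlc()                          # open, high, low, close per bar
    bars["slope"] = grouped.apply(intrabar_slope)  # trend inside the bar
    minute_returns = minute_close.pct_change()
    bars["ret_skew"] = minute_returns.resample(freq).apply(lambda r: r.skew())
    return bars


# Two synthetic hours: a smooth trend, then a flat hour.
index = pd.date_range("2026-01-05 09:00", periods=120, freq="min")
prices = np.concatenate([100.0 + np.arange(60), np.full(60, 159.0)])
bars = build_bars_with_metrics(pd.Series(prices, index=index))
```

Both hours end up as ordinary rows in the same dataframe, but `slope` is about 1 for the trending hour and about 0 for the flat one. For large tick datasets, a pure pandas version like this will be slow; it is only meant to show the shape of the idea.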

5. Why I Like This Approach

I like this approach because it adds information without forcing you to redesign your whole pipeline.

You still work with bars.
You still keep a structured dataframe.
But your bars become much richer.

So instead of changing the whole strategy framework, you improve the quality of the representation.

6. Performance Matters Too

This kind of idea is only useful if it remains practical at scale.

Quantreo recommends using Numba for custom metrics because bar building can involve millions of ticks, and the documentation notes that pure Python or Pandas implementations can be 20x to 100x slower than Numba-compiled functions.

7. Who This Is For

This is especially useful for people who:

- already use bar-based workflows
- want more signal without jumping directly to raw tick models
- care about microstructure, but still want a clean research pipeline
- want to enrich higher-timeframe features with lower-timeframe behavior

OHLCV is often a good starting point, but it is still a compression.

And sometimes, alpha lives inside what that compression removes.

That is why intra-bar features are so interesting.

They let you keep the simplicity of bar-based research while injecting part of the information hidden inside each bar.

👉 Want to learn from A to Z how to build and use intra-bar features in a real quantitative research workflow? That’s exactly what the ML4Trading Program is designed for.

Why I always try to simplify my alpha formulas

Lucas — Fri, 27 Mar 2026 15:30:45 GMT

When I work on an alpha, I almost always try to simplify the formula as much as possible.

Not because simple models are always better.
Not because complexity is useless.

But because every unnecessary degree of freedom increases the risk of overfitting.

In practice, I often prefer very simple structures, such as a linear combination of a few features, and sometimes a square or cubic term when it is clearly justified.

Of course, the goal is not to destroy the alpha just to end up with a pretty equation.

The real objective is to find the sweet spot between simplicity and performance.

1. Why Simplicity Matters

A formula that is too flexible can fit noise very easily.

The more parameters, interactions, nonlinearities, and transformations you add, the easier it becomes to explain the past.

But explaining the past is not the same as capturing a repeatable market effect.

A simpler formula usually gives you three major benefits:

- lower overfitting risk
- easier interpretation
- easier monitoring in live trading

And in many cases, that last point matters more than people think.

If an alpha starts degrading, a simple structure is much easier to diagnose than a black box.

2. Simple Does Not Mean Naive

This is where many people get the message wrong.

The goal is not to force every alpha into a linear model just because it looks cleaner.
The goal is to remove unnecessary complexity while preserving the core market logic.

Sometimes the signal is already captured well by a linear combination (with or without linear expansion).

And sometimes the real structure is more complex, and oversimplifying it would simply kill the alpha.

So the real question is not: “Can I make it simple?”

It is: “How much complexity is truly necessary to preserve the signal?”

3. The Types Of Formulas I Prefer

In most cases, I start by testing whether the alpha can be expressed through a very compact structure.

Typically:

- linear terms
- maybe one or two polynomial terms
- very limited interactions
- very few input variables

That gives me a model that is easier to understand, easier to stress test, and much harder to overfit than a more flexible alternative.

Only when this simplification fails do I consider a machine learning model.

And even then, I usually keep that for situations where:

- the number of input variables is small
- the relationship looks real but not easily compressible into a simple equation
- the added complexity is justified by out-of-sample behavior, not just in-sample fit

4. A Case Where Simplification Helps

Imagine a signal that depends mostly on:

- short-term return
- recent volatility
- distance from a local mean

A complex model may build a tangled nonlinear response with many interactions.

But in practice, you may find that something as simple as a linear combination of these three inputs already captures most of the usable signal.

Will it miss some edge cases? Yes.

But if the simplified version keeps most of the predictive power while being much more stable, that is often the better trade-off.
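Here is a small sketch of how that comparison can be run. Everything in it is an assumption made for illustration: the three features are synthetic standard normals standing in for return, volatility, and distance from the mean, the coefficients are made up, and the "complex" model is just a cubic expansion without interactions.

```python
import numpy as np


def design_matrix(X: np.ndarray, degree: int) -> np.ndarray:
    """Intercept plus element-wise powers of each feature up to `degree`."""
    columns = [np.ones(len(X))]
    for power in range(1, degree + 1):
        columns.append(X ** power)
    return np.column_stack(columns)


def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    residual = np.sum((y - y_hat) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return float(1.0 - residual / total)


def fit_and_score(X, y, split, degree):
    """Least-squares fit on the first `split` rows, scored in and out of sample."""
    A = design_matrix(X, degree)
    beta, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    in_sample = r_squared(y[:split], A[:split] @ beta)
    out_of_sample = r_squared(y[split:], A[split:] @ beta)
    return in_sample, out_of_sample


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))  # synthetic stand-ins for the three inputs
y = 0.05 * X[:, 0] - 0.03 * X[:, 1] - 0.04 * X[:, 2] + rng.normal(size=2000)

# degree 1 = linear formula, degree 3 = linear + square + cubic terms
scores = {degree: fit_and_score(X, y, split=1000, degree=degree) for degree in (1, 3)}
```

Because the two models are nested, the cubic version can only match or beat the linear one in-sample. The number worth looking at is the second element of each pair: whether the extra terms still help on data the fit never saw.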

5. A Case Where Simplification Fails

Now imagine a signal where the effect only appears under a specific conditional structure.

For example:

- feature A matters only when volatility is high
- feature B becomes relevant only when liquidity is thin
- the relationship changes sharply across regimes

In that case, forcing everything into one clean linear equation may remove exactly what makes the alpha work.

This is where a slightly more flexible model, or even a small machine learning model, can be justified.

But the key point is that complexity should come from necessity, not from habit.

6. The Real Objective: Find The Sweet Spot

This is how I think about it:

- too simple, and you may throw away real structure
- too complex, and you may fit noise
- somewhere in between, there is a level of complexity that preserves the alpha while keeping it robust

That is the sweet spot.

And in my experience, most people start too far on the complex side.

7. A Practical Checklist I Use

Before keeping a more complex formula, I usually ask:

- Does the extra complexity improve out-of-sample results?
- Does it survive across periods and regimes?
- Can I explain why this extra term should exist economically or behaviorally?
- Can I monitor it properly in live trading?
- If I simplify it, how much signal do I really lose?

If complexity does not clearly earn its place, I usually remove it.

A good alpha is not the most sophisticated equation you can write.

It is a signal that keeps working when reality stops being friendly.

That is why I usually start with the simplest valid structure I can build, and only add complexity when the data clearly proves it is needed.

👉 If you want to go deeper into the strategy design process and use AI to generate, structure, and refine trading ideas faster, that’s exactly what AI Trading Lab is built for.

Finding Alphas. A solid map of alpha research.

Lucas — Fri, 20 Mar 2026 15:30:58 GMT

I recently read Finding Alphas, edited by Igor Tulchinsky and contributors from WorldQuant.

This is not a deep technical monograph, and it is not a book that will hand you a ready-to-trade edge. It is better understood as a structured overview of how professional alpha research is framed: idea generation, data selection, backtesting, robustness, turnover, correlation, bias control, risk, and portfolio thinking.

That is also its main strength. The book gives a broad and practical map of the research process, with many short chapters written from the perspective of practitioners rather than academics.

1. What The Book Does Well

Its biggest strength is breadth with structure.

The book covers the full alpha research pipeline rather than obsessing over one narrow topic. The strongest part is clearly the “Design and Evaluation” section, where the discussion moves through alpha design, data, turnover, correlation, overfitting, biases, robustness, risk factors, drawdowns, and automated search. For someone building research habits, this is much more valuable than yet another book full of isolated signals.

Another strong point is that the book repeatedly brings the reader back to the same core reality: alpha research is not only about finding a signal, but about testing whether it survives costs, bias, crowding, instability, and implementation constraints. That emphasis appears throughout the book, especially in the chapters on turnover, backtest overfitting, controlling biases, robustness, and risk.

The tone is also pragmatic. The objective is not to impress with theory, but to give a framework for thinking like a systematic researcher.

2. Where The Book Is Weaker

The main limitation is also obvious: this is a WorldQuant-style book.

That makes it useful, but it also gives it a specific lens. You get many high-level principles, many conceptual tools, and many short examples, but not the level of implementation detail needed to go from “good research mindset” to a fully deployable institutional strategy.

In other words, the book is strong on framework, weaker on depth.

It also reads more like a collection of essays than a single tightly argued book. That makes it accessible, but it also means some chapters feel more useful than others, and the overall depth is uneven.

Finally, if you already have a strong background in robust backtesting, data engineering, portfolio construction, and production constraints, a fair part of the material will feel familiar.

3. The Most Useful Chapters

If you read only a subset, I would prioritize these:

- Chapter 4, Alpha Design
- Chapter 6, Data and Alpha Design
- Chapter 7, Turnover
- Chapter 9, Backtest. Signal or Overfitting?
- Chapter 10, Controlling Biases
- Chapter 12, Techniques for Improving the Robustness of Alphas
- Chapter 14, Risk and Drawdowns
- Chapter 15, Alphas from Automated Search

Together, these chapters capture the real value of the book: not “a list of alpha ideas,” but a way of thinking about signal design under real-world constraints.

4. Who Should Read It

This book is especially useful for:

- beginners who want a structured map of alpha research
- intermediate quants who already test ideas but need a cleaner framework
- researchers who think too much about signals and not enough about implementation risk

For very advanced readers, I would see it more as a compact refresher than a game-changing text.

5. Final Take

My overall view is simple.

Finding Alphas is a good professional overview of the alpha research process. Its value is not that it reveals secret signals. Its value is that it helps you think more clearly about how alphas are actually designed, evaluated, stress-tested, and organized inside a systematic research workflow.

So no, this is not the one book that will teach you how to print money.

But yes, it is a book that can help you build a much better research mindset. And in quant, that is often more valuable than one extra idea.

👉 If you want to go deeper into the strategy design process and use AI to generate, structure, and refine trading ideas faster, that’s exactly what AI Trading Lab is built for.

Quantreo 0.2.0. A rebuild of the foundations.

Lucas — Fri, 13 Mar 2026 15:31:18 GMT

Quantreo 0.2.0 Beta is not a simple update. It is a complete rebuild of the foundations behind the library.
The goal was clear: create a stronger base for research, simplify the creation of new features, raise engineering standards, and prepare the project for what comes next.
And yes, the project name will also change soon. More on that at the right time.

1. A New Foundation

A new version, but more importantly, a new foundation.

Quantreo 0.2.0 Beta is not about adding a few new features on top of the existing codebase. It is about rebuilding the foundations properly, so the project can grow with more structure, more robustness, and a much higher standard of development.

This release also opens the door to a broader evolution of the project, including a future name change.

2. Some News

Here are three important changes behind this new beta.

First, the project is now opening its contributing process. The goal is not just to make the library public, but to make it easier to extend, cleaner to navigate, and stronger as an open quantitative research project.

Second, a new pipeline has been introduced to automate and simplify the creation of many features. This makes feature engineering more scalable, more consistent, and much easier to maintain as the library grows.

Third, the library now includes more tests and more safeguards across the codebase. The objective is clear: push the project toward a much higher level of reliability and engineering quality, with standards inspired by what serious research environments require.

And this is only part of the work.

Many other improvements and additions are already on the way. Some are focused on usability, others on architecture, reliability, and the long-term direction of the project.

Quantreo 0.2.0 Beta is not the end of a rebuild. It is the beginning of a much stronger version of the library.

The new version is not available yet, but I will share more about the release date soon.

👉 If you want to go deeper into each step of the strategy building process, with real-life projects, ready-to-use templates, and 1:1 mentoring, that’s exactly what the Alpha Quant Program is for.

The 7 Mistakes I See Every Week (ML)

Lucas — Fri, 06 Mar 2026 15:30:53 GMT

Most beginners treat machine learning like a magic button.

They train a model, get a prediction, and convert it straight into a buy or sell. The backtest looks clean. Then live trading happens, and the edge evaporates.

Here are the seven mistakes behind most ML misuse in trading, and what experienced quants do instead.

1) Predicting a point instead of modeling uncertainty

A single number feels precise. It is usually fragile.

In markets, what matters is not “the next return.” It is the distribution of outcomes given today’s context.

If you cannot answer “how uncertain is this?” you do not have a signal. You have a guess. (To go further on this point, look into conformal prediction.)

2) Optimizing accuracy instead of optimizing decisions

High accuracy can still lose money.

Why? Because trading is not a classification contest. Costs, slippage, tail losses, and error clustering dominate. A model that is right slightly more often can still be wrong exactly when it hurts.

What you actually want is a decision rule that survives reality. That means evaluating ML with trading-oriented metrics and stress tests, not just standard ML scores.
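A quick back-of-the-envelope sketch shows why. The numbers below are hypothetical (percent per trade) and the `expectancy` helper is only an illustration, not a full evaluation framework.

```python
def expectancy(hit_rate: float, avg_win: float, avg_loss: float, cost: float = 0.0) -> float:
    """Expected P&L per trade: average wins minus average losses minus a fixed cost."""
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss - cost


# Hypothetical model A: right 60% of the time, but losses are twice the size of wins.
accurate_but_losing = expectancy(hit_rate=0.60, avg_win=0.5, avg_loss=1.0, cost=0.05)

# Hypothetical model B: right only 45% of the time, but wins are larger than losses.
less_accurate_but_winning = expectancy(hit_rate=0.45, avg_win=1.5, avg_loss=0.8, cost=0.05)
```

Model A has the better accuracy and a negative expectancy (about -0.15 per trade). Model B has the worse accuracy and a positive one (about +0.185 per trade).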

3) Thinking “ML model → signal”

This shortcut is the biggest conceptual error.

A model output is not a trade. It is an input to a process.

A robust pipeline looks like this:
Problem → ML model → alpha → position sizing → strategy rules → portfolio

If you skip the middle, you are not trading a model. You are trading noise with confidence.

4) Treating the model as the edge

ML does not create an edge out of thin air. It amplifies what you feed it.

If your data has no stable structure, the model will still fit something. That “something” is often a backtest-only pattern.

In practice, edge comes from market structure, behavior, flows, constraints, and regimes. ML helps you measure or express that edge. It does not replace it.

5) Getting the target wrong

Most ML blowups start here.

A fancy architecture cannot fix a target that is unstable, ambiguous, or mismatched to execution.

Common failures:

- horizons that do not match your average holding period
- labels that are too close to price noise
- targets that drift as volatility regimes change

A clean target is not “more ML.” It is better research.

6) Validation that leaks reality

Random splits, improper scaling, peeking at the future, tuning on the same regime you test on. This is the silent killer.

In trading, the only validation that matters is forward-looking, regime-aware evaluation.

If your process does not answer “does this survive different market conditions?” your performance is a story, not evidence.
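As a minimal sketch of what forward-looking evaluation means in code, here is an expanding-window splitter. The function name, the `gap` parameter, and the fold sizes are my own illustrative choices, not a reference implementation.

```python
from typing import Iterator, Tuple

import numpy as np


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int, gap: int = 0
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs where the test block always comes
    strictly after the training block, separated by an optional gap."""
    test_size = (n_samples - min_train - gap) // n_folds
    if test_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size  # training window expands each fold
        test_start = train_end + gap              # gap limits label overlap leakage
        test_end = test_start + test_size
        yield np.arange(0, train_end), np.arange(test_start, test_end)


splits = list(walk_forward_splits(n_samples=1000, n_folds=4, min_train=400, gap=10))
```

Anything fitted on data (scalers, feature selection, hyperparameters) should be fitted on `train_idx` only and then applied to `test_idx`. Looking at the score fold by fold, rather than averaged, is a simple first check of how the model behaves across different periods.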

7) Ignoring what happens when the model is wrong

This is where senior quants spend most of their time.

The key question is not average performance. It is conditional behavior:

- When the model is wrong, how wrong is it?
- Do errors cluster?
- Does it fail in specific regimes?
- What happens to the portfolio in those periods?

A model that is “good on average” but collapses in a few scenarios is not robust. It is a hidden tail bet.

The practical takeaway

Machine learning is not a trading strategy. It is a tool for conditional estimation under uncertainty.

If you want to use ML correctly, stop asking: “What will the market do?”
Start asking: “Given what I know now, what is the distribution, how reliable is it, and how should my risk respond?”

That is the difference between a model that looks smart and a system that survives.

👉 If you want to go deeper, build smarter features, understand signal reliability, and master techniques like feature selection, feature engineering, or feature conditioning, that’s exactly what we cover in ML4Trading.

Kalman Filter in Trading (2/2)

Lucas — Fri, 27 Feb 2026 15:30:53 GMT

In the first part of this series, we introduced the Kalman filter from a theoretical perspective and discussed why it is a natural tool for online estimation problems.

In this second part, we move from theory to practice and show how a Kalman filter can be used in a real trading context. Not to predict returns, but to control risk dynamically.

The goal is simple: estimate volatility online and transform it into a stable, realistic position sizing rule.

Find all the code here.

1. What problem are we actually solving?

Volatility is not constant. It changes over time, often abruptly, and usually when it matters the most.

Most traders rely on rolling volatility estimates. These approaches suffer from two major issues:

- they react too slowly to regime changes
- they are extremely noisy at short horizons

In live trading, what we really need is: an online estimate, reasonably smooth, but able to adapt when market conditions change.

This is a risk management problem, not a forecasting one.

2. Volatility as a latent variable

Volatility is not directly observable. What we observe are returns, which are highly noisy at the single-period level.

We therefore model volatility as a latent state variable. More precisely, we work with the log-variance:

$$x_t = \log(\sigma_t^2)$$

This choice has three advantages:

- it enforces positivity
- it leads to additive dynamics
- it is compatible with a linear Kalman filter

3. From returns to an observable proxy

Since volatility cannot be observed directly, we construct a noisy proxy from returns:

$$y_t = \log(r_t^2)$$

Under a conditional Gaussian assumption, this quantity can be interpreted as a noisy observation of the latent log-variance.

This is clearly an approximation. However, our objective is not statistical inference, but stable risk control, which makes this approach both acceptable and effective in practice.

The visual difference already highlights the benefit of filtering.

4. Kalman filtering the log-variance

We model the latent state with a simple random walk:

$$x_t = x_{t-1} + \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q)$$

and the observation equation:

$$y_t = \log(r_t^2) = x_t + v_t, \quad v_t \sim \mathcal{N}(0, R)$$

Both noise terms are assumed Gaussian.

The two key parameters of the filter are:

- Q, which controls how fast volatility is allowed to evolve
- R, which reflects how noisy the observation $\log(r_t^2)$ is

In practice, R is kept relatively large, because squared returns are extremely noisy at short horizons.
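Here is a minimal sketch of such a filter, written as a plain loop. It is not the code linked in this newsletter: the function name, the default values of `q` and `r`, the initialization, and the small `eps` guard are all assumptions. The default `r = 5.0` is motivated by the fact that, for Gaussian shocks, the log of a squared standard normal has variance π²/2 ≈ 4.93. The same distribution has mean about −1.27, so the sketch removes that offset through `obs_offset`; set it to 0.0 to follow the plain random-walk-plus-noise description literally.

```python
import numpy as np


def kalman_log_variance(
    returns: np.ndarray,
    q: float = 1e-3,             # process noise: how fast log-variance may drift
    r: float = 5.0,              # observation noise of log(r_t^2), assumed
    obs_offset: float = -1.2704, # mean of log(z^2) for z ~ N(0, 1), assumed correction
    eps: float = 1e-12,          # avoids log(0) on exactly zero returns
) -> np.ndarray:
    """Filter log(r_t^2) with a random-walk state and return the filtered volatility."""
    y = np.log(returns ** 2 + eps) - obs_offset  # noisy proxy of the log-variance
    x_hat, p = y[0], r                           # crude initialization from the first point
    sigma = np.empty(len(y))
    sigma[0] = np.exp(0.5 * x_hat)
    for t in range(1, len(y)):
        p_pred = p + q                           # predict: uncertainty grows by Q
        k = p_pred / (p_pred + r)                # gain: small because R is large
        x_hat = x_hat + k * (y[t] - x_hat)       # update toward the new observation
        p = (1.0 - k) * p_pred
        sigma[t] = np.exp(0.5 * x_hat)           # back from log-variance to volatility
    return sigma


# Synthetic check: volatility triples halfway through the sample.
rng = np.random.default_rng(3)
returns = np.concatenate([0.01 * rng.normal(size=1000), 0.03 * rng.normal(size=1000)])
sigma = kalman_log_variance(returns)
```

With these settings the steady-state gain is small (around 0.014), so each squared return only nudges the estimate. That is the price of a smooth series, and it is the Q versus R trade-off in action.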

5. From volatility estimation to position sizing

At this stage, we have an online estimate of volatility:

$$\hat{\sigma}_t = \exp\left(\hat{x}_{t|t} / 2\right)$$

where $\hat{x}_{t|t}$ is the filtered log-variance.

The key idea is to use this estimate to scale exposure such that portfolio risk remains approximately constant over time.

This leads to the target-volatility sizing rule:

$$w_t = \frac{\sigma_{\text{target}}}{\hat{\sigma}_t}$$

This equation captures the core intuition:

- when volatility increases, exposure is reduced
- when volatility decreases, exposure is increased

Importantly, w_t is not a signal. It does not tell us which direction to trade. It only determines how much risk to take.

6. Practical constraints for live trading

In real trading systems, several safeguards are required.

First, a volatility floor prevents excessive leverage when estimated volatility becomes too small.

Second, exposure is capped to remain within acceptable leverage limits.

Finally, position weights are smoothed to reduce turnover and transaction costs.

This smoothing introduces a small delay, but significantly improves stability in practice.
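The three safeguards can be sketched in a few lines. The parameter names and default values are hypothetical, and the exponential smoothing used here is an assumption on my side: the text only says that weights are smoothed, not how.

```python
import numpy as np


def target_vol_weights(
    sigma_hat,
    target_vol: float = 0.10,    # desired portfolio volatility (same units as sigma_hat)
    vol_floor: float = 0.05,     # floor on the volatility estimate
    max_leverage: float = 2.0,   # cap on exposure
    smoothing: float = 0.9,      # weight on the previous position (assumed EMA form)
) -> np.ndarray:
    """Map a volatility estimate to floored, capped, and smoothed exposure weights."""
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    floored = np.maximum(sigma_hat, vol_floor)            # 1) volatility floor
    raw = np.minimum(target_vol / floored, max_leverage)  # 2) leverage cap
    if raw.size == 0:
        return raw
    weights = np.empty_like(raw)
    weights[0] = raw[0]
    for t in range(1, len(raw)):
        # 3) exponential smoothing to limit turnover
        weights[t] = smoothing * weights[t - 1] + (1.0 - smoothing) * raw[t]
    return weights


weights = target_vol_weights([0.10, 0.20, 0.02, 0.02], smoothing=0.5)
```

In this toy call, the raw weights are 1.0, 0.5, 2.0, 2.0 (the last two hit both the floor and the cap), and smoothing turns them into 1.0, 0.75, 1.375, 1.6875. The weights carry no sign: direction still has to come from the alpha.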

7. What this approach does (and does not)

This framework works well because it:

- stabilizes portfolio risk
- adapts to changing volatility regimes
- is simple, robust, and computationally efficient
- can be combined with any alpha signal

However, it is important to be clear about its limitations:

- it does not predict returns
- it does not perform market timing

It is a risk controller, not an alpha generator.

Kalman filters are often presented as complex mathematical tools. In practice, their strength lies in their simplicity and flexibility.

Used correctly, they provide a clean and effective way to control risk in environments where volatility is unstable and noisy.

In systematic trading, improving risk control is often more impactful than improving return forecasts.

👉 If you want to go deeper into each step of the strategy building process, with real-life projects, ready-to-use templates, and 1:1 mentoring, that’s exactly what the Alpha Quant Program is for.

Kalman Filter in Trading (1/2)

Lucas — Fri, 20 Feb 2026 15:30:39 GMT

In trading, almost everything you touch is noisy. Prices jump because of microstructure, spreads widen for a few prints, volatility spikes just because your window is short. If you react to every wiggle, you trade randomness. If you smooth too much, you react too late.

The Kalman filter is a clean way to sit in the middle. It estimates a “true signal” behind noisy observations by doing the same two-step loop every time: predict what should happen, then correct using what you observed. The correction is not arbitrary. It is weighted by uncertainty, which is why Kalman feels both smooth and responsive.

1) The problem, in one minute

Markets are messy. Your indicators are messier.

A price series mixes real moves with microstructure noise. A rolling volatility mixes real regime shifts with estimation error.

Same story for any feature built on finite windows. Feed that raw signal into a strategy and you get false triggers, unstable sizing, and lots of noise masquerading as information.

What we want is simple. Keep the information, drop the useless wiggles.

2) The key idea. Two realities

Kalman starts with a clean separation.

What you observe: y_t (noisy)
What you want: x_t (latent state, hidden signal)

You assume:

$$y_t = x_t + v_t$$

where v_t is measurement noise.

The “latent state” is not a mystical fair value. It is simply the signal you choose to model: a smoother price level, a latent volatility, a stable trend component, a cleaner spread.

A tiny example (with a simulated “latent signal”)

To make this concrete, we can build a toy signal.

First, we create a latent state x_t. Think of it as the clean underlying level we wish we could observe.
Then we generate what the market actually shows us, y_t = x_t + v_t, by adding noise on top of it.
This mimics noisy prints, bid-ask bounce, and random micro wiggles.

In a simulation, we can plot both x_t and y_t. This lets us visually check whether the Kalman estimate is actually recovering the hidden signal.

Important: In real trading, we never see x_t. That is the whole point. The filter is valuable precisely because it produces a disciplined estimate of x_t, the signal that might be behind y_t.

We do not trade y. We trade an estimate of what is behind y.

3) Kalman is a two-step loop

Kalman is a loop with one question repeated forever:

What did I expect? What did I see? How much should I adjust?

Predict

You start each step with a belief about the hidden signal. Then you move it forward using a simple model.

$$\hat{x}_{t|t-1} = \hat{x}_{t-1|t-1}$$

This is your prior. It is what you believe before seeing the new data point.

The subscript $t|t-1$ means: estimate at time t using information up to t−1 (before seeing y_t).
In words: my best guess for the hidden level today is yesterday’s filtered estimate.

Update

Then you look at the new observation and measure the surprise:

$$e_t = y_t - \hat{x}_{t|t-1}$$

Finally you adjust toward the observation:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t e_t$$

The updated estimate is always “between” the prediction and the observation. The only thing that decides where you land is the gain K_t.

K_t is the trust weight you give to the new observation.

K_t≈1 means “I trust the data”. Fast reaction, little smoothing.
K_t≈0 means “I trust the model”. Slow reaction, strong smoothing.

4) Focus on the Kalman Gain K_t

The gain K_t is the adaptive weight that controls the update. It answers one question:

Do I trust the new observation, or do I trust my current belief?

In the simplest 1D case, the gain is computed from two sources of uncertainty:

- Prediction uncertainty (how unsure you are before seeing the new point)
- Measurement noise (how noisy the observation is)

Here are the two key equations.

$$P_{t|t-1} = P_{t-1|t-1} + Q$$

Each step forward adds uncertainty. Q is how much you allow the hidden signal to “wander” between two observations. Bigger Q means “the state can move fast”, so you become more willing to adjust.

$$K_t = \frac{P_{t|t-1}}{P_{t|t-1} + R}$$

The gain compares how uncertain your prediction is versus how noisy the observation is.

- Bigger R means noisier data. The gain shrinks. You smooth more.
- Bigger Q makes your prediction less certain (through P). The gain grows. You adapt faster.

So the only real knobs you set in practice are Q and R:

- R controls how much you distrust the data.
- Q controls how flexible the hidden signal is.
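To see the loop and the two knobs at work, here is a minimal 1D sketch on a simulated latent signal. The function name, the initial values `x0` and `p0`, and the simulation settings are assumptions for illustration. The line that shrinks `p` after each update is the standard Kalman posterior variance step; it is needed to run the loop even though it is not one of the two equations discussed here.

```python
import numpy as np


def kalman_1d(y, q: float, r: float, x0: float = 0.0, p0: float = 1.0):
    """Local-level Kalman filter. Returns filtered estimates and the gain path."""
    x_hat, p = x0, p0
    estimates, gains = np.empty(len(y)), np.empty(len(y))
    for t, obs in enumerate(y):
        p_pred = p + q                     # predict: uncertainty grows by Q
        k = p_pred / (p_pred + r)          # gain: prediction uncertainty vs noise
        x_hat = x_hat + k * (obs - x_hat)  # update: move toward the observation
        p = (1.0 - k) * p_pred             # uncertainty shrinks after the update
        estimates[t], gains[t] = x_hat, k
    return estimates, gains


rng = np.random.default_rng(42)
n, q_true, r_true = 2000, 0.01, 1.0
x = np.cumsum(rng.normal(scale=np.sqrt(q_true), size=n))  # hidden signal (random walk)
y = x + rng.normal(scale=np.sqrt(r_true), size=n)         # what we actually observe

x_hat, gains = kalman_1d(y, q=q_true, r=r_true)
_, gains_big_r = kalman_1d(y, q=q_true, r=10.0 * r_true)  # noisier data assumed
_, gains_big_q = kalman_1d(y, q=10.0 * q_true, r=r_true)  # faster state assumed

rmse_raw = float(np.sqrt(np.mean((y - x) ** 2)))
rmse_filtered = float(np.sqrt(np.mean((x_hat - x) ** 2)))
```

Because this is a simulation, we can compare against the true x_t: the filtered error should come out well below the raw observation error. The gain settles to a constant after a short warm-up, lower when R is raised and higher when Q is raised.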

In the next newsletter, we will walk through a simple trading example and show how to use the Kalman estimate in a real signal.

Yes, there were a few equations here, but the goal is not to memorize them. The goal is to internalize the intuition: predict, measure the surprise, and adjust by an uncertainty weighted amount.

Quants never predict... They quantify uncertainty!

Lucas — Fri, 13 Feb 2026 15:30:45 GMT

It’s a sentence everyone has heard. It sounds smart. It is often misunderstood.

Many people take it to mean that quants refuse to say anything about the future. Others use it as a kind of intellectual shield, a way to avoid being wrong. Both readings miss the point.

The issue is not prediction itself.
The issue is believing that a single prediction can be treated as truth.

This distinction matters because markets are not difficult just because they are noisy. They are difficult because they punish certainty far more than they punish error.

This is what this newsletter is about. Not repeating a slogan, but explaining what experienced quants actually do when they say they “quantify uncertainty.”

1. The real problem is not prediction, but believing in a prediction

Prediction is not the enemy.

When people say “the market will go up” or “this strategy works”, they are not just making an estimate. They are implicitly choosing a single future and treating it as if it were reliable.

Even when numbers are involved, the mental model is usually the same. One central scenario. One expected outcome. Everything else pushed to the background.

The human brain is very good at constructing a coherent story. It is much worse at holding multiple incompatible futures at the same time.

This is where most mistakes start.
Not because the forecast is wrong, but because its uncertainty is ignored.

A trader who knows they can be wrong by a wide margin will size positions differently, manage risk differently, and survive longer. A trader who believes their prediction will hold tends to discover the tails the hard way.

For a quant, the question is never “what will happen?”.
It is “how wrong can I be, and what happens to my system when I am?”.

2. What experienced quants do instead

An experienced quant replaces a point estimate with a distribution.

This is a critical shift. Not cosmetic. Structural.

A single metric, a Sharpe ratio, a Calmar ratio, a CAGR, always looks clean in isolation. It gives the illusion of precision. But taken alone, it tells you almost nothing about how fragile the result is.

One number cannot tell you whether performance is repeatable or accidental.
It cannot tell you whether the strategy is robust or just lucky once.

This is why mature quant workflows never stop at a single backtest outcome. They generate distributions of outcomes.

Different subsamples. Different start dates. Different regimes. Different perturbations.

What matters is no longer the best result, but the shape of the distribution.

Is performance concentrated around a stable core, or carried by a few extreme runs?
Are drawdowns consistently manageable, or occasionally catastrophic?
Does the strategy degrade gracefully, or collapse when conditions shift?

A strategy with a lower average metric but a tight, well-behaved distribution is often far superior to one with a spectacular headline number driven by lucky randomness.

This is what quantifying uncertainty looks like in practice.
You stop asking “how good is this strategy?” and start asking “how often does this strategy behave acceptably?”

Once you see the distribution, you cannot unsee it.
And from that point on, single-number metrics feel dangerously incomplete.
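Here is a minimal sketch of that shift, using only different start dates. The return series is synthetic, and the window length, step, and helper names are arbitrary choices for illustration.

```python
import numpy as np


def sharpe(returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of a return series (risk-free rate ignored)."""
    std = returns.std(ddof=1)
    if std == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * returns.mean() / std)


def subsample_sharpes(returns: np.ndarray, window: int = 252, step: int = 21) -> np.ndarray:
    """Sharpe ratio on overlapping one-year windows with different start dates."""
    starts = range(0, len(returns) - window + 1, step)
    return np.array([sharpe(returns[s:s + window]) for s in starts])


rng = np.random.default_rng(7)
daily = rng.normal(loc=0.0004, scale=0.01, size=2520)  # synthetic 10-year track record

dist = subsample_sharpes(daily)
summary = {
    "full_sample": sharpe(daily),                     # the single headline number
    "p05": float(np.percentile(dist, 5)),
    "median": float(np.percentile(dist, 50)),
    "p95": float(np.percentile(dist, 95)),
    "share_negative": float(np.mean(dist < 0)),       # how often a year looks bad
}
```

The windows overlap, so they are not independent draws, and this is only one of the perturbations listed above. Still, even on a stationary synthetic series, the gap between `p05` and `p95` is usually wide enough to make the headline Sharpe look much less precise than it first appears.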

3. Why this matters in real trading and machine learning

In trading, an edge is not a guarantee.
It is a small statistical bias that only exists under specific conditions.

A machine learning model does not “predict” the market. It outputs a conditional estimate based on past data. Treating that output as a forecast is where most mistakes begin.

What really matters is not average accuracy. It is how the model behaves when it is wrong.

Take a simple example.
Two strategies show the same Sharpe ratio on a backtest. One delivers steady, repeatable performance across many subsamples. The other makes most of its profits in a handful of runs and collapses in the rest.

On paper, they look identical. In reality, one is robust. The other is likely driven by luck or is highly risky.

This is why experienced quants care more about distributions than point metrics. They want to know how often a strategy behaves acceptably, not how good it looks in the best case.

“A model is not a crystal ball. It is a noisy sensor in an unstable environment.
The goal is not to remove uncertainty. The goal is to build systems that can live with it.”

Quantifying uncertainty is not about being cautious. It is about being realistic.

Markets are not difficult because they are unpredictable. They are difficult because they punish systems that are built around fragile assumptions and single-scenario thinking.

Experienced quants do not avoid prediction out of intellectual modesty. They move past it because they understand that robustness matters more than being right, and that survival comes before precision.

Once you start thinking in distributions, your entire process changes. How you backtest. How you size risk. How you interpret models. How you react when things break, which they inevitably do.

False negatives are better than false positives...

Lucas — Fri, 06 Feb 2026 15:30:34 GMT

In trading, not all mistakes are equal. Machine learning metrics often treat false positives and false negatives as symmetric errors. Markets do not…

A false positive creates a trade where no real opportunity exists. A false negative simply means staying out.

This asymmetry changes how models should be evaluated and how signals should be used in practice.

Understanding this difference is essential when dealing with noisy markets and rare trading opportunities.

1. False positives and false negatives are different by nature

In any decision system, two types of errors can occur.

A false positive happens when a system signals that something is happening, while in reality nothing is happening.
A false negative happens when something is happening, but the system fails to detect it.

These two errors are not interchangeable.

Their impact depends entirely on the context and on what happens after the decision is made.

In some domains, false positives are preferred:

In natural disaster monitoring, issuing a tsunami alert that turns out to be unnecessary is often acceptable compared to missing a real one.
In medical screening, early detection systems may tolerate false alarms to avoid missing critical conditions.

In other domains, false negatives are preferred:

In credit risk decisions, rejecting a good borrower is often less costly than approving a bad one.
In legal and compliance systems, a false accusation can cause irreversible damage, while a missed case can often be reviewed later.

The key point is simple.
The cost of an error is not symmetric, and the preferred type of error depends on the consequences of acting or not acting.

Understanding this distinction is essential before choosing metrics, thresholds, or optimization objectives.

2. In trading, acting has a cost

The critical difference between false positives and false negatives in trading appears after the signal is generated. A signal is not just a prediction. It is a decision to allocate capital.

When a false positive occurs, the system acts: a position is opened, costs are paid, capital is exposed to randomness, risk accumulates over time. Each false positive adds friction and noise to the strategy. These effects compound.

By contrast, when a false negative occurs, nothing happens: no position is taken, no cost is paid, no drawdown is created. The opportunity is missed, but the system remains stable.

This is why trading systems are fundamentally different from pure prediction systems.
The cost is not attached to being wrong. The cost is attached to acting when you should not.

This is also why being more selective often leads to more robust strategies, even if fewer opportunities are captured.

In trading, stability comes from controlling actions, not from predicting more events.

3. Why evaluation metrics push trading systems in the wrong direction

The problem is not only how trading systems behave, but also how they are evaluated.

Most machine learning metrics are designed for prediction tasks, not for decision systems with asymmetric costs.

Accuracy is the most common example. When events are rare, accuracy is largely driven by true negatives. A model can appear excellent simply by predicting “no signal” most of the time.

This gives a false sense of reliability.
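A minimal sketch of that effect, assuming a 2% event rate (an illustrative figure, not one stated in this newsletter) and a "model" that never fires at all:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

event = rng.random(n) < 0.02        # rare event, assumed ~2% base rate
signal = np.zeros(n, dtype=bool)    # a "model" that always predicts "no signal"

accuracy = (signal == event).mean()             # dominated by true negatives
recall = (signal & event).sum() / event.sum()   # share of real events detected

print(f"accuracy = {accuracy:.3f}, recall = {recall:.3f}")
```

Accuracy lands close to 98% while the model detects nothing, because almost every observation is a true negative.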

Recall can be more informative, as it measures how many real opportunities are detected.

However, recall ignores what happens when the model is wrong in the other direction.

By construction, increasing recall often increases false positives. In trading, this trade-off is rarely neutral. Optimizing a metric in isolation pushes the system toward more actions, not necessarily toward better decisions.
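The trade-off can be sketched with a synthetic score and a moving threshold. The score distributions, the 2% base rate, and the threshold grid below are all assumptions chosen for illustration, not a model of any real signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

event = rng.random(n) < 0.02                                   # assumed 2% base rate
# Synthetic model score: higher on average when the event is present.
score = rng.normal(loc=np.where(event, 1.5, 0.0), scale=1.0)

rows = []
for threshold in (2.5, 2.0, 1.5, 1.0, 0.5):                    # progressively less selective
    signal = score > threshold
    tp = int((signal & event).sum())
    fp = int((signal & ~event).sum())
    recall = tp / int(event.sum())
    precision = tp / max(tp + fp, 1)
    rows.append((threshold, recall, fp, precision))
    print(f"threshold={threshold:.1f}  recall={recall:.2f}  "
          f"false positives={fp}  precision={precision:.2f}")
```

Each lower threshold fires on a superset of the previous signals, so recall and the false positive count can only go up together. The number of trades taken rises with them.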

This is why trading systems must be evaluated with metrics that reflect when capital is exposed, not just how often predictions are correct.

In trading, the objective is not to maximize precision or recall.
It is to trade only when acting is justified, and to act often enough for the strategy to matter.

Bayes in Trading (3/3)

Lucas — Fri, 30 Jan 2026 15:30:32 GMT

In the previous newsletters, we applied the exact Bayes computation for a rare-event trading signal.

Now, instead of adding more math, we will vary the key parameters one by one and observe what really moves the probability that a signal is correct.

The goal is simple: identify which levers matter in practice, and which ones are often overestimated.

For reference, the model parameters are fixed to the same values as in Bayes in Trading (2/3):

The event occurs 2% of the time. P(E)=2%.
When the event is present, the model detects it 90% of the time. P(S∣E)=90%.
When the event is not present, the model still triggers a signal 12% of the time. P(S∣Ē)=12%.

In the following sections, each parameter will be varied one at a time to isolate its effect.
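As a reference point, the baseline can be reproduced in a few lines. This is a minimal sketch: the function name `posterior` and its argument names are illustrative choices, and the three inputs are the values listed above.

```python
def posterior(p_event, p_signal_given_event, p_signal_given_no_event):
    """Probability that the event is real given that the model fired (Bayes' rule)."""
    true_signals = p_signal_given_event * p_event
    false_signals = p_signal_given_no_event * (1.0 - p_event)
    return true_signals / (true_signals + false_signals)

baseline = posterior(p_event=0.02,
                     p_signal_given_event=0.90,
                     p_signal_given_no_event=0.12)
print(f"P(E|S) = {baseline:.4f}")
```

With these inputs the posterior is about 13%, which is the starting point for every variation that follows.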

1. False positives dominate everything

This first plot isolates the impact of the false positive rate, while keeping all other parameters fixed. The effect is immediate and severe. Even small increases in P(S∣Ē) lead to a sharp collapse in the probability that a signal is actually correct.

When the event is rare, most observations belong to the “no-event” regime.
As a result, false positives quickly outnumber true signals, even if the model performs well when the event is present.

This is why many strategies fail not because they miss opportunities, but because they trigger too often when nothing is happening.

Reducing false positives is usually far more important than improving accuracy.
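The same sweep can be sketched numerically. The base rate and detection rate are the baseline values. The grid of false positive rates is an assumption chosen for illustration and is not necessarily the grid used in the plot.

```python
import numpy as np

p_event, detection = 0.02, 0.90                      # baseline values from the text
fpr = np.array([0.01, 0.02, 0.05, 0.12, 0.25])       # illustrative grid (assumed)

# Bayes' rule, vectorized over the false positive rate.
p_correct = detection * p_event / (detection * p_event + fpr * (1 - p_event))

for f, p in zip(fpr, p_correct):
    print(f"false positive rate = {f:.2f}  ->  P(E|S) = {p:.3f}")
```

On this grid the probability that a signal is correct falls from roughly 65% at a 1% false positive rate to roughly 13% at the baseline 12%.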

2. Event rarity is a structural constraint

In this second experiment, we keep the model behavior fixed and vary only the frequency of the event itself. The detection accuracy remains at 90%, and the false positive rate at 12%, exactly as before. Only the base rate P(E) changes.

The result is unambiguous.

When the event is extremely rare, even a well-behaved model produces signals that are mostly noise. As the event becomes more frequent, the same model suddenly appears far more reliable.

Nothing about the model changed. Only the market context did.

This highlights an important limitation of rare-event strategies.
Below a certain frequency, signal quality is structurally capped, regardless of how good the classifier is.

Sometimes the main problem is not the model, but the rarity of the event being traded.
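The structural cap is easiest to see in the odds form of Bayes' rule, where the model contributes a fixed multiplier and the base rate does the rest. The sketch below uses the baseline detection and false positive rates. The 50% target in the second line is an arbitrary illustrative threshold, not one taken from the experiment.

```latex
\[
\frac{P(E \mid S)}{P(\bar{E} \mid S)}
  = \frac{P(E)}{1 - P(E)} \times \frac{P(S \mid E)}{P(S \mid \bar{E})}
  = \frac{P(E)}{1 - P(E)} \times \frac{0.90}{0.12}
  = 7.5 \times \frac{P(E)}{1 - P(E)}
\]
\[
P(E \mid S) \ge \tfrac{1}{2}
  \iff \frac{P(E)}{1 - P(E)} \ge \frac{1}{7.5}
  \iff P(E) \ge \frac{1}{8.5} \approx 11.8\%
\]
```

At the baseline P(E) = 2%, the prior odds are 0.02/0.98, about 0.0204. Multiplied by 7.5 this gives posterior odds of about 0.153, or a probability of about 13%. With this classifier, a signal only becomes more likely right than wrong once the event occurs in more than roughly 12% of observations.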

3. Accuracy helps, but far less than expected

In this final experiment, we vary only the detection accuracy of the model.

The event frequency is fixed at 2%, and the false positive rate remains at 12%, exactly as in the previous sections. Only P(S∣E) is allowed to change.

The improvement is real, but modest. Even large increases in detection accuracy translate into relatively small gains in the probability that a signal is actually correct.
This is a direct consequence of event rarity and persistent noise.

This explains why strategies that aggressively optimize accuracy often fail to deliver meaningful improvements in live trading.

Accuracy is rarely the dominant lever in rare-event strategies.
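The ceiling can be worked out directly by pushing the detection rate to its maximum. The bound below holds the base rate at 2% and the false positive rate at 12%, as in this section, and sets P(S∣E) = 1. It relies only on the fact that the posterior increases with P(S∣E).

```latex
\[
P(E \mid S)
  = \frac{P(S \mid E)\,P(E)}{P(S \mid E)\,P(E) + P(S \mid \bar{E})\,\bigl(1 - P(E)\bigr)}
  \le \frac{P(E)}{P(E) + P(S \mid \bar{E})\,\bigl(1 - P(E)\bigr)}
  = \frac{0.02}{0.02 + 0.12 \times 0.98}
  = \frac{0.02}{0.1376}
  \approx 14.5\%
\]
```

Moving from 90% detection to perfect detection lifts the probability that a signal is correct from about 13.3% to about 14.5%.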

Conclusion. What to remember

For rare events, accuracy is not the right metric.
Signal quality is driven first by false positives, then by event frequency.
Detection accuracy helps, but far less than most people expect.
A strong model can still produce mostly noise if the base rate is low.
Bayes is not optional. It defines the ceiling of what a signal can achieve.

Before trusting any trading signal, always ask:

How rare is the event?
How often does the model fire when nothing happens?
What is the resulting probability that a signal is actually correct?

If these questions are not answered explicitly, the signal is likely misleading.

Bayes in Trading (2/3)

Lucas — Fri, 23 Jan 2026 15:31:00 GMT

This newsletter is a direct continuation of the previous one.

Last week, we discussed a very common problem in trading and machine learning: predicting a rare event.

We saw that even models with high reported accuracy can be misleading when the event itself occurs infrequently.

Now, we will go through the computation explicitly and show how Bayes quantifies this effect.

1. Restating the problem and identifying the quantities

We consider a binary classification problem applied to trading. A model attempts to detect a specific market pattern that occurs rarely.

We define two events:

E: the pattern is truly present in the market
S: the model triggers a signal

The available information is the following:

When the pattern truly occurs, the model detects it with 90% reliability.
The pattern occurs 2% of the time.
When the pattern is not present, the model still produces a signal with a probability of 12%.

Our goal is not to evaluate the model conditionally,
but to answer the trading-relevant question:

When the model triggers a signal, what is the probability that the pattern is actually real?

This corresponds to computing a posterior probability. In other words, we start from a prior probability (the pattern occurs 2% of the time) and ask how that probability changes once we know the model has triggered a signal.

2. What information is required and why

At this point, it is important to pause and clarify something.

With only:

the base rate of the event,
and the model’s detection rate when the event occurs,

the problem is not solvable.

To compute the probability that a signal is actually correct, we need three distinct pieces of information:

How often the event occurs in general.
How often the model detects the event when it is truly present.
How often the model produces a signal when the event is not present.

The last quantity is usually overlooked.
Yet it is the one that dominates the result when events are rare.

This is why reported accuracy alone is insufficient in trading applications.

3. Bayes formula

Bayes’ theorem provides a direct relationship between these quantities.

It allows us to invert the question from “how good is the model when the event happens?” to “how likely is the event when the model fires?”

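Formally, Bayes’ rule gives:

P(E∣S) = P(S∣E) P(E) / [ P(S∣E) P(E) + P(S∣Ē) P(Ē) ]

where Ē denotes the pattern being absent, so P(Ē) = 1 − P(E).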

4. Applying the formula

We now substitute the values from the problem:

P(E∣S) = (0.90 × 0.02) / (0.90 × 0.02 + 0.12 × 0.98) = 0.018 / 0.1356 ≈ 0.133

The result is simple to interpret. Even with a high detection rate when the pattern is real, and a seemingly reasonable false positive rate, only about 1 signal out of 7 corresponds to a true pattern.
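The same figure can be checked by counting signals instead of manipulating probabilities. This is a minimal sketch that assumes a hypothetical sample of 10,000 observations, chosen only so that every count is a whole number.

```python
n_obs = 10_000                              # hypothetical sample size

events = n_obs * 2 // 100                   # pattern present 2% of the time
non_events = n_obs - events

true_signals = events * 90 // 100           # detected when the pattern is real
false_signals = non_events * 12 // 100      # fired when the pattern is absent

share_real = true_signals / (true_signals + false_signals)
print(f"{true_signals} true signals, {false_signals} false signals, "
      f"share real = {share_real:.3f}")
```

Out of 1,356 signals, only 180 correspond to a real pattern. The false signals come from the 9,800 observations where nothing is happening, which is why they dominate even at a 12% rate.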

This outcome is not caused by a bad model. It is caused by the combination of:

a low base rate,
and non-negligible false positives.

In the next newsletter, we will stop computing and start reasoning. We will vary the base rate, the false positive rate, and the detection rate, to understand which quantities truly matter, and which ones are often overemphasized.

This is where Bayes becomes a practical trading reflex rather than a formula.