Why I always try to simplify my alpha formulas
Simpler formulas will miss some cases. But in many situations, that is exactly what makes them more robust.
When I work on an alpha, I almost always try to simplify the formula as much as possible.
Not because simple models are always better.
Not because complexity is useless.
But because every unnecessary degree of freedom increases the risk of overfitting.
In practice, I often prefer very simple structures, such as a linear combination of a few inputs, and sometimes a square or cubic term when it is clearly justified.
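As a rough illustration of what such a structure can look like, here is a minimal sketch. The inputs `x` and `y` and the coefficients `a`, `b`, `c` are hypothetical placeholders and do not come from any specific alpha.

```python
def linear_alpha(x, y, a, b):
    """Hypothetical two-input linear structure: a*x + b*y."""
    return a * x + b * y


def linear_plus_cubic_alpha(x, y, a, b, c):
    """Same linear structure with one extra cubic term on x.

    The cubic term adds exactly one more coefficient (c) to justify.
    """
    return a * x + b * y + c * x ** 3


# Illustrative values only.
print(linear_alpha(1.0, 2.0, a=0.5, b=-0.25))
print(linear_plus_cubic_alpha(2.0, 0.0, a=1.0, b=0.0, c=0.5))
```

Each added term is one more coefficient that has to earn its place, which is why the polynomial version is kept as the exception rather than the default.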
Of course, the goal is not to destroy the alpha just to end up with a pretty equation.
The real objective is to find the sweet spot between simplicity and performance.
1. Why Simplicity Matters
A formula that is too flexible can fit noise very easily.
The more parameters, interactions, nonlinearities, and transformations you add, the easier it becomes to explain the past.
But explaining the past is not the same as capturing a repeatable market effect.
A simpler formula usually gives you three major benefits:
- lower overfitting risk
- easier interpretation
- easier monitoring in live trading
And in many cases, that last point matters more than people think.
If an alpha starts degrading, a simple structure is much easier to diagnose than a black box.
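To make the "fitting noise" point concrete, here is a small sketch under stated assumptions. It uses the number of candidate signals tried as a stand-in for degrees of freedom, which is a simplification of the argument above and not the only way flexibility shows up. Returns and candidate signals are pure random noise, so there is nothing real to find. The function name `best_of_k_on_noise` and all parameters are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_of_k_on_noise(k_values, n_obs=250, seed=0):
    """For each k, select the best of the first k random candidates in-sample.

    Returns {k: (in_sample_abs_corr, out_of_sample_abs_corr)} for the
    candidate that looked best in-sample.
    """
    rng = random.Random(seed)
    k_max = max(k_values)
    # Returns are pure noise: first half is in-sample, second half is held out.
    returns = [rng.gauss(0.0, 1.0) for _ in range(2 * n_obs)]
    candidates = [
        [rng.gauss(0.0, 1.0) for _ in range(2 * n_obs)] for _ in range(k_max)
    ]
    ins = [abs(pearson(c[:n_obs], returns[:n_obs])) for c in candidates]
    oos = [abs(pearson(c[n_obs:], returns[n_obs:])) for c in candidates]
    results = {}
    for k in k_values:
        best = max(range(k), key=lambda i: ins[i])
        results[k] = (ins[best], oos[best])
    return results


results = best_of_k_on_noise([1, 10, 200])
for k, (in_sample, out_of_sample) in results.items():
    print(f"k={k:>3}  in-sample |corr|={in_sample:.3f}  "
          f"out-of-sample |corr|={out_of_sample:.3f}")
```

By construction, the best in-sample correlation can only go up as more candidates are tried, even though every candidate is noise. The held-out number is the one to look at.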
2. Simple Does Not Mean Naive
This is where many people get the message wrong.
The goal is not to force every alpha into a linear model just because it looks cleaner.
The goal is to remove unnecessary complexity while preserving the core market logic.
Sometimes the signal is already captured well by a linear combination (with or without linear expansion).
And sometimes the real structure is more complex, and oversimplifying it would simply kill the alpha.
So the real question is not: “Can I make it simple?”
It is: “How much complexity is truly necessary to preserve the signal?”
3. The Types Of Formulas I Prefer
In most cases, I start by testing whether the alpha can be expressed through a very compact structure.
Typically:
- linear terms
- maybe one or two polynomial terms
- very limited interactions
- very few input variables
That gives me a model that is easier to understand, easier to stress test, and much harder to overfit than a more flexible alternative.
Only when this simplification fails do I consider a machine learning model.
And even then, I usually keep that for situations where:
- the number of input variables is small
- the relationship looks real but not easily compressible into a simple equation
- the added complexity is justified by out-of-sample behavior, not just in-sample fit
4. A Case Where Simplification Helps
Imagine a signal that depends mostly on:
- short-term return
- recent volatility
- distance from a local mean
A complex model may build a tangled nonlinear response with many interactions.
But in practice, you may find that a short formula using only those three inputs already captures most of the usable signal.
Will it miss some edge cases? Yes.
But if the simplified version keeps most of the predictive power while being much more stable, that is often the better trade-off.
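A compact version of such a signal could be sketched as follows. This is only an assumption about what a simplified form might look like: the feature definitions, window lengths, and weights (`w_ret`, `w_vol`, `w_dist`) are hypothetical placeholders, not fitted values.

```python
from statistics import mean, stdev


def short_term_return(prices, lookback=1):
    """Simple return over the last `lookback` periods."""
    return prices[-1] / prices[-1 - lookback] - 1.0


def recent_volatility(prices, window=5):
    """Sample standard deviation of the last `window` one-period returns."""
    recent = prices[-(window + 1):]
    rets = [b / a - 1.0 for a, b in zip(recent, recent[1:])]
    return stdev(rets)


def distance_from_local_mean(prices, window=5):
    """Relative gap between the last price and its recent average."""
    return prices[-1] / mean(prices[-window:]) - 1.0


def compact_signal(prices, w_ret=-1.0, w_vol=-0.5, w_dist=-1.0):
    """Linear combination of the three features (weights are placeholders)."""
    return (
        w_ret * short_term_return(prices)
        + w_vol * recent_volatility(prices)
        + w_dist * distance_from_local_mean(prices)
    )


# Made-up price path, for illustration only.
example_prices = [100.0, 101.0, 100.5, 102.0, 101.5, 103.0, 104.0]
print(compact_signal(example_prices))
```

With three inputs and three weights, every part of the output can be traced back to one feature, which is what makes this form easy to stress test and monitor.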
5. A Case Where Simplification Fails
Now imagine a signal where the effect only appears under a specific conditional structure.
For example:
- feature A matters only when volatility is high
- feature B becomes relevant only when liquidity is thin
- the relationship changes sharply across regimes
In that case, forcing everything into one clean linear equation may remove exactly what makes the alpha work.
This is where a slightly more flexible model, or even a small machine learning model, can be justified.
But the key point is that complexity should come from necessity, not from habit.
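The first bullet above (feature A matters only when volatility is high) can be illustrated with a tiny constructed dataset. The numbers below are synthetic and chosen by hand so the effect is exact. `feature_a`, `high_vol`, and `gated_signal` are hypothetical names.

```python
def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Synthetic data: the target equals feature A in the high-volatility
# regime and is zero in the low-volatility regime.
feature_a = [-2.0, -1.0, 1.0, 2.0] * 2
high_vol = [True] * 4 + [False] * 4
target = [a if hv else 0.0 for a, hv in zip(feature_a, high_vol)]

pooled_slope = ols_slope(feature_a, target)
high_slope = ols_slope(
    [a for a, hv in zip(feature_a, high_vol) if hv],
    [y for y, hv in zip(target, high_vol) if hv],
)
low_slope = ols_slope(
    [a for a, hv in zip(feature_a, high_vol) if not hv],
    [y for y, hv in zip(target, high_vol) if not hv],
)


def gated_signal(a, is_high_vol, beta=1.0):
    """One conditional term: feature A only counts when volatility is high."""
    return beta * a if is_high_vol else 0.0


print(pooled_slope, high_slope, low_slope)
```

A single pooled linear fit averages the two regimes and lands between them, while one gated term recovers the structure with only one extra condition. That is the kind of complexity that comes from necessity.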
6. The Real Objective: Find The Sweet Spot
This is how I think about it:
- too simple, and you may throw away real structure
- too complex, and you may fit noise
- somewhere in between, there is a level of complexity that preserves the alpha while keeping it robust
That is the sweet spot.
And in my experience, most people start too far on the complex side.
7. A Practical Checklist I Use
Before keeping a more complex formula, I usually ask:
- Does the extra complexity improve out-of-sample results?
- Does it survive across periods and regimes?
- Can I explain why this extra term should exist economically or behaviorally?
- Can I monitor it properly in live trading?
- If I simplify it, how much signal do I really lose?
If complexity does not clearly earn its place, I usually remove it.
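The first two questions of the checklist can be turned into a simple mechanical rule. The sketch below is one possible way to do it, not a standard test: the function name, the thresholds `min_gain` and `min_win_rate`, and the per-period scores are all hypothetical, and the scores are made-up placeholders for whatever out-of-sample metric you track.

```python
def complexity_earns_its_place(simple_oos, complex_oos,
                               min_gain=0.01, min_win_rate=0.7):
    """Keep the complex formula only if it beats the simple one
    out-of-sample, both on average and in most periods.

    simple_oos / complex_oos: one out-of-sample score per period,
    in the same period order for both models.
    """
    if not simple_oos or len(simple_oos) != len(complex_oos):
        raise ValueError("need the same non-zero number of periods for both models")
    gains = [c - s for s, c in zip(simple_oos, complex_oos)]
    mean_gain = sum(gains) / len(gains)
    win_rate = sum(g > 0 for g in gains) / len(gains)
    return mean_gain >= min_gain and win_rate >= min_win_rate


# Made-up per-period scores, for illustration only.
simple_scores = [0.03, 0.02, 0.04, 0.03]
unstable_complex = [0.05, 0.01, 0.03, 0.04]    # wins in only half the periods
consistent_complex = [0.06, 0.05, 0.07, 0.05]  # wins in every period

print(complexity_earns_its_place(simple_scores, unstable_complex))
print(complexity_earns_its_place(simple_scores, consistent_complex))
```

The remaining questions (economic rationale, ease of monitoring) do not reduce to a number and still need judgment.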
A good alpha is not the most sophisticated equation you can write.
It is a signal that keeps working when reality stops being friendly.
That is why I usually start with the simplest valid structure I can build, and only add complexity when the data clearly proves it is needed.