5 Tips to Choose the Right Features (Seriously)
How to choose the right inputs for the model, the target, and the strategy.
“The difficulty is not to see what no one has seen before, but to think like no one has thought about what everyone sees.” Arthur Schopenhauer.
Everyone sees the same price, volume, candles, indicators. But not everyone extracts value the same way.
Most traders build models by stacking indicators they’ve heard of: RSI, MACD, Bollinger Bands… Hoping they’ll “just work.” But choosing features isn’t about collecting tools. It’s about aligning them.
In this newsletter, you’ll learn how to choose features with intent.
Not based on trends. Not based on gut feeling. But based on your goal, your target, and your model structure.
Because real edge doesn’t come simply from having more data.
It comes from thinking differently about the data everyone sees.
Let’s dive in.
1. Alignment with the Target
💡 A directional target (up/down) doesn’t require the same features as a magnitude target (how far) or a volatility target (how noisy).
👉 Example:
If your goal is to predict future volatility, moving averages won’t help much.
You’ll get better results using features like:
Parkinson or Yang-Zhang volatility estimators
Autocorrelation to detect persistence or regime shifts
Market regime indicators (e.g., trending vs. ranging)
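As an illustration of the first item, here is a minimal sketch of a rolling Parkinson estimator, which uses only each bar's high and low. The function name `parkinson_volatility`, the window length, and the price arrays are assumptions made for this example, not part of any specific library.

```python
import numpy as np

def parkinson_volatility(high, low, window):
    """Rolling Parkinson volatility (per-bar units) from high/low prices."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    log_hl_sq = np.log(high / low) ** 2
    out = np.full(high.shape, np.nan)
    # Each value uses the current bar and the window - 1 bars before it,
    # so nothing from the future enters the feature.
    for t in range(window - 1, len(high)):
        mean_sq = log_hl_sq[t - window + 1 : t + 1].mean()
        out[t] = np.sqrt(mean_sq / (4.0 * np.log(2.0)))
    return out

# Synthetic bars, purely illustrative.
high = np.array([101.0, 102.0, 103.5, 102.5, 104.0])
low = np.array([99.0, 100.5, 101.0, 100.0, 101.5])
vol = parkinson_volatility(high, low, window=3)
```

The first `window - 1` values are NaN because the estimator does not yet have a full window of bars.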
🔧 Method:
Ask yourself: “What exactly am I trying to predict?”
(direction, size of move, variability…)
Then ask: “What typically precedes that in the data?”
(range expansion, momentum bursts, structural breaks…)
Don’t choose features out of habit. Choose them based on what your target is really about.
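To make the three target types concrete, the sketch below builds a direction, a magnitude, and a volatility target from the same closing prices. The function name `make_targets`, the horizon, and the price series are hypothetical choices for illustration.

```python
import numpy as np

def make_targets(close, horizon):
    """Direction, magnitude and volatility targets over the next `horizon` bars."""
    close = np.asarray(close, dtype=float)
    n = len(close)
    log_ret = np.diff(np.log(close))  # log_ret[i] is the return from bar i to bar i + 1

    # Forward return from bar t to bar t + horizon (NaN where the future is unknown).
    fwd = np.full(n, np.nan)
    fwd[:-horizon] = close[horizon:] / close[:-horizon] - 1.0

    direction = np.where(np.isnan(fwd), np.nan, (fwd > 0).astype(float))  # up = 1, down = 0
    magnitude = np.abs(fwd)                                               # how far

    # Realized volatility of the next `horizon` one-bar returns (how noisy).
    volatility = np.full(n, np.nan)
    for t in range(n - horizon):
        volatility[t] = log_ret[t : t + horizon].std()
    return direction, magnitude, volatility

close = np.array([100.0, 101.0, 99.0, 102.0, 103.0, 101.0])
direction, magnitude, volatility = make_targets(close, horizon=2)
```

The same price path produces three different series to predict, which is why a feature that helps with one target can be irrelevant for another.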
2. Alignment with the Horizon
💡 The horizon of your target is a proxy for the reaction speed your model needs to operate at.
👉 Example:
If your target horizon is 5 bars, your model needs short-term signals like:
price slope
micro-volatility
short EMA crossovers
But if your horizon is 200 bars, you should favor long-term features like:
long moving averages
rolling stability metrics
slow-changing volatility measures
🔧 Method:
Define the time window in which your model is expected to act (your target horizon),
then calibrate your feature windows accordingly.
Mixing timeframes is fine, but keep it proportional: one short-term feature won’t distort a long-term model, and one long-term feature won’t distort a short-term model.
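One possible way to calibrate windows is to derive them from the horizon itself. The sketch below assumes a set of multipliers (0.5x to 4x the horizon), which is an arbitrary illustrative choice, and the helper names `feature_windows` and `rolling_mean` are hypothetical.

```python
import numpy as np

def feature_windows(horizon, multipliers=(0.5, 1.0, 2.0, 4.0), min_window=2):
    """Lookback windows proportional to the target horizon (multipliers are assumed)."""
    return sorted({max(min_window, int(round(horizon * m))) for m in multipliers})

def rolling_mean(x, window):
    """Trailing mean; the first window - 1 values are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

print(feature_windows(5))    # [2, 5, 10, 20]
print(feature_windows(200))  # [100, 200, 400, 800]

# Synthetic prices, purely illustrative.
rng = np.random.default_rng(1)
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=1000)))

horizon = 5
# Distance of price from its moving average, one feature per calibrated window.
features = {
    f"dist_ma_{w}": close / rolling_mean(close, w) - 1.0
    for w in feature_windows(horizon)
}
```

Changing `horizon` from 5 to 200 shifts every feature window with it, so the feature set stays proportional to the target.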
3. Alignment with the Model
💡 A feature that works well for a tabular model (like XGBoost or SVM) may be useless — or even counterproductive — in a sequential model (like LSTM or CNN).
👉 Example: A sequence of 20 rolling volatility values adds little in a tabular setup, where each value is just another unordered column, but could reveal a strong pattern when processed by a 1D CNN.
🔧 Method:
For 2D/tabular models, focus on summarized features like volatility_30d, hurst, or regime-based scores.
For 3D/sequential models, allow feature sequences, even noisy or raw ones: the model can extract structure from the temporal patterns.
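The difference between the two input shapes can be sketched with NumPy alone. The volatility series here is synthetic, and the three summary statistics are an illustrative assumption rather than a recommended set.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
vol = np.abs(rng.normal(0.01, 0.002, size=100))  # synthetic rolling-volatility series
seq_len = 20

# Every row holds the 20 most recent values ending at one bar.
windows = sliding_window_view(vol, seq_len)       # shape (81, 20)

# Sequential input: (samples, timesteps, channels), order within the window is kept.
X_seq = windows[..., np.newaxis]                  # shape (81, 20, 1)

# Tabular input: (samples, features), each window reduced to a few summaries.
X_tab = np.column_stack([
    windows.mean(axis=1),                         # average level
    windows.std(axis=1),                          # dispersion inside the window
    windows[:, -1] / windows.mean(axis=1),        # latest value relative to the mean
])
```

`X_seq` would feed an LSTM or 1D CNN, while `X_tab` is the kind of matrix a tree-based model expects.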
4. Information Overlap / Target Leakage
⚠️ If a feature already contains information about the target (e.g. future return), your model will “cheat.”
This is one of the most common — and dangerous — mistakes in quantitative modeling. It leads to inflated backtests and poor live performance.
👉 Example:
Using future_return_5 as a feature while also trying to predict the next 10-bar return creates circular logic: the 5-bar future return is part of the 10-bar return, so the model learns from what it’s supposed to predict.
🔧 How to avoid it:
Check causality: ask yourself, “Would I have had access to this feature at the moment of the signal?”
💡 Always ensure your features are based strictly on past and present information relative to the signal — never the future.
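The causality question can also be asked programmatically. The sketch below uses a simple truncation test: a feature is treated as causal at bar t if its value does not change when all bars after t are hidden. The helper names `past_return`, `future_return`, and `is_causal` are hypothetical, and passing this test at a few bars is a sanity check, not a proof that a pipeline is leak-free.

```python
import numpy as np

def past_return(close, lookback):
    """Return over the previous `lookback` bars; known at bar t."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[lookback:] = close[lookback:] / close[:-lookback] - 1.0
    return out

def future_return(close, horizon):
    """Return over the next `horizon` bars; not known at bar t (valid only as a target)."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[:-horizon] = close[horizon:] / close[:-horizon] - 1.0
    return out

def is_causal(feature_fn, close, t):
    """True if the value at bar t is unchanged when bars after t are hidden."""
    full = feature_fn(close)[t]
    truncated = feature_fn(close[: t + 1])[t]
    return bool(np.isclose(full, truncated, equal_nan=True))

# Synthetic prices, purely illustrative.
rng = np.random.default_rng(7)
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=200)))

print(is_causal(lambda c: past_return(c, 5), close, t=50))    # True
print(is_causal(lambda c: future_return(c, 5), close, t=50))  # False
```

A feature that fails this check at any bar depends on data that would not have been available at signal time.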
5. Regime-Dependence
⚠️ Some features are powerful… but only in specific market conditions (e.g. trend, range, high volatility).
A feature might seem useless when tested globally, but extremely predictive in the right regime.
👉 Example:
A trend-based slope may perform well during trending markets, but become noise in ranging conditions.
🔧 Solutions:
Create regime-aware features, like volatility-adjusted slopes or trend-strength indicators.
Test your features by regime, either by segmenting data or adding regime labels, and evaluate how their performance shifts.
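Testing by regime can be as simple as comparing a feature's correlation with the target inside each segment. In the sketch below the data is synthetic and built so that the feature matters only in one regime; the regime labels, the 0.5 coefficient, and the helper name `corr_by_regime` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4000
regime = rng.integers(0, 2, size=n)   # 0 = ranging, 1 = trending (synthetic labels)
feature = rng.normal(size=n)          # stand-in for a trend slope
noise = rng.normal(size=n)

# By construction, the target depends on the feature only in the trending regime.
target = np.where(regime == 1, 0.5 * feature, 0.0) + noise

def corr_by_regime(feature, target, regime):
    """Pearson correlation between feature and target within each regime label."""
    return {
        int(r): float(np.corrcoef(feature[regime == r], target[regime == r])[0, 1])
        for r in np.unique(regime)
    }

global_corr = float(np.corrcoef(feature, target)[0, 1])
per_regime = corr_by_regime(feature, target, regime)

print(f"global:   {global_corr:.3f}")
print(f"ranging:  {per_regime[0]:.3f}")
print(f"trending: {per_regime[1]:.3f}")
```

The global correlation understates how useful the feature is in the trending segment, which is the effect described above. On real data the regime labels themselves must be built from past information only.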
💡 Market behavior isn’t static — neither should your feature assumptions be.
🎯 Conclusion: Feature Selection Is Not a Checklist — It’s a Design Process
Most traders pick features by default: “I saw someone use RSI, so I’ll add it too.”
But real edge doesn’t come from stacking technical indicators.
It comes from choosing features that are:
Aligned with your target (direction, magnitude, volatility),
Adapted to your horizon (fast vs. slow reactions),
Compatible with your model (summary vs. sequential data),
Free from leakage (cutoff windows, causal logic),
Robust across regimes (or specifically designed for one).
If you apply this logic, every feature you use has a reason to be there — and your models become sharper, cleaner, and more generalizable.
💬 Now it’s your turn:
What’s your favorite technique for feature selection in trading? I’d love to hear it in the comments.
🚀 Whether you're coding your first models or scaling a live strategy, ML4Trading gives you the tools, templates, and theory to build robust and intelligent trading systems.