What I’ve Learned About Creating Trading Targets That Actually Work

Avoid the common traps and learn a structured process for defining your labels.

Jun 20, 2025

Creating a target is one of the most important parts of building a trading model, and also one of the most overlooked.

Most people spend time tweaking features or optimizing models, but they forget to ask the most basic question: what exactly am I trying to predict?

Over time, I’ve built my own checklist to avoid creating noisy, inconsistent, or misleading targets. This newsletter is a condensed version of that checklist, the same steps I use when building real strategies, whether for myself or clients.

It’s not a universal method. But it’s a solid base if you want your targets to make sense, match your features, and actually survive in production.

1. Define the Right Objective

When I start working on a strategy, I never just say “let’s predict target X or Y.” That’s not how you end up with something that actually works.

The real starting point is always: what problem am I trying to solve?

Let’s say I’m building a strategy and I notice that it breaks down every time volatility spikes. My signals stop making sense, the performance drops, and it feels like I’m trading blind.

That’s when I think: “What if I could detect in advance when volatility is about to jump?”

Now I have a real problem. My objective becomes: anticipate future volatility.
And depending on what I need from the model, I can build different types of targets:

If I just want to know whether volatility will increase or not → binary classification
If I need a precise estimate of future vol → regression (e.g. log of realized volatility)
If I care about detecting regime shifts → multi-class targets or threshold-based labeling
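To make the three options concrete, here is a minimal sketch that builds all of them from one price series. The function name `volatility_targets`, the 5-bar horizon, the 20-bar lookback, and the 0.8 / 1.25 regime thresholds are illustrative assumptions, not recommended values.

```python
import numpy as np
import pandas as pd

def volatility_targets(close: pd.Series, horizon: int = 5, lookback: int = 20) -> pd.DataFrame:
    """Three illustrative targets for the objective 'anticipate future volatility'."""
    ret = np.log(close).diff()
    # Trailing realized volatility: std of the last `lookback` returns (known at bar t).
    past_vol = ret.rolling(lookback).std()
    # Future realized volatility: std of the returns from t+1 to t+horizon.
    future_vol = ret.rolling(horizon).std().shift(-horizon)
    # A label only exists where both windows are fully available.
    valid = past_vol.notna() & future_vol.notna()
    ratio = future_vol / past_vol

    out = pd.DataFrame(index=close.index)
    # Regression: log of future realized volatility.
    out["y_reg"] = np.log(future_vol)
    # Binary classification: will volatility be higher than it was recently?
    out["y_bin"] = (future_vol > past_vol).astype(float).where(valid)
    # Multi-class: 0 = calmer, 1 = similar, 2 = more volatile (arbitrary thresholds).
    regime = (ratio > 0.8).astype(float) + (ratio > 1.25).astype(float)
    out["y_regime"] = regime.where(valid)
    return out

# Synthetic prices, for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))
targets = volatility_targets(close)
print(targets.dropna().head())
```

The features stay the same in all three cases; only the question asked of the model changes.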

The key is: the target comes from the problem, not from a tutorial, not from a random idea.

And often, just taking the time to ask “what exactly am I trying to predict?” makes the whole strategy clearer.

2. Choose the Right Horizon

Once I know what I’m trying to predict, the next question is: when do I want to know it?

That’s where the target horizon comes in. And honestly, this part is too often chosen by default (10 bars, 1 hour, next day) without checking whether it actually fits the strategy.

But the horizon changes everything.

Let’s go back to the volatility example.
If I want to know whether the volatility will spike in the next 3 candles, I’ll need a short-term horizon, like 3 to 5 bars ahead.
If I care about the volatility trend over the next few days, then I might stretch that to 10 or 15 bars.

There’s no “correct” number.
What matters is that it matches the decision delay of your strategy: how quickly you can (and want to) react.

And that changes the kind of features I’ll need later too. A 200-bar volatility feature doesn’t make much sense for predicting a volatility spike over the next 3 candles.
So I always take the time to align my horizon with the speed of the strategy.

If you don’t know your horizon, you don’t know what you’re predicting.
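A quick way to see how much the horizon matters is to label the same series with two horizons and compare. The sketch below assumes a hypothetical "spike" definition (future volatility above 1.5 times the trailing 50-bar volatility); the helper name `future_vol_spike` and every parameter are illustrative.

```python
import numpy as np
import pandas as pd

def future_vol_spike(ret: pd.Series, horizon: int, lookback: int = 50, mult: float = 1.5) -> pd.Series:
    """1.0 if realized vol over the next `horizon` bars exceeds `mult` x trailing vol, else 0.0."""
    trailing = ret.rolling(lookback).std()               # bars t-lookback+1 .. t
    future = ret.rolling(horizon).std().shift(-horizon)  # bars t+1 .. t+horizon
    label = (future > mult * trailing).astype(float)
    # No label where the future window runs past the end of the data
    # or where the trailing window is not filled yet.
    return label.where(future.notna() & trailing.notna())

# Synthetic returns, for illustration only.
rng = np.random.default_rng(42)
ret = pd.Series(rng.normal(0.0, 0.01, 1000))
labels = pd.DataFrame({h: future_vol_spike(ret, h) for h in (3, 15)})

both = labels.dropna()
print("spike rate per horizon:", both.mean().to_dict())
print("share of bars where the two horizons disagree:", (both[3] != both[15]).mean())
```

Note that the last `horizon` bars never get a label: the longer the horizon, the more recent data you lose.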

3. Condition Your Target (optional)

Financial data is some of the hardest to model in the world.
It’s noisy, unstable, full of randomness, and it changes constantly.

Your job isn’t to model everything.
It’s to reduce the complexity of the problem until it becomes solvable.

That’s why I condition my targets.

Let’s say I’m trying to predict future returns. If I ask my model to predict the next return for every single bar, across every possible market condition, I’m setting it up to fail. That’s just too chaotic.

So instead, I narrow the scope.

Maybe I’ll only look at candles where the RSI is above 70. Or when the ATR is above a certain level. Or after a breakout. Or during high volatility.

I don’t always know in advance what the best conditioning is. But I experiment, I test different environments, and I try to find situations where the target becomes more predictable.

That’s the whole point.

The only non-negotiable rule: “Condition using only the past, never the future.”
Otherwise, you won’t be able to recreate the same target in production.

This step alone can take your model from “confused and noisy” to “focused and learnable.”
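Here is a minimal sketch of the RSI example, assuming a simple-moving-average RSI (not Wilder's smoothing) and a forward log return as the target. The names `sma_rsi` and `conditioned_dataset`, the 14-bar period, and the 5-bar horizon are illustrative assumptions. The point to notice is that the mask at bar t only uses data up to bar t, while the target is the only column that looks forward.

```python
import numpy as np
import pandas as pd

def sma_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI built from simple moving averages of gains and losses (past data only)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def conditioned_dataset(close: pd.Series, horizon: int = 5, rsi_level: float = 70.0) -> pd.DataFrame:
    """Keep only the bars where the past-only condition holds."""
    df = pd.DataFrame({
        "rsi": sma_rsi(close),                            # known at bar t
        "target": np.log(close.shift(-horizon) / close),  # forward return, t -> t+horizon
    })
    mask = df["rsi"] > rsi_level
    # dropna removes the last `horizon` bars, whose target is not known yet.
    return df[mask].dropna()

# Synthetic prices, for illustration only.
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
subset = conditioned_dataset(close)
print(f"{len(subset)} of {len(close)} bars pass the condition")
```

A simple check that a condition is production-safe: recompute it on a truncated history and confirm the values on the overlapping bars do not change.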

4. Check If Your Target Actually Makes Sense

Before you train anything, you need to sanity-check your target. Just because the numbers are there doesn’t mean the signal is good.

Here’s what I look for every time:

→ Is the distribution balanced?
If 90% of your labels are “0” and only 10% are “1”, your model might just learn to always predict the majority class. That’s not learning, that’s just playing the odds.
If it’s imbalanced, plan for techniques like undersampling, oversampling, or weighted losses.

→ Is the label stable over time?
If your target is all 1s in 2020 and all -1s in 2022, you’re not modeling a repeatable pattern, you’re just capturing a market phase.
You want a target that generalizes across regimes, not one that breaks at the first structural shift.

→ Does the label make economic sense?
Could a trader (even without ML) look at the data and say “Yeah, I get why this would be labeled +1”?
If your label doesn’t align with any intuitive logic or price behavior, you’re probably building noise on top of noise.

A good target is balanced, stable, and explainable. Without that, even the best model won’t help you.
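The first two checks are easy to automate. The sketch below assumes labels stored in a pandas Series with a DatetimeIndex; the helper names are hypothetical. The weight formula n / (k × count) is one common heuristic (it matches what scikit-learn documents for `class_weight="balanced"`), not the only option.

```python
import numpy as np
import pandas as pd

def class_shares(labels: pd.Series) -> pd.Series:
    """Share of each class over the whole sample."""
    return labels.value_counts(normalize=True).sort_index()

def balanced_class_weights(labels: pd.Series) -> dict:
    """Weights n / (k * count), so every class carries the same total weight."""
    counts = labels.value_counts()
    return {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}

def shares_by_year(labels: pd.Series) -> pd.DataFrame:
    """Class shares per calendar year (each row sums to 1). Needs a DatetimeIndex."""
    return pd.crosstab(labels.index.year.to_numpy(), labels.to_numpy(),
                       rownames=["year"], colnames=["label"], normalize="index")

# Toy labels reproducing the unstable case: all +1 in 2020, mixed in 2021, all -1 in 2022.
dates = pd.date_range("2020-01-01", "2022-12-31", freq="D")
alternating = np.where(np.arange(len(dates)) % 2 == 0, 1, -1)
values = np.where(dates.year == 2020, 1, np.where(dates.year == 2022, -1, alternating))
labels = pd.Series(values, index=dates)

print(class_shares(labels))            # looks roughly balanced overall...
print(shares_by_year(labels))          # ...but the per-year table shows the problem
print(balanced_class_weights(pd.Series([0] * 90 + [1] * 10)))  # the 90/10 case
```

The toy example shows why both checks are needed: a target can look balanced over the full sample and still be a pure regime label year by year.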

5. Filter the Extremes

One question I always ask before training is:
“Did I filter out extreme cases that might confuse the model?”

Sometimes, your data will contain absurd values:

A return of +300% in a minute
A label of +1 right before a delisting
A spike that only happened once in a decade

And yes, these can happen in live trading. But if they represent just 0.05% of your dataset, do you really want your model to “learn” from them?

Training on rare, extreme events can make your model unstable. It starts overfitting to outliers instead of capturing patterns that actually repeat. So most of the time, I filter them.

By the way, I filter them only in the training sets, not in the test sets.

But, and this is important, I never filter blindly.

If they’re too rare to learn from, I exclude them from training, but I still keep track of them.
I might handle them separately, with another logic or risk overlay.

The key is not just to clean the data.
It’s to understand what you’re removing, and why.
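As a rough sketch of that workflow: split chronologically, compute the cutoffs on the training part only, drop the extremes from training, leave the test set untouched, and keep the removed rows for inspection. The function name `split_and_filter`, the 70/30 split, and the 0.1% tail fraction are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def split_and_filter(df: pd.DataFrame, target_col: str = "target",
                     train_frac: float = 0.7, tail: float = 0.001):
    """Chronological split; extremes are removed from the training set only."""
    n_train = int(len(df) * train_frac)
    train, test = df.iloc[:n_train], df.iloc[n_train:]
    # Cutoffs come from the training data only, so nothing leaks from the test set.
    lo, hi = train[target_col].quantile([tail, 1 - tail])
    is_extreme = (train[target_col] < lo) | (train[target_col] > hi)
    removed = train[is_extreme]   # kept aside: know what was removed, and why
    return train[~is_extreme], test, removed

# Synthetic returns with two absurd +300% values, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"target": rng.normal(0, 0.001, 1000)})
df.loc[100, "target"] = 3.0   # falls in the training part
df.loc[900, "target"] = 3.0   # falls in the test part

train_f, test, removed = split_and_filter(df)
print("removed from training:")
print(removed)
print("test set still contains the extreme bar:", 900 in test.index)
```

Keeping the extreme bar in the test set is deliberate: the model is evaluated on the data it would actually face, even though it was not trained on those cases.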

If you want to go deeper, build smarter features, understand signal reliability, and master techniques like triple-barrier labeling, model understanding, or feature conditioning, that’s exactly what we cover in ML4Trading.

🚀 Whether you're coding your first models or scaling a live strategy, ML4Trading gives you the tools, templates, and theory to build robust and intelligent trading systems.

Thanks for reading. Now it’s your turn to build the brain of your strategy.
