Build once. Run live.
A unified approach to feature pipelines across research and production
Most pipelines break when they leave research.
Not because they are incorrect, but because they were never designed to run incrementally. They assume full history, they assume recomputation, and they assume a static environment where everything can be rebuilt at each step.
Live systems operate under very different constraints. Data arrives sequentially, latency accumulates, and every inefficiency becomes persistent. In that context, the feature pipeline is no longer a preprocessing step. It becomes part of the system itself.
1. A pipeline that works in both worlds
With Oryon, the same pipeline object can be used across both research and production environments without modification.
It can be applied on full historical datasets in batch mode, and it can continue updating in a streaming setting using the exact same logic. There is no distinction between the two modes, no hidden adaptation layer, and no change in behavior.
This is not a convenience feature. It is a design constraint. The pipeline is built to behave identically regardless of how it is fed.
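To make the constraint concrete, here is a minimal stand-in (not Oryon's implementation): a stateful moving average whose single update path produces the same values whether it replays a full history at once or receives one value at a time.

```python
from collections import deque

class IncrementalSma:
    """Illustrative stateful feature: O(1) update per new value."""
    def __init__(self, window):
        self.window = window
        self.buffer = deque(maxlen=window)
        self.total = 0.0

    def update(self, value):
        if len(self.buffer) == self.window:
            self.total -= self.buffer[0]  # value about to be evicted
        self.buffer.append(value)
        self.total += value
        return self.total / len(self.buffer)

prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]

# "Batch" mode: replay full history through the same update path.
sma = IncrementalSma(window=3)
batch_out = [sma.update(p) for p in prices]

# "Streaming" mode: the same object keeps updating on new arrivals.
stream_out = [sma.update(p) for p in [106.0, 107.0]]
# stream_out == [105.0, 106.0]
```

Because both modes go through the same `update` call, there is nothing to keep in sync: the batch pass is just the streaming pass replayed.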
To keep the example simple, we can start from one of the sample datasets available directly in Oryon.
from oryon.datasets import load_sample_bars
df = load_sample_bars()
historical_bars, live_bars = df.iloc[:-10, :], df.iloc[-10:, :]
For the rest of this example, the idea is straightforward: we use the historical portion to initialize and validate the pipeline in research conditions, then we use the last ten bars to simulate a live stream and observe the updates one step at a time.
2. Defining a feature pipeline
We start by defining a set of features as a structured pipeline. Each component is stateful and designed to update incrementally.
from oryon import FeaturePipeline
from oryon.features import Sma, ParkinsonVolatility, Correlation, ShannonEntropy, Adf
features_list = [
Sma(inputs=["close"], window=10, outputs=["close_sma_10"]),
Sma(inputs=["close"], window=50, outputs=["close_sma_50"]),
ParkinsonVolatility(inputs=["high", "low"], window=20, outputs=["close_pvol_20"]),
ParkinsonVolatility(inputs=["high", "low"], window=100, outputs=["close_pvol_100"]),
Correlation(inputs=["close", "volume"], window=30, outputs=["close_volume_corr_30"]),
ShannonEntropy(inputs=["close"], window=50, outputs=["close_entropy_50"]),
Adf(inputs=["close"], window=100, outputs=["close_adf_100", "close_adf_pval_100"])
]
pipe = FeaturePipeline(features_list, input_columns=["high", "low", "close", "volume"])
This is not a collection of independent indicators. It is a coherent system that maintains its own internal state and evolves as new data arrives.
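Since each feature declares its output names, the flat vector the pipeline later emits can be labeled for downstream use. The sketch below is illustrative and assumes the outputs follow declaration order; check the library's documentation for the actual contract.

```python
# Output names in declaration order (assumption: the pipeline emits a flat
# list of values in the order the features were declared above).
output_names = [
    "close_sma_10", "close_sma_50",
    "close_pvol_20", "close_pvol_100",
    "close_volume_corr_30",
    "close_entropy_50",
    "close_adf_100", "close_adf_pval_100",
]

def as_record(values):
    """Label a flat output vector so code can use names instead of positions."""
    if len(values) != len(output_names):
        raise ValueError("unexpected output width")
    return dict(zip(output_names, values))

record = as_record([102.03, 102.41, 0.0014, 0.0019, -0.21, 0.948, -1.80, 0.38])
# record["close_sma_10"] == 102.03
```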
3. Applying the pipeline in research
Once the pipeline is defined, we can apply it on historical data to inspect the outputs in research conditions.
from oryon.adapters import run_features_pipeline_pandas
df_features = run_features_pipeline_pandas(pipe, historical_bars)
Under the hood, the pipeline is designed to consume data as a list of lists rather than as a DataFrame directly. This may look slightly less convenient at first, but it comes from the same architectural choice that makes the system efficient in live trading: the update path is built around a minimal, streaming-oriented input structure.
In practice, this means the research interface stays aligned with the production one. Instead of introducing a separate batch-only abstraction, Oryon keeps the same core design and provides adapters to make research workflows easier to use.
That is exactly what run_features_pipeline_pandas does here. It converts the DataFrame into the internal list-of-lists format and applies the pipeline sequentially, so the historical execution remains fully consistent with the live update model.
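To make the adapter's role concrete, here is a rough sketch of what such a conversion amounts to: the DataFrame is flattened to a list of lists in the declared column order and replayed row by row through a single update callable. The helper and the toy update function below are illustrative, not Oryon's internals.

```python
import pandas as pd

def run_sequentially(update_fn, df, input_columns):
    """Sketch: flatten a DataFrame to list-of-lists and replay it row by row."""
    rows = df[input_columns].values.tolist()  # e.g. [[high, low, close, volume], ...]
    return [update_fn(row) for row in rows]

def toy_update(values):
    """Stand-in for pipe.update: returns the bar's mid price."""
    high, low = values[0], values[1]
    return (high + low) / 2

bars = pd.DataFrame({
    "high": [101.0, 102.0], "low": [99.0, 100.0],
    "close": [100.0, 101.0], "volume": [10.0, 12.0],
})
outputs = run_sequentially(toy_update, bars, ["high", "low", "close", "volume"])
# outputs == [100.0, 101.0]
```

The point of this structure is that the batch adapter adds nothing the streaming path does not already have: it only handles the conversion, so both modes exercise the same logic.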
The same idea is also available for Polars, which makes it easy to keep the same workflow regardless of the dataframe engine used upstream.
4. Switching to streaming
Once the historical pass is complete, the same pipeline can continue operating in streaming mode.
for bar in live_bars.itertuples():
    input_values = [bar.high, bar.low, bar.close, bar.volume]  # pipe expects a plain list of inputs
    output = pipe.update(input_values)
    print(output)
# [102.03397795639191, 102.41105040279119, 0.001373007598669601, 0.0018930289783861594, -0.21026851276116704, 0.948176332443655, -1.800248586774662, 0.380344996295653]
# ...
# [102.03397795639194, 102.25925749878635, 0.001373007598669601, 0.0018768715423311845, -0.1494113110563763, 0.8844888723968832, -4.77931213498186, 6.212151160126344e-05]
5. Why this matters
In most trading systems, the pipeline is effectively implemented twice. Once in research, often using batch-oriented tools, and once again in production, using a different stack optimized for streaming.
This duplication introduces divergence. Subtle differences appear, edge cases behave differently, and the system becomes harder to validate as a whole.
Oryon removes that duplication entirely. The pipeline is defined once, validated once, and then used as-is in production.
If you want to take this further and connect feature pipelines with AI agents, you can take a look at AI Trading Lab.
Oryon is now available in beta.
If you want to take a closer look at the library and its design, you can explore it directly on GitHub.
And if you find it useful, consider adding a star: it helps more than it seems.


