Quant Backtest Harness
50K parameter combos · 3 engines · one CLI
A strategy backtesting and walk-forward analysis layer over Backtrader, Zipline, and QuantConnect — one CLI, one data contract, three engines underneath, and results you can actually compare.
The diagram, walked through in plain language
1. One command starts a test
A simple CLI takes a strategy name, a date range, and a parameter grid (e.g. 'test these 50,000 combinations of stop-loss and take-profit values').
2. The harness translates inputs for each engine
Backtrader, Zipline, and QuantConnect each want data in their own shape. A small adapter per engine converts our standard format into theirs, so the strategy code does not have to change.
3. Tests run in parallel across many machines
Each parameter combination is a Ray task that runs in its own process, so a 50K-combo sweep finishes in a fraction of the single-machine time.
4. Walk-forward built in
Instead of testing on all history at once (which flatters strategies), the tool can 'train on 24 months, test on the next 3, slide forward by 1 month' repeatedly — the way real trading must work.
5. All results land in one warehouse
Whether the test ran on Backtrader, Zipline, or QuantConnect, the output is normalised into a shared schema and saved to DuckDB.
6. Comparing strategies becomes a SQL query
Researchers ask 'which parameter combination had the best risk-adjusted return last quarter?' against DuckDB instead of stitching Excel sheets together by hand.
The brief
A small prop team was running strategies across three different backtesting engines for three different reasons: Backtrader because it was what the senior quant knew, Zipline for its Pipeline DSL, and QuantConnect when they needed a venue adapter someone else had already maintained.
The problem: the three engines had three different data contracts, three different result shapes, and no shared reporting layer. A “compare these two strategies across engines” question took half a day of manual Excel alignment to answer.
The constraints
- Strategy code could not change. Rewriting 40 strategies to fit a new abstraction was a non-starter.
- Results had to come out in a single schema, regardless of which engine produced them.
- Walk-forward analysis had to be a first-class citizen, not something the user had to stitch together with cron jobs.
- Parameter sweeps had to scale horizontally — 50K-combo runs should not take a weekend.
The shape we built
A thin Python harness with three responsibilities: normalize inputs (one data catalog, engine-specific feeders fan out), dispatch execution (each engine runs in its own process with a well-defined result protocol over Unix sockets), and persist results (everything lands in DuckDB with a common schema, indexed by strategy hash and parameter signature).
Walk-forward is expressed declaratively — “train on 24 months, test on 3, step 1 month” — and compiled down to engine-specific calls. Parameter sweeps run under Ray; the harness submits one Ray task per combo and streams results back into DuckDB as they complete.
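The compilation step for the declarative walk-forward spec is small. A sketch assuming month-aligned window boundaries; the function names and signature are illustrative, not the harness's real API:

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    # Minimal month arithmetic, clamped to the 1st, since this
    # sketch aligns all windows to month boundaries.
    y, m = divmod(d.year * 12 + (d.month - 1) + n, 12)
    return date(y, m + 1, 1)

def walk_forward_windows(start: date, end: date,
                         train_months: int = 24,
                         test_months: int = 3,
                         step_months: int = 1):
    """Compile 'train on 24 months, test on 3, step 1 month' into
    concrete (train_start, train_end, test_start, test_end) tuples.
    Each tuple becomes one engine-specific backtest call."""
    cursor = start
    while True:
        train_end = add_months(cursor, train_months)
        test_end = add_months(train_end, test_months)
        if test_end > end:
            break
        yield cursor, train_end, train_end, test_end
        cursor = add_months(cursor, step_months)

windows = list(walk_forward_windows(date(2018, 1, 1), date(2021, 1, 1)))
```

Each window would then be submitted as its own Ray task, which is what lets a sweep of windows × parameter combos fan out across machines.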
What was hard
- Result alignment. Each engine defines “trade” differently — Backtrader aggregates at position close, Zipline emits on every fill, QuantConnect does whatever you tell it. Normalizing required engine-specific adapters with a shared post-processing layer.
- Time handling across timezones. One engine was UTC, one was exchange-local, one was naive. The harness enforces UTC at the boundary and raises loudly on ambiguity.
- Reproducibility. Sweep results must be byte-identical across reruns. Pinning random seeds, sorting input data deterministically, and eliminating clock-based branches took the last 10% of the project and was worth all of it.
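The timezone rule from the second bullet can be sketched as a single guard function at the boundary (the function name is illustrative; in the harness this lives in the per-engine adapters):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(ts: datetime) -> datetime:
    """Enforce UTC at the harness boundary. Naive timestamps are
    rejected loudly rather than silently assumed to be UTC or local."""
    if ts.tzinfo is None:
        raise ValueError(f"naive timestamp {ts!r}: engine feeds must be tz-aware")
    return ts.astimezone(timezone.utc)

# Exchange-local input (e.g. an NYSE open) normalised at the edge.
nyc_open = to_utc(datetime(2024, 1, 2, 9, 30, tzinfo=ZoneInfo("America/New_York")))
```

Raising on naive timestamps instead of guessing is the whole point: an ambiguous time that slips through shifts every downstream trade by hours.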
What it does today
A single CLI runs the team's entire strategy suite across three engines with identical output. Sweep runtime is 85% lower than the single-engine baseline thanks to parallelism. 14 strategies have been shipped from harness to paper-trading since launch. The DuckDB result warehouse has become the team's analytical workbench — every research question now starts as a SQL query, not a Jupyter notebook.
What I'd do differently
I'd add probabilistic Sharpe ratio and deflated Sharpe ratio to the default result schema from day one. Regular Sharpe is the number everyone asks for; the corrected versions are the numbers that tell you whether to trade. The team computed them by hand for a year before I gave up and made them first-class.
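For reference, the probabilistic Sharpe ratio (Bailey & López de Prado) is cheap to compute once the per-period return statistics are in the schema. A minimal sketch; annualization and the deflated variant are left aside:

```python
from math import erf, sqrt

def probabilistic_sharpe(sr: float, n: int,
                         skew: float = 0.0, kurt: float = 3.0,
                         sr_benchmark: float = 0.0) -> float:
    """Probability that the true Sharpe exceeds `sr_benchmark`,
    given an observed per-period Sharpe `sr` over `n` returns.
    `kurt` is non-excess kurtosis (3.0 for normal returns)."""
    denom = sqrt(1 - skew * sr + (kurt - 1) / 4 * sr * sr)
    z = (sr - sr_benchmark) * sqrt(n - 1) / denom
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF
```

An observed Sharpe of zero gives exactly 0.5 against a zero benchmark, and the probability grows with sample size — which is precisely the track-record-length effect plain Sharpe hides.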
- Python
- Backtrader · Zipline · QuantConnect LEAN
- Pandas · Polars
- DuckDB (result warehouse)
- Typer (CLI)
- Ray (parallel sweeps)
Have a similar problem?
If this shape of engagement fits what you're working on, I'd be happy to scope it.