Every quant I know has a graveyard of abandoned backtesters behind them. Mine has seven headstones. The reason isn’t that the libraries are bad — most of them are remarkably good — it’s that the question “which backtester should I use?” has no library-level answer. It has a problem-shape answer. Match the shape of your strategy to the shape of the framework and everything is quiet. Mismatch them and you’ll spend six months fighting an abstraction that was never going to help you.
What follows is a practitioner’s tour of the Python backtesting landscape as it actually stands in 2026 — NautilusTrader, VectorBT, Backtrader, Zipline-Reloaded, QuantLib, bt, LEAN, Jesse, Backtesting.py, and FreqTrade — with an opinionated take on when each is the right tool and when it quietly becomes the wrong one. I’ve run real capital on most of these. The trade-offs below are from the bruises, not the README.
First, the single distinction that matters
Before the library comparison, the fork in the road. Every backtester sits on one of two opposite bets about how to simulate the market.
- Event-driven backtesters march time forward tick by tick (or bar by bar), dispatch each event to your strategy, route orders through a simulated matching engine, and track state. Slow. Honest. Behaves like live trading because it is live trading, minus the network.
- Vectorized backtesters express your signal as an array operation (masks, shifts, rolling windows), then compute the equity curve as a function over the whole series at once. Orders of magnitude faster. Lies comfortably about fills, slippage, and order-of-events issues unless you’re careful.
This distinction is the axis the rest of this article organizes around. Any library that refuses to take a side on it is lying.
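To make the fork concrete, here is a minimal sketch of the same moving-average crossover computed both ways on a synthetic random-walk price series. No library from the list is involved; the function names, the long-or-flat rule, and the next-bar convention are assumptions chosen for illustration. The two versions agree only when the vectorized one delays its signal by one bar, which is exactly the kind of detail a vectorized backtest lets you forget.

```python
import numpy as np

def sma(x, n):
    """Trailing simple moving average; NaN until n bars are available."""
    out = np.full(len(x), np.nan)
    c = np.concatenate(([0.0], np.cumsum(x)))
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def bar_returns(price):
    r = np.zeros(len(price))
    r[1:] = price[1:] / price[:-1] - 1.0
    return r

def vectorized(price, fast=5, slow=20, lag=1):
    """Whole-series version. lag=0 trades on the bar that produced the signal (look-ahead)."""
    signal = np.zeros(len(price))
    signal[slow - 1:] = sma(price, fast)[slow - 1:] > sma(price, slow)[slow - 1:]
    position = np.roll(signal, lag)
    position[:lag] = 0.0
    return position * bar_returns(price)

def event_driven(price, fast=5, slow=20):
    """Bar-by-bar version: the position earning bar t's return was set at the close of bar t-1."""
    pnl = np.zeros(len(price))
    position = 0.0
    for t in range(1, len(price)):
        pnl[t] = position * (price[t] / price[t - 1] - 1.0)  # P&L on what is already held
        if t >= slow - 1:                                     # enough history for both averages
            fast_ma = price[t - fast + 1 : t + 1].mean()
            slow_ma = price[t - slow + 1 : t + 1].mean()
            position = 1.0 if fast_ma > slow_ma else 0.0      # takes effect on the next bar
    return pnl

rng = np.random.default_rng(0)
price = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))

honest = vectorized(price, lag=1)
peeking = vectorized(price, lag=0)
looped = event_driven(price)
print("lagged vectorized matches the loop:", np.allclose(honest, looped))
print("total return, lagged vs. same-bar:", honest.sum(), peeking.sum())
```

The loop cannot peek because the future has not been dispatched yet; the array version can, and nothing warns you when it does.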
NautilusTrader
Rust core. Python API. The same strategy code runs in backtest and in live, and that one property is worth the entire learning curve.
NautilusTrader is the library I’ve moved most of my live execution onto over the past eighteen months. It ships with a Rust core that handles the matching engine, order book, and event bus, and exposes a Python API that feels like Backtrader’s more capable younger sibling. The killer feature is that the same code runs in backtest and in live trading. Not “with minor adjustments.” The same code.
That property sounds like a marketing bullet until you’ve spent a weekend debugging why your strategy worked in backtest and silently no-ops in production. With Nautilus, your strategy subclass doesn’t know which world it’s in. The venue adapters behind it change; your code doesn’t.

```python
from nautilus_trader.trading.strategy import Strategy
from nautilus_trader.model.data import QuoteTick
from nautilus_trader.model.orders import MarketOrder
from nautilus_trader.model.enums import OrderSide


class MeanReversion(Strategy):
    def on_start(self) -> None:
        self.subscribe_quote_ticks(self.config.instrument_id)

    def on_quote_tick(self, tick: QuoteTick) -> None:
        mid = (tick.bid_price + tick.ask_price) / 2
        # self.indicators.z_score is a placeholder for a rolling z-score
        # helper you supply yourself.
        if self.indicators.z_score(mid) < -2.0:
            self.submit_order(self.order_factory.market(
                instrument_id=tick.instrument_id,
                order_side=OrderSide.BUY,
                quantity=self.config.size,
            ))
```

Where it stings. The Python API is still young in places; error messages occasionally come from the Rust side and require a translation layer in your head. The ecosystem of example strategies is smaller than Backtrader’s. And the mental model of actors, messaging, and the event loop is a real investment. You do not pick up Nautilus in an afternoon.
Pick it when: you’re building something you intend to run with real money, especially in execution-sensitive markets. Or when you want to stop maintaining two codebases.
Skip it when: you’re doing pure research on daily-bar equity strategies. It’s more engine than you need.
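The “strategy doesn’t know which world it’s in” property comes down to the strategy talking only to an interface. The sketch below is a toy illustration of that idea in plain Python, not NautilusTrader’s actual classes: `Venue`, `SimulatedVenue`, `RecordingVenue`, and `BuyTheDip` are all hypothetical names, and the recording venue merely stands in for a real live adapter.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Venue(Protocol):
    """The only surface the strategy is allowed to touch."""
    def submit_market_order(self, side: str, qty: float) -> None: ...


@dataclass
class SimulatedVenue:
    """Backtest side: fills immediately at the last price it was shown."""
    last_price: float = 0.0
    fills: list = field(default_factory=list)

    def submit_market_order(self, side: str, qty: float) -> None:
        self.fills.append((side, qty, self.last_price))


@dataclass
class RecordingVenue:
    """Stand-in for a live adapter; a real one would send the order over the network."""
    sent: list = field(default_factory=list)

    def submit_market_order(self, side: str, qty: float) -> None:
        self.sent.append({"side": side, "qty": qty})


class BuyTheDip:
    """Strategy code: identical no matter which venue is plugged in."""
    def __init__(self, venue: Venue, threshold: float) -> None:
        self.venue = venue
        self.threshold = threshold

    def on_price(self, price: float) -> None:
        if price < self.threshold:
            self.venue.submit_market_order("BUY", 1.0)


sim = SimulatedVenue()
backtest_strategy = BuyTheDip(sim, threshold=99.0)
for p in [100.0, 98.5, 101.0]:
    sim.last_price = p
    backtest_strategy.on_price(p)

live = RecordingVenue()
live_strategy = BuyTheDip(live, threshold=99.0)
live_strategy.on_price(98.5)

print(sim.fills)
print(live.sent)
```

Swapping the venue object is the whole migration. Anything the strategy reaches for outside that interface is a place where backtest and live can drift apart.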
VectorBT
Vectorized NumPy + Numba acceleration. Tens of thousands of parameter combinations before lunch.
VectorBT is what you reach for when you need to test ten thousand variants of a strategy before lunch. The whole philosophy is that if your signal can be expressed as array operations over a price matrix, the backtest should take the same shape. Entries, exits, and stops become boolean arrays. Equity curves become reductions.
The speedup versus an event-driven framework is not subtle. Parameter sweeps that would take Backtrader a full day finish in minutes. You can genuinely afford to walk-forward optimize across a meaningful parameter space, which changes what kinds of research questions you can answer.

```python
import vectorbt as vbt
import numpy as np

price = vbt.YFData.download("SPY", start="2015-01-01").get("Close")

fast = np.arange(5, 25)
slow = np.arange(30, 70, 2)
fast_ma = vbt.MA.run(price, fast, short_name="fast")
slow_ma = vbt.MA.run(price, slow, short_name="slow")

entries = fast_ma.ma_crossed_above(slow_ma)
exits = fast_ma.ma_crossed_below(slow_ma)

pf = vbt.Portfolio.from_signals(
    price, entries, exits,
    init_cash=100_000, fees=0.0005, slippage=0.0002,
)
print(pf.sharpe_ratio().unstack().round(2))
```

VectorBT Pro, the paid successor, adds proper portfolio simulation, better order-fill modeling, intra-bar stop handling, and a mountain of indicators. It’s worth the license if you do this full-time. The free version is enough for 90% of research workflows.
Pick it when: you’re doing strategy research, feature exploration, or any work where you need to sweep parameters at scale.
Skip it when: your strategy’s edge lives in order-level behavior. Vectorization is the wrong abstraction for a market-making quote-ladder problem.
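Walk-forward optimization, mentioned above, is where the speed pays off, because every window re-runs the whole parameter sweep. Here is a minimal sketch of the window bookkeeping in plain Python, with no VectorBT calls. `walk_forward_windows` and its parameters are hypothetical names, and the rolling (rather than expanding) in-sample window is an assumption.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n_bars: int, train: int, test: int) -> Iterator[Tuple[range, range]]:
    """Yield (in_sample, out_of_sample) index ranges that roll forward by `test` bars."""
    if train <= 0 or test <= 0:
        raise ValueError("train and test must be positive")
    start = 0
    while start + train + test <= n_bars:
        in_sample = range(start, start + train)
        out_of_sample = range(start + train, start + train + test)
        yield in_sample, out_of_sample
        start += test


windows = list(walk_forward_windows(n_bars=1000, train=500, test=100))
for in_sample, out_of_sample in windows:
    # Fit parameters on in_sample only, then score them once on out_of_sample.
    print(in_sample.start, in_sample.stop, "->", out_of_sample.start, out_of_sample.stop)
```

The out-of-sample ranges tile the tail of the series without overlapping, so stitching their results together gives one equity curve that never saw its own parameters being chosen.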
Backtrader
A decade of battle-tested event-driven backtesting. The default answer from 2016–2022; still the one most tutorials use.
Backtrader was, for a long time, the correct answer to “which backtester should I use?” It has a sane object model (a Strategy subclass, a Cerebro engine, indicator chains), excellent documentation, and roughly a decade of forum answers that unstick you when you’re lost. It also has the largest collection of community strategy examples of any framework in this list.

```python
import backtrader as bt


class SmaCross(bt.Strategy):
    params = dict(fast=10, slow=30)

    def __init__(self):
        self.fast = bt.ind.SMA(period=self.p.fast)
        self.slow = bt.ind.SMA(period=self.p.slow)
        self.signal = bt.ind.CrossOver(self.fast, self.slow)

    def next(self):
        if not self.position and self.signal > 0:
            self.buy()
        elif self.position and self.signal < 0:
            self.close()


cerebro = bt.Cerebro()
cerebro.addstrategy(SmaCross)
cerebro.adddata(bt.feeds.YahooFinanceData(dataname="SPY", fromdate=..., todate=...))
cerebro.broker.setcash(100_000)
cerebro.run()
```

What’s changed is that the project has been effectively unmaintained since 2023, Python 3.12+ compatibility is patchy in places, and the performance gap versus the Rust-backed entrants has widened. It still works. It may work for another decade. But it’s no longer the default; it’s the safe, familiar option you pick when the risk of something newer outweighs the performance ceiling.
Pick it when: you’re a team that already knows it, you’re teaching, or you’re shipping bar-level equity strategies where performance doesn’t bite.
Skip it when: you need the same code to run live, or you’re working with intraday-tick data on a full universe.
Zipline-Reloaded
Quantopian’s backtester, reborn. Pipeline, the cross-sectional factor DSL, has no equivalent anywhere else.
Zipline was Quantopian’s backtester, and when Quantopian shut down in 2020 it was taken up by the community as Zipline-Reloaded. Its distinguishing feature is Pipeline, a DSL for expressing cross-sectional factor logic (“for every stock in the universe, at every date, compute the z-score of the 60-day momentum”) that is genuinely elegant and has no equivalent in any other framework.

```python
from zipline.api import attach_pipeline, pipeline_output, order_target_percent
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import Returns


def make_pipeline():
    momentum = Returns(window_length=60)
    top = momentum.top(50)
    return Pipeline(columns={"mom": momentum}, screen=top)


def initialize(context):
    attach_pipeline(make_pipeline(), "longs")


def handle_data(context, data):
    longs = pipeline_output("longs").index
    for asset in longs:
        order_target_percent(asset, 1.0 / len(longs))
```

If your strategies look like cross-sectional factor models (long-short equity, factor tilts, anything that ranks a universe), Zipline’s Pipeline is the cleanest API in the ecosystem. Writing the same logic in Backtrader or VectorBT is possible; it’s just not as nice.
Where it hurts: data ingestion is a project in itself. The asset metadata model assumes US equities with a particular kind of bundle; bending it to crypto or FX is painful.
Pick it when: you’re doing equity factor research and you want Pipeline.
Skip it when: you’re not doing US equities or you’re not doing cross-sectional work.
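For readers who have not met Pipeline, the quoted example (the z-score of 60-day momentum, for every stock at every date) boils down to the computation below. This is a NumPy sketch of the arithmetic only, not Zipline code: the function name, the dates-by-assets array layout, and the choice to measure momentum as the price change over `window` bars are assumptions.

```python
import numpy as np


def momentum_zscore(prices: np.ndarray, window: int = 60) -> np.ndarray:
    """prices has shape (n_dates, n_assets). Returns a same-shape array of
    cross-sectional z-scores, NaN for the first `window` rows."""
    n_dates, n_assets = prices.shape
    z = np.full((n_dates, n_assets), np.nan)
    if n_dates <= window:
        return z
    mom = prices[window:] / prices[:-window] - 1.0   # trailing return, per asset
    mu = mom.mean(axis=1, keepdims=True)             # mean across assets on each date
    sd = mom.std(axis=1, keepdims=True)              # dispersion across assets on each date
    z[window:] = (mom - mu) / sd
    return z


rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 8)), axis=0))

z = momentum_zscore(prices)
top3 = np.argsort(z[-1])[-3:]   # column indices of the three strongest names on the last date
print(top3)
```

The time-series step (momentum) runs down each column and the cross-sectional step (z-score) runs across each row. Pipeline’s appeal is that it lets you state both without managing the array layout yourself.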
QuantLib
Not a backtester. The industrial-grade quant library for pricing, curves, and derivatives. Pair it with one of the above.
QuantLib gets lumped into backtesting discussions because people Google “quant python library” and it’s the first hit. It is not a backtester. It is the serious, industrial-grade library for pricing derivatives, bootstrapping yield curves, modeling credit, and running Monte Carlo simulations on stochastic processes.
If your work touches options, swaps, bonds, or any fixed-income instrument, QuantLib is not optional; there is genuinely no substitute in the open-source world. The Python bindings (imported as QuantLib) expose most of the C++ engine.

```python
import QuantLib as ql

today = ql.Date(16, 4, 2026)
ql.Settings.instance().evaluationDate = today

spot = ql.SimpleQuote(100.0)
vol = ql.BlackConstantVol(today, ql.NullCalendar(), 0.20, ql.Actual365Fixed())
rate = ql.FlatForward(today, 0.04, ql.Actual365Fixed())

process = ql.BlackScholesProcess(
    ql.QuoteHandle(spot),
    ql.YieldTermStructureHandle(rate),
    ql.BlackVolTermStructureHandle(vol),
)

option = ql.VanillaOption(
    ql.PlainVanillaPayoff(ql.Option.Call, 105.0),
    ql.EuropeanExercise(today + ql.Period(6, ql.Months)),
)
option.setPricingEngine(ql.AnalyticEuropeanEngine(process))

print(f"price={option.NPV():.4f} delta={option.delta():.4f}")
```

Pair it with a backtester, don’t expect it to be one. I typically compute option prices and greeks in QuantLib and hand them to Nautilus or VectorBT as features.
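When option prices become backtest features, a closed-form cross-check next to the pricing library is cheap insurance. The sketch below is the textbook Black-Scholes formula for a European call with no dividends, written with the standard library only. `bs_call` and `norm_cdf` are hypothetical helpers, and the year fraction of 0.5 is an approximation of the six-month expiry (an Actual/365 day count over the actual dates gives a slightly different number), so expect the result to sit close to the engine’s value rather than match it exactly.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, vol: float, t: float):
    """Price and delta of a European call; continuously compounded rate, no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    price = spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)
    delta = norm_cdf(d1)
    return price, delta


# Same inputs as the QuantLib example: spot 100, strike 105, 4% rate, 20% vol.
price, delta = bs_call(spot=100.0, strike=105.0, rate=0.04, vol=0.20, t=0.5)
print(f"price={price:.4f} delta={delta:.4f}")
```

If the two numbers disagree by more than the day-count difference can explain, the usual suspects are the evaluation date, the compounding convention on the rate, or a dividend curve that one side has and the other does not.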
LEAN
The engine behind QuantConnect. Institutional plumbing in a container: frameworks for everything, which helps or hurts depending on your mood.
LEAN is the open-source engine behind QuantConnect. It’s the most institutional-feeling framework in this list — proper brokerage integrations, real tick data, realistic fills, portfolio construction frameworks, risk management frameworks, universe selection frameworks. Frameworks all the way down. That can feel heavy on a small strategy and liberating on a complex one.

```python
from AlgorithmImports import *


class Momentum(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2022, 1, 1)
        self.SetCash(100_000)
        self.symbol = self.AddEquity("SPY", Resolution.Daily).Symbol
        self.mom = self.MOM(self.symbol, 60, Resolution.Daily)

    def OnData(self, data):
        if not self.mom.IsReady:
            return
        if self.mom.Current.Value > 0 and not self.Portfolio.Invested:
            self.SetHoldings(self.symbol, 1.0)
        elif self.mom.Current.Value < 0 and self.Portfolio.Invested:
            self.Liquidate()
```

Running it locally is a Docker situation, not a pip situation. The primary workflow is really QuantConnect’s cloud, where you’re renting their infra and their data. That’s fine for a lot of people; it’s a non-starter if you need your research to stay on your machine.
Pick it when: you’re running a multi-asset, multi-strategy operation and you want someone else’s data and brokerage stack. Also a legitimate choice for a solo quant who wants to stop being a DevOps engineer.
Skip it when: you want full local control, or your strategy doesn’t justify the framework overhead.
bt
The cleanest portfolio-level backtester in Python. Strategies are trees of composable algos: SelectAll → WeighEqually → Rebalance.
bt (from the author of ffn) is the cleanest portfolio-level backtester in Python. It expresses strategies as a tree of algos (SelectAll, WeighEqually, Rebalance) that compose naturally into asset-allocation logic. If your strategy is fundamentally “hold these assets in these weights, rebalance monthly,” bt is the library that makes expressing it feel pleasant instead of verbose.

```python
import bt

strategy = bt.Strategy("equal-weight", [
    bt.algos.RunMonthly(),
    bt.algos.SelectAll(),
    bt.algos.WeighEqually(),
    bt.algos.Rebalance(),
])
backtest = bt.Backtest(strategy, prices)
result = bt.run(backtest)
result.display()
```

It’s not suitable for order-level execution research, and it doesn’t pretend to be. Within its lane, it’s excellent.
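The composable-algos idea is easy to see in miniature. The toy below is not bt’s implementation; it only mimics the pattern in which each algo is a callable that mutates a shared target object and returns a boolean, with the stack stopping at the first False. All names (`Target`, `run_monthly`, `select_all`, `weigh_equally`, `run_stack`) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Target:
    """Shared state that every algo in the stack reads and writes."""
    today: date
    universe: list
    last_period: tuple = (0, 0)
    selected: list = field(default_factory=list)
    weights: dict = field(default_factory=dict)


def run_monthly(t: Target) -> bool:
    """Pass only on the first call in a new (year, month)."""
    period = (t.today.year, t.today.month)
    is_new = period != t.last_period
    t.last_period = period
    return is_new


def select_all(t: Target) -> bool:
    t.selected = list(t.universe)
    return True


def weigh_equally(t: Target) -> bool:
    t.weights = {asset: 1.0 / len(t.selected) for asset in t.selected}
    return True


def run_stack(algos, t: Target) -> bool:
    """Run algos in order; stop as soon as one returns False."""
    return all(algo(t) for algo in algos)


stack = [run_monthly, select_all, weigh_equally]
t = Target(today=date(2026, 1, 2), universe=["SPY", "TLT", "GLD", "EFA"])

first = run_stack(stack, t)            # new month: the whole stack runs
weights_after_first = dict(t.weights)

t.today = date(2026, 1, 5)
t.weights = {}
second = run_stack(stack, t)           # same month: halts at run_monthly

print(first, weights_after_first)
print(second, t.weights)
```

Because the gate is just the first element of the list, changing the rebalance schedule or the weighting rule means swapping one callable, not rewriting the strategy.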
Specialists worth knowing
Jesse. Crypto-native, beautiful API, opinionated about structure. If you only trade crypto perpetuals, it’s a serious Nautilus alternative with a gentler learning curve.
Backtesting.py. Tiny, single-file, quietly perfect for one-off sketches and teaching examples. Don’t build a production system on it; that isn’t what it’s for.
FreqTrade. A crypto trading bot with a backtest mode rather than a backtester with a live mode. Strong retail crypto tooling; overkill for research.
The comparison, at a glance
If you want the whole landscape on one screen, this is the cheat sheet I keep in the wiki of every team I’ve advised.
| Library | Model | Speed | Live parity | Learning | Best for |
|---|---|---|---|---|---|
| NautilusTrader | Event-driven | Very fast | Same code | Steep | Intraday FX / crypto / futures |
| VectorBT | Vectorized | Fastest | None | Moderate | Research, parameter sweeps |
| Backtrader | Event-driven | Slow | Separate | Gentle | Teaching, daily-bar equities |
| Zipline-Reloaded | Event-driven | Moderate | None | Moderate | US equity factor models |
| QuantLib | Pricing lib | C++ fast | N/A | Steep | Derivatives, fixed income |
| LEAN | Event-driven | Fast | Same code | Steep | Institutional multi-asset |
| bt | Bar-level | Moderate | None | Gentle | Asset allocation, rebalancing |
| Jesse | Event-driven | Fast | Same code | Gentle | Crypto perps only |
| Backtesting.py | Event-driven | Fast | None | Minimal | Sketches, teaching |
| FreqTrade | Event-driven | Moderate | Same code | Gentle | Retail crypto bots |
A decision framework, explicit
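One way to make the pick/skip notes from the sections above explicit is a first-match rule list. The sketch below is only a condensation of those notes. The need names and, in particular, the order in which the rules are checked are assumptions, and a different ordering is defensible.

```python
# (need, suggestion) pairs, checked top to bottom; the first match wins.
RULES = [
    ("quick_sketch", "Backtesting.py"),
    ("allocation_rebalancing", "bt"),
    ("us_equity_factor_research", "Zipline-Reloaded"),
    ("crypto_perps_only", "Jesse"),
    ("retail_crypto_bot", "FreqTrade"),
    ("managed_data_and_brokerage", "LEAN"),
    ("live_parity", "NautilusTrader"),
    ("parameter_sweeps", "VectorBT"),
    ("daily_bar_equities", "Backtrader"),
]
KNOWN_NEEDS = {need for need, _ in RULES} | {"derivatives_pricing"}


def suggest(needs):
    """Return at most one backtester, plus QuantLib when pricing is involved."""
    needs = set(needs)
    unknown = needs - KNOWN_NEEDS
    if unknown:
        raise ValueError(f"unknown needs: {sorted(unknown)}")
    picks = [library for need, library in RULES if need in needs][:1]
    if "derivatives_pricing" in needs:
        # A pricing library to pair with a backtester, not a replacement for one.
        picks.append("QuantLib")
    return picks


print(suggest({"live_parity", "parameter_sweeps"}))
print(suggest({"parameter_sweeps", "derivatives_pricing"}))
```

The useful part is less the answer than the exercise of writing down which needs you actually have before reading anyone’s benchmarks.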
The one metric to always compute, regardless of library
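Every backtester will give you an equity curve, a Sharpe, and a drawdown. None of these, on their own, tells you whether the strategy is real. The number I care about most, and one that is embarrassingly easy to compute in any framework, is the probabilistic Sharpe ratio:

$$\widehat{\mathrm{PSR}}(SR^{*}) = \Phi\!\left( \frac{(\widehat{SR} - SR^{*})\,\sqrt{T - 1}}{\sqrt{1 - \hat{\gamma}_3\,\widehat{SR} + \frac{\hat{\gamma}_4 - 1}{4}\,\widehat{SR}^{2}}} \right)$$

Here $\widehat{SR}$ is the per-period sample Sharpe ratio, $SR^{*}$ the benchmark Sharpe you want to beat, $T$ the number of return observations, $\hat{\gamma}_3$ the sample skewness, $\hat{\gamma}_4$ the sample kurtosis (not excess kurtosis), and $\Phi$ the standard normal CDF.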
The standard Sharpe ratio quietly assumes Gaussian returns. The probabilistic Sharpe doesn’t. Strategies with the same nominal Sharpe but different tail behavior — one with frequent small wins and rare large losses, another with steady returns — produce wildly different PSRs.

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis


def probabilistic_sharpe(returns: np.ndarray, sr_threshold: float = 0.0) -> float:
    sr = returns.mean() / returns.std(ddof=1)
    T = len(returns)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)
    num = (sr - sr_threshold) * np.sqrt(T - 1)
    den = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr ** 2)
    return float(norm.cdf(num / den))


# Usage (daily returns):
# psr = probabilistic_sharpe(daily_returns, sr_threshold=1.0 / np.sqrt(252))
# Interpret as: probability that the true Sharpe exceeds SR*.
```

This number alone will tell you whether your backtest is a real edge or a lottery ticket in disguise. Compute it. In whichever library you picked from the list above.
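To see the tail sensitivity in action, the sketch below rescales two synthetic return series to the same sample mean and standard deviation, so their Sharpe ratios match, and then compares their PSRs. It re-implements the same formula as the function above with NumPy and `math` only. The series, the seed, and the helper names are illustrative assumptions, not data from a real strategy.

```python
import math
import numpy as np


def sharpe(r):
    return r.mean() / r.std(ddof=1)


def psr(returns, sr_threshold=0.0):
    r = np.asarray(returns, dtype=float)
    sr = sharpe(r)
    z = (r - r.mean()) / r.std(ddof=0)
    g3 = (z ** 3).mean()                      # sample skewness
    g4 = (z ** 4).mean()                      # sample kurtosis (normal = 3)
    num = (sr - sr_threshold) * math.sqrt(len(r) - 1)
    den = math.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr ** 2)
    return 0.5 * (1 + math.erf(num / den / math.sqrt(2)))   # standard normal CDF


def rescale(x, mean, std):
    """Affine map to a chosen sample mean and std; skewness and kurtosis are unchanged."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1) * std + mean


rng = np.random.default_rng(42)
steady = rescale(rng.normal(size=750), mean=0.0005, std=0.01)
# Negated exponential draws: many small gains, occasional large losses.
short_vol = rescale(-rng.exponential(size=750), mean=0.0005, std=0.01)

print("Sharpe:", sharpe(steady), sharpe(short_vol))
print("PSR:   ", psr(steady), psr(short_vol))
```

The two Sharpe ratios are identical by construction, yet the negatively skewed series scores the lower PSR, because the skewness and kurtosis terms widen the denominator.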
The closing thought
“Backtesting is not a research tool. Backtesting is the final validation before deployment. Feature importance, walk-forward, and regime analysis should have convinced you long before the equity curve does.”
The library you pick matters less than the discipline with which you use it. A careful researcher with Backtrader will outperform an impatient one with the Rust-backed, hot-from-GitHub alternative. All of these libraries are good enough. The ones in active development — Nautilus, VectorBT, LEAN — are slightly better than good enough. Pick the shape that fits your problem, port your best research into an event-driven framework before it sees real money, and measure your results with more than a Sharpe.
That’s the job. Everything else is marketing.