Mathematics · April 17, 2026 · 15 min read

Essential Mathematics You Need to Know as a Quant

A working quant's toolkit — probability, linear algebra, optimisation, stochastic calculus, time series, information theory, and numerics — and the algorithms each of them makes possible.

Hasan Javed
Senior Full-Stack & AI Engineer

A quant’s life is an unusual one: you are paid to reason, not to feel. And the reasoning is mathematical — not the flashy sort that writes proofs for undergraduate textbooks, but the quieter, older tools that accrue into an intuition for how uncertainty moves through a system. The job is to turn noise into structure, and structure into a number you are willing to stake capital against. The Python, the Rust, the execution venues, the Kubernetes cluster breathing in a Frankfurt data centre — all of it is plumbing. The mathematics is the thing.

What follows is the working mathematician’s toolkit that the profession actually uses, section by section, with the equations rendered properly. For each tool I’ve tried to name the algorithms it makes possible — because an equation without a use is a trick, and the quant trade has very little room for tricks.

1 · Probability — the grammar of uncertainty

Before anything else, a quant must speak probability fluently. A random variable is a mapping from outcomes to numbers, and almost every piece of financial machinery — PnL, risk, return, drawdown — is a functional of such a mapping. Two numbers dominate: the expectation, which tells you where the mass sits, and the variance, which tells you how reluctantly it sits still.

$$\mathbb{E}[X] = \sum_x x\,p(x), \qquad \operatorname{Var}(X) = \mathbb{E}\!\left[(X - \mathbb{E}[X])^2\right]$$

The first two moments. Most of quant finance is an argument about higher ones.

Between two variables, the relationship you will meet most is covariance — the unsung protagonist of portfolio theory, factor modelling, and risk attribution. The Pearson correlation that every chart displays is simply covariance scaled into a polite interval:

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big], \qquad \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X\,\sigma_Y} \in [-1, 1]$$

Two results carry more of the load than any other. Bayes’ theorem is the formal statement of learning from data; it tells you how a prior belief should shift once the world speaks. Every calibration routine, every filter, every Bayesian signal-combiner you will ever write is a special case of it.

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

Bayes' theorem. The engine under every adaptive model you will build.
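
Bayes' rule is easiest to feel with a conjugate example. Here is a minimal sketch, assuming a made-up trading signal whose hit rate we track with a Beta prior, so that the update reduces to adding counts:

```python
# Sketch: Bayesian updating of a signal's hit rate with a Beta prior.
# The Beta distribution is conjugate to the binomial likelihood, so
# Bayes' theorem reduces to count-keeping. All numbers are illustrative.

def update_hit_rate(alpha: float, beta: float, wins: int, losses: int) -> tuple:
    """Posterior is Beta(alpha + wins, beta + losses)."""
    return alpha + wins, beta + losses

def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1); the signal then wins 7 of its next 10 trades.
a, b = update_hit_rate(1.0, 1.0, wins=7, losses=3)
print(posterior_mean(a, b))  # 8 / 12, about 0.667
```

The prior keeps the estimate honest early on: with only ten trades, the posterior mean sits at 0.667 rather than the raw 0.7, and the gap shrinks as evidence accumulates.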

The second is the Central Limit Theorem — the theorem a quant quotes more than any other, and abuses more than any other. It promises that sums of independent, finite-variance variables approach a normal law, which is why Gaussians lurk inside so many trading models:

$$\sqrt{n}\,\big(\bar{X}_n - \mu\big) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2) \quad \text{as } n \to \infty$$

The CLT, stated carefully. The word that does the work is independent.
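
The promise is easy to check numerically. A minimal simulation sketch, with all parameters chosen purely for illustration:

```python
import numpy as np

# Sketch: the CLT in action. Sums of iid uniform variables (decidedly
# non-Gaussian) standardise to something very close to a standard normal.
rng = np.random.default_rng(0)
n, trials = 256, 50_000
x = rng.uniform(-1.0, 1.0, size=(trials, n))   # each term: mean 0, variance 1/3
z = x.sum(axis=1) / np.sqrt(n / 3.0)           # standardised sums

# Mean near 0, std near 1, and roughly 95% of mass inside 1.96 sigma.
print(z.mean(), z.std(), (np.abs(z) < 1.96).mean())
```

Swap the uniform for any finite-variance distribution and the picture barely changes; break independence (say, with strong serial correlation) and it breaks badly, which is exactly the abuse the caption warns about.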

2 · Linear algebra — risk as a matrix

A portfolio is a vector. A covariance structure is a matrix. A factor model is a factorisation. The step from single-asset intuition to an institutional book is a step into linear algebra, and the quants who cannot think in matrices are forever limited to toy problems.

Given portfolio weights $w$ and an asset-level covariance matrix $\Sigma$, the variance of the portfolio return is a quadratic form — a single, compact expression that every risk engine is built around:

$$\sigma_p^2 = w^\top \Sigma\, w$$

Portfolio variance. Every risk engine in finance boils down to evaluating this number, quickly and correctly.
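
Evaluating the quadratic form is one line of NumPy. A toy sketch, with invented vols and correlations:

```python
import numpy as np

# Sketch: portfolio variance as the quadratic form w' Sigma w for a
# three-asset book. Vols and correlations are made up for illustration.
vols = np.array([0.15, 0.20, 0.10])            # annualised volatilities
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
Sigma = np.outer(vols, vols) * corr            # covariance from vols and correlations
w = np.array([0.5, 0.3, 0.2])                  # portfolio weights

port_var = w @ Sigma @ w                       # the quadratic form
port_vol = np.sqrt(port_var)
print(port_vol)                                # about 0.114, below every single-asset vol
```

Note that the portfolio vol comes out below the weighted average of the asset vols: that gap is diversification, and it lives entirely in the off-diagonal terms of $\Sigma$.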

The matrix is symmetric and positive semidefinite, which means it admits an eigendecomposition. That factorisation is the door into principal component analysis — the tool that turns a correlated mess of hundreds of assets into a handful of orthogonal factors that actually explain the variance:

$$\Sigma = Q \Lambda Q^\top, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$$

Spectral decomposition. The eigenvectors are latent factors; the eigenvalues are how much each factor matters.

For rectangular data — a return matrix of days by assets — the generalisation is the singular value decomposition, the single most useful factorisation in applied mathematics:

$$X = U S V^\top$$
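
In code, PCA drops straight out of the SVD. A sketch on a synthetic one-factor return matrix, where every number is invented:

```python
import numpy as np

# Sketch: PCA of a synthetic return matrix via the SVD. A single latent
# factor drives all assets, so the first singular direction should absorb
# most of the variance. The data is simulated, not real.
rng = np.random.default_rng(1)
T, N = 2_000, 8
factor = rng.standard_normal(T)                     # one common driver
loadings = rng.uniform(0.5, 1.5, size=N)            # per-asset exposure
returns = np.outer(factor, loadings) + 0.3 * rng.standard_normal((T, N))

X = returns - returns.mean(axis=0)                  # demean each column
U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)                     # variance share per component
print(explained[0])                                 # dominant factor's share
```

The link back to the spectral decomposition is direct: the squared singular values of $X$ are the eigenvalues of $X^\top X$, so the SVD of the return matrix and the eigendecomposition of its covariance tell the same story.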

3 · Calculus and optimisation — finding the best trade

Almost every quantitative question ends at an optimiser. Which portfolio maximises return for a given variance? Which parameters make the model fit the market? Which execution schedule minimises cost? The language of these questions is multivariate calculus, and the answer, inevitably, involves a gradient and a constraint.

$$\nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)^{\!\top}, \qquad H_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j}$$

The gradient points to steeper ground; the Hessian tells you about curvature, which is how an optimiser knows it has arrived.

The archetypal quant optimisation is Markowitz’s mean–variance portfolio. You want return, you dislike variance, and a single parameter trades one against the other:

$$\max_{w}\; w^\top \mu \;-\; \frac{\lambda}{2}\, w^\top \Sigma\, w$$

Markowitz, 1952. Seven decades later, every institutional book in the world is still fundamentally solving a descendant of this problem.
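
Without constraints the problem has a closed form, $w^* = \Sigma^{-1}\mu / \lambda$. A sketch with illustrative inputs:

```python
import numpy as np

# Sketch: the unconstrained mean-variance solution. Maximising
# w' mu - (lambda/2) w' Sigma w gives w* = Sigma^{-1} mu / lambda.
# Expected returns and covariances below are illustrative.
mu = np.array([0.08, 0.05])            # expected returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])       # covariance matrix
lam = 4.0                              # risk aversion

w_star = np.linalg.solve(Sigma, mu) / lam
print(w_star)

# First-order condition: the gradient mu - lam * Sigma @ w* should vanish.
print(mu - lam * Sigma @ w_star)
```

Using `np.linalg.solve` rather than explicitly inverting $\Sigma$ is the standard numerical habit: same answer, better conditioning, and it scales to the thousands of assets a real book carries.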

When constraints enter — and they always do — the Lagrangian is the correct way to think. The KKT conditions that follow from it are the backbone of every convex solver in the quant stack, from cvxpy to the commercial workhorses:

$$\mathcal{L}(w, \lambda, \nu) = f(w) + \lambda^\top g(w) + \nu^\top h(w)$$

4 · Stochastic calculus — when ordinary calculus is not enough

The instant you model a price path, ordinary calculus fails you. Prices are rough — their sample paths have infinite variation over any interval — so the Riemann integral that sufficed for physics gives way to the Itô integral, and with it a second-order term that ordinary Taylor expansions do not prepare you for.

The starting object is Brownian motion $W_t$: continuous, nowhere-differentiable, Gaussian in its increments. Built on top of it, the canonical model of a stock price is geometric Brownian motion, the stochastic differential equation that Black and Scholes reached for in 1973:

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

Geometric Brownian motion. The single most taught — and most argued with — SDE in finance.
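
Because $\log S_t$ is Gaussian, the SDE has an exact discretisation rather than needing an Euler approximation. A simulation sketch, with all parameters illustrative:

```python
import numpy as np

# Sketch: simulating GBM with the exact log-normal scheme,
#   S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z).
# Parameters are illustrative, not calibrated.
rng = np.random.default_rng(2)
S0, mu, sigma = 100.0, 0.05, 0.2
T, steps, paths = 1.0, 252, 10_000
dt = T / steps

Z = rng.standard_normal((paths, steps))
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
S = S0 * np.exp(np.cumsum(log_increments, axis=1))   # paths stay strictly positive

print(S[:, -1].mean())   # should sit near S0 * exp(mu * T)
```

The $-\tfrac{1}{2}\sigma^2$ drift correction is Itô's lemma making its first appearance: drop it and the simulated mean drifts systematically above $S_0 e^{\mu T}$.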

The tool for manipulating such an equation is Itô’s lemma — the stochastic chain rule, and the single result every derivatives quant can write from memory at three in the morning:

$$df(S_t, t) = \left(\frac{\partial f}{\partial t} + \mu S_t \frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 f}{\partial S^2}\right) dt + \sigma S_t \frac{\partial f}{\partial S}\, dW_t$$

Itô's lemma. The unfamiliar term is the second-order one; it is the price of admission to stochastic calculus.

Applied to a European contingent claim and combined with a no-arbitrage hedging argument, it collapses into the Black–Scholes PDE — a deterministic equation whose solution is the fair value of the derivative:

$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0$$

The Black–Scholes PDE. A model, not a truth — but one whose reflexes are hard-wired into every options trader's instincts.
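
For a European call the PDE admits the familiar closed-form solution. A sketch, with put-call parity as a sanity check and all inputs chosen for illustration:

```python
import math

# Sketch: the closed-form Black-Scholes price for European options,
# using the error function for the normal CDF. Inputs are illustrative.
def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def bs_put(S: float, K: float, T: float, r: float, sigma: float) -> float:
    # Put-call parity: C - P = S - K * exp(-rT).
    return bs_call(S, K, T, r, sigma) - S + K * math.exp(-r * T)

print(bs_call(100, 100, 1.0, 0.05, 0.2))  # at-the-money call, about 10.45
```

Nothing here needs anything beyond the standard library, which is part of the formula's charm: the entire 1973 result fits in a dozen lines.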

5 · Time series — the shape of data in time

Markets produce the most stubborn kind of data: serially correlated, heteroskedastic, rarely stationary, occasionally deranged. The sub-discipline that deals with this is time-series econometrics, and its first concept — stationarity — is the one every beginner silently assumes and every veteran carefully verifies.

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

The ARMA(p,q) model. An innocent-looking expression that underlies a surprising amount of short-horizon forecasting.
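
The AR(1) special case is enough to see the machinery. A sketch that simulates the process with an invented coefficient and recovers it by least squares:

```python
import numpy as np

# Sketch: simulate an AR(1), the simplest ARMA member, and recover its
# coefficient by regressing x_t on x_{t-1}. phi is chosen for illustration.
rng = np.random.default_rng(3)
phi, n = 0.7, 20_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]       # the AR(1) recursion

# OLS slope of x_t on x_{t-1} (no intercept, since the process is mean zero).
phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
print(phi_hat)                            # close to 0.7
```

With $|\varphi| < 1$ the process is stationary and the estimate converges at rate $1/\sqrt{n}$; push $\varphi$ toward 1 and both properties degrade, which is the doorway to the unit-root and cointegration material below.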

Returns are approximately uncorrelated, but their squares are not — volatility clusters. The model that formalises this, and the one every risk team still reaches for first, is Bollerslev’s GARCH(1,1):

$$\sigma_t^2 = \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2$$

GARCH(1,1). Three parameters that capture more of realised volatility than most multi-million-dollar models.
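
The recursion itself is three lines of code. A sketch of the GARCH(1,1) variance filter with invented parameters, where $\alpha + \beta < 1$ guarantees a finite long-run variance $\omega / (1 - \alpha - \beta)$:

```python
import numpy as np

# Sketch: the GARCH(1,1) variance recursion filtered over a return series.
# omega, alpha, beta are illustrative; alpha + beta < 1 gives a finite
# unconditional variance omega / (1 - alpha - beta).
def garch_filter(returns, omega, alpha, beta):
    sigma2 = np.empty_like(returns)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the long-run level
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1]**2 + beta * sigma2[t - 1]
    return sigma2

returns = np.array([0.0, 0.02, -0.03])
sigma2 = garch_filter(returns, omega=1e-6, alpha=0.10, beta=0.85)
print(sigma2)
```

The mechanism is visible even in three observations: the 2% move at $t = 1$ lifts the conditional variance at $t = 2$ well above its long-run level, and the $\beta$ term then lets that shock decay slowly, which is exactly what volatility clustering looks like in the data.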

Two non-stationary series can nonetheless share a stationary combination — cointegration, the single most exploited statistical structure in pairs trading and statistical arbitrage. The spread is the tradable object; the rest is execution.
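
A synthetic sketch of the idea: build a cointegrated pair with an invented hedge ratio of 2, recover that ratio by OLS, and observe that the spread is tame while the legs wander:

```python
import numpy as np

# Sketch: two non-stationary prices sharing a stationary spread.
# y follows x with hedge ratio 2; the spread y - beta*x is just noise.
# Everything here is synthetic, chosen for illustration.
rng = np.random.default_rng(4)
n = 5_000
x = np.cumsum(rng.standard_normal(n))      # random walk: non-stationary
y = 2.0 * x + rng.standard_normal(n)       # cointegrated with x

beta_hat = np.dot(x, y) / np.dot(x, x)     # OLS hedge ratio (through the origin)
spread = y - beta_hat * x                  # the tradable, mean-reverting object
print(beta_hat, spread.std())
```

In production the OLS step would be followed by a formal stationarity test on the spread (Engle-Granger style) before any capital touches it; the regression alone is necessary, not sufficient.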

6 · Information theory and the Kelly criterion

A signal is not tradable until you decide how much to stake on it, and the mathematics of sizing is older than most people realise. Shannon’s entropy measures the uncertainty in a distribution; Kelly took that mathematics and turned it into a rule for capital:

$$H(X) = -\sum_x p(x) \log p(x)$$

Shannon entropy. The same log that will appear in the Kelly fraction — that is not a coincidence.

For a simple binary bet with win probability $p$ and payoff odds $b$, the fraction of capital that maximises the log-growth rate is startlingly clean:

$$f^* = p - \frac{1 - p}{b}$$

Kelly, 1956. The only sizing rule with a formal optimality proof; the only sizing rule practitioners reliably under-use.
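
In code the rule is a one-liner. A sketch, with the caveat that practitioners usually trade some fixed fraction of it:

```python
# Sketch: the binary Kelly fraction f* = p - (1 - p) / b for win
# probability p and net odds b. Full Kelly is the growth-optimal size;
# fractional Kelly is the common practical compromise.
def kelly_fraction(p: float, b: float) -> float:
    """Log-growth-optimal fraction of capital for a binary bet."""
    return p - (1.0 - p) / b

print(kelly_fraction(0.60, 1.0))   # 60% win rate at even odds: stake 20%
```

The zero-edge case is a useful sanity check: at $p = 0.5$ and even odds the formula returns exactly zero, which is the mathematics politely telling you not to trade.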

In the continuous, Gaussian limit — the regime most quant strategies actually live in — the Kelly fraction collapses into the shape a portfolio manager will recognise immediately:

$$f^* = \frac{\mu}{\sigma^2}$$

Continuous Kelly. The numerator is edge; the denominator is variance; the ratio is how much to bet. Everything else is risk management.

7 · Numerical methods — because closed forms are rare

The mathematics above will carry you only until the model becomes interesting, at which point closed-form solutions evaporate and numerical methods step in. Two families dominate: Monte Carlo, which samples; and grid-based schemes, which discretise.

$$\mathbb{E}[f(X)] \;\approx\; \frac{1}{N} \sum_{i=1}^{N} f(X_i), \qquad X_i \ \text{i.i.d.}$$

The Monte Carlo estimator. Embarrassingly parallel, brutally general, and the reason GPUs found their way into quant research.
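
A minimal sketch: estimating $\mathbb{E}[e^Z]$ for standard normal $Z$, a toy target chosen because its exact value $e^{1/2}$ makes the error visible:

```python
import numpy as np

# Sketch: the plain Monte Carlo estimator (1/N) sum f(X_i), applied to
# E[exp(Z)] for Z ~ N(0,1). The exact answer is exp(1/2) ~ 1.6487.
rng = np.random.default_rng(5)
N = 200_000
z = rng.standard_normal(N)
samples = np.exp(z)

estimate = samples.mean()
std_error = samples.std() / np.sqrt(N)   # the 1/sqrt(N) convergence rate

print(estimate, np.exp(0.5))             # estimate vs exact value
```

The standard error line is the whole economics of Monte Carlo: halving the error costs four times the samples, which is why the variance-reduction tricks mentioned below earn their keep.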

For PDE-based pricing — the Black–Scholes equation, Heston, rates models — the usual tool is a finite-difference scheme. The key idea is equally humble: replace derivatives with divided differences:

$$\frac{\partial^2 V}{\partial S^2} \;\approx\; \frac{V_{i+1} - 2 V_i + V_{i-1}}{(\Delta S)^2}$$
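
The divided difference is itself a one-liner. A sketch, checked against functions whose second derivatives are known in closed form:

```python
# Sketch: the central divided difference that stands in for a second
# derivative on a finite-difference grid. For a quadratic it is exact:
# (f(S+h) - 2 f(S) + f(S-h)) / h^2 = 2 for f(S) = S^2, at any h.
def second_difference(f, S: float, h: float) -> float:
    """Central approximation to f''(S) with grid spacing h."""
    return (f(S + h) - 2.0 * f(S) + f(S - h)) / (h * h)

print(second_difference(lambda S: S**2, 100.0, 0.5))   # 2.0, up to rounding
print(second_difference(lambda S: S**3, 2.0, 1e-3))    # close to 12.0
```

A full pricing scheme is just this stencil applied at every grid node, stepped backwards in time; the real engineering lives in the time-stepping (explicit vs implicit) and the boundary conditions, not in the stencil itself.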

These are the unglamorous workhorses. You will spend more of your career tuning them — variance reduction, quasi-random sequences, implicit solvers, convergence diagnostics — than you will proving theorems.

8 · The subject is one, not seven

It is tempting, after a tour like this, to file each branch in a separate cabinet — probability here, linear algebra there, stochastic calculus under a locked drawer. The quant who actually ships systems learns the opposite habit. The branches cohere; a single trade idea flows through all of them in the space of a few hundred lines of code.

  • A feature is a transformation of past prices — a number drawn from a probability distribution whose moments you have estimated using time-series tools.
  • A model maps features to forecasts, and the fitting is multivariate optimisation over a loss whose derivatives you take by hand or by autograd.
  • A forecast becomes an edge, and the covariance matrix decides how that edge translates into positions — linear algebra all the way down.
  • A position becomes a size via Kelly or a fraction thereof, constrained by a convex optimiser that balances return, risk, and turnover.
  • A size becomes an execution plan, solved by a stochastic optimal-control problem whose closed form, when one exists, was derived using Itô calculus.
A mathematician, like a painter or a poet, is a maker of patterns.
— G. H. Hardy, A Mathematician's Apology

That description holds for the quant as well — with the inconvenient caveat that the patterns have to price, hedge, and not lose money. The mathematics in this article is the sketchbook. The algorithms are the paintings. And the market, for all its noise, is the stubborn, honest critic whose verdicts all of us are obliged to accept.

#mathematics #quant-finance #probability #stochastic-calculus #linear-algebra #optimisation #information-theory #kelly


If this resonated, let's talk.

I help startups ship production-grade systems — fintech, AI, high-throughput APIs — from MVP to 100K users. If something here sparked an idea for your stack, I'd be glad to hear it.
