[{"content":"Why This Matters The previous article on finite difference methods solved the heat equation by brute force: lay down a grid, step it through time, let the solution emerge step by step. It works, and for many pricing problems it is the practical choice. But can we instead solve the equation analytically?\nFor a class of PDEs, the heat equation among them, we can. The idea is to stop viewing a function as a shape over space and instead see it as a combination of frequencies. That change of view is the Fourier transform. In this article, we explore it on the heat equation, where the diffusion structure shows through without the variable coefficients of Black-Scholes to clutter it, and then turn to its application in option pricing, where it works even when the distribution of prices has no closed form.\nThe Heat Equation Again Take the same setup as before: a metal rod of length $L$, insulated along its sides so heat only flows along its length. Its temperature $u(x, t)$ depends on position $x \\in [0, L]$ and time $t$, and evolves under\n$$\\frac{\\partial u}{\\partial t} = \\kappa \\frac{\\partial^2 u}{\\partial x^2},$$where $\\kappa$ is the thermal diffusivity. We know the temperature profile at $t = 0$, call it $u(x, 0) = f(x)$, and we want the profile at any later time.\nThe numerical method took $f(x)$ on a grid and advanced it through time. To solve it analytically we have nothing yet, so the honest first move is just to look for solutions, any solutions, and see what the equation will allow.\nLooking for Solutions What makes this equation hard to solve directly is that it ties space and time together: the rate of change in time at a point depends on the curvature in space around that point, so the whole profile evolves as one coupled object. Rather than attack that coupling head-on, watch what the rod actually does as it cools, and look for solutions that behave the same way.\nHeat tends to fade in place. 
A hump of warmth in the middle of the rod stays a hump in the middle; it does not drift to one end or sprout a second peak. The shape of the profile holds while its height drains away. If we look for solutions that do exactly this, hold a fixed shape and simply shrink, we are looking for a spatial profile $X(x)$ multiplied by a time-dependent amplitude $T(t)$ that scales it up or down:\n$$u(x, t) = X(x)\\, T(t).$$The figure below shows one such shape cooling: it loses height while holding its form, every point cooling in the same proportion. This is the behavior we are looking for.\n[Interactive figure: A single mode cooling. The shape stays fixed and only its height changes; the faint outline marks the starting profile, the bold curve the profile at the current time.]\nWith the form in hand, substitute it into the heat equation. The left side differentiates only $T$, the right side only $X$:\n$$X(x)\\, T'(t) = \\kappa\\, X''(x)\\, T(t).$$Move everything in $T$ to one side and everything in $X$ to the other:\n$$\\frac{T'(t)}{\\kappa\\, T(t)} = \\frac{X''(x)}{X(x)}.$$A function of $t$ alone can equal a function of $x$ alone, for all $x$ and $t$, only if both are the same constant. Call it $-\\lambda$ (the sign is a convention chosen because we expect decay):\n$$\\frac{X''(x)}{X(x)} = -\\lambda, \\qquad \\frac{T'(t)}{\\kappa\\, T(t)} = -\\lambda.$$Two ordinary differential equations, each in one variable, which we can now solve independently.\nThe Spatial Part and the Role of Boundary Conditions The spatial equation is $X''(x) = -\\lambda X(x)$. For $\\lambda \u003e 0$ its solutions are oscillations,\n$$X(x) = A \\cos(\\omega x) + B \\sin(\\omega x), \\qquad \\omega = \\sqrt{\\lambda}.$$Both sine and cosine solve the equation. 
Which combination we keep is not a mathematical preference; it is forced by what we hold fixed at the ends of the rod.\nConsider two different boundary conditions.\nEnds held at zero temperature. If the rod is clamped to ice baths at both ends, then $u(0, t) = u(L, t) = 0$ for all time, so $X(0) = X(L) = 0$. The condition $X(0) = 0$ kills the cosine term, since $\\cos(0) = 1$ and the cosine cannot vanish at the origin without disappearing entirely; only the sine, with $\\sin(0) = 0$, survives. The condition $X(L) = 0$ then forces $\\sin(\\omega L) = 0$, so $\\omega L$ is a multiple of $\\pi$. The allowed shapes are\n$$X_n(x) = \\sin\\left(\\frac{n \\pi x}{L}\\right), \\qquad n = 1, 2, 3, \\dots$$Ends insulated. If instead the ends are insulated so no heat flows across them, the condition is on the slope, not the value: $u_x(0, t) = u_x(L, t) = 0$. Now it is the sine that dies, because its derivative $\\cos$ fails to vanish at the origin, and the cosine survives:\n$$X_n(x) = \\cos\\left(\\frac{n \\pi x}{L}\\right), \\qquad n = 0, 1, 2, \\dots$$Same equation, same family of oscillations, opposite selection. Fixed-temperature ends give a sine basis; insulated ends give a cosine basis. A mix of conditions (one end fixed, one insulated) keeps the sine or the cosine, whichever suits the end at $x = 0$, but shifts the frequencies to half-integer multiples, $\\omega_n = (n - \\tfrac{1}{2})\\pi/L$. The general lesson: the boundary conditions choose which oscillations are admissible. We will carry the fixed-temperature (sine) case forward for simplicity, but everything that follows has a cosine twin.\nIn every case the admissible frequencies are discrete: a countable list spaced $\\pi/L$ apart, with $\\omega_n = n\\pi/L$ in the two cases worked above.\nThe Time Part The time equation $T'(t) = -\\lambda \\kappa\\, T(t)$ says the rate of change is proportional to the current value, which is solved by an exponential. 
The spatial side already fixed $\\lambda = \\omega_n^2$, so\n$$T_n(t) = T_n(0)\\, e^{-\\kappa \\omega_n^2 t}.$$Each spatial shape decays exponentially, and the decay rate is $\\kappa \\omega_n^2$. Higher frequency means faster decay, and the dependence is on the square of the frequency. This single expression states the diffusion behavior outright: sharp features, which are built from high-frequency shapes, decay fast; smooth features, built from low frequencies, persist.\nA single separated solution is therefore the chosen spatial shape times its decay, with the amplitude set aside for now,\n$$u_n(x, t) = \\sin\\left(\\frac{n \\pi x}{L}\\right) e^{-\\kappa \\omega_n^2 t}.$$In the insulated-end problem the sine here is a cosine and nothing else changes: the time factor depends only on $\\omega_n$, which is the same list of frequencies either way. Whatever the boundary conditions select for the spatial shape, that shape decays at rate $\\kappa\\omega_n^2$. This solution satisfies the heat equation and the boundary conditions. What it does not yet satisfy is the initial condition: at $t = 0$ it is a single sine, and our actual starting profile $f(x)$ is some arbitrary shape.\nFourier\u0026rsquo;s Bold Assumption A single sine cannot match an arbitrary profile, so Fourier proposed adding many: that with enough sines and the right amplitudes, the sum can match essentially any starting profile we would meet in practice. For the ice-bath rod we are carrying forward, that means\n$$f(x) = \\sum_{n=1}^{\\infty} b_n \\sin\\left(\\frac{n \\pi x}{L}\\right),$$and the insulated case is the same statement with cosines, $f(x) = \\tfrac{1}{2}a_0 + \\sum_{n=1}^{\\infty} a_n \\cos(n\\pi x/L)$. 
This expansion of a function into its sine and cosine components is the Fourier series.\nAt the time this was a genuinely audacious claim: that a function with corners, jumps, and flat stretches, the kind no single formula describes, could be reconstructed entirely from smooth oscillations.\nPart of the claim is straightforward. Adding solutions is allowed: if $u_1$ and $u_2$ each solve the heat equation, then because each derivative acts on the two pieces separately,\n$$\\frac{\\partial (u_1 + u_2)}{\\partial t} = \\frac{\\partial u_1}{\\partial t} + \\frac{\\partial u_2}{\\partial t} = \\kappa\\frac{\\partial^2 u_1}{\\partial x^2} + \\kappa\\frac{\\partial^2 u_2}{\\partial x^2} = \\kappa\\frac{\\partial^2 (u_1 + u_2)}{\\partial x^2},$$so the sum solves it too, as does any amplitude-weighted sum of the separated sine solutions. The hard part is the reach of the claim: that the sum can be made to equal an arbitrary $f(x)$ in the first place. Fourier asserted this but did not prove it; the first rigorous proof came decades later from Dirichlet, who gave sufficient conditions on the function for the series to converge.\nThose conditions are that the function is bounded, has at most finitely many jumps and corners, and has finitely many maxima and minima over the interval. A profile like $1/x$ near the origin, which runs off to infinity, falls outside them. Physical temperature profiles satisfy all of this, so the conditions are no real restriction here.\nGranting the assumption, the problem reduces to one question: given $f(x)$, what are the coefficients $b_n$?\nExtracting a Coefficient Is a Projection Fourier\u0026rsquo;s contribution was the assumption that the decomposition exists; that the sines are orthogonal was known before him. Orthogonality is what turns the assumption into a recipe, making each coefficient a clean projection.\nWe will borrow the analogy from vectors. Two vectors are orthogonal (perpendicular) when their dot product is zero. 
For $\\mathbf{u} = (u_1, u_2, \\dots, u_k)$ and $\\mathbf{v} = (v_1, v_2, \\dots, v_k)$ the dot product multiplies them component by component and adds, $\\mathbf{u}\\cdot\\mathbf{v} = \\sum_i u_i v_i$.\nA function is like a vector with a component at every point $x$ rather than finitely many, so the sum over components becomes an integral:\n$$\\langle g, h \\rangle = \\int_0^L g(x)\\, h(x)\\, dx.$$Apply it to two different sines and the integral is zero:\n$$\\int_0^L \\sin\\left(\\frac{n \\pi x}{L}\\right) \\sin\\left(\\frac{m \\pi x}{L}\\right) dx = \\begin{cases} 0 \u0026 n \\neq m \\\\ L/2 \u0026 n = m. \\end{cases}$$The sines are orthogonal in the same sense as perpendicular vectors, and a sine dotted with itself gives a nonzero squared length, here $L/2$.\nBeyond the integral, the orthogonality of two sines is visible in a picture.\n[Figure: Why two sine modes are perpendicular. Curves: $\\sin(\\pi x)$, $\\sin(2\\pi x)$, and their product, with the positive and negative areas shaded. The product curve sits above the axis where the two modes share a sign and below where they differ; the shaded regions exactly offset, so the integral is zero.]\nTake $\\sin(\\pi x/L)$ and $\\sin(2\\pi x/L)$. Their product is positive where both have the same sign and negative where they differ, and over the rod those regions cancel exactly: for every stretch where the product is positive there is a matching stretch where it is equally negative. The integral adds up the product, so the cancellation drives it to zero.\nThis orthogonality is what lets us recover a single coefficient. 
To get $b_m$, dot both sides of the decomposition with the sine we want: multiply by $\\sin(m\\pi x/L)$ and integrate over the rod.\n$$\\int_0^L f(x) \\sin\\left(\\frac{m \\pi x}{L}\\right) dx = \\sum_{n=1}^{\\infty} b_n \\int_0^L \\sin\\left(\\frac{n \\pi x}{L}\\right)\\sin\\left(\\frac{m \\pi x}{L}\\right) dx.$$Orthogonality collapses the entire sum on the right to its single $n = m$ term, which carries the factor $L/2$. Solving,\n$$b_m = \\frac{2}{L} \\int_0^L f(x) \\sin\\left(\\frac{m \\pi x}{L}\\right) dx.$$The coefficient is the projection of $f$ onto the $m$-th sine, divided by that sine\u0026rsquo;s self-overlap $L/2$. The cosine coefficients come out the same way, projecting onto cosines instead.\nSolving the Heat Equation: Bounded Case We now have all three parts: the sine shapes, their decay rates, and the coefficients that match the initial profile. Combining them, the solution is the decomposition with each term carrying its time factor:\n$$u(x, t) = \\sum_{n=1}^{\\infty} b_n \\sin\\left(\\frac{n \\pi x}{L}\\right) e^{-\\kappa (n\\pi/L)^2 t}.$$This is the ice-bath rod. The insulated rod has the same form with cosines, and a rod with one end fixed and one insulated has the same form again with the half-integer frequencies $(n - \\tfrac{1}{2})\\pi/L$ in place of $n\\pi/L$; the boundary conditions decide which shapes and frequencies appear, while every shape still decays at its own rate $\\kappa\\omega_n^2$.\nRead it left to right and the physics is in plain view. At $t = 0$ the exponentials are all $1$ and we recover $f(x)$. As time advances, every sine shrinks, and the high-$n$ sines shrink fastest because the rate goes as $n^2$. The profile loses its sharp features first and relaxes toward smoothness. The grid used by the finite difference method would approach this decay one time step at a time; the analytical form gives the entire time evolution in closed form, with each term\u0026rsquo;s decay rate $\\kappa(n\\pi/L)^2$ appearing explicitly in its exponent.\nA concrete profile makes this tangible. 
Take a rod sitting at a uniform temperature $u_0$, then clamped to ice at both ends at $t = 0$, so $f(x) = u_0$. The coefficients are a single elementary integral,\n$$b_n = \\frac{2}{L}\\int_0^L u_0 \\sin\\frac{n\\pi x}{L}\\,dx = \\frac{2u_0}{n\\pi}\\left(1 - \\cos n\\pi\\right),$$which is $4u_0/(n\\pi)$ for odd $n$ and zero for even $n$. The solution is then\n$$u(x,t) = \\frac{4u_0}{\\pi}\\sum_{n \\text{ odd}} \\frac{1}{n}\\sin\\frac{n\\pi x}{L}\\,e^{-\\kappa(n\\pi/L)^2 t}.$$\n[Interactive figure: A uniform profile relaxing to a half-sine. Each faint curve is one odd term of the series; the bold curve is their sum, the temperature $u(x,t)$. The sharper sines decay fastest, so within a short time only the slowest sine survives and the rod settles into a single arch.]\nThe amplitudes fall off as $1/n$, so the $n=1$ sine is the largest from the start, and the $e^{-\\kappa(n\\pi/L)^2 t}$ factors then shrink the higher-$n$ sines far faster than it. Within a short time only the $n=1$ term carries any weight, and the rod relaxes toward a single smooth arch bowing down to the cold ends.\nWhat Happens When the Rod Grows Infinitely Long? Everything so far rests on the rod being finite. The ends at $0$ and $L$ are what forced the frequencies into a discrete list $\\omega_n = n\\pi/L$. But what happens if the rod grows infinitely long, with no boundary conditions at all? Lengthen it and the spacing $\\pi/L$ shrinks; the admissible sines crowd closer together. In the limit $L \\to \\infty$ the spacing goes to zero and the allowed frequencies fill in to a continuum: every $\\omega$, not just the integer multiples of $\\pi/L$, is now in play.\nTwo things change in that limit. The first is mechanical. 
With the frequencies packed infinitely densely, the sum over the discrete list $\\sum_n$ becomes an integral over the continuous variable $\\omega$, and the sequence of coefficients $b_n$, one number per allowed frequency, becomes a function $b(\\omega)$ defined for every real $\\omega$. A decomposition into countably many sines becomes a decomposition into a continuum of them:\n$$f(x) = \\int_0^{\\infty} b(\\omega)\\sin(\\omega x)\\,d\\omega.$$The second change is subtler. On the rod $[0,1]$, two near-equal frequencies are nearly the same direction: $\\int_0^1 \\sin(\\pi x)\\sin(1.0001\\pi x)\\,dx \\approx 0.49997$, almost a sine\u0026rsquo;s full overlap with itself, which is $0.5$. Over the whole line, though, any two distinct frequencies are orthogonal, with the integral understood in an averaged sense: the overlap over $[-L, L]$ stays bounded while a sine\u0026rsquo;s overlap with itself grows like $L$, so measured against that self-overlap it goes to zero as $L \\to \\infty$:\n$$\\int_{-\\infty}^{\\infty} \\sin(\\omega_1 x)\\sin(\\omega_2 x)\\,dx = 0 \\quad \\text{whenever } \\omega_1 \\neq \\omega_2.$$The intuition is that on the infinite interval, as long as $\\omega_1 \\neq \\omega_2$, the faster sine gradually slides relative to the slower one, spending as much of the line moving in step with it as moving against it. The matched and opposed stretches balance, so the overlap averages out to zero.\nExtracting the amount of a given frequency is still the same overlap integral as on the rod, now run over the whole line for every $\\omega$ rather than each integer $n$. This decomposition into a continuum of orthogonal oscillations is the Fourier transform. 
In sine and cosine form it reads\n$$f(x) = \\int_0^{\\infty} \\big[\\,a(\\omega)\\cos(\\omega x) + b(\\omega)\\sin(\\omega x)\\,\\big]\\,d\\omega,$$with the weights recovered by the same overlap as before, now integrated over the whole line:\n$$a(\\omega) = \\frac{1}{\\pi}\\int_{-\\infty}^{\\infty} f(x)\\cos(\\omega x)\\,dx, \\qquad b(\\omega) = \\frac{1}{\\pi}\\int_{-\\infty}^{\\infty} f(x)\\sin(\\omega x)\\,dx.$$\nSolving the Heat Equation: Unbounded Case Earlier we solved the heat equation on a rod with fixed ends. Now take a rod with no boundaries at all and a concrete starting profile: a quantity of heat $Q$ deposited at the single point $x = 0$, a hot needle touched to the origin of an infinite rod, with the rest of the rod cold. Concentrating the heat at a point forces one shift in reading: $u(x,t)$ is now the heat density, the amount of heat per unit length, rather than the temperature.[1] Written as an initial condition, all the heat sits at one point, $u(x, 0) = Q\\,\\delta(x)$, where $\\delta$ is the spike at the origin that integrates to one, so the total heat on the rod is $Q$. We never need a formula for $\\delta$ itself, only the rule that defines it: integrated against any function it returns that function\u0026rsquo;s value at the origin,\n$$\\int_{-\\infty}^{\\infty} \\delta(x)\\,g(x)\\,dx = g(0).$$The plan is the same three steps as the bounded case: decompose the initial profile into frequencies, let each frequency decay at its own rate, reassemble.\nDecompose first. The coefficients come from the same projection as on the rod, now run over the whole line. 
Projecting the point source onto each frequency, the delta collapses the integral to the integrand\u0026rsquo;s value at the origin:\n$$a(\\omega) = \\frac{1}{\\pi}\\int_{-\\infty}^{\\infty} Q\\,\\delta(x)\\cos(\\omega x)\\,dx = \\frac{Q}{\\pi}\\cos(0) = \\frac{Q}{\\pi}, \\qquad b(\\omega) = \\frac{1}{\\pi}\\int_{-\\infty}^{\\infty} Q\\,\\delta(x)\\sin(\\omega x)\\,dx = \\frac{Q}{\\pi}\\sin(0) = 0.$$The cosine weight is the same constant at every $\\omega$, so the spike projects equally onto all frequencies, and the sine weight is zero everywhere. The profile is built from cosines alone. Each frequency decays as $e^{-\\kappa\\omega^2 t}$, so the flat spectrum becomes a Gaussian in $\\omega$, peaked at $\\omega = 0$, as the high frequencies die away fastest. Reassemble by integrating these decayed cosines back into a profile in $x$:\n$$u(x,t) = \\frac{Q}{\\pi}\\int_0^{\\infty} e^{-\\kappa\\omega^2 t}\\cos(\\omega x)\\,d\\omega.$$This is a Gaussian integral, and it evaluates in closed form to[2]\n$$u(x,t) = \\frac{Q}{2\\sqrt{\\pi\\kappa t}}\\,e^{-x^2/(4\\kappa t)}.$$This bell curve is exactly a normal density, scaled by the total heat $Q$. Reading its exponent $-x^2/(4\\kappa t)$ against the normal\u0026rsquo;s $-x^2/(2\\sigma^2)$ sets the variance at $\\sigma^2 = 2\\kappa t$, growing linearly in time. So the whole evolution collapses to one picture: we begin with all the heat at a single point, the zero-variance limit of a normal, and it spreads as a normal whose variance widens as $2\\kappa t$. At small $t$ the curve is tall and narrow, almost all the heat still near $x = 0$; as $t$ grows the variance grows with it, spreading the heat wider while the area underneath, the total heat $Q$, stays fixed.\n[Interactive figure: Heat spreading from a point. All the heat starts at one spot and spreads outward as a Gaussian, widening and flattening. The faint curve marks the earliest profile; the peak falls as the curve broadens, while the area underneath, the total heat, never changes.]\nThis spreading Gaussian is the heat kernel, the exact closed form for a point source on the infinite rod. Had the heat been deposited at $x_0$ rather than the origin, the same spreading would produce the identical Gaussian centered there, $u(x,t) = \\frac{Q}{\\sqrt{4\\pi\\kappa t}}\\,e^{-(x-x_0)^2/(4\\kappa t)}$, since a uniform rod diffuses the same way wherever the source sits.\nThe point source was a special initial condition, but it unlocks the general solution. Because the heat equation is linear, a general starting profile $f(x)$ can be read as a sum of point sources, one at each location $y$ carrying heat $f(y)\\,dy$, and the response to the whole is the sum of the responses to each. Every source at $y$ spreads into a kernel centered on $y$, so the solution is the initial profile weighted against the kernel at each point,\n$$u(x,t) = \\int_{-\\infty}^{\\infty} f(y)\\,\\frac{1}{\\sqrt{4\\pi\\kappa t}}\\,e^{-(x-y)^2/(4\\kappa t)}\\,dy.$$This integral is the general solution on the infinite rod: the heat at each point is the initial profile averaged against a Gaussian of variance $2\\kappa t$ centered there, the nearby heat counting most. The point source is the case $f = Q\\delta$, where the integral collapses back to the single kernel above.\nThe Fourier Transform in Complex Exponentials So far we have built the transform from sine and cosine, the way Fourier originally worked, but the modern representation is almost always written with the complex exponential $e^{i\\omega x}$ instead. 
The two are equivalent by Euler\u0026rsquo;s formula,\n$$e^{i\\omega x} = \\cos(\\omega x) + i\\sin(\\omega x),$$which rearranges to give the sine and cosine back as exponentials,\n$$\\cos(\\omega x) = \\frac{e^{i\\omega x} + e^{-i\\omega x}}{2}, \\qquad \\sin(\\omega x) = \\frac{e^{i\\omega x} - e^{-i\\omega x}}{2i}.$$Therefore\n$$a(\\omega)\\cos(\\omega x) + b(\\omega)\\sin(\\omega x) = c(\\omega)\\,e^{i\\omega x} + c(-\\omega)\\,e^{-i\\omega x},$$where $c(\\omega) = \\tfrac{1}{2}\\big(a(\\omega) - i\\,b(\\omega)\\big)$ and its partner is the complex conjugate, $c(-\\omega) = \\tfrac{1}{2}\\big(a(\\omega) + i\\,b(\\omega)\\big) = \\overline{c(\\omega)}$ for real $f$.\nThe sine-and-cosine decomposition and the complex-exponential decomposition therefore hold the same information. In exponential form, with the usual convention that places the whole factor $1/(2\\pi)$ on the inverse (so the transform is $\\hat f(\\omega) = 2\\pi\\,c(\\omega)$), the transform and its inverse read\n$$\\hat f(\\omega) = \\int_{-\\infty}^{\\infty} f(x)\\, e^{-i\\omega x}\\, dx, \\qquad f(x) = \\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty} \\hat f(\\omega)\\, e^{i\\omega x}\\, d\\omega,$$where the complex weight $\\hat f(\\omega)$ carries both the amplitude and the phase of frequency $\\omega$. A single frequency is one oscillation described by two numbers, how large it is and where its peaks sit along the axis; a sine and a cosine of the same $\\omega$ are this one oscillation in two shifted positions. The complex $\\hat f(\\omega)$ records both: its size and its angle in the plane are the two numbers that pin down the oscillation.\nTwo things make this form worth adopting. The first is bookkeeping: sines and cosines are tracked together as one complex exponential, instead of as two separate real families. The second matters more but is not obvious at this point: the complex exponential is the language of the characteristic function, $\\phi(\\omega) = \\mathbb{E}[e^{i\\omega X}]$, which we will draw on in the discussion of option pricing that follows.\nOption Pricing Black-Scholes The Black-Scholes equation is the heat equation in disguise. 
A change of variables turns it into exactly the diffusion we just solved, so the same machinery prices an option. We carry this out in full in the appendix, reducing the equation to the heat equation and recovering the Black-Scholes formula.\nThis is included to illustrate the technique on a familiar case, not because the transform is needed here: the Black-Scholes density is known in closed form, lognormal, so the pricing integral can be done directly. Nor is this how Fourier methods are used in practice. Their real value appears when no such closed-form density is available, such as in the Heston stochastic volatility model below.\nHeston A price is the discounted expected payoff, the payoff $g$ integrated against the density $q$ of the log price $y = \\ln S_T$,\n$$V = e^{-rT}\\int_{-\\infty}^{\\infty} g(y)\\, q(y)\\, dy.$$Stochastic volatility models, Heston among them, leave $q$ with no closed form, so this integral is stuck.\nThe Fourier transform offers a way around this. Write the payoff as its own inverse transform, $g(y) = \\frac{1}{2\\pi}\\int \\hat g(\\omega)\\, e^{i\\omega y}\\, d\\omega$, and exchange the order of integration. The density now meets the exponential, and that pairing is the characteristic function,\n$$\\int_{-\\infty}^{\\infty} e^{i\\omega y}\\, q(y)\\, dy = \\mathbb{E}\\!\\left[e^{i\\omega y}\\right] = \\phi(\\omega),$$the Fourier transform of the density. So the price turns into an integral against $\\phi$ rather than $q$,\n$$V = \\frac{e^{-rT}}{2\\pi}\\int_{-\\infty}^{\\infty} \\hat g(\\omega)\\,\\phi(\\omega)\\, d\\omega.$$Heston withholds the density but provides $\\phi$ in closed form, so this last integral is computable and the density is never reconstructed. The Heston characteristic function and the mechanics of this inversion are the subject of a future article on stochastic volatility.\nReflection Working through this, there are three things I find really fascinating.\nThe first is Fourier\u0026rsquo;s original leap. 
To take an arbitrary profile, a temperature with a corner in it, a pulse with outright jumps, and assert that it is the sum of infinitely many smooth, endless sine and cosine waves is not an obvious thing to believe. His contemporaries did not believe it; the claim that discontinuous shapes could be assembled from perfectly smooth ones met skepticism from mathematicians as formidable as Lagrange. That Fourier saw it anyway, and was essentially right, is a conviction I find hard to reconstruct after the fact. How he came to see a function as a spectrum is the part I cannot quite explain, only admire.\nThe second is what happens to orthogonality in the limit. On the finite rod, two waves of unrelated frequency are not orthogonal: their overlap over the interval is some bounded number that doesn\u0026rsquo;t vanish. Yet let the rod grow without bound and that overlap, measured against a wave\u0026rsquo;s overlap with itself, shrinks to zero, and distinct frequencies become orthogonal. Passing to a limit can lead to a new property, qualitatively different from the finite case.\nThe third is the one I keep returning to. The general solution of the heat equation on the infinite line, which we built by decomposing, evolving, and reassembling, can also be written with no Fourier machinery at all, as the expected value of the initial profile sampled at the endpoint of a diffusing particle. Two pictures that share no obvious vocabulary, a deterministic field smoothing itself and a single random walker wandering, produce the identical formula, because the heat kernel is both the spreading weight of diffusion and the transition density of Brownian motion. Feynman-Kac is the statement that these are the same object, and it is what lets an option price be read either as a PDE to solve or an expectation to take. That a hard analytic problem of this kind can be recast as an average over random paths still strikes me as quietly powerful.\nA last point on where this method applies, since it is not universal. 
The transform needs three things: a linear equation, constant coefficients, and an unbounded domain. The heat equation has all three, which is why it yielded cleanly; Black-Scholes gains them once the log substitution makes its coefficients constant. Many real pricing problems do not, and there the clean decomposition into independent frequencies breaks down. That is where the finite difference method takes over, handling what the transform cannot at the price of approximating rather than solving in closed form.\nAppendix: Solving Black-Scholes with the Fourier Transform The value $V(S,t)$ of a European call satisfies\n$$\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 V}{\\partial S^2} + rS\\frac{\\partial V}{\\partial S} - rV = 0,$$with the terminal condition that at expiry the option is worth its payoff, $V(S,T) = \\max(S - K, 0)$.\nWe apply the following transformations to reduce it to the heat equation.\nLog price, $x = \\ln S$. This makes the variable coefficients constant: the powers of $S$ in $\\tfrac{1}{2}\\sigma^2 S^2$ and $rS$ cancel the powers produced when a derivative hits a log, since $S\\,\\partial_S = \\partial_x$ and $S^2\\partial_{SS} = \\partial_{xx} - \\partial_x$. Backward time, $\\tau = T - t$. An option is anchored by its payoff at expiry, while the heat equation runs from an initial condition, so we reverse time and let $\\tau$ measure time to expiry. This flips the sign, $\\partial_t = -\\partial_\\tau$. Exponential factor, $V = e^{ax + b\\tau}\\,u$. After the first two steps the equation still carries a first-derivative (drift) term and the $-rV$ term. This factor clears both: the spatial part $e^{ax}$ removes the drift, and the time part $e^{b\\tau}$ removes the discount term together with the constant that killing the drift leaves behind, so $b$ is more than the bare rate. 
The required values are $$a = -\\frac{r - \\tfrac{1}{2}\\sigma^2}{\\sigma^2}, \\qquad b = -r - \\frac{(r - \\tfrac{1}{2}\\sigma^2)^2}{2\\sigma^2}.$$With these the equation becomes exactly the diffusion we solved on the infinite rod,\n$$\\frac{\\partial u}{\\partial \\tau} = \\frac{1}{2}\\sigma^2 \\frac{\\partial^2 u}{\\partial x^2}, \\qquad \\kappa = \\tfrac{1}{2}\\sigma^2.$$At $\\tau = 0$ the factor is $e^{ax}$ and $V(S,T)$ is the payoff, so the initial condition is $u(x, 0) = e^{-ax}\\max(e^{x} - K, 0)$.\nThis is the unbounded heat equation, so its solution is the initial profile convolved with the heat kernel of variance $2\\kappa\\tau = \\sigma^2\\tau$, the Gaussian we assembled frequency by frequency in the body, centered at $x$,\n$$u(x,\\tau) = \\int_{-\\infty}^{\\infty} e^{-ay}\\max(e^{y} - K, 0)\\,\\frac{1}{\\sqrt{2\\pi\\sigma^2\\tau}}\\,e^{-(y - x)^2/(2\\sigma^2\\tau)}\\, dy.$$The payoff vanishes below $y = \\ln K$, so the lower limit becomes $\\ln K$ and the integrand splits into two exponential-times-Gaussian pieces,\n$$u(x,\\tau) = \\int_{\\ln K}^{\\infty} \\big[\\,e^{(1-a)y} - K e^{-ay}\\,\\big]\\,G(y)\\,dy, \\qquad G(y) = \\frac{1}{\\sqrt{2\\pi\\sigma^2\\tau}}\\,e^{-(y - x)^2/(2\\sigma^2\\tau)}.$$Each piece has the form $\\int_{\\ln K}^{\\infty} e^{cy}\\, G(y)\\,dy$, which completing the square evaluates as\n$$\\int_{\\ln K}^{\\infty} e^{cy}\\, G(y)\\,dy = e^{\\,cx + \\frac{1}{2}c^2\\sigma^2\\tau}\\,N\\!\\left(\\frac{x - \\ln K + c\\sigma^2\\tau}{\\sigma\\sqrt\\tau}\\right),$$applied with $c = 1 - a$ and $c = -a$. 
The two arguments simplify, using $(1-a)\\sigma^2 = r + \\tfrac{1}{2}\\sigma^2$ and $-a\\sigma^2 = r - \\tfrac{1}{2}\\sigma^2$, to exactly\n$$d_{1,2} = \\frac{\\ln(S/K) + \\big(r \\pm \\tfrac{1}{2}\\sigma^2\\big)\\tau}{\\sigma\\sqrt\\tau}.$$Restoring $V = e^{ax + b\\tau}u$, the chosen $a, b$ make the exponential constants in front of each term simplify, since $b + \\tfrac{1}{2}(1-a)^2\\sigma^2 = 0$ and $b + \\tfrac{1}{2}a^2\\sigma^2 = -r$, leaving the two terms as $S\\,N(d_1)$ and $K e^{-r\\tau} N(d_2)$. With $\\tau = T - t$,\n$$V(S,t) = S\\,N(d_1) - K e^{-r(T-t)}\\,N(d_2).$$The Black-Scholes call price, recovered by reducing the equation to the heat equation and averaging the payoff against its kernel.\n[1] All the heat sits at a single point of zero width. Reading $u$ as a temperature would put finite heat into zero width, which is infinite at that point. As a density it is fine: an infinitely tall spike whose integral, the total heat, is still the finite $Q$.\n[2] The identity $\\int_0^\\infty e^{-a\\omega^2}\\cos(b\\omega)\\,d\\omega = \\tfrac{1}{2}\\sqrt{\\pi/a}\\,e^{-b^2/(4a)}$ can be derived by differentiation under the integral sign (Feynman\u0026rsquo;s trick): differentiating in $b$ and integrating by parts turns the integral into a first-order differential equation in $b$, which solves to the Gaussian above.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/fourier_transform/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eThe previous article on \u003ca href=\"/articles/quant-foundations/fdm/\"\u003efinite difference methods\u003c/a\u003e solved the heat equation by brute force: lay down a grid, step it through time, let the solution emerge step by step. It works, and for many pricing problems it is the practical choice. 
But can we instead solve the equation analytically?\u003c/p\u003e\n\u003cp\u003eFor a class of PDEs, the heat equation among them, we can. The idea is to stop viewing a function as a shape over space and instead see it as a combination of frequencies. That change of view is the Fourier transform. In this article, we explore it on the heat equation, where the diffusion structure shows through without the variable coefficients of Black-Scholes to clutter it, and then turn to its application in option pricing, where it works even when the distribution of prices has no closed form.\u003c/p\u003e","title":"Fourier Transform: The Leap into Frequency"},{"content":"Why This Matters A derivative price can be computed two equivalent ways: as a risk-neutral expectation, or as the solution of a PDE. This is the Feynman-Kac result, which I explored in the earlier article. Monte Carlo is the natural way to handle the expectation, and the previous article worked through techniques for making it more efficient. Here I want to look at the other side, where the price is the solution of a PDE and we solve it on a grid.\nWhat drew me to the PDE approach is how much falls out of that single grid solve. The price at any spot is just a point on the grid, so one solve covers a whole range of scenarios at once. The Greeks are numerical derivatives of the same grid, so they come almost for free. Early exercise, as in an American option, and knock-out barriers both slot in naturally, the first as a comparison at each grid point and the second as a boundary condition. Monte Carlo can do all of these too, but on the PDE side they feel less like add-ons and more like things the method was already set up to handle.\nThe workhorse for solving these PDEs numerically is the finite difference method, and that is what the rest of the article is about. Rather than start with the Black-Scholes PDE directly, I will build the method up on the heat equation from physics. 
The two share the same diffusion structure, but the heat equation has constant coefficients where Black-Scholes has variable ones, so it shows the numerical ideas without the algebraic clutter. Everything we develop carries over to Black-Scholes with those extra terms added back in.\nThe Heat Equation Picture a thin metal rod. Heat a spot on it, then leave it alone. The warm spot cools and spreads, and the rod drifts toward a uniform temperature. We want an equation for how the temperature $u(x, t)$ at position $x$ and time $t$ evolves.\nRather than reach for a known equation, let us ask what properties it must have. Heat conduction is local: a point exchanges heat only with its immediate neighbours, not with the rod as a whole. So the rate of change $\\partial_t u$ at a point should be driven by how that point compares to its neighbours on each side.\nConsider a small segment around a point. If the left neighbour is one degree cooler and the right one degree warmer, the point loses heat to the left at the same rate it gains from the right. The flows balance and the temperature holds steady, despite the clear slope through the point. So the slope $\\partial_x u$ is not what drives the change; a constant slope produces balanced flow and no change at all. What drives the change is an imbalance between the two sides, the slope being steeper on one side than the other. That imbalance is the curvature $\\partial_{xx} u$, and it gives the heat equation:\n$$ \\partial_t u = \\kappa\\, \\partial_{xx} u, \\qquad \\kappa \u003e 0, $$with $\\kappa$ a positive constant, set by the material, controlling how fast heat spreads. The sign of the curvature sets the direction: a point that rises above its neighbours has negative curvature in $x$ and cools, a dip below them has positive curvature and warms, a straight run holds still.\nThe equation governs how each interior point evolves, but it says nothing about the two ends of the rod, which have a neighbour on only one side. 
On its own it does not determine a unique solution; we also have to say what happens at the ends. The simplest choice is to hold each end at a fixed temperature: imagine the ends sitting in an ice bath, pinned at zero while the interior evolves. This is a boundary condition, and the equation needs one at each end before the problem is complete. It is why every curve in the figure below is tied down to zero at the two edges.\n[Figure: temperature against position along the rod (0 to 1), with profiles at t = 0.004, 0.01, 0.02, 0.04, 0.08 and 0.15.]\nThe heat starts concentrated at a single point in the middle of the rod. The figure shows the profile shortly after, spreading outward into a bump that widens and flattens over time, conserving total heat while smoothing it out. This diffusion is the behaviour we will be approximating numerically.\nDiscretising the Equation To solve the heat equation numerically we replace the continuous derivatives with finite differences on a grid. Space is divided into points $x_0, x_1, \\ldots, x_M$ spaced $\\Delta x$ apart, and time into levels $t_0, t_1, \\ldots, t_N$ spaced $\\Delta t$ apart. We write $u_i^n$ for the approximation to $u(x_i, t_n)$.\nFor the time derivative, the natural finite difference uses the values one step apart in time:\n$$ \\partial_t u \\approx \\frac{u_i^{n+1} - u_i^n}{\\Delta t}. $$This is the left side of the equation.\nThe right side is the spatial curvature $\\kappa\\, \\partial_{xx} u$, with the standard central-difference approximation:\n$$ \\partial_{xx} u(x_i) \\approx \\frac{u_{i-1} - 2 u_i + u_{i+1}}{(\\Delta x)^2}. $$This is the discrete form of the curvature: it measures how far $u_i$ sits above or below the average of its two neighbours, the same quantity that drove the heat equation.\nThe equation sets the time difference equal to this curvature, but it leaves one thing open: at which time level do we measure the curvature? At $t_n$, where we already know all the values, or at $t_{n+1}$, where we do not? 
Nothing so far decides it. We have to choose, and that single choice produces two schemes with very different character.\nThe Explicit Scheme The natural choice is to enforce the equation at $t_n$, the time level we already know. The spatial derivative is computed from known values, and the update reads:\n$$ \\frac{u_i^{n+1} - u_i^n}{\\Delta t} = \\kappa\\, \\frac{u_{i-1}^n - 2 u_i^n + u_{i+1}^n}{(\\Delta x)^2}. $$Solving for the one unknown gives a direct formula:\n$$ u_i^{n+1} = u_i^n + \\lambda\\,(u_{i-1}^n - 2 u_i^n + u_{i+1}^n), \\qquad \\lambda = \\frac{\\kappa\\, \\Delta t}{(\\Delta x)^2}. $$Everything on the right is known, so we read off the new value at each grid point directly. Rewriting slightly shows what the update is doing:\n$$ u_i^{n+1} = \\lambda\\, u_{i-1}^n + (1 - 2\\lambda)\\, u_i^n + \\lambda\\, u_{i+1}^n. $$The value at the next time step is a weighted combination of the current value at that point and its two neighbours. Each point is updated independently, which makes the scheme cheap and easy to implement.\n[Figure: explicit stencil. Axes: time t, space x; time levels tₙ₋₁, tₙ, tₙ₊₁; new value uᵢⁿ⁺¹.]\nThe stencil shows what the update reaches for: the new value at $t_{n+1}$ is built from the three known values in the previous time column at $t_n$.\nThis update applies only to the interior points. The two end points have a neighbour on just one side, so the three-point stencil would run off the grid. We do not compute them from the update at all: the boundary condition fixes them directly, setting the edge values to zero at every step. The first interior point still reaches for its outer neighbour, but that neighbour is now the known boundary value, so its update is well defined.\nWhen the Explicit Scheme Breaks The explicit scheme works well when the time step is small, but the simplicity hides a trap. Recall that the update writes the next value as a combination of three current values, with weights $\\lambda$, $1 - 2\\lambda$, $\\lambda$ that always sum to one. 
As long as all three are non-negative, which holds when $\\lambda \\leq 1/2$, the next value is a weighted average of the three, so it lands somewhere between the coolest and warmest of them. A hot point cools toward its neighbours, never dropping below them in a single step, which is how heat should behave. Starting from a single point at temperature 1 between two neighbours at 0, with $\\lambda = 1/3$, the heat simply spreads and fades, every value staying between 0 and 1.\n[Table: start with 0, 1, 0 and set λ = 1/3. Rows: position along the rod; columns: time steps 0 to 4.]\n0, 0, 0, 0, 0\n0, 0, 0.11, 0.11, 0.11\n0, 0.33, 0.22, 0.22, 0.20\n1, 0.33, 0.33, 0.26, 0.23\n0, 0.33, 0.22, 0.22, 0.20\n0, 0, 0.11, 0.11, 0.11\n0, 0, 0, 0, 0\nOnce $\\lambda \u003e 1/2$, the central weight turns negative and the update is no longer an average. Now a point can be pushed past its neighbours instead of settling between them. Let us look at the same starting point with $\\lambda = 1$, and watch what the update does step after step.\n[Table: start with 0, 1, 0 and set λ = 1. Rows: position along the rod; columns: time steps 0 to 4.]\n0, 0, 0, 0, 0\n0, 0, 1, -3, 9\n0, 1, -2, 6, -16\n1, -1, 3, -7, 19\n0, 1, -2, 6, -16\n0, 0, 1, -3, 9\n0, 0, 0, 0, 0\nFollow the centre row. The point starts at 1, and a single step sends it to -1, already colder than the neighbours it was meant to settle toward. The next step overshoots it back to 3, then to -7, then to 19. Each step flips the sign and grows the magnitude, in space and in time, producing the intensifying checkerboard above. Left to run, it swamps the real solution within a few steps.\nThe condition for the weights to stay non-negative is\n$$ \\lambda = \\frac{\\kappa\\, \\Delta t}{(\\Delta x)^2} \\leq \\frac{1}{2}, $$which caps the time step at $\\Delta t \\leq (\\Delta x)^2 / (2\\kappa)$. The cost of this is steep. The maximum step shrinks with the square of the spatial spacing, so using ten times as many spatial points forces the time step to shrink by a factor of a hundred. Ten times the spatial points and a hundred times the time steps is a thousandfold increase in total work, all to refine the spatial grid tenfold. 
To stay stable on a fine grid, the explicit scheme is forced into a very large number of small steps.\nThe Implicit Scheme Given the stability constraint of the explicit scheme, can we find an alternative that does not suffer from it? The explicit scheme derives the future value directly from the present: it reads the current curvature and steps forward, and when the step is too large, that forward step overshoots. Instead of deriving the future value, what if we constrain it, requiring the future value to satisfy the equation? We enforce the equation at $t_{n+1}$, evaluating the spatial derivative at the level we are solving for:\n$$ \\frac{u_i^{n+1} - u_i^n}{\\Delta t} = \\kappa\\, \\frac{u_{i-1}^{n+1} - 2 u_i^{n+1} + u_{i+1}^{n+1}}{(\\Delta x)^2}. $$Now the right-hand side contains three unknowns, so we cannot solve for $u_i^{n+1}$ on its own. Collecting the unknowns on one side:\n$$ -\\lambda\\, u_{i-1}^{n+1} + (1 + 2\\lambda)\\, u_i^{n+1} - \\lambda\\, u_{i+1}^{n+1} = u_i^n. $$\n[Figure: implicit stencil. Axes: time t, space x; time levels tₙ₋₁, tₙ, tₙ₊₁; new value uᵢⁿ⁺¹.]\nThe stencil shows why: the new value connects to its spatial neighbours in the same time column at $t_{n+1}$, which are themselves unknown, rather than only to known values at $t_n$.\nThere is one such equation at every interior grid point, giving a linear system in all the unknowns at $t_{n+1}$. In matrix form, with $\\mathbf{u}^n$ the vector of grid values,\n$$ A\\, \\mathbf{u}^{n+1} = \\mathbf{u}^n, $$where $A$ is tridiagonal: $1 + 2\\lambda$ on the diagonal, $-\\lambda$ on the two off-diagonals. Each step requires solving this system rather than reading off an answer. We solve it directly with a standard tridiagonal algorithm rather than forming the inverse $A^{-1}$: the inverse is dense, and building it would discard the structure that makes the problem cheap. 
A general dense system of $N$ unknowns costs on the order of $N^3$ operations to solve, but the tridiagonal system, being almost all zeros, can be solved in work proportional to $N$, nearly as cheap as a single sweep over the grid.\nThe payoff is that the implicit scheme is stable for any time step. There is no $\\lambda \\leq 1/2$ constraint, so the step is no longer limited by stability. Each step costs more than an explicit step, but the freedom to take large steps wins comfortably on the fine grids where the explicit scheme would be forced into a crawl.\nWhy the Implicit Scheme Cannot Blow Up The explicit update cools a hot point by an amount that grows with the step size. When the step is large it removes more than the whole gap to the neighbours, so the point overshoots past them. Subtraction can remove too much.\nThe implicit update does not work by subtraction. Isolating the centre term in its equation,\n$$ (1 + 2\\lambda)\\, u_i^{n+1} = u_i^n + \\lambda\\left(u_{i-1}^{n+1} + u_{i+1}^{n+1}\\right), $$the new value carries a factor of $1 + 2\\lambda$, so finding it means dividing by that factor. The denominator is always greater than one, no matter how large $\\lambda$ grows, so the division can only shrink a value toward zero, never flip its sign or amplify it. A sharp feature is divided down harder as the step grows, which is what diffusion should do to it. No step size turns the division into an overshoot.\nThe same fact has a physical reading, and it comes down to which state the equation is enforced on. The implicit scheme enforces the equation on the new state, so the new values are required to satisfy it. This acts as a check: if the solver tried to overshoot a hot point into a dip below its neighbours, that dip would have positive curvature, and the equation says a point with positive curvature warms rather than sits cold. The overshoot would violate the equation the new state must obey, so it cannot be the solution. 
The explicit scheme enforces the equation only on the old, known state and computes the new value from it directly, with nothing constraining the new value itself. That is the difference: implicit holds the new state to the equation, explicit does not.\nReflection: Marching Forward or Solving Together Looking past the formulas, the two schemes hold different views of how the system evolves.\nExplicit: the known state generates the next one. For the heat equation, where we solve forward from the present, this is literally the present generating the future. Implicit: the step is a constraint. The new state is the one that satisfies the equation at every point at once, reached by solving rather than generating. Both converge to the same continuous solution as the grid is refined, since the gap between $t_n$ and $t_{n+1}$ vanishes and the choice of where to enforce the equation stops mattering. What differs is the stance: marching forward, or solving a constraint. That stance is what made one scheme able to overshoot and the other not.\nCrank-Nicolson So far we have only considered stability. The other concern is accuracy: how close the answer is to the true solution. A Taylor expansion sizes the error. For the time difference, expanding $u^{n+1}$ about $t_n$,\n$$ \\frac{u^{n+1} - u^n}{\\Delta t} = \\partial_t u + \\frac{\\Delta t}{2}\\, \\partial_{tt} u + \\cdots, $$so the error is proportional to $\\Delta t$; the same expansion on the spatial neighbours gives a central difference with error proportional to $(\\Delta x)^2$. Both vanish as the grid is refined. The time error is only first-order, and once we take the large steps the implicit scheme allows, it dominates: driving it down forces us back into many small steps, the very thing going implicit was meant to avoid.\nSo how can we make the scheme converge faster in time? Let us borrow the idea behind computing a delta numerically. 
To estimate the slope of an option value in the underlying, a one-sided difference $\\frac{f(x+h) - f(x)}{h}$ is first-order, but a centred difference $\\frac{f(x+h) - f(x-h)}{2h}$ is second-order, which is why the centred form is the usual choice: its errors on the two sides cancel. The lesson is that a central estimate beats a one-sided one. Explicit and implicit are both one-sided in time, each evaluating the spatial term at one end of the step, which is what makes their time error first-order.\nCrank-Nicolson applies the same fix in time: it evaluates the spatial term at both ends of the step and averages them, centring it at the midpoint.\n$$ \\frac{u_i^{n+1} - u_i^n}{\\Delta t} = \\frac{1}{2}\\Big( \\underbrace{\\kappa\\, \\tfrac{\\delta^2 u_i^n}{(\\Delta x)^2}}_{\\text{explicit}} + \\underbrace{\\kappa\\, \\tfrac{\\delta^2 u_i^{n+1}}{(\\Delta x)^2}}_{\\text{implicit}} \\Big), \\qquad \\delta^2 u_i = u_{i-1} - 2u_i + u_{i+1}. $$\n[Figure: Crank-Nicolson stencil. Axes: time t, space x; time levels tₙ₋₁, tₙ, tₙ₊₁; new value uᵢⁿ⁺¹.]\nThe stencil reaches into both time levels: the three points at $t_n$ that the explicit scheme uses, and the coupled points at $t_{n+1}$ that the implicit scheme uses. Centring this way makes the time error second-order, so a given accuracy needs far fewer steps. The unknowns at $t_{n+1}$ are still coupled, so each step is a tridiagonal solve, and the scheme keeps the implicit scheme’s unconditional stability.\nFrom the Heat Equation to Black-Scholes The same three schemes carry over to the Black-Scholes PDE,\n$$ \\partial_t V + \\tfrac{1}{2}\\sigma^2 S^2\\, \\partial_{SS} V + r S\\, \\partial_S V - r V = 0. $$It has the same diffusion structure as the heat equation, but not in pure form. 
The drift term $rS\\,\\partial_S V$ and the state-dependent diffusion $\\tfrac{1}{2}\\sigma^2 S^2$ break the symmetry of the heat equation’s clean diffusion: the coefficient $\\tfrac{1}{2}\\sigma^2 S_i^2$ now varies from point to point rather than being constant, and the first-derivative drift adds a directional push that the heat equation does not have. Neither changes how the schemes are built; both just enter the grid as additional terms.\nAs before, the choice of scheme is a choice of which level to evaluate the PDE terms at. Explicit reads the second derivative, the first derivative, and the discount term $rV$ all at the known level, so each unknown value is computed directly. Implicit takes all three at the unknown level, which couples the unknowns into a tridiagonal system. The coefficient $S_i$ is the same either way; only the level of the $V$ values changes. Explicit, implicit, and Crank-Nicolson all carry over, with only the matrix entries differing.\nOne thing does change when we move from the heat equation to Black-Scholes. Pricing is a terminal-value problem: we know the payoff at expiry and solve backward toward today, so the computation marches backward in time. This sharpens what the explicit scheme is really doing. It does not march forward in time; it marches from the known level to the unknown one. For the heat equation the known level is the present, so the two look the same. For Black-Scholes the known level is at expiry, so the march runs backward. Time direction is incidental; known to unknown is the point.\nTwo examples show what the PDE approach buys us, each handled naturally on the grid: the early-exercise decision of an American option, and the knock-out level of a barrier option.\nEarly Exercise: American Options An American option can be exercised at any time before expiry, which adds a decision at every point: hold the option, or exercise it now. 
The PDE handles this naturally because it already solves backward in time.\nAt each time level, the backward solve produces the continuation value, what the option is worth if held, computed from the same finite difference machinery as the European case. Write it $V_i^{\\text{cont}}$. Against this we compare the immediate exercise value, the payoff from exercising now: for an American put with strike $K$, that is $\\max(K - S_i, 0)$. The option value at the node is whichever is larger:\n$$ V_i^n = \\max\\left(\\,V_i^{\\text{cont}},\\ \\max(K - S_i, 0)\\,\\right). $$So $V_i^{\\text{cont}}$, the continuation value, is what the PDE step alone gives: the same update a European option would use, applied to the American values already computed at the later level. The American value $V_i^n$ is that with the early-exercise choice layered on top. The distinction matters for the next step: it is $V_i^n$, the post-maximum value, that is carried backward into the following solve, so the right to exercise early at a later node feeds into the value at earlier ones.\nThis comparison is applied pointwise at every grid point and time level. The grid already holds the value at every spot and time, so the question “is it worth exercising here and now” can be asked everywhere at once. The boundary between the hold region and the exercise region falls out of the solve, with no special treatment beyond the pointwise maximum.\nPath Dependence: Barrier Options A down-and-out call is worthless if the underlying ever falls to the barrier $B$ before expiry. Take the barrier to be continuously monitored, live at all times. On the grid this is a boundary condition: the value is held at zero at every point at or below the barrier, at every time level.\n$$ V(S, t) = 0 \\quad \\text{for } S \\leq B. $$This is the ice bath again. Just as the rod’s ends were held at zero and the chill spread inward, the value is held at zero along the barrier and the backward solve carries that zero into the living region above it. 
A spot just above the barrier has a neighbour pinned at zero, which drags its value down, and that influence spreads further up the grid at each step. By the time the solve reaches today, the value at a spot well above the barrier has already been reduced by the chance of drifting down and knocking out, without any path ever being traced.\nThis is where the grid picture and the path picture part ways most sharply. In Monte Carlo the barrier is an event in time: a trajectory crosses $B$ and the option dies. The PDE has no trajectories, and it cannot, since the equation relates each point only to its neighbours, leaving nowhere for a path’s past to live. Instead the barrier becomes geometry: the edge of the region the equation is solved on. The question “did the path cross the barrier” is replaced by “where is the edge of the domain,” and the backward solve prices the knockout by letting that zero-edge bleed inward.\nThis also avoids the monitoring bias that troubles Monte Carlo, where the barrier is checked only at discrete sample times and a path can dip below and back between checks, the same bias the Monte Carlo article had to correct for on this very option. The grid has no unobserved trajectory between steps; the barrier condition applies across the whole domain at once. The PDE still carries its own time-discretisation error, but not this bias. One practical point: the barrier should land on a grid line. If $B$ falls between two spatial points, the effective barrier is misplaced by up to a grid spacing, so the $S$ grid is usually aligned to put a node exactly on $B$.\nThis clean treatment is for a continuously-monitored barrier. 
A discretely-monitored barrier, checked only at set dates, is handled by imposing the zero condition only on those dates and letting the region below $B$ stay alive in between, which is closer to what Monte Carlo does and loses some of the tidiness.\nConclusion Explicit and implicit schemes solve the same equation in two ways: marching forward off the values we know, or solving for a level jointly consistent with them. The first is simple but carries a stability limit; the second costs a linear solve per step and removes the stability constraint. Crank-Nicolson sits between them and gains an order of accuracy. It all traces back to one choice, at which time level the equation is enforced, the same choice that reappears, flipped in direction, when we move from the forward-solved heat equation to the backward-solved Black-Scholes PDE. And what makes the approach worth the trouble is everything that comes after the price: Greeks fall out of the grid as finite differences, a single solve values the option across every spot, early exercise reduces to a pointwise maximum, and level-based barriers to a boundary condition. The grid holds the whole solution surface, and most of the questions worth asking can be read directly off it.\nOne last thread, left for you to pull. The explicit scheme is stable only when $\\Delta t$ scales with $(\\Delta x)^2$. If that relationship looks familiar, it is the same scaling that turns a random walk into Brownian motion, the one explored in the Brownian motion article: step variance grows like $\\Delta t$, step size like $\\Delta x$, and the two must balance as $(\\Delta x)^2 \\sim \\Delta t$ for the walk to converge to a diffusion. That is probably not a coincidence. By Feynman-Kac, the heat equation is the deterministic face of a Brownian motion, so a scheme for it is, in some sense, simulating that process. Write the stable update with its weights summing to one, and read them as the chances of stepping up, staying, and stepping down. 
Then ask what the stability condition is really demanding of those weights, and whether a scheme that violates it is still describing a random walk at all. I think the answer is worth finding yourself.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/fdm/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eA derivative price can be computed two equivalent ways: as a risk-neutral expectation, or as the solution of a PDE. This is the \u003ca href=\"/articles/quant-foundations/feynman_kac/\"\u003eFeynman-Kac result\u003c/a\u003e, which I explored in the earlier article. Monte Carlo is the natural way to handle the expectation, and the \u003ca href=\"/articles/quant-foundations/mc_variance_reduction/\"\u003eprevious article\u003c/a\u003e worked through techniques for making it more efficient. Here I want to look at the other side, where the price is the solution of a PDE and we solve it on a grid.\u003c/p\u003e","title":"Finite Difference Methods: Marching Forward or Solving Together"},{"content":"Why This Matters In the article on the Feynman-Kac theorem, we saw that the price of a derivative can be expressed equivalently as the solution to a deterministic PDE or as the expectation of a discounted payoff under the risk-neutral measure. This gives us two complementary numerical approaches to pricing. For low-dimensional problems with smooth payoffs, finite difference methods on the PDE side are efficient and accurate. For high-dimensional problems, path-dependent payoffs, or models where the PDE is hard to derive, Monte Carlo (MC) on the expectation side becomes the natural choice.\nAnyone who has implemented MC in a production pricer notices a challenge immediately: it converges slowly. The standard error of the estimator scales in proportion to $1/\\sqrt{N}$, so gaining one extra digit of precision requires a hundredfold increase in computational cost. 
For a risk system that needs to revalue thousands of trades overnight, or for an intraday hedging system that revalues frequently as the market moves, the computation time adds up quickly.\nIn this article we walk through five common variance reduction techniques used in practice. Before getting to them, we take a quick look at where the variance comes from and how it shapes the way we approach the problem.\nWhere the Variance Comes From Consider the MC estimator for a European-style payoff:\n$$\\hat{V}_N = \\frac{e^{-rT}}{N}\\sum_{i=1}^N g(S_T^{(i)})$$where $g$ is the payoff function and $S_T^{(i)}$ are independent samples of the terminal price under the risk-neutral measure. The variance of this estimator is:\n$$\\text{Var}(\\hat{V}_N) = \\frac{e^{-2rT}\\,\\text{Var}(g(S_T))}{N}$$The expression points to two distinct levers we can pull. We can reshape what we average by replacing $g(S_T)$ with a related quantity that has the same expectation but lower variance. Or we can change how we sample by drawing the paths in a smarter way, without touching the payoff itself.\nEach of the five techniques in this article fits one of these two patterns:\nReshape what we average: control variates, conditional MC, importance sampling[1].\nChange how we sample: antithetic variates, quasi-MC.\nThe techniques look different on the surface and involve different machinery, but each is just a different way of applying one of these two ideas.\nThe Running Example: A Continuously-Monitored Down-and-Out Call We use the same example throughout the article so the variance reduction from each technique can be compared on a like-for-like basis. The example is a continuously-monitored down-and-out call on a single underlying. The payoff is:\n$$g(S) = \\max(S_T - K, 0) \\cdot \\mathbb{1}\\{\\min_{0 \\leq t \\leq T} S_t \u003e B\\}$$The option pays the standard call payoff at expiry if and only if the underlying stays strictly above the barrier $B$ at every instant of the path. 
In practice, barriers can be monitored continuously or at discrete observation dates such as daily or weekly fixings, and the choice is a contract specification. Most of the techniques in this article apply to both; conditional MC is the exception and is best suited to continuous monitoring, as discussed in its section.\nThe parameters for the example:\nSpot $S_0$: 100\nStrike $K$: 100\nBarrier $B$: 85\nVolatility $\\sigma$: 30%\nRisk-free rate $r$: 5%\nMaturity $T$: 1 year\nTime grid $M$: 252 steps\nNumber of paths $N$: 100,000\nThe barrier sits 15% below spot, close enough that a meaningful fraction of paths knock out but far enough that a meaningful fraction survive. This is the regime where the option payoff has meaningful variance and where variance reduction techniques can make a real difference.\nTwo caveats on the example. In practice, barrier options are typically priced under local vol or stochastic vol models rather than constant-vol GBM; the simplification here is to keep the focus on variance reduction. 
Under this simplified GBM the barrier option also has a closed-form price (around 11.87 for our parameters), which lets us check our Monte Carlo estimates against the exact value.\nNaive Monte Carlo Baseline The baseline simulates $N$ paths of the underlying under geometric Brownian motion on a discrete grid of $M$ steps, checks the barrier at each grid point, and averages the surviving payoffs.\nimport numpy as np\n\ndef simulate_paths(S0, sigma, r, T, M, N):\n    dt = T / M\n    Z = np.random.standard_normal((N, M))\n    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z\n    log_paths = np.log(S0) + np.cumsum(increments, axis=1)\n    return np.exp(log_paths)  # shape (N, M)\n\ndef naive_mc(S0, K, B, sigma, r, T, M, N):\n    paths = simulate_paths(S0, sigma, r, T, M, N)\n    survived = np.min(paths, axis=1) \u003e B\n    payoffs = np.maximum(paths[:, -1] - K, 0) * survived\n    price = np.exp(-r * T) * np.mean(payoffs)\n    se = np.exp(-r * T) * np.std(payoffs) / np.sqrt(N)\n    return price, se\nNote that the naive simulation only checks the barrier at the $M$ grid points, but the contract specifies continuous monitoring. The estimator therefore carries a small upward bias from missing between-grid breaches: on our parameters the naive estimator converges to roughly 12.19 rather than the true value of 11.87, an overprice of about 0.32. This article focuses on variance reduction so we set this bias aside, but we will see that one of our techniques removes the bias directly.\nTechnique 1: Antithetic Variates Where does the variance in the MC estimator come from? Each path is an independent realisation of the underlying stochastic process, and thus produces a random payoff. The estimator averages these independent draws, and the variance comes from the dispersion of the payoff across paths. 
If we could pair each path with a second one whose payoff tends to land on the opposite side of the mean, the dispersion within each pair would partially cancel and the estimator would be tighter.\nThe simplest way to construct such a pair is to negate the random draws. If a path is generated from $Z_1, \\ldots, Z_M$, we generate a partner from $-Z_1, \\ldots, -Z_M$. Both draws are valid samples from the standard normal, so each path on its own is a legitimate MC path. The two paths are mirror images of each other: when the original drifts up, the partner drifts down.\nDoes this actually reduce variance? Let $g(S^+)$ and $g(S^-)$ be the payoffs on the original and negated paths. For each pair we compute the average:\n$$\\bar{g}^{(i)} = \\frac{g(S^+) + g(S^-)}{2}$$The variance of a single pair average is:\n$$\\text{Var}(\\bar{g}) = \\frac{1}{4}\\left[\\text{Var}(g(S^+)) + \\text{Var}(g(S^-)) + 2\\,\\text{Cov}(g(S^+), g(S^-))\\right]$$Since $Z$ and $-Z$ have the same distribution, the paths $S^+$ and $S^-$ are both valid samples of the same GBM dynamics, so $\\text{Var}(g(S^+)) = \\text{Var}(g(S^-)) = \\text{Var}(g(S))$. The expression simplifies to:\n$$\\text{Var}(\\bar{g}) = \\frac{1}{2}\\,\\text{Var}(g(S)) + \\frac{1}{2}\\,\\text{Cov}(g(S^+), g(S^-))$$Compare this against two independent paths, whose average has variance $\\frac{1}{2}\\,\\text{Var}(g(S))$. The antithetic average beats the independent average whenever the covariance is negative, i.e., when $g(S^+)$ and $g(S^-)$ move in opposite directions. This is exactly what happens when $g$ is monotonic in the underlying: a path that ends up high gives a large payoff and its negated partner gives a small one. Monotonically decreasing payoffs work by the same argument.\nIf $g$ is non-monotonic the argument breaks down. The covariance can be positive, and the technique can actually increase variance compared to independent sampling. 
This is the main caveat: antithetic variates is safe on calls and puts, useful on barrier options where the payoff is mostly monotonic away from the barrier, and potentially harmful on non-monotonic payoffs such as straddles or butterflies.\nImplementation.\ndef antithetic_mc(S0, K, B, sigma, r, T, M, N):\n    dt = T / M\n    Z = np.random.standard_normal((N // 2, M))\n    Z_full = np.concatenate([Z, -Z], axis=0)\n    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z_full\n    paths = np.exp(np.log(S0) + np.cumsum(increments, axis=1))\n    survived = np.min(paths, axis=1) \u003e B\n    payoffs = np.maximum(paths[:, -1] - K, 0) * survived\n    # pair averages\n    pair_avg = (payoffs[:N//2] + payoffs[N//2:]) / 2\n    price = np.exp(-r * T) * np.mean(pair_avg)\n    se = np.exp(-r * T) * np.std(pair_avg) / np.sqrt(N // 2)\n    return price, se\nResult. On our example antithetic variates gives a per-path variance reduction of around 1.5x. There is also a wall-clock benefit: because we only need to draw $N/2$ independent normal vectors and negate them to produce $N$ paths, the random number generation cost is roughly halved. The wall-clock efficiency (tighter estimate in less time) comes out around 1.8x. The technique is essentially free to implement and combines well with the others, which makes it a sensible default to layer on top of any other variance reduction approach.\nTechnique 2: Control Variates\nThe down-and-out call shares most of its structure with a vanilla European call. They have the same strike, the same maturity, the same underlying. The only difference is the knock-out feature, which kills some paths that would otherwise have paid off. On every path where the barrier is not breached, the two options pay exactly the same amount.\nThat suggests a natural question: since we know the vanilla call price analytically from Black-Scholes, can we use it to help price the barrier option? 
The simplest thing to try is a linear approximation:\n$$g_{barrier}(S) = \\alpha + \\beta \\, g_{vanilla}(S) + \\varepsilon(S)$$where $\\alpha$ and $\\beta$ are constants we want to find and $\\varepsilon$ is the residual error on each path.\nThis is exactly the setup of a linear regression of $g_{barrier}$ on $g_{vanilla}$, with $\\alpha$ as the intercept and $\\beta$ as the slope. The optimal $\\beta$ is the standard regression coefficient:\n$$\\beta = \\frac{\\text{Cov}(g_{barrier}, g_{vanilla})}{\\text{Var}(g_{vanilla})}$$It measures how much the barrier payoff moves, on average, per unit movement in the vanilla payoff.\nTaking expectations of the linear approximation under $\\mathbb{Q}$ (the regression residual has zero mean, $\\mathbb{E}[\\varepsilon] = 0$):\n$$\\mathbb{E}[g_{barrier}] = \\alpha + \\beta \\, \\mathbb{E}[g_{vanilla}]$$We know $\\mathbb{E}[g_{vanilla}]$ from Black-Scholes (it is $e^{rT} C_{BS}$, the undiscounted vanilla price). So if we can estimate $\\alpha$, we recover the barrier price directly.\nThe advantage of estimating $\\alpha$ rather than $\\mathbb{E}[g_{barrier}]$ comes from its variance. The intuition is that $g_{barrier}$ is noisy across paths, but a large portion of that noise is shared with $g_{vanilla}$ (the two payoffs move together on most paths). By subtracting the scaled vanilla payoff $\\beta\\,g_{vanilla}$, we cancel out the shared noise and are left with the residual that is specific to the barrier feature itself. The more correlated the two payoffs are, the more noise is cancelled. Working out the variance of $g_{barrier} - \\beta\\,g_{vanilla}$ at the optimal $\\beta$ gives:\n$$\\text{Var}(g_{barrier} - \\beta\\,g_{vanilla}) = \\text{Var}(g_{barrier})\\,(1 - \\rho^2)$$where $\\rho$ is the correlation between the barrier and vanilla payoffs. 
The closer $\\rho$ is to 1 in absolute value, the larger the variance reduction.\nWe estimate $\\alpha$ as the sample mean of $g_{barrier} - \\beta\\,g_{vanilla}$ over $N$ paths:\n$$\\hat{\\alpha} = \\frac{1}{N}\\sum_{i=1}^N \\left[ g_{barrier}(S^{(i)}) - \\beta\\,g_{vanilla}(S^{(i)}) \\right]$$and then recover the discounted barrier price by adding back $\\beta\\,\\mathbb{E}[g_{vanilla}] = \\beta\\,e^{rT}C_{BS}$ and discounting:\n$$\\hat{V}_{CV} = e^{-rT}\\left(\\hat{\\alpha} + \\beta\\,e^{rT}C_{BS}\\right)$$where $C_{BS}$ is the analytical Black-Scholes price. In practice $\\beta$ is estimated from the simulation itself by computing the sample covariance and variance.\nImplementation.\nfrom scipy.stats import norm\n\ndef bs_call(S0, K, sigma, r, T):\n    d1 = (np.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))\n    d2 = d1 - sigma*np.sqrt(T)\n    return S0*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)\n\ndef control_variate_mc(S0, K, B, sigma, r, T, M, N):\n    paths = simulate_paths(S0, sigma, r, T, M, N)\n    survived = np.min(paths, axis=1) \u003e B\n    barrier_payoffs = np.maximum(paths[:, -1] - K, 0) * survived\n    vanilla_payoffs = np.maximum(paths[:, -1] - K, 0)\n    # estimate beta (ddof=1 matches the default normalisation of np.cov)\n    beta = np.cov(barrier_payoffs, vanilla_payoffs)[0, 1] / np.var(vanilla_payoffs, ddof=1)\n    # known mean of vanilla payoff under Q\n    vanilla_mean = np.exp(r*T) * bs_call(S0, K, sigma, r, T)\n    adjusted = barrier_payoffs - beta * (vanilla_payoffs - vanilla_mean)\n    price = np.exp(-r * T) * np.mean(adjusted)\n    se = np.exp(-r * T) * np.std(adjusted) / np.sqrt(N)\n    return price, se\nResult. On our example the correlation between the barrier and vanilla payoffs is high, giving a variance reduction factor of around 10x. This is a substantial improvement over antithetic variates and reflects how much of the barrier option’s variance is “explained” by the vanilla call.\nWhen to reach for it. Control variates is most powerful when the correlation between the payoff and the control is high. 
For the barrier option this works well when the barrier is reasonably far from spot, so most paths survive and the two payoffs move together. When the barrier is very close to spot and most paths knock out, the correlation drops and the technique loses much of its power.\nTechnique 3: Conditional Monte Carlo\nThe variance of the naive per-path payoff $Y = \\max(S_T - K, 0) \\cdot \\mathbb{1}\\{\\text{survived}\\}$ comes from two distinct sources. Some of the variability comes from $S_T$ varying across paths: the call payoff lands at different places, and the survival probability is different at different endpoints. But there is also variation that exists even when $S_T$ is held fixed: two paths with the same terminal value can have very different outcomes if one touches the barrier and the other does not. If we could remove this second source of variation while keeping the first, the estimator would tighten.\nA scatter of $Y$ against $S_T$ makes the two sources visible:\n[Figure: scatter of payoff $Y$ against terminal value $S_T$, with the barrier $B = 85$ and strike $K = 100$ marked. Legend: surviving paths ($Y \u003e 0$), knocked-out paths ($Y = 0$), and the conditional mean $\\mathbb{E}[Y \\mid S_T]$ (orange curve); an annotation marks the within-$S_T$ spread.]\nFor each value of $S_T$ above the strike, paths fall into two groups: surviving paths sit on the diagonal $Y = S_T - K$, and knocked-out paths sit at $Y = 0$. The vertical spread at any given $S_T$ is the within-$S_T$ variance: variation in $Y$ that has nothing to do with where $S_T$ landed. The orange curve is the conditional mean $\\mathbb{E}[Y \\mid S_T]$, which passes between the two groups in proportion to the survival probability. The slope and position of the orange curve as $S_T$ varies is the between-$S_T$ variance.\nThe law of total variance is the formal version of this split. 
For any random payoff $Y$ and any random variable $X$,\n$$\\text{Var}(Y) = \\mathbb{E}[\\text{Var}(Y \\mid X)] + \\text{Var}(\\mathbb{E}[Y \\mid X]).$$The first term is the average within-$X$ variance (the vertical spread at each $S_T$ in the picture). The second term is the variance of the conditional mean (how much the orange curve moves as $S_T$ varies). Together they sum to the full variance.\nIf we can compute $\\mathbb{E}[Y \\mid X]$ in closed form, we can sample $X$ and average $\\mathbb{E}[Y \\mid X]$ instead of $Y$ itself. The estimator remains unbiased: $\\mathbb{E}[Y] = \\mathbb{E}[\\mathbb{E}[Y \\mid X]]$, so averaging the conditional mean gives the same answer as averaging $Y$. Its per-path variance drops from $\\text{Var}(Y)$ to just $\\text{Var}(\\mathbb{E}[Y \\mid X])$, the second term. The reduction equals the first term: exactly the within-$X$ spread we identified in the picture. This is conditional Monte Carlo.\nFor our barrier option a natural choice is $X = S_T$. The call payoff depends on $S_T$ directly and the survival indicator is correlated with $S_T$, so $S_T$ absorbs most of the variation in $Y$. The conditional expectation $\\mathbb{E}[Y \\mid S_T]$ is the call payoff times the probability that the path stayed above the barrier given the endpoints. For continuously-monitored barriers on constant-coefficient GBM the Brownian bridge gives that survival probability in closed form. Applying the general-case bridge survival formula to the log-price $\\ln S_t$, for $S_T \u003e B$ (the survival probability is zero when $S_T \\leq B$):\n$$P(\\min_{0 \\leq t \\leq T} S_t \u003e B \\mid S_0, S_T) = 1 - \\exp\\left(-\\frac{2 \\ln(S_0/B)\\,\\ln(S_T/B)}{\\sigma^2 T}\\right)$$The bridge formula integrates over all possible continuous paths connecting $S_0$ and $S_T$, returning the survival probability without us ever needing to simulate intermediate points.\nComputationally this changes the simulation in a more fundamental way than the other techniques. 
The other four techniques all simulate full paths of $M$ steps and modify how the resulting payoffs are combined into the estimator. Conditional MC via the one-step bridge skips the intermediate simulation entirely: we sample only $S_T$, apply the bridge formula once, and combine with the call payoff. The simulation dimension collapses from $M = 252$ to $M = 1$. This is a side effect of the conditional expectation being available in closed form, not the source of the variance reduction itself.\nThe estimator is:\n$$\\hat{V}_{CMC} = \\frac{e^{-rT}}{N}\\sum_{i=1}^N \\max(S_T^{(i)} - K, 0) \\cdot P(\\text{survival} \\mid S_0, S_T^{(i)})$$where $S_T^{(i)}$ is drawn from the lognormal terminal distribution and the survival probability is computed in closed form.\nImplementation.\ndef conditional_mc(S0, K, B, sigma, r, T, N):\n    Z = np.random.standard_normal(N)\n    ST = S0 * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z)\n    above = ST \u003e B\n    log_S0_B = np.log(S0 / B)\n    log_ST_B = np.log(np.maximum(ST / B, 1e-12))\n    survival_prob = np.where(above, 1 - np.exp(-2 * log_S0_B * log_ST_B / (sigma**2 * T)), 0.0)\n    payoffs = np.maximum(ST - K, 0) * survival_prob\n    price = np.exp(-r * T) * np.mean(payoffs)\n    se = np.exp(-r * T) * np.std(payoffs, ddof=1) / np.sqrt(N)\n    return price, se\nNotice the implementation does not take $M$ as an argument. There are no intermediate steps to simulate.\nResult. The per-path variance reduction is modest, around 1.2x. This ties back to control variates: the vanilla call (a function of $S_T$ alone) already explained roughly 9/10 of the barrier payoff’s variance via linear projection. The full conditional expectation $\\mathbb{E}[Y \\mid S_T]$ that conditional MC averages is itself nearly linear in the vanilla payoff over the in-the-money region, so it captures almost the same variance the linear projection did. Most of $\\text{Var}(Y)$ was already attributable to $S_T$, leaving only a small residual for conditional MC to remove. 
But because each path now costs roughly $1/M$ of a naive MC path, the wall-clock efficiency improvement is closer to 250x. The right comparison metric for this technique is standard error per unit of computational time, not per path.\nConditional MC also removes the discretisation bias of the naive estimator. The naive simulation monitors the barrier only on the $M$ grid points and misses between-grid breaches, biasing the price upward. The naive, antithetic, control variate, importance sampling, and quasi-MC estimators all converge near 12.19, while conditional MC (which handles continuous monitoring analytically through the bridge) converges to the true value of 11.87.\nWhen to reach for it. Conditional MC via the one-step bridge is most powerful when the conditional expectation has a clean closed form for the full interval. Continuously-monitored barrier options under constant-coefficient GBM are the canonical case. For more complex models (local vol, stochastic vol, time-dependent barriers) the one-step shortcut is not available, and one falls back to a multi-step version where the bridge is evaluated segment by segment. The multi-step version still removes bias and provides modest variance reduction, but the dramatic computational speedup is lost. Outside barriers, the same principle applies to any payoff where the path can be integrated out analytically conditional on a low-dimensional summary.\nFor discretely-monitored barriers, the one-step shortcut also does not apply: there is no discretisation bias to correct (the simulation can check the barrier on the actual fixing dates), the conditional expectation given the terminal value has no closed form, and the dimensional collapse is therefore gone.\nTechnique 4: Importance Sampling In the naive simulation, more than half of paths contribute zero to the payoff: some knock out at the barrier, others survive but finish out-of-the-money. 
The variance of the estimator comes from the productive paths, those that both survive and end in-the-money, where the payoff varies widely from path to path. We would like to draw more paths from this productive region. But the price we want is an expectation under the risk-neutral measure $\\mathbb{Q}$, so any shift in the sampling distribution must be accompanied by a correction that recovers $\\mathbb{Q}$-expectations.\nThis is exactly the kind of reweighting developed in the Girsanov article. There we showed that drift lives in the probability weights: shifting the drift of a process is the same as reweighting paths by the Radon-Nikodym derivative. We use the same machinery here but for a different purpose. In the pricing context Girsanov takes us from the real-world measure $\\mathbb{P}$ to the risk-neutral measure $\\mathbb{Q}$. In Monte Carlo we already start under $\\mathbb{Q}$; we now reweight from $\\mathbb{Q}$ to some shifted measure $\\tilde{\\mathbb{Q}}$ that makes productive paths more likely, then divide back by the Radon-Nikodym derivative to recover the original $\\mathbb{Q}$-expectation.\nUnder $\\mathbb{Q}$, the underlying follows $dS_t = rS_t\\,dt + \\sigma S_t\\,dW_t$. Under the shifted measure $\\tilde{\\mathbb{Q}}$, it follows $dS_t = (r + \\eta)S_t\\,dt + \\sigma S_t\\,d\\tilde{W}_t$, where $\\eta$ is the drift shift we choose. The Radon-Nikodym derivative is:\n$$\\frac{d\\mathbb{Q}}{d\\tilde{\\mathbb{Q}}} = \\exp\\left(-\\frac{\\eta}{\\sigma}\\tilde{W}_T - \\frac{1}{2}\\frac{\\eta^2}{\\sigma^2}T\\right)$$The importance sampling estimator is:\n$$\\hat{V}_{IS} = \\frac{e^{-rT}}{N}\\sum_{i=1}^N g(S^{(i)}) \\cdot \\frac{d\\mathbb{Q}}{d\\tilde{\\mathbb{Q}}}(S^{(i)})$$where the paths $S^{(i)}$ are now simulated under $\\tilde{\\mathbb{Q}}$.\nChoosing $\\eta$. The choice involves a tradeoff between two effects. A larger shift pushes more paths into the productive region (surviving and in-the-money), which lowers variance. 
But a larger shift also increases the variance of the likelihood ratio itself. Writing $L$ for the likelihood ratio $d\\mathbb{Q}/d\\tilde{\\mathbb{Q}}$, we have $\\log L = -(\\eta/\\sigma)\\tilde W_T - \\tfrac{1}{2}(\\eta/\\sigma)^2 T$, which has variance $\\eta^2 T / \\sigma^2$. Since $L$ is the exponential of this, the variance of $L$ grows even faster, and the importance-sampled estimator inherits that growth. The optimal $\\eta$ sits where the two effects balance; pushing past it increases estimator variance and the technique can do worse than naive MC. A good rule of thumb for an out-of-the-money option is to shift the drift so that the new expected terminal price lands near the strike. For a barrier option with the barrier below spot, we shift the drift upward to keep paths away from the barrier and to push the terminal distribution toward the in-the-money region.\nImplementation.\ndef importance_sampling_mc(S0, K, B, sigma, r, T, M, N, eta):\n    dt = T / M\n    Z = np.random.standard_normal((N, M))\n    # simulate under shifted measure with drift r + eta\n    increments = (r + eta - 0.5*sigma**2)*dt + sigma*np.sqrt(dt)*Z\n    paths = np.exp(np.log(S0) + np.cumsum(increments, axis=1))\n    # likelihood ratio: depends on terminal Brownian motion under shifted measure\n    W_T_tilde = np.sum(sigma*np.sqrt(dt)*Z, axis=1) / sigma\n    likelihood = np.exp(-(eta/sigma)*W_T_tilde - 0.5*(eta/sigma)**2*T)\n    survived = np.min(paths, axis=1) \u003e B\n    payoffs = np.maximum(paths[:, -1] - K, 0) * survived * likelihood\n    price = np.exp(-r * T) * np.mean(payoffs)\n    se = np.exp(-r * T) * np.std(payoffs) / np.sqrt(N)\n    return price, se\nFor our example, $\\eta$ around 0.2 gives a variance reduction factor of about 4.3x. The optimal shift could be found by a pilot simulation, but the result is moderately robust within a sensible range.\nWhen to reach for it. 
Importance sampling shines when many paths contribute zero to the payoff, so concentrating samples in the productive region tightens the estimator: deep out-of-the-money options, rare event probabilities, and barrier options similar to our example. It requires more care than the other techniques because the choice of new measure can make or break the variance reduction.\nTechnique 5: Quasi-Monte Carlo\nWhen we say “Monte Carlo,” we usually mean simulating random paths and averaging the payoffs. But sampling and averaging are two separate steps. Random sampling is one way to draw paths from the underlying distribution, but it is not the only way. Do the samples themselves need to be random?\nRandom samples have an inefficiency. $N$ independent uniform draws on $[0, 1]$ do not produce $N$ evenly-spaced points; they produce clumps where the algorithm happened to draw close together and gaps where it did not. The clumps waste effort by oversampling the same region; the gaps leave information on the table. This dispersion is what makes the error shrink only as $1/\\sqrt{N}$.\nUnder the hood, every random normal in our simulation comes from a transformation of uniform random numbers; the underlying randomness enters as uniforms in $[0, 1]^d$, where $d$ is the number of random inputs per path. Quasi-MC replaces these uniform draws with a deterministic sequence of points constructed to fill the unit cube evenly, then applies the inverse normal CDF to convert each one into a normal. The resulting normals concentrate where the probability density is high, in the same way random normals do, but without the clumps and gaps that come from independent random draws.\nThe deterministic sequences used in practice are called low-discrepancy sequences. Sobol is the standard choice in finance: roughly speaking, each new point in the sequence is placed where the existing points have left the largest gap, so the first $N$ points are well-distributed in the unit cube for any $N$. 
Because the points are deterministic, convergence is no longer governed by the spread of a random estimator but by how evenly the points cover the unit cube; for smooth integrands the error decays faster than $1/\\sqrt{N}$.\nThe intuition behind the faster rate is that random points waste evaluations. With $N$ random points, two might land near each other (the second contributing little new information) while a gap elsewhere is left unsampled (no information about the integrand $f$ in that region). The fluctuating spacing means each new point’s contribution to the average is itself noisy, and the noise averages out only as $1/\\sqrt{N}$. Deterministic low-discrepancy points avoid both problems: each new point lands where the existing points have left the largest gap, so every evaluation contributes maximally. The estimate gets sharper as $N$ grows in a way the random average cannot match.\nMechanically the estimator looks identical to the naive one. The only change is in how the underlying uniform numbers are generated. Everything downstream is unchanged.\nImplementation.\nfrom scipy.stats import norm, qmc\n\ndef quasi_mc(S0, K, B, sigma, r, T, M, N):\n    dt = T / M\n    sampler = qmc.Sobol(d=M, scramble=True)\n    # uniforms in (0,1)^M; N should be a power of 2 to keep Sobol's balance properties\n    U = sampler.random(N)\n    Z = norm.ppf(U)\n    increments = (r - 0.5*sigma**2)*dt + sigma*np.sqrt(dt)*Z\n    paths = np.exp(np.log(S0) + np.cumsum(increments, axis=1))\n    survived = np.min(paths, axis=1) \u003e B\n    payoffs = np.maximum(paths[:, -1] - K, 0) * survived\n    price = np.exp(-r * T) * np.mean(payoffs)\n    # nominal only: the i.i.d. standard error formula does not hold for Sobol points\n    se = np.exp(-r * T) * np.std(payoffs) / np.sqrt(N)\n    return price, se\nA note on the standard error. Pure Sobol is fully deterministic: same code, same $N$, same answer every time. There is no run-to-run variation, so the standard error formula does not apply. To recover an error estimate, we use scrambled Sobol: a randomising transformation applied to the Sobol sequence. 
Each scramble produces a different sequence, but all of them keep Sobol’s uniform coverage property. Running several independently-scrambled versions and looking at the spread of their estimates gives the same kind of measurement we get for naive MC, which is what lets us compare quasi-MC against the other techniques on a like-for-like basis. If you only need a point estimate, unscrambled Sobol is a reasonable alternative.\nResult. On our example, randomised Sobol gives a per-path variance reduction of around 8x. The wall-clock efficiency is lower, around 5x, because Sobol sequence generation is more expensive than the pseudorandom number generation used by the other techniques.\nThe mathematical machinery behind quasi-MC’s convergence rate, including discrepancy bounds and the role of effective dimension, is beyond the scope of this article.\nComparative Summary\nTechnique | Lever | Per-path variance reduction | Wall-clock efficiency | Key requirement\nAntithetic variates | Change how we sample | ~1.5x | ~1.8x | Monotonicity of payoff\nControl variates | Reshape what we average | ~10x | ~10x | Correlated quantity with known mean\nConditional MC (one-step bridge) | Reshape what we average | ~1.2x | ~250x | Analytical bridge for the full interval\nImportance sampling | Reshape what we average | ~4.3x at $\\eta = 0.2$ | ~4x | Good choice of shifted measure\nQuasi-MC (randomised Sobol) | Change how we sample | ~8x | ~5x | Smooth payoff structure\nThe numbers above are measured on the specific parameter set in this article. Wall-clock figures are measured on a single machine and depend on hardware and library versions; the per-path variance reduction is the more robust quantity to compare. The wall-clock efficiency column matters most for conditional MC, where the simulation dimension collapses from $M = 252$ to $M = 1$ and each path costs a tiny fraction of a naive MC path.\nMost of these techniques can be layered: antithetic with control variates, antithetic with quasi-MC, control variates with quasi-MC. 
The combined variance reduction is typically less than the product of the individual factors because the techniques exploit overlapping structure.\nConclusion\nOur running example was deliberately simple, with an analytical price for verification, but the techniques apply much more broadly. Wherever Monte Carlo is used to price a derivative, the same questions arise about variance, bias, and wall-clock efficiency.\nTwo structural observations from the article stood out to me. Control variates and conditional MC are two applications of the same idea: both split the payoff into a piece explained by another variable and a residual, and average the explained piece while dealing with the residual separately. Control variates uses a linear function of a correlated payoff; conditional MC uses the full conditional expectation given some variable. And quasi-MC, despite the name, is not really a Monte Carlo method: pure Sobol is fully deterministic.\nImportance sampling sits at the boundary between the two: it changes the sampling distribution, but the reason it reduces variance is that the reweighted payoff has lower variance than the original. We place it on the payoff side because that is what drives its effectiveness.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/mc_variance_reduction/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eIn the article on the \u003ca href=\"/articles/quant-foundations/feynman_kac/\"\u003eFeynman-Kac theorem\u003c/a\u003e, we saw that the price of a derivative can be expressed equivalently as the solution to a deterministic PDE or as the expectation of a discounted payoff under the risk-neutral measure. This gives us two complementary numerical approaches to pricing. For low-dimensional problems with smooth payoffs, finite difference methods on the PDE side are efficient and accurate. 
For high-dimensional problems, path-dependent payoffs, or models where the PDE is hard to derive, Monte Carlo (MC) on the expectation side becomes the natural choice.\u003c/p\u003e","title":"Monte Carlo Variance Reduction: What We Average, and How We Sample"},{"content":"Why This Matters In my earlier article on Brownian motion, I worked through the forward view: a process starting at a known value, diffusing into an uncertain future. Sometimes we know more than just the starting point. We also know where the process ended up, and we want to characterise the path in between. The object that answers this is the Brownian bridge: a Brownian motion conditioned on its terminal value.\nMy motivation for writing this is a follow-up article on Monte Carlo variance reduction, where one of the techniques, conditional Monte Carlo, uses the bridge to accelerate the pricing of barrier options. To keep that article focused on variance reduction, I am introducing the Brownian bridge here as a standalone reference. This article is meant as a brief introduction rather than a deep dive: the aim is to define the bridge, construct its form, and derive the survival probability that sits at the core of the variance reduction technique.\nDefinition Formally, the Brownian bridge from $a$ to $b$ on $[0, T]$ is the conditional process\n$$ \\{W_t \\mid W_0 = a,\\ W_T = b\\},\\quad 0 \\leq t \\leq T. $$The definition is abstract. To work with the bridge in practice we need a concrete formula that, given a sample of an ordinary Brownian motion, produces a sample of the bridge.\nDecomposing $W_t$ To get a concrete formula we split $W_t$ into a piece that is predictable from $W_T$ and a piece that is independent of it. The familiar version of this idea, for two standard normals $X$ and $Y$ with correlation $\\rho$, is\n$$ Y = \\rho X + \\sqrt{1 - \\rho^2}\\, Z, $$where $Z$ is a standard normal independent of $X$. 
The first term is the predictable piece (a multiple of $X$); the second is the independent residual.\nWe want the same kind of split for $W_t$ in terms of $W_T$, but $W_t$ and $W_T$ have variances $t$ and $T$, not 1. The role of $\\rho X$ in the familiar form is to be the best linear predictor of $Y$ given $X$. For variables with arbitrary variance, the best linear predictor of $W_t$ given $W_T$ is $\\text{Cov}(W_t, W_T) / \\text{Var}(W_T) \\cdot W_T$ [1]:\n$$ \\mathbb{E}[W_t \\mid W_T] = \\frac{\\text{Cov}(W_t, W_T)}{\\text{Var}(W_T)} W_T = \\frac{t}{T} W_T. $$For Brownian motion $\\text{Cov}(W_t, W_T) = t$ and $\\text{Var}(W_T) = T$, giving the slope $t/T$.\nSubtracting the projection from $W_t$ leaves the residual\n$$ R_t = W_t - \\frac{t}{T} W_T, $$which is independent of $W_T$ (zero covariance, and the two are jointly Gaussian). So we have the decomposition\n$$ W_t = \\underbrace{\\frac{t}{T} W_T}_{\\text{predictable from } W_T} \\;+\\; \\underbrace{R_t}_{\\text{independent of } W_T}. $$\nConditioning on $W_T = b$\nConditioning acts on each piece separately. The first piece is a function of $W_T$, so fixing $W_T = b$ turns it into the deterministic value $\\frac{t}{T} b$. The second piece is independent of $W_T$, so its distribution is unchanged. Putting them back together:\n$$ \\{W_t \\mid W_T = b\\} = \\frac{t}{T} b + \\left(W_t - \\frac{t}{T} W_T\\right). $$This is a bridge from $0$ to $b$. To start at $a$ instead, add the deterministic term $a(1 - t/T)$, which equals $a$ at $t = 0$ and zero at $t = T$:\n$$ B_t = a + \\frac{t}{T}(b - a) + \\left(W_t - \\frac{t}{T} W_T\\right),\\quad 0 \\leq t \\leq T. $$\nA Word on Notation\nThe $W_T$ inside the bracket on the right is the random terminal value of the unconditioned Brownian motion we started with. It is not equal to $b$. The formula is a recipe: take any sample path of an unconditioned Brownian motion, look at its terminal value $W_T$, and apply the transformation. 
The subtraction removes the part of the sampled path that is correlated with its own terminal value; the linear interpolation then puts the desired endpoint $b$ back in. At $t = T$ the bracket vanishes by construction (the path’s own terminal value cancels itself) and $B_T = b$.\nMean and Covariance\nThe construction expresses $B_t$ as a linear combination of Gaussian variables, so $B_t$ is itself Gaussian at every time, and the joint distribution of $(B_{t_1}, \\ldots, B_{t_n})$ at any finite set of times is multivariate Gaussian. The process is fully characterised by its mean and covariance:\n$$ \\mathbb{E}[B_t] = a + \\frac{t}{T}(b - a), $$$$ \\text{Cov}(B_s, B_t) = \\frac{s(T - t)}{T},\\quad 0 \\leq s \\leq t \\leq T, $$with variance $\\text{Var}(B_t) = t(T - t)/T$, a parabola vanishing at both endpoints and peaking at $T/2$. The uncertainty starts at zero, grows toward the middle of the interval, then shrinks back to zero as the process approaches the fixed endpoint. The bridge is therefore “pulled” toward the terminal value as maturity approaches.\nSurvival Probability via the Reflection Principle\nThe survival probability of the bridge, the probability that the bridge stays above a barrier, has a closed-form expression. This is the foundation of analytical down-and-out barrier option pricing and of the conditional Monte Carlo technique I cover in the variance reduction article.\nThe bridge is the object whose survival probability we want, but the derivation does not use the bridge construction directly. 
The conditional probability is computed from unconditional Brownian motion via Bayes’ rule, with the reflection principle supplying the joint density of the minimum and the terminal value.\nI will derive the simple case here: standard Brownian motion starting at $W_0 = 0$ with unit volatility, conditioned on $W_T = b$, with a lower barrier $L \u003c 0$ and $b \u003e L$ (since the path starts at $0$, the lower barrier must be negative). The question is\n$$ P\\!\\left(\\min_{0 \\leq s \\leq T} W_s \u003e L \\;\\middle|\\; W_T = b\\right) = ? $$\nThe Reflection Principle\nThe tool we need is the reflection principle for standard Brownian motion. For any barrier $L \u003c 0$ and any terminal value $b \u003e L$,\n$$ P\\!\\left(\\min_{0 \\leq s \\leq T} W_s \\leq L,\\ W_T = b\\right) = P(W_T = 2L - b). $$This is a density identity: $P(W_T = b)$ is shorthand for the density $\\phi_T(b)$, and the equation should be read as “the density of $W_T$ at $b$ on the event that the path crossed $L$ equals the unconditional density of $W_T$ at $2L - b$.” The idea behind it: every path that touches $L$ before time $T$ and ends at $b$ can be reflected about $L$ from its first hitting time onward, producing a path that ends at $2L - b$. The reflection is a bijection between the two sets of paths, and Brownian motion’s symmetry means it preserves the density.\n[Figure: an original path and its reflected path plotted against time $t$ from $0$ to $T$, with the levels $L$, $b$, and $2L - b$ and the hitting time $\\tau$ marked.] Reflection about $L$ from the first hitting time $\\tau$. The original path (solid) ends at $b$; the reflected path (dashed) ends at $2L - b$. The two paths agree up to $\\tau$ and are mirror images thereafter.\nFrom Reflection to Survival\nReading the reflection identity as a density statement: the density of $W_T$ at $b$ on the event $\\{m_T \\leq L\\}$ is $\\phi_T(2L - b)$, where $m_T = \\min_{s \\leq T} W_s$ and $\\phi_T$ is the density of $W_T \\sim \\mathcal{N}(0, T)$. 
Dividing by the unconditional density $\\phi_T(b)$ gives the conditional probability of having hit the barrier given the terminal value:\n$$ P\\!\\left(m_T \\leq L \\;\\middle|\\; W_T = b\\right) = \\frac{\\phi_T(2L - b)}{\\phi_T(b)} = \\exp\\!\\left(-\\frac{2L(L - b)}{T}\\right). $$The survival probability is the complement:\n$$ \\boxed{\\;P\\!\\left(\\min_{0 \\leq s \\leq T} W_s \u003e L \\;\\middle|\\; W_T = b\\right) = 1 - \\exp\\!\\left(-\\frac{2L(L - b)}{T}\\right).\\;} $$The survival probability depends on how far each endpoint sits from the barrier. If either endpoint approaches the barrier, the survival probability falls toward zero. If both endpoints move far away from the barrier, the exponent becomes large and negative, pushing the survival probability toward one.\nGeneral Case For Brownian motion starting at $W_0 = a$ with volatility $\\sigma$, conditioned on $W_T = b$, with lower barrier $L$ and $a, b \u003e L$, the general result follows from the simple case by a change of variables. Define\n$$ \\tilde{W}_s = \\frac{W_s - a}{\\sigma}. $$Then $\\tilde{W}$ is standard Brownian motion starting at $0$. The event $\\{W_s \\leq L\\}$ becomes $\\{\\tilde{W}_s \\leq (L - a)/\\sigma\\}$, and the conditioning $\\{W_T = b\\}$ becomes $\\{\\tilde{W}_T = (b - a)/\\sigma\\}$. Substituting into the simple-case formula with barrier $(L - a)/\\sigma$ and terminal value $(b - a)/\\sigma$:\n$$ P\\!\\left(\\min_{0 \\leq s \\leq T} W_s \u003e L \\;\\middle|\\; W_0 = a,\\ W_T = b\\right) = 1 - \\exp\\!\\left(-\\frac{2(a - L)(b - L)}{\\sigma^2 T}\\right). $$The symmetric form in $(a - L)$ and $(b - L)$ reflects the fact that the bridge is time-reversible: running the bridge backwards in time (mapping $s$ to $T - s$) gives another Brownian bridge with the start and end swapped. Reversing time does not change which values the path visits, only the order, so the minimum is the same in both directions. 
The survival probability must therefore give the same answer whether we call the endpoints $(a, b)$ or $(b, a)$, which forces the formula to be symmetric in the two.\nThis is the least-squares regression slope. Differentiating $\\mathbb{E}[(Y - \\beta X)^2]$ in $\\beta$ and setting to zero gives $\\beta = \\mathbb{E}[XY] / \\mathbb{E}[X^2]$, which for zero-mean variables is $\\text{Cov}(X, Y) / \\text{Var}(X)$.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/brownian_bridge/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eIn my earlier article on \u003ca href=\"/articles/quant-foundations/understanding_brownian_motion/\"\u003eBrownian motion\u003c/a\u003e, I worked through the forward view: a process starting at a known value, diffusing into an uncertain future. Sometimes we know more than just the starting point. We also know where the process ended up, and we want to characterise the path in between. The object that answers this is the Brownian bridge: a Brownian motion conditioned on its terminal value.\u003c/p\u003e","title":"The Brownian Bridge: What Brownian Motion Looks Like When You Know the Endpoints"},{"content":"Why This Matters Many of the world\u0026rsquo;s most actively traded commodities are priced in USD, yet end investors and corporates often operate in other currencies. A Canadian oil producer hedging output, a European airline managing jet fuel costs, or an Asian sovereign wealth fund allocating to commodity exposure all face the same underlying issue: commodity risk does not exist in isolation from FX risk. The standard approach is to hedge the commodity leg with USD-denominated futures or swaps and manage FX separately through forwards or options. This works, but it treats the two risks as independent. 
Quanto and compo options take a different approach by packaging both risks into a single instrument, but the way each handles FX risk creates some pricing and hedging subtleties that I find are easy to miss.\nThis article works through both structures using WTI/CAD as an example and focuses on a few practical questions:\nWhy does the quanto adjustment exist and what drives its magnitude?\nHow do FX volatility and correlation enter the pricing of each structure, and why do they enter differently?\nWhen does each structure suit a given participant?\nHow do dealers hedge each structure, and which risks are hardest to manage?\nPayoff Structures We work with two assets throughout this article. Let F denote the USD price of WTI crude futures, and let X denote the spot USDCAD exchange rate, quoted as Canadian dollars per one US dollar. A call option struck at K has payoff at expiry T as follows.\nQuanto Call\n$$\\text{Quanto Payoff} = \\bar{X} \\cdot \\max(F_T - K, 0)$$where $\\bar{X}$ is a fixed exchange rate agreed at inception. The buyer receives the intrinsic value of the oil option, converted at a predetermined rate, regardless of where USDCAD trades at expiry. The FX rate is contractually frozen.\nCompo Call\n$$\\text{Compo Payoff} = \\max(X_T F_T - K, 0)$$where K is denominated in CAD. Here the oil price is first converted to CAD at the prevailing spot rate $X_T$, and the option is exercised based on whether that CAD-denominated oil price exceeds the CAD strike. The option is in the money only if the full currency-converted price clears the hurdle. Both oil moves and FX moves determine whether and by how much the option pays.\nThe distinction lies in where FX enters the payoff. In the quanto, the buyer has pure oil exposure with zero residual FX risk: the exchange rate appears only as the fixed conversion rate $\\bar{X}$. 
In the compo, the strike itself is in CAD, so a weakening USD can push the option out of the money even if oil rises in USD terms, and a strengthening USD can push the option into the money even if oil is flat. The exercise decision and the payoff magnitude are both affected by FX. The compo is inherently a two-dimensional product.\nDynamics of F and X Under the CAD Risk-Neutral Measure To price these instruments consistently we must work under a single measure. Since payoffs are denominated in CAD, we use the CAD risk-neutral measure $\\mathbb{Q}^{CAD}$.\nLet $r_d$ be the CAD interest rate and $r_f$ be the USD interest rate. Under $\\mathbb{Q}^{CAD}$:\n$$\\frac{dX}{X} = (r_d - r_f)\\,dt + \\sigma_X\\,dW_X^{CAD}$$This is standard: the drift of USDCAD under the domestic (CAD) measure equals the interest rate differential, and $\\sigma_X$ is the FX volatility.\nFor WTI futures, the natural starting point is the USD risk-neutral measure $\\mathbb{Q}^{USD}$. Under $\\mathbb{Q}^{USD}$, futures are martingales by no-arbitrage, so F is driftless:\n$$\\frac{dF}{F} = \\sigma_F\\,dW_F^{USD}$$The two Brownian motions $W_F^{USD}$ and $W_X^{USD}$ have instantaneous correlation $\\rho$:\n$$dW_F^{USD}\\,dW_X^{USD} = \\rho\\,dt$$Since X is quoted as CAD per USD, a rally in oil that strengthens CAD causes USDCAD to fall. This is not incidental: Canada is one of the world\u0026rsquo;s largest oil exporters, and CAD is widely regarded as a petrocurrency whose value is closely tied to energy prices. Oil returns and USDCAD returns therefore move in opposite directions systematically, giving $\\rho \u003c 0$ for this pair.\nSince measure change only adds a drift and leaves quadratic covariation unchanged, $\\rho$ is invariant under the measure change. The same correlation holds under $\\mathbb{Q}^{CAD}$:\n$$dW_F^{CAD}\\,dW_X^{CAD} = \\rho\\,dt$$Since our payoffs are denominated in CAD, we need to express $F$ under $\\mathbb{Q}^{CAD}$ rather than $\\mathbb{Q}^{USD}$. 
The key result is that under $\\mathbb{Q}^{CAD}$, $F$ is no longer driftless but acquires a drift of $-\\rho\\,\\sigma_F\\,\\sigma_X$, known as the quanto adjustment. This drift arises entirely from the correlation between $F$ and $X$ and would vanish if the two were independent.\nThe derivation below shows how this drift emerges from the Radon-Nikodym derivative linking the two measures. Readers comfortable with the result can skip ahead to the next section.\nDerivation: measure change from $\\mathbb{Q}^{USD}$ to $\\mathbb{Q}^{CAD}$ To move from $\\mathbb{Q}^{USD}$ to $\\mathbb{Q}^{CAD}$ we need the Radon-Nikodym derivative linking the two measures (See this article for the general change-of-numeraire framework). The USD and CAD risk-neutral measures are both obtained by discounting with their respective money market accounts, and the exchange rate X connects them. The Radon-Nikodym derivative is proportional to the ratio of the USD and CAD numeraires expressed in a common currency:\n$$\\frac{d\\mathbb{Q}^{USD}}{d\\mathbb{Q}^{CAD}}\\bigg|_T = \\frac{X_T / X_0}{e^{(r_d - r_f)T}}$$This is the value at time T of one unit of USDCAD forward, normalized to start at 1. Since X follows geometric Brownian motion under $\\mathbb{Q}^{CAD}$, we can write this Radon-Nikodym derivative explicitly as:\n$$\\left.\\frac{d\\mathbb{Q}^{USD}}{d\\mathbb{Q}^{CAD}}\\right|_T = \\exp\\!\\left(-\\frac{1}{2}\\sigma_X^2 T + \\sigma_X W_X^{CAD}(T)\\right)$$This is the standard Girsanov density for a constant shift $\\sigma_X$. The Radon-Nikodym derivative is driven entirely by $W_X^{CAD}$, the Brownian motion of the exchange rate process. 
By Girsanov\u0026rsquo;s theorem, $W_X^{CAD}$ acquires a drift when we move to $\\mathbb{Q}^{USD}$:\n$$dW_X^{USD} = dW_X^{CAD} - \\sigma_X\\, dt$$Why F Acquires a Drift\nThe Brownian motions $W_F^{USD}$ and $W_X^{USD}$ have instantaneous correlation $\\rho$, meaning we can always decompose:\n$$dW_F^{USD} = \\rho\\, dW_X^{USD} + \\sqrt{1-\\rho^2}\\, dW_\\perp$$where $W_\\perp$ is a Brownian motion independent of $W_X^{USD}$. Applying the measure change $dW_X^{USD} = dW_X^{CAD} - \\sigma_X\\,dt$:\n$$dW_F^{USD} = \\rho(dW_X^{CAD} - \\sigma_X\\,dt) + \\sqrt{1-\\rho^2}\\,dW_\\perp = -\\rho\\,\\sigma_X\\,dt + \\rho\\,dW_X^{CAD} + \\sqrt{1-\\rho^2}\\,dW_\\perp$$The diffusion part $\\rho\\,dW_X^{CAD} + \\sqrt{1-\\rho^2}\\,dW_\\perp$ has instantaneous correlation $\\rho$ with $W_X^{CAD}$, which is exactly the definition of $dW_F^{CAD}$, so we write:\n$$dW_F^{USD} = -\\rho\\,\\sigma_X\\,dt + dW_F^{CAD}$$Substituting into the SDE for $F$:\n$$\\frac{dF}{F} = \\sigma_F\\,dW_F^{USD} = \\sigma_F\\!\\left(dW_F^{CAD} - \\rho\\,\\sigma_X\\,dt\\right)$$Under $\\mathbb{Q}^{CAD}$, $F$ therefore follows:\n$$\\frac{dF}{F} = -\\rho\\,\\sigma_F\\,\\sigma_X\\,dt + \\sigma_F\\,dW_F^{CAD}$$The drift $-\\rho\\,\\sigma_F\\,\\sigma_X$ is the quanto adjustment. It is a direct consequence of changing the measure of a correlated asset.\nThe Quanto Adjustment: Why It Exists Although the quanto adjustment $-\\rho\\,\\sigma_F\\,\\sigma_X$ emerges naturally from the measure change derivation, it is worth building an intuition for why it exists and why its magnitude takes the form it does. The hedging argument provides a clean economic explanation.\nConsider a bank that has sold a quanto forward to a client: at maturity the bank pays $F_T \\cdot \\bar{X}$ in CAD, where $\\bar{X}$ is the fixed exchange rate agreed at inception. 
To hedge, the bank goes long a regular WTI futures contract and converts the USD proceeds at the market rate $X_T$ at maturity, receiving $F_T \\cdot X_T$ in CAD.\nThe bank\u0026rsquo;s hedging P\u0026amp;L is:\n$$F_T \\cdot X_T - F_T \\cdot \\bar{X} = F_T(X_T - \\bar{X})$$Setting $\\bar{X} = \\mathbb{E}[X_T]$ to simplify the illustration1, the expected hedging cost becomes:\n$$\\mathbb{E}[F_T(X_T - \\bar{X})] = \\mathbb{E}[F_T X_T] - \\mathbb{E}[F_T]\\,\\mathbb{E}[X_T] = \\text{Cov}(F_T, X_T)$$Since $\\rho \u003c 0$ for this pair, the covariance is negative: the simple hedge bleeds in expectation. To break even, the bank must charge the client a forward price above $F_0$. The required markup is set by this covariance, which in proportional (return) terms is approximately $\\rho\\,\\sigma_F\\,\\sigma_X\\,T$ over the horizon.\nIn continuous time, the bleeding accumulates at rate $\\rho\\,\\sigma_F\\,\\sigma_X$ per unit time, in proportion to the current level of $F$, and offsetting it gives the SDE under $\\mathbb{Q}^{CAD}$:\n$$\\frac{dF}{F} = -\\rho\\,\\sigma_F\\,\\sigma_X\\,dt + \\sigma_F\\,dW_F^{CAD}$$Compounding this proportional drift over $T$ gives the quanto-adjusted forward, which we denote $F_0^*$:\n$$F_0^* \\equiv \\mathbb{E}^{\\mathbb{Q}^{CAD}}[F_T] = F_0 \\cdot e^{-\\rho\\,\\sigma_F\\,\\sigma_X\\,T}$$Since $\\rho \u003c 0$, the exponent is positive and $F_0^* \u003e F_0$. The more negative $\\rho$ is, and the larger $\\sigma_F$ and $\\sigma_X$ are, the more the simple hedge bleeds and the greater the adjustment required.\nPricing the Quanto Option The quanto call eliminates FX risk by fixing the conversion rate at $\\bar{X}$. The payoff depends on $F$ alone. 
Under $\\mathbb{Q}^{CAD}$, we need to price:\n$$V_{quanto} = \\bar{X} \\cdot e^{-r_d T} \\cdot \\mathbb{E}^{CAD}\\left[\\max(F_T - K, 0)\\right]$$Since F under $\\mathbb{Q}^{CAD}$ is lognormal with forward $F_0^*$ as derived above, applying the Black formula directly:\n$$V_{quanto} = \\bar{X} \\cdot e^{-r_d T} \\cdot \\left[F_0^*\\,N(d_1) - K\\,N(d_2)\\right]$$where:\n$$d_1 = \\frac{\\ln(F_0^*/K) + \\frac{1}{2}\\sigma_F^2\\,T}{\\sigma_F\\sqrt{T}}, \\qquad d_2 = d_1 - \\sigma_F\\sqrt{T}$$This is simply a Black formula with the oil future price replaced by $F_0^*$. The vol input is $\\sigma_F$ alone: FX volatility enters only through the correlation term absorbed into $F_0^*$, and a larger negative covariance results in a higher adjusted forward and a higher call value.\nPricing the Compo Option The compo payoff is $\\max(X_T F_T - K, 0)$ where K is in CAD. The key observation is that $X_T F_T$ is itself a lognormal under $\\mathbb{Q}^{CAD}$, since it is the product of two correlated lognormals. We define the CAD-denominated oil price:\n$$S_T = X_T F_T$$The SDE for $S_T$ follows from Itô\u0026rsquo;s lemma applied to the product of $X_T$ and $F_T$. Under $\\mathbb{Q}^{CAD}$:\n$$\\frac{dS}{S} = (r_d - r_f)\\,dt + \\sigma_X\\,dW_X^{CAD} + \\sigma_F\\,dW_F^{CAD}$$The cross term $\\rho\\sigma_X\\sigma_Fdt$ from Itô\u0026rsquo;s lemma exactly cancels the quanto drift. As a result, $S_T$ drifts at $(r_d - r_f)$. This reflects that $S_t$ is a USD-denominated commodity price expressed in CAD units, whose drift is governed by the relative pricing of USD and CAD under the CAD measure. 
The diffusion is driven jointly by $W_X^{CAD}$ and $W_F^{CAD}$.\nThe compo call is then simply a call on $S_T$ struck at K, all in CAD:\n$$V_{compo} = e^{-r_d T} \\cdot \\mathbb{E}^{\\mathbb{Q}^{CAD}}\\left[\\max(S_T - K, 0)\\right]$$To apply Black\u0026rsquo;s formula we need the forward and the volatility of $S_T$.\n$$S_0^{fwd} = F_0 \\cdot X_0\\,e^{(r_d - r_f)T}$$$$\\sigma_{compo} = \\sqrt{\\sigma_F^2 + 2\\rho\\,\\sigma_F\\,\\sigma_X + \\sigma_X^2}$$Applying Black\u0026rsquo;s formula directly to $S_T$:\n$$V_{compo} = e^{-r_d T}\\left[S_0^{fwd}\\,N(d_1^c) - K\\,N(d_2^c)\\right]$$where:\n$$d_1^c = \\frac{\\ln(S_0^{fwd}/K) + \\frac{1}{2}\\sigma_{compo}^2\\, T}{\\sigma_{compo}\\sqrt{T}}, \\qquad d_2^c = d_1^c - \\sigma_{compo}\\sqrt{T}$$The compo is a standard Black call on the CAD-denominated oil forward, with a composite vol that blends oil vol, FX vol, and their covariance. FX volatility enters through the composite vol itself, rather than through the drift as in the quanto.\nA higher FX vol does not always increase the compo vol. Taking the derivative of $\\sigma_{compo}$ with respect to $\\sigma_X$:\n$$\\frac{\\partial\\,\\sigma_{compo}}{\\partial\\,\\sigma_X} = \\frac{\\sigma_X + \\rho\\,\\sigma_F}{\\sigma_{compo}}$$This is positive only when $\\sigma_X \u003e -\\rho\\,\\sigma_F$. When $\\rho$ is negative and $\\sigma_X \u003c -\\rho\\,\\sigma_F$, increasing $\\sigma_X$ decreases $\\sigma_{compo}$ because the negative cross term $2\\rho\\,\\sigma_F\\,\\sigma_X$ grows in magnitude faster than the $\\sigma_X^2$ term. Intuitively, when $\\rho \u003c 0$, oil and FX move in opposite directions, and FX moves partly offset the oil moves in CAD terms. In the extreme case where $\\sigma_X$ is close to $\\sigma_F$ and $\\rho$ is close to $-1$, the FX leg almost perfectly hedges the oil leg and $S_T$ barely moves at all. 
The compo option can therefore be cheaper than a plain oil option when correlation is sufficiently negative, reflecting the natural hedge between oil and CAD that was discussed in the quanto adjustment section.\nWhich Structure Suits Which Participant? Currency of Exposure The most fundamental question is what currency the participant\u0026rsquo;s exposure actually lives in. A Canadian producer or refiner whose budget constraint is a CAD breakeven price is asking \u0026ldquo;is oil above C\\$130?\u0026rdquo; rather than \u0026ldquo;is oil above US\\$100?\u0026rdquo;. For that participant the compo is the natural fit: the strike is set directly in their decision currency and the exercise decision aligns with their actual P\u0026amp;L. The quanto can hedge the same oil exposure but the exercise is made in USD terms, introducing a mismatch against a CAD budget that the participant must then manage separately. Conversely, a fund reporting in CAD whose mandate is pure commodity exposure benefits from the quanto\u0026rsquo;s fixed conversion rate, which removes USD/CAD as a P\u0026amp;L variable entirely and keeps the oil attribution clean.\nRelative Cost Once the currency question is settled, cost becomes the next consideration, and neither structure is universally cheaper. At $\\rho = 0$ the compo tends to be more expensive because it embeds FX risk directly into the payoff, raising the total volatility $\\sigma_{compo}$. But as $\\rho$ becomes more negative, the cross term in $\\sigma_{compo}$ works in the buyer\u0026rsquo;s favour, and for WTI/CAD where $\\rho$ is meaningfully negative and $\\sigma_F$ is substantially larger than $\\sigma_X$, the compo can be cheaper than the quanto. 
The crossover point where the compo premium falls below the quanto premium depends on the interplay between the compo vol reduction and the quanto forward adjustment $F_0^*$, and participants who are indifferent between the two payoff structures should price both under current market inputs before deciding.\nOperational Complexity Operational simplicity favours the quanto, particularly for corporate treasuries and smaller counterparties. Both structures require $\\sigma_F$, $\\sigma_X$, and $\\rho$, none of which are directly observable. But in the quanto, $\\rho$ enters only through the drift adjustment in $F_0^*$, and once that adjusted forward is computed the valuation reduces to a standard single-underlying Black formula. In the compo, $\\rho$ enters $\\sigma_{compo}$ directly and the sensitivity of the option value to correlation is more immediate and material, making independent marking harder for a treasury without a dedicated quant function. Beyond valuation, the cross-gamma and correlation risks discussed in the next section mean dealers may charge a wider bid-offer spread on the compo, which partially offsets any premium saving from the lower composite vol and should be factored into the all-in cost comparison.\nThe pricer below allows direct comparison of both structures under user-specified inputs. 
The hedging section that follows explains how dealers manage each once the trade is on.\nInteractive pricer (quanto put, compo put, and side-by-side comparison). Default inputs: WTI futures price F_0 = 75 USD/bbl; USD/CAD spot X_0 = 1.36 CAD per USD; fixed FX rate X̄ = 1.36 CAD per USD (quanto); moneyness 100% of adjusted forward (ATM); WTI vol σ_F = 30%; USD/CAD vol σ_X = 8%; correlation ρ(F, X) = −0.45; domestic rate r_d (CAD) = 3.75%; foreign rate r_f (USD) = 4.50%; tenor 12 months. Payoffs priced: quanto put X̄ · max(K − F_T, 0) in CAD; compo put max(K − X_T · F_T, 0) in CAD. Reported outputs: quanto-adjusted forward, strikes, d_1, d_2, put premiums in CAD/bbl, composite vol σ_compo and ∂σ_compo/∂σ_X, Greeks (delta, vega per 1% σ_F, correlation sensitivity per +0.1 ρ, FX delta), and the premium difference (quanto minus compo).\nHedging Each Structure Understanding which structure fits a given exposure is only half the picture. Once a dealer has sold either instrument, the more operationally demanding question is how to manage the risk through the life of the trade. The two structures present meaningfully different hedging problems.\nHedging the Quanto\nThe quanto call has only one source of price risk: the level of WTI. Because the exchange rate is fixed contractually, USD/CAD is not a risk factor and the dealer runs no FX delta. The delta hedge is therefore a straightforward position in WTI futures, sized to the Black delta evaluated at the quanto-adjusted forward. 
As WTI moves, the futures position is rebalanced in the usual way.\nAlthough there is no FX delta, a USD/CAD forward is still required for currency translation. The WTI futures position generates P\u0026amp;L in USD while the liability to the option holder is in CAD. The dealer enters a USD/CAD forward sized to the expected dollar value of the futures position to convert those proceeds into CAD at a known rate. This forward is updated as the delta is rebalanced. It is a funding hedge rather than a risk-factor hedge. It does not arise from any sensitivity of the option value to the exchange rate, but from the operational mismatch between the currency of the hedging instrument and the currency of the liability.\nThe more subtle risk in the quanto is correlation between WTI returns and USD/CAD returns. Correlation enters the pricing formula through the drift of the oil forward under the CAD measure, and the dealer who sold the option carries residual exposure to shifts in this parameter. This is difficult to hedge because correlation is not directly traded. Pure correlation products such as covariance swaps exist but are often illiquid in the commodity/FX market. In practice dealers manage correlation exposure within book limits, accepting that residual exposure will sit on the book as a managed risk. The sensitivity is relatively contained, however, because correlation enters only through the drift adjustment rather than through the volatility of the underlying itself.\nHedging the Compo\nThe compo is more complex because both WTI and USD/CAD are live risk factors. The payoff depends on the product $F \\cdot X$, so the dealer must run two delta hedges simultaneously: a WTI futures position and a USD/CAD forward position. The two hedges are coupled: the WTI delta is proportional to the current spot FX rate, and the FX delta is proportional to the current WTI forward. Every time one underlying moves, the hedge notional in the other leg must be updated. 
This cross-gamma cannot be fully eliminated with vanilla instruments, and is typically treated as a cost of carry priced into the bid-offer spread at inception.\nOn the volatility side, both WTI vol and FX vol enter $\\sigma_{compo}$, so the dealer carries independent vega in each. WTI vega is hedged with WTI options, and FX vega with USD/CAD options. As discussed in the pricing section, the sign of FX vega depends on $\\sigma_X + \\rho\\sigma_F$. For WTI/CAD where $\\rho \u003c 0$, this quantity can be negative, meaning the dealer who sold the option is long rather than short FX vega. Dealers must verify this sign before structuring the USD/CAD options overlay, as the positive-correlation intuition would produce a hedge in the wrong direction.\nCorrelation risk in the compo is more material than in the quanto because $\\rho$ enters $\\sigma_{compo}$ directly rather than just the drift. The instinctive view is that a dealer who sold a call is short correlation, since higher $\\rho$ increases $\\sigma_{compo}$ and raises the option value. But this reasoning imports a positive-correlation assumption silently. For WTI/CAD where $\\rho$ is negative, a move toward more positive correlation does hurt the dealer, while a further decline in correlation reduces $\\sigma_{compo}$ and benefits them. The direction of the exposure is not fixed: it depends on where $\\rho$ currently sits and which way it moves. 
As with the quanto, residual correlation risk is carried on the book and priced into the spread at inception.\nComparative Summary\nAspect | Quanto | Compo\nWTI delta | WTI futures at $F_0^*$ | WTI futures; notional scales with spot FX\nFX hedge | USD/CAD forward for P\u0026amp;L translation only; not a risk-factor hedge | USD/CAD forward as risk-factor delta hedge; notional scales with WTI forward\nCross-gamma | None | Cannot be fully hedged with vanilla instruments; treated as cost of carry and priced into the spread\nWTI vega | WTI options | WTI options\nFX vega | Enters only via $F_0^*$; less material than WTI vega for short-dated trades, but grows with tenor and correlation | USD/CAD options; sign of exposure determined by $\\sigma_X + \\rho\\sigma_F$\nCorrelation risk | Via drift; relatively contained | Via $\\sigma_{compo}$; more material; sign depends on level of $\\rho$\nCorrelation hedge | Residual book risk; priced into spread | Residual book risk; priced into spread\nOverall complexity | Moderate | Higher; two coupled dynamic hedges\nA Note on Calibrating $\\rho$ Throughout this article, $\\rho$ appears in every formula and drives many of the key risk management decisions. In practice, calibrating it is less straightforward than calibrating $\\sigma_F$ or $\\sigma_X$, both of which can be implied from liquid option markets. Correlation has no directly quoted instrument. Dealers typically estimate $\\rho$ from historical return series, often using rolling windows of varying length and weighting schemes that emphasise recent observations. The choice of window, frequency, and whether to use spot or futures returns can produce meaningfully different estimates. Some desks supplement historical estimates with implied correlation backed out from traded cross-asset products where available, though for WTI/CAD such instruments are sparse. 
The resulting uncertainty in $\\rho$ is itself a source of model risk, and given how directly it enters both the quanto drift adjustment and the compo composite volatility, even modest miscalibration can shift prices and hedge ratios materially. This is one of the reasons dealers price correlation risk into the spread rather than attempting to hedge it precisely.\nWithout this simplification, the expected hedging cost contains an additional term $\\mathbb{E}[F_T](\\mathbb{E}[X_T] - \\bar{X})$. This term is purely deterministic since both $\\mathbb{E}[F_T]$ and $\\mathbb{E}[X_T]$ are known at inception and $\\bar{X}$ is a fixed contractual rate, and it vanishes when $\\bar{X} = \\mathbb{E}[X_T]$. The covariance term is the only irreducible source of hedging cost and is the true quanto effect.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/quanto_and_compo/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eMany of the world\u0026rsquo;s most actively traded commodities are priced in USD, yet end investors and corporates often operate in other currencies. A Canadian oil producer hedging output, a European airline managing jet fuel costs, or an Asian sovereign wealth fund allocating to commodity exposure all face the same underlying issue: commodity risk does not exist in isolation from FX risk. The standard approach is to hedge the commodity leg with USD-denominated futures or swaps and manage FX separately through forwards or options. This works, but it treats the two risks as independent. 
Quanto and compo options take a different approach by packaging both risks into a single instrument, but the way each handles FX risk creates some pricing and hedging subtleties that I find are easy to miss.\u003c/p\u003e","title":"Quanto and Compo Commodity Options: FX's Hidden Role in Pricing and Risk"},{"content":"Why This Matters In the article on Girsanov\u0026rsquo;s Theorem, we studied how the real-world measure $\\mathbb{P}$ and the risk-neutral measure $\\mathbb{Q}$ relate, and showed that switching between them amounts to reweighting paths via the Girsanov exponential. Throughout, the risk-free bond was the numéraire: the asset against which all prices were expressed. But this is a convenient choice, not a fundamental one.\nAny strictly positive self-financing wealth process can serve as a numéraire, and each choice gives a different probability measure under which asset prices, expressed in units of that numéraire, become martingales. The price of a derivative is invariant; what changes is how the problem is represented. So instead of viewing pricing as a fixed-measure expectation problem, it is often more natural to think of it as choosing the numéraire that best matches the structure of the payoff.\nWe will see this concretely through the exchange option, pricing it under two different measures. One choice makes the calculation almost trivial, while the other leaves the true structure of the problem hidden in plain sight.\nA Motivating Example: The Exchange Option An exchange option gives the holder the right to exchange asset $S^2$ for asset $S^1$ at time $T$. 
Its payoff is:\n$$V_T = \\max(S^1_T - S^2_T, 0)$$Assume both assets follow geometric Brownian motion under $\\mathbb{P}$:\n$$dS^i_t = \\mu_i S^i_t dt + \\sigma_i S^i_t dW^i_t, \\qquad i = 1, 2$$with $d\\langle W^1, W^2 \\rangle_t = \\rho dt$.\nLet us price it two ways: under the risk-neutral measure, and under the stock measure using $S^2$ as numéraire.\nApproach 1: Pricing Under the Risk-Neutral Measure $\\mathbb{Q}$ Under the risk-neutral measure $\\mathbb{Q}$, the drifts are replaced by the risk-free rate $r$, and the terminal values are:\n$$S^i_T = S^i_0 \\exp\\left[\\left(r - \\tfrac{1}{2}\\sigma_i^2\\right)T + \\sigma_i \\sqrt{T} \\xi^i\\right]$$where $(\\xi^1, \\xi^2)$ is a bivariate standard normal with correlation $\\rho$.\nThe price is:\n$$V_0 = e^{-rT} \\mathbb{E}^{\\mathbb{Q}}\\left[\\max(S^1_T - S^2_T, 0)\\right]$$Write $X = \\ln S^1_T$ and $Y = \\ln S^2_T$, and define the log-ratio $Z = X - Y = \\ln(S^1_T / S^2_T)$. Under $\\mathbb{Q}$, $Z$ is normally distributed:\n$$Z \\sim \\mathcal{N}\\left(\\ln\\frac{S^1_0}{S^2_0} + \\left(-\\tfrac{1}{2}\\sigma_1^2 + \\tfrac{1}{2}\\sigma_2^2\\right)T,\\sigma^2 T\\right)$$where $\\sigma^2 = \\sigma_1^2 + \\sigma_2^2 - 2\\rho\\sigma_1\\sigma_2$. The payoff condition $S^1_T \u003e S^2_T$ is simply $Z \u003e 0$, so we can write:\n$$V_0 = e^{-rT}\\mathbb{E}^{\\mathbb{Q}}\\!\\left[e^X \\mathbf{1}_{Z \u003e 0}\\right] - e^{-rT}\\mathbb{E}^{\\mathbb{Q}}\\!\\left[e^Y \\mathbf{1}_{Z \u003e 0}\\right]$$Each term involves a log-normal multiplied by an indicator on a correlated normal, so the expectations do not factorise directly. 
They can be evaluated using the log-normal identity1, applied to the pairs $(X, Z)$ and $(Y, Z)$, giving:\n$$\\boxed{V_0 = S^1_0 \\, \\Phi(d_1) - S^2_0 \\, \\Phi(d_2)}$$where $d_1 = \\dfrac{\\ln(S^1_0/S^2_0) + \\frac{1}{2}\\sigma^2 T}{\\sigma\\sqrt{T}}$ and $d_2 = d_1 - \\sigma\\sqrt{T}$.\nThe result is correct, but getting there required peeling apart two correlated expectations, computing covariances, and standardising shifted events. And after all that work, it is still not clear why the price depends on two stocks in that particular way. The stock measure will make it obvious.\nApproach 2: Pricing Under the Stock Measure $\\mathbb{Q}^2$ Now use $S^2$ as the numéraire. Under the associated measure $\\mathbb{Q}^2$, any asset price divided by $S^2$ is a martingale. The pricing formula becomes:\n$$V_0 = S^2_0 \\, \\mathbb{E}^{\\mathbb{Q}^2}\\!\\left[\\frac{V_T}{S^2_T}\\right] = S^2_0 \\, \\mathbb{E}^{\\mathbb{Q}^2}\\!\\left[\\max\\!\\left(\\frac{S^1_T}{S^2_T} - 1, \\, 0\\right)\\right]$$Define $R_t = S^1_t / S^2_t$. The option has become a standard call on $R$ with strike 1 under $\\mathbb{Q}^2$.\nWhat does $R_t$ look like under $\\mathbb{Q}^2$? Since $S^1/S^2$ must be a martingale under $\\mathbb{Q}^2$, by Itô\u0026rsquo;s formula:\n$$\\frac{d R_t}{R_t} = (\\ldots) \\, dt + \\sigma_1 \\, d\\widetilde{W}^1_t - \\sigma_2 \\, d\\widetilde{W}^2_t$$where $\\widetilde{W}^i$ are Brownian motions under $\\mathbb{Q}^2$. The drift must be zero. The volatility is unchanged between measures, so:\n$$\\frac{d R_t}{R_t} = \\sigma \\, d\\widetilde{W}_t$$where $\\sigma = \\sqrt{\\sigma_1^2 + \\sigma_2^2 - 2\\rho\\sigma_1\\sigma_2}$ is the relative volatility and $\\widetilde{W}$ is a single Brownian motion under $\\mathbb{Q}^2$.\nSo $R_T$ is log-normal with no drift, volatility $\\sigma$, and initial value $R_0 = S^1_0/S^2_0$. 
The option is a call on $R$ with strike 1 — exactly a Black-Scholes call:\n$$V_0 = S^2_0 \\left[R_0 \\, \\Phi(d_1) - 1 \\cdot \\Phi(d_2)\\right] = S^1_0\\,\\Phi(d_1) - S^2_0\\,\\Phi(d_2)$$Same answer, and this derivation is more direct. Choosing $S^2$ as numéraire turns the exchange option into a call on the ratio $S^1/S^2$, with no covariance bookkeeping and no shifted events. More importantly, the stock measure reveals what the risk-neutral measure keeps hidden: the exchange option is fundamentally a bet on the relative performance of two assets, and $\\sigma = \\sqrt{\\sigma_1^2 + \\sigma_2^2 - 2\\rho\\sigma_1\\sigma_2}$ is the only risk that matters.\nThe Stock Measure and Its Relation to $\\mathbb{Q}$ What exactly happens to the measure when we switch numéraire from $B_t$ to $S^2_t$? The exchange option showed us that $R_t = S^1_t/S^2_t$ becomes driftless, but not why. Here we make that clear.\n$$\\mathbb{Q} \\xrightarrow{\\text{change of numéraire}} \\mathbb{Q}^2$$The numéraire for $\\mathbb{Q}$ is the bond $B_t = e^{rt}$. The numéraire for $\\mathbb{Q}^2$ is $S^2_t$. To find the Radon-Nikodym derivative between them, recall that any asset $V_t$, expressed in units of $B_t$, must be a $\\mathbb{Q}$-martingale:\n$$\\frac{V_t}{B_t} = \\mathbb{E}^{\\mathbb{Q}}\\!\\left[\\frac{V_T}{B_T} \\,\\bigg|\\, \\mathcal{F}_t\\right]$$We want the same asset, now expressed in units of $S^2_t$, to be a $\\mathbb{Q}^2$-martingale. 
Using the change of measure formula $\\mathbb{E}^{\\mathbb{Q}^2}[X] = \\mathbb{E}^{\\mathbb{Q}}\\!\\left[X \\cdot \\frac{d\\mathbb{Q}^2}{d\\mathbb{Q}}\\right]$, this requires:\n$$\\frac{V_t}{S^2_t} = \\mathbb{E}^{\\mathbb{Q}^2}\\!\\left[\\frac{V_T}{S^2_T} \\,\\bigg|\\, \\mathcal{F}_t\\right] = \\mathbb{E}^{\\mathbb{Q}}\\!\\left[\\frac{V_T}{S^2_T} \\cdot \\frac{d\\mathbb{Q}^2}{d\\mathbb{Q}} \\,\\bigg|\\, \\mathcal{F}_t\\right]$$Comparing the two expressions, the unique process that makes this hold for every asset $V$ is:\n$$\\left.\\frac{d\\mathbb{Q}^2}{d\\mathbb{Q}}\\right|_{\\mathcal{F}_T} = \\frac{S^2_T / S^2_0}{B_T / B_0} = \\frac{S^2_T}{S^2_0 \\, e^{rT}}$$The ratio of the two numéraires, normalised at time 0, is the only Radon-Nikodym derivative consistent with both martingale conditions simultaneously.\nThis is the discounted price of $S^2$, which is a $\\mathbb{Q}$-martingale as required. Explicitly, since $S^2_T = S^2_0 \\exp\\!\\left[(r - \\frac{1}{2}\\sigma_2^2)T + \\sigma_2 \\sqrt{T}\\, \\xi^2\\right]$ under $\\mathbb{Q}$:\n$$\\frac{d\\mathbb{Q}^2}{d\\mathbb{Q}} = \\exp\\!\\left(-\\frac{1}{2}\\sigma_2^2 T + \\sigma_2 \\sqrt{T}\\, \\xi^2\\right)$$This is the Girsanov exponential with $\\theta = -\\sigma_2$, which shifts the drift of $W^2$ by $\\sigma_2$. Under $\\mathbb{Q}^2$, the Brownian motion becomes $\\widetilde{W}^2_t = W^2_t - \\sigma_2 t$. Substituting into the SDEs, the drift of each asset picks up an additional term from the volatility of the numéraire:\n$$dS^1_t = (r + \\rho\\sigma_1\\sigma_2) S^1_t \\, dt + \\sigma_1 S^1_t \\, d\\widetilde{W}^1_t$$ $$dS^2_t = (r + \\sigma_2^2) S^2_t \\, dt + \\sigma_2 S^2_t \\, d\\widetilde{W}^2_t$$For $S^1$, the extra drift comes through the correlation with $W^2$. For $S^2$, it is exactly $\\sigma_2^2$, the quadratic variation of the numéraire itself. 
Applying Itô\u0026rsquo;s lemma to the ratio $R_t = S^1_t/S^2_t$, the relative drift of $R_t$ is $(r + \\rho\\sigma_1\\sigma_2) - (r + \\sigma_2^2) + \\sigma_2^2 - \\rho\\sigma_1\\sigma_2 = 0$: the extra drift terms cancel exactly against the Itô correction, and we recover the driftless SDE obtained for the exchange option:\n$$dR_t = \\sigma R_t \\, d\\widetilde{W}_t$$confirming that $R_t = S^1_t/S^2_t$ is a martingale under $\\mathbb{Q}^2$.\nThe key point is that the measure change is driven entirely by the randomness of the new numéraire. The Radon-Nikodym derivative depends only on the Brownian shocks of $S^2$. Changing numéraire therefore reweights paths according to how $S^2$ fluctuates. Assets that are correlated with those fluctuations inherit a drift adjustment under the new measure. The shift from $\\mathbb{Q}$ to $\\mathbb{Q}^2$ is therefore sourced by the volatility and correlation structure of the numéraire, not by investor risk preferences.\nDrifts by measure: under $\\mathbb{Q}$, both $S^1_t$ and $S^2_t$ drift at $r$; under $\\mathbb{Q}^2$, $S^1_t$ drifts at $r + \\rho\\sigma_1\\sigma_2$ and $S^2_t$ at $r + \\sigma_2^2$.\nThe General Change of Numéraire Formula The exchange option gave us one instance of a numéraire change. The pattern it revealed generalises. Every time we price a derivative, we are implicitly choosing a numéraire. Choosing it to match the structure of the payoff is what makes the difference between a one-line derivation and a page of algebra.\nDefinition. A numéraire is any strictly positive price process $N_t$ of a traded asset or self-financing portfolio. The associated measure $\\mathbb{Q}^N$ is the probability measure under which every asset price $V_t$, expressed in units of $N_t$, is a martingale:\n$$\\frac{V_t}{N_t} = \\mathbb{E}^{\\mathbb{Q}^N}\\!\\left[\\frac{V_T}{N_T} \\,\\bigg|\\, \\mathcal{F}_t\\right]$$Theorem (Change of Numéraire). Let $M$ and $N$ be two numéraires. Then the measures $\\mathbb{Q}^M$ and $\\mathbb{Q}^N$ are related by:\n$$\\frac{d\\mathbb{Q}^N}{d\\mathbb{Q}^M}\\bigg|_{\\mathcal{F}_T} = \\frac{N_T / N_0}{M_T / M_0}$$Pricing formula. 
The price of a claim with payoff $V_T$ at time $T$ is the same under any numéraire:\n$$V_0 = N_0 \\, \\mathbb{E}^{\\mathbb{Q}^N}\\!\\left[\\frac{V_T}{N_T}\\right] = M_0 \\, \\mathbb{E}^{\\mathbb{Q}^M}\\!\\left[\\frac{V_T}{M_T}\\right]$$The practical question is which numéraire makes $V_T / N_T$ easiest to work with. In the exchange option, choosing $N_t = S^2_t$ turned $V_T/N_T = \\max(R_T - 1, 0)$ into a call on a driftless GBM. This pattern shows up across many problems: find the numéraire that makes the key ratio driftless, and the pricing reduces to a standard calculation. In practice, the right numéraire is usually the one already hiding inside the payoff.\nApplications The Forward Measure In the equity examples above, the risk-free rate was constant and discounting was trivial. In interest rate models, the discount factor $e^{-\\int_0^T r_s ds}$ is stochastic and correlated with the payoff. This is what makes rates derivatives hard under $\\mathbb{Q}$: the expectation $\\mathbb{E}^{\\mathbb{Q}}[e^{-\\int_0^T r_s ds} V_T]$ does not factorise.\nThe fix is to use the zero-coupon bond $P(t, T)$ as numéraire. Under the associated $T$-forward measure $\\mathbb{Q}^T$:\n$$V_0 = P(0, T) \\, \\mathbb{E}^{\\mathbb{Q}^T}[V_T]$$The stochastic discounting disappears entirely, absorbed into the measure change. The forward price of an asset $X$, $F(t,T) = \\mathbb{E}^{\\mathbb{Q}^T}[X_T \\,|\\, \\mathcal{F}_t]$, is a martingale under $\\mathbb{Q}^T$ by construction, which is exactly the driftless ratio we saw in the exchange option, just with $P(t,T)$ playing the role of $S^2$.\nTake caplets as an example: the payoff on the compounded SOFR rate over $[T,T+\\tau]$, paid at $T+\\tau$, becomes a call on the forward rate under the $(T+\\tau)$-forward measure. Assuming the forward rate is log-normal under that measure, Black’s formula applies immediately.\nThe Annuity Measure An interest rate swap has a more complex structure: its value depends on multiple payment dates $T_1, \\ldots, T_n$. 
The natural numéraire is not a single bond but the annuity:\n$$A_t = \\sum_{i=1}^{n} \\tau_i P(t, T_i)$$the present value of receiving one unit on each payment date. Under the swap measure $\\mathbb{Q}^A$, the swap rate $S_t$ (the fair fixed rate on the swap) is a martingale. This is the analogue of $R_t = S^1_t/S^2_t$ being driftless under $\\mathbb{Q}^2$: the swap rate is driftless under its natural measure.\nA payer swaption pays $A_T\\max(S_T - K, 0)$ at expiry. Under $\\mathbb{Q}^A$:\n$$V_0 = A_0 \\, \\mathbb{E}^{\\mathbb{Q}^A}[\\max(S_T - K, 0)]$$The stochastic annuity disappears into the measure change, leaving a plain option on the swap rate. Assuming $S_T$ is log-normal under $\\mathbb{Q}^A$, this becomes Black\u0026rsquo;s formula for interest rate swaptions.\nThe FX Measure In the equity world, there is one currency and one risk-free rate. In FX, there are two, and any valuation across currencies must explicitly account for the exchange rate.\nLet $X_t$ be the spot rate (domestic per foreign). Under the domestic risk-neutral measure $\\mathbb{Q}^d$: $$ dX_t = (r_d - r_f) X_t \\, dt + \\sigma X_t \\, dW_t $$The drift reflects the funding differential between the two economies: holding foreign currency earns $r_f$ instead of $r_d$, so the exchange rate must compensate through its drift under domestic pricing.\nBefore adopting the FX measure, it is worth being precise about what the numéraire is in this setting. Although $X_t$ enters directly into pricing problems, it is not itself a traded wealth process. Holding foreign currency is not static: it accrues interest at the foreign short rate $r_f$. Any self-financing position in foreign currency therefore grows like a money market account. This means the natural tradable benchmark is not $X_t$ alone, but the foreign money market account expressed in domestic currency: $$ N_t = X_t e^{r_f t} $$This distinction is important. 
Using $X_t$ alone ignores the carry from holding foreign cash and therefore does not correspond to any replicable trading strategy.\nAdopting the FX measure does not simplify the computation for most multi-currency derivatives. In practice, quantos and composites are still priced under $\\mathbb{Q}^d$. What the FX measure clarifies is the structure: it makes explicit how FX carry enters pricing through the numéraire, separating the funding differential from the underlying risk.\nConclusion Pricing is not tied to a single measure but is a choice of representation: every numéraire change preserves the price while altering how the problem is expressed. For the exchange option, the forward measure, and the annuity measure, embedding the structure of the payoff into the numéraire dissolves the complexity into the measure change itself. The FX measure illustrates that the same principle does not always deliver computational simplicity, but it remains a useful lens for understanding how carry and correlation enter multi-currency pricing.\nFor two jointly normal random variables $A$ and $B$: $\\mathbb{E}\\!\\left[e^A f(B)\\right] = e^{\\mu_A + \\frac{1}{2}\\sigma_A^2} \\, \\mathbb{E}\\!\\left[f(B + \\text{Cov}(A, B))\\right]$. Applied to the first term with $A = X$, $B = Z$: $\\text{Cov}(X, Z) = (\\sigma_1^2 - \\rho\\sigma_1\\sigma_2)T$, the scalar $e^{-rT} \\cdot e^{\\mu_X + \\frac{1}{2}\\sigma_1^2 T}$ simplifies to $S^1_0$, and the shifted event standardises to $\\Phi(d_1)$. 
The second term follows identically with $\\text{Cov}(Y, Z) = (\\rho\\sigma_1\\sigma_2 - \\sigma_2^2)T$.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/change_of_numeraire/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eIn the \u003ca href=\"/articles/quant-foundations/girsanov/\"\u003earticle on Girsanov\u0026rsquo;s Theorem\u003c/a\u003e, we studied how the real-world measure $\\mathbb{P}$ and the risk-neutral measure $\\mathbb{Q}$ relate, and showed that switching between them amounts to reweighting paths via the Girsanov exponential. Throughout, the risk-free bond was the numéraire: the asset against which all prices were expressed. But this is a convenient choice, not a fundamental one.\u003c/p\u003e\n\u003cp\u003eAny strictly positive self-financing wealth process can serve as a numéraire, and each choice gives a different probability measure under which asset prices, expressed in units of that numéraire, become martingales. The price of a derivative is invariant; what changes is how the problem is represented. So instead of viewing pricing as a fixed-measure expectation problem, it is often more natural to think of it as choosing the numéraire that best matches the structure of the payoff.\u003c/p\u003e","title":"The Measure We Choose: How Numéraires Simplify Pricing"},{"content":"Why This Matters We want to price a derivative. Under the real world measure $\\mathbb{P}$, we face two problems. First, we do not know the true drift $\\mu$ of the underlying, and historical estimates are notoriously unreliable. Second, even if we knew $\\mu$, taking the expected payoff under $\\mathbb{P}$ would still not give the market price. Risky cash flows must be discounted more heavily than guaranteed ones because investors are risk averse. Pricing under $\\mathbb{P}$ requires both the true probabilities of outcomes and a model for how the market prices risk. 
Both are fundamentally unobservable. So what can we do?\nIt turns out there is a way to bypass both problems entirely. Rather than estimating $\\mu$ and modelling risk aversion separately, we can work under a different probability measure where pricing is simple. But this raises an immediate question: is such a measure legitimate? And how does it relate to the real world?\nThis article is my attempt to answer both. We will take the First Fundamental Theorem of Asset Pricing (FTAP) as given, use it to work out what the risk-neutral measure must look like, and then show where Girsanov\u0026rsquo;s theorem enters and what it adds.\nWhat We Know Before Girsanov Taking FTAP as Given The First Fundamental Theorem of Asset Pricing states that in an arbitrage-free market, there exists a measure $\\mathbb{Q}$ under which the price of any asset, expressed in units of the numeraire, is a martingale. Equivalently, the price of any derivative equals the expected value of its payoff discounted by the numeraire. We will take FTAP as given here without proving it.\nIn principle, any positive traded asset can serve as a numeraire. Different choices lead to different probability measures, but they all give consistent prices for the same assets. For now we choose the simplest option: the risk-free bond $e^{rt}$. With this choice, the derivative price is:\n$$V_0 = e^{-rT}\\mathbb{E}^{\\mathbb{Q}}[g(S_T)]$$This is a remarkable result. The unknown drift $\\mu$ and the unobservable risk aversion have both disappeared. Pricing reduces to computing an expectation under $\\mathbb{Q}$. But what do we actually know about $\\mathbb{Q}$?\nTwo Properties That Follow Immediately Property 1: $\\mathbb{Q}$ must price the stock correctly.\nThe stock is already trading at an observed price $S_0$ today. The stock is itself a tradable payoff, in the same sense as a derivative, since it pays $S_T$ at a future date $T$. 
Under $\\mathbb{Q}$, it must be priced consistently:\n$$S_0 = e^{-rT}\\mathbb{E}^{\\mathbb{Q}}[S_T]$$Equivalently, the discounted stock price $e^{-rt}S_t$ must be a martingale under $\\mathbb{Q}$.\nProperty 2: The drift of the stock under $\\mathbb{Q}$ must be $r$.\nThis follows directly from Property 1. If the stock follows $dS_t = \\mu S_t\\,dt + \\sigma S_t\\,dW_t$ under $\\mathbb{P}$, then for the discounted stock $e^{-rt}S_t$ to be a martingale under $\\mathbb{Q}$, the drift of $S_t$ under $\\mathbb{Q}$ must be exactly $r$. Any other drift would make $e^{-rT}\\mathbb{E}^{\\mathbb{Q}}[S_T]$ differ from the observed price $S_0$, so $\\mathbb{Q}$ would misprice the stock relative to the risk-free bond, contradicting Property 1.\nSo the drift being $r$ under $\\mathbb{Q}$ is not an assumption. It is forced on us by no-arbitrage once the risk-free bond is chosen as numeraire.\nWhat We Still Do Not Know These properties tell us a great deal about $\\mathbb{Q}$. But two questions remain unanswered.\nFirst, is $\\mathbb{Q}$ actually a legitimate probability measure? FTAP guarantees its existence abstractly, and we know the stock\u0026rsquo;s drift under it is $r$. But a valid probability measure must assign probabilities to all events consistently and integrate to one.\nSecond, how exactly does $\\mathbb{Q}$ relate to $\\mathbb{P}$? Knowing the drift is $r$ under $\\mathbb{Q}$ gives us one piece of information, but not the full structure of how the two measures are connected. If we want to convert expectations between $\\mathbb{P}$ and $\\mathbb{Q}$, or understand how the real world is reshaped into the pricing world, we need this relationship explicitly.\nThis is precisely what Girsanov\u0026rsquo;s theorem provides.\nThe Intuition: Reweighting Paths Drift Lives in the Probability Weights Our goal is to construct a new measure $\\mathbb{Q}$ under which the process has a different drift (replacing $\\mu$ with $r$). How do we do that? Do we change the paths to change the drift?\nThat instinct turns out to be wrong.\nTo see why, consider one random walk viewed under two measures. 
Under $\\mathbb{P}$, a particle moves up with probability 0.7 and down with probability 0.3. Under $\\mathbb{Q}$, it moves up and down with probability 0.5 each. The important point is that the set of possible paths is identical under both measures. A trajectory like up, up, down exists in both worlds unchanged. Nothing about the paths themselves has been altered, only how likely each path is.\nThis is where a common intuition breaks. When people first see stochastic processes, they may read drift directly off a single realised path. A steadily rising stock chart is labelled as \u0026ldquo;positive drift,\u0026rdquo; while a flat or noisy one is seen as \u0026ldquo;low drift.\u0026rdquo; But this is an inference from one sample path, not a property of the process.\nThat interpretation fails because a single path carries no information about typical behaviour. A zero-drift process can produce strong upward trends, and a positive-drift process can still fall over finite horizons. What changes across models is not the path, but the distribution over paths.\nSo drift is not something encoded in trajectories. It is encoded in how probability is assigned to trajectories. If drift lives in the probability weights, then changing the drift from $\\mu$ to $r$ cannot mean modifying paths. It must mean reweighting them. That reweighting of path probabilities is exactly what a change of measure does.\nThe Per-Step Reweighting Consider a random walk where the underlying has drift $\\mu$ and volatility $\\sigma$. At each step, the particle moves up by $\\sigma\\sqrt{\\Delta t}$ or down by $\\sigma\\sqrt{\\Delta t}$. 
Under $\\mathbb{P}$, to match a process with drift $\\mu$, the probabilities are:\n$$\\mathbb{P}(\\text{up}) = \\frac{1}{2} + \\frac{\\mu\\sqrt{\\Delta t}}{2\\sigma}, \\qquad \\mathbb{P}(\\text{down}) = \\frac{1}{2} - \\frac{\\mu\\sqrt{\\Delta t}}{2\\sigma}$$Under $\\mathbb{Q}$, we want drift $r$ instead of $\\mu$, so the same step size $\\sigma\\sqrt{\\Delta t}$ must now be weighted to give a net drift of $r$:\n$$\\mathbb{Q}(\\text{up}) = \\frac{1}{2} + \\frac{r\\sqrt{\\Delta t}}{2\\sigma}, \\qquad \\mathbb{Q}(\\text{down}) = \\frac{1}{2} - \\frac{r\\sqrt{\\Delta t}}{2\\sigma}$$We now have two probability measures sitting side by side. Our goal is to compute the pricing expectation $\\mathbb{E}^{\\mathbb{Q}}[g(S_T)]$, but $\\mathbb{Q}$ is not a distribution we can sample from directly. It is a mathematical construction whose existence FTAP guarantees, while its explicit form is still something we need to construct. What we can do is simulate paths under $\\mathbb{P}$, since $\\mathbb{P}$ corresponds to historically observed dynamics, or any baseline measure from which we can simulate. So we need a way to evaluate a $\\mathbb{Q}$-expectation using $\\mathbb{P}$-paths, by reweighting each path by how much more or less likely it is under $\\mathbb{Q}$ than under $\\mathbb{P}$. The formal object that does this reweighting, the ratio $d\\mathbb{Q}/d\\mathbb{P}$, is called the Radon-Nikodym derivative. 
At each step it is simply the ratio of $\\mathbb{Q}$-probability to $\\mathbb{P}$-probability for the outcome that occurred.\nThe Radon-Nikodym derivative at each step is:\n$$\\frac{d\\mathbb{Q}}{d\\mathbb{P}}\\bigg|_{\\text{up}} = \\frac{1/2 + r\\sqrt{\\Delta t}/(2\\sigma)}{1/2 + \\mu\\sqrt{\\Delta t}/(2\\sigma)} \\approx 1 - \\frac{(\\mu-r)\\sqrt{\\Delta t}}{\\sigma}$$$$\\frac{d\\mathbb{Q}}{d\\mathbb{P}}\\bigg|_{\\text{down}} = \\frac{1/2 - r\\sqrt{\\Delta t}/(2\\sigma)}{1/2 - \\mu\\sqrt{\\Delta t}/(2\\sigma)} \\approx 1 + \\frac{(\\mu-r)\\sqrt{\\Delta t}}{\\sigma}$$Since $r \u003c \\mu$ in a typical equity market, upward moves get downweighted and downward moves get upweighted, just enough to cancel the excess drift $\\mu - r$. Note that the step size $\\sigma$ appears naturally in the denominator: a larger volatility means each step is larger, so a smaller probability adjustment is needed to shift the drift by the same amount. Over a full path of $n = T/\\Delta t$ steps:\n$$\\frac{d\\mathbb{Q}}{d\\mathbb{P}} = \\prod_{i=1}^{n} \\left(1 - \\frac{(\\mu-r)}{\\sigma}\\xi_i\\sqrt{\\Delta t}\\right)$$where $\\xi_i = \\pm 1$ records whether step $i$ was up or down. Here we are treating $\\mu$, $r$, and $\\sigma$ as constants, so $\\theta = (\\mu - r)/\\sigma$ is constant across all steps. This keeps the random walk tractable.\nTaking the Continuous Limit Writing $\\theta = (\\mu - r)/\\sigma$ for the constant excess drift per unit of volatility, taking the logarithm and using $\\log(1-x) \\approx -x - x^2/2$ for small $x$:\n$$\\log\\frac{d\\mathbb{Q}}{d\\mathbb{P}} \\approx \\sum_{i=1}^{n} \\left(-\\theta\\xi_i\\sqrt{\\Delta t} - \\frac{\\theta^2\\Delta t}{2}\\right) = -\\theta\\sum_{i=1}^{n}\\xi_i\\sqrt{\\Delta t} - \\frac{\\theta^2 T}{2}$$As $\\Delta t \\to 0$, the sum $\\sum_{i=1}^{n}\\xi_i\\sqrt{\\Delta t}$ converges to Brownian motion $W^{\\mathbb{P}}_T$ by the same argument as in Brownian Motion: From Random Walks to Option Prices. 
This is a $\\mathbb{P}$-Brownian motion specifically because the steps $\\xi_i$ were drawn according to $\\mathbb{P}$-probabilities. The argument is heuristic: strictly, the $\\xi_i$ have mean $\\mu\\sqrt{\\Delta t}/\\sigma$ under $\\mathbb{P}$, so the sum also carries a drift, and the first-order approximation of each per-step ratio drops terms of order $\\Delta t$. When both corrections are kept they combine to give exactly the limit below. The product of per-step Radon-Nikodym factors becomes:\n$$\\frac{d\\mathbb{Q}}{d\\mathbb{P}} = \\exp \\left(-\\theta W^{\\mathbb{P}}_T - \\frac{\\theta^2 T}{2}\\right)$$This is the Girsanov exponential, the continuous-time Radon-Nikodym derivative of $\\mathbb{Q}$ with respect to $\\mathbb{P}$.\nThe process $Z_t = \\exp\\left(-\\theta W^{\\mathbb{P}}_t - \\frac{\\theta^2 t}{2}\\right)$ is a local martingale under $\\mathbb{P}$. This follows from Itô\u0026rsquo;s lemma: applying it to $Z_t$ shows that $dZ_t = -\\theta Z_t dW^{\\mathbb{P}}_t$, which has no $dt$ term and is therefore a local martingale. Under standard integrability conditions (the Novikov condition, stated in the footnote), this process is in fact a true martingale, so its expectation remains constant: $\\mathbb{E}^{\\mathbb{P}}[Z_T] = Z_0 = 1$.\nIs $\\mathbb{Q}$ a Valid Probability Measure? We now have an explicit formula for $\\mathbb{Q}$, but we should check that it is actually a legitimate probability measure. Two things are required: the probabilities must be non-negative, and they must sum to one.\nNon-negativity is immediate. The Girsanov exponential is an exponential function, so it is strictly positive for every path.\nSumming to one requires a short argument. The total probability assigned by $\\mathbb{Q}$ to all events is:\n$$\\mathbb{Q}(\\Omega) = \\mathbb{E}^{\\mathbb{Q}}[\\mathbf{1}]$$By the definition of the Radon-Nikodym derivative, any $\\mathbb{Q}$-expectation can be converted to a $\\mathbb{P}$-expectation by reweighting the probabilities, i.e. applying the conversion factor $Z_T = d\\mathbb{Q}/d\\mathbb{P}$:\n$$\\mathbb{E}^{\\mathbb{Q}}[\\mathbf{1}]= \\mathbb{E}^{\\mathbb{P}}[\\mathbf{1}\\cdot Z_T ]$$We showed earlier that $Z_t$ is a martingale under $\\mathbb{P}$, so $\\mathbb{E}^{\\mathbb{P}}[Z_T] = Z_0 = 1$. 
Therefore $\\mathbb{Q}(\\Omega) = 1$ and $\\mathbb{Q}$ is a proper probability measure.\nWhy $\\mathbb{Q}$ and $\\mathbb{P}$ Must Agree on What Is Possible There is one further requirement that is easy to overlook. Since the Girsanov exponential is strictly positive, every event that has positive probability under $\\mathbb{P}$ also has positive probability under $\\mathbb{Q}$, and vice versa. The two measures agree on which events are possible and which are not. This property is called equivalence of measures, and it is not just a technicality.\nTo see why it matters, suppose $\\mathbb{Q}$ assigned zero probability to some event that $\\mathbb{P}$ considered possible, say a large downward move in the stock. Then a derivative that pays off only in that scenario would be priced at zero under $\\mathbb{Q}$, even though it has a genuine chance of paying out in the real world. A trader who knew this could acquire the derivative at zero cost, never lose money on it, and receive a payout with positive probability under $\\mathbb{P}$, which is an arbitrage. Equivalent measures rule this out by ensuring that anything that can happen in the real world is also priced as possible under $\\mathbb{Q}$.\nGirsanov\u0026rsquo;s Theorem We now have all the pieces. Let $X_t$ be a process under $\\mathbb{P}$ with drift $\\mu_t$ and diffusion $\\sigma_t$:\n$$dX_t = \\mu_t dt + \\sigma_t dW^{\\mathbb{P}}_t$$In the random walk above we used a constant $\\theta = (\\mu - r)/\\sigma$. The continuous-time theorem makes no such restriction. Define the market price of risk $\\theta_t$, which is now allowed to vary over time, as the excess drift removed per unit of volatility at each instant. 
Girsanov\u0026rsquo;s theorem states that, provided $\\theta_t$ satisfies the Novikov condition, there exists a measure $\\mathbb{Q}$ whose Radon-Nikodym derivative with respect to $\\mathbb{P}$ is:\n$$\\frac{d\\mathbb{Q}}{d\\mathbb{P}} = \\exp\\left(-\\int_0^T \\theta_t dW^{\\mathbb{P}}_t - \\frac{1}{2}\\int_0^T \\theta_t^2 dt\\right)$$and under which $X_t$ has drift $r_t$ instead of $\\mu_t$:\n$$dX_t = r_t\\,dt + \\sigma_t dW^{\\mathbb{Q}}_t$$The diffusion $\\sigma_t$ is unchanged between the two measures. The paths are the same. Only the drift and the probability weights have changed.\nThe components of the theorem are as follows.\n$\\theta_t = (\\mu_t - r_t)/\\sigma_t$ is the market price of risk: the excess drift removed per unit of volatility at time $t$.\n$\\int_0^T \\theta_t\\,dW^{\\mathbb{P}}_t$ is the stochastic integral capturing how each infinitesimal shock is reweighted over time.\n$\\frac{1}{2}\\int_0^T \\theta_t^2\\,dt$ is the quadratic-variation correction that emerges from exponentiating Brownian motion and keeps the exponential normalized.\n$dW^{\\mathbb{Q}}_t = dW^{\\mathbb{P}}_t + \\theta_t\\,dt$ gives the same Brownian increments, recentred to remove the excess drift.\nThis answers both open questions from earlier. The Girsanov exponential is the explicit Radon-Nikodym derivative relating $\\mathbb{P}$ and $\\mathbb{Q}$. And the theorem guarantees that this reweighting produces a valid probability measure whenever the Novikov condition holds.\nThe Connection to the Feynman-Kac Article We can now close the loop with the Feynman-Kac article. There we took $\\mathbb{Q}$ as given and showed that expectations under it satisfy a PDE. Here we have shown where $\\mathbb{Q}$ comes from and why it is legitimate.\nThe full picture across the two articles is:\nFTAP guarantees a measure $\\mathbb{Q}$ exists under which pricing is simple.\nGirsanov tells us $\\mathbb{Q}$ is obtained from $\\mathbb{P}$ by the explicit Radon-Nikodym derivative above, and that the result is a valid measure under the Novikov condition.\nFeynman-Kac tells us expectations under $\\mathbb{Q}$ satisfy a PDE, giving us a second way to compute prices. 
The PDE approach is particularly valuable when early exercise is possible, cases where computing the expectation directly becomes intractable. Every time we write $\\mathbb{E}^{\\mathbb{Q}}$ in a pricing formula, both Girsanov and FTAP are working quietly in the background.\nWhen Does Girsanov Get Used Explicitly? Situation How Girsanov is Used Switching from $\\mathbb{P}$ to $\\mathbb{Q}$ At the SDE level, it changes the drift from $\\mu$ to $r$. At the distribution level, the Radon–Nikodym derivative reweights entire paths, so that the market price of risk is absorbed into probabilities rather than dynamics Change of numeraire Switching numeraire corresponds to a drift change; Girsanov guarantees each switch produces a valid measure Stochastic volatility models Volatility risk cannot be hedged; its market price of risk is a free parameter and each choice gives a valid $\\mathbb{Q}$ via Girsanov Importance sampling in Monte Carlo Replace the original sampling measure with a more convenient one that makes rare payoff events more likely, and correct expectations using the Radon–Nikodym derivative to preserve unbiased pricing while reducing variance Looking Ahead So far we have taken the risk-free bond as the numeraire, which led us to the risk-neutral measure $\\mathbb{Q}$ and the familiar drift of $r$. But any strictly positive self-financing wealth process can play that role, and each choice produces a different valid measure via exactly the same Girsanov machinery. The next article develops this in full, exploring what happens when we switch to other natural numeraires and how each choice simplifies the pricing of a different class of derivatives.\nUnder the Novikov condition $\\mathbb{E}^{\\mathbb{P}}\\left[\\exp\\left(\\frac{1}{2}\\int_0^T\\theta_t^2dt\\right)\\right] \u003c \\infty$, $Z_t$ is a true martingale rather than merely a local martingale. A local martingale has zero drift locally but may fail to have constant expectation globally. 
The Novikov condition rules this out and ensures $\\mathbb{E}^{\\mathbb{P}}[Z_T] = Z_0 = 1$ holds exactly, which is what the validity argument in the next section requires.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/girsanov/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eWe want to price a derivative. Under the real world measure $\\mathbb{P}$, we face\ntwo problems. First, we do not know the true drift $\\mu$ of the underlying, and\nhistorical estimates are notoriously unreliable. Second, even if we knew $\\mu$,\ntaking the expected payoff under $\\mathbb{P}$ would still not give the market price.\nRisky cash flows must be discounted more heavily than guaranteed ones because\ninvestors are risk averse. Pricing under $\\mathbb{P}$ requires both the true\nprobabilities of outcomes and a model for how the market prices risk. Both are\nfundamentally unobservable. So what can we do?\u003c/p\u003e","title":"Drift Lives in the Measure: An Intuitive Look at Girsanov's Theorem"},{"content":"Why This Matters The first time I encountered the Feynman-Kac theorem, I found it fascinating but unintuitive. The theorem claims that a deterministic PDE and the expectation of a stochastic process are two representations of the same object. A PDE is smooth and deterministic. A stochastic expectation involves randomness, probability measures, and averaging over infinitely many paths. How could these be the same thing? I understood the steps of the proof, but I still didn’t have a clear intuition for why this equivalence should exist.\nI also found myself slightly confused about its role in practice. In derivative pricing, we often work directly with risk-neutral expectations. 
The PDE formulation and the stochastic formulation both appear natural, so it is not immediately obvious what additional insight Feynman–Kac is adding.\nThis article is my attempt to answer both. Starting from a simple random walk, I hope the equivalence feels less like a coincidence by the end, and that it becomes clear where Feynman-Kac sits in derivative valuation and why it matters even when it may appear unnecessary.\nThe Intuition: A Random Walk Before stating any theorem, I want to show through the simplest possible example why a deterministic equation and a stochastic expectation are naturally the same thing. The key is to start from neither, and instead start from something more primitive.\nSetup Consider a particle that can sit at any integer position on a line. Starting from position $x$ at time $t$, at each discrete time step of size $\\Delta t$, the particle moves up by $\\Delta x$ or down by $\\Delta x$ with equal probability $\\frac{1}{2}$. At the final time $T$, we collect a payoff $g(X_T)$ depending on where the particle ends up.\nWe want to find a function $u(x, t)$ that tells us the fair value of this payoff at any position $x$ and time $t$ before expiry.\nThe Averaging Property We do not yet say what $u$ is: not an expectation, not the solution to a PDE. We only impose one requirement: $u$ must be consistent with the random walk. That is, the value at $(x, t)$ must equal the average of the values at the two positions the particle could reach at the next step:\n$$u(x, t) = \\frac{1}{2}u(x + \\Delta x, t + \\Delta t) + \\frac{1}{2}u(x - \\Delta x, t + \\Delta t)$$with the boundary condition $u(x, T) = g(x)$.\nThis is the only thing we are asking of $u$. 
If we know the value at every position at time $t + \\Delta t$, the value at time $t$ must be the average of the two possible next positions.\nTwo Consequences of the Same Property This single averaging requirement has two very different looking consequences, and this is the heart of the intuition.\nConsequence 1: $u$ satisfies a PDE. Rearranging the averaging equation and subtracting $u(x, t + \\Delta t)$ from both sides:\n$$u(x, t) - u(x, t + \\Delta t) = \\frac{1}{2}\\left[u(x + \\Delta x, t + \\Delta t) - 2u(x, t + \\Delta t) + u(x - \\Delta x, t + \\Delta t)\\right]$$The left side is a difference in time. The right side is a second difference in space. Dividing through by $\\Delta t$, using the diffusion scaling $\\frac{(\\Delta x)^2}{\\Delta t} = 1$ (equivalently $\\Delta x = \\sqrt{\\Delta t}$, the same scaling condition established in Brownian Motion: From Random Walks to Option Prices), and taking $\\Delta t \\to 0$, $\\Delta x \\to 0$, we obtain:\n$$\\frac{\\partial u}{\\partial t} + \\frac{1}{2}\\frac{\\partial^2 u}{\\partial x^2} = 0 \\quad \\text{with } u(x, T) = g(x)$$The key point is not the algebra itself, but the structure: the local averaging rule forces a second-order spatial structure in the limit. That structure is the PDE.\nConsequence 2: $u$ equals a stochastic expectation. The averaging property also tells us how to compute $u$ by working forward in time. Starting from position $x$ at time $t$, at each step the particle moves up or down with equal probability. After two steps there are four possible positions, after three steps there are eight, and so on. This generates a binary tree of possible paths, where each branch represents one possible realization of the particle\u0026rsquo;s journey from $t$ to $T$.\nEach path through the tree has a probability: since every step is equally likely, a path consisting of $k$ up-moves and $n-k$ down-moves over $n$ total steps has probability $\\left(\\frac{1}{2}\\right)^n$. 
The value $u(x, t)$ is the average of $g(X_T)$ weighted by these path probabilities, which is exactly the expectation of $g(X_T)$ over all paths:\n$$u(x, t) = \\mathbb{E}\\left[g(X_T) \\mid X_t = x\\right]$$In the continuous limit, as $\\Delta t \\to 0$ and $\\Delta x \\to 0$, the binary tree of discrete paths converges to Brownian motion, and the sum over tree paths becomes an expectation over continuous paths. The tree has not disappeared; it has become the probability measure over continuous paths that defines the expectation.\nSummary We started from one primitive requirement: consistency with a local averaging rule. That single condition leads to a deterministic PDE when viewed infinitesimally, and a stochastic expectation when viewed globally. The PDE and the expectation are not two different models that happen to agree. They are two ways of reading the same underlying structure: one in the language of calculus, one in the language of probability. This is the intuition behind Feynman-Kac.\nFeynman-Kac: The Theorem Let $X_t$ be a stochastic process under measure $\\mathbb{Q}$ defined by:\n$$dX = \\mu(X, t) dt + \\sigma(X, t) dW^{\\mathbb{Q}}$$Consider the function $u(x, t)$ defined as the stochastic expectation:\n$$u(x, t) = \\mathbb{E}^{\\mathbb{Q}}\\left[e^{-\\int_t^T r(X_s, s)\\,ds} g(X_T) \\middle| X_t = x\\right]$$and the PDE:\n$$\\frac{\\partial u}{\\partial t} + \\mu(x, t)\\frac{\\partial u}{\\partial x} + \\frac{1}{2}\\sigma^2(x, t)\\frac{\\partial^2 u}{\\partial x^2} - r(x, t)u = 0$$with terminal condition $u(x, T) = g(x)$.\nFeynman-Kac states that these two representations are equivalent.1\nIf $u(x,t)$ is defined by the expectation above, then it satisfies the PDE. If $u(x,t)$ is defined as the solution to the PDE above, then it has the stochastic representation given by the expectation. 
PDE component Stochastic counterpart Drift $\\mu \\frac{\\partial u}{\\partial x}$ Drift of $X_t$ Diffusion $\\frac{1}{2}\\sigma^2 \\frac{\\partial^2 u}{\\partial x^2}$ Diffusion of $X_t$ Discounting $-r u$ Discount factor $e^{-\\int_t^T rds}$ inside the expectation Terminal condition $u(x, T) = g(x)$ Payoff function $g(X_T)$ Two Paths to Derivative Valuation When I first learned derivatives pricing, the PDE approach and the martingale approach were presented as two separate tools to reach for depending on the problem. It took me a while to appreciate that they are not just compatible but provably equivalent, and that Feynman-Kac is precisely what makes that equivalence rigorous. To see why, it helps to understand what each approach delivers on its own.\nPath 1: The PDE Approach The PDE approach starts from no-arbitrage. We construct a delta-hedged portfolio, eliminate the stochastic term, and impose that any risk-free portfolio must earn the risk-free rate. In the Black-Scholes setting for a futures option, this gives:\n$$\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2} - rV = 0, \\quad V(F, T) = g(F)$$The PDE is grounded in no-arbitrage from the start. Any function that solves it is, by construction, consistent with the requirement that a delta-hedged portfolio cannot earn more than the risk-free rate. Solve it once on a grid in $(F, t)$ space and we obtain prices across all underlying levels and all times before expiry in a single pass.\nPath 2: The Martingale Approach The martingale approach starts from a different principle. Under the risk-neutral measure $\\mathbb{Q}$, the no-arbitrage condition is equivalent to discounted asset prices being martingales. From this, any derivative can be priced as the expected discounted payoff:\n$$V(F, t) = \\mathbb{E}^{\\mathbb{Q}}\\left[e^{-r(T-t)}g(F_T) \\mid \\mathcal{F}_t\\right]$$This is a clean and flexible framework. 
Prices can be computed by Monte Carlo, by numerical integration, or analytically in some cases. But there is something this formula does not immediately provide: a guarantee that the $V$ it defines is consistent with no-arbitrage.\nDefining $V$ as a conditional expectation makes it a well-posed mathematical object. It does not automatically make it an economically valid price. For that, we need to know that this $V$ satisfies the same equation that the delta-hedging argument produces. If it did not, the two approaches would give different prices for the same derivative, which would itself be an arbitrage.\nWhere Feynman-Kac Comes In Applying Feynman-Kac to the martingale pricing formula, where $F_t$ under $\\mathbb{Q}$ has zero drift and diffusion $\\sigma F$, tells us that $V$ defined by the expectation satisfies:\n$$\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2} - rV = 0, \\quad V(F, T) = g(F)$$This is exactly the Black-Scholes PDE. The two approaches are not just compatible in the cases we can solve by hand. They are guaranteed to produce the same function for any well-posed diffusion model, whether or not an analytical solution exists. In simple models like Black-Scholes this equivalence can feel almost unnecessary, but in more complex models such as stochastic volatility settings, where closed-form solutions are no longer available, Feynman-Kac provides the rigorous link that ensures the PDE formulation and the expectation formulation remain consistent representations of the same quantity.\nWhen Does It Matter Which Representation We Use? Both representations are mathematically equivalent but not equally convenient for every problem. 
In practice, choosing between a PDE solver and Monte Carlo is one of the more common decisions in quantitative work, and the right answer depends on the structure of the problem.\nSituation Preferred approach Reason Computing smooth Greeks PDE Finite differences on the grid are stable; Monte Carlo differentiation is noisy Model calibration PDE Each calibration iteration requires a fast, deterministic price; Monte Carlo is slower and introduces noise into the objective function Pricing across a range of underlying scenarios PDE A single grid solve covers all $F$ at once; Monte Carlo requires a separate simulation per scenario High-dimensional underlyings (basket options) Monte Carlo PDE grid grows exponentially in dimension; simulation cost scales with paths Path-dependent payoffs (Asian, barrier) Monte Carlo Path history requires extra state variables, turning a 2D grid into 3D or higher; simulation handles it naturally by following the full path Validating a PDE implementation Monte Carlo Feynman-Kac guarantees a simulation-based check that should agree with the grid Beyond Monte Carlo and PDE Both the PDE and Monte Carlo perspectives assume a fixed underlying random evolution and differ only in how that evolution is computed. The PDE approach propagates structure deterministically, while Monte Carlo propagates it through sampled paths. Feynman–Kac tells us these are not competing methods but two representations of the same object.\nThis naturally leads to another question: we are computing expectations over paths, so why should we be committed to a single way of assigning probabilities to those paths in the first place? In many problems, the same physical or financial system can be described with different probabilistic weightings of the same trajectories, and some of these representations make computation or analysis significantly simpler than others. 
Understanding how such reweightings can change the apparent dynamics without changing the value of expectations is the subject of Girsanov’s theorem, which I will discuss in the next article.\nThe theorem holds under standard regularity conditions on $\\mu$, $\\sigma$, $r$, and $g$; we assume these are satisfied throughout.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/feynman_kac/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eThe first time I encountered the Feynman-Kac theorem, I found it fascinating but\nunintuitive. The theorem claims that a deterministic PDE and the expectation of\na stochastic process are two representations of the same object. A PDE is smooth and\ndeterministic. A stochastic expectation involves randomness, probability measures, and\naveraging over infinitely many paths. How could these be the same thing? I understood the steps of the proof, but I still didn’t have a clear intuition for why this equivalence should exist.\u003c/p\u003e","title":"How Randomness Solves a Deterministic Equation: An Intuitive Look at the Feynman–Kac Theorem"},{"content":"Why This Matters When I first studied options, most textbook examples were equity-style: you pay a premium upfront, and at expiry (or whenever you choose to exercise, if the option is American), you receive the payoff. That framing stuck with me for a long time.\nWhen I started working on commodity derivatives, I encountered a different world. Many options in commodity markets are traded under futures-style margining. No premium changes hands at inception, and instead the option is margined daily like a futures contract. 
This is common across a wide range of exchange-traded products: options on WTI crude oil futures at the CME, options on Henry Hub natural gas futures, options on corn and wheat futures, and options on carbon emissions futures, to name a few.\nA standard assumption in practice is that American futures-style options are valued identically to their European counterparts. When I first went looking for an explanation of why early exercise has no benefit, the most common answer I found was something like:\nDaily marking-to-market removes the time-value-of-money advantage that usually justifies early exercise for American options.\nThat statement makes some intuitive sense, but it never gave me the mathematical comfort I was looking for.\nTo really understand why American and European options coincide under futures-style margining, I found it helpful to break the problem into two smaller steps along two separate dimensions:\nMargining convention: futures-style margining (FSM) vs. equity-style margining (ESM), both applied to options on futures.\nExercise style: American vs. European.\nThe first step is to get a clear understanding of the margining dimension: what is the difference between a futures-style and an equity-style European option on a futures contract, and how does the change in margining convention affect the PDE and the meaning of the quantities in it? This is less obvious than it first appears, and getting it right is the key to everything that follows.\nThe second step, showing that the American early exercise feature has no value under futures-style margining, turns out to require no additional mathematical heavy lifting. It follows almost immediately from a simple cash flow argument.\nEuropean Options — Equity-Style vs. Futures-Style Margining I want to compare the valuation difference from the PDE perspective. We will derive the PDE from scratch. 
Let $F_t$ denote the futures price at time $t$, assumed to follow geometric Brownian motion under the risk-neutral measure $\\mathbb{Q}$:\n$$dF = \\sigma FdW^{\\mathbb{Q}}$$There is no drift term. Under the risk-neutral measure, futures prices are martingales since entering a futures contract requires no capital (other than the initial margin required by the exchange).\nConsider a European option with value $V = V(F, t)$. We construct a delta-hedged portfolio $\\Pi$ consisting of a long position in the option and a position in $\\Delta$ futures contracts:\n$$\\Pi = V - \\Delta F$$where $\\Delta F$ denotes the notional of the futures hedge, not its market value (futures have zero value at inception).\nThe Equity-Style Case In the equity-style world, $V$ is the cash premium paid upfront. Since futures require no upfront investment, the cost of the portfolio is simply the option value $V$.\nApplying Itô\u0026rsquo;s lemma to $V(F, t)$:\n$$dV = \\frac{\\partial V}{\\partial t}dt + \\frac{\\partial V}{\\partial F}dF + \\frac{1}{2}\\frac{\\partial^2 V}{\\partial F^2}(dF)^2$$The change in the portfolio value is:\n$$d\\Pi = dV - \\Delta dF = \\left(\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2}\\right)dt + \\left(\\frac{\\partial V}{\\partial F} - \\Delta\\right)\\sigma F dW^{\\mathbb{Q}}$$Setting $\\Delta = \\frac{\\partial V}{\\partial F}$ eliminates the stochastic term. The portfolio is now instantaneously risk-free:\n$$d\\Pi = \\left(\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2}\\right)dt$$No-arbitrage condition: a risk-free portfolio must earn the risk-free rate $r$. Since the cash invested equals $V$, we require $d\\Pi = rV dt$. 
Setting the two expressions equal and rearranging:\n$$\\boxed{\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2} - rV = 0}$$with terminal condition $V(F, T) = $ option payoff.\nThe $-rV$ term is the cost of carry on the cash investment $V$. It is present because $V$ is the money the holder has paid out and it must earn the risk-free rate to break even. The solution is the Black (1976) formula with a discount factor:\n$$V^{\\text{equity}}(F, t) = e^{-r(T-t)}\\left[F N(d_1) - K N(d_2)\\right]$$where $d_1, d_2$ are the standard Black expressions.\nThe Futures-Style Case What Changes Under futures-style margining, no cash premium is paid at inception. Instead, the option is margined daily: if the exchange\u0026rsquo;s settlement price moves from $V_t$ to $V_{t+dt}$, the holder receives (or pays) $dV = V_{t+dt} - V_t$ through their margin account. The option position itself requires zero initial cash outlay.\nThis changes the no-arbitrage argument in a subtle but critical way.\nDerivation of the Futures-Style PDE Construct the same delta-hedged portfolio. Setting $\\Delta = \\frac{\\partial V}{\\partial F}$ eliminates the stochastic term as before. The instantaneous risk-free P\u0026amp;L of this portfolio is:\n$$d\\Pi = \\left(\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2}\\right)dt$$Now apply the no-arbitrage condition. The portfolio consists of:\nOption position (no cash outlay — futures-style, no premium paid) Futures position (no cash outlay — futures require no upfront payment) The total cash invested in this portfolio is zero. Since the hedged portfolio requires no initial capital and is instantaneously riskless, any non-zero deterministic drift would imply arbitrage. Therefore its drift must vanish:\n$$d\\Pi = 0$$$$\\boxed{\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2} = 0}$$with the same terminal condition. 
The $-rV$ term is gone because there is no cash investment to carry.\n$V$ as a Martingale Under the Risk-Neutral Measure The absence of the $-rV$ term has a direct probabilistic interpretation. Applying Itô\u0026rsquo;s lemma to $V(F, t)$ under the risk-neutral measure and substituting $dF = \\sigma F dW^{\\mathbb{Q}}$:\n$$dV = \\left(\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2}\\right)dt + \\frac{\\partial V}{\\partial F}\\sigma F dW^{\\mathbb{Q}}$$$V$ satisfies the futures-style PDE, which requires $\\frac{\\partial V}{\\partial t} + \\frac{1}{2}\\sigma^2 F^2 \\frac{\\partial^2 V}{\\partial F^2} = 0$. The $dt$ term therefore vanishes, leaving:\n$$dV = \\frac{\\partial V}{\\partial F}\\sigma F dW^{\\mathbb{Q}}$$Therefore $V$ is a martingale under $\\mathbb{Q}$. This is the direct counterpart to the futures price $F$ itself being a martingale under $\\mathbb{Q}$: just as $F$ requires no discounting because entering a futures contract requires no cash outlay, $V$ requires no discounting because the futures-style option requires no upfront premium. Being a martingale, $V$ satisfies:\n$$V(F_t, t) = \\mathbb{E}^{\\mathbb{Q}}\\left[V(F_T, T) \\middle|\\mathcal{F}_t\\right] = \\mathbb{E}^{\\mathbb{Q}}\\left[\\text{Payoff}(F_T) \\middle|\\mathcal{F}_t\\right]$$where $\\text{Payoff}(F_T) = \\max(F_T - K, 0)$ for a call and $\\text{Payoff}(F_T) = \\max(K - F_T, 0)$ for a put.\nThat is, the futures-style MTM at any point in time is the risk-neutral expectation of the terminal payoff with no discount factor applied. We will rely on one consequence of this in the American options section:\n$$V(F_t, t) \u003e \\text{Intrinsic Value}(F_t) \\quad \\text{for all } t \u003c T$$The strict inequality holds by a simple no-arbitrage argument for the lower bound, plus an intuitive observation for the strictness. First, $V$ can never fall below intrinsic value. 
If it did, one could buy the option and immediately exercise it for a riskless profit. Second, $V$ must be strictly greater than intrinsic value as long as time and volatility remain, because the option holder benefits from any further favourable move in $F_T$ before expiry, while being fully protected against unfavourable moves by the payoff floor at zero. This asymmetry between favourable participation and downside protection always commands a strictly positive premium above intrinsic value whenever $F_t \u003e 0$.\nComparing the Two Worlds We can now contrast the two margining conventions clearly.\nEquity-Style Futures-Style Premium at inception Paid upfront in cash Zero — no cash changes hands What $V$ represents Present value of the option Exchange MTM settlement price PDE $V_t + \\frac{1}{2}\\sigma^2F^2V_{FF} - rV = 0$ $V_t + \\frac{1}{2}\\sigma^2F^2V_{FF} = 0$ Probabilistic form $e^{-r(T-t)}\\mathbb{E}^{\\mathbb{Q}}[\\text{payoff}]$ $\\mathbb{E}^{\\mathbb{Q}}[\\text{payoff}]$ Cash flow to holder Premium $V$ paid at $t_0$, payoff received at $T$ Daily margin flows $dV$, summing to payoff at $T$ In the equity-style world, $V(F, t)$ is the fair cash amount to exchange today for the right to receive the option payoff at expiry. It is a present value in the traditional sense.\nIn the futures-style world, $V(F, t)$ is the exchange\u0026rsquo;s mark-to-market settlement quote, used to compute each day\u0026rsquo;s margin flow. It is not paid or received as a lump sum. It always strictly exceeds intrinsic value before expiry.\nThe closed-form solution to the futures-style PDE is:\n$$V^{\\text{futures}}(F, t) = F N(d_1) - K N(d_2)$$Comparing with the equity-style solution:\n$$V^{\\text{futures}} = e^{r(T-t)} V^{\\text{equity}}$$The futures-style MTM exceeds the equity-style present value by exactly $e^{r(T-t)}$. 
The futures-style holder collects the same economic cash flows as the equity-style holder but without paying anything upfront, so the quoted price is scaled up by the cost of carry that the equity-style holder effectively prepays.\nAmerican Options Under Futures-Style Margining We now turn to the central question. In the equity-style world, American options can be worth more than European options. Early exercise can be optimal when the intrinsic value in hand, reinvested at $r$, exceeds the value of waiting (discussed in Early Exercise of American Options: Call Equivalence and the Put Premium). Does the same logic apply under futures-style margining?\nThe answer is no, and we can see why by simply looking at what happens to the cash flows and payoffs when the holder exercises early.\nSetup and Notation Consider an American put option on a futures contract, traded under futures-style margining, with strike $K$, expiry at time $T$, and current time $t_0$. The exchange publishes a daily MTM settlement price for the option. We denote this settlement price on day $i$ as $V_i$, where:\n$$V_i = V(F_i, t_i)$$is the futures-style option MTM as defined above. Recall that $V_i$ is not a present value but the exchange-quoted settlement price used to compute each day\u0026rsquo;s margin flow. The holder receives $V_i - V_{i-1}$ on day $i$ through their margin account.\nAt expiry on day $n$, the settlement price converges to intrinsic value:\n$$V_n = \\max(K - F_n, 0)$$Exercise Mechanics When the holder of a futures-style American put exercises on day $m$, the following happens in sequence:\nThe regular daily margin flow $V_m - V_{m-1}$ is settled as usual through the margin account. This happens regardless of exercise. The option position is submitted for exercise. The exchange assigns the holder a short futures position at the strike price $K$. 
Since the current futures price is $F_m$, this newly assigned position is immediately marked to market, and the margin account is credited with $K - F_m$ (assuming the put is in the money). The holder may then close out the short futures position at $F_m$ at no further cost, or carry it forward. The option is extinguished. No further option margin flows occur from day $m+1$ onward.\nIt is the combination of steps 2 and 3 that determines whether early exercise is beneficial. Step 2 delivers the intrinsic value $\\max(K - F_m, 0)$ through futures assignment, while step 3 forfeits all future option margin flows. The question is whether the amount received in step 2 compensates for what is given up in step 3.\nEarly Exercise on Day $m$ Suppose the holder exercises the American put early on day $m$, where $m \u003c n$. The complete cash flows over the life of the position are:\nDay Cash Flow Sign 1 $V_1 - V_0$ positive or negative 2 $V_2 - V_1$ positive or negative $\\vdots$ $\\vdots$ $\\vdots$ $m$ (regular margin flow) $V_m - V_{m-1}$ positive or negative $m$ (exercise payoff) $\\max(K - F_m, 0) - V_m$ always $\u003c 0$ $m+1, \\ldots, n$ 0 (option is extinguished)\nOn the exercise day, after the regular margin flow is settled, the holder receives the intrinsic value $\\max(K - F_m, 0)$ through futures assignment.\nFrom the no-arbitrage argument established in the previous section:\n$$V_m \u003e \\max(K - F_m, 0) \\quad \\text{for all } m \\lt n$$The exercise payoff row in the table is therefore always $\u003c 0$ as long as $F_m \u003e 0$. There is never a reason to exercise early, so the American feature has no value. Although we have used a put option to illustrate the mechanics, the same argument holds symmetrically for call options.\n$$\\boxed{V^{\\text{American, futures-style}} = V^{\\text{European, futures-style}}}$$ Conclusion The central insight of this article is that the quantity $V$ means something fundamentally different under each margining convention. 
In the equity-style world, $V$ is a present value, which creates a tradeoff between immediate exercise and continued optionality. In the futures-style world, $V$ is a martingale under the risk-neutral measure, a forward-like quantity that always strictly exceeds intrinsic value before expiry. Once that is understood, the conclusion for American options follows directly from the cash flow mechanics: exercising early forfeits the option\u0026rsquo;s remaining time value. The early exercise feature is contractually present but economically worthless, and American and European futures-style options are priced identically.\nReferences Black, F. (1976). The pricing of commodity contracts. Journal of Financial Economics, 3(1), 167–179. ","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/future_style_margining_options/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eWhen I first studied options, most textbook examples were equity-style:\nyou pay a premium upfront, and at expiry (or whenever you choose to exercise, if the\noption is American), you receive the payoff. That framing stuck with me for a long time.\u003c/p\u003e\n\u003cp\u003eWhen I started working on commodity derivatives, I encountered a different world. Many\noptions in commodity markets are traded under futures-style margining. No premium\nchanges hands at inception, and instead the option is margined daily like a futures\ncontract. This is common across a wide range of exchange-traded products: options on WTI\ncrude oil futures at the CME, options on Henry Hub natural gas futures, options on corn\nand wheat futures, and options on carbon emissions futures, to name a few.\u003c/p\u003e","title":"Futures-Style Margined Options: The Absence of Early Exercise Premium"},{"content":"Why This Matters Most of my early intuition about options came from the Black-Scholes model, which is clean and widely used. 
But once I started working with real option data, it became clear that the Black-Scholes assumption of a lognormal distribution is too restrictive. For a given maturity, the implied volatility is not constant across strikes, and its shape suggests asymmetry and heavy tails in the risk-neutral distribution.\nThat leads to a more basic question. Instead of imposing a parametric model and calibrating its parameters, is there a way to extract volatility, skewness, and kurtosis directly from option prices in a model-free way? This is where the Bakshi, Kapadia, and Madan (2003) framework becomes useful. Their key idea is that smooth payoff functions can be represented as a continuum of vanilla options across strikes. In this view, volatility, skewness, and kurtosis are not model assumptions or calibration outputs. They are quantities that can be recovered from market prices through static option replication.\nThis article is my attempt to explain the math behind their ideas. But before getting into their formulas, I first revisit a simple form of Taylor\u0026rsquo;s theorem, which turns out to be the key foundation behind their representation.\nTaylor\u0026rsquo;s Theorem with Integral Remainder Let $H$ be a twice continuously differentiable function and let $\\bar{S}$ be a reference point. The first-order Taylor expansion with integral remainder states:\n$$H(S) = H(\\bar{S}) + H'(\\bar{S})(S - \\bar{S}) + \\int_{\\bar{S}}^{S} H''(K)(S - K)dK \\tag{1}$$Derivation. 
Start with the fundamental theorem of calculus:\n$$H(S) = H(\\bar{S}) + \\int_{\\bar{S}}^{S} H'(t)dt$$Apply the fundamental theorem of calculus again to $H'(t)$:\n$$H'(t) = H'(\\bar{S}) + \\int_{\\bar{S}}^{t} H''(u)du$$Substitute into the first equation:\n$$H(S) = H(\\bar{S}) + \\int_{\\bar{S}}^{S} \\left[ H'(\\bar{S}) + \\int_{\\bar{S}}^{t} H''(u)du \\right] dt$$Since $H'(\\bar{S})$ is constant with respect to $t$, this separates into:\n$$H(S) = H(\\bar{S}) + H'(\\bar{S})(S - \\bar{S}) + \\int_{\\bar{S}}^{S}\\int_{\\bar{S}}^{t} H''(u)dudt$$It remains to simplify the double integral. The integration runs over the triangular region:\n$$\\{(u, t) : \\bar{S} \\leq u \\leq t \\leq S\\}$$Switching the order of integration — integrating over $t$ first, then $u$ — this same region is described by $\\bar{S} \\leq u \\leq S$ and $u \\leq t \\leq S$:\n$$\\int_{\\bar{S}}^{S}\\int_{\\bar{S}}^{t} H''(u)\\,du\\,dt = \\int_{\\bar{S}}^{S}\\int_{u}^{S} H''(u)dtdu$$Since $H''(u)$ is constant with respect to $t$, the inner integral evaluates to $(S - u)$:\n$$\\int_{\\bar{S}}^{S}\\int_{u}^{S} H''(u)\\,dt\\,du = \\int_{\\bar{S}}^{S} H''(u)(S - u)du$$Substituting back and renaming the dummy variable $u \\to K$:\n$$H(S) = H(\\bar{S}) + H'(\\bar{S})(S - \\bar{S}) + \\int_{\\bar{S}}^{S} H''(K)(S - K)dK \\qquad \\blacksquare$$ Pricing a Claim via Static Replication Equation (1) holds for any $S$ and any reference point $\\bar{S}$, but the remainder integral runs between $\\bar{S}$ and $S$, which depends on the realized value of $S$. To express the remainder in terms of traded instruments, i.e. calls and puts at fixed strikes, we rewrite it by separating the call and put payoffs. 
Noting that for any $K$:\n$$(S - K)^+ - (K - S)^+ = S - K$$we can split the single integral over $[\\bar{S}, S]$ into an integral of call payoffs over all strikes above $\\bar{S}$ and an integral of put payoffs over all strikes below $\\bar{S}$:\n$$H(S) = H(\\bar{S}) + H'(\\bar{S})(S - \\bar{S}) + \\int_{\\bar{S}}^{\\infty} H''(K)(S - K)^+ dK + \\int_0^{\\bar{S}} H''(K)(K - S)^+dK \\tag{2}$$This works because when $S \u003e \\bar{S}$, the term $(S - K)^+$ is zero for all $K \u003e S$, so the first integral effectively only collects contributions from $K \\in [\\bar{S}, S]$, which matches the original remainder in equation (1). The second integral vanishes entirely since $(K - S)^+ = 0$ for $K \\leq \\bar{S} \u003c S$. The case $S \\leq \\bar{S}$ is symmetric: only the put integral contributes.\nEquation (2) decomposes any smooth payoff into three components: a bond position of size $H(\\bar{S}) - \\bar{S} H'(\\bar{S})$, a stock position of size $H'(\\bar{S})$, and a continuum of calls and puts at every strike, each weighted by the second derivative $H''(K)$ of the payoff evaluated at that strike.\nNow set $\\bar{S} = S_0$, where $S_0$ is the known stock price at the pricing date and $S_t$ is the unknown stock price at maturity. This departs slightly from BKM\u0026rsquo;s original notation, where they use $S_t$ for the known current price and $S$ for the unknown future price. I find the $S_0 / S_t$ convention cleaner for distinguishing constants from random variables.\nApplying risk-neutral valuation to both sides of equation (2) gives the arbitrage-free price of the claim:\n$$e^{-rt}E^Q[H(S_t)]=[H(S_0) - S_0 H'(S_0)]e^{-rt} + H'(S_0)S_0 + \\int_{S_0}^{\\infty} H''(K)C(0,K)dK+ \\int_0^{S_0} H''(K)P(0,K)dK \\tag{3}$$where $C(0,K)$ and $P(0,K)$ are the time-$0$ prices of European calls and puts with strike $K$ and maturity $t$. 
The bond and stock terms follow from the facts that $H(S_0)$ and $H'(S_0)$ are constants at time $0$, and that the no-arbitrage forward price satisfies $e^{-rt}\\mathbb{E}^Q[S_t] = S_0$.1\nPayoff Functions for the Volatility, Cubic, and Quartic Contracts Define the log return over the horizon $t$ as:\n$$R_t = \\ln\\frac{S_t}{S_0}$$BKM propose three contracts whose payoffs are defined by powers of this return:\nContract Payoff $H(S_t)$ Volatility $\\left(\\ln \\dfrac{S_t}{S_0}\\right)^2$ Cubic $\\left(\\ln \\dfrac{S_t}{S_0}\\right)^3$ Quartic $\\left(\\ln \\dfrac{S_t}{S_0}\\right)^4$ The cubic contract captures asymmetry in the return distribution and the quartic contract captures tail heaviness. Their risk-neutral discounted values are:\n$$V(0,t) = e^{-rt}\\mathbb{E}^Q\\left[R_t^2\\right], \\quad W(0,t) = e^{-rt}\\mathbb{E}^Q\\left[R_t^3\\right], \\quad X(0,t) = e^{-rt}\\mathbb{E}^Q\\left[R_t^4\\right]$$By equation (3), each can be replicated by a static portfolio of options once we compute $H'(S_0)$ and $H''(K)$ for each payoff and substitute into the replication formula. That is the task of the next section.\nComputing the Option Weights Replicating each contract requires two ingredients: the first derivative $H'(S_0)$ evaluated at the current stock price, and the second derivative $H''(K)$ evaluated at each strike $K$. The first derivative determines the stock position and the second derivative determines the weight on each option. We compute these for all three contracts in turn and show a graph of the option weights at the end of the section. I plan to discuss the intuition of the option weights profile in a separate article.\n1. 
The Volatility Contract $$H(S_t) = \\left(\\ln\\frac{S_t}{S_0}\\right)^2$$Taking derivatives with respect to $S_t$, evaluated at strike $K$:\n$$H'(S_t) = \\frac{2\\ln\\frac{S_t}{S_0}}{S_t}, \\qquad H''(K) = \\frac{2 - 2\\ln\\frac{K}{S_0}}{K^2} = \\frac{2\\left(1 - \\ln\\frac{K}{S_0}\\right)}{K^2}$$At $S_t = S_0$, the log term vanishes: $H'(S_0) = 0$. This means the stock position is zero and the bond position $H(S_0) - S_0 H'(S_0) = 0$ vanishes as well. The replication formula reduces entirely to the option integrals:\n$$V(0,t) = \\int_{S_0}^{\\infty} \\frac{2\\left(1 - \\ln\\frac{K}{S_0}\\right)}{K^2} C(0,K)dK + \\int_0^{S_0} \\frac{2\\left(1 + \\ln\\frac{S_0}{K}\\right)}{K^2} P(0,K)dK$$where for the put integral we used the fact that $-\\ln(K/S_0) = \\ln(S_0/K)$ for $K \u003c S_0$. The weight on OTM calls, $2(1 - \\ln(K/S_0))/K^2$, is positive for near-the-money strikes but turns negative for strikes above $eS_0$, approximately 2.7 times the current stock price. In practice this sign change has little numerical consequence: option prices at such extreme strikes are close to zero, and the $K^2$ denominator suppresses the contribution further. The put weight $2(1 + \\ln(S_0/K))/K^2$ is always positive. Overall, the volatility contract is long calls and puts across nearly all practically relevant strikes.\n2. The Cubic Contract $$H(S_t) = \\left(\\ln\\frac{S_t}{S_0}\\right)^3$$Taking derivatives:\n$$H'(S_t) = \\frac{3\\left(\\ln\\frac{S_t}{S_0}\\right)^2}{S_t}, \\qquad H''(K) = \\frac{6\\ln\\frac{K}{S_0} - 3\\left(\\ln\\frac{K}{S_0}\\right)^2}{K^2}$$At $S_t = S_0$, the log term again vanishes: $H'(S_0) = 0$, so the stock and bond positions are both zero. The replication formula is:\n$$W(0,t) = \\int_{S_0}^{\\infty} \\frac{6\\ln\\frac{K}{S_0} - 3\\left(\\ln\\frac{K}{S_0} \\right)^2}{K^2} C(0,K)\\,dK - \\int_0^{S_0} \\frac{6\\ln\\frac{S_0}{K} + 3\\left(\\ln\\frac{S_0} {K}\\right)^2}{K^2} P(0,K)\\,dK$$The sign pattern here is economically meaningful. 
The cubic contract is long calls and short puts across the practically relevant strikes (the call weight turns negative only beyond $K = e^2 S_0$). When the risk-neutral distribution is left-skewed, OTM puts are expensive relative to OTM calls, so the short put leg outweighs the long call leg and the cubic contract value $W$ turns negative. A more negative $W$ corresponds to a more left-skewed distribution.\n3. The Quartic Contract $$H(S_t) = \\left(\\ln\\frac{S_t}{S_0}\\right)^4$$Taking derivatives:\n$$H'(S_t) = \\frac{4\\left(\\ln\\frac{S_t}{S_0}\\right)^3}{S_t}, \\qquad H''(K) = \\frac{12\\left(\\ln\\frac{K}{S_0}\\right)^2 - 4\\left(\\ln\\frac{K}{S_0}\\right)^3}{K^2}$$At $S_t = S_0$: $H'(S_0) = 0$, so again the stock and bond positions vanish. The replication formula is:\n$$X(0,t) = \\int_{S_0}^{\\infty} \\frac{12\\left(\\ln\\frac{K}{S_0}\\right)^2 - 4\\left( \\ln\\frac{K}{S_0}\\right)^3}{K^2} C(0,K)\\,dK + \\int_0^{S_0} \\frac{12\\left(\\ln\\frac{S_0} {K}\\right)^2 + 4\\left(\\ln\\frac{S_0}{K}\\right)^3}{K^2} P(0,K)\\,dK$$The quartic contract is long both calls and puts (the call weight turns negative only beyond $K = e^3 S_0$). Unlike the volatility contract, however, the weights grow with distance from $S_0$: deep out-of-the-money options receive progressively larger weights. This makes the quartic contract especially sensitive to tail options, which is exactly what we want from a kurtosis measure.\nSummary of Option Weights It is useful to collect the results in a single table. 
Define $m = \\ln(K/S_0)$ for calls ($K \u003e S_0$) and $m = \\ln(S_0/K)$ for puts ($K \u003c S_0$), so $m \u003e 0$ in both cases.\nContract | Weight on OTM calls | Weight on OTM puts | Direction\nVolatility | $\\dfrac{2(1-m)}{K^2}$ | $\\dfrac{2(1+m)}{K^2}$ | Long calls and puts\nCubic | $\\dfrac{6m - 3m^2}{K^2}$ | $-\\dfrac{6m + 3m^2}{K^2}$ | Long calls, short puts\nQuartic | $\\dfrac{12m^2 - 4m^3}{K^2}$ | $\\dfrac{12m^2 + 4m^3}{K^2}$ | Long calls and puts\nFigure: option weights by strike for the three contracts, with OTM puts to the left of $S_0$ and OTM calls to the right. Volatility weights are both positive and largest near the money; cubic weights are long calls and short puts; quartic weights are both positive and grow in the tails.\nRelating Contract Prices to Risk-Neutral Moments We now have the three contract prices $V(0,t)$, $W(0,t)$, and $X(0,t)$, each recoverable from observed option prices. The final step is to assemble these into the familiar statistical moments: variance, skewness, and kurtosis of the risk-neutral return distribution.\nThe Risk-Neutral Mean Before computing the centered moments, we need the risk-neutral mean log return:\n$$\\mu_t = \\mathbb{E}^Q[R_t]$$The mean $\\mu_t$ can be recovered from option prices using exactly the same static replication approach as the other three contracts: apply equation (3) to the payoff $H(S_t) = \\ln(S_t/S_0) = R_t$, for which $H(S_0) = 0$, $H'(S_0) = 1/S_0$, and $H''(K) = -1/K^2$. The result is:\n$$\\mu_t = e^{rt} - 1 - e^{rt}\\int_{S_0}^{\\infty} \\frac{C(0,K)}{K^2}dK - e^{rt}\\int_0^{S_0} \\frac{P(0,K)}{K^2}dK$$So $\\mu_t$ is fully determined by observed option prices with no model assumptions, just like the volatility, cubic, and quartic contracts.\nWith $\\mu_t$ in hand, the centered moments follow from standard moment algebra. Write $\\hat{R}_t = R_t - \\mu_t$ for the demeaned return.\nVariance. 
The second centered moment is:\n$$\\mathbb{E}^Q\\left[\\hat{R}_t^2\\right] = \\mathbb{E}^Q\\left[R_t^2\\right] - \\mu_t^2 = e^{rt}V(0,t) - \\mu_t^2$$Third centered moment. Expanding $(R_t - \\mu_t)^3$ and taking expectations under $\\mathbb{Q}$:\n$$\\mathbb{E}^Q\\left[\\hat{R}_t^3\\right] = \\mathbb{E}^Q\\left[R_t^3\\right] - 3\\mu_t\\mathbb{E}^Q\\left[R_t^2\\right] + 2\\mu_t^3 = e^{rt}W(0,t) - 3\\mu_t e^{rt}V(0,t) + 2\\mu_t^3$$Fourth centered moment. Expanding $(R_t - \\mu_t)^4$ and taking expectations under $\\mathbb{Q}$:\n$$\\mathbb{E}^Q\\left[\\hat{R}_t^4\\right] = \\mathbb{E}^Q\\left[R_t^4\\right] - 4\\mu_t\\mathbb{E}^Q\\left[R_t^3\\right] + 6\\mu_t^2\\mathbb{E}^Q\\left[R_t^2\\right] - 3\\mu_t^4 = e^{rt}X(0,t) - 4\\mu_t e^{rt}W(0,t) + 6\\mu_t^2 e^{rt}V(0,t) - 3\\mu_t^4$$The Moment Formulas Combining the above, the risk-neutral variance, skewness, and kurtosis are:\n$$\\text{Var}^Q_t = e^{rt}V(0,t) - \\mu_t^2$$$$\\text{SKEW}^Q_t = \\frac{e^{rt}W(0,t) - 3\\mu_t e^{rt}V(0,t) + 2\\mu_t^3} {\\left(e^{rt}V(0,t) - \\mu_t^2\\right)^{3/2}}$$$$\\text{KURT}^Q_t = \\frac{e^{rt}X(0,t) - 4\\mu_t e^{rt}W(0,t) + 6\\mu_t^2 e^{rt}V(0,t) - 3\\mu_t^4}{\\left(e^{rt}V(0,t) - \\mu_t^2\\right)^{2}}$$Every quantity on the right hand side — $V$, $W$, $X$, and $\\mu_t$ — is recoverable from the OTM option prices via static replication. No model has been assumed beyond the existence of a risk-neutral measure. In a typical implementation, the integrals are approximated numerically using a discrete set of observed option prices across available strikes.\nIn practice, risk-neutral skewness extracted for an equity or equity index (e.g. SPX) is typically negative, reflecting the market’s tendency to assign higher probability and higher price to large downside moves than to symmetric upside moves. 
In intuitive terms, skewness captures crash asymmetry: downside moves are sharper, more abrupt, and more expensive to insure than upside moves.\nRisk-neutral kurtosis is typically elevated relative to a normal distribution, capturing the market’s expectation that extreme moves, both positive and negative, occur more frequently than Gaussian assumptions would imply. In this sense, kurtosis measures the frequency and severity of extreme outcomes, independent of direction.\nTogether, these imply that the risk-neutral distribution is left-skewed and fat-tailed, with asymmetric crash risk and elevated probability of extreme events relative to the lognormal benchmark of Black–Scholes.\nConclusion The BKM framework shows that variance, skewness, and kurtosis of the risk-neutral distribution can be recovered directly from vanilla option prices without calibrating a parametric model. The key is Taylor\u0026rsquo;s theorem with integral remainder, which decomposes any smooth payoff into a static portfolio of calls and puts weighted by the payoff\u0026rsquo;s second derivative. In practice, the continuum of strikes is replaced by a discrete set of observed option prices, requiring numerical integration across available strikes.\nReferences Bakshi, G., Kapadia, N., \u0026amp; Madan, D. (2003). Stock return characteristics, skew laws, and the differential pricing of individual equity options. The Review of Financial Studies, 16(1), 101–143. Carr, P., \u0026amp; Madan, D. (2001). Optimal positioning in derivative securities. Quantitative Finance, 1(1), 19–37. This relation holds for non-dividend-paying assets. For dividend-paying equities or indices such as the S\u0026amp;P 500, the no-arbitrage forward price is $S_0 e^{(r-q)t}$ where $q$ is the continuous dividend yield. In that case, the stock and bond terms in equation (3) require adjustment. 
Throughout this article we maintain the no-dividend assumption for clarity; the extension is straightforward.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/vol_skewness_kurtosis/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eMost of my early intuition about options came from the Black-Scholes model, which is\nclean and widely used. But once I started working with real option data, it becomes\nclear that the Black-Scholes assumption of a lognormal distribution is too restrictive.\nFor a given maturity, the implied volatility is not constant across strikes, and its\nshape suggests asymmetry and heavy tails in the risk-neutral distribution.\u003c/p\u003e\n\u003cp\u003eThat leads to a more basic question. Instead of imposing a parametric model and\ncalibrating its parameters, is there a way to extract volatility, skewness, and kurtosis\ndirectly from option prices in a model-free way? This is where the Bakshi, Kapadia, and\nMadan (2003) framework becomes useful. Their key idea is that smooth payoff functions\ncan be represented as a continuum of vanilla options across strikes. In this view,\nvolatility, skewness, and kurtosis are not model assumptions or calibration outputs.\nThey are quantities that can be recovered from market prices through static option\nreplication.\u003c/p\u003e","title":"From Option Prices to the Shape of Returns: A Model-Free Construction of Volatility, Skewness and Kurtosis"},{"content":"Why This Matters I started writing about calibrating a full volatility surface and realised it first requires a clear understanding of a simpler problem: solving for implied vol from a single option price. At its core, this is a root-finding problem: given a market price, we need to find the volatility that makes the model match that price. 
Once framed this way, the question becomes how to solve this nonlinear problem efficiently and reliably.\nI learned Newton\u0026rsquo;s method in class, saw it in textbooks, and implemented it in the traders\u0026rsquo; tools at work. Most of the time it just works. I knew convergence wasn\u0026rsquo;t guaranteed, but that stayed abstract until I actually hit an edge case.\nSo the natural question becomes: why does Newton\u0026rsquo;s method break, and what do you use when it does? That\u0026rsquo;s where Brent\u0026rsquo;s method comes in. It\u0026rsquo;s robust, guaranteed to converge, and commonly used in production. This article is my attempt to build an intuition for both: how they converge, why they fail, and how to combine them into something you can actually trust in production.\n1. Newton\u0026rsquo;s Method (Newton–Raphson) The Idea Newton\u0026rsquo;s method is a gradient-based iterative scheme. The basic idea is to linearize the function around a current guess and step to the root of that linear approximation. For implied volatility, we define:\n$$f(\\sigma) = C_\\text{BS}(\\sigma) - C_\\text{mkt}$$Newton\u0026rsquo;s update formula:\n$$\\sigma_{n+1} = \\sigma_n - \\frac{f(\\sigma_n)}{f'(\\sigma_n)}$$comes directly from a first-order Taylor expansion:\n$$f(\\sigma) \\approx f(\\sigma_n) + f'(\\sigma_n)(\\sigma - \\sigma_n)$$Setting this approximation to zero and solving for $\\sigma$ gives the iteration formula above. Intuitively, it moves the current guess toward where the tangent line crosses zero.\nQuadratic Convergence Newton\u0026rsquo;s method converges quadratically near the root. Let $\\hat{\\sigma}$ be the true root of $f(\\hat{\\sigma})=0$ and define the error $e_n = \\sigma_n - \\hat{\\sigma}$. Using a Taylor expansion around the root:\n$$f(\\sigma_n) = f(\\hat{\\sigma} + e_n) = f'(\\hat{\\sigma})e_n + \\frac{1}{2} f''(\\xi_n)e_n^2$$for some $\\xi_n$ between $\\sigma_n$ and $\\hat{\\sigma}$. 
Substituting this into the Newton update $e_{n+1} = e_n - f(\\sigma_n)/f'(\\sigma_n)$, with $f'(\\sigma_n) \\approx f'(\\hat{\\sigma}) + f''(\\hat{\\sigma})e_n$, gives the updated error:\n$$e_{n+1} = \\sigma_{n+1} - \\hat{\\sigma} \\approx \\frac{f''(\\xi_n)}{2f'(\\hat{\\sigma})}e_n^2$$Thus the error satisfies $|e_{n+1}| \\propto e_n^2$, which is quadratic convergence: the number of correct digits roughly doubles each iteration once near the root.\nWhen Newton\u0026rsquo;s Method Fails Newton\u0026rsquo;s method is not guaranteed to converge. In practice, I\u0026rsquo;ve seen it converge in a few iterations for a liquid ATM option but oscillate or diverge entirely on a 5-delta wing. Why?\nThis method is derived from a first-order Taylor expansion of the Black–Scholes price around the current volatility estimate:\n$$C_{\\text{BS}}(\\sigma + \\Delta\\sigma) \\approx C_{\\text{BS}}(\\sigma) + \\mathcal{V}\\Delta\\sigma$$ Here vega $\\mathcal{V} = \\frac{\\partial C_{\\text{BS}}}{\\partial \\sigma}$ corresponds to $f'$ in the earlier notation.\nThis gives the Newton step:\n$$\\Delta\\sigma = \\frac{C_{\\text{market}} - C_{\\text{BS}}(\\sigma)}{\\mathcal{V}}$$This update relies on two implicit assumptions: vega is well-defined and nonzero, and the price function is well-approximated by a linear function in volatility over the step size. When either breaks down, Newton\u0026rsquo;s method becomes unstable or inaccurate.\n1. Vega Near Zero — Step Size Instability Vega $\\mathcal{V}$ appears in the denominator of the Newton step. When $\\mathcal{V} \\approx 0$, even a small pricing error produces a large volatility update, causing the iterate to move far from the solution. Vega becomes small when the option is deep in-the-money or deep out-of-the-money, or when time to maturity is very short. 
In these regimes, the option price is close to intrinsic value and has limited sensitivity to volatility.\nIn this regime, the update becomes ill-conditioned:\n$$\\Delta\\sigma = \\frac{C_{\\text{market}} - C_{\\text{BS}}(\\sigma)}{\\mathcal{V}} \\quad \\text{becomes unstable as } \\mathcal{V} \\to 0$$This is not because the root does not exist, but because the inverse problem becomes poorly conditioned: small errors in price translate into large errors in volatility.\n2. Poor Initial Guess — Curvature (Vomma) Breakdown Newton\u0026rsquo;s method is a local approximation scheme. It relies on truncating the Taylor expansion:\n$$C_{\\text{BS}}(\\sigma + \\Delta\\sigma) \\approx C_{\\text{BS}}(\\sigma) + \\mathcal{V}\\Delta\\sigma + \\frac{1}{2}\\frac{\\partial^2 C_{\\text{BS}}}{\\partial \\sigma^2}(\\Delta\\sigma)^2 + \\cdots$$The second-order term involves vomma, $\\frac{\\partial^2 C_{\\text{BS}}}{\\partial \\sigma^2}$, which measures the curvature of the option price with respect to volatility. When the initial guess is far from the true implied volatility, the Newton step $\\Delta\\sigma$ becomes large. In this regime, the omitted quadratic term is no longer negligible and the linear approximation breaks down.\nAs a result, the tangent line no longer accurately predicts where the price curve intersects the market price level. The Newton update can overshoot the root, placing the next iterate on the opposite side of the solution. Repeated overshoots can lead to oscillation, and in extreme cases, failure to converge.\nThe interactive chart below lets you drag both the true implied vol and the initial guess to observe how the method behaves. 
It uses a European call with $S = 100$, $K = 130$, $r = 5\\%$, $T = 0.5$ years, and no dividends.\n[Interactive chart: the BS price curve $C(\\sigma)$, the market price level, and the Newton tangent lines and iterates, with sliders for the true implied vol (default 70%) and the initial guess (default 20%).]\nImproving Newton\u0026rsquo;s Method in Practice\nSmart initial guess: Use the Brenner-Subrahmanyam approximation for near-the-money options (discussed in this article) and the Corrado-Miller approximation for options away from the money. A start close to the true implied vol suppresses curvature error on the first Newton step and reduces iteration count significantly.\nBounded volatility updates: Clamp the iterate to an admissible interval (e.g. $[10^{-4}, 5]$) after every update, and cap the step size to prevent large single-step overshoots.\nRobust fallback: When vega falls below a threshold, switch to a bracketing method such as Brent\u0026rsquo;s method to guarantee convergence. This hybrid approach preserves Newton\u0026rsquo;s fast convergence in well-behaved regimes while remaining robust at the boundaries.\n2. Brent\u0026rsquo;s Method The Idea Brent\u0026rsquo;s method is best understood not as a single algorithm, but as an adaptive system that combines three root-finding methods — bisection, the secant method, and inverse quadratic interpolation. At each iteration, it selects the most aggressive step that satisfies its safety conditions, and falls back to a more conservative method when those conditions are not met. The result is a solver that is simultaneously guaranteed to converge and capable of superlinear acceleration whenever the function is well-behaved.\nThe foundation is a bracket $[\\sigma_a, \\sigma_b]$ satisfying:\n$$f(\\sigma_a) \\cdot f(\\sigma_b) \u003c 0$$meaning the pricing error $f(\\sigma) = C_\\text{BS}(\\sigma) - C_\\text{mkt}$ changes sign across the interval, guaranteeing a root lies within. 
This bracket is maintained throughout every iteration — it is the safety guarantee that Newton\u0026rsquo;s method lacks.\nThe Three Building Blocks Brent\u0026rsquo;s method can be viewed as a safeguarded interpolation scheme: bisection guarantees global convergence, while interpolation provides local acceleration whenever the function behaves well.\nBisection — Slow but Unconditionally Converges At each iteration, bisection evaluates $f$ at the midpoint of the current bracket:\n$$\\sigma_\\text{mid} = \\frac{\\sigma_a + \\sigma_b}{2}$$and replaces whichever endpoint shares the same sign as $f(\\sigma_\\text{mid})$, halving the interval. The bracket shrinks by exactly half each step, regardless of the shape of $f$.\nConvergence intuition: Bisection uses only the sign of the function — not its magnitude or slope. It discards almost all quantitative information at each step. This is why it is slow: halving the interval each time gives linear convergence at rate $\\frac{1}{2}$, requiring $\\lceil \\log_2(W/\\varepsilon) \\rceil$ iterations to reduce a bracket of width $W$ to tolerance $\\varepsilon$. For a bracket $[10^{-4}, 5]$ and tolerance $10^{-8}$, that is 29 iterations with no possibility of acceleration.\nSecant Method — Fast but No Guarantee The secant method fits a straight line through the two most recent iterates $(\\sigma_a, f_a)$ and $(\\sigma_b, f_b)$, and steps to where that line crosses zero:\n$$\\sigma_\\text{new} = \\sigma_b - f_b \\frac{\\sigma_b - \\sigma_a}{f_b - f_a}$$Unlike bisection, it uses the actual values of $f$ at both endpoints — not just their signs — to estimate where the root is.\nAlthough this looks similar to Newton\u0026rsquo;s method — replacing the analytic derivative with a finite difference — the secant method is fundamentally a different approach. Newton\u0026rsquo;s method always evaluates the derivative at the current point, anchoring each step to the local slope. 
The secant method instead draws a chord through the two most recent iterates, making it entirely derivative-free and driven purely by recent function history.\nConvergence intuition: By using function values, the secant method can take a much more informed step than bisection. Rather than depending on a single previous error term, the next error is influenced by both of the two most recent errors. In fact, a local asymptotic analysis shows that near the root, $$ e_{n+1} \\approx Ce_n e_{n-1} $$ for some constant $C$ determined by the local behavior of $f$ . This coupling between successive errors leads to superlinear convergence. In particular, the asymptotic convergence rate is approximately $\\varphi \\approx 1.618$, the golden ratio. Intuitively, each iteration amplifies the effect of the two previous error reductions, producing faster decay than linear methods but slower than quadratic methods like Newton\u0026rsquo;s method. The cost is that without a bracket constraint, the secant step can overshoot the root if $f$ is highly curved between the two points, and there is no convergence guarantee in general.\nInverse Quadratic Interpolation — Fastest but Most Fragile Inverse quadratic interpolation (IQI) fits a quadratic polynomial through the three most recent iterates $(\\sigma_a, f_a)$, $(\\sigma_b, f_b)$, $(\\sigma_c, f_c)$. Critically, it fits $\\sigma$ as a function of $f$ — not $f$ as a function of $\\sigma$ — so that evaluating at $f = 0$ directly yields the next iterate. Using Lagrange interpolation:\n$$\\sigma_\\text{new} = \\sigma_a \\frac{f_b f_c}{(f_a - f_b)(f_a - f_c)} + \\sigma_b \\frac{f_a f_c}{(f_b - f_a)(f_b - f_c)} + \\sigma_c \\frac{f_a f_b}{(f_c - f_a)(f_c - f_b)}$$Convergence intuition: By incorporating a third point and fitting a quadratic, IQI captures the local curvature of $f$ — the information that the secant method ignores. 
Near the root, a local asymptotic analysis shows that the error satisfies a higher-order nonlinear recurrence involving the three most recent iterates. This leads to an asymptotic convergence order of approximately $q \\approx 1.839$. Intuitively, each additional interpolation point increases the amount of local structure captured by the model, allowing the method to reduce the error more aggressively than two-point methods. The trade-off is that fitting a quadratic through three points can be numerically unstable when the points are poorly spaced. If the corresponding function values are close together or do not adequately span the root, the interpolation becomes ill-conditioned: small differences in the data lead to large changes in the fitted curve. Because IQI effectively extrapolates to $f = 0$, this instability can produce a step that lies far outside the current bracket. This is why IQI is not used in isolation and instead requires a bracketing safeguard.\nHow Brent Combines Them At each iteration, Brent\u0026rsquo;s method first checks whether the bracket width has reduced below a tolerance $\\delta$ — if so, the method terminates. Otherwise, it proposes an IQI step (or secant step if only two distinct function values are available) and accepts it only if two safety conditions hold:\nThe step lands inside the current bracket $[\\sigma_a, \\sigma_b]$ The step represents sufficient progress toward the root — A common implementation heuristic is to reject the step if it is larger than roughly half the previous bracket width, treating such moves as insufficiently controlled. If either condition fails, the method falls back to bisection. 
This gives Brent\u0026rsquo;s method its core character: it runs as fast as IQI or the secant method when the interpolation is well-behaved, but is guaranteed to make at least the progress of bisection at every step.\n$$\\sigma_\\text{new} = \\begin{cases} \\sigma_\\text{IQI or secant} \u0026 \\text{if both safety conditions are satisfied} \\\\\\\\ \\sigma_\\text{mid} \u0026 \\text{otherwise (bisection fallback)} \\end{cases}$$ Evaluating and Updating the Bracket Once $\\sigma_\\text{new}$ is determined — regardless of which method proposed it — the following three steps are always executed:\nStep 1: Evaluate the pricing error at the new point\n$$f(\\sigma_\\text{new}) = C_\\text{BS}(\\sigma_\\text{new}) - C_\\text{mkt}$$The sign of $f(\\sigma_\\text{new})$ determines which half of the bracket contains the root.\nStep 2: Narrow the bracket using the sign of $f(\\sigma_\\text{new})$\nReplace whichever endpoint shares the same sign as $f(\\sigma_\\text{new})$. The bracket always shrinks after every iteration — this is the convergence guarantee that no interpolation step can violate.\nStep 3: Promote $\\sigma_\\text{new}$ as the best current estimate\n$\\sigma_\\text{new}$ becomes the new endpoint — Brent always keeps the best estimate as one of the bracket endpoints. The previous endpoint becomes the third point available for the next IQI step.\nWhen Brent\u0026rsquo;s Method Struggles Brent\u0026rsquo;s method is significantly more robust than Newton\u0026rsquo;s, but it is not without limitations.\n1. Slower Convergence Near the Root Once Newton\u0026rsquo;s method enters its quadratic convergence regime it is faster than Brent. For applications requiring very high precision — such as calibrating a vol surface across many strikes simultaneously — the difference in iteration count can matter. 
Brent\u0026rsquo;s superlinear rate means it typically requires more iterations than Newton to achieve the same terminal accuracy, assuming Newton does not encounter the failure modes described above.\n2. Requires a Valid Initial Bracket Brent\u0026rsquo;s method requires two initial values $\\sigma_a$ and $\\sigma_b$ such that $f(\\sigma_a) f(\\sigma_b) \u003c 0$. In practice, for implied volatility, constructing such a bracket is rarely difficult because economically reasonable volatility ranges are well known (e.g. $[10^{-4}, 5]$). As a result, bracket initialization is usually a one-time evaluation rather than a significant computational overhead. However, it remains a structural requirement of the method, in contrast to Newton\u0026rsquo;s method, which only requires a single starting point.\nBrent vs Newton: When to Use Which Neither method dominates in all regimes. The choice depends on the structure of the problem.\nScenario | Preferred Method\nNear-the-money, liquid option | Newton: fast quadratic convergence\nDeep ITM / OTM, short expiry | Brent: robust when vega is near zero\nPoor or unknown initial guess | Brent: bracketing guarantees convergence\nHigh-precision vol surface calibration | Newton with smart initial guess, provided vega is well-conditioned\nProduction solver requiring robustness | Hybrid: Newton with Brent fallback\nIn practice, a well-engineered implied vol solver uses Newton\u0026rsquo;s method as the primary engine and falls back to Brent when vega is too small or the Newton iterate leaves the admissible domain. This hybrid approach inherits the speed of Newton in normal regimes and the reliability of Brent at the boundaries.\nA natural question is why not use bisection alone as the fallback rather than Brent. The answer is that bisection is reliable but slow. Brent already contains bisection as its internal worst-case fallback, but accelerates with inverse quadratic interpolation whenever safe to do so. 
In practice, using Brent as a fallback provides a more structured and generally faster alternative to pure bisection, while maintaining the same global convergence guarantees.\nFor a full derivation of the convergence rate for secant method or IQI, feel free to contact me.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/newton_vs_brent_vol_solver/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eI started writing about calibrating a full volatility surface and realised it first requires a clear understanding of a simpler problem: solving for implied vol from a single option price. At its core, this is a root-finding problem: given a market price, we need to find the volatility that makes the model match that price. Once framed this way, the question becomes how to solve this nonlinear problem efficiently and reliably.\u003c/p\u003e","title":"Solving for Implied Volatility: Newton's Method vs Brent's Method"},{"content":"Why This Matters While practitioners price American puts correctly in production systems, the deeper question of why early exercise is sometimes optimal, and the precise conditions under which it occurs, is less often articulated rigorously. 
This article works through the argument, starting with why early exercise is never optimal for calls on a non-dividend-paying stock, and then showing, using the Black–Scholes PDE, when and why it becomes optimal for puts.\nFor those working with options pricing, hedging, or products with embedded American optionality, a rigorous understanding of the early exercise boundary can offer useful intuition beyond what standard pricing tools provide.\nPut–Call Parity For European options on a non-dividend-paying stock, put–call parity states:\n$$C_0 + Ke^{-rT} = P_0 + S_0$$Rearranging, we get:\n$$C_0 = S_0 + P_0 - Ke^{-rT}$$This shows that a European call can be thought of as:\nOwning the stock\nHolding a European put (downside protection)\nDeferring payment of the strike (earning interest on $K$ until maturity)\nThe value of a put option must be non-negative. So from the rearranged parity equation, we can get a lower bound for the European call option:\n$$C_0 \\geq S_0 - Ke^{-rT}$$ Why Early Exercise of an American Call Is Suboptimal Consider an American call. Its holder can exercise early, but is it ever optimal?\nExercising early gives a payoff of $S_t - K$ at time $t \u003c T$. If the holder does not exercise, the option is worth at least $S_t - Ke^{-r(T-t)}$, which exceeds the exercise payoff $S_t - K$ whenever $r \u003e 0$. Since the stock pays no dividends, there is no economic benefit to owning the stock sooner. Exercising the call early would forfeit the time value of the option and the interest on $K$, making it suboptimal. Therefore, the American call is never exercised early, and its price equals that of the European call.\nWhen This Breaks Down The result changes if the stock pays dividends. Dividends reduce the stock price on the ex-dividend date, and option holders do not receive dividends. 
Exercising just before a dividend can therefore sometimes be advantageous: the early exercise premium becomes positive, and the American call is worth more than its European counterpart.\nWhy American Puts Are Worth More Than European Puts For puts, the logic reverses. Waiting is no longer free: the holder defers receiving $K$, rather than deferring paying it. Every period the put is held unexercised, the holder forgoes interest on $K$. When that cost exceeds the remaining benefit of holding, early exercise is optimal.\nTo see this concretely, consider a put deep in-the-money with $S_t \\approx 0$:\nExercising immediately yields $K - S_t \\approx K$, which can be invested at rate $r$\nWaiting until maturity to receive $K - S_T$ means forgoing interest on $K$, while the stock can barely fall further\nThe cost of waiting is real and quantifiable. The benefit of waiting has nearly vanished. Early exercise dominates.\nThe General Early Exercise Condition More generally, think of holding the put as a trade-off between two things:\nThe cost of waiting: every period the option is held unexercised, the holder forgoes the interest that the exercise proceeds would earn, accruing at rate $r(K-S)$ per unit time.\nThe benefit of waiting: the stock could fall further, increasing the payoff. This is the option\u0026rsquo;s remaining time value, the value of continued optionality.\nEarly exercise is optimal whenever the cost exceeds the benefit. When $S \\approx 0$, the remaining optionality collapses to zero and this condition is trivially satisfied. But the same crossover occurs more broadly: whenever the put is sufficiently deep in-the-money, interest rates are high, or volatility is low enough that the time value of waiting no longer justifies the interest cost.\nThe Black–Scholes PDE Perspective The Black–Scholes framework gives a precise, formal account of why early exercise becomes optimal. 
Under the Black–Scholes assumptions, any derivative $P(S,t)$ on a non-dividend-paying stock satisfies the PDE:\n\\[ \\underbrace{\\frac{\\partial P}{\\partial t}}_{\\text{Time Decay}} + \\underbrace{\\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 P}{\\partial S^2}}_{\\text{Convexity Gain (Gamma)}} + \\underbrace{rS\\frac{\\partial P}{\\partial S}}_{\\text{Drift of Underlying}} - \\underbrace{rP}_{\\text{Carry Cost}} = 0 \\] This PDE is derived by constructing a delta-hedged portfolio and requiring that its value grows at the risk-free rate under no-arbitrage. Each term has a financial meaning:\nTime Decay $\\frac{\\partial P}{\\partial t}$: the rate at which the option loses value as expiry approaches, holding $S$ fixed Convexity Gain $\\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 P}{\\partial S^2}$: the gain from being long gamma — because the put is convex in $S$, the holder benefits on average from large moves in either direction Drift of Underlying $rS\\frac{\\partial P}{\\partial S}$: since $\\frac{\\partial P}{\\partial S} \u003c 0$ for a put, the risk-neutral upward drift of the stock works against the put holder Carry Cost $-rP$: the opportunity cost of holding the option rather than investing its value at the risk-free rate The PDE is a balance condition: the convexity gain from being long gamma exactly offsets the combined drag from time decay, adverse drift, and carry cost.\nThe Hold Region and the Exercise Region For a European put, the PDE holds everywhere — the holder has no choice but to wait. For an American put, the holder has agency, and the picture splits into two regions.\nIn the hold (continuation) region, where $P(S,t) \u003e K - S$, the option is worth more alive than dead. The convexity gain is sufficient to justify the carry cost and the drag from drift. 
The PDE holds with equality:\n$$\\frac{\\partial P}{\\partial t} + \\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 P}{\\partial S^2} + rS\\frac{\\partial P}{\\partial S} - rP = 0$$In the exercise (stopping) region, where $P(S,t) = K - S$, the balance breaks down. Substituting $P = K - S$:\n$\\frac{\\partial P}{\\partial t} = 0$: intrinsic value has no time decay $\\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 P}{\\partial S^2} = 0$: intrinsic value is linear in $S$, so gamma is zero $rS\\frac{\\partial P}{\\partial S} = -rS$: the delta of $K - S$ is $-1$ $-rP = -r(K-S)$ The left-hand side of the PDE evaluates to:\n$$0 + 0 + (-rS) - r(K - S) = -rK \u003c 0$$The equality fails strictly. With time decay and gamma both zero, the drift and carry terms combine to $-rK$: the interest forgone on $K$ is entirely uncompensated. The PDE inequality signals that continuation is dominated by immediate exercise.\nThe $S \\to 0$ Case: A Formal Illustration To see this most clearly, assume $S \\approx 0$ and suppose the holder continues to hold. As $S \\to 0$, the gamma and drift terms vanish:\n$$\\frac{1}{2}\\sigma^2 S^2 \\frac{\\partial^2 P}{\\partial S^2} \\to 0, \\qquad rS\\frac{\\partial P}{\\partial S} \\to 0$$The PDE reduces to:\n$$\\frac{\\partial P}{\\partial t} - rP = 0 \\implies \\frac{\\partial P}{\\partial t} \\approx rK$$Since $P \\approx K$ here, this says the option\u0026rsquo;s value must be growing at rate $rK$ per unit time. But that is impossible: when $S \\approx 0$, the put is already worth approximately $K$, its maximum possible value. There is no room left to grow. The assumption of holding cannot be sustained, and the option must be exercised.\nNote that $S \\to 0$ is an extreme case that makes the argument unambiguous. 
The same logic applies more broadly: whenever the put is sufficiently deep in-the-money, interest rates are high, or volatility is low enough that the remaining optionality has eroded below the opportunity cost of delaying the receipt of $K$, early exercise is optimal.\nPractical Implications This result has direct consequences for practitioners:\nOptions pricing: American puts must be priced using methods that account for early exercise, such as binomial trees, finite difference methods, or approximation formulas. Using the European Black–Scholes formula directly will systematically underprice them, with the error growing as the put goes deeper in-the-money or as interest rates rise.\nHedging: The delta of an American put in the stopping region is $-1$, so the option moves one-for-one against the stock. A hedger treating it as a live option with a partial delta will be systematically underhedged.\nStructured products: Any product with embedded American put optionality requires careful treatment of the early exercise boundary. Ignoring it introduces model risk that can be material in high-rate environments.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/american_vs_european_options/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eWhile practitioners price American puts correctly in production systems, the deeper question of \u003cem\u003ewhy\u003c/em\u003e early exercise is sometimes optimal, and the precise conditions under which it occurs, is less often articulated rigorously. 
This article works through the argument, starting with why early exercise is never optimal for calls on non-dividend-paying stocks, and then showing, using the Black–Scholes PDE, when and why it becomes optimal for puts.\u003c/p\u003e\n\u003cp\u003eFor those working with options pricing, hedging, or products with embedded American optionality, a rigorous understanding of the early exercise boundary can offer useful intuition beyond what standard pricing tools provide.\u003c/p\u003e","title":"Early Exercise of American Options: Call Equivalence and the Put Premium"},{"content":"Why This Matters Brownian motion, the mathematical model underlying everything from stock prices to heat diffusion, has a particularly elegant property: the variance of its position at time $t$ grows linearly with time. Not $t^2$, not $\\sqrt{t}$, but exactly $t$. This seemingly abstract fact has a concrete consequence in financial markets: under the idealised conditions of an at-the-money option with zero rates, it is precisely why option prices scale with $\\sqrt{T}$ rather than $T$, a direct fingerprint of Brownian motion inside Black-Scholes. Understanding why requires looking at both physical observations and the mathematical construction of Brownian motion.\nPhysical Motivation Brownian motion was first observed by Robert Brown in 1827 as the erratic motion of microscopic particles in water. Einstein (1905) quantified it, showing that a particle\u0026rsquo;s mean squared displacement grows proportionally to time:\n$$ \\mathbb{E}[(X(t) - X(0))^2] \\propto t $$Einstein\u0026rsquo;s key insight was that this linear scaling follows from particles receiving many small, independent random kicks. Because the kicks are independent, their variances add, so the total variance grows linearly with the number of kicks, and hence with time. 
The Central Limit Theorem also tells us that the sum of many small independent kicks is approximately Gaussian, which is why the displacement of a Brownian particle is normally distributed. This linear growth in variance is the hallmark of diffusive motion, and it is the same additivity argument that drives the random-walk construction below.\nDefinition of Standard Brownian Motion Mathematically, a standard Brownian motion $B(t)$ is a continuous-time stochastic process defined by:\n$B(0) = 0$. Independent increments: The increment $B(t+s) - B(s)$ is independent of the past. Normally distributed increments: $B(t+s) - B(s) \\sim N(0, t)$, where the variance equals the length of the interval. Notice that the last condition encodes linear variance growth: an increment over a time interval of length $\\Delta t$ has variance $\\Delta t$. But why is this the correct scaling rather than an arbitrary choice? The answer comes from modeling Brownian motion as a limit of a random walk.\nBrownian Motion as a Limit of a Random Walk Consider a simple symmetric random walk:\n$$ S_n = X_1 + X_2 + \\dots + X_n $$where each $X_i$ is $\\pm 1$ with equal probability. Because the steps are independent:\n$$ \\text{Var}(S_n) = \\text{Var}(X_1) + \\dots + \\text{Var}(X_n) = n $$Variance grows linearly with the number of steps.\nScaling Step Size to Match Time To approximate Brownian motion over a fixed time horizon $t$, divide it into $n$ small steps of length:\n$$ \\Delta t = \\frac{t}{n} $$Now, assign a step size $\\delta$ to each increment such that the total variance matches the observed linear growth in $t$. 
Let:\n$$ \\delta = \\sqrt{\\Delta t} = \\sqrt{\\frac{t}{n}} $$Then, the Brownian motion approximation is:\n$$ B(t) \\approx \\delta (X_1 + X_2 + \\dots + X_n) $$and the total variance becomes:\n$$ \\text{Var}[B(t)] = n \\cdot \\delta^2 = n \\cdot \\frac{t}{n} = t $$This shows that the step size must scale as $\\sqrt{\\Delta t}$ to reproduce the observed linear growth of variance in continuous time. Moreover, as $n \\to \\infty$, the Central Limit Theorem guarantees that the normalised sum $\\delta(X_1 + \\dots + X_n)$ converges in distribution to a Gaussian — which is why the limit is not merely variance-correct, but fully normally distributed, justifying the $N(0, t)$ increments in the formal definition above.\nWhy Other Step Sizes Fail Step size Resulting variance Problem Constant $c$ $c^2 n$ Grows without bound as $n \\to \\infty$; no well-defined limit. $1/n$ $(1/n)^2 n = 1/n$ Variance → 0; the process becomes deterministic. $t/\\sqrt{n}$ $(t/\\sqrt{n})^2 n = t^2$ Quadratic growth in time; inconsistent with physical observation. Only $\\sqrt{\\Delta t}$ gives variance proportional to $t$, consistent with both physical observation and the continuous-time limit.\nApplication: Option Price Scaling with Expiry A concrete place where the $\\sqrt{T}$ scaling shows up in practice is in the price of at-the-money options. To isolate the effect of Brownian motion cleanly, we work under two idealising assumptions: the option is struck exactly at the current price ($S = K$), and the risk-free rate is zero ($r = 0$). In practice, implied volatility varies across expiries and rates are non-zero, both of which distort the pure $\\sqrt{T}$ relationship — but stripping these away lets the Brownian motion signature come through clearly.\nUnder these assumptions, consider a European call option on a stock with no drift. Intuitively, the option\u0026rsquo;s value should grow with time-to-expiry $T$ — more time means more opportunity for the stock to move in your favour. 
But by how much? If variance grew as $T^2$, prices would scale linearly with $T$; if it were constant, prices would not change with expiry at all. Because Brownian motion gives $\\text{Var}(B_T) = T$, the standard deviation of the stock\u0026rsquo;s position scales as $\\sqrt{T}$, and since an ATM option\u0026rsquo;s value is essentially compensating the seller for the expected absolute deviation of the terminal stock price, the price scales accordingly.\nThe table below shows Black-Scholes ATM call prices for $S = K = 100$, $\\sigma = 20\\%$, $r = 0$:\nExpiry $T$ Call Price Ratio to 1-month price 1 month 2.30 1.00 4 months 4.60 2.00 9 months 6.90 3.00 16 months 9.19 3.99 The expiry quadruples from 1 to 4 months, yet the price only doubles: the $\\sqrt{T}$ signature of Brownian motion. The ratios are not exactly integers (the 16-month ratio is 3.99) because the price is a slightly concave function of $\\sigma\\sqrt{T}$, but the scaling itself is a property of the exact formula, not an approximation artifact.\nWhy the ATM Black-Scholes Formula Reduces to $\\sigma\\sqrt{T}$ Under the two conditions $S = K$ and $r = 0$, the Black-Scholes $d_1$ and $d_2$ terms simplify considerably. Recall:\n$$ d_1 = \\frac{\\ln(S/K) + (r + \\frac{1}{2}\\sigma^2)T}{\\sigma\\sqrt{T}}, \\qquad d_2 = d_1 - \\sigma\\sqrt{T} $$Setting $S = K$ (so $\\ln(S/K) = 0$) and $r = 0$:\n$$ d_1 = \\frac{\\frac{1}{2}\\sigma^2 T}{\\sigma\\sqrt{T}} = \\frac{\\sigma\\sqrt{T}}{2}, \\qquad d_2 = -\\frac{\\sigma\\sqrt{T}}{2} $$The call price formula $C = S \\cdot N(d_1) - K \\cdot e^{-rT} N(d_2)$ then becomes, using $N(-x) = 1 - N(x)$:\n$$ C = S \\left( N\\left(\\tfrac{\\sigma\\sqrt{T}}{2}\\right) - N\\left(-\\tfrac{\\sigma\\sqrt{T}}{2}\\right) \\right) = S \\left( 2N\\left(\\tfrac{\\sigma\\sqrt{T}}{2}\\right) - 1 \\right) $$For a given $S$, the price is an exact function of $\\sigma\\sqrt{T}$ alone: there is no separate dependence on $\\sigma$ or $T$ individually, only through their product $\\sigma\\sqrt{T}$. Doubling $T$ is equivalent to doubling $\\sigma^2$. To make the $\\sqrt{T}$ dependence fully explicit, apply a first-order Taylor expansion of $N(x)$ around $x = 0$. 
Since $N'(x) = \\phi(x)$ and $\\phi(0) = \\frac{1}{\\sqrt{2\\pi}}$, we have $N(x) \\approx \\frac{1}{2} + \\frac{1}{\\sqrt{2\\pi}} x$ for small $x$, and therefore:\n$$ 2N(x) - 1 \\approx \\sqrt{\\frac{2}{\\pi}}x $$Substituting $x = \\frac{\\sigma\\sqrt{T}}{2}$ gives the leading-order ATM call price:\n$$ C \\approx S \\cdot \\sqrt{\\frac{2}{\\pi}} \\cdot \\frac{\\sigma\\sqrt{T}}{2} = \\frac{S\\sigma\\sqrt{T}}{\\sqrt{2\\pi}} $$with $\\frac{\\sigma}{\\sqrt{2\\pi}}$ acting as a simple proportionality constant. The approximation is accurate for $\\sigma\\sqrt{T} \\leq 0.3$ (e.g. 20% vol with under roughly two years to expiry), and deteriorates for long-dated or high-vol options where the cubic correction term in the Taylor expansion becomes material.\nAny model that replaced Brownian motion with a process whose variance scaled differently would produce option prices inconsistent with this pattern, which is one reason the continuous-time Brownian framework is so deeply embedded in derivatives pricing.\nConnection to Brenner–Subrahmanyam The relationship between ATM call price and $\\sigma\\sqrt{T}$ is exactly what motivates the Brenner–Subrahmanyam initial guess for implied volatility:\n$$ \\sigma_0 \\approx \\frac{C_\\text{mkt}}{S} \\cdot \\sqrt{\\frac{2\\pi}{T}} $$By taking the observed market price and inverting this relationship, we get a smart starting point for Newton-Raphson iterations when solving for implied volatility. For a deeper dive into solving for implied vol, including Newton vs Brent methods, see my article: Solving for Implied Volatility: Newton\u0026rsquo;s Method vs Brent\u0026rsquo;s Method\nConclusion The variance of Brownian motion grows linearly with time because it models the diffusive behaviour of real particles, and the $\\sqrt{\\Delta t}$ scaling in the random-walk limit ensures this property holds in continuous time. 
This is not merely a mathematical convenience: it is the property that ties the abstract construction directly to observable phenomena, from the spread of particle positions to the pricing of financial derivatives. The $\\sqrt{T}$ signature appears wherever diffusion governs the dynamics, and recognising it is one of the more transferable intuitions in quantitative finance.\n","permalink":"https://inflection-quant.pages.dev/articles/quant-foundations/understanding_brownian_motion/","summary":"\u003ch2 id=\"why-this-matters\"\u003eWhy This Matters\u003c/h2\u003e\n\u003cp\u003eBrownian motion, the mathematical model underlying everything from stock prices to heat diffusion, has a particularly elegant property: the variance of its position at time $t$ grows linearly with time. Not $t^2$, not $\\sqrt{t}$, but exactly $t$. This seemingly abstract fact has a concrete consequence in financial markets: under the idealised conditions of an at-the-money option with zero rates, it is precisely why option prices scale with $\\sqrt{T}$ rather than $T$, a direct fingerprint of Brownian motion inside Black-Scholes. Understanding why requires looking at both \u003cstrong\u003ephysical observations\u003c/strong\u003e and the \u003cstrong\u003emathematical construction\u003c/strong\u003e of Brownian motion.\u003c/p\u003e","title":"Brownian Motion: From Random Walks to Option Prices"},{"content":"From Quant Insights to Real-World Solutions\nI help trading desks and quant teams tackle complex problems by designing models, frameworks, and tools that deliver measurable business impact. My work spans derivative pricing, risk management, and translating models into production systems, bridging the gap between quantitative ideas and their execution in live trading and risk environments.\nWhat I Do Many quant ideas are conceptually strong but challenging to implement in practice. 
I focus on solving real-world problems that have tangible business impact, turning models into usable tools and solutions. I enjoy diving into the details of implementation, because the process often uncovers practical insights that make the final solution more robust and effective.\nSelected Work:\nBuilding cross-asset quantitative investment strategy (QIS) production infrastructure Developing intraday risk and P\u0026amp;L attribution tools for trading desks Pricing linear and non-linear energy derivatives Designing and implementing interest rate (IR) risk hedging strategies for commodity desks Implementing hourly power swaps in Openlink Modeling counterparty credit risk for commodity derivatives Writing I write about two types of topics:\nQuantitative Finance Concepts\nI enjoy exploring the math behind models: stochastic processes, derivatives pricing and risk modeling. Writing helps me clarify my own understanding while sharing insights with others.\nPractical Industry Insights\nI also write about lessons from building and using quant systems in practice: the challenges, the model choices, and the surprises you only notice when theory meets reality.\n*Articles are written with AI-assisted drafting and editing where helpful. All analysis, interpretations, and conclusions are my own.\nGet In Touch I love questions, ideas, and collaboration. Feel free to reach out.\n→ inflection.quant@gmail.com\n","permalink":"https://inflection-quant.pages.dev/_index_consulting_version/","summary":"\u003cp\u003e\u003cstrong\u003eFrom Quant Insights to Real-World Solutions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eI help trading desks and quant teams tackle complex problems by designing models, frameworks, and tools that deliver measurable business impact. 
My work spans derivative pricing, risk management, and translating models into production systems, bridging the gap between quantitative ideas and their execution in live trading and risk environments.\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"what-i-do\"\u003eWhat I Do\u003c/h2\u003e\n\u003cp\u003eMany quant ideas are conceptually strong but challenging to implement in practice. I focus on solving real-world problems that have tangible business impact, turning models into usable tools and solutions. I enjoy diving into the details of implementation, because the process often uncovers practical insights that make the final solution more robust and effective.\u003c/p\u003e","title":""}]