Volatility Modelling

From Stylised Facts to GARCH

Author

Barry Quinn

Published

March 27, 2026

1 Introduction

Volatility, the tendency of asset prices to fluctuate, is one of the most important concepts in finance. It matters for risk management, option pricing, portfolio construction, and regulatory capital. Yet volatility is not directly observable; we must estimate it from price data.

Why this chapter matters: In the Foundations chapter, we introduced the Three Prediction Problems in finance: predicting the mean (returns), the variance (volatility), and cross-sectional variation (which assets outperform). We saw that returns have almost no predictable signal (~1-2% R²), making ARIMA largely useless for return prediction. Volatility is different. The conditional variance is genuinely predictable (~15-40% R²), and this predictability has direct economic value for options pricing, risk management, and portfolio allocation. This chapter focuses on the second prediction problem: where the signal actually exists.

This chapter develops your understanding of volatility from three perspectives: empirically, by examining what patterns we observe in financial return volatility; theoretically, by showing how ARCH and GARCH models capture these patterns; and practically, by demonstrating how to estimate, forecast, and apply volatility models. By the end of this chapter, you will understand why GARCH(1,1) has become the workhorse model for volatility in finance, and when more sophisticated approaches are needed.

Note: View Slides

Open the lecture deck: Week 4: Volatility Modelling

Tip: Connection to Week 3

This chapter builds directly on the time series foundations from Week 3. There you learned that the ACF of returns is near zero (no signal in the mean), but the ACF of squared returns shows strong persistence. That persistence is volatility clustering, and modelling it is the subject of this chapter.

2 Stylised Facts of Financial Volatility

Before building models, we must understand what we are trying to capture. Financial return volatility exhibits several well-documented patterns, known as stylised facts, that any good model should reproduce.

2.1 Volatility Clustering

Perhaps the most striking feature of financial returns is volatility clustering: large returns (positive or negative) tend to be followed by large returns, and small returns tend to be followed by small returns.

Tsay (2010) describes this as: “Volatility is not constant over time. There are periods of high volatility alternating with periods of relative calm.”

This pattern is visible in virtually every financial time series. The figure below demonstrates clustering in the Bloomberg equity data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

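# Load the Bloomberg database. load_bloomberg() is a course-provided helper
# (not defined here), assumed to return a long-format DataFrame with
# 'date', 'ticker' and 'PX_LAST' columns.
df = load_bloomberg()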

# Get SPY returns
spy = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
returns = spy['PX_LAST'].pct_change().dropna()

# Plot returns
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)

# Returns
axes[0].plot(returns.index, returns.values, linewidth=0.5, color='steelblue')
axes[0].axhline(0, color='gray', linestyle='--', linewidth=0.5)
axes[0].set_ylabel('Daily Return')
axes[0].set_title('SPY Daily Returns: Volatility Clustering')

# Absolute returns (proxy for volatility)
axes[1].plot(returns.index, np.abs(returns.values), linewidth=0.5, color='coral')
axes[1].set_ylabel('|Return|')
axes[1].set_xlabel('Date')
axes[1].set_title('Absolute Returns: Clusters of High and Low Volatility')

plt.tight_layout()
plt.show()
Figure 1: Volatility clustering in SPY returns: Large moves cluster together

The autocorrelation structure confirms this pattern: while returns themselves show little autocorrelation (consistent with market efficiency), squared returns are highly autocorrelated:

from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# ACF of returns
plot_acf(returns.dropna(), ax=axes[0], lags=20, title='ACF: Returns')

# ACF of squared returns
plot_acf(returns.dropna()**2, ax=axes[1], lags=20, title='ACF: Squared Returns')

plt.tight_layout()
plt.show()
Figure 2: Returns show no autocorrelation; squared returns do
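The ACF plots can be backed by a formal test. The sketch below computes the Ljung-Box statistic \(Q = T(T+2)\sum_{k=1}^{m} \hat\rho_k^2/(T-k)\), which is approximately \(\chi^2_m\) when there is no autocorrelation, for returns and for squared returns. So that it runs without the Bloomberg data, it uses a simulated GARCH(1,1) series with hypothetical parameters. The helper name `ljung_box` is illustrative; statsmodels also provides a ready-made `acorr_ljungbox`.

```python
import numpy as np
from scipy import stats

def ljung_box(x, m=10):
    """Ljung-Box Q statistic and chi-squared(m) p-value for lags 1..m."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    T = len(x)
    denom = np.sum(x ** 2)
    lags = np.arange(1, m + 1)
    # Sample autocorrelation at each lag k
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in lags])
    q = T * (T + 2) * np.sum(rho ** 2 / (T - lags))
    return q, stats.chi2.sf(q, df=m)

# Simulated GARCH(1,1) returns with hypothetical parameters (not estimates)
rng = np.random.default_rng(0)
T, alpha0, alpha1, beta1 = 5000, 0.05, 0.10, 0.85
r = np.empty(T)
sigma2 = alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance
for t in range(T):
    r[t] = np.sqrt(sigma2) * rng.standard_normal()
    sigma2 = alpha0 + alpha1 * r[t] ** 2 + beta1 * sigma2

q_ret, p_ret = ljung_box(r)
q_sq, p_sq = ljung_box(r ** 2)
print(f"Returns:         Q(10) = {q_ret:8.1f}, p = {p_ret:.3f}")
print(f"Squared returns: Q(10) = {q_sq:8.1f}, p = {p_sq:.2e}")
```

On real data, you would pass `returns` and `returns**2` from the cells above instead of the simulated series.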

2.2 Fat Tails (Leptokurtosis)

Financial returns consistently exhibit fatter tails than the normal distribution predicts. Extreme events, such as crashes and spikes, occur far more frequently than a Gaussian model would suggest.

Tsay (2010) notes: “Kurtosis often exceeds three (the kurtosis of a normal distribution), and often exceeds three by a substantial margin.”

from scipy import stats

# Calculate statistics for multiple assets
tickers = ['AAPL', 'GOOGL', 'MSFT', 'SPY']
results = []

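for ticker in tickers:
    # Sort by date so pct_change() compares consecutive observations
    asset = df[df['ticker'] == ticker].set_index('date').sort_index()['PX_LAST']
    ret = asset.pct_change().dropna()
    
    results.append({
        'Ticker': ticker,
        'Mean (%)': f"{ret.mean()*100:.3f}",
        'Std (%)': f"{ret.std()*100:.2f}",
        'Skewness': f"{stats.skew(ret):.2f}",
        # scipy returns *excess* kurtosis by default (normal distribution = 0)
        'Kurtosis': f"{stats.kurtosis(ret):.2f}",
        # Crude rule of thumb on excess kurtosis, not a formal normality test
        'Normal?': 'Yes' if stats.kurtosis(ret) < 1 else 'No'
    })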

pd.DataFrame(results)
Table 1: Financial returns exhibit excess kurtosis
| Ticker | Mean (%) | Std (%) | Skewness | Kurtosis | Normal? |
|---|---|---|---|---|---|
| AAPL | 0.119 | 1.92 | -0.00 | 5.10 | No |
| GOOGL | 0.090 | 1.94 | -0.06 | 3.87 | No |
| MSFT | 0.107 | 1.82 | -0.02 | 6.76 | No |
| SPY | 0.052 | 1.23 | -0.55 | 11.50 | No |
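Watch for a convention trap here. `scipy.stats.kurtosis` returns *excess* kurtosis by default (`fisher=True`), so a normal distribution scores 0. Tsay's "exceeds three" refers to raw (Pearson) kurtosis, whose normal benchmark is 3. The Kurtosis column in Table 1 is therefore already measured relative to the normal. A quick check on simulated data follows; the Student-t with 5 degrees of freedom is just an illustrative fat-tailed benchmark.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_draws = rng.standard_normal(200_000)

# Default fisher=True: excess kurtosis, normal -> 0
excess_k = stats.kurtosis(normal_draws)
# fisher=False: Pearson kurtosis, normal -> 3 (the convention in Tsay's quote)
pearson_k = stats.kurtosis(normal_draws, fisher=False)

# Theoretical excess kurtosis of Student-t with nu = 5 is 6 / (nu - 4) = 6
t_excess = float(stats.t.stats(df=5, moments='k'))

print(f"Normal sample: excess = {excess_k:.3f}, Pearson = {pearson_k:.3f}")
print(f"Student-t(5) theoretical excess kurtosis = {t_excess:.1f}")
```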

2.3 The Leverage Effect

Volatility responds asymmetrically to returns: negative returns tend to increase volatility more than positive returns of the same magnitude. This leverage effect was first documented by Black (1976).

Several theoretical explanations have been proposed:

| Theory | Mechanism | Key Insight |
|---|---|---|
| Leverage hypothesis | When prices fall, the debt/equity ratio rises mechanically | Firms become riskier → volatility increases |
| Volatility feedback | If volatility is priced, an expected increase in volatility raises the required return | Causation runs the other way: higher volatility → lower prices |
| Risk premium channel | Higher expected volatility demands a higher risk premium | The current price falls to deliver the higher expected return |
| Behavioural asymmetry | Investors react more strongly to losses (prospect theory) | Bad news generates more trading and uncertainty |
| Margin constraints | Downturns trigger margin calls and forced selling | Amplification through deleveraging cascades |

The leverage hypothesis is the most cited explanation, but Campbell and Hentschel (1992) and Bekaert and Wu (2000) show that volatility feedback may be equally or more important. In practice, all mechanisms likely operate simultaneously, reinforcing the asymmetric pattern.
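Back-of-the-envelope arithmetic makes the leverage hypothesis concrete. Assume debt is riskless. Then equity bears all of the asset risk, and equity volatility is \(\sigma_E = \sigma_A V/E\), where \(V\) is asset value and \(E = V - D\) is equity. Here is a minimal sketch with hypothetical balance-sheet numbers; the function name is illustrative.

```python
def equity_volatility(asset_value, debt, asset_vol):
    """Equity volatility when debt is riskless: sigma_E = sigma_A * V / (V - D)."""
    equity = asset_value - debt
    if equity <= 0:
        raise ValueError("equity must be positive")
    return asset_vol * asset_value / equity

# Hypothetical firm: assets 100, debt 50, asset volatility 20%
before = equity_volatility(100.0, 50.0, 0.20)
# Asset value falls 10% while debt is unchanged, so leverage V/E rises from 2.0 to 2.25
after = equity_volatility(90.0, 50.0, 0.20)
print(f"equity vol before: {before:.0%}, after a 10% fall in asset value: {after:.0%}")
```

A 10% fall in asset value cuts equity by 20% (from 50 to 40). It also lifts equity volatility from 40% to 45%, with no change in the firm's underlying business risk.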

# Leverage effect: does today's return predict volatility over the following 20 days?
spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
spy_df['return'] = spy_df['PX_LAST'].pct_change()
# Forward-looking volatility of returns on days t+1..t+20, so the binned
# return is not itself inside the volatility window
spy_df['future_vol'] = spy_df['return'].rolling(20).std().shift(-20)

# Bin by today's return; .copy() avoids pandas' SettingWithCopyWarning
spy_clean = spy_df.dropna(subset=['return', 'future_vol']).copy()
spy_clean['return_bin'] = pd.qcut(spy_clean['return'], q=5, labels=['Very Neg', 'Neg', 'Neutral', 'Pos', 'Very Pos'])

# Plot
fig, ax = plt.subplots(figsize=(8, 5))
spy_clean.groupby('return_bin', observed=True)['future_vol'].mean().plot(kind='bar', ax=ax, color='steelblue')
ax.set_xlabel('Day-t Return (quintile)')
ax.set_ylabel('Average Volatility over Days t+1 to t+20')
ax.set_title('Leverage Effect: Negative Returns Lead to Higher Volatility')
ax.tick_params(axis='x', rotation=0)
plt.tight_layout()
plt.show()
Figure 3: Negative returns increase volatility more than positive returns

3 The ARCH Model

3.1 Motivation: Conditional Heteroskedasticity

Classical econometrics assumes homoskedasticity, a constant variance of the errors. But volatility clustering tells us this assumption fails for financial data: the variance is not constant but changes over time in predictable ways.

The key insight of Engle (1982) was to distinguish between unconditional and conditional variance:

  • Unconditional variance: The long-run average variance (constant)
  • Conditional variance: The variance right now, given what we know (time-varying)

Important: Engle’s Key Insight

While the unconditional variance of returns may be constant, the conditional variance (what we expect given recent information) varies over time in a way we can model.

3.2 The ARCH(q) Specification

The AutoRegressive Conditional Heteroskedasticity model of order \(q\), ARCH(q), specifies:

\[r_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1)\]

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \alpha_2 \varepsilon^2_{t-2} + \cdots + \alpha_q \varepsilon^2_{t-q}\]

Where:

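  • \(r_t\) is the return at time \(t\) and \(\mu\) its constant mean
  • \(\varepsilon_t\) is the shock (unexpected return) and \(z_t\) a standard normal innovation
  • \(\sigma^2_t\) is the conditional variance at time \(t\), given information up to \(t-1\)
  • \(\varepsilon^2_{t-1}, \ldots, \varepsilon^2_{t-q}\) are past squared shocks
  • \(\alpha_0 > 0\) and \(\alpha_1, \ldots, \alpha_q \geq 0\) keep the conditional variance positive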

Interpretation: Today’s volatility depends on recent surprises. If yesterday had a large shock (positive or negative), today’s volatility will be elevated.
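A minimal ARCH(1) simulation shows Engle's distinction between conditional and unconditional variance in action. It uses hypothetical parameters \(\alpha_0 = 0.2\), \(\alpha_1 = 0.5\) and \(\mu = 0\). The unconditional variance \(\alpha_0/(1-\alpha_1) = 0.4\) is constant. Yet \(\sigma^2_t\) moves with every squared shock, and it never falls below \(\alpha_0\). The function name is illustrative.

```python
import numpy as np

def simulate_arch1(n, alpha0, alpha1, mu=0.0, seed=0):
    """Simulate r_t = mu + eps_t, eps_t = sigma_t z_t, sigma2_t = alpha0 + alpha1 * eps_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = alpha0 / (1 - alpha1)   # initialise at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        # Today's conditional variance reacts to yesterday's squared shock
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return mu + eps, sigma2

r, sigma2 = simulate_arch1(n=100_000, alpha0=0.2, alpha1=0.5)
print(f"Unconditional variance (theory): {0.2 / (1 - 0.5):.3f}")
print(f"Sample variance of returns:      {r.var():.3f}")
print(f"Conditional variance ranges from {sigma2.min():.3f} to {sigma2.max():.1f}")
```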

4 The GARCH Model

4.1 From ARCH to GARCH

Bollerslev (1986) extended ARCH by including lagged conditional variances:

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

This is GARCH(1,1), with one ARCH term and one GARCH term. Brooks (2019) notes:

“A GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, and rarely is any higher order model estimated or even entertained in the academic finance literature.”

4.2 Parameter Interpretation

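| Parameter | Name | Interpretation |
|---|---|---|
| \(\alpha_0\) | Constant | Baseline variance: a lower bound on \(\sigma^2_t\) that, together with \(\alpha_1\) and \(\beta_1\), sets the long-run variance |
| \(\alpha_1\) | ARCH term | Reaction to recent shocks |
| \(\beta_1\) | GARCH term | Persistence of past conditional variance |
| \(\alpha_1 + \beta_1\) | Persistence | How long shocks affect volatility |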
Tip: Rule of Thumb

For most financial assets, \(\alpha_1 + \beta_1\) is close to (but less than) 1, meaning volatility shocks are highly persistent. Values above 0.9 are typical.
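The half-life of a volatility shock is one way to read "close to 1". In GARCH(1,1), the expected gap between the conditional variance and its long-run level shrinks by a factor of \(\alpha_1 + \beta_1\) each period. The gap therefore halves after \(\ln(0.5)/\ln(\alpha_1 + \beta_1)\) periods. Here is a short sketch, assuming daily data; the function name is illustrative.

```python
import math

def half_life(persistence):
    """Periods until a variance shock's expected effect halves (0 < persistence < 1)."""
    if not 0 < persistence < 1:
        raise ValueError("half-life is finite only for 0 < alpha1 + beta1 < 1")
    return math.log(0.5) / math.log(persistence)

for p in (0.90, 0.95, 0.98, 0.99):
    print(f"alpha1 + beta1 = {p:.2f}: half-life = {half_life(p):5.1f} days")
```

A persistence of 0.90 implies a half-life of under 7 days. At 0.98 it is about 34 days, and at 0.99 roughly 69 days. This is why small differences near 1 matter.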

4.3 Why GARCH(1,1) Often Suffices

GARCH(1,1) is remarkably effective for three reasons: parsimony (only 3 parameters capture complex volatility dynamics), memory (the recursive structure implicitly uses the entire history), and mean reversion (volatility eventually returns to its long-run level).
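The "memory" point can be checked directly. Substituting the recursion into itself \(k\) times gives

\[\sigma^2_t = \alpha_0 \frac{1-\beta_1^k}{1-\beta_1} + \alpha_1 \sum_{j=1}^{k} \beta_1^{j-1} \varepsilon^2_{t-j} + \beta_1^k \sigma^2_{t-k}\]

So GARCH(1,1) behaves like an ARCH model with infinitely many lags, whose weights \(\alpha_1\beta_1^{j-1}\) decline geometrically. The sketch below uses hypothetical parameters and arbitrary shocks to confirm that the recursive and unrolled forms agree numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, alpha1, beta1 = 0.05, 0.08, 0.90   # hypothetical parameters
eps = rng.standard_normal(300)              # any sequence of past shocks will do
sigma2_0 = 1.0                              # arbitrary starting variance

# 1) Recursive form: sigma2_t = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma2_{t-1}
sigma2 = sigma2_0
for e in eps:
    sigma2 = alpha0 + alpha1 * e ** 2 + beta1 * sigma2

# 2) Unrolled form: weight beta1^(j-1) on eps_{t-j}^2 (most recent shock first),
#    plus the geometric sum of alpha0 and the decayed starting value
k = len(eps)
weights = beta1 ** np.arange(k)
unrolled = (alpha0 * (1 - beta1 ** k) / (1 - beta1)
            + alpha1 * np.sum(weights * eps[::-1] ** 2)
            + beta1 ** k * sigma2_0)

print(np.isclose(sigma2, unrolled))   # True: the two forms agree
```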

from arch import arch_model

# Fit GARCH(1,1) to SPY
returns_pct = returns * 100  # Scale for numerical stability

# In the arch package, p is the ARCH order and q the GARCH order
model = arch_model(returns_pct.dropna(), vol='Garch', p=1, q=1, mean='Constant')
result = model.fit(disp='off')

# Extract conditional volatility
cond_vol = result.conditional_volatility

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(returns_pct.dropna().index, np.abs(returns_pct.dropna().values), 
        alpha=0.3, label='|Returns|', color='steelblue')
ax.plot(cond_vol.index, cond_vol.values, 
        label='GARCH(1,1) Conditional Volatility', color='coral', linewidth=1.5)
ax.legend()
ax.set_ylabel('Volatility (%)')
ax.set_title('GARCH(1,1) Fitted Volatility vs Absolute Returns')
plt.tight_layout()
plt.show()

# Full summary: the mean-model table plus the volatility-model table (omega, alpha[1], beta[1])
print(result.summary())
Figure 4: GARCH(1,1) fitted volatility vs absolute returns
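Under the hood, `fit()` maximises a log-likelihood. The sketch below is not the arch package's internal code. It is a from-scratch Gaussian GARCH(1,1) likelihood, maximised with `scipy.optimize.minimize` on data simulated from known (hypothetical) parameters, so you can watch the estimator recover them. Initialising \(\sigma^2_1\) at the sample second moment is one common choice and is assumed here.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, r):
    """Gaussian negative log-likelihood of a constant-mean GARCH(1,1)."""
    mu, alpha0, alpha1, beta1 = params
    eps = r - mu
    sigma2 = np.empty_like(eps)
    sigma2[0] = np.mean(eps ** 2)   # initial variance: sample second moment (an assumption)
    for t in range(1, len(eps)):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)

# Simulate 2000 observations from hypothetical "true" parameters
rng = np.random.default_rng(7)
mu0, a0, a1, b1 = 0.05, 0.02, 0.08, 0.90
n = 2000
r = np.empty(n)
s2 = a0 / (1 - a1 - b1)   # start at the unconditional variance
for t in range(n):
    r[t] = mu0 + np.sqrt(s2) * rng.standard_normal()
    s2 = a0 + a1 * (r[t] - mu0) ** 2 + b1 * s2

# Bounds keep the variance positive; alpha1 + beta1 < 1 is not imposed here
res = minimize(garch11_neg_loglik, x0=[r.mean(), 0.05, 0.05, 0.90], args=(r,),
               method='L-BFGS-B',
               bounds=[(None, None), (1e-6, None), (0.0, 1.0), (0.0, 1.0)])
mu_hat, a0_hat, a1_hat, b1_hat = res.x
print(f"alpha1 = {a1_hat:.3f}, beta1 = {b1_hat:.3f}, persistence = {a1_hat + b1_hat:.3f}")
```

Fitting the same simulated series with `arch_model` should give similar, though not identical, estimates. Implementation details such as the variance initialisation differ.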

4.4 Stationarity Conditions

For GARCH(1,1) to be covariance stationary, we require:

\[\alpha_1 + \beta_1 < 1\]

When this holds, the unconditional variance exists:

\[\sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}\]

If \(\alpha_1 + \beta_1 = 1\), we have Integrated GARCH (IGARCH), in which shocks to the conditional variance persist forever.
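Stationarity matters most for forecasting. Because \(E_t[\varepsilon^2_{t+h-1}] = E_t[\sigma^2_{t+h-1}]\), taking expectations of the GARCH(1,1) recursion gives \(E_t[\sigma^2_{t+h}] = \alpha_0 + (\alpha_1 + \beta_1) E_t[\sigma^2_{t+h-1}]\). Its solution is

\[E_t[\sigma^2_{t+h}] = \sigma^2 + (\alpha_1 + \beta_1)^{h-1}\left(\sigma^2_{t+1} - \sigma^2\right)\]

When \(\alpha_1 + \beta_1 < 1\), forecasts revert to \(\sigma^2\); under IGARCH the gap never closes. The sketch below uses hypothetical parameters and checks the recursion against the closed form. The function name is illustrative.

```python
import numpy as np

def garch11_variance_forecast(sigma2_next, alpha0, alpha1, beta1, horizon):
    """E_t[sigma2_{t+h}] for h = 1..horizon, starting from the one-step forecast sigma2_{t+1}."""
    forecasts = np.empty(horizon)
    forecasts[0] = sigma2_next
    for h in range(1, horizon):
        # E_t[eps^2] = E_t[sigma2], so the ARCH and GARCH terms combine into (alpha1 + beta1)
        forecasts[h] = alpha0 + (alpha1 + beta1) * forecasts[h - 1]
    return forecasts

alpha0, alpha1, beta1 = 0.02, 0.08, 0.90   # hypothetical parameters
long_run = alpha0 / (1 - alpha1 - beta1)   # unconditional variance (about 1.0 here)
f = garch11_variance_forecast(sigma2_next=4.0, alpha0=alpha0, alpha1=alpha1,
                              beta1=beta1, horizon=250)

# Closed form: long_run + (alpha1 + beta1)^(h-1) * (sigma2_{t+1} - long_run)
h = np.arange(1, 251)
closed = long_run + (alpha1 + beta1) ** (h - 1) * (4.0 - long_run)
print(np.allclose(f, closed), f[0], round(f[-1], 3))
```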

5 Asymmetric GARCH Models

Standard GARCH treats positive and negative shocks symmetrically. But the leverage effect suggests this is wrong. Several extensions address this:

5.1 GJR-GARCH

Glosten, Jagannathan, and Runkle (1993) add an indicator for negative shocks:

\[\sigma^2_t = \alpha_0 + (\alpha_1 + \gamma I_{t-1}) \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

Where \(I_{t-1} = 1\) if \(\varepsilon_{t-1} < 0\) (negative shock). The parameter \(\gamma\) captures the additional impact of bad news.
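The sketch below shows a one-step GJR update with hypothetical parameters. When \(z_t\) is symmetric, the indicator switches on for half of all shocks on average, so \(E[I_{t-1} z^2_{t-1}] = 1/2\). The persistence of GJR-GARCH(1,1) is therefore \(\alpha_1 + \gamma/2 + \beta_1\), which must be below 1 for covariance stationarity. The Monte Carlo line checks the 1/2. The function name is illustrative.

```python
import numpy as np

def gjr_update(eps_prev, sigma2_prev, alpha0, alpha1, gamma, beta1):
    """One GJR-GARCH(1,1) step: negative shocks get ARCH coefficient alpha1 + gamma."""
    indicator = 1.0 if eps_prev < 0 else 0.0
    return alpha0 + (alpha1 + gamma * indicator) * eps_prev ** 2 + beta1 * sigma2_prev

alpha0, alpha1, gamma, beta1 = 0.02, 0.03, 0.10, 0.90   # hypothetical parameters

up = gjr_update(+2.0, 1.0, alpha0, alpha1, gamma, beta1)     # 0.02 + 0.03 * 4 + 0.90
down = gjr_update(-2.0, 1.0, alpha0, alpha1, gamma, beta1)   # 0.02 + 0.13 * 4 + 0.90

# With symmetric z, the indicator is on half the time: E[I * z^2] = 1/2
z = np.random.default_rng(3).standard_normal(1_000_000)
half = np.mean((z < 0) * z ** 2)
persistence = alpha1 + gamma / 2 + beta1   # must be < 1 for covariance stationarity
print(f"after +2: {up:.2f}, after -2: {down:.2f}, E[I z^2] = {half:.3f}, persistence = {persistence:.2f}")
```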

5.2 EGARCH

Nelson (1991) models the logarithm of the conditional variance, which guarantees positivity without parameter constraints:

\[\ln(\sigma^2_t) = \alpha_0 + \alpha_1 \left( \frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi} \right) + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta_1 \ln(\sigma^2_{t-1})\]

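The \(\gamma\) term directly captures asymmetry. When \(\gamma < 0\), a negative standardised shock raises next-period log-variance more than a positive shock of the same size.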
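The sketch below performs one EGARCH step in the form written above, with hypothetical parameters and \(\gamma < 0\). Because the model is specified for \(\ln(\sigma^2_t)\), the variance is positive for any parameter values. The constant \(\sqrt{2/\pi}\) equals \(E|z_t|\) for a standard normal, so the size term has mean zero; the final lines check that expectation by simulation. The function name is illustrative.

```python
import numpy as np

def egarch_update(eps_prev, sigma2_prev, alpha0, alpha1, gamma, beta1):
    """One EGARCH(1,1) step; returns sigma2_t, which is always positive."""
    z = eps_prev / np.sqrt(sigma2_prev)   # standardised shock
    log_sigma2 = (alpha0
                  + alpha1 * (abs(z) - np.sqrt(2 / np.pi))   # size effect, mean zero under N(0,1)
                  + gamma * z                                # sign effect
                  + beta1 * np.log(sigma2_prev))
    return np.exp(log_sigma2)

# Hypothetical parameters; gamma < 0 means bad news raises volatility more
alpha0, alpha1, gamma, beta1 = 0.0, 0.10, -0.08, 0.95
up = egarch_update(+2.0, 1.0, alpha0, alpha1, gamma, beta1)
down = egarch_update(-2.0, 1.0, alpha0, alpha1, gamma, beta1)
print(f"after +2 shock: {up:.3f}, after -2 shock: {down:.3f}")

# Check E|z| = sqrt(2/pi) for z ~ N(0,1)
z = np.random.default_rng(5).standard_normal(1_000_000)
mean_abs = np.mean(np.abs(z))
print(mean_abs, np.sqrt(2 / np.pi))
```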

5.3 Visualising Asymmetry: News Impact Curves

The news impact curve (Pagan and Schwert 1990) provides a powerful way to visualise how different shocks affect future volatility. It plots the next-period conditional variance against the previous shock \(\varepsilon_{t-1}\), holding the lagged conditional variance \(\sigma^2_{t-1}\) fixed (typically at its unconditional level).

For a symmetric GARCH model, the curve is a parabola centred at zero: positive and negative shocks of equal magnitude have identical effects. For asymmetric models (GJR, EGARCH), the curve is steeper on the negative side, showing that negative shocks have a larger impact.

# Illustrative news impact curves (hypothetical parameters, daily returns in decimals)
alpha0, alpha1, beta1 = 0.00001, 0.08, 0.90
gamma = 0.10  # Asymmetry parameter for GJR

# The news impact curve holds the lagged variance fixed at its unconditional level
sigma2_bar = alpha0 / (1 - alpha1 - beta1)

# Previous shock eps_{t-1}, measured in multiples of the long-run volatility
z = np.linspace(-3, 3, 100)
shocks = z * np.sqrt(sigma2_bar)

# GARCH: symmetric response
garch_var = alpha0 + alpha1 * shocks**2 + beta1 * sigma2_bar

# GJR: asymmetric, with an extra gamma on the ARCH term for negative shocks
gjr_var = alpha0 + (alpha1 + gamma * (shocks < 0)) * shocks**2 + beta1 * sigma2_bar

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(z, garch_var * 10000, label='GARCH (symmetric)', color='steelblue', linewidth=2)
ax.plot(z, gjr_var * 10000, label='GJR (asymmetric)', color='coral', linewidth=2)
ax.axvline(0, color='gray', linestyle='--', linewidth=0.5)
ax.set_xlabel('Previous Shock (multiples of long-run volatility)')
ax.set_ylabel('Next-Period Variance (×10⁴)')
ax.set_title('News Impact Curves: Asymmetry in Volatility Response')
ax.legend()
plt.tight_layout()
plt.show()
Figure 5: News impact curves: GARCH (symmetric) vs GJR (asymmetric)

The news impact curve reveals that under GJR, a negative shock of magnitude 2 has substantially more impact on future volatility than a positive shock of the same size, consistent with the leverage effect.

6 GARCH-M: The Risk-Return Relationship

Standard GARCH models volatility but does not connect it to expected returns. The GARCH-in-Mean (GARCH-M) model (Engle, Lilien, and Robins 1987) lets volatility enter the mean equation directly:

\[r_t = \mu + \delta \sigma_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t\] \[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

The parameter \(\delta\) captures the risk premium. If \(\delta > 0\), higher expected volatility leads to higher expected returns, as compensation for bearing risk.

Note: Interpreting GARCH-M

A positive and significant \(\delta\) supports the risk-return trade-off from finance theory. However, the empirical evidence is mixed: \(\delta\) is often insignificant or even negative at short horizons. This may reflect:

  • Time-varying risk aversion
  • Different risk horizons for different investors
  • Measurement error in conditional volatility

7 Long Memory in Volatility: IGARCH and Beyond

When \(\alpha_1 + \beta_1\) approaches 1, volatility shocks become extremely persistent. The Integrated GARCH (IGARCH) model sets \(\alpha_1 + \beta_1 = 1\) exactly:

\[\sigma^2_t = \alpha_0 + \beta_1 \sigma^2_{t-1} + (1 - \beta_1) \varepsilon^2_{t-1}\]

Tsay (2010) notes that under IGARCH, “the impact of past squared shocks… on \(\sigma^2_t\) is persistent”. Shocks never fully decay, and the unconditional variance does not exist.
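The RiskMetrics exponentially weighted moving average (EWMA), \(\sigma^2_t = \lambda \sigma^2_{t-1} + (1-\lambda)\varepsilon^2_{t-1}\), is a familiar special case. It is the IGARCH recursion above with \(\alpha_0 = 0\) and \(\beta_1 = \lambda\), and \(\lambda = 0.94\) is the classic RiskMetrics choice for daily data. Here is a minimal sketch; the function name and simulated returns are illustrative.

```python
import numpy as np

def ewma_variance(returns, lam=0.94, init=None):
    """RiskMetrics-style EWMA: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2.

    Element k of the output is the variance for period k, built from returns before k;
    the last element is the one-step-ahead forecast.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r) + 1)
    sigma2[0] = r.var() if init is None else init   # the starting value is a modelling choice
    for t in range(len(r)):
        sigma2[t + 1] = lam * sigma2[t] + (1 - lam) * r[t] ** 2
    return sigma2

rng = np.random.default_rng(11)
r = 0.01 * rng.standard_normal(1000)   # hypothetical daily returns with 1% volatility
s2 = ewma_variance(r)
print(f"one-step-ahead EWMA volatility: {np.sqrt(s2[-1]):.4f}")
```

With \(\alpha_0 = 0\) and \(\alpha_1 + \beta_1 = 1\), every multi-step forecast equals the one-step forecast. EWMA volatility therefore never reverts to a long-run level.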

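| Model | Persistence | Unconditional Variance | Use Case |
|---|---|---|---|
| GARCH(1,1) | \(\alpha_1 + \beta_1 < 1\) | Exists, finite | Standard applications |
| IGARCH(1,1) | \(\alpha_1 + \beta_1 = 1\) | Does not exist (infinite) | Very persistent volatility |
| FIGARCH | Fractional integration, \(0 < d < 1\) | Does not exist (infinite), as in IGARCH | Long memory: shocks decay hyperbolically rather than geometrically |

Warning: The IGARCH Puzzle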

Finding \(\alpha_1 + \beta_1 \approx 1\) is common empirically but concerning theoretically, because it implies that volatility shocks persist forever. This may indicate:

  • Occasional level shifts in volatility (regime changes)
  • Structural breaks misinterpreted as persistence
  • Need for a regime-switching specification

8 Multivariate Volatility: DCC and Portfolio Applications

When managing portfolios, we need not just individual asset volatilities but covariances between assets. Multivariate GARCH models capture time-varying correlations.

8.1 Dynamic Conditional Correlation (DCC)

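The DCC model (Engle 2002) separates volatility dynamics from correlation dynamics:

\[r_t = \mu_t + D_t z_t\]

Where \(D_t\) is a diagonal matrix of univariate GARCH volatilities and \(z_t = D_t^{-1}(r_t - \mu_t)\) are the standardised residuals, whose conditional correlation matrix is \(R_t\). Correlations evolve through the matrix \(Q_t\):

\[Q_t = (1 - \theta_1 - \theta_2)\bar{Q} + \theta_1 z_{t-1} z'_{t-1} + \theta_2 Q_{t-1}\]

where \(\bar{Q}\) is the unconditional correlation matrix of \(z_t\), with \(\theta_1, \theta_2 \geq 0\) and \(\theta_1 + \theta_2 < 1\). The correlation matrix is then \(R_t = \text{diag}(Q_t)^{-1/2} Q_t \text{diag}(Q_t)^{-1/2}\), and the conditional covariance matrix is \(H_t = D_t R_t D_t\).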
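The DCC correlation recursion translates into a short loop. The sketch assumes the standardised residuals \(z_t\) are already available. In practice they come from univariate GARCH fits; here they are simulated with correlation 0.6. It sets \(\bar{Q}\) to their sample correlation matrix (a "correlation targeting" assumption) and uses hypothetical \(\theta_1 = 0.05\), \(\theta_2 = 0.93\). It filters correlations for given parameters; it does not estimate them.

```python
import numpy as np

def dcc_correlations(z, theta1, theta2):
    """DCC(1,1) recursion. z: (T, N) standardised residuals. Returns R: (T, N, N)."""
    T, N = z.shape
    Qbar = np.corrcoef(z, rowvar=False)   # correlation targeting (an assumption)
    Q = Qbar.copy()                       # initialise Q_1 at Qbar
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            outer = np.outer(z[t - 1], z[t - 1])
            Q = (1 - theta1 - theta2) * Qbar + theta1 * outer + theta2 * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)         # same as diag(Q)^(-1/2) Q diag(Q)^(-1/2)
    return R

# Hypothetical standardised residuals for two assets with true correlation 0.6
rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=1000)
R = dcc_correlations(z, theta1=0.05, theta2=0.93)
rho = R[:, 0, 1]
print(f"conditional correlation ranges from {rho.min():.2f} to {rho.max():.2f}")
```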

8.2 Application: Time-Varying Hedge Ratios

A key application of multivariate GARCH is computing optimal hedge ratios that vary over time. The minimum variance hedge ratio is:

\[h_t = \rho_t \frac{\sigma_{s,t}}{\sigma_{f,t}}\]

Where \(\rho_t\) is the conditional correlation between spot and futures returns, and \(\sigma_{s,t}, \sigma_{f,t}\) are their conditional volatilities. Brooks (2019) shows that time-varying hedge ratios from multivariate GARCH often outperform static hedges, particularly during volatile periods.
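With conditional moments in hand, the hedge ratio takes one line. The sketch below uses hypothetical one-day values. It also checks numerically that \(h_t\) minimises the variance of the hedged position \(s - hf\), which is \(\sigma_s^2 - 2h\rho\sigma_s\sigma_f + h^2\sigma_f^2\); at the optimum that variance equals \(\sigma_s^2(1-\rho^2)\). Passing arrays of DCC-GARCH outputs instead of scalars gives the time-varying \(h_t\). The function names are illustrative.

```python
import numpy as np

def min_variance_hedge_ratio(rho, sigma_s, sigma_f):
    """h_t = rho_t * sigma_s,t / sigma_f,t (equivalently Cov(s, f) / Var(f))."""
    return rho * sigma_s / sigma_f

def hedged_variance(h, rho, sigma_s, sigma_f):
    """Variance of the hedged position s - h * f."""
    return sigma_s ** 2 - 2 * h * rho * sigma_s * sigma_f + h ** 2 * sigma_f ** 2

# Hypothetical conditional moments for one day
rho, sigma_s, sigma_f = 0.9, 0.020, 0.025
h_star = min_variance_hedge_ratio(rho, sigma_s, sigma_f)   # 0.9 * 0.02 / 0.025

# Check on a grid that h_star minimises the hedged variance
grid = np.linspace(0, 1.5, 1501)
h_grid = grid[np.argmin(hedged_variance(grid, rho, sigma_s, sigma_f))]
print(h_star, h_grid)
```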

9 Practical Application: VIX and Realised Volatility

9.1 The VIX Index

The VIX (CBOE Volatility Index) measures the market’s expectation of 30-day volatility, extracted from S&P 500 option prices. It is an implied volatility: it reflects what traders expect.

# Load VIX and SPY closes from the Bloomberg database, sorted by date
vix = df[df['ticker'] == 'VIX'].set_index('date').sort_index()['PX_LAST']
spy_close = df[df['ticker'] == 'SPY'].set_index('date').sort_index()['PX_LAST']

# Backward-looking realised volatility (annualised, %). VIX targets the next
# 30 calendar days, roughly 21 trading days, so use a 21-day window.
spy_ret = spy_close.pct_change()
realised_vol = spy_ret.rolling(21).std() * np.sqrt(252) * 100

# Align dates (sorted so the lines plot in time order)
common_idx = vix.index.intersection(realised_vol.dropna().index).sort_values()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(common_idx, vix.loc[common_idx], label='VIX (Implied)', color='coral')
ax.plot(common_idx, realised_vol.loc[common_idx], label='21-Day Realised Vol (~30 calendar days)', 
        color='steelblue', alpha=0.7)
ax.legend()
ax.set_ylabel('Volatility (%)')
ax.set_title('Implied vs Realised Volatility')
plt.tight_layout()
plt.show()
Figure 6: VIX (implied) vs SPY realised volatility

9.2 Volatility Risk Premium

Implied volatility typically exceeds realised volatility; the gap is the volatility risk premium. It reflects the compensation that sellers of options demand for bearing volatility risk.
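One common way to quantify the premium on a given day compares the squared VIX with the variance actually realised over the option horizon that follows. This is forward-looking, unlike the backward-looking series in Figure 6. Conventions differ across studies: variance versus volatility units, 21 versus 22 trading days, demeaned or raw returns. The sketch below assumes annualised variance in squared percentage points, 252 trading days and zero-mean returns, with hypothetical inputs. The function name is illustrative.

```python
import numpy as np

def variance_risk_premium(vix_level, future_daily_returns, periods_per_year=252):
    """Implied minus realised variance, both annualised, in squared percentage points.

    vix_level: VIX quote on day t (annualised volatility in %).
    future_daily_returns: decimal returns realised over the following ~21 trading days.
    """
    implied_var = vix_level ** 2
    r = np.asarray(future_daily_returns, dtype=float)
    realised_var = periods_per_year * np.mean(r ** 2) * 100 ** 2   # zero-mean assumption
    return implied_var - realised_var

# Hypothetical example: VIX at 20 while the next 21 days realise 1% daily moves
vrp = variance_risk_premium(20.0, np.full(21, 0.01))
# implied 20^2 = 400; realised 252 * 0.0001 * 10^4 = 252
print(vrp)
```

In this example the market priced 20% volatility, but only about 15.9% (\(\sqrt{252}\)%) was realised.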

10 Bayesian Approaches to Volatility

While GARCH models are typically estimated by maximum likelihood, Bayesian methods offer several advantages for volatility modelling, particularly when dealing with parameter uncertainty, model comparison, and complex specifications.

10.1 Why Bayesian Volatility Modelling?

| Challenge | MLE Approach | Bayesian Solution |
|---|---|---|
| Parameter uncertainty | Point estimates + asymptotic SE | Full posterior distributions |
| Model comparison | Information criteria (AIC, BIC) | Bayes factors, posterior model probabilities |
| Small samples | Unreliable estimates | Priors stabilise inference |
| Complex models | Numerical optimisation often fails | MCMC handles high dimensions |

10.2 Stochastic Volatility Models

GARCH treats volatility as a deterministic function of past shocks. Stochastic volatility (SV) models treat volatility itself as a random process:

\[r_t = \exp(h_t/2) \varepsilon_t, \quad \varepsilon_t \sim N(0,1)\] \[h_t = \mu + \phi(h_{t-1} - \mu) + \sigma_\eta \eta_t, \quad \eta_t \sim N(0,1)\]

Where \(h_t = \ln(\sigma^2_t)\) is log-volatility, which follows an AR(1) process. The parameter \(\phi\) captures volatility persistence; \(\sigma_\eta\) captures volatility-of-volatility.
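Simulating the SV model makes the contrast with GARCH concrete. Each period has two independent shocks, one to returns and one to log-volatility, so \(h_t\) cannot be recovered exactly from past returns. Here is a minimal sketch with hypothetical daily parameters: \(\mu = -9\) (a typical daily volatility of about 1.1%), \(\phi = 0.97\) and \(\sigma_\eta = 0.2\). The function name is illustrative.

```python
import numpy as np

def simulate_sv(n, mu=-9.0, phi=0.97, sigma_eta=0.2, seed=0):
    """Simulate r_t = exp(h_t / 2) * eps_t with AR(1) log-variance h_t."""
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    # Draw h_0 from the stationary distribution N(mu, sigma_eta^2 / (1 - phi^2))
    h[0] = mu + sigma_eta / np.sqrt(1 - phi ** 2) * rng.standard_normal()
    for t in range(1, n):
        # Volatility shock eta_t
        h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.standard_normal()
    # Separate return shock eps_t
    r = np.exp(h / 2) * rng.standard_normal(n)
    return r, h

r, h = simulate_sv(5000)
print(f"exp(mu/2) = {np.exp(-9.0 / 2):.4f}, sample std of returns = {r.std():.4f}")
```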

Note: SV vs GARCH
  • GARCH: Volatility is a known function of past data; likelihood is tractable
  • SV: Volatility is a latent (unobserved) process; requires MCMC or particle filters

SV models often fit financial data better than GARCH (lower in-sample MSE), but GARCH is easier to estimate and produces similar forecasts. The choice depends on whether you need full uncertainty quantification.

10.3 Bayesian GARCH

Even standard GARCH can benefit from Bayesian treatment:

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

With priors: \[\alpha_0 \sim \text{Gamma}(\cdot), \quad \alpha_1, \beta_1 \sim \text{Beta}(\cdot)\]

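The Beta priors on \(\alpha_1\) and \(\beta_1\) automatically enforce \(0 < \alpha_1 < 1\) and \(0 < \beta_1 < 1\). They do not enforce the stationarity condition \(\alpha_1 + \beta_1 < 1\), which must be imposed separately (for example, by truncating the joint prior). A prior favouring \(\alpha_1 + \beta_1\) close to (but less than) 1 incorporates the stylised fact that volatility is highly persistent.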
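Any MCMC sampler for Bayesian GARCH is built on the unnormalised log posterior: log prior plus log likelihood. Here is a minimal sketch, assuming mean-zero residuals and hypothetical hyperparameters:

  • Gamma(2, scale 0.05) for \(\alpha_0\)
  • Beta(2, 20) for \(\alpha_1\) (prior mean about 0.09)
  • Beta(20, 2) for \(\beta_1\) (prior mean about 0.91)

The function enforces the stationarity region by returning \(-\infty\) outside it. A random-walk Metropolis or similar sampler would evaluate it repeatedly. The function name is illustrative.

```python
import numpy as np
from scipy import stats

def garch11_log_posterior(params, eps, sigma2_init=None):
    """Unnormalised log posterior for GARCH(1,1) on mean-zero residuals eps."""
    alpha0, alpha1, beta1 = params
    # Prior support, truncated to the stationary region alpha1 + beta1 < 1
    if alpha0 <= 0 or not (0 < alpha1 < 1) or not (0 < beta1 < 1) or alpha1 + beta1 >= 1:
        return -np.inf
    log_prior = (stats.gamma.logpdf(alpha0, a=2, scale=0.05)
                 + stats.beta.logpdf(alpha1, 2, 20)
                 + stats.beta.logpdf(beta1, 20, 2))
    sigma2 = np.var(eps) if sigma2_init is None else sigma2_init
    log_lik = 0.0
    for e in eps:   # Gaussian GARCH likelihood, one observation at a time
        log_lik += -0.5 * (np.log(2 * np.pi * sigma2) + e ** 2 / sigma2)
        sigma2 = alpha0 + alpha1 * e ** 2 + beta1 * sigma2
    return log_prior + log_lik

rng = np.random.default_rng(4)
eps = rng.standard_normal(500)                            # placeholder residuals
lp_ok = garch11_log_posterior([0.05, 0.05, 0.90], eps)    # inside the stationary region
lp_bad = garch11_log_posterior([0.05, 0.20, 0.85], eps)   # alpha1 + beta1 > 1
print(lp_ok, lp_bad)
```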

10.4 Practical Benefits

Bayesian approaches to volatility offer three key practical benefits. First, uncertainty propagation: when forecasting volatility, Bayesian methods produce predictive distributions rather than point forecasts, which is essential for risk management since the uncertainty in your volatility estimate should feed into your VaR calculation. Second, shrinkage and regularisation: Bayesian priors prevent extreme parameter estimates that can arise with MLE in small samples, analogous to ridge regression stabilising coefficient estimates. Third, model averaging: rather than selecting a single model, Bayesian model averaging weights predictions from multiple models by their posterior probability, hedging against model misspecification.

Tip: The Pragmatic View

For routine volatility forecasting, MLE-GARCH remains the workhorse. Bayesian methods add value when:

  • You need full uncertainty quantification for downstream decisions
  • You’re combining volatility estimates with other uncertain inputs
  • You’re estimating complex models where MLE struggles
  • You want principled model comparison across specifications

11 Time-Varying Parameter Models

So far, we’ve modelled volatility as varying while treating other parameters (mean, coefficients) as constant. But what if the relationship between variables changes over time? State-space models and the Kalman filter provide a general framework for time-varying parameters.

11.1 The State-Space Framework

A state-space model separates what we observe from what we want to estimate:

Measurement equation (what we see): \[y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t)\]

Transition equation (how states evolve): \[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)\]

Where:

  • \(y_t\) is the observed variable (e.g., returns)
  • \(\alpha_t\) is the state vector (e.g., time-varying beta)
  • \(Z_t, T_t, R_t\) are system matrices defining the model structure
  • \(H_t\) and \(Q_t\) are the covariance matrices of the measurement and state noise

Tsay (2010) notes: “The Kalman filter is a recursive algorithm for computing the optimal estimator of the state vector at time \(t\) based on information available at time \(t\).”

11.2 Application: Time-Varying Beta (CAPM)

The static CAPM assumes beta is constant: \[r_{i,t} - r_f = \alpha + \beta (r_{m,t} - r_f) + \varepsilon_t\]

But betas change as firms’ risk profiles evolve. A state-space formulation allows beta to drift:

Measurement: \(r_{i,t} - r_f = \alpha + \beta_t (r_{m,t} - r_f) + \varepsilon_t\)

State transition: \(\beta_{t+1} = \beta_t + \eta_t\)

This is a random walk specification for beta: each period’s beta equals last period’s plus noise. The Kalman filter optimally tracks the evolving beta given the noisy observations, in the minimum mean-squared-error sense when the model is linear and Gaussian.

| Approach | Assumption | Use Case |
|---|---|---|
| Static regression | \(\beta\) constant forever | Long-run average exposure |
| Rolling window | \(\beta\) constant within window | Simple adaptation |
| Kalman filter | \(\beta\) evolves as random walk | Optimal tracking |
| Regime switching | \(\beta\) jumps between states | Discrete regime changes |

11.3 The Kalman Filter: Predict and Update

The Kalman filter operates in two steps. First, predict: use the transition equation to forecast the state and its uncertainty. Second, update: when new data arrive, combine the forecast with the observation. The Kalman gain \(K_t\) determines how much weight to give new information versus the forecast. It is automatically higher when the forecast is uncertain or the observation is precise.
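For the time-varying beta model, the predict and update steps fit in a few lines. The sketch simplifies that model in ways worth flagging. It drops the intercept, treats \(\beta_t\) as a scalar random walk, and takes the noise variances \(q\) (state) and \(r\) (observation) as known; in practice they are estimated, typically by maximum likelihood. The data are simulated with a beta that jumps from 0.8 to 1.4 halfway through. The function name is illustrative.

```python
import numpy as np

def kalman_tv_beta(y, x, q, r, beta0=0.0, p0=1.0):
    """Scalar Kalman filter for y_t = beta_t x_t + e_t, beta_{t+1} = beta_t + eta_t.

    q: variance of eta_t (how fast beta drifts); r: variance of e_t (observation noise).
    Returns the filtered beta_t and its variance for each t.
    """
    n = len(y)
    beta_f, p_f = np.empty(n), np.empty(n)
    beta_pred, p_pred = beta0, p0
    for t in range(n):
        # Update: combine the prediction with the new observation
        v = y[t] - beta_pred * x[t]   # forecast error
        s = x[t] ** 2 * p_pred + r    # forecast-error variance
        k = p_pred * x[t] / s         # Kalman gain
        beta_f[t] = beta_pred + k * v
        p_f[t] = (1 - k * x[t]) * p_pred
        # Predict: the random walk carries the estimate forward and adds uncertainty q
        beta_pred, p_pred = beta_f[t], p_f[t] + q
    return beta_f, p_f

# Hypothetical data: market excess returns with 1% daily vol, beta shifts at t = 500
rng = np.random.default_rng(10)
n = 1000
x = 0.01 * rng.standard_normal(n)
true_beta = np.where(np.arange(n) < 500, 0.8, 1.4)
y = true_beta * x + 0.005 * rng.standard_normal(n)

beta_f, p_f = kalman_tv_beta(y, x, q=1e-4, r=0.005 ** 2, beta0=1.0, p0=1.0)
print(f"filtered beta at t=499: {beta_f[499]:.2f}, at end: {beta_f[-1]:.2f}")
```

Raising \(q\) lets the filter react faster to the jump at the cost of noisier estimates; this trade-off is what the gain \(K_t\) resolves each period.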

Note: Connection to Bayesian Inference

The Kalman filter is Bayesian inference for linear-Gaussian state-space models:

  • The prior is the predicted state distribution
  • The likelihood comes from the measurement equation
  • The posterior is the updated (filtered) state distribution

For non-linear or non-Gaussian models, extensions like particle filters or MCMC are needed.

11.4 Practical Applications in Finance

| Application | State Variable | What It Captures |
|---|---|---|
| Time-varying beta | \(\beta_t\) | Evolving systematic risk |
| Hedging | Hedge ratio \(h_t\) | Changing spot-futures relationship |
| Factor models | Factor loadings | Rotating style exposures |
| Volatility | \(\ln(\sigma^2_t)\) | Stochastic volatility (SV models) |
| Local trend | \(\mu_t\) | Slowly varying expected return |

11.5 When to Use State-Space Models

State-space models offer several advantages: they handle missing data naturally, provide optimal filtering for signal extraction, quantify uncertainty at each time step, and are flexible enough that many models emerge as special cases. However, they also present challenges: model specification (choosing \(T_t\), \(Q_t\)) requires careful thought, implementation is more complex than GARCH, and computational costs increase substantially for high-dimensional state vectors.

Important: The Big Picture

State-space models, regime switching, and GARCH all address the same fundamental problem: financial relationships are not static. They differ in how they model non-stationarity:

  • GARCH: Variance changes, but the structure is constant
  • Regime switching: Parameters jump between discrete states
  • State-space: Parameters drift continuously

The best choice depends on whether you believe changes are gradual (state-space), abrupt (regime), or only affect volatility (GARCH).

12 Structural Breaks and Regime Switching

GARCH models assume that the parameters governing volatility dynamics remain constant over time. But financial markets undergo episodes where behaviour changes dramatically, what Brooks (2019) calls “very substantial changes in the properties of a series.” These changes may be one-off structural breaks or recurring regime switches.

12.1 What Causes Structural Breaks?

Structural breaks typically result from large-scale events:

| Type | Examples | Effect on Volatility |
|---|---|---|
| Policy changes | Introduction of inflation targeting, QE programmes | May reduce or increase baseline volatility |
| Market microstructure | Electronic trading (“Big Bang”), decimal pricing | Often reduces transaction-related volatility |
| Financial crises | 2008 GFC, COVID-19 crash | Dramatic volatility regime shift |
| Regulatory changes | Basel requirements, Dodd-Frank | May alter risk-taking behaviour |

Important: Why Breaks Matter

Any model with constant parameters, whether a linear regression or a GARCH model, will be misspecified if it is estimated over a sample containing a structural break. The estimated parameters will be, roughly, a weighted average of the true parameters in each regime, and accurate for neither.

12.2 Testing for Structural Breaks

The Chow test is the classical approach: split the sample at a suspected break point and test whether the parameters differ significantly between sub-samples. However, this requires knowing when the break occurred.

More sophisticated approaches include:

  • CUSUM tests: Monitor cumulative sums of recursive residuals for evidence of parameter instability
  • Bai-Perron tests: Detect multiple unknown break points
  • Andrews-Ploberger tests: Test for breaks at unknown dates
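The three tests above target breaks in regression parameters. For breaks in variance specifically, a simpler diagnostic is swapped in here for illustration; it is not one of the tests listed. This is the centred cumulative sum of squares \(D_k = C_k/C_T - k/T\), where \(C_k = \sum_{t=1}^{k} r_t^2\). \(D_k\) drifts away from zero when the variance changes, and its largest absolute value marks the candidate break. For independent, normal, mean-zero returns, \(\sqrt{T/2}\max_k|D_k|\) is compared with about 1.36 at the 5% level. The data below are simulated with a hypothetical break, and the function name is illustrative.

```python
import numpy as np

def cusum_of_squares(r):
    """Centred cumulative sum of squares D_k = C_k / C_T - k / T for a mean-zero series."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    C = np.cumsum(r ** 2)
    k = np.arange(1, T + 1)
    D = C / C[-1] - k / T
    k_hat = int(np.argmax(np.abs(D))) + 1       # candidate break: last observation of regime 1
    stat = np.sqrt(T / 2) * np.max(np.abs(D))   # compare with ~1.36 (5% asymptotic critical value)
    return k_hat, stat

# Hypothetical returns: daily vol jumps from 1% to 2.5% after observation 600
rng = np.random.default_rng(8)
r = np.concatenate([0.010 * rng.standard_normal(600),
                    0.025 * rng.standard_normal(400)])
k_hat, stat = cusum_of_squares(r)
print(f"estimated break after observation {k_hat}, statistic = {stat:.2f}")
```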
# Calculate rolling 60-day volatility
spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
spy_df['return'] = spy_df['PX_LAST'].pct_change()
spy_df['rolling_vol_60'] = spy_df['return'].rolling(60).std() * np.sqrt(252) * 100

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spy_df['rolling_vol_60'].dropna(), color='steelblue', linewidth=1)
ax.axhline(spy_df['rolling_vol_60'].mean(), color='coral', linestyle='--', 
           label=f'Mean: {spy_df["rolling_vol_60"].mean():.1f}%')
ax.set_ylabel('Annualised Volatility (%)')
ax.set_title('Rolling 60-Day Volatility: Evidence of Regime Changes?')
ax.legend()
plt.tight_layout()
plt.show()
Figure 7: Visual inspection for structural breaks in SPY volatility

12.3 Markov Switching Models

Rather than treating regime changes as one-off events, Markov switching models allow the series to switch between regimes probabilistically. Developed by Hamilton (1989), these models specify that an unobserved state variable \(z_t\) governs which regime is active.

For a simple two-state model:

\[y_t = \begin{cases} \mu_1 + \phi_1 y_{t-1} + \sigma_1 u_t & \text{if } z_t = 1 \text{ (calm regime)} \\ \mu_2 + \phi_2 y_{t-1} + \sigma_2 u_t & \text{if } z_t = 2 \text{ (turbulent regime)} \end{cases}\]

The state evolves according to transition probabilities:

\[P = \begin{pmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{pmatrix}\]

Where \(p_{11}\) is the probability of staying in regime 1 given we were in regime 1, and \(p_{22}\) is the probability of staying in regime 2.
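Two quantities make fitted transition probabilities easier to interpret. The number of periods spent in regime \(i\) is geometric, so its expected duration is \(1/(1-p_{ii})\). The long-run share of time in regime 1 solves \(\pi_1(1-p_{11}) = \pi_2(1-p_{22})\), giving \(\pi_1 = (1-p_{22})/(2-p_{11}-p_{22})\). The sketch below uses hypothetical daily probabilities and cross-checks the result against the eigenvector of \(P\). The function name is illustrative.

```python
import numpy as np

def regime_summary(p11, p22):
    """Expected durations and long-run (ergodic) probabilities of a two-state Markov chain."""
    duration_1 = 1 / (1 - p11)              # expected spell length in regime 1
    duration_2 = 1 / (1 - p22)              # expected spell length in regime 2
    pi_1 = (1 - p22) / (2 - p11 - p22)      # unconditional probability of regime 1
    return duration_1, duration_2, pi_1, 1 - pi_1

# Hypothetical daily transition probabilities: the calm regime is very persistent
d1, d2, pi1, pi2 = regime_summary(p11=0.98, p22=0.95)
print(f"calm spells last {d1:.0f} days, turbulent spells {d2:.0f} days on average")
print(f"long-run share of time in the calm regime: {pi1:.3f}")

# Cross-check: the ergodic probabilities are the left eigenvector of P for eigenvalue 1
P = np.array([[0.98, 0.02], [0.05, 0.95]])
eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
print(v / v.sum())
```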

Note: The Markov Property

A process has the Markov property if the probability of being in a given state depends only on the previous state, not on the entire history. This makes estimation tractable while still capturing regime persistence.

12.4 Threshold Autoregressive (TAR) Models

An alternative approach makes regime switches deterministic rather than probabilistic. In a Self-Exciting Threshold AutoRegressive (SETAR) model, the regime depends on whether a threshold variable exceeds a critical value:

\[y_t = \begin{cases} \phi^{(1)}_0 + \phi^{(1)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} \leq c \\ \phi^{(2)}_0 + \phi^{(2)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} > c \end{cases}\]

Where \(c\) is the threshold and \(d\) is the delay parameter.

| Model Type | Regime Determination | Estimation | Use Case |
|---|---|---|---|
| Markov Switching | Probabilistic (latent state) | MLE with Hamilton filter | When switching is random/unpredictable |
| SETAR | Deterministic (observable variable) | NLS with grid search | When switching depends on observed values |
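The grid search mentioned for SETAR can be sketched directly, as conditional least squares. For each candidate threshold \(c\), fit each regime's AR(1) by least squares, and keep the \(c\) with the smallest total sum of squared residuals. The sketch uses delay \(d = 1\) and searches over the central 70% of the threshold variable's values, a common trimming choice assumed here. The data are simulated with a hypothetical true threshold of 0, and the function name is illustrative.

```python
import numpy as np

def fit_setar_threshold(y, d=1, trim=0.15):
    """Two-regime SETAR with AR(1) in each regime; returns the threshold minimising total SSR."""
    y = np.asarray(y, dtype=float)
    p = max(1, d)
    target = y[p:]                      # y_t
    lag1 = y[p - 1:-1]                  # y_{t-1}
    thresh_var = y[p - d:len(y) - d]    # y_{t-d}
    X = np.column_stack([np.ones_like(lag1), lag1])
    candidates = np.quantile(thresh_var, np.linspace(trim, 1 - trim, 141))
    best_c, best_ssr = None, np.inf
    for c in candidates:
        ssr = 0.0
        for mask in (thresh_var <= c, thresh_var > c):   # separate OLS fit in each regime
            coef, *_ = np.linalg.lstsq(X[mask], target[mask], rcond=None)
            ssr += np.sum((target[mask] - X[mask] @ coef) ** 2)
        if ssr < best_ssr:
            best_c, best_ssr = c, ssr
    return best_c

# Hypothetical SETAR data with threshold c = 0 and delay d = 1
rng = np.random.default_rng(6)
n = 2000
y = np.zeros(n)
for t in range(1, n):
    if y[t - 1] <= 0.0:
        y[t] = 0.3 + 0.5 * y[t - 1] + 0.5 * rng.standard_normal()
    else:
        y[t] = -0.3 + 0.2 * y[t - 1] + 0.5 * rng.standard_normal()

c_hat = fit_setar_threshold(y, d=1)
print(f"estimated threshold: {c_hat:.3f} (true value 0.0)")
```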

12.5 Regime-Switching GARCH

Combining regime switching with GARCH captures both volatility clustering within regimes and shifts between regimes:

\[\sigma^2_t = \begin{cases} \alpha_0^{(1)} + \alpha_1^{(1)} \varepsilon^2_{t-1} + \beta_1^{(1)} \sigma^2_{t-1} & \text{if calm regime} \\ \alpha_0^{(2)} + \alpha_1^{(2)} \varepsilon^2_{t-1} + \beta_1^{(2)} \sigma^2_{t-1} & \text{if turbulent regime} \end{cases}\]

This is particularly useful for modelling financial crises, where not only the level of volatility changes but also its dynamics.

13 From Classical Models to Sequence Learning

The models discussed so far (GARCH, Markov switching, TAR) represent the classical econometric approach to time-varying volatility. Modern machine learning offers powerful extensions through sequence models that can capture more complex temporal dependencies.

13.1 The Conceptual Bridge

Consider the parallel between classical and modern approaches:

| Classical Concept | ML Extension | Key Advance |
|---|---|---|
| AR(p) process | Recurrent Neural Network (RNN) | Non-linear dependencies, learned representations |
| Markov switching | Hidden Markov Model (HMM) | More flexible state dynamics |
| GARCH | LSTM/GRU networks | Long-range memory without explicit structure |
| Regime detection | Change point detection | Automated, data-driven identification |

13.2 Recurrent Neural Networks as Non-Linear AR Models

A recurrent neural network (RNN) extends the autoregressive framework to allow non-linear dependencies:

\[h_t = f(W_h h_{t-1} + W_x x_t + b)\] \[y_t = g(W_y h_t + c)\]

Where \(h_t\) is a hidden state that captures information from the entire history. Dixon, Halperin, and Bilokon (2020) describe RNNs as “non-linear time series models [that] generalise classical linear time series models such as AR(p).”
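A bare-bones forward pass shows the recursion in code. The weights below are random and untrained; training by backpropagation through time is omitted. Here \(f = \tanh\), \(g\) is the identity, and the layer sizes are arbitrary. This is a sketch of the equations above, not a volatility model. Replacing \(\tanh\) with the identity would collapse \(h_t\) to a linear recursion in past inputs, which is the sense in which RNNs generalise linear time series models. The function name is illustrative.

```python
import numpy as np

def rnn_forward(x, W_h, W_x, b, W_y, c):
    """Elman RNN: h_t = tanh(W_h h_{t-1} + W_x x_t + b), y_t = W_y h_t + c (identity g)."""
    T = x.shape[0]
    h = np.zeros(W_h.shape[0])
    outputs = np.empty((T, W_y.shape[0]))
    for t in range(T):
        h = np.tanh(W_h @ h + W_x @ x[t] + b)   # hidden state summarises x_1..x_t
        outputs[t] = W_y @ h + c
    return outputs

rng = np.random.default_rng(9)
n_in, n_hidden, n_out = 1, 8, 1
params = dict(W_h=0.5 * rng.standard_normal((n_hidden, n_hidden)),
              W_x=rng.standard_normal((n_hidden, n_in)),
              b=np.zeros(n_hidden),
              W_y=rng.standard_normal((n_out, n_hidden)),
              c=np.zeros(n_out))
x = rng.standard_normal((100, n_in))   # e.g. a sequence of (scaled) past returns
y_hat = rnn_forward(x, **params)
print(y_hat.shape)   # (100, 1)
```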

Tip: Why This Matters for Finance

RNNs can learn complex patterns in volatility that GARCH cannot capture:

  • Non-linear responses to shocks
  • Interactions between multiple assets
  • Regime-dependent dynamics (without pre-specifying regimes)

13.3 Long Short-Term Memory (LSTM)

Standard RNNs struggle with long-range dependencies because of the vanishing gradient problem. LSTM networks address this with gated mechanisms that control information flow: a forget gate (what to discard from memory), an input gate (what new information to store), and an output gate (what to reveal to the next layer).

This architecture is particularly relevant for financial time series, where volatility shocks can have persistent effects, market regimes may last months or years, and multiple timescales interact (daily noise, weekly patterns, monthly cycles).

13.4 Practical Implications

The connection between classical econometrics and modern ML is not merely theoretical. Four practical considerations stand out:

  • Interpretability versus flexibility: GARCH parameters have direct economic meaning (persistence, reaction to news), whereas neural networks are more flexible but far less transparent.
  • Data requirements: GARCH can be estimated reliably with several hundred observations, whereas deep learning typically requires thousands or even millions.
  • Out-of-sample performance: simple GARCH models often match or outperform complex ML models in volatility forecasting, a reminder that more parameters do not guarantee better predictions.
  • Hybrid approaches: models that combine GARCH-type structure with neural network flexibility (such as Neural GARCH) may offer the best of both worlds.
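To illustrate the interpretability point, the sketch below converts hypothetical GARCH(1,1) estimates into quantities with a direct economic reading. These are the persistence \(\alpha_1 + \beta_1\), the unconditional variance \(\alpha_0/(1-\alpha_1-\beta_1)\), and the half-life of a volatility shock, \(\ln 0.5 / \ln(\alpha_1+\beta_1)\). For contrast, the sketch also counts the parameters in a single LSTM layer. It assumes one bias vector per weight block; some libraries, such as PyTorch, use two, so their counts differ slightly.

```python
import math

def garch11_summary(alpha0, alpha1, beta1):
    """Economically interpretable quantities implied by GARCH(1,1) parameters."""
    persistence = alpha1 + beta1
    if not 0 < persistence < 1:
        raise ValueError("need 0 < alpha1 + beta1 < 1 for a finite unconditional variance")
    return {
        "persistence": persistence,
        # long-run level that the conditional variance reverts to
        "unconditional_variance": alpha0 / (1 - persistence),
        # periods until the effect of a shock on expected variance has halved
        "half_life": math.log(0.5) / math.log(persistence),
    }

def lstm_param_count(input_dim, hidden_dim):
    """Weights and biases in one LSTM layer: four blocks (three gates plus the
    candidate memory), each with input weights, recurrent weights and one bias vector."""
    return 4 * (hidden_dim * (hidden_dim + input_dim) + hidden_dim)

# Hypothetical daily estimates, with returns measured in percent.
summary = garch11_summary(alpha0=0.02, alpha1=0.08, beta1=0.90)
print(summary)   # persistence about 0.98, half-life of roughly 34 days
print("annualised long-run volatility (%):",
      math.sqrt(252 * summary["unconditional_variance"]))
print("LSTM parameters (1 input, 32 hidden units):", lstm_param_count(1, 32))  # 4352
```

Three GARCH parameters with a clear economic meaning contrast with thousands of LSTM weights that have none individually. This is the interpretability and data-requirement trade-off in concrete terms.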

Important: The Fundamental Challenge

Whether using GARCH or LSTM, the core challenge remains: volatility is unobservable. We estimate it from returns, but we never observe the “true” volatility against which to evaluate our models. This is why comparing implied with realised volatility, and testing forecasts against proxies such as future squared returns, remains essential.
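The sketch below illustrates this under stated assumptions. It simulates returns from a GARCH(1,1) with hypothetical parameters, so the true conditional variance is known. It then scores two forecasts against the squared-return proxy \(r_t^2\): the true \(\sigma^2_t\) and a constant-variance benchmark. The two loss functions are our choice for illustration: MSE, and QLIKE (written here, up to an additive constant, as \(\log h_t + r_t^2/h_t\)). Both are commonly used because they rank forecasts consistently even when the true variance is replaced by a noisy but conditionally unbiased proxy.

```python
import numpy as np

def mse_loss(r2, h):
    """Mean squared error between the squared-return proxy r2 and variance forecasts h."""
    return np.mean((r2 - h) ** 2)

def qlike_loss(r2, h):
    """QLIKE loss up to an additive constant: mean of log(h) + r2 / h."""
    return np.mean(np.log(h) + r2 / h)

def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Simulate returns r_t = sigma_t z_t with Gaussian z_t and GARCH(1,1) variance."""
    rng = np.random.default_rng(seed)
    sigma2 = np.empty(n)
    r = np.empty(n)
    sigma2[0] = alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return r, sigma2

# Hypothetical parameters; in a simulation the true conditional variance is known.
r, sigma2 = simulate_garch11(20_000, alpha0=0.05, alpha1=0.10, beta1=0.85, seed=1)
r2 = r ** 2                                # noisy, conditionally unbiased proxy
h_true = sigma2                            # "perfect" one-step-ahead forecast
h_const = np.full_like(r2, r2.mean())      # constant-variance benchmark

print("QLIKE  true:", qlike_loss(r2, h_true), " constant:", qlike_loss(r2, h_const))
print("MSE    true:", mse_loss(r2, h_true), " constant:", mse_loss(r2, h_const))
```

Even the perfect forecast has a large MSE, because \(r_t^2 = \sigma^2_t z_t^2\) scatters widely around \(\sigma^2_t\). That is the unobservability problem in miniature. MSE is driven by the fourth power of returns and can be dominated by a few extreme days, which is one reason QLIKE is often preferred in practice.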

14 Summary

| Concept | Key point |
|---|---|
| Volatility clustering | Large returns (of either sign) tend to follow large returns |
| Fat tails | Extreme events are more common than under a normal distribution |
| Leverage effect | Bad news increases volatility more than good news |
| ARCH | Conditional variance depends on past squared shocks |
| GARCH(1,1) | Adds persistence through lagged variance; often sufficient |
| Asymmetric GARCH | GJR-GARCH and EGARCH capture the leverage effect |
| News impact curve | Visualises the asymmetric volatility response |
| GARCH-M | Volatility enters the mean equation; risk premium |
| IGARCH | Unit-root persistence; unconditional variance undefined |
| DCC | Dynamic correlations for multivariate volatility |
| Stochastic volatility | Volatility as a latent process; typically estimated with MCMC |
| Bayesian GARCH | Full uncertainty quantification; regularisation |
| State-space models | Time-varying parameters via the Kalman filter |
| Time-varying beta | Evolving systematic risk exposure |
| Structural breaks | One-off changes in series behaviour |
| Markov switching | Probabilistic regime changes (Hamilton) |
| TAR/SETAR | Deterministic, threshold-based regimes |
| Sequence learning | RNN/LSTM as non-linear AR extensions |
| VIX | Market’s expectation of future volatility |

Note: What’s Next?

In the lab, you’ll estimate GARCH models on the Bloomberg data, compare implied and realised volatility, explore whether asymmetric models improve forecasts, and investigate evidence of regime changes.

15 References

Bekaert, Geert, and Guojun Wu. 2000. “Asymmetric Volatility and Risk in Equity Markets.” Review of Financial Studies 13 (1): 1–42. https://doi.org/10.1093/rfs/13.1.1.
Black, Fischer. 1976. “Studies of Stock Price Volatility Changes.” Proceedings of the 1976 Meetings of the American Statistical Association, Business and Economics Statistics Section, 177–81.
Bollerslev, Tim. 1986. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics 31 (3): 307–27. https://doi.org/10.1016/0304-4076(86)90063-1.
Brooks, Chris. 2019. Introductory Econometrics for Finance. 4th ed. Cambridge, UK: Cambridge University Press.
Campbell, John Y., and Ludger Hentschel. 1992. “No News Is Good News: An Asymmetric Model of Changing Volatility in Stock Returns.” Journal of Financial Economics 31 (3): 281–318. https://doi.org/10.1016/0304-405X(92)90037-X.
Dixon, Matthew F., Igor Halperin, and Paul Bilokon. 2020. Machine Learning in Finance: From Theory to Practice. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-41068-1.
Engle, Robert F. 1982. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (4): 987–1007. https://doi.org/10.2307/1912773.
———. 2002. “Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models.” Journal of Business and Economic Statistics 20 (3): 339–50. https://doi.org/10.1198/073500102288618487.
Engle, Robert F., David M. Lilien, and Russell P. Robins. 1987. “Estimating Time Varying Risk Premia in the Term Structure: The ARCH-m Model.” Econometrica 55 (2): 391–407. https://doi.org/10.2307/1913242.
Glosten, Lawrence R., Ravi Jagannathan, and David E. Runkle. 1993. “On the Relation Between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” Journal of Finance 48 (5): 1779–1801. https://doi.org/10.1111/j.1540-6261.1993.tb05128.x.
Nelson, Daniel B. 1991. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica 59 (2): 347–70. https://doi.org/10.2307/2938260.
Pagan, Adrian R., and G. William Schwert. 1990. “Alternative Models for Conditional Stock Volatility.” Journal of Econometrics 45 (1–2): 267–90. https://doi.org/10.1016/0304-4076(90)90101-X.
Tsay, Ruey S. 2010. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: Wiley.