Volatility Modelling

From Stylised Facts to GARCH

Author

Barry Quinn

Published

March 27, 2026

1 Introduction

Volatility, the tendency of asset prices to fluctuate, is one of the most important concepts in finance. It matters for risk management, option pricing, portfolio construction, and regulatory capital. Yet volatility is not directly observable; we must estimate it from price data.

Why this chapter matters: In the Foundations chapter, we introduced the Three Prediction Problems in finance: predicting the mean (returns), the variance (volatility), and cross-sectional variation (which assets outperform). We saw that returns have almost no predictable signal (~1-2% R²), making ARIMA largely useless for return prediction. Volatility is different. The conditional variance is genuinely predictable (~15-40% R²), and this predictability has direct economic value for options pricing, risk management, and portfolio allocation. This chapter focuses on the second prediction problem: where the signal actually exists.

This chapter develops your understanding of volatility from three perspectives: empirically, by examining what patterns we observe in financial return volatility; theoretically, by showing how ARCH and GARCH models capture these patterns; and practically, by demonstrating how to estimate, forecast, and apply volatility models. By the end of this chapter, you will understand why GARCH(1,1) has become the workhorse model for volatility in finance, and when more sophisticated approaches are needed.

View Slides

Open the lecture deck: Week 4: Volatility Modelling

Connection to Week 3

This chapter builds directly on the time series foundations from Week 3. There you learned that the ACF of returns is near zero (no signal in the mean), but the ACF of squared returns shows strong persistence. That persistence is volatility clustering, and modelling it is what this chapter is about.

2 Stylised Facts of Financial Volatility

Before building models, we must understand what we’re trying to capture. Financial return volatility exhibits several well-documented patterns, known as stylised facts, that any good model should reproduce.

2.1 Volatility Clustering

Perhaps the most striking feature of financial returns is volatility clustering: large returns (positive or negative) tend to be followed by large returns, and small returns tend to be followed by small returns.

Tsay (2010) describes this as: “Volatility is not constant over time. There are periods of high volatility alternating with periods of relative calm.”

This pattern is visible in virtually every financial time series. The figure below demonstrates clustering in the Bloomberg equity data:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load Bloomberg database
# (load_bloomberg is assumed to be the course's data-loading helper, defined elsewhere)
df = load_bloomberg()

# Get SPY returns
spy = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
returns = spy['PX_LAST'].pct_change().dropna()

# Plot returns
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)

# Returns
axes[0].plot(returns.index, returns.values, linewidth=0.5, color='steelblue')
axes[0].axhline(0, color='gray', linestyle='--', linewidth=0.5)
axes[0].set_ylabel('Daily Return')
axes[0].set_title('SPY Daily Returns: Volatility Clustering')

# Absolute returns (proxy for volatility)
axes[1].plot(returns.index, np.abs(returns.values), linewidth=0.5, color='coral')
axes[1].set_ylabel('|Return|')
axes[1].set_xlabel('Date')
axes[1].set_title('Absolute Returns: Clusters of High and Low Volatility')

plt.tight_layout()
plt.show()

Figure 1: Volatility clustering in SPY returns: Large moves cluster together

The autocorrelation structure confirms this pattern: while returns themselves show little autocorrelation (consistent with market efficiency), squared returns are highly autocorrelated:


from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# ACF of returns
plot_acf(returns.dropna(), ax=axes[0], lags=20, title='ACF: Returns')

# ACF of squared returns
plot_acf(returns.dropna()**2, ax=axes[1], lags=20, title='ACF: Squared Returns')

plt.tight_layout()
plt.show()

Figure 2: Returns show no autocorrelation; squared returns do
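The same contrast can be reproduced without the Bloomberg data. The sketch below simulates a GARCH(1,1) series and compares the sample autocorrelations of returns and squared returns at lags 1 to 5. The parameter values are assumed for illustration (they are not estimated from SPY), and `sample_acf` is a hypothetical helper written for this example.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x**2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

# Simulate a GARCH(1,1) process with assumed parameters
rng = np.random.default_rng(42)
n, omega, alpha, beta = 5000, 0.05, 0.10, 0.85
sim_ret = np.empty(n)
sigma2 = omega / (1 - alpha - beta)   # start at the unconditional variance
for t in range(n):
    sim_ret[t] = np.sqrt(sigma2) * rng.standard_normal()
    sigma2 = omega + alpha * sim_ret[t]**2 + beta * sigma2

acf_ret = sample_acf(sim_ret, 5)
acf_sq = sample_acf(sim_ret**2, 5)
print("ACF of returns:        ", np.round(acf_ret, 3))
print("ACF of squared returns:", np.round(acf_sq, 3))
```

The simulated returns are serially uncorrelated by construction, so any autocorrelation that shows up in the squares comes from the variance recursion alone.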

2.2 Fat Tails (Leptokurtosis)

Financial returns consistently exhibit fatter tails than the normal distribution predicts. Extreme events (crashes, spikes) occur far more frequently than a Gaussian model would suggest.

Tsay (2010) notes: “Kurtosis often exceeds three (the kurtosis of a normal distribution), and often exceeds three by a substantial margin.”
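To put a number on how rare large moves should be under normality, the short calculation below evaluates the two-sided tail probability of a four-standard-deviation day. It is a minimal sketch: the helper name `normal_tail_prob` and the 252-trading-day year are assumptions made for this example.

```python
import math

def normal_tail_prob(k):
    """Two-sided tail probability P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

p4 = normal_tail_prob(4.0)
years_between = 1 / (p4 * 252)   # assuming 252 trading days per year
print(f"P(|Z| > 4) = {p4:.2e}, roughly one such day every {years_between:.0f} years")
```

Under a Gaussian model a four-sigma day should be a once-in-decades event, which is the benchmark against which observed crash frequencies look far too high.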


from scipy import stats

# Calculate statistics for multiple assets
tickers = ['AAPL', 'GOOGL', 'MSFT', 'SPY']
results = []

for ticker in tickers:
    asset = df[df['ticker'] == ticker].set_index('date')['PX_LAST'].sort_index()
    ret = asset.pct_change().dropna()
    
    # stats.kurtosis returns excess kurtosis by default (a normal distribution scores 0)
    excess_kurt = stats.kurtosis(ret)
    
    results.append({
        'Ticker': ticker,
        'Mean (%)': f"{ret.mean()*100:.3f}",
        'Std (%)': f"{ret.std()*100:.2f}",
        'Skewness': f"{stats.skew(ret):.2f}",
        'Kurtosis': f"{excess_kurt:.2f}",
        # Crude rule of thumb, not a formal normality test
        'Normal?': 'Yes' if excess_kurt < 1 else 'No'
    })

pd.DataFrame(results)

Table 1: Financial returns exhibit excess kurtosis (the Kurtosis column reports excess kurtosis, so a normal distribution would score 0)

	Ticker	Mean (%)	Std (%)	Skewness	Kurtosis	Normal?
0	AAPL	0.119	1.92	-0.00	5.10	No
1	GOOGL	0.090	1.94	-0.06	3.87	No
2	MSFT	0.107	1.82	-0.02	6.76	No
3	SPY	0.052	1.23	-0.55	11.50	No

2.3 The Leverage Effect

Volatility responds asymmetrically to returns: negative returns tend to increase volatility more than positive returns of the same magnitude. This leverage effect was first documented by Black (1976).

Several theoretical explanations have been proposed:

Theory	Mechanism	Key Insight
Leverage hypothesis	When prices fall, debt/equity ratio rises mechanically	Firms become riskier → volatility increases
Volatility feedback	If volatility is priced, expected volatility increase raises required return	Causation runs both ways: high volatility → lower prices
Risk premium channel	Higher expected volatility demands higher risk premium	Current price falls to deliver higher expected return
Behavioural asymmetry	Investors react more strongly to losses (prospect theory)	Bad news generates more trading, uncertainty
Margin constraints	Downturns trigger margin calls, forced selling	Amplification through deleveraging cascades

The leverage hypothesis is the most cited explanation, but Campbell and Hentschel (1992) and Bekaert and Wu (2000) show that volatility feedback may be equally or more important. In practice, all mechanisms likely operate simultaneously, reinforcing the asymmetric pattern.


# Calculate rolling volatility and lagged returns
spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
spy_df['return'] = spy_df['PX_LAST'].pct_change()
spy_df['rolling_vol'] = spy_df['return'].rolling(20).std()
spy_df['lagged_return'] = spy_df['return'].shift(1)

# Bin by lagged return (copy first to avoid pandas' SettingWithCopyWarning)
spy_clean = spy_df.dropna().copy()
spy_clean['return_bin'] = pd.qcut(spy_clean['lagged_return'], q=5, labels=['Very Neg', 'Neg', 'Neutral', 'Pos', 'Very Pos'])

# Plot
fig, ax = plt.subplots(figsize=(8, 5))
spy_clean.groupby('return_bin', observed=True)['rolling_vol'].mean().plot(kind='bar', ax=ax, color='steelblue')
ax.set_xlabel('Previous Day Return')
ax.set_ylabel('Average 20-Day Volatility')
ax.set_title('Leverage Effect: Negative Returns Lead to Higher Volatility')
ax.tick_params(axis='x', rotation=0)
plt.tight_layout()
plt.show()

Figure 3: Negative returns increase volatility more than positive returns

3 The ARCH Model

3.1 Motivation: Conditional Heteroskedasticity

Classical econometrics assumes homoskedasticity, that is, constant variance of errors. But volatility clustering tells us this assumption fails for financial data. The variance is not constant; it changes over time in predictable ways.

The key insight of Engle (1982) was to distinguish between unconditional and conditional variance:

Unconditional variance: The long-run average variance (constant)
Conditional variance: The variance right now, given what we know (time-varying)

Engle’s Key Insight

While the unconditional variance of returns may be constant, the conditional variance (what we expect given recent information) varies over time in a way we can model.

3.2 The ARCH(q) Specification

The AutoRegressive Conditional Heteroskedasticity model specifies:

\[r_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1)\]

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \alpha_2 \varepsilon^2_{t-2} + \cdots + \alpha_q \varepsilon^2_{t-q}\]

Where:

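$r_t$ is the return at time $t$
$\varepsilon_t$ is the shock (unexpected return) at time $t$
$\sigma^2_t$ is the conditional variance at time $t$
$\varepsilon^2_{t-1}, \ldots, \varepsilon^2_{t-q}$ are past squared shocks
$\alpha_0 > 0$ and $\alpha_i \geq 0$ keep the conditional variance positive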

Interpretation: Today’s volatility depends on recent surprises. If yesterday had a large shock (positive or negative), today’s volatility will be elevated.
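The distinction between conditional and unconditional variance can be checked by simulation. The sketch below generates an ARCH(1) process with assumed parameter values (chosen only for illustration) and compares the sample variance of the shocks with the unconditional value $\alpha_0 / (1 - \alpha_1)$, while the conditional variance moves around from period to period.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha0, alpha1 = 100_000, 0.5, 0.3   # assumed values, chosen for illustration

eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = alpha0 / (1 - alpha1)        # start at the unconditional variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * eps[t - 1]**2   # conditional variance given yesterday's shock
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

uncond_var = alpha0 / (1 - alpha1)
print(f"theoretical unconditional variance: {uncond_var:.3f}")
print(f"sample variance of shocks:          {eps.var():.3f}")
print(f"range of conditional variance:      {sigma2.min():.3f} to {sigma2.max():.3f}")
```

The long-run average is stable, yet the variance on any given day can sit well above it after a large shock, and it can never fall below $\alpha_0$.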

4 The GARCH Model

4.1 From ARCH to GARCH

Bollerslev (1986) extended ARCH by including lagged conditional variances:

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

This is GARCH(1,1): one ARCH term and one GARCH term. Brooks (2019) notes:

“A GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, and rarely is any higher order model estimated or even entertained in the academic finance literature.”

4.2 Parameter Interpretation

Parameter	Name	Interpretation
$\alpha_0$	Constant	Intercept; the floor of the conditional variance, and it scales the long-run level $\alpha_0 / (1 - \alpha_1 - \beta_1)$
$\alpha_1$	ARCH term	Reaction to recent shocks
$\beta_1$	GARCH term	Persistence of volatility
$\alpha_1 + \beta_1$	Persistence	How long shocks affect volatility

Rule of Thumb

For most financial assets, $\alpha_1 + \beta_1$ is close to (but less than) 1, meaning volatility shocks are highly persistent. Values above 0.9 are typical.
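One way to read a persistence figure is through its half-life: the number of periods it takes for the effect of a variance shock to decay by half, $\ln(0.5) / \ln(\alpha_1 + \beta_1)$. The values below are illustrative, and `half_life` is a helper name introduced for this example.

```python
import math

def half_life(persistence):
    """Number of periods for a variance shock to decay by half."""
    return math.log(0.5) / math.log(persistence)

for p in (0.90, 0.95, 0.99):
    print(f"alpha1 + beta1 = {p:.2f} -> half-life of about {half_life(p):.1f} days")
```

Moving persistence from 0.90 to 0.99 stretches the half-life from about a week of trading days to more than three months, which is why small differences near 1 matter so much in practice.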

4.3 Why GARCH(1,1) Often Suffices

GARCH(1,1) is remarkably effective for three reasons: parsimony (only 3 parameters capture complex volatility dynamics), memory (the recursive structure implicitly uses the entire history), and mean reversion (volatility eventually returns to its long-run level).
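The "memory" point can be verified numerically. Unrolling the GARCH(1,1) recursion shows that today's variance is a geometrically weighted sum of every past squared shock, with weights $\alpha_1 \beta_1^j$. The sketch below uses assumed parameter values and random stand-in shocks to confirm that the recursive and unrolled forms agree.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0, alpha1, beta1 = 0.02, 0.08, 0.90   # assumed, illustrative values
eps = rng.standard_normal(300)             # stand-in shocks, not real returns
sigma2_0 = alpha0 / (1 - alpha1 - beta1)   # initial variance

# Recursive form: one update per period
sigma2 = sigma2_0
for e in eps:
    sigma2 = alpha0 + alpha1 * e**2 + beta1 * sigma2

# Unrolled form: geometrically weighted sum of all past squared shocks
T = len(eps)
weights = beta1 ** np.arange(T)            # weight on the shock j steps back
unrolled = (alpha0 * weights.sum()
            + alpha1 * np.sum(weights * eps[::-1]**2)
            + beta1**T * sigma2_0)

print(sigma2, unrolled)
```

Three parameters therefore stand in for an ARCH model with a very long lag length, which is the sense in which GARCH(1,1) is parsimonious.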


from arch import arch_model

# Fit GARCH(1,1) to SPY
returns_pct = returns * 100  # Scale for numerical stability

model = arch_model(returns_pct.dropna(), vol='Garch', p=1, q=1, mean='Constant')
result = model.fit(disp='off')

# Extract conditional volatility
cond_vol = result.conditional_volatility

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(returns_pct.dropna().index, np.abs(returns_pct.dropna().values), 
        alpha=0.3, label='|Returns|', color='steelblue')
ax.plot(cond_vol.index, cond_vol.values, 
        label='GARCH(1,1) Conditional Volatility', color='coral', linewidth=1.5)
ax.legend()
ax.set_ylabel('Volatility (%)')
ax.set_title('GARCH(1,1) Fitted Volatility vs Absolute Returns')
plt.tight_layout()
plt.show()

# Print the full summary: mean (mu) and volatility (omega, alpha[1], beta[1]) estimates
print(result.summary())

Figure 4: GARCH(1,1) fitted conditional volatility vs absolute SPY returns

4.4 Stationarity Conditions

For GARCH(1,1) to be covariance stationary, we require:

\[\alpha_1 + \beta_1 < 1\]

When this holds, the unconditional variance exists:

\[\sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}\]

If $\alpha_1 + \beta_1 = 1$, we have Integrated GARCH (IGARCH): shocks persist forever.
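Stationarity is what makes multi-step forecasts mean-revert. Under GARCH(1,1), the expected variance $h$ steps ahead is $\sigma^2 + (\alpha_1 + \beta_1)^{h-1}(\sigma^2_{t+1} - \sigma^2)$, so forecasts decay geometrically towards the unconditional variance. The sketch below uses assumed parameters and a hypothetical starting variance of 4.0.

```python
import numpy as np

alpha0, alpha1, beta1 = 0.02, 0.08, 0.90       # assumed, illustrative values
long_run_var = alpha0 / (1 - alpha1 - beta1)   # unconditional variance

def garch_forecast(sigma2_next, horizon):
    """Expected conditional variance 1..horizon steps ahead, given the one-step value."""
    h = np.arange(horizon)
    return long_run_var + (alpha1 + beta1) ** h * (sigma2_next - long_run_var)

path = garch_forecast(sigma2_next=4.0, horizon=250)
print(f"long-run variance: {long_run_var:.2f}")
print(f"forecast at h=1, 20, 250: {path[0]:.2f}, {path[19]:.2f}, {path[249]:.2f}")
```

As $\alpha_1 + \beta_1$ approaches 1 the decay factor approaches 1 as well, and the forecast stays at its current level for ever longer.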

5 Asymmetric GARCH Models

Standard GARCH treats positive and negative shocks symmetrically. But the leverage effect suggests this is wrong. Several extensions address this:

5.1 GJR-GARCH

Glosten, Jagannathan, and Runkle (1993) add an indicator for negative shocks:

\[\sigma^2_t = \alpha_0 + (\alpha_1 + \gamma I_{t-1}) \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

Where $I_{t-1} = 1$ if $\varepsilon_{t-1} < 0$ (negative shock). The parameter $\gamma$ captures the additional impact of bad news.
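A single GJR update makes the role of $\gamma$ concrete. The function below is a minimal sketch with assumed default parameters; it applies the equation above to a positive and a negative shock of the same size.

```python
def gjr_variance(eps_prev, sigma2_prev, alpha0=0.02, alpha1=0.03, gamma=0.10, beta1=0.90):
    """One-step GJR-GARCH(1,1) variance update (parameter defaults are assumed)."""
    indicator = 1.0 if eps_prev < 0 else 0.0
    return alpha0 + (alpha1 + gamma * indicator) * eps_prev**2 + beta1 * sigma2_prev

up = gjr_variance(eps_prev=2.0, sigma2_prev=1.0)
down = gjr_variance(eps_prev=-2.0, sigma2_prev=1.0)
print(f"variance after +2 shock: {up:.2f}")
print(f"variance after -2 shock: {down:.2f}")
```

The gap between the two results equals $\gamma \varepsilon^2_{t-1}$. With a symmetric shock distribution the indicator is switched on half the time, so persistence is measured by $\alpha_1 + \gamma/2 + \beta_1$ (0.98 with these values).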

5.2 EGARCH

Nelson (1991) models the logarithm of the variance, which ensures positivity without parameter constraints:

\[\ln(\sigma^2_t) = \alpha_0 + \alpha_1 \left( \frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi} \right) + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta_1 \ln(\sigma^2_{t-1})\]

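The $\gamma$ term directly captures asymmetry: with $\gamma < 0$, a negative shock raises log-variance by more than a positive shock of the same size.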
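The EGARCH update can be evaluated by hand in the same way. In the sketch below the parameter defaults are assumed (with a negative $\gamma$, as a leverage effect would imply), and `z_prev` denotes the standardised shock $\varepsilon_{t-1}/\sigma_{t-1}$.

```python
import math

def egarch_log_variance(z_prev, log_sigma2_prev, alpha0=0.0, alpha1=0.10, gamma=-0.08, beta1=0.95):
    """One EGARCH(1,1) update of ln(sigma^2); z_prev is the standardised shock."""
    magnitude = abs(z_prev) - math.sqrt(2 / math.pi)
    return alpha0 + alpha1 * magnitude + gamma * z_prev + beta1 * log_sigma2_prev

var_after_good = math.exp(egarch_log_variance(+2.0, 0.0))
var_after_bad = math.exp(egarch_log_variance(-2.0, 0.0))
print(f"variance after +2 sigma news: {var_after_good:.3f}")
print(f"variance after -2 sigma news: {var_after_bad:.3f}")
```

Because the model works in logs, the variance is positive for any parameter values, and the asymmetry shows up as a multiplicative factor of $\exp(2|\gamma||z|)$ between bad and good news.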

5.3 Visualising Asymmetry: News Impact Curves

The news impact curve (Pagan and Schwert (1990)) provides a powerful way to visualise how different shocks affect future volatility. It plots next-period volatility against various values of the previous shock, holding other factors constant.

For a symmetric GARCH model, the curve is a parabola centred at zero, so positive and negative shocks of equal magnitude have identical effects. For asymmetric models (GJR, EGARCH), the curve tilts, showing that negative shocks have larger impact.


# Simulated news impact curves for illustration
shocks = np.linspace(-3, 3, 100)

# Assume typical parameters
alpha0, alpha1, beta1 = 0.00001, 0.08, 0.90
gamma = 0.10  # Asymmetry parameter for GJR

# GARCH: symmetric response
# (the beta1 * sigma^2 term is held constant and omitted; it only shifts the curve up)
garch_vol = alpha0 + alpha1 * shocks**2

# GJR: asymmetric - larger response to negative shocks
gjr_vol = alpha0 + (alpha1 + gamma * (shocks < 0)) * shocks**2

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(shocks, garch_vol * 10000, label='GARCH (symmetric)', color='steelblue', linewidth=2)
ax.plot(shocks, gjr_vol * 10000, label='GJR (asymmetric)', color='coral', linewidth=2)
ax.axvline(0, color='gray', linestyle='--', linewidth=0.5)
ax.set_xlabel('Previous Shock (standardised)')
ax.set_ylabel('Next-Period Variance (×10⁴)')
ax.set_title('News Impact Curves: Asymmetry in Volatility Response')
ax.legend()
plt.tight_layout()
plt.show()

Figure 5: News impact curves: GARCH (symmetric) vs GJR (asymmetric)

The news impact curve reveals that under GJR, a negative shock of magnitude 2 has substantially more impact on future volatility than a positive shock of the same size, consistent with the leverage effect.

6 GARCH-M: The Risk-Return Relationship

Standard GARCH models volatility, but doesn’t connect it to returns. The GARCH-in-Mean (GARCH-M) model (Engle, Lilien, and Robins (1987)) lets volatility enter the mean equation directly:

\[r_t = \mu + \delta \sigma_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t\] \[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

The parameter $\delta$ captures the risk premium: if $\delta > 0$, higher expected volatility leads to higher expected returns as compensation for bearing risk.

Interpreting GARCH-M

A positive and significant $\delta$ supports the risk-return trade-off from finance theory. However, empirical evidence is mixed: $\delta$ is often insignificant or even negative at short horizons. This may reflect:

Time-varying risk aversion
Different risk horizons for different investors
Measurement error in conditional volatility

7 Long Memory in Volatility: IGARCH and Beyond

When $\alpha_1 + \beta_1$ approaches 1, volatility shocks become extremely persistent. The Integrated GARCH (IGARCH) model sets $\alpha_1 + \beta_1 = 1$ exactly:

\[\sigma^2_t = \alpha_0 + \beta_1 \sigma^2_{t-1} + (1 - \beta_1) \varepsilon^2_{t-1}\]

Tsay (2010) notes that under IGARCH, “the impact of past squared shocks… on $\sigma^2_t$ is persistent”. Shocks never fully decay, and the unconditional variance does not exist.

Model	Persistence	Unconditional Variance	Use Case
GARCH(1,1)	$\alpha_1 + \beta_1 < 1$	Exists, finite	Standard applications
IGARCH(1,1)	$\alpha_1 + \beta_1 = 1$	Does not exist	Very persistent volatility
FIGARCH	Fractional integration	Not finite	Long memory without unit root
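A familiar special case of IGARCH is the exponentially weighted moving average (EWMA): setting $\alpha_0 = 0$ and $\beta_1 = \lambda$ in the IGARCH equation gives $\sigma^2_t = \lambda \sigma^2_{t-1} + (1 - \lambda) \varepsilon^2_{t-1}$. The sketch below uses $\lambda = 0.94$, the value commonly associated with RiskMetrics for daily data; the function name and the placeholder data are assumptions made for this example.

```python
import numpy as np

def ewma_variance(returns, lam=0.94, init_var=None):
    """IGARCH(1,1) with alpha0 = 0: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns))
    sigma2[0] = np.var(returns) if init_var is None else init_var
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1]**2
    return sigma2

rng = np.random.default_rng(7)
demo_returns = rng.standard_normal(500)      # placeholder data, not market returns
demo_returns[250] = 8.0                      # inject one large shock
ewma = ewma_variance(demo_returns, init_var=1.0)
print(f"variance before shock: {ewma[250]:.2f}, after shock: {ewma[251]:.2f}")
```

There is no intercept pulling the variance back to a fixed level, so the filter has no long-run mean to revert to: it simply tracks a weighted average of recent squared returns.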

The IGARCH Puzzle

Finding $\alpha_1 + \beta_1 \approx 1$ is common empirically, but concerning theoretically, because it implies volatility shocks persist forever. This may indicate:

Occasional level shifts in volatility (regime changes)
Structural breaks misinterpreted as persistence
Need for a regime-switching specification

8 Multivariate Volatility: DCC and Portfolio Applications

When managing portfolios, we need not just individual asset volatilities but covariances between assets. Multivariate GARCH models capture time-varying correlations.

8.1 Dynamic Conditional Correlation (DCC)

The DCC model (Engle (2002)) separates volatility dynamics from correlation dynamics:

\[r_t = \mu_t + D_t z_t\]

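Where $D_t$ is a diagonal matrix of univariate GARCH volatilities and $z_t$ is the vector of standardised residuals, whose conditional correlation matrix is $R_t$. The correlations evolve through:

\[Q_t = (1 - \theta_1 - \theta_2)\bar{Q} + \theta_1 z_{t-1}z'_{t-1} + \theta_2 Q_{t-1}\]

Here $\bar{Q}$ is the unconditional covariance matrix of the standardised residuals. The correlation matrix is then $R_t = \text{diag}(Q_t)^{-1/2} Q_t \text{diag}(Q_t)^{-1/2}$.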
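The correlation recursion is short enough to write out directly. The sketch below takes a matrix of standardised residuals (written `z` here) as given, so the first-stage univariate GARCH fits are skipped. The values of $\theta_1$ and $\theta_2$ are assumed rather than estimated, and the input data are simulated placeholders.

```python
import numpy as np

def dcc_correlations(z, theta1=0.05, theta2=0.90):
    """Run the DCC recursion on standardised residuals z (T x N); returns R_t for each t."""
    T, N = z.shape
    Q_bar = (z.T @ z) / T                    # sample second-moment matrix of z
    Q = Q_bar.copy()
    R = np.empty((T, N, N))
    for t in range(T):
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)            # rescale Q_t to a correlation matrix
        Q = (1 - theta1 - theta2) * Q_bar + theta1 * np.outer(z[t], z[t]) + theta2 * Q
    return R

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
z_demo = rng.multivariate_normal(np.zeros(2), cov, size=1000)   # placeholder residuals
R_path = dcc_correlations(z_demo)
rho = R_path[:, 0, 1]
print(f"conditional correlation ranges from {rho.min():.2f} to {rho.max():.2f}")
```

The rescaling step is what guarantees a valid correlation matrix at every date: $Q_t$ itself is positive definite but does not have ones on its diagonal.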

8.2 Application: Time-Varying Hedge Ratios

A key application of multivariate GARCH is computing optimal hedge ratios that vary over time. The minimum variance hedge ratio is:

\[h_t = \rho_t \frac{\sigma_{s,t}}{\sigma_{f,t}}\]

Where $\rho_t$ is the conditional correlation between spot and futures returns, and $\sigma_{s,t}, \sigma_{f,t}$ are their conditional volatilities. Brooks (2019) shows that time-varying hedge ratios from multivariate GARCH often outperform static hedges, particularly during volatile periods.
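A small numerical example shows how the hedge ratio responds to changing inputs. The correlation and volatility figures below are hypothetical, and `min_variance_hedge` is a helper name introduced for this sketch. It also returns the variance of the hedged position, which simplifies to $\sigma^2_{s,t}(1 - \rho^2_t)$ at the optimal ratio.

```python
def min_variance_hedge(rho, sigma_s, sigma_f):
    """Hedge ratio h and the variance of the hedged position s - h * f."""
    h = rho * sigma_s / sigma_f
    hedged_var = sigma_s**2 + h**2 * sigma_f**2 - 2 * h * rho * sigma_s * sigma_f
    return h, hedged_var

# Hypothetical inputs for a calm day and a stressed day
h_calm, var_calm = min_variance_hedge(rho=0.95, sigma_s=0.010, sigma_f=0.011)
h_stress, var_stress = min_variance_hedge(rho=0.85, sigma_s=0.030, sigma_f=0.028)

print(f"calm:   h = {h_calm:.3f}, hedged std = {var_calm**0.5:.4f}")
print(f"stress: h = {h_stress:.3f}, hedged std = {var_stress**0.5:.4f}")
```

A static hedge would apply one ratio to both days; the time-varying version adjusts as the conditional correlation and the relative volatilities move.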

9 Practical Application: VIX and Realised Volatility

9.1 The VIX Index

The VIX (CBOE Volatility Index) measures the market’s expectation of 30-day volatility, extracted from S&P 500 option prices. It represents implied volatility: what traders expect.


# Load VIX from Bloomberg database
vix = df[df['ticker'] == 'VIX'].set_index('date')['PX_LAST']
spy_close = df[df['ticker'] == 'SPY'].set_index('date')['PX_LAST']

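# Calculate 30-day realised volatility (annualised)
# Note: rolling(30) counts trading days, whereas VIX covers 30 calendar days (about 21 trading days)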
spy_ret = spy_close.pct_change()
realised_vol = spy_ret.rolling(30).std() * np.sqrt(252) * 100

# Align dates
common_idx = vix.index.intersection(realised_vol.dropna().index)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(common_idx, vix.loc[common_idx], label='VIX (Implied)', color='coral')
ax.plot(common_idx, realised_vol.loc[common_idx], label='30-Day Realised Vol', 
        color='steelblue', alpha=0.7)
ax.legend()
ax.set_ylabel('Volatility (%)')
ax.set_title('Implied vs Realised Volatility')
plt.tight_layout()
plt.show()

Figure 6: VIX (implied) vs SPY realised volatility

9.2 Volatility Risk Premium

Implied volatility typically exceeds realised volatility; the gap is the volatility risk premium. This reflects compensation for bearing volatility risk.

10 Bayesian Approaches to Volatility

While GARCH models are typically estimated by maximum likelihood, Bayesian methods offer several advantages for volatility modelling, particularly when dealing with parameter uncertainty, model comparison, and complex specifications.

10.1 Why Bayesian Volatility Modelling?

Challenge	MLE Approach	Bayesian Solution
Parameter uncertainty	Point estimates + asymptotic SE	Full posterior distributions
Model comparison	Information criteria (AIC, BIC)	Bayes factors, posterior model probabilities
Small samples	Unreliable estimates	Priors stabilise inference
Complex models	Numerical optimisation can fail to converge	MCMC handles high dimensions

10.2 Stochastic Volatility Models

GARCH treats volatility as a deterministic function of past shocks. Stochastic volatility (SV) models treat volatility itself as a random process:

\[r_t = \exp(h_t/2) \varepsilon_t, \quad \varepsilon_t \sim N(0,1)\] \[h_t = \mu + \phi(h_{t-1} - \mu) + \sigma_\eta \eta_t, \quad \eta_t \sim N(0,1)\]

Where $h_t = \ln(\sigma^2_t)$ is log-volatility, which follows an AR(1) process. The parameter $\phi$ captures volatility persistence; $\sigma_\eta$ captures volatility-of-volatility.

SV vs GARCH

GARCH: Volatility is a known function of past data; likelihood is tractable
SV: Volatility is a latent (unobserved) process; requires MCMC or particle filters

SV models often fit financial data better than GARCH (lower in-sample MSE), but GARCH is easier to estimate and produces similar forecasts. The choice depends on whether you need full uncertainty quantification.
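Simulating the SV equations shows what "latent" means in practice: the log-variance path `h` is generated first, and only the returns `r` would be visible to an econometrician. All parameter values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
n, mu, phi, sigma_eta = 50_000, -9.0, 0.97, 0.15   # assumed, illustrative values

# Latent AR(1) log-variance, started from its stationary distribution
h = np.empty(n)
h[0] = mu + sigma_eta / np.sqrt(1 - phi**2) * rng.standard_normal()
for t in range(1, n):
    h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.standard_normal()

# Observed returns: a second, independent shock scaled by the latent volatility
r = np.exp(h / 2) * rng.standard_normal(n)

stationary_var_h = sigma_eta**2 / (1 - phi**2)
print(f"theoretical var(h): {stationary_var_h:.3f}, sample var(h): {h.var():.3f}")
print(f"average daily volatility: {np.exp(h / 2).mean():.4f}")
```

Two sources of randomness enter each period (one for the return, one for the volatility), whereas GARCH has only one. That extra shock is why the SV likelihood has no closed form.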

10.3 Bayesian GARCH

Even standard GARCH can benefit from Bayesian treatment:

\[\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}\]

With priors: \[\alpha_0 \sim \text{Gamma}(\cdot), \quad \alpha_1, \beta_1 \sim \text{Beta}(\cdot)\]

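The Beta priors on $\alpha_1$ and $\beta_1$ automatically enforce the constraint $0 \leq \alpha_1, \beta_1 \leq 1$. Independent Beta priors do not by themselves guarantee $\alpha_1 + \beta_1 < 1$, so the stationarity condition must be imposed separately, for example by assigning zero prior probability to draws that violate it. A prior favouring $\alpha_1 + \beta_1$ close to (but less than) 1 incorporates the stylised fact that volatility is highly persistent.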
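The ingredient any MCMC sampler needs is the unnormalised log posterior: log prior plus log likelihood. The sketch below writes it out for GARCH(1,1) with Gaussian shocks. The hyperparameters are assumed for illustration, the helper names are hypothetical, and the sampler itself (for example random-walk Metropolis) is left out.

```python
import math
import numpy as np

def log_beta_pdf(x, a, b):
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def log_gamma_pdf(x, shape, rate):
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x

def garch_log_posterior(params, eps):
    """Unnormalised log posterior for GARCH(1,1) with Gaussian shocks eps."""
    alpha0, alpha1, beta1 = params
    # Zero prior mass outside the support or where covariance stationarity fails
    if alpha0 <= 0 or not (0 < alpha1 < 1) or not (0 < beta1 < 1) or alpha1 + beta1 >= 1:
        return -math.inf
    log_prior = (log_gamma_pdf(alpha0, 2.0, 20.0)     # assumed hyperparameters
                 + log_beta_pdf(alpha1, 2.0, 15.0)
                 + log_beta_pdf(beta1, 15.0, 2.0))
    sigma2 = np.var(eps)                               # initialise at the sample variance
    log_lik = 0.0
    for e in eps:
        log_lik += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + e**2 / sigma2)
        sigma2 = alpha0 + alpha1 * e**2 + beta1 * sigma2
    return log_prior + log_lik

rng = np.random.default_rng(5)
demo_eps = rng.standard_normal(500)                    # placeholder shocks
lp_valid = garch_log_posterior((0.05, 0.08, 0.90), demo_eps)
lp_invalid = garch_log_posterior((0.05, 0.20, 0.85), demo_eps)   # alpha1 + beta1 >= 1
print(lp_valid, lp_invalid)
```

Returning negative infinity for non-stationary draws is one simple way to impose $\alpha_1 + \beta_1 < 1$: a Metropolis step would always reject such a proposal.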

10.4 Practical Benefits

Bayesian approaches to volatility offer three key practical benefits. First, uncertainty propagation: when forecasting volatility, Bayesian methods produce predictive distributions rather than point forecasts, which is essential for risk management since the uncertainty in your volatility estimate should feed into your VaR calculation. Second, shrinkage and regularisation: Bayesian priors prevent extreme parameter estimates that can arise with MLE in small samples, analogous to ridge regression stabilising coefficient estimates. Third, model averaging: rather than selecting a single model, Bayesian model averaging weights predictions from multiple models by their posterior probability, hedging against model misspecification.

The Pragmatic View

For routine volatility forecasting, MLE-GARCH remains the workhorse. Bayesian methods add value when:

You need full uncertainty quantification for downstream decisions
You’re combining volatility estimates with other uncertain inputs
You’re estimating complex models where MLE struggles
You want principled model comparison across specifications

11 Time-Varying Parameter Models

So far, we’ve modelled volatility as varying while treating other parameters (mean, coefficients) as constant. But what if the relationship between variables changes over time? State-space models and the Kalman filter provide a general framework for time-varying parameters.

11.1 The State-Space Framework

A state-space model separates what we observe from what we want to estimate:

Measurement equation (what we see): \[y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t)\]

Transition equation (how states evolve): \[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)\]

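Where:

$y_t$ is the observed variable (e.g., returns)
$\alpha_t$ is the state vector (e.g., time-varying beta); it is unrelated to the GARCH coefficients $\alpha_0, \alpha_1$
$Z_t, T_t$ are system matrices defining the model structure
$H_t, Q_t$ are the variances of the measurement and state noise, and $R_t$ maps the state noise into the states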

Tsay (2010) notes: “The Kalman filter is a recursive algorithm for computing the optimal estimator of the state vector at time $t$ based on information available at time $t$.”

11.2 Application: Time-Varying Beta (CAPM)

The static CAPM assumes beta is constant: \[r_{i,t} - r_f = \alpha + \beta (r_{m,t} - r_f) + \varepsilon_t\]

But betas change as firms’ risk profiles evolve. A state-space formulation allows beta to drift:

Measurement: $r_{i,t} - r_f = \alpha + \beta_t (r_{m,t} - r_f) + \varepsilon_t$

State transition: $\beta_{t+1} = \beta_t + \eta_t$

This is a random walk specification for beta: each period’s beta equals last period’s plus noise. The Kalman filter optimally tracks the evolving beta given the noisy observations.

Approach	Assumption	Use Case
Static regression	$\beta$ constant forever	Long-run average exposure
Rolling window	$\beta$ constant within window	Simple adaptation
Kalman filter	$\beta$ evolves as random walk	Optimal tracking
Regime switching	$\beta$ jumps between states	Discrete regime changes

11.3 The Kalman Filter: Predict and Update

The Kalman filter operates in two steps. First, predict: use the transition equation to forecast the state and its uncertainty. Second, update: when new data arrive, combine the forecast with the observation. The Kalman gain $K_t$ determines how much weight to give new information versus the forecast: it is automatically higher when the forecast is uncertain or the observation is precise.
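For a single random-walk beta, the predict and update steps reduce to a few lines of scalar arithmetic. The sketch below is a minimal implementation under simplifying assumptions: no intercept, known noise variances `q` and `h`, and simulated data in place of real excess returns. The function name `kalman_beta` is introduced for this example.

```python
import numpy as np

def kalman_beta(y, x, q, h, beta0=0.0, p0=10.0):
    """Filter a random-walk beta in y_t = beta_t * x_t + eps_t.

    q is the variance of the state noise, h the variance of the measurement noise.
    """
    n = len(y)
    beta_filt, p_filt = np.empty(n), np.empty(n)
    beta, p = beta0, p0
    for t in range(n):
        # Predict: a random walk keeps the mean and adds state noise to the variance
        beta_pred, p_pred = beta, p + q
        # Update: weigh the forecast error by the Kalman gain
        forecast_error = y[t] - beta_pred * x[t]
        f = x[t]**2 * p_pred + h                 # forecast-error variance
        gain = p_pred * x[t] / f
        beta = beta_pred + gain * forecast_error
        p = (1 - gain * x[t]) * p_pred
        beta_filt[t], p_filt[t] = beta, p
    return beta_filt, p_filt

# Simulated example (all values assumed): excess returns in percent
rng = np.random.default_rng(21)
n = 1000
true_beta = 1.0 + np.cumsum(0.02 * rng.standard_normal(n))
x = rng.standard_normal(n)
y = true_beta * x + 0.5 * rng.standard_normal(n)

beta_hat, p_hat = kalman_beta(y, x, q=0.02**2, h=0.5**2)
rmse = np.sqrt(np.mean((beta_hat[n // 2:] - true_beta[n // 2:])**2))
print(f"RMSE of filtered beta over the second half: {rmse:.3f}")
```

The gain is proportional to `p_pred` and inversely related to `h`, which is the formal version of "higher when the forecast is uncertain or the observation is precise". In applied work `q` and `h` would themselves be estimated, typically by maximum likelihood.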

Connection to Bayesian Inference

The Kalman filter is Bayesian inference for linear-Gaussian state-space models:

The prior is the predicted state distribution
The likelihood comes from the measurement equation
The posterior is the updated (filtered) state distribution

For non-linear or non-Gaussian models, extensions like particle filters or MCMC are needed.

11.4 Practical Applications in Finance

Application	State Variable	What It Captures
Time-varying beta	$\beta_t$	Evolving systematic risk
Hedging	Hedge ratio $h_t$	Changing spot-futures relationship
Factor models	Factor loadings	Rotating style exposures
Volatility	$\ln(\sigma^2_t)$	Stochastic volatility (SV models)
Local trend	$\mu_t$	Slowly varying expected return

11.5 When to Use State-Space Models

State-space models offer several advantages: they handle missing data naturally, provide optimal filtering for signal extraction, quantify uncertainty at each time step, and are flexible enough that many models emerge as special cases. However, they also present challenges: model specification (choosing $T_t$, $Q_t$) requires careful thought, implementation is more complex than GARCH, and computational costs increase substantially for high-dimensional state vectors.

The Big Picture

State-space models, regime switching, and GARCH all address the same fundamental problem: financial relationships are not static. They differ in how they model time variation:

GARCH: Variance changes, but the structure is constant
Regime switching: Parameters jump between discrete states
State-space: Parameters drift continuously

The best choice depends on whether you believe changes are gradual (state-space), abrupt (regime), or only affect volatility (GARCH).

12 Structural Breaks and Regime Switching

GARCH models assume that the parameters governing volatility dynamics remain constant over time. But financial markets undergo episodes where behaviour changes dramatically, what Brooks (2019) calls “very substantial changes in the properties of a series.” These changes may be one-off structural breaks or recurring regime switches.

12.1 What Causes Structural Breaks?

Structural breaks typically result from large-scale events:

Type	Examples	Effect on Volatility
Policy changes	Introduction of inflation targeting, QE programmes	May reduce or increase baseline volatility
Market microstructure	Electronic trading (“Big Bang”), decimal pricing	Often reduces transaction-related volatility
Financial crises	2008 GFC, COVID-19 crash	Dramatic volatility regime shift
Regulatory changes	Basel requirements, Dodd-Frank	May alter risk-taking behaviour

Why Breaks Matter

A constant-parameter model (including GARCH) estimated over a sample containing a structural break will be misspecified. The estimated parameters will be a weighted average of the true parameters in each regime, accurate for neither.

12.2 Testing for Structural Breaks

The Chow test is the classical approach: split the sample at a suspected break point and test whether the parameters differ significantly between sub-samples. However, this requires knowing when the break occurred.

More sophisticated approaches include:

CUSUM tests: Monitor cumulative sums of recursive residuals for evidence of parameter instability
Bai-Perron tests: Detect multiple unknown break points
Andrews-Ploberger tests: Test for breaks at unknown dates
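The Chow test is simple enough to compute by hand: fit the regression on the full sample and on each sub-sample, then compare residual sums of squares with $F = \frac{(RSS_{pooled} - RSS_1 - RSS_2)/k}{(RSS_1 + RSS_2)/(n - 2k)}$, where $k$ is the number of regression coefficients. The sketch below applies it to simulated placeholder data with a known intercept shift; the helper names are assumptions made for this example.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_f_stat(y, X, split):
    """Chow F-statistic for a break at row index `split`."""
    n, k = X.shape
    rss_pooled = rss(y, X)
    rss_split = rss(y[:split], X[:split]) + rss(y[split:], X[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Placeholder data: a regression whose intercept shifts halfway through the sample
rng = np.random.default_rng(9)
n = 400
x = rng.standard_normal(n)
y = np.where(np.arange(n) < 200, 0.0, 1.5) + 0.5 * x + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

f_break = chow_f_stat(y, X, 200)
print(f"Chow F-statistic at the true break: {f_break:.1f}")
```

The statistic is compared with an $F(k, n - 2k)$ critical value. Searching over many candidate split points invalidates that reference distribution, which is the problem the unknown-break-date tests are designed to handle.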


# Calculate rolling 60-day volatility
spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
spy_df['return'] = spy_df['PX_LAST'].pct_change()
spy_df['rolling_vol_60'] = spy_df['return'].rolling(60).std() * np.sqrt(252) * 100

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spy_df['rolling_vol_60'].dropna(), color='steelblue', linewidth=1)
ax.axhline(spy_df['rolling_vol_60'].mean(), color='coral', linestyle='--', 
           label=f'Mean: {spy_df["rolling_vol_60"].mean():.1f}%')
ax.set_ylabel('Annualised Volatility (%)')
ax.set_title('Rolling 60-Day Volatility: Evidence of Regime Changes?')
ax.legend()
plt.tight_layout()
plt.show()

Figure 7: Visual inspection for structural breaks in SPY volatility

12.3 Markov Switching Models

Rather than treating regime changes as one-off events, Markov switching models allow the series to switch between regimes probabilistically. Developed by Hamilton (1989), these models specify that an unobserved state variable $z_t$ governs which regime is active.

For a simple two-state model:

\[y_t = \begin{cases} \mu_1 + \phi_1 y_{t-1} + \sigma_1 u_t & \text{if } z_t = 1 \text{ (calm regime)} \\ \mu_2 + \phi_2 y_{t-1} + \sigma_2 u_t & \text{if } z_t = 2 \text{ (turbulent regime)} \end{cases}\]

The state evolves according to transition probabilities:

\[P = \begin{pmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{pmatrix}\]

Where $p_{11}$ is the probability of staying in regime 1 given we were in regime 1, and $p_{22}$ is the probability of staying in regime 2.

The Markov Property

A process has the Markov property if the probability of being in a given state depends only on the previous state, not on the entire history. This makes estimation tractable while still capturing regime persistence.
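Two useful quantities follow directly from the transition matrix: the expected time spent in a regime once entered, $1/(1 - p_{ii})$, and the long-run share of time spent in each regime. The staying probabilities below are assumed for illustration.

```python
import numpy as np

p11, p22 = 0.98, 0.90            # assumed staying probabilities (calm, turbulent)
P = np.array([[p11, 1 - p11],
              [1 - p22, p22]])

# Expected number of periods spent in a regime once entered
duration_calm = 1 / (1 - p11)
duration_turbulent = 1 / (1 - p22)

# Long-run (ergodic) probability of each regime
pi_calm = (1 - p22) / (2 - p11 - p22)
pi_turbulent = 1 - pi_calm

print(f"expected durations: calm {duration_calm:.0f} days, turbulent {duration_turbulent:.0f} days")
print(f"long-run probabilities: calm {pi_calm:.3f}, turbulent {pi_turbulent:.3f}")
```

Even with a turbulent regime that lasts only ten days on average, the market spends about one day in six there under these assumed values.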

12.4 Threshold Autoregressive (TAR) Models

An alternative approach makes regime switches deterministic rather than probabilistic. In a Self-Exciting Threshold AutoRegressive (SETAR) model, the regime depends on whether a threshold variable exceeds a critical value:

\[y_t = \begin{cases} \phi^{(1)}_0 + \phi^{(1)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} \leq c \\ \phi^{(2)}_0 + \phi^{(2)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} > c \end{cases}\]

Where $c$ is the threshold and $d$ is the delay parameter.

Model Type	Regime Determination	Estimation	Use Case
Markov Switching	Probabilistic (latent state)	MLE with Hamilton filter	When switching is random/unpredictable
SETAR	Deterministic (observable variable)	NLS with grid search	When switching depends on observed values

12.5 Regime-Switching GARCH

Combining regime switching with GARCH captures both volatility clustering within regimes and shifts between regimes:

\[\sigma^2_t = \begin{cases} \alpha_0^{(1)} + \alpha_1^{(1)} \varepsilon^2_{t-1} + \beta_1^{(1)} \sigma^2_{t-1} & \text{if calm regime} \\ \alpha_0^{(2)} + \alpha_1^{(2)} \varepsilon^2_{t-1} + \beta_1^{(2)} \sigma^2_{t-1} & \text{if turbulent regime} \end{cases}\]

This is particularly useful for modelling financial crises, where not only the level of volatility changes but also its dynamics.

13 From Classical Models to Sequence Learning

The models discussed so far (GARCH, Markov switching, TAR) represent the classical econometric approach to time-varying volatility. Modern machine learning offers powerful extensions through sequence models that can capture more complex temporal dependencies.

13.1 The Conceptual Bridge

Consider the parallel between classical and modern approaches:

Classical Concept	ML Extension	Key Advance
AR(p) process	Recurrent Neural Network (RNN)	Non-linear dependencies, learned representations
Markov switching	Hidden Markov Model (HMM)	More flexible state dynamics
GARCH	LSTM/GRU networks	Long-range memory without explicit structure
Regime detection	Change point detection	Automated, data-driven identification

13.2 Recurrent Neural Networks as Non-Linear AR Models

A recurrent neural network (RNN) extends the autoregressive framework to allow non-linear dependencies:

\[h_t = f(W_h h_{t-1} + W_x x_t + b)\] \[y_t = g(W_y h_t + c)\]

Where $h_t$ is a hidden state that captures information from the entire history. Dixon, Halperin, and Bilokon (2020) describe RNNs as “non-linear time series models [that] generalise classical linear time series models such as AR(p).”
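The claim that an RNN generalises AR(p) can be checked with a tiny forward pass. In the sketch below, switching off the recurrence and using linear activations reproduces an AR(1) prediction exactly, while restoring `tanh` and a non-zero recurrent weight gives a non-linear, history-dependent output. The weights are set by hand for illustration rather than learned, and `rnn_forward` is a hypothetical helper.

```python
import numpy as np

def rnn_forward(x, W_h, W_x, b, W_y, c, f=np.tanh, g=lambda v: v):
    """Run the recurrence over a 1-D input sequence x and return the outputs y_t."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x:
        h = f(W_h @ h + W_x * x_t + b)
        outputs.append(g(W_y @ h + c))
    return np.array(outputs)

identity = lambda v: v
x_demo = np.array([0.5, -1.0, 0.3, 0.8])    # placeholder inputs, e.g. lagged values y_{t-1}

# Special case: linear activation, no recurrence, one hidden unit -> AR(1) prediction
phi, const = 0.6, 0.1
ar_like = rnn_forward(x_demo, W_h=np.zeros((1, 1)), W_x=np.array([phi]),
                      b=np.zeros(1), W_y=np.array([1.0]), c=const, f=identity)

# General case: tanh activation and a recurrent weight make the output depend on history
nonlinear = rnn_forward(x_demo, W_h=np.array([[0.5]]), W_x=np.array([phi]),
                        b=np.zeros(1), W_y=np.array([1.0]), c=const)

print("AR(1)-equivalent output:", ar_like)
print("non-linear RNN output:  ", np.round(nonlinear, 4))
```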

Why This Matters for Finance

RNNs can learn complex patterns in volatility that GARCH cannot capture:

Non-linear responses to shocks
Interactions between multiple assets
Regime-dependent dynamics (without pre-specifying regimes)

13.3 Long Short-Term Memory (LSTM)

Standard RNNs struggle with long-range dependencies because of the vanishing gradient problem. LSTM networks address this with gated mechanisms that control information flow: a forget gate (what to discard from memory), an input gate (what new information to store), and an output gate (what to reveal to the next layer).

This architecture is particularly relevant for financial time series where volatility shocks can have persistent effects, market regimes may last months or years, and multiple timescales interact (daily noise, weekly patterns, monthly cycles).
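The three gates can be seen in a single LSTM step. This is a minimal sketch of the standard LSTM cell equations in NumPy; the function name `lstm_step`, the stacked weight layout (forget, input, output, candidate), and the bias values are assumptions chosen for illustration. The example sets the forget gate almost fully open and the input gate almost closed, so the memory cell carries its previous value forward nearly unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    W: (4H, D), U: (4H, H), b: (4H,), stacked in the order
    forget gate, input gate, output gate, candidate memory.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: share of old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: share of new candidate to store
    o = sigmoid(z[2 * H:3 * H])    # output gate: share of memory to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate memory content
    c_t = f * c_prev + i * g       # updated memory cell
    h_t = o * np.tanh(c_t)         # hidden state passed to the next step
    return h_t, c_t

H, D = 2, 1
W = np.zeros((4 * H, D))
U = np.zeros((4 * H, H))
b = np.zeros(4 * H)
b[0:H] = 10.0        # forget gate close to 1: keep the old memory
b[H:2 * H] = -10.0   # input gate close to 0: ignore the new input

c_prev = np.array([1.0, -0.5])
h_t, c_t = lstm_step(np.array([3.0]), np.zeros(H), c_prev, W, U, b)
print(c_prev, c_t)   # the memory cell is carried forward almost unchanged
```

The additive update `c_t = f * c_prev + i * g` is what lets information, and gradients, survive across many time steps when the forget gate stays near one.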

13.4 Practical Implications

The connection between classical econometrics and modern ML is not merely theoretical. Four practical considerations follow. Interpretability versus flexibility: GARCH parameters have direct economic meaning (persistence, reaction to news), whereas neural networks offer flexibility at the cost of transparency. Data requirements: GARCH can be estimated reliably with hundreds of observations, whereas deep learning typically requires thousands or millions. Out-of-sample performance: simple GARCH often outperforms complex ML models for volatility forecasting, a reminder that more parameters do not guarantee better predictions. Hybrid approaches: models that combine GARCH-type structure with neural network flexibility (such as Neural GARCH) may offer the best of both worlds.

The Fundamental Challenge

Whether using GARCH or LSTM, the core challenge remains: volatility is unobservable. We estimate it from returns, but we never see the “true” volatility against which to evaluate our models. This is why comparing implied with realised volatility, and testing forecasts against future squared returns (a noisy but observable proxy), remain essential.
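Testing a forecast against squared returns can be sketched as follows. The example is an illustration on simulated data, not on the chapter's Bloomberg series: the helper names `mse_loss` and `qlike_loss`, the GARCH(1,1) parameter values, and the constant-variance benchmark are assumptions. MSE and QLIKE are two loss functions commonly used for this purpose; both compare a variance forecast with the squared-return proxy.

```python
import numpy as np

def mse_loss(proxy, forecast):
    """Mean squared error between the variance proxy and the forecast."""
    return float(np.mean((proxy - forecast) ** 2))

def qlike_loss(proxy, forecast):
    """QLIKE loss: mean of log(forecast) + proxy / forecast (lower is better)."""
    return float(np.mean(np.log(forecast) + proxy / forecast))

# Simulate a GARCH(1,1) so that the true conditional variance is known
rng = np.random.default_rng(42)
n = 5000
alpha0, alpha1, beta1 = 0.02, 0.08, 0.90     # illustrative values
sigma2 = np.empty(n)
eps = np.empty(n)
sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, n):
    sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

proxy = eps ** 2                              # observable but noisy proxy
forecast_garch = sigma2                       # uses information up to t-1 only
# Benchmark: constant variance. Using the full-sample mean is a simplification
# (it looks ahead); a real exercise would estimate it on a training window.
forecast_const = np.full(n, proxy.mean())

for name, fc in [("GARCH(1,1)", forecast_garch), ("Constant", forecast_const)]:
    print(f"{name:>10}: MSE = {mse_loss(proxy, fc):.4f}, "
          f"QLIKE = {qlike_loss(proxy, fc):.4f}")
```

Because squared returns are so noisy, both losses remain far from zero even for the correct model; what matters is the ranking of competing forecasts, and the time-varying forecast would typically be expected to score lower than the constant benchmark.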

14 Summary

Concept	Key Point
Volatility clustering	Large returns (of either sign) tend to follow large returns
Fat tails	Extreme events more common than the normal distribution predicts
Leverage effect	Bad news increases volatility more than good news
ARCH	Conditional variance depends on past shocks
GARCH(1,1)	Adds persistence; often sufficient
Asymmetric GARCH	GJR-GARCH, EGARCH capture leverage
News impact curve	Visualises asymmetric volatility response
GARCH-M	Volatility in mean equation; risk premium
IGARCH	Unit-root persistence; unconditional variance undefined
DCC	Dynamic correlations for multivariate volatility
Stochastic volatility	Volatility as latent process; requires MCMC or particle filters
Bayesian GARCH	Full uncertainty quantification; regularisation
State-space models	Time-varying parameters via Kalman filter
Time-varying beta	Evolving systematic risk exposure
Structural breaks	One-off changes in series behaviour
Markov switching	Probabilistic regime changes (Hamilton)
TAR/SETAR	Deterministic threshold-based regimes
Sequence learning	RNN/LSTM as non-linear AR extensions
VIX	Market’s expectation of future volatility

What’s Next?

In the lab, you’ll estimate GARCH models on the Bloomberg data, compare implied vs realised volatility, explore whether asymmetric models improve forecasts, and investigate evidence of regime changes.

15 References

Bekaert, Geert, and Guojun Wu. 2000. “Asymmetric Volatility and Risk in Equity Markets.” Review of Financial Studies 13 (1): 1–42. https://doi.org/10.1093/rfs/13.1.1.

Black, Fischer. 1976. “Studies of Stock Price Volatility Changes.” Proceedings of the 1976 Meetings of the American Statistical Association, Business and Economics Statistics Section, 177–81.

Bollerslev, Tim. 1986. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics 31 (3): 307–27. https://doi.org/10.1016/0304-4076(86)90063-1.

Brooks, Chris. 2019. Introductory Econometrics for Finance. 4th ed. Cambridge, UK: Cambridge University Press.

Campbell, John Y., and Ludger Hentschel. 1992. “No News Is Good News: An Asymmetric Model of Changing Volatility in Stock Returns.” Journal of Financial Economics 31 (3): 281–318. https://doi.org/10.1016/0304-405X(92)90037-X.

Dixon, Matthew F., Igor Halperin, and Paul Bilokon. 2020. Machine Learning in Finance: From Theory to Practice. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-41068-1.

Engle, Robert F. 1982. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (4): 987–1007. https://doi.org/10.2307/1912773.

———. 2002. “Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models.” Journal of Business and Economic Statistics 20 (3): 339–50. https://doi.org/10.1198/073500102288618487.

Engle, Robert F., David M. Lilien, and Russell P. Robins. 1987. “Estimating Time Varying Risk Premia in the Term Structure: The ARCH-M Model.” Econometrica 55 (2): 391–407. https://doi.org/10.2307/1913242.

Glosten, Lawrence R., Ravi Jagannathan, and David E. Runkle. 1993. “On the Relation Between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” Journal of Finance 48 (5): 1779–1801. https://doi.org/10.1111/j.1540-6261.1993.tb05128.x.

Nelson, Daniel B. 1991. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica 59 (2): 347–70. https://doi.org/10.2307/2938260.

Pagan, Adrian R., and G. William Schwert. 1990. “Alternative Models for Conditional Stock Volatility.” Journal of Econometrics 45 (1–2): 267–90. https://doi.org/10.1016/0304-4076(90)90101-X.

Tsay, Ruey S. 2010. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: Wiley.

--- title: "Volatility Modelling" subtitle: "From Stylised Facts to GARCH" author: "Barry Quinn" date: last-modified bibliography: ../resources/reading.bib format: html: toc: true toc-depth: 3 code-fold: true execute: warning: false message: false --- ## Introduction Volatility : the tendency of asset prices to fluctuate : is one of the most important concepts in finance. It matters for risk management, option pricing, portfolio construction, and regulatory capital. Yet volatility is not directly observable; we must estimate it from price data. **Why this chapter matters:** In the [Foundations chapter](01_foundations.qmd), we introduced the Three Prediction Problems in finance: predicting the *mean* (returns), the *variance* (volatility), and *cross-sectional* variation (which assets outperform). We saw that returns have almost no predictable signal (~1-2% R²), making ARIMA largely useless for return prediction. **Volatility is different.** The conditional variance is genuinely predictable (~15-40% R²), and this predictability has direct economic value for options pricing, risk management, and portfolio allocation. This chapter focuses on the second prediction problem: where the signal actually exists. This chapter develops your understanding of volatility from three perspectives: empirically, by examining what patterns we observe in financial return volatility; theoretically, by showing how ARCH and GARCH models capture these patterns; and practically, by demonstrating how to estimate, forecast, and apply volatility models. By the end of this chapter, you will understand why GARCH(1,1) has become the workhorse model for volatility in finance, and when more sophisticated approaches are needed. ::: {.callout-note} #### View Slides Open the lecture deck: [Week 4: Volatility Modelling](../slides/week04_volatility.qmd) ::: ::: {.callout-tip} ## Connection to Week 3 This chapter builds directly on the time series foundations from Week 3. 
There you learned that the ACF of returns is near zero (no signal in the mean), but the ACF of *squared* returns shows strong persistence. That persistence is volatility clustering: and modelling it is what this chapter is about. ::: ```{python} #| label: setup-data-root #| include: false import sys from pathlib import Path sys.path.insert(0, str(Path("scripts").resolve())) from bloomberg_loader import load_bloomberg ``` ## Stylised Facts of Financial Volatility Before building models, we must understand what we're trying to capture. Financial return volatility exhibits several well-documented patterns : **stylised facts** : that any good model should reproduce. ### Volatility Clustering Perhaps the most striking feature of financial returns is **volatility clustering**: large returns (positive or negative) tend to be followed by large returns, and small returns tend to be followed by small returns. @tsay2010analysis describes this as: "Volatility is not constant over time. There are periods of high volatility alternating with periods of relative calm." This pattern is visible in virtually every financial time series. 
The figure below demonstrates clustering in the Bloomberg equity data: ```{python} #| label: fig-volatility-clustering #| fig-cap: "Volatility clustering in SPY returns: Large moves cluster together" import pandas as pd import numpy as np import matplotlib.pyplot as plt # Load Bloomberg database df = load_bloomberg() # Get SPY returns spy = df[df['ticker'] == 'SPY'].set_index('date').sort_index() returns = spy['PX_LAST'].pct_change().dropna() # Plot returns fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True) # Returns axes[0].plot(returns.index, returns.values, linewidth=0.5, color='steelblue') axes[0].axhline(0, color='gray', linestyle='--', linewidth=0.5) axes[0].set_ylabel('Daily Return') axes[0].set_title('SPY Daily Returns: Volatility Clustering') # Absolute returns (proxy for volatility) axes[1].plot(returns.index, np.abs(returns.values), linewidth=0.5, color='coral') axes[1].set_ylabel('|Return|') axes[1].set_xlabel('Date') axes[1].set_title('Absolute Returns: Clusters of High and Low Volatility') plt.tight_layout() plt.show() ``` The autocorrelation structure confirms this pattern: while returns themselves show little autocorrelation (consistent with market efficiency), *squared* returns are highly autocorrelated: ```{python} #| label: fig-acf-comparison #| fig-cap: "Returns show no autocorrelation; squared returns do" from statsmodels.graphics.tsaplots import plot_acf fig, axes = plt.subplots(1, 2, figsize=(10, 4)) # ACF of returns plot_acf(returns.dropna(), ax=axes[0], lags=20, title='ACF: Returns') # ACF of squared returns plot_acf(returns.dropna()**2, ax=axes[1], lags=20, title='ACF: Squared Returns') plt.tight_layout() plt.show() ``` ### Fat Tails (Leptokurtosis) Financial returns consistently exhibit **fatter tails** than the normal distribution predicts. Extreme events : crashes, spikes : occur far more frequently than a Gaussian model would suggest. 
@tsay2010analysis notes: "Kurtosis often exceeds three (the kurtosis of a normal distribution), and often exceeds three by a substantial margin." ```{python} #| label: tbl-distribution-stats #| tbl-cap: "Financial returns exhibit excess kurtosis" from scipy import stats # Calculate statistics for multiple assets tickers = ['AAPL', 'GOOGL', 'MSFT', 'SPY'] results = [] for ticker in tickers: asset = df[df['ticker'] == ticker].set_index('date')['PX_LAST'] ret = asset.pct_change().dropna() results.append({ 'Ticker': ticker, 'Mean (%)': f"{ret.mean()*100:.3f}", 'Std (%)': f"{ret.std()*100:.2f}", 'Skewness': f"{stats.skew(ret):.2f}", 'Kurtosis': f"{stats.kurtosis(ret):.2f}", 'Normal?': 'Yes' if stats.kurtosis(ret) < 1 else 'No' }) pd.DataFrame(results) ``` ### The Leverage Effect Volatility responds asymmetrically to returns: negative returns tend to increase volatility more than positive returns of the same magnitude. This **leverage effect** was first documented by @black1976studies. Several theoretical explanations have been proposed: | Theory | Mechanism | Key Insight | |--------|-----------|-------------| | **Leverage hypothesis** | When prices fall, debt/equity ratio rises mechanically | Firms become riskier → volatility increases | | **Volatility feedback** | If volatility is priced, expected volatility increase raises required return | Causation runs both ways: high volatility → lower prices | | **Risk premium channel** | Higher expected volatility demands higher risk premium | Current price falls to deliver higher expected return | | **Behavioural asymmetry** | Investors react more strongly to losses (prospect theory) | Bad news generates more trading, uncertainty | | **Margin constraints** | Downturns trigger margin calls, forced selling | Amplification through deleveraging cascades | The leverage hypothesis is the most cited explanation, but @campbell1992no and @bekaert2000asymmetric show that volatility feedback may be equally or more important. 
In practice, all mechanisms likely operate simultaneously, reinforcing the asymmetric pattern. ```{python} #| label: fig-leverage-effect #| fig-cap: "Negative returns increase volatility more than positive returns" # Calculate rolling volatility and lagged returns spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index() spy_df['return'] = spy_df['PX_LAST'].pct_change() spy_df['rolling_vol'] = spy_df['return'].rolling(20).std() spy_df['lagged_return'] = spy_df['return'].shift(1) # Bin by lagged return spy_clean = spy_df.dropna() spy_clean['return_bin'] = pd.qcut(spy_clean['lagged_return'], q=5, labels=['Very Neg', 'Neg', 'Neutral', 'Pos', 'Very Pos']) # Plot fig, ax = plt.subplots(figsize=(8, 5)) spy_clean.groupby('return_bin')['rolling_vol'].mean().plot(kind='bar', ax=ax, color='steelblue') ax.set_xlabel('Previous Day Return') ax.set_ylabel('Average 20-Day Volatility') ax.set_title('Leverage Effect: Negative Returns Lead to Higher Volatility') ax.tick_params(axis='x', rotation=0) plt.tight_layout() plt.show() ``` ## The ARCH Model ### Motivation: Conditional Heteroskedasticity Classical econometrics assumes **homoskedasticity** : constant variance of errors. But volatility clustering tells us this assumption fails for financial data. The variance is not constant; it changes over time in predictable ways. The key insight of @engle1982autoregressive was to distinguish between **unconditional** and **conditional** variance: - **Unconditional variance**: The long-run average variance (constant) - **Conditional variance**: The variance *right now*, given what we know (time-varying) ::: {.callout-important} ## Engle's Key Insight While the *unconditional* variance of returns may be constant, the *conditional* variance : what we expect given recent information : varies over time in a way we can model. 
::: ### The ARCH(q) Specification The **AutoRegressive Conditional Heteroskedasticity** model specifies: $$r_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0,1)$$ $$\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \alpha_2 \varepsilon^2_{t-2} + \cdots + \alpha_q \varepsilon^2_{t-q}$$ Where: - $r_t$ is the return at time $t$ - $\sigma^2_t$ is the **conditional variance** at time $t$ - $\varepsilon^2_{t-1}, \ldots, \varepsilon^2_{t-q}$ are past squared shocks **Interpretation**: Today's volatility depends on recent *surprises*. If yesterday had a large shock (positive or negative), today's volatility will be elevated. ## The GARCH Model ### From ARCH to GARCH @bollerslev1986generalized extended ARCH by including lagged conditional variances: $$\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}$$ This is **GARCH(1,1)** : one ARCH term, one GARCH term. @brooks2019introductory notes: > "A GARCH(1,1) model will be sufficient to capture the volatility clustering in the data, and rarely is any higher order model estimated or even entertained in the academic finance literature." ### Parameter Interpretation | Parameter | Name | Interpretation | |-----------|------|----------------| | $\alpha_0$ | Constant | Long-run variance floor | | $\alpha_1$ | ARCH term | Reaction to recent shocks | | $\beta_1$ | GARCH term | Persistence of volatility | | $\alpha_1 + \beta_1$ | Persistence | How long shocks affect volatility | ::: {.callout-tip} ## Rule of Thumb For most financial assets, $\alpha_1 + \beta_1$ is close to (but less than) 1, meaning volatility shocks are highly persistent. Values above 0.9 are typical. ::: ### Why GARCH(1,1) Often Suffices GARCH(1,1) is remarkably effective for three reasons: parsimony (only 3 parameters capture complex volatility dynamics), memory (the recursive structure implicitly uses the entire history), and mean reversion (volatility eventually returns to its long-run level). 
```{python} #| label: fig-garch-fit #| fig-cap: "GARCH(1,1) captures volatility dynamics" #| eval: false from arch import arch_model # Fit GARCH(1,1) to SPY returns_pct = returns * 100 # Scale for numerical stability model = arch_model(returns_pct.dropna(), vol='Garch', p=1, q=1, mean='Constant') result = model.fit(disp='off') # Extract conditional volatility cond_vol = result.conditional_volatility # Plot fig, ax = plt.subplots(figsize=(10, 5)) ax.plot(returns_pct.dropna().index, np.abs(returns_pct.dropna().values), alpha=0.3, label='|Returns|', color='steelblue') ax.plot(cond_vol.index, cond_vol.values, label='GARCH(1,1) Conditional Volatility', color='coral', linewidth=1.5) ax.legend() ax.set_ylabel('Volatility (%)') ax.set_title('GARCH(1,1) Fitted Volatility vs Absolute Returns') plt.tight_layout() plt.show() print(result.summary().tables[1]) ``` ### Stationarity Conditions For GARCH(1,1) to be **covariance stationary**, we require: $$\alpha_1 + \beta_1 < 1$$ When this holds, the **unconditional variance** exists: $$\sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \beta_1}$$ If $\alpha_1 + \beta_1 = 1$, we have **Integrated GARCH (IGARCH)** : shocks persist forever. ## Asymmetric GARCH Models Standard GARCH treats positive and negative shocks symmetrically. But the leverage effect suggests this is wrong. Several extensions address this: ### GJR-GARCH @glosten1993relation add an indicator for negative shocks: $$\sigma^2_t = \alpha_0 + (\alpha_1 + \gamma I_{t-1}) \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}$$ Where $I_{t-1} = 1$ if $\varepsilon_{t-1} < 0$ (negative shock). The parameter $\gamma$ captures the additional impact of bad news. ### EGARCH @nelson1991conditional use logarithms to ensure positivity: $$\ln(\sigma^2_t) = \alpha_0 + \alpha_1 \left( \frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi} \right) + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta_1 \ln(\sigma^2_{t-1})$$ The $\gamma$ term directly captures asymmetry. 
### Visualising Asymmetry: News Impact Curves The **news impact curve** (@pagan1990alternative) provides a powerful way to visualise how different shocks affect future volatility. It plots next-period volatility against various values of the previous shock, holding other factors constant. For a symmetric GARCH model, the curve is a parabola centred at zero : positive and negative shocks of equal magnitude have identical effects. For asymmetric models (GJR, EGARCH), the curve tilts, showing that negative shocks have larger impact. ```{python} #| label: fig-news-impact #| fig-cap: "News impact curves: GARCH (symmetric) vs GJR (asymmetric)" # Simulated news impact curves for illustration shocks = np.linspace(-3, 3, 100) # Assume typical parameters alpha0, alpha1, beta1 = 0.00001, 0.08, 0.90 gamma = 0.10 # Asymmetry parameter for GJR # GARCH: symmetric response garch_vol = alpha0 + alpha1 * shocks**2 # GJR: asymmetric - larger response to negative shocks gjr_vol = alpha0 + (alpha1 + gamma * (shocks < 0)) * shocks**2 fig, ax = plt.subplots(figsize=(8, 5)) ax.plot(shocks, garch_vol * 10000, label='GARCH (symmetric)', color='steelblue', linewidth=2) ax.plot(shocks, gjr_vol * 10000, label='GJR (asymmetric)', color='coral', linewidth=2) ax.axvline(0, color='gray', linestyle='--', linewidth=0.5) ax.set_xlabel('Previous Shock (standardised)') ax.set_ylabel('Next-Period Variance (×10⁴)') ax.set_title('News Impact Curves: Asymmetry in Volatility Response') ax.legend() plt.tight_layout() plt.show() ``` The news impact curve reveals that under GJR, a negative shock of magnitude 2 has substantially more impact on future volatility than a positive shock of the same size : consistent with the leverage effect. ## GARCH-M: The Risk-Return Relationship Standard GARCH models volatility, but doesn't connect it to returns. 
The **GARCH-in-Mean (GARCH-M)** model (@engle1987estimating) lets volatility enter the mean equation directly: $$r_t = \mu + \delta \sigma_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t$$ $$\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}$$ The parameter $\delta$ captures the **risk premium**: if $\delta > 0$, higher expected volatility leads to higher expected returns : compensation for bearing risk. ::: {.callout-note} ## Interpreting GARCH-M A positive and significant $\delta$ supports the risk-return trade-off from finance theory. However, empirical evidence is mixed : $\delta$ is often insignificant or even negative at short horizons. This may reflect: - Time-varying risk aversion - Different risk horizons for different investors - Measurement error in conditional volatility ::: ## Long Memory in Volatility: IGARCH and Beyond When $\alpha_1 + \beta_1$ approaches 1, volatility shocks become extremely persistent. The **Integrated GARCH (IGARCH)** model sets $\alpha_1 + \beta_1 = 1$ exactly: $$\sigma^2_t = \alpha_0 + \beta_1 \sigma^2_{t-1} + (1 - \beta_1) \varepsilon^2_{t-1}$$ @tsay2010analysis notes that under IGARCH, "the impact of past squared shocks... on $\sigma^2_t$ is persistent" : shocks never fully decay. The unconditional variance does not exist. | Model | Persistence | Unconditional Variance | Use Case | |-------|-------------|------------------------|----------| | **GARCH(1,1)** | $\alpha_1 + \beta_1 < 1$ | Exists, finite | Standard applications | | **IGARCH(1,1)** | $\alpha_1 + \beta_1 = 1$ | Does not exist | Very persistent volatility | | **FIGARCH** | Fractional integration | Exists | Long memory without unit root | ::: {.callout-warning} ## The IGARCH Puzzle Finding $\alpha_1 + \beta_1 \approx 1$ is common empirically, but concerning theoretically : it implies volatility shocks persist forever. 
This may indicate: - Occasional level shifts in volatility (regime changes) - Structural breaks misinterpreted as persistence - Need for a regime-switching specification ::: ## Multivariate Volatility: DCC and Portfolio Applications When managing portfolios, we need not just individual asset volatilities but **covariances** between assets. Multivariate GARCH models capture time-varying correlations. ### Dynamic Conditional Correlation (DCC) The **DCC model** (@engle2002dynamic) separates volatility dynamics from correlation dynamics: $$r_t = \mu_t + D_t z_t$$ Where $D_t$ is a diagonal matrix of univariate GARCH volatilities, and correlations evolve as: $$Q_t = (1 - \theta_1 - \theta_2)\bar{Q} + \theta_1 \epsilon_{t-1}\epsilon'_{t-1} + \theta_2 Q_{t-1}$$ The correlation matrix is then $R_t = \text{diag}(Q_t)^{-1/2} Q_t \text{diag}(Q_t)^{-1/2}$. ### Application: Time-Varying Hedge Ratios A key application of multivariate GARCH is computing **optimal hedge ratios** that vary over time. The minimum variance hedge ratio is: $$h_t = \rho_t \frac{\sigma_{s,t}}{\sigma_{f,t}}$$ Where $\rho_t$ is the conditional correlation between spot and futures returns, and $\sigma_{s,t}, \sigma_{f,t}$ are their conditional volatilities. @brooks2019introductory shows that time-varying hedge ratios from multivariate GARCH often outperform static hedges, particularly during volatile periods. ## Practical Application: VIX and Realised Volatility ### The VIX Index The **VIX** (CBOE Volatility Index) measures the market's expectation of 30-day volatility, extracted from S&P 500 option prices. It represents **implied volatility** : what traders expect. 
```{python} #| label: fig-vix-analysis #| fig-cap: "VIX (implied) vs SPY realised volatility" # Load VIX from Bloomberg database vix = df[df['ticker'] == 'VIX'].set_index('date')['PX_LAST'] spy_close = df[df['ticker'] == 'SPY'].set_index('date')['PX_LAST'] # Calculate 30-day realised volatility (annualised) spy_ret = spy_close.pct_change() realised_vol = spy_ret.rolling(30).std() * np.sqrt(252) * 100 # Align dates common_idx = vix.index.intersection(realised_vol.dropna().index) fig, ax = plt.subplots(figsize=(10, 5)) ax.plot(common_idx, vix.loc[common_idx], label='VIX (Implied)', color='coral') ax.plot(common_idx, realised_vol.loc[common_idx], label='30-Day Realised Vol', color='steelblue', alpha=0.7) ax.legend() ax.set_ylabel('Volatility (%)') ax.set_title('Implied vs Realised Volatility') plt.tight_layout() plt.show() ``` ### Volatility Risk Premium Implied volatility typically exceeds realised volatility : the **volatility risk premium**. This reflects compensation for bearing volatility risk. ## Bayesian Approaches to Volatility While GARCH models are typically estimated by maximum likelihood, Bayesian methods offer several advantages for volatility modelling : particularly when dealing with parameter uncertainty, model comparison, and complex specifications. ### Why Bayesian Volatility Modelling? | Challenge | MLE Approach | Bayesian Solution | |-----------|--------------|-------------------| | **Parameter uncertainty** | Point estimates + asymptotic SE | Full posterior distributions | | **Model comparison** | Information criteria (AIC, BIC) | Bayes factors, posterior model probabilities | | **Small samples** | Unreliable estimates | Priors stabilise inference | | **Complex models** | Numerical optimisation often fails | MCMC handles high dimensions | ### Stochastic Volatility Models GARCH treats volatility as a deterministic function of past shocks. 
**Stochastic volatility (SV)** models treat volatility itself as a random process: $$r_t = \exp(h_t/2) \varepsilon_t, \quad \varepsilon_t \sim N(0,1)$$ $$h_t = \mu + \phi(h_{t-1} - \mu) + \sigma_\eta \eta_t, \quad \eta_t \sim N(0,1)$$ Where $h_t = \ln(\sigma^2_t)$ is log-volatility, which follows an AR(1) process. The parameter $\phi$ captures volatility persistence; $\sigma_\eta$ captures volatility-of-volatility. ::: {.callout-note} ## SV vs GARCH - **GARCH**: Volatility is a known function of past data; likelihood is tractable - **SV**: Volatility is a latent (unobserved) process; requires MCMC or particle filters SV models often fit financial data better than GARCH (lower in-sample MSE), but GARCH is easier to estimate and produces similar forecasts. The choice depends on whether you need full uncertainty quantification. ::: ### Bayesian GARCH Even standard GARCH can benefit from Bayesian treatment: $$\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}$$ With priors: $$\alpha_0 \sim \text{Gamma}(\cdot), \quad \alpha_1, \beta_1 \sim \text{Beta}(\cdot)$$ The Beta priors on $\alpha_1$ and $\beta_1$ automatically enforce the constraint $0 \leq \alpha_1, \beta_1 \leq 1$. A prior favouring $\alpha_1 + \beta_1$ close to (but less than) 1 incorporates the stylised fact that volatility is highly persistent. ### Practical Benefits Bayesian approaches to volatility offer three key practical benefits. First, uncertainty propagation: when forecasting volatility, Bayesian methods produce predictive distributions rather than point forecasts, which is essential for risk management since the uncertainty in your volatility estimate should feed into your VaR calculation. Second, shrinkage and regularisation: Bayesian priors prevent extreme parameter estimates that can arise with MLE in small samples, analogous to ridge regression stabilising coefficient estimates. 
Third, model averaging: rather than selecting a single model, Bayesian model averaging weights predictions from multiple models by their posterior probability, hedging against model misspecification. ::: {.callout-tip} ## The Pragmatic View For routine volatility forecasting, MLE-GARCH remains the workhorse. Bayesian methods add value when: - You need full uncertainty quantification for downstream decisions - You're combining volatility estimates with other uncertain inputs - You're estimating complex models where MLE struggles - You want principled model comparison across specifications ::: ## Time-Varying Parameter Models So far, we've modelled volatility as varying while treating other parameters (mean, coefficients) as constant. But what if the *relationship* between variables changes over time? **State-space models** and the **Kalman filter** provide a general framework for time-varying parameters. ### The State-Space Framework A state-space model separates what we observe from what we want to estimate: **Measurement equation** (what we see): $$y_t = Z_t \alpha_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, H_t)$$ **Transition equation** (how states evolve): $$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)$$ Where: - $y_t$ is the observed variable (e.g., returns) - $\alpha_t$ is the **state vector** (e.g., time-varying beta) - $Z_t, T_t$ are system matrices defining the model structure @tsay2010analysis notes: "The Kalman filter is a recursive algorithm for computing the optimal estimator of the state vector at time $t$ based on information available at time $t$." ### Application: Time-Varying Beta (CAPM) The static CAPM assumes beta is constant: $$r_{i,t} - r_f = \alpha + \beta (r_{m,t} - r_f) + \varepsilon_t$$ But betas change as firms' risk profiles evolve. 
A state-space formulation allows beta to drift: **Measurement:** $r_{i,t} = \alpha_t + \beta_t r_{m,t} + \varepsilon_t$ **State transition:** $\beta_{t+1} = \beta_t + \eta_t$ This is a **random walk** specification for beta : each period's beta equals last period's plus noise. The Kalman filter optimally tracks the evolving beta given the noisy observations. | Approach | Assumption | Use Case | |----------|------------|----------| | **Static regression** | $\beta$ constant forever | Long-run average exposure | | **Rolling window** | $\beta$ constant within window | Simple adaptation | | **Kalman filter** | $\beta$ evolves as random walk | Optimal tracking | | **Regime switching** | $\beta$ jumps between states | Discrete regime changes | ### The Kalman Filter: Predict and Update The Kalman filter operates in two steps. First, predict: use the transition equation to forecast the state and its uncertainty. Second, update: when new data arrive, combine the forecast with the observation. The **Kalman gain** $K_t$ determines how much weight to give new information versus the forecast : it's automatically higher when the forecast is uncertain or the observation is precise. ::: {.callout-note} ## Connection to Bayesian Inference The Kalman filter is **Bayesian inference for linear-Gaussian state-space models**: - The prior is the predicted state distribution - The likelihood comes from the measurement equation - The posterior is the updated (filtered) state distribution For non-linear or non-Gaussian models, extensions like particle filters or MCMC are needed. 
::: ### Practical Applications in Finance | Application | State Variable | What It Captures | |-------------|----------------|------------------| | **Time-varying beta** | $\beta_t$ | Evolving systematic risk | | **Hedging** | Hedge ratio $h_t$ | Changing spot-futures relationship | | **Factor models** | Factor loadings | Rotating style exposures | | **Volatility** | $\ln(\sigma^2_t)$ | Stochastic volatility (SV models) | | **Local trend** | $\mu_t$ | Slowly varying expected return | ### When to Use State-Space Models State-space models offer several advantages: they handle missing data naturally, provide optimal filtering for signal extraction, quantify uncertainty at each time step, and are flexible enough that many models emerge as special cases. However, they also present challenges: model specification (choosing $T_t$, $Q_t$) requires careful thought, implementation is more complex than GARCH, and computational costs increase substantially for high-dimensional state vectors. ::: {.callout-important} ## The Big Picture State-space models, regime switching, and GARCH all address the same fundamental problem: financial relationships are not static. They differ in *how* they model non-stationarity: - **GARCH**: Variance changes, but the *structure* is constant - **Regime switching**: Parameters jump between discrete states - **State-space**: Parameters drift continuously The best choice depends on whether you believe changes are gradual (state-space), abrupt (regime), or only affect volatility (GARCH). ::: ## Structural Breaks and Regime Switching GARCH models assume that the parameters governing volatility dynamics remain constant over time. But financial markets undergo episodes where behaviour changes dramatically : what @brooks2019introductory calls "very substantial changes in the properties of a series." These changes may be one-off **structural breaks** or recurring **regime switches**. ### What Causes Structural Breaks? 
Structural breaks typically result from large-scale events:

| Type | Examples | Effect on Volatility |
|------|----------|---------------------|
| **Policy changes** | Introduction of inflation targeting, QE programmes | May reduce or increase baseline volatility |
| **Market microstructure** | Electronic trading ("Big Bang"), decimal pricing | Often reduces transaction-related volatility |
| **Financial crises** | 2008 GFC, COVID-19 crash | Dramatic volatility regime shift |
| **Regulatory changes** | Basel requirements, Dodd-Frank | May alter risk-taking behaviour |

::: {.callout-important}
## Why Breaks Matter

A constant-parameter model (including standard GARCH) estimated over a sample containing a structural break will be misspecified. The estimated parameters will tend to blend the true parameters of each regime, and so be accurate for neither.
:::

### Testing for Structural Breaks

The **Chow test** is the classical approach: split the sample at a suspected break point and test whether the parameters differ significantly between sub-samples. However, this requires *knowing* when the break occurred.
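
The Chow test described above can be sketched in a few lines. This is a minimal illustration, assuming a single known break date and homoskedastic errors; the function name `chow_test`, the simulated data, and the break location are hypothetical and are not taken from the chapter's Bloomberg dataset. The F-statistic compares the pooled sum of squared residuals with the sum from the two sub-sample regressions.

```python
import numpy as np
from scipy import stats


def chow_test(y, X, break_idx):
    """Chow F-test for one known break at row `break_idx`.

    X must already contain a constant column. Returns (F statistic, p-value).
    """
    def ssr(y_part, X_part):
        # OLS fit, then sum of squared residuals
        beta, *_ = np.linalg.lstsq(X_part, y_part, rcond=None)
        resid = y_part - X_part @ beta
        return float(resid @ resid)

    n, k = X.shape
    ssr_pooled = ssr(y, X)  # one set of parameters for the full sample
    ssr_split = ssr(y[:break_idx], X[:break_idx]) + ssr(y[break_idx:], X[break_idx:])
    f_stat = ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))
    p_value = stats.f.sf(f_stat, k, n - 2 * k)
    return f_stat, p_value


# Simulated example (assumption): the intercept shifts from 0 to 2 halfway through
rng = np.random.default_rng(42)
n, break_idx = 400, 200
x = rng.normal(size=n)
intercept = np.where(np.arange(n) < break_idx, 0.0, 2.0)
y = intercept + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

f_stat, p_value = chow_test(y, X, break_idx)
print(f"Chow F = {f_stat:.1f}, p-value = {p_value:.4g}")
```

A small p-value rejects the null of identical parameters in the two sub-samples. Under conditional heteroskedasticity, which is typical of returns, the standard F distribution is only approximate, so treat the result as indicative.
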
More sophisticated approaches include:

- **CUSUM tests**: Monitor cumulative sums of recursive residuals for evidence of parameter instability
- **Bai-Perron tests**: Detect multiple unknown break points
- **Andrews-Ploberger tests**: Test for breaks at unknown dates

```{python}
#| label: fig-structural-break
#| fig-cap: "Visual inspection for structural breaks in SPY volatility"

# Calculate rolling 60-day volatility (df, np and plt come from earlier chunks)
spy_df = df[df['ticker'] == 'SPY'].set_index('date').sort_index()
spy_df['return'] = spy_df['PX_LAST'].pct_change()
# Annualise the daily standard deviation and express it in percent
spy_df['rolling_vol_60'] = spy_df['return'].rolling(60).std() * np.sqrt(252) * 100
mean_vol = spy_df['rolling_vol_60'].mean()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spy_df['rolling_vol_60'].dropna(), color='steelblue', linewidth=1)
ax.axhline(mean_vol, color='coral', linestyle='--', label=f'Mean: {mean_vol:.1f}%')
ax.set_ylabel('Annualised Volatility (%)')
ax.set_title('Rolling 60-Day Volatility: Evidence of Regime Changes?')
ax.legend()
plt.tight_layout()
plt.show()
```

### Markov Switching Models

Rather than treating regime changes as one-off events, **Markov switching models** allow the series to switch between regimes probabilistically. Developed by Hamilton (1989), these models specify that an unobserved **state variable** $z_t$ governs which regime is active.

For a simple two-state model:

$$y_t = \begin{cases}
\mu_1 + \phi_1 y_{t-1} + \sigma_1 u_t & \text{if } z_t = 1 \text{ (calm regime)} \\
\mu_2 + \phi_2 y_{t-1} + \sigma_2 u_t & \text{if } z_t = 2 \text{ (turbulent regime)}
\end{cases}$$

The state evolves according to **transition probabilities**:

$$P = \begin{pmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{pmatrix}$$

where $p_{11}$ is the probability of staying in regime 1 given we were in regime 1, and $p_{22}$ is the probability of staying in regime 2.
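
To make the transition matrix concrete, the sketch below works out what a given pair $(p_{11}, p_{22})$ implies for regime persistence. The values 0.98 and 0.90 are hypothetical, chosen only for illustration and not estimated from any data. It uses two standard results for a two-state chain: the expected duration of regime $i$ is $1/(1 - p_{ii})$, and the long-run share of time in regime 1 is $(1 - p_{22})/(2 - p_{11} - p_{22})$.

```python
import numpy as np

# Hypothetical staying probabilities (assumption, not estimates)
p11, p22 = 0.98, 0.90
P = np.array([[p11, 1 - p11],
              [1 - p22, p22]])

# Expected number of periods spent in each regime once entered
expected_duration = 1.0 / (1.0 - np.diag(P))

# Long-run (stationary) probability of each regime
pi_calm = (1 - p22) / (2 - p11 - p22)
pi = np.array([pi_calm, 1 - pi_calm])

# Simulate a regime path: 0 = calm, 1 = turbulent
rng = np.random.default_rng(0)
T = 1000
z = np.empty(T, dtype=int)
z[0] = 0
for t in range(1, T):
    # Next state is drawn using the row of P for the current state
    z[t] = rng.choice(2, p=P[z[t - 1]])

print("Expected durations (periods):", expected_duration)
print("Stationary probabilities:", pi)
print("Simulated share of time in turbulent regime:", z.mean())
```

With these illustrative values the calm regime lasts about 50 periods on average and the turbulent regime about 10, so the chain spends roughly five-sixths of the time in the calm state.
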
::: {.callout-note}
## The Markov Property

A process has the Markov property if the probability of being in a given state depends *only* on the previous state, not on the entire history. This makes estimation tractable while still capturing regime persistence.
:::

### Threshold Autoregressive (TAR) Models

An alternative approach makes regime switches **deterministic** rather than probabilistic. In a **Self-Exciting Threshold AutoRegressive (SETAR)** model, the regime depends on whether a threshold variable exceeds a critical value:

$$y_t = \begin{cases}
\phi^{(1)}_0 + \phi^{(1)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} \leq c \\
\phi^{(2)}_0 + \phi^{(2)}_1 y_{t-1} + \varepsilon_t & \text{if } y_{t-d} > c
\end{cases}$$

where $c$ is the **threshold** and $d$ is the **delay parameter**.

| Model Type | Regime Determination | Estimation | Use Case |
|------------|---------------------|------------|----------|
| **Markov Switching** | Probabilistic (latent state) | MLE with Hamilton filter | When switching is random/unpredictable |
| **SETAR** | Deterministic (observable variable) | NLS with grid search | When switching depends on observed values |

### Regime-Switching GARCH

Combining regime switching with GARCH captures both volatility clustering *within* regimes and shifts *between* regimes:

$$\sigma^2_t = \begin{cases}
\alpha_0^{(1)} + \alpha_1^{(1)} \varepsilon^2_{t-1} + \beta_1^{(1)} \sigma^2_{t-1} & \text{if calm regime} \\
\alpha_0^{(2)} + \alpha_1^{(2)} \varepsilon^2_{t-1} + \beta_1^{(2)} \sigma^2_{t-1} & \text{if turbulent regime}
\end{cases}$$

This is particularly useful for modelling financial crises, where not only the *level* of volatility changes but also its *dynamics*.

## From Classical Models to Sequence Learning

The models discussed so far (GARCH, Markov switching, TAR) represent the classical econometric approach to time-varying volatility.
Modern machine learning offers powerful extensions through **sequence models** that can capture more complex temporal dependencies.

### The Conceptual Bridge

Consider the parallel between classical and modern approaches:

| Classical Concept | ML Extension | Key Advance |
|-------------------|--------------|-------------|
| **AR(p)** process | Recurrent Neural Network (RNN) | Non-linear dependencies, learned representations |
| **Markov switching** | Hidden Markov Model (HMM) | More flexible state dynamics |
| **GARCH** | LSTM/GRU networks | Long-range memory without explicit structure |
| **Regime detection** | Change point detection | Automated, data-driven identification |

### Recurrent Neural Networks as Non-Linear AR Models

A **recurrent neural network** (RNN) extends the autoregressive framework to allow non-linear dependencies:

$$h_t = f(W_h h_{t-1} + W_x x_t + b)$$

$$y_t = g(W_y h_t + c)$$

where $h_t$ is a **hidden state** that captures information from the entire history. @dixon2020machine describe RNNs as "non-linear time series models [that] generalise classical linear time series models such as AR(p)."

::: {.callout-tip}
## Why This Matters for Finance

RNNs can learn complex patterns in volatility that GARCH cannot capture:

- Non-linear responses to shocks
- Interactions between multiple assets
- Regime-dependent dynamics (without pre-specifying regimes)
:::

### Long Short-Term Memory (LSTM)

Standard RNNs struggle with **long-range dependencies** because of the vanishing gradient problem. **LSTM** networks address this with gated mechanisms that control information flow: a forget gate (what to discard from memory), an input gate (what new information to store), and an output gate (what to reveal to the next layer).

This architecture is particularly relevant for financial time series where volatility shocks can have persistent effects, market regimes may last months or years, and multiple timescales interact (daily noise, weekly patterns, monthly cycles).
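
The three gates can be seen at work in a single LSTM step written out in NumPy. This is a sketch of the standard LSTM cell equations, not a trained volatility model: the weights are random, the input series is simulated, and the names `lstm_step`, `W`, `U`, and `b` are illustrative assumptions. It shows only how the forget, input, and output gates combine the previous memory with new information.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases.
    Rows are stacked in the order: forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: what to discard from memory
    i = sigmoid(z[H:2 * H])        # input gate: what new information to store
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the memory cell
    o = sigmoid(z[3 * H:4 * H])    # output gate: what to reveal downstream
    c_t = f * c_prev + i * g       # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t


# Illustration with random, untrained weights (assumption)
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

# Simulated daily returns; squared returns (rescaled) serve as the input sequence
returns = rng.normal(scale=0.01, size=50)
inputs = (returns ** 2).reshape(-1, 1) * 1e4

h, c = np.zeros(H), np.zeros(H)
for x_t in inputs:
    h, c = lstm_step(x_t, h, c, W, U, b)

print("Final hidden state:", h)
```

When the forget gate stays close to one, the cell state carries information across many steps, which is the mechanism that lets an LSTM represent persistent volatility shocks. In practice the weights would be learned with a deep learning library rather than set by hand.
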
### Practical Implications

The connection between classical econometrics and modern ML is not merely theoretical. Four practical points follow. Interpretability versus flexibility: GARCH parameters have direct economic meaning (persistence, reaction to news), whereas neural networks offer flexibility but less transparency. Data requirements: GARCH can be estimated reliably with hundreds of observations, but deep learning typically requires thousands or millions. Out-of-sample performance: simple GARCH often outperforms complex ML models for volatility forecasting, a reminder that more parameters do not guarantee better predictions. Finally, hybrid approaches that combine GARCH-type structure with neural network flexibility (such as Neural GARCH) may offer the best of both worlds.

::: {.callout-important}
## The Fundamental Challenge

Whether using GARCH or LSTM, the core challenge remains: volatility is **unobservable**. We estimate it from returns, but we never see the "true" volatility to evaluate our models against. This is why comparing implied vs realised volatility, and testing forecasts against future squared returns, remain essential.
:::

## Summary

| Concept | Key Point |
|---------|-----------|
| **Volatility clustering** | Large returns (of either sign) tend to follow large returns |
| **Fat tails** | Extreme events more common than a normal distribution implies |
| **Leverage effect** | Bad news increases volatility more than good news |
| **ARCH** | Conditional variance depends on past shocks |
| **GARCH(1,1)** | Adds persistence; often sufficient |
| **Asymmetric GARCH** | GJR-GARCH, EGARCH capture leverage |
| **News impact curve** | Visualises asymmetric volatility response |
| **GARCH-M** | Volatility in mean equation; risk premium |
| **IGARCH** | Unit-root persistence; unconditional variance undefined |
| **DCC** | Dynamic correlations for multivariate volatility |
| **Stochastic volatility** | Volatility as latent process; requires MCMC |
| **Bayesian GARCH** | Full uncertainty quantification; regularisation |
| **State-space models** | Time-varying parameters via Kalman filter |
| **Time-varying beta** | Evolving systematic risk exposure |
| **Structural breaks** | One-off changes in series behaviour |
| **Markov switching** | Probabilistic regime changes (Hamilton) |
| **TAR/SETAR** | Deterministic threshold-based regimes |
| **Sequence learning** | RNN/LSTM as non-linear AR extensions |
| **VIX** | Market's expectation of future volatility |

::: {.callout-note}
## What's Next?

In the lab, you'll estimate GARCH models on the Bloomberg data, compare implied vs realised volatility, explore whether asymmetric models improve forecasts, and investigate evidence of regime changes.
:::

## References