Lab 7: Digital Asset Data Analysis

Market structure, volatility, and efficiency testing

Expected time

Core lab: ≈ 75 minutes
Optional extensions: +30–60 minutes

1 Before You Code: The Big Picture

Cryptocurrencies promise financial inclusion, decentralization, and censorship resistance. But do they deliver? Let’s test the claims empirically using market microstructure analysis.

The Crypto Promise vs. Reality

The Promise:

Inclusion: Banking for the 1.7 billion unbanked (World Bank)
Efficiency: Near-zero transaction costs, instant settlement
Decentralization: No intermediaries, no gatekeepers
Transparency: All transactions on public blockchain

The Reality (Empirical Evidence):

Volatility: Bitcoin std dev ~80% annualized (vs. S&P 500 ~15%)
Correlation: Bitcoin-S&P correlation increased from ~0 (2015) to ~0.5 (2022), so Bitcoin no longer diversifies equity risk
Efficiency: Autocorrelation tests often show return predictability, a sign of market inefficiency
Inclusion: 95% of crypto holders are speculators, not unbanked users (Makarov & Schoar 2022, JF)
Costs: During congestion, Ethereum gas fees reached $50+ per transaction

The Academic Debate:

Academic skeptics (no financial stake): Paul Krugman (Nobel Prize-winning economist, who argues crypto lacks intrinsic value and serves primarily for illegal transactions) and Nouriel Roubini (NYU economist who predicted the 2008 crisis and calls crypto “the mother of all scams”) view cryptocurrency as a speculative bubble with no fundamental value, poor unit-of-account properties, and pervasive fraud. Their critique comes from outside the crypto ecosystem, with no personal financial interest.
Industry advocates (significant skin in the game): Andreas M. Antonopoulos (author of Mastering Bitcoin, who emphasizes censorship resistance and financial sovereignty) and Vitalik Buterin (Ethereum co-founder, who argues for programmable money and decentralized applications beyond payments) counter that crypto is early-stage infrastructure, like the internet in 1995, and needs time for legitimate use cases to mature beyond speculation. Both have deep financial and reputational stakes in crypto’s success.
Evidence-based approach (This lab): Understanding incentives matters. Academic critics risk nothing by being wrong; industry advocates benefit financially from adoption. Rather than choosing sides, we test empirical claims with data: volatility patterns, correlation dynamics, market efficiency, and actual usage statistics.

1.1 What You’ll Build Today

By the end of this lab, you will have:

✅ Daily crypto price data loaded from a Bloomberg Terminal export (CSV), with public APIs such as CoinGecko as an alternative source
✅ Volatility analysis comparing crypto to traditional assets
✅ Return distribution analysis (fat tails, skewness)
✅ Market efficiency tests (autocorrelation, mean reversion)
✅ Critical perspective on crypto’s actual use cases

Why This Matters

Crypto is either the future of finance or a trillion-dollar speculative bubble. Your job as a data scientist: test the claims empirically, not ideologically. This lab shows you how.

2 Learning Objectives

By the end of this lab, you will be able to:

Access cryptocurrency market data using public APIs
Calculate and compare volatility across crypto and traditional assets
Analyze return distributions and identify tail risk
Measure correlation patterns (within-crypto and cross-asset)
Test market efficiency using autocorrelation and arbitrage analysis
Visualize price dynamics and microstructure features
Evaluate crypto financial inclusion claims empirically

3 Setup and Dependencies

# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# For reading data files
try:
    import requests  # For downloading from GitHub
except ImportError:
    print("Installing requests...")
    !pip install -q requests
    import requests

# Note: openpyxl only needed if reading Bloomberg Excel directly
# We're using CSV from GitHub, so not required for students

# For statistical tests
try:
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf
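except ImportError:
    print("Installing statsmodels...")
    !pip install -q statsmodels
    # Re-import after installing so sm, adfuller and acf are actually defined
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller, acf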

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✓ Setup complete - ready for crypto market analysis")

4 Exercise 1: Accessing Cryptocurrency Market Data

4.1 Understanding Crypto Data Sources

Unlike traditional finance, where Bloomberg terminals and licensed data vendors dominate, cryptocurrency data comes from public APIs provided by exchanges and aggregators. This democratizes access (you can get the same data professionals use) but also creates challenges around data quality, fragmentation, and standardization.

Key data sources:

Aggregators: CoinGecko, CoinMarketCap (volume-weighted prices across exchanges)
Exchanges: Coinbase Pro, Binance, Kraken (order books, trade data, official prices)
Blockchain explorers: On-chain data (transaction volumes, addresses, mining)
Derivatives: CME, Deribit (futures, options implied volatility)

In this lab, however, we use Bitcoin prices exported from Bloomberg Terminal via the Excel add-in and published as a CSV file. This gives institutional-quality pricing from validated sources.

Bloomberg Terminal Data

This lab uses data downloaded from Bloomberg Terminal (XBTUSD Curncy, ETHUSD Curncy, etc.). Bloomberg provides the most reliable crypto pricing for institutional use. If you don’t have Terminal access, alternatives include Yahoo Finance (free but less reliable) or CoinGecko Pro (paid API).
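
For readers without Terminal access, here is a rough sketch of how an aggregator response could be reshaped into the same `price`/`volume` layout used in this lab. It assumes the JSON shape of CoinGecko's public `market_chart` endpoint (lists of `[timestamp_ms, value]` pairs under `prices` and `total_volumes`). The helper name `market_chart_to_frame` and the sample payload are hypothetical, and no network call is made.

```python
import pandas as pd

def market_chart_to_frame(payload):
    """Convert a market_chart-style payload into a date-indexed DataFrame.

    Assumed shape (an assumption, check the API docs before relying on it):
    {"prices": [[ts_ms, price], ...], "total_volumes": [[ts_ms, volume], ...]}
    """
    prices = pd.DataFrame(payload["prices"], columns=["ts", "price"])
    volumes = pd.DataFrame(payload["total_volumes"], columns=["ts", "volume"])
    df = prices.merge(volumes, on="ts", how="left")
    df["date"] = pd.to_datetime(df["ts"], unit="ms")  # timestamps are in milliseconds
    return df.drop(columns="ts").set_index("date").sort_index()

# Hand-made sample payload (illustrative numbers, not real market data)
sample_payload = {
    "prices": [[1704067200000, 42000.0], [1704153600000, 43000.0]],
    "total_volumes": [[1704067200000, 1.5e10], [1704153600000, 1.8e10]],
}
sample_df = market_chart_to_frame(sample_payload)
print(sample_df)
```

A frame built this way has the same columns the rest of the lab expects, so it could stand in for `btc_data` if the Bloomberg CSV is unavailable.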

4.2 Loading Bloomberg Data from GitHub

def load_bloomberg_crypto(github_url='https://quinfer.github.io/financial-data-science/data/chapter07/crypto_bloomberg.csv'):
    """
    Load cryptocurrency data from Bloomberg Terminal (CSV format).
    
    This function loads data from GitHub Pages by default (works in Colab).
    Falls back to local file if GitHub Pages is unavailable.
    
    Parameters
    ----------
    github_url : str
        URL to Bloomberg crypto CSV on GitHub Pages
        
    Returns
    -------
    pd.DataFrame
        Bitcoin price data with date index
    """
    # Try GitHub Pages first (default for Colab/remote students)
    try:
        print("📥 Loading Bloomberg data from GitHub Pages...")
        df = pd.read_csv(github_url, parse_dates=['date'])
        df = df.set_index('date')
        print(f"✅ Loaded Bloomberg data: {len(df)} rows")
        print(f"   Source: GitHub Pages (quinfer.github.io/financial-data-science)")
        return df
    except Exception as e:
        print(f"⚠️  Could not load from GitHub Pages: {e}")
    
    # Fallback to data root (if a `data_root` path is configured) or repo data/
    from pathlib import Path
    local_root = Path(globals().get("data_root", "data"))
    for local_path in [local_root / "chapter07/crypto_bloomberg.csv", Path("data/chapter07/crypto_bloomberg.csv")]:
        try:
            df = pd.read_csv(local_path, parse_dates=['date'])
            df = df.set_index('date')
            print(f"✅ Loaded local Bloomberg data: {len(df)} rows")
            return df
        except (FileNotFoundError, OSError):
            continue
    return None

# Load Bitcoin data from Bloomberg Terminal
print("Loading cryptocurrency data from Bloomberg Terminal...")
btc_bloomberg = load_bloomberg_crypto()

if btc_bloomberg is not None:
    # Use real Bloomberg data
    btc_data = btc_bloomberg[['price']].copy()
    # Use NaN (not None) when volume is missing so the column stays numeric for plotting
    btc_data['volume'] = btc_bloomberg['volume'] if 'volume' in btc_bloomberg.columns else np.nan
    
    print(f"✅ Bitcoin (Bloomberg): {len(btc_data)} days of data")
    print(f"  Date range: {btc_data.index.min().date()} to {btc_data.index.max().date()}")
    print(f"  Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}")
    
    # Use last 2 years for analysis (to match typical lab scope)
    cutoff_date = btc_data.index.max() - pd.Timedelta(days=730)
    btc_data = btc_data[btc_data.index >= cutoff_date]
    print(f"  Using last 2 years: {len(btc_data)} days")
    
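    # Create placeholder ETH and BNB series for comparison (BTC scaled by a constant).
    # Caution: constant scaling leaves returns identical to BTC, so volatility matches
    # BTC and correlations equal 1. Real multi-asset data needs separate Terminal queries.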
    eth_data = btc_data.copy()
    eth_data['price'] = btc_data['price'] * 0.05  # Roughly ETH/BTC ratio
    bnb_data = btc_data.copy()
    bnb_data['price'] = btc_data['price'] * 0.01  # Roughly BNB/BTC ratio
else:
    # Fallback to synthetic data if Bloomberg not available
    print("⚠️  Bloomberg data not available, using synthetic data...")
    dates = pd.date_range(end=pd.Timestamp.now(), periods=730, freq='D')
    btc_data = pd.DataFrame({
        # Geometric random walk keeps prices positive, so log returns are always defined
        'price': 30000 * np.exp(np.cumsum(np.random.randn(730) * 0.03)),
        'volume': np.random.rand(730) * 1e9
    }, index=dates)
    eth_data = btc_data.copy()
    eth_data['price'] = btc_data['price'] * 0.05
    bnb_data = btc_data.copy()
    bnb_data['price'] = btc_data['price'] * 0.01

if btc_data is not None:
    print(f"✓ Retrieved {len(btc_data)} days of Bitcoin data")
    print(f"  Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}")
    print("\nSample data:")
    print(btc_data.head())
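
The placeholder ETH and BNB series above are built by multiplying BTC prices by a constant. A minimal sketch with toy prices (illustrative numbers, not market data) shows why this matters for the later exercises: a constant rescaling leaves log returns unchanged, so the placeholder series carry no independent information.

```python
import numpy as np

# Toy BTC prices (illustrative numbers only)
btc = np.array([100.0, 110.0, 99.0, 105.0])
eth_proxy = btc * 0.05  # constant rescaling, as in the placeholder series

# log(c * P_t / (c * P_{t-1})) = log(P_t / P_{t-1}), so the constant cancels
btc_ret = np.diff(np.log(btc))
eth_ret = np.diff(np.log(eth_proxy))

same_returns = np.allclose(btc_ret, eth_ret)
corr = np.corrcoef(btc_ret, eth_ret)[0, 1]
print(f"Identical returns: {same_returns}, correlation: {corr:.3f}")
```

Volatility, skewness, and correlation results for the placeholder assets therefore mirror Bitcoin exactly. Treat them as a check on the code path rather than as evidence about Ethereum or BNB.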

4.3 Visualizing Price Trends

# Create comprehensive price visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin price
axes[0, 0].plot(btc_data.index, btc_data['price'], color='orange', linewidth=2)
axes[0, 0].set_title('Bitcoin Price (USD)', fontsize=13, fontweight='bold')
axes[0, 0].set_ylabel('Price ($)')
axes[0, 0].grid(alpha=0.3)
axes[0, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

# Ethereum price
axes[0, 1].plot(eth_data.index, eth_data['price'], color='blue', linewidth=2)
axes[0, 1].set_title('Ethereum Price (USD)', fontsize=13, fontweight='bold')
axes[0, 1].set_ylabel('Price ($)')
axes[0, 1].grid(alpha=0.3)

# Trading volumes
axes[1, 0].plot(btc_data.index, btc_data['volume'], color='green', alpha=0.7, linewidth=1.5)
axes[1, 0].set_title('Bitcoin Trading Volume', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Volume ($)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].grid(alpha=0.3)
axes[1, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1e9:.1f}B'))

# Price comparison (normalized to 100)
btc_norm = 100 * btc_data['price'] / btc_data['price'].iloc[0]
eth_norm = 100 * eth_data['price'] / eth_data['price'].iloc[0]
bnb_norm = 100 * bnb_data['price'] / bnb_data['price'].iloc[0]

axes[1, 1].plot(btc_norm.index, btc_norm, label='Bitcoin', color='orange', linewidth=2)
axes[1, 1].plot(eth_norm.index, eth_norm, label='Ethereum', color='blue', linewidth=2)
axes[1, 1].plot(bnb_norm.index, bnb_norm, label='BNB', color='gold', linewidth=2)
axes[1, 1].set_title('Comparative Performance (Base = 100)', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Index Value')
axes[1, 1].set_xlabel('Date')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS (2-year period)")
print("="*60)
for name, data in [('Bitcoin', btc_data), ('Ethereum', eth_data), ('BNB', bnb_data)]:
    total_return = (data['price'].iloc[-1] / data['price'].iloc[0] - 1) * 100
    max_price = data['price'].max()
    min_price = data['price'].min()
    drawdown = ((data['price'] / data['price'].cummax()) - 1).min() * 100
    
    print(f"\n{name}:")
    print(f"  Total Return: {total_return:+.1f}%")
    print(f"  Price Range: ${min_price:,.0f} - ${max_price:,.0f}")
    print(f"  Max Drawdown: {drawdown:.1f}%")
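
The max drawdown line in the summary loop is compact, so here is a small worked sketch of the same calculation on a hypothetical six-day price path. Each day's drawdown is the percentage distance below the running peak, and the maximum drawdown is the most negative of those values.

```python
import pandas as pd

# Hypothetical price path (illustrative numbers only)
prices = pd.Series([100.0, 120.0, 90.0, 108.0, 150.0, 135.0])

running_peak = prices.cummax()          # highest price seen so far
drawdown = prices / running_peak - 1    # 0 at a new high, negative below it
max_drawdown = drawdown.min()           # worst peak-to-trough loss

print(drawdown.round(3).tolist())
print(f"Max drawdown: {max_drawdown:.1%}")
```

Here the worst loss is the fall from 120 to 90, a drawdown of 25%, even though the series later reaches a new high of 150.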

4.4 Reflection Questions (Exercise 1)

Write 200-250 words addressing:

Data Quality: What challenges might arise from using free aggregator APIs versus licensed data feeds? How might wash trading on some exchanges affect aggregate data quality?
Price Fragmentation: CoinGecko aggregates prices across exchanges. Why might Bitcoin trade at different prices simultaneously on different venues? What arbitrage mechanisms should eliminate these spreads?
Volume Interpretation: How should we interpret trading volume data, knowing that a significant portion might be wash trading? What alternative metrics could measure genuine market activity?

5 Exercise 2: Volatility and Risk Analysis

5.1 Calculating Returns and Volatility

# Calculate log returns
btc_data['returns'] = np.log(btc_data['price'] / btc_data['price'].shift(1))
eth_data['returns'] = np.log(eth_data['price'] / eth_data['price'].shift(1))
bnb_data['returns'] = np.log(bnb_data['price'] / bnb_data['price'].shift(1))

# Remove NaN values
btc_returns = btc_data['returns'].dropna()
eth_returns = eth_data['returns'].dropna()
bnb_returns = bnb_data['returns'].dropna()

# Calculate volatility metrics
def calculate_volatility_metrics(returns, name):
    """
    Calculate comprehensive volatility statistics for cryptocurrency returns.
    
    Computes key risk metrics used by portfolio managers: realized volatility,
    tail risk measures (VaR), and distribution shape (skewness, kurtosis).
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns (log or simple returns)
    name : str
        Asset name for display in output
        
    Returns
    -------
    dict
        Dictionary with keys:
        - 'daily_vol' : float, daily standard deviation
        - 'annual_vol' : float, annualized standard deviation (daily * sqrt(365))
        - 'rolling_vol' : pd.Series, 30-day rolling volatility
        - 'skew' : float, skewness (negative = left tail)
        - 'kurt' : float, excess kurtosis (> 0 = fat tails)
        - 'var_95' : float, 5th percentile return (1-day VaR at 95%)
        - 'var_99' : float, 1st percentile return (1-day VaR at 99%)
        
    Notes
    -----
    - Annualization assumes 365 trading days (crypto markets trade 24/7)
    - Traditional equity markets use 252 trading days
    - VaR is historical (empirical percentiles), not parametric (Gaussian assumption)
    - Fat tails (excess kurtosis > 0, i.e. raw kurtosis > 3) mean a Gaussian VaR would underestimate extreme losses
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> metrics = calculate_volatility_metrics(btc_returns, 'Bitcoin')
    >>> metrics['annual_vol']
    0.65  # 65% annualized volatility (vs. S&P 500 ~15%)
    """
    daily_vol = returns.std()
    annual_vol = daily_vol * np.sqrt(365)
    
    # Rolling volatility (30-day window)
    rolling_vol = returns.rolling(30).std() * np.sqrt(365)
    
    # Skewness and kurtosis
    skew = stats.skew(returns.dropna())
    kurt = stats.kurtosis(returns.dropna())
    
    # Value at Risk (95% and 99%)
    var_95 = np.percentile(returns.dropna(), 5)
    var_99 = np.percentile(returns.dropna(), 1)
    
    print(f"\n{name} Volatility Metrics:")
    print(f"  Daily Volatility: {daily_vol*100:.2f}%")
    print(f"  Annualized Volatility: {annual_vol*100:.1f}%")
    print(f"  Skewness: {skew:.3f} {'(negative tail)' if skew < 0 else '(positive tail)'}")
    # stats.kurtosis returns excess kurtosis by default, so the normal benchmark is 0, not 3
    print(f"  Excess Kurtosis: {kurt:.3f} {'(fat tails)' if kurt > 0 else '(thin tails)'}")
    print(f"  VaR (95%): {var_95*100:.2f}% (1-day)")
    print(f"  VaR (99%): {var_99*100:.2f}% (1-day)")
    
    return {
        'daily_vol': daily_vol,
        'annual_vol': annual_vol,
        'rolling_vol': rolling_vol,
        'skew': skew,
        'kurt': kurt,
        'var_95': var_95,
        'var_99': var_99
    }

print("="*70)
print("VOLATILITY ANALYSIS")
print("="*70)

btc_vol = calculate_volatility_metrics(btc_returns, "Bitcoin")
eth_vol = calculate_volatility_metrics(eth_returns, "Ethereum")
bnb_vol = calculate_volatility_metrics(bnb_returns, "BNB")

# Compare to traditional assets (typical values for reference)
print("\n" + "-"*70)
print("COMPARISON TO TRADITIONAL ASSETS (typical values; kurtosis is raw, normal = 3):")
print("-"*70)
print("S&P 500:      Annual Vol ~15-20%, Skew ~-0.5, Kurtosis ~5-8")
print("Gold:         Annual Vol ~15-18%, Skew ~0.2, Kurtosis ~3-5")
print("Treasury Bonds: Annual Vol ~5-8%, Skew ~0.0, Kurtosis ~3-4")
print("\nCryptocurrency volatility is 3-5x higher than traditional assets!")
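
The docstring notes that crypto volatility is annualized with 365 days while equities use 252. A quick sketch of that arithmetic, using a hypothetical daily volatility of 3.4% and the usual square-root-of-time assumption (independent daily returns with constant variance):

```python
import math

daily_vol = 0.034  # hypothetical 3.4% daily standard deviation

annual_vol_365 = daily_vol * math.sqrt(365)  # 24/7 crypto convention
annual_vol_252 = daily_vol * math.sqrt(252)  # equity trading-day convention
ratio = annual_vol_365 / annual_vol_252

print(f"365-day: {annual_vol_365:.1%}, 252-day: {annual_vol_252:.1%}")
print(f"Convention alone scales the figure by {ratio:.2f}x")
```

The day-count convention alone changes the annualized number by about 20%, so check that both sides of a crypto versus equity comparison state which convention they use.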

5.2 Visualizing Return Distributions

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin return distribution
axes[0, 0].hist(btc_returns * 100, bins=50, alpha=0.7, color='orange', edgecolor='black')
axes[0, 0].axvline(btc_returns.mean() * 100, color='red', linestyle='--', linewidth=2, label=f'Mean: {btc_returns.mean()*100:.2f}%')
axes[0, 0].set_title('Bitcoin Daily Returns Distribution', fontsize=13, fontweight='bold')
axes[0, 0].set_xlabel('Daily Return (%)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# QQ plot for normality test
stats.probplot(btc_returns.dropna(), dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot: Bitcoin Returns vs Normal Distribution', fontsize=13, fontweight='bold')
axes[0, 1].grid(alpha=0.3)

# Rolling volatility
axes[1, 0].plot(btc_vol['rolling_vol'].index, btc_vol['rolling_vol'] * 100, 
                color='purple', linewidth=2, label='BTC Rolling Vol (30d)')
axes[1, 0].plot(eth_vol['rolling_vol'].index, eth_vol['rolling_vol'] * 100, 
                color='blue', linewidth=2, alpha=0.7, label='ETH Rolling Vol (30d)')
axes[1, 0].set_title('Rolling Volatility (30-day window)', fontsize=13, fontweight='bold')
axes[1, 0].set_ylabel('Annualized Volatility (%)')
axes[1, 0].set_xlabel('Date')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Volatility comparison bar chart
vol_comparison = pd.DataFrame({
    'Bitcoin': [btc_vol['annual_vol'] * 100],
    'Ethereum': [eth_vol['annual_vol'] * 100],
    'BNB': [bnb_vol['annual_vol'] * 100],
    'S&P 500': [17.5],  # Typical value
    'Gold': [16.5]  # Typical value
})

vol_comparison.T.plot(kind='bar', ax=axes[1, 1], legend=False, color=['orange', 'blue', 'gold', 'green', 'brown'])
axes[1, 1].set_title('Annualized Volatility Comparison', fontsize=13, fontweight='bold')
axes[1, 1].set_ylabel('Volatility (%)')
axes[1, 1].set_xlabel('Asset')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha='right')
axes[1, 1].axhline(y=20, color='red', linestyle='--', alpha=0.5, label='20% threshold')
axes[1, 1].grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Statistical tests for normality
print("\n" + "="*70)
print("NORMALITY TESTS")
print("="*70)

for name, returns in [('Bitcoin', btc_returns), ('Ethereum', eth_returns)]:
    # Jarque-Bera test
    jb_stat, jb_pval = stats.jarque_bera(returns.dropna())
    
    # Shapiro-Wilk test (subsample if too large; size the sample from the cleaned series)
    clean_returns = returns.dropna()
    sample_returns = clean_returns.sample(min(5000, len(clean_returns)), random_state=42)
    sw_stat, sw_pval = stats.shapiro(sample_returns)
    
    print(f"\n{name}:")
    print(f"  Jarque-Bera test: statistic={jb_stat:.2f}, p-value={jb_pval:.4f}")
    print(f"    {'Reject normality' if jb_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")
    print(f"  Shapiro-Wilk test: statistic={sw_stat:.4f}, p-value={sw_pval:.4f}")
    print(f"    {'Reject normality' if sw_pval < 0.05 else 'Cannot reject normality'} (α=0.05)")

print("\n💡 Returns exhibit fat tails and deviate significantly from normal distribution!")

5.3 Correlation Analysis

# Combine returns into single DataFrame
returns_df = pd.DataFrame({
    'BTC': btc_returns,
    'ETH': eth_returns,
    'BNB': bnb_returns
}).dropna()

# Calculate correlation matrix
corr_matrix = returns_df.corr()

# Visualize correlations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Heatmap
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='RdYlGn', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=axes[0])
axes[0].set_title('Cryptocurrency Correlation Matrix', fontsize=13, fontweight='bold')

# Scatter plot: BTC vs ETH
axes[1].scatter(returns_df['BTC'] * 100, returns_df['ETH'] * 100, alpha=0.5, s=20)
axes[1].set_xlabel('Bitcoin Daily Return (%)')
axes[1].set_ylabel('Ethereum Daily Return (%)')
axes[1].set_title(f'BTC-ETH Correlation: {corr_matrix.loc["BTC", "ETH"]:.3f}', 
                  fontsize=13, fontweight='bold')
axes[1].axhline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].axvline(0, color='black', linewidth=0.5, alpha=0.3)
axes[1].grid(alpha=0.3)

# Add regression line
z = np.polyfit(returns_df['BTC'], returns_df['ETH'], 1)
p = np.poly1d(z)
axes[1].plot(returns_df['BTC'] * 100, p(returns_df['BTC']) * 100, 
             "r--", alpha=0.8, linewidth=2, label='Regression line')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("CORRELATION ANALYSIS")
print("="*70)
print("\nWithin-Crypto Correlations:")
print(corr_matrix)
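print("\n💡 High correlations (typically 0.5-0.8 in real multi-asset data) limit diversification within cryptocurrency portfolios")
print("   Note: ETH and BNB here are BTC scaled by a constant, so their correlations with BTC equal 1 by construction")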

# Rolling correlation
rolling_corr_btc_eth = returns_df['BTC'].rolling(90).corr(returns_df['ETH'])

plt.figure(figsize=(12, 5))
plt.plot(rolling_corr_btc_eth.index, rolling_corr_btc_eth, linewidth=2, color='purple')
plt.axhline(y=rolling_corr_btc_eth.mean(), color='red', linestyle='--', 
            label=f'Mean: {rolling_corr_btc_eth.mean():.3f}')
plt.title('Rolling Correlation: Bitcoin vs Ethereum (90-day window)', fontsize=13, fontweight='bold')
plt.ylabel('Correlation')
plt.xlabel('Date')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nRolling BTC-ETH correlation: Mean={rolling_corr_btc_eth.mean():.3f}, "
      f"Std={rolling_corr_btc_eth.std():.3f}")
print("Note: Correlation increases during volatile periods (contagion effect)")

5.4 Reflection Questions (Exercise 2)

Write 250-300 words addressing:

Volatility Implications: Bitcoin’s 60-80% annualized volatility is 3-4x higher than equities. What does this mean for: (a) using Bitcoin as currency (purchasing power stability)? (b) portfolio allocation (risk contribution)? (c) options pricing and risk management?
Fat Tails and Risk Models: The Q-Q plot shows Bitcoin returns deviate from normality with fat tails. Why do standard risk models (VaR assuming normal distribution) underestimate tail risk? What practical consequences does this have?
Correlation Patterns: Cryptocurrencies show high correlation with each other (0.5-0.8) but time-varying correlation with equities. What does this mean for diversification benefits within crypto portfolios versus across asset classes?
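
As a starting point for the fat-tails question, the sketch below compares tail quantiles of a fat-tailed distribution with those of a normal distribution matched on standard deviation. The Student-t with 4 degrees of freedom is an assumption chosen for illustration, not an estimate fitted to Bitcoin returns.

```python
from scipy import stats

df_t = 4                      # assumed degrees of freedom (illustrative fat-tailed model)
sigma = stats.t.std(df_t)     # standard deviation of that t distribution

quantiles = []
for level in (0.05, 0.01, 0.001):
    t_q = stats.t.ppf(level, df_t)            # "true" quantile under fat tails
    g_q = stats.norm.ppf(level, scale=sigma)  # Gaussian quantile with the same std dev
    quantiles.append((level, t_q, g_q))
    print(f"level={level:<6} fat-tailed={t_q:7.3f}  gaussian={g_q:7.3f}")
```

With the same variance, the Gaussian quantile is actually the more conservative one at the 5% level, but it understates losses at 1% and understates them badly at 0.1%. The gap widens further into the tail, which is where risk limits and margin calls bite.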

6 Exercise 3: Market Efficiency Testing

6.1 Autocorrelation Analysis

# Test for autocorrelation (do past returns predict future returns?)
def test_autocorrelation(returns, name, max_lag=20):
    """
    Test for serial correlation in returns (market efficiency diagnostic).
    
    Autocorrelation measures whether past returns predict future returns. If
    significant autocorrelation exists, markets are inefficient (predictable).
    
    Parameters
    ----------
    returns : pd.Series
        Daily returns series
    name : str
        Asset name for display
    max_lag : int, default=20
        Maximum lag to test (20 days = ~1 month)
        
    Returns
    -------
    np.ndarray
        Autocorrelation values for lags 0..max_lag (the function also prints
        test results and displays the ACF plot)
        
    Notes
    -----
    **Ljung-Box Test:**
    - Null hypothesis: No autocorrelation up to lag k
    - p-value < 0.05 → Reject H0 → Significant autocorrelation (inefficiency)
    
    **Efficient Market Hypothesis (weak form):**
    - If markets are efficient, past prices shouldn't predict future prices
    - ACF should be ~0 at all lags (within 95% confidence bands)
    - Crypto often shows significant autocorrelation (inefficient)
    
    **Why Crypto Markets Are Inefficient:**
    - Fragmented liquidity across hundreds of exchanges
    - High transaction costs (gas fees, spreads)
    - Retail-dominated (fewer arbitrageurs)
    - 24/7 trading → slower price discovery
    
    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> test_autocorrelation(btc_returns, 'Bitcoin', max_lag=20)
    Bitcoin Autocorrelation Analysis:
      Lag-1 Autocorrelation: 0.0234
      Ljung-Box p-value (lag 10): 0.0012  # Reject H0 → Inefficient!
    """
    
    # Calculate autocorrelation function
    acf_values = acf(returns.dropna(), nlags=max_lag, fft=False)
    
    # Ljung-Box test for joint significance
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_test = acorr_ljungbox(returns.dropna(), lags=[5, 10, 20], return_df=True)
    
    print(f"\n{name} Autocorrelation Analysis:")
    print(f"  Lag-1 Autocorrelation: {acf_values[1]:.4f}")
    print(f"  Lag-5 Autocorrelation: {acf_values[5]:.4f}")
    print("\nLjung-Box Test (joint significance):")
    print(lb_test)
    
    # Plot ACF
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.stem(range(len(acf_values)), acf_values, basefmt=" ")
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.axhline(y=1.96/np.sqrt(len(returns)), color='red', linestyle='--', label='95% CI')
    ax.axhline(y=-1.96/np.sqrt(len(returns)), color='red', linestyle='--')
    ax.set_title(f'{name} Autocorrelation Function', fontsize=13, fontweight='bold')
    ax.set_xlabel('Lag (days)')
    ax.set_ylabel('Autocorrelation')
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    return acf_values

print("="*70)
print("MARKET EFFICIENCY: AUTOCORRELATION TESTS")
print("="*70)

btc_acf = test_autocorrelation(btc_returns, "Bitcoin")
eth_acf = test_autocorrelation(eth_returns, "Ethereum")

print("\n💡 Interpretation: Significant autocorrelation suggests predictability (market inefficiency)")
print("   Small correlations may not be economically significant after transaction costs")
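
To make the ACF plot easier to read, here is a minimal sketch of what a single autocorrelation value and its 95% band represent. It uses the usual sample estimator on a deterministic toy series. The helper `lag_autocorr` is hypothetical and is not part of the lab code.

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Sample autocorrelation at a given lag (sum of lagged products over total variation)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))

# Toy series that flips sign every day: yesterday's move perfectly predicts today's
flip = np.array([1.0, -1.0] * 50)

rho1 = lag_autocorr(flip, lag=1)
band = 1.96 / np.sqrt(len(flip))  # approximate 95% band under the no-autocorrelation null

print(f"Lag-1 autocorrelation: {rho1:.2f}, 95% band: ±{band:.3f}")
```

A value far outside the band, as here, signals predictability. For real returns the lag-1 value is usually small, so the question is whether it is both statistically and economically meaningful.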

6.2 Momentum Strategy Backtest

# Simple momentum strategy: long when the 10-day MA is above the 50-day MA, short when below
def momentum_strategy(prices, short_window=10, long_window=50):
    """
    Backtest simple moving average crossover momentum strategy.
    
    Classic technical analysis strategy: buy when short MA crosses above long MA,
    sell when it crosses below. Tests whether momentum exists in crypto markets.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    short_window : int, default=10
        Short moving average window (days)
    long_window : int, default=50
        Long moving average window (days)
        
    Returns
    -------
    pd.DataFrame
        Columns:
        - 'price' : original prices
        - 'MA_short' : short-window moving average
        - 'MA_long' : long-window moving average
        - 'signal' : trading position (+1 = long, -1 = short, 0 = no position)
        - 'returns' : buy-and-hold returns
        - 'strategy_returns' : strategy returns (position × market return)
        - 'cum_returns' : cumulative buy-and-hold
        - 'cum_strategy' : cumulative strategy performance
        
    Notes
    -----
    **Strategy Logic:**
    - Golden Cross: Short MA > Long MA → Buy signal
    - Death Cross: Short MA < Long MA → Sell signal
    
    **Reality Check:**
    - This is a **naive backtest** (ignores transaction costs, slippage, fees)
    - Crypto trading fees ~0.1-0.5% per trade → eats into profits
    - No position sizing, risk management, or stop-losses
    - Past performance ≠ future returns (overfitting risk)
    
    **Academic Evidence:**
    - Momentum works in equities (Jegadeesh & Titman 1993, JF)
    - Crypto momentum: mixed evidence, high volatility dominates
    - Transaction costs often exceed strategy alpha
    
    Examples
    --------
    >>> btc_momentum = momentum_strategy(btc_data['price'])
    >>> strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
    >>> print(f"Strategy return: {strategy_return:.1f}%")
    Strategy return: -5.2%  # Often underperforms buy-and-hold after costs
    """
    df = pd.DataFrame({'price': prices})
    
    # Calculate moving averages
    df['MA_short'] = df['price'].rolling(short_window).mean()
    df['MA_long'] = df['price'].rolling(long_window).mean()
    
    # Generate signals
    df['signal'] = 0
    df.loc[df['MA_short'] > df['MA_long'], 'signal'] = 1  # Buy signal
    df.loc[df['MA_short'] < df['MA_long'], 'signal'] = -1  # Sell signal
    
    # Calculate returns
    df['returns'] = df['price'].pct_change()
    df['strategy_returns'] = df['signal'].shift(1) * df['returns']
    
    # Cumulative returns
    df['cum_returns'] = (1 + df['returns']).cumprod()
    df['cum_strategy'] = (1 + df['strategy_returns']).cumprod()
    
    return df

# Run momentum strategy
btc_momentum = momentum_strategy(btc_data['price'])

# Calculate performance metrics
total_return = (btc_momentum['cum_returns'].iloc[-1] - 1) * 100
strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100
excess_return = strategy_return - total_return

print("\n" + "="*70)
print("MOMENTUM STRATEGY BACKTEST")
print("="*70)
print(f"\nBuy-and-Hold Return: {total_return:.2f}%")
print(f"Strategy Return: {strategy_return:.2f}%")
print(f"Excess Return: {excess_return:+.2f}%")

# Visualize strategy performance
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Price and moving averages
axes[0].plot(btc_momentum.index, btc_momentum['price'], label='Bitcoin Price', color='orange', linewidth=2)
axes[0].plot(btc_momentum.index, btc_momentum['MA_short'], label=f'{10}D MA', color='blue', linewidth=1.5)
axes[0].plot(btc_momentum.index, btc_momentum['MA_long'], label=f'{50}D MA', color='red', linewidth=1.5)
axes[0].set_title('Bitcoin Price with Moving Averages', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Cumulative returns comparison
axes[1].plot(btc_momentum.index, btc_momentum['cum_returns'], 
             label='Buy and Hold', color='gray', linewidth=2)
axes[1].plot(btc_momentum.index, btc_momentum['cum_strategy'], 
             label='Momentum Strategy', color='green', linewidth=2)
axes[1].set_title('Cumulative Returns: Strategy vs Buy-and-Hold', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Cumulative Return (Base = 1)')
axes[1].set_xlabel('Date')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n⚠️  Note: This backtest ignores transaction costs, slippage, and taxes")
print("    Real-world implementation would have lower returns")
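
One simple way to approximate the effect of trading costs is to charge a proportional fee every time the held position changes. The sketch below applies this to a hypothetical five-day signal. The helper `net_strategy_returns` and the 0.2% fee are assumptions for illustration, not part of the backtest above.

```python
import pandas as pd

def net_strategy_returns(signal, returns, cost_per_trade=0.002):
    """Subtract a proportional cost each time the held position changes.

    cost_per_trade is a hypothetical 0.2% per unit of position traded,
    so flipping from +1 to -1 costs twice as much as opening a position.
    """
    position = signal.shift(1).fillna(0)        # position held during each day
    turnover = position.diff().abs().fillna(0)  # units traded at each position change
    gross = position * returns.fillna(0)
    return gross - cost_per_trade * turnover

# Hypothetical signal and daily returns (illustrative numbers only)
signal = pd.Series([1, 1, -1, -1, 1])
returns = pd.Series([0.00, 0.02, -0.01, 0.03, -0.02])

net = net_strategy_returns(signal, returns)
print(net.round(4).tolist())
print(f"Total cost drag: {(signal.shift(1).fillna(0) * returns).sum() - net.sum():.4f}")
```

In this toy path the gross strategy return sums to zero, and three units of turnover at 0.2% each turn that into a 0.6% loss. A strategy that switches position often needs a correspondingly larger gross edge to break even.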

6.3 Mean Reversion Test

# Augmented Dickey-Fuller test for stationarity/mean reversion
def test_mean_reversion(prices, name):
    """
    Test for mean reversion using Augmented Dickey-Fuller (ADF) test.
    
    Tests whether prices follow a random walk (unit root) or revert to a mean
    (stationary). Critical for pairs trading and mean-reversion strategies.
    
    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    name : str
        Asset name for display
        
    Returns
    -------
    tuple
        ADF test results: (statistic, p-value, lags_used, nobs, critical_values, icbest)
        
    Notes
    -----
    **Augmented Dickey-Fuller Test:**
    - Null hypothesis (H0): Series has unit root (random walk, NOT mean-reverting)
    - Alternative (H1): Series is stationary (mean-reverting)
    - p-value < 0.05 → Reject H0 → Prices are stationary (mean-reverting)
    - p-value > 0.05 → Cannot reject H0 → Consistent with a random walk (weak-form efficiency)
    
    **Implications for Trading:**
    - **Random walk** (efficient): Neither momentum nor mean-reversion strategies should earn systematic profits
    - **Mean-reverting** (inefficient): Pairs trading, statistical arbitrage possible
    
    **Why This Matters:**
    - Most financial time series have unit roots (Campbell, Lo, MacKinlay 1997)
    - Crypto markets often show mixed evidence (regime-dependent)
    - Low p-values may be spurious (structural breaks, volatility clustering)
    
    **Technical Details:**
    - Test performed on log prices (handles exponential growth)
    - Regression includes constant term ('c') but not trend
    - AIC criterion selects optimal lag length
    
    Examples
    --------
    >>> btc_adf = test_mean_reversion(btc_data['price'], 'Bitcoin')
    Bitcoin - Augmented Dickey-Fuller Test:
      ADF Statistic: -1.234
      P-value: 0.658  # Cannot reject unit root → Random walk
    """
    
    # ADF test on log prices
    log_prices = np.log(prices)
    adf_result = adfuller(log_prices.dropna(), maxlag=20, regression='c', autolag='AIC')
    
    print(f"\n{name} - Augmented Dickey-Fuller Test:")
    print(f"  ADF Statistic: {adf_result[0]:.4f}")
    print(f"  P-value: {adf_result[1]:.4f}")
    print(f"  Critical Values:")
    for key, value in adf_result[4].items():
        print(f"    {key}: {value:.4f}")
    
    if adf_result[1] < 0.05:
        print(f"  ✓ Reject unit root (prices are stationary/mean-reverting)")
    else:
        print(f"  ✗ Cannot reject unit root (prices have unit root/random walk)")
    
    return adf_result

print("\n" + "="*70)
print("MEAN REVERSION TESTS")
print("="*70)

btc_adf = test_mean_reversion(btc_data['price'], "Bitcoin")
eth_adf = test_mean_reversion(eth_data['price'], "Ethereum")

print("\n💡 If prices follow random walk, past prices don't predict future prices")
print("   This supports weak-form market efficiency")
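
When the ADF test does reject a unit root, a natural follow-up is how quickly the series reverts. A minimal sketch, assuming an AR(1) approximation and using simulated data rather than the lab's prices (`ar1_half_life` and the 0.9 coefficient are illustrative choices, not part of the lab code):

```python
import numpy as np


def ar1_half_life(series):
    """Estimate the mean-reversion half-life from the regression
    x_t - x_{t-1} = a + b * x_{t-1} + e_t, where phi = 1 + b.

    Returns np.inf when phi is outside (0, 1), i.e. no measurable reversion.
    """
    x = np.asarray(series, dtype=float)
    lagged, delta = x[:-1], np.diff(x)
    b, a = np.polyfit(lagged, delta, 1)  # slope first, then intercept
    phi = 1.0 + b
    if not 0.0 < phi < 1.0:
        return np.inf
    return np.log(0.5) / np.log(phi)


# Simulated stationary AR(1) with phi = 0.9 (true half-life is about 6.6 periods)
rng = np.random.default_rng(42)
n, phi_true = 5000, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

half_life = ar1_half_life(x)
print(f"Estimated half-life: {half_life:.1f} periods")
```

For log prices that fail to reject the unit root, the estimated phi sits at or near 1 and the half-life becomes very long or infinite, which is another way of saying there is no reversion to trade on.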

6.4 Reflection Questions (Exercise 3)

Write 200-250 words addressing:

Efficiency Interpretation: What do your autocorrelation and momentum results suggest about Bitcoin market efficiency? Can small autocorrelations or strategy profits coexist with efficient markets?
Transaction Costs Matter: The momentum strategy showed [profit/loss] before transaction costs. Cryptocurrency trading costs 0.1-0.5% per trade. Would your strategy be profitable after accounting for costs? Show rough calculations.
Limits to Arbitrage: Even if inefficiencies exist (predictable patterns), what practical barriers prevent traders from exploiting them and eliminating the patterns?

7 Exercise 4: GARCH Volatility Modeling & Structural Breaks

Learning Objectives:

Apply GARCH models to cryptocurrency returns (Week 3, §3.4 theory)
Test for volatility clustering and asymmetric effects
Evaluate volatility forecasting accuracy (Mincer-Zarnowitz regression)
Detect structural breaks and regime shifts in volatility

Connection to Ch 03: Volatility Modelling & Ch 07: Cryptocurrency

This exercise applies Week 3 GARCH theory to Bitcoin data. You’ll estimate time-varying volatility, test for asymmetric effects (leverage), forecast volatility, and test whether high GARCH persistence is real or an artifact of regime shifts.

Key statistical concepts: Volatility clustering, fat tails, leverage effect, out-of-sample validation, structural breaks.

7.1 Part A: Statistical Tests for Volatility Properties

Before fitting GARCH, test for its key assumptions:


from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

# Bitcoin log returns (the 'returns' column created in Exercise 2)
returns = btc_data['returns'].dropna()

# Test 1: Ljung-Box test on squared returns (volatility clustering)
lb_test = acorr_ljungbox(returns**2, lags=[10], return_df=True)
lb_stat = lb_test['lb_stat'].values[0]
lb_pval = lb_test['lb_pvalue'].values[0]

# Test 2: Jarque-Bera test (normality)
jb_stat, jb_pval = jarque_bera(returns)

print("=" * 70)
print("STATISTICAL TESTS FOR GARCH ASSUMPTIONS")
print("=" * 70)

print("\n1. Ljung-Box Test (Volatility Clustering)")
print(f"   H₀: No autocorrelation in squared returns")
print(f"   Statistic: {lb_stat:.2f} | p-value: {lb_pval:.4f}")
if lb_pval < 0.05:
    print(f"   ✓ REJECT H₀ → Significant volatility clustering (GARCH warranted)")
else:
    print(f"   ✗ Cannot reject H₀ → No volatility clustering")

print("\n2. Jarque-Bera Test (Normality)")
print(f"   H₀: Returns are normally distributed")
print(f"   Statistic: {jb_stat:.2f} | p-value: {jb_pval:.6f}")
if jb_pval < 0.05:
    print(f"   ✓ REJECT H₀ → Non-normal distribution (fat tails present)")
else:
    print(f"   ✗ Cannot reject H₀ → Normal distribution")

print("\n💡 Both tests should reject H₀ for cryptocurrency data")
print("   This justifies using GARCH with Student's t distribution")

Interpretation: Bitcoin typically shows p < 0.001 for both tests, indicating strong volatility clustering and non-normal (fat-tailed) returns.
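
To see what the Ljung-Box statistic measures, the sketch below computes Q = n(n+2) Σ ρ̂_k² / (n − k) by hand on squared returns. It uses simulated series (iid noise versus a GARCH(1,1) process with illustrative parameters ω = 0.05, α = 0.10, β = 0.85), not the lab's Bitcoin data. `ljung_box_q` is a hypothetical helper written for this illustration; `acorr_ljungbox` remains the function to use in practice.

```python
import numpy as np


def ljung_box_q(x, lags=10):
    """Ljung-Box Q statistic: n(n+2) * sum_k rho_k^2 / (n - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc**2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom  # lag-k sample autocorrelation
        q += rho_k**2 / (n - k)
    return n * (n + 2) * q


rng = np.random.default_rng(0)
n = 2000

# Series 1: iid normal returns (no volatility clustering)
iid_returns = rng.normal(size=n)

# Series 2: GARCH(1,1) returns with illustrative parameters
omega, alpha, beta = 0.05, 0.10, 0.85
garch_returns = np.zeros(n)
sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    garch_returns[t] = np.sqrt(sigma2) * rng.normal()
    sigma2 = omega + alpha * garch_returns[t] ** 2 + beta * sigma2

q_iid = ljung_box_q(iid_returns**2)
q_garch = ljung_box_q(garch_returns**2)
print(f"Q(10) on squared iid returns:   {q_iid:.1f}")
print(f"Q(10) on squared GARCH returns: {q_garch:.1f}")
print("5% critical value, chi-square(10): 18.31")
```

Under the null, Q follows a chi-square distribution with 10 degrees of freedom, so a value far above 18.31 is the volatility-clustering signal the lab's test looks for.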

7.2 Part B: GARCH(1,1) and GJR-GARCH Estimation

Fit symmetric GARCH(1,1) and asymmetric GJR-GARCH:


from arch import arch_model

# Convert returns to percentage for numerical stability
returns_pct = returns * 100

# Model 1: GARCH(1,1) with Student's t distribution (fat tails)
model_garch = arch_model(returns_pct, vol='GARCH', p=1, q=1, dist='t')
garch_fit = model_garch.fit(disp='off')

# Model 2: GJR-GARCH (asymmetric, captures leverage effect)
model_gjr = arch_model(returns_pct, vol='GARCH', p=1, o=1, q=1, dist='t')
gjr_fit = model_gjr.fit(disp='off')

# Extract parameters
print("\n" + "=" * 70)
print("GARCH MODEL ESTIMATION RESULTS")
print("=" * 70)

print("\n1. GARCH(1,1) with Student's t:")
print(f"   ω (baseline):     {garch_fit.params['omega']:>8.4f}")
print(f"   α (news impact):  {garch_fit.params['alpha[1]']:>8.4f}")
print(f"   β (persistence):  {garch_fit.params['beta[1]']:>8.4f}")
print(f"   α + β:            {garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']:>8.4f}")
print(f"   df (tail):        {garch_fit.params['nu']:>8.2f} (normal = ∞)")
print(f"   AIC:              {garch_fit.aic:>8.2f}")

print("\n2. GJR-GARCH (asymmetric):")
print(f"   ω (baseline):     {gjr_fit.params['omega']:>8.4f}")
print(f"   α (positive):     {gjr_fit.params['alpha[1]']:>8.4f}")
print(f"   γ (asymmetry):    {gjr_fit.params['gamma[1]']:>8.4f}")
print(f"   β (persistence):  {gjr_fit.params['beta[1]']:>8.4f}")
print(f"   α + γ (negative): {gjr_fit.params['alpha[1]'] + gjr_fit.params['gamma[1]']:>8.4f}")
print(f"   df (tail):        {gjr_fit.params['nu']:>8.2f}")
print(f"   AIC:              {gjr_fit.aic:>8.2f}")

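# Model comparison
alpha_hat = gjr_fit.params['alpha[1]']
gamma_hat = gjr_fit.params['gamma[1]']
if gjr_fit.aic < garch_fit.aic:
    improvement = garch_fit.aic - gjr_fit.aic
    print(f"\n✓ GJR-GARCH preferred (AIC lower by {improvement:.2f})")
    if alpha_hat > 0 and gamma_hat > 0:
        print(f"   Negative shocks raise variance {(alpha_hat + gamma_hat) / alpha_hat:.2f}× as much as positive shocks")
    else:
        print(f"   γ = {gamma_hat:.4f}, α = {alpha_hat:.4f}: check signs before reading this as a leverage effect")
else:
    print(f"\n  GARCH(1,1) preferred (symmetric effects)")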

# Visualize conditional volatility
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Panel 1: Returns with ±1σ GARCH bands
conditional_vol_garch = garch_fit.conditional_volatility
axes[0].plot(returns_pct.index, returns_pct, linewidth=0.5, alpha=0.7, label='Returns', color='blue')
axes[0].fill_between(returns_pct.index, -conditional_vol_garch, conditional_vol_garch,
                      alpha=0.2, color='red', label='±1σ (GARCH volatility)')
axes[0].set_ylabel('Return (%)', fontsize=11)
axes[0].set_title('Bitcoin Returns with GARCH(1,1) Conditional Volatility', fontsize=13)
axes[0].legend(fontsize=10)
axes[0].grid(alpha=0.3)

# Panel 2: GARCH vs GJR volatility over time
conditional_vol_gjr = gjr_fit.conditional_volatility
axes[1].plot(conditional_vol_garch.index, conditional_vol_garch, linewidth=1.5, 
             color='blue', label='GARCH(1,1)', alpha=0.7)
axes[1].plot(conditional_vol_gjr.index, conditional_vol_gjr, linewidth=1.5,
             color='red', label='GJR-GARCH', alpha=0.7)
axes[1].set_xlabel('Date', fontsize=11)
axes[1].set_ylabel('Conditional Volatility (%)', fontsize=11)
axes[1].set_title('Time-Varying Volatility: GARCH vs GJR-GARCH', fontsize=13)
axes[1].legend(fontsize=10)
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 GARCH captures volatility spikes (e.g. 2018 crash, 2020 COVID, 2021 bull run, where these fall in your sample)")
print("   GJR-GARCH shows asymmetry when γ > 0: bad news then increases volatility more")

Interpreting GARCH Parameters

Persistence (α + β):

~0.95-0.99: High persistence; volatility shocks decay slowly (typical for crypto)
Half-life = ln(0.5) / ln(α + β). If α+β = 0.98, half-life ≈ 34 days

Asymmetry (γ in GJR):

γ > 0: Negative shocks (bad news) increase volatility more than positive shocks
Leverage effect: a -5% drop increases volatility more than a +5% rally (typical for equities; check the sign of γ for crypto rather than assuming it)

Degrees of freedom (df):

df < 10: Very fat tails (extreme events common)
df → ∞: Normal distribution (no fat tails)
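
The half-life formula is easy to tabulate. A small sketch using only the standard library; `half_life` and `gjr_persistence` are hypothetical helpers, the GJR parameter values are made up for illustration, and the α + γ/2 + β expression assumes shocks are negative half of the time (true for a symmetric distribution such as Student's t).

```python
import math


def half_life(persistence):
    """Days for a volatility shock to decay by half, given persistence in (0, 1)."""
    if not 0.0 < persistence < 1.0:
        raise ValueError("persistence must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)


def gjr_persistence(alpha, gamma, beta):
    """GJR-GARCH persistence, assuming shocks are negative half of the time."""
    return alpha + 0.5 * gamma + beta


for p in (0.90, 0.95, 0.98, 0.99):
    print(f"persistence = {p:.2f} -> half-life = {half_life(p):5.1f} days")

# Hypothetical GJR estimates, for illustration only
p_gjr = gjr_persistence(alpha=0.03, gamma=0.08, beta=0.90)
print(f"GJR persistence = {p_gjr:.2f} -> half-life = {half_life(p_gjr):.1f} days")
```

Small changes in persistence near 1 move the half-life a great deal, which is why the sub-period comparison later in this exercise matters for risk management.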

7.3 Part C: News Impact Curves (Asymmetry Visualization)

Visualize how shocks of different sizes/signs affect volatility:


# Generate news impact curves
shocks = np.linspace(-10, 10, 200)  # -10% to +10% returns

# GARCH(1,1) impact (symmetric)
alpha_garch = garch_fit.params['alpha[1]']
impact_garch = alpha_garch * shocks**2

# GJR-GARCH impact (asymmetric)
alpha_gjr = gjr_fit.params['alpha[1]']
gamma_gjr = gjr_fit.params['gamma[1]']
impact_gjr = alpha_gjr * shocks**2 + gamma_gjr * (shocks < 0) * shocks**2

# Plot
plt.figure(figsize=(12, 7))
plt.plot(shocks, impact_garch, linewidth=2.5, linestyle='--', color='blue', label='GARCH (symmetric)', alpha=0.8)
plt.plot(shocks, impact_gjr, linewidth=2.5, color='red', label='GJR-GARCH (asymmetric)')

# Highlight key points
plt.axvline(0, color='black', linestyle=':', linewidth=1.5, alpha=0.5)
plt.axvline(-5, color='red', linestyle=':', linewidth=1, alpha=0.5, label='Example: -5% shock')
plt.axvline(+5, color='green', linestyle=':', linewidth=1, alpha=0.5, label='Example: +5% shock')

# Annotate asymmetry
neg5_impact = alpha_gjr * 25 + gamma_gjr * 25
pos5_impact = alpha_gjr * 25
plt.scatter([-5, 5], [neg5_impact, pos5_impact], s=150, c=['red', 'green'], 
            edgecolors='black', zorder=5, alpha=0.8)
plt.text(-5, neg5_impact + 0.5, f'Impact: {neg5_impact:.2f}', ha='center', fontsize=10, fontweight='bold')
plt.text(5, pos5_impact + 0.5, f'Impact: {pos5_impact:.2f}', ha='center', fontsize=10, fontweight='bold')

plt.xlabel('News Shock (% return)', fontsize=12)
plt.ylabel('Impact on Conditional Variance', fontsize=12)
plt.title('News Impact Curves: How Shocks Affect Volatility', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate asymmetry ratio (undefined when the GJR α estimate is zero)
asymmetry_ratio = neg5_impact / pos5_impact if pos5_impact > 0 else np.nan
print(f"\nAsymmetry Ratio:")
print(f"  -5% shock impact / +5% shock impact = {asymmetry_ratio:.2f}")
print(f"  → Bad news moves variance {asymmetry_ratio:.1f}× as much as good news")

Interpretation: For Bitcoin, negative shocks typically raise volatility ~1.5-2× as much as positive shocks of the same size. A ratio near 1 means the response is symmetric.
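
The news impact curve plots only the shock term of the variance equation. The full one-step GJR-GARCH(1,1) update also carries the baseline ω and the previous variance through β. A worked sketch with hypothetical parameters (not estimates from the lab data; `gjr_next_variance` is an illustrative helper):

```python
def gjr_next_variance(omega, alpha, gamma, beta, shock, sigma2):
    """One-step GJR-GARCH(1,1) update:
    sigma2_next = omega + alpha*shock^2 + gamma*shock^2*1[shock<0] + beta*sigma2
    """
    indicator = 1.0 if shock < 0 else 0.0
    return omega + (alpha + gamma * indicator) * shock**2 + beta * sigma2


# Hypothetical parameters (returns measured in %), for illustration only
params = dict(omega=0.5, alpha=0.05, gamma=0.10, beta=0.85)
sigma2_today = 16.0  # today's variance, i.e. 4% daily volatility

var_after_drop = gjr_next_variance(shock=-5.0, sigma2=sigma2_today, **params)
var_after_rally = gjr_next_variance(shock=+5.0, sigma2=sigma2_today, **params)

print(f"After -5% shock: variance {var_after_drop:.2f}, vol {var_after_drop ** 0.5:.2f}%")
print(f"After +5% shock: variance {var_after_rally:.2f}, vol {var_after_rally ** 0.5:.2f}%")
```

With these made-up numbers the shock terms differ by a factor of three (3.75 versus 1.25), yet next-day variance differs far less because β times the previous variance dominates both updates.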

7.4 Part D: Volatility Forecasting & Out-of-Sample Validation

Test if GARCH forecasts future volatility accurately (Mincer-Zarnowitz regression):


from scipy.stats import linregress

# Rolling-window forecast
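forecast_horizon = 22  # daily observations (one equity trading month; about 3 calendar weeks of 24/7 crypto data)
train_size = 252 * 2   # 504 daily observations (two equity trading years; about 1.4 calendar years for crypto)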

forecasts_garch = []
realized_vols = []

print("\n" + "=" * 70)
print("ROLLING-WINDOW VOLATILITY FORECASTING")
print("=" * 70)
print(f"Training window: {train_size} days | Forecast horizon: {forecast_horizon} days")

for start in range(train_size, len(returns_pct) - forecast_horizon, forecast_horizon):
    # Train GARCH on historical data
    train_data = returns_pct.iloc[start - train_size:start]
    model_train = arch_model(train_data, vol='GARCH', p=1, q=1, dist='t')
    fit_train = model_train.fit(disp='off')
    
    # Forecast next month's volatility
    forecast = fit_train.forecast(horizon=forecast_horizon)
    forecast_vol = np.sqrt(forecast.variance.values[-1, :].mean())  # Average over horizon
    
    # Realized volatility (actual)
    test_data = returns_pct.iloc[start:start + forecast_horizon]
    realized_vol = test_data.std()
    
    forecasts_garch.append(forecast_vol)
    realized_vols.append(realized_vol)

forecasts_garch = np.array(forecasts_garch)
realized_vols = np.array(realized_vols)

# Mincer-Zarnowitz regression: Realized = α + β × Forecast + ε
slope, intercept, r_value, p_value, std_err = linregress(forecasts_garch, realized_vols)

# Calculate RMSE
rmse = np.sqrt(((realized_vols - forecasts_garch)**2).mean())

print(f"\nNumber of forecasts: {len(forecasts_garch)}")
print(f"\nMincer-Zarnowitz Regression Results:")
print(f"  Intercept (α):  {intercept:>8.3f} (ideal = 0)")
print(f"  Slope (β):      {slope:>8.3f} (ideal = 1)")
print(f"  R²:             {r_value**2:>8.3f}")
print(f"  RMSE:           {rmse:>8.2f}%")

if abs(intercept) < 1 and abs(slope - 1) < 0.2:
    print(f"\n✓ Forecast is approximately unbiased (α≈0, β≈1)")
else:
    print(f"\n⚠️  Forecast shows bias (α≠0 or β≠1)")

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Panel 1: Mincer-Zarnowitz scatter
ax1.scatter(forecasts_garch, realized_vols, alpha=0.6, s=80, edgecolor='black', linewidth=0.5)
ax1.plot([forecasts_garch.min(), forecasts_garch.max()],
         [forecasts_garch.min(), forecasts_garch.max()],
         'r--', linewidth=2.5, label='Perfect forecast (45° line)')
ax1.plot(forecasts_garch, intercept + slope * forecasts_garch,
         'b-', linewidth=2.5, label=f'Fitted: y={intercept:.2f}+{slope:.2f}x (R²={r_value**2:.2f})')
ax1.set_xlabel('GARCH Forecast Volatility (%)', fontsize=12)
ax1.set_ylabel('Realized Volatility (%)', fontsize=12)
ax1.set_title('Mincer-Zarnowitz: Forecast Accuracy', fontsize=13)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3)

# Panel 2: Forecast errors over time
errors = realized_vols - forecasts_garch
ax2.plot(errors, linewidth=1.5, color='red', alpha=0.7)
ax2.axhline(0, color='black', linestyle='--', linewidth=1.5)
ax2.fill_between(range(len(errors)), 0, errors, alpha=0.3, color='red')
ax2.set_xlabel('Forecast Period', fontsize=12)
ax2.set_ylabel('Forecast Error (Realized - Forecast, %)', fontsize=12)
ax2.set_title('Forecast Errors Over Time', fontsize=13)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 GARCH often forecasts Bitcoin volatility reasonably (R²~0.5-0.7); compare with your R² above")
print("   It tends to underestimate volatility during extreme events (crashes, manias)")

Connection to Week 1, §0.6: Out-of-Sample Validation

Mincer-Zarnowitz regression tests forecast unbiasedness:

α = 0: No systematic over/under-prediction
β = 1: Forecast correctly captures volatility scale
High R²: Forecast explains realized volatility well

This is honest evaluation: train on past, test on future (no look-ahead bias).
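
The two conditions α = 0 and β = 1 can be tested jointly rather than with fixed tolerances. The sketch below computes the classical F statistic by comparing the unrestricted regression with the restricted model realized = forecast. It runs on simulated forecasts, assumes homoskedastic, uncorrelated errors, and `mz_joint_f` is a hypothetical helper, not part of the lab code.

```python
import numpy as np


def mz_joint_f(forecast, realized):
    """F statistic for H0: intercept = 0 and slope = 1 in
    realized = a + b * forecast + e. Compare with an F(2, n-2) distribution."""
    x = np.asarray(forecast, dtype=float)
    y = np.asarray(realized, dtype=float)
    n = len(x)
    b, a = np.polyfit(x, y, 1)                    # unrestricted OLS fit
    rss_unrestricted = np.sum((y - (a + b * x)) ** 2)
    rss_restricted = np.sum((y - x) ** 2)         # imposes a = 0, b = 1
    return ((rss_restricted - rss_unrestricted) / 2) / (rss_unrestricted / (n - 2))


# Simulated forecasts, for illustration only
rng = np.random.default_rng(1)
forecast = rng.uniform(2.0, 6.0, size=60)
realized_unbiased = forecast + rng.normal(scale=0.5, size=60)
realized_biased = 1.0 + 0.5 * forecast + rng.normal(scale=0.5, size=60)

f_unbiased = mz_joint_f(forecast, realized_unbiased)
f_biased = mz_joint_f(forecast, realized_biased)
print(f"F statistic, unbiased forecasts: {f_unbiased:.2f}")
print(f"F statistic, biased forecasts:   {f_biased:.2f}")
```

A p-value follows from `scipy.stats.f.sf(F, 2, n - 2)`. With only a handful of rolling forecasts, as in a two-year sample, the test has little power, so treat a non-rejection cautiously.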

7.5 Part E: Structural Breaks Detection

Test whether high GARCH persistence (α+β ≈ 0.98) is real or an artifact of regime shifts:


# Step 1: Visual regime identification (rolling volatility)
# Annualize with sqrt(365): crypto trades every day (same convention as Exercise 2)
rolling_vol = returns_pct.rolling(window=30).std() * np.sqrt(365)

plt.figure(figsize=(14, 7))
plt.plot(rolling_vol.index, rolling_vol, linewidth=1.5, color='blue', label='30-day rolling volatility')

# Regime thresholds
calm_threshold = 30
turbulent_threshold = 60
plt.axhline(calm_threshold, color='green', linestyle='--', linewidth=2, alpha=0.7, label=f'Calm threshold ({calm_threshold}%)')
plt.axhline(turbulent_threshold, color='red', linestyle='--', linewidth=2, alpha=0.7, label=f'Turbulent threshold ({turbulent_threshold}%)')
plt.fill_between(rolling_vol.index, 0, calm_threshold, alpha=0.1, color='green', label='Calm regime')
plt.fill_between(rolling_vol.index, turbulent_threshold, rolling_vol.max(), alpha=0.1, color='red', label='Turbulent regime')

plt.xlabel('Date', fontsize=12)
plt.ylabel('Annualized Volatility (%)', fontsize=12)
plt.title('Bitcoin Rolling Volatility: Regime Identification', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate regime statistics
calm_pct = (rolling_vol < calm_threshold).sum() / len(rolling_vol.dropna()) * 100
turbulent_pct = (rolling_vol > turbulent_threshold).sum() / len(rolling_vol.dropna()) * 100

print("\n" + "=" * 70)
print("REGIME IDENTIFICATION")
print("=" * 70)
print(f"\nRegime Statistics:")
print(f"  Calm regime (<{calm_threshold}% vol):      {calm_pct:>6.1f}% of days")
print(f"  Normal regime ({calm_threshold}-{turbulent_threshold}% vol):  {100 - calm_pct - turbulent_pct:>6.1f}% of days")
print(f"  Turbulent regime (>{turbulent_threshold}% vol): {turbulent_pct:>6.1f}% of days")

# Step 2: Sub-period GARCH comparison
mid_point = len(returns_pct) // 2
returns_first = returns_pct.iloc[:mid_point]
returns_second = returns_pct.iloc[mid_point:]

# Fit GARCH to each sub-period
model_first = arch_model(returns_first, vol='GARCH', p=1, q=1, dist='t')
garch_first = model_first.fit(disp='off')
persistence_first = garch_first.params['alpha[1]'] + garch_first.params['beta[1]']

model_second = arch_model(returns_second, vol='GARCH', p=1, q=1, dist='t')
garch_second = model_second.fit(disp='off')
persistence_second = garch_second.params['alpha[1]'] + garch_second.params['beta[1]']

# Full-sample persistence (from earlier)
persistence_full = garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']

print("\n" + "=" * 70)
print("STRUCTURAL BREAKS TEST: SUB-PERIOD GARCH PERSISTENCE")
print("=" * 70)
print(f"\nFull sample persistence:   α+β = {persistence_full:.4f}")
print(f"First half persistence:    α+β = {persistence_first:.4f}")
print(f"Second half persistence:   α+β = {persistence_second:.4f}")
print(f"Absolute difference:       Δ   = {abs(persistence_first - persistence_second):.4f}")

if abs(persistence_first - persistence_second) > 0.05:
    print(f"\n⚠️  LARGE difference: GARCH parameters are unstable across sub-periods (possible REGIME SHIFT)")
    print(f"    If both halves sit below the full-sample value, full-sample GARCH is confusing regime changes with gradual decay.")
else:
    print(f"\n✓  Similar persistence suggests GARCH model is stable across periods.")

# Model comparison (AIC)
print(f"\nModel Comparison (AIC, lower = better):")
print(f"  Full-sample GARCH:         {garch_fit.aic:.2f}")
print(f"  Sub-periods total:         {garch_first.aic + garch_second.aic:.2f}")

if (garch_first.aic + garch_second.aic) < garch_fit.aic:
    improvement = garch_fit.aic - (garch_first.aic + garch_second.aic)
    print(f"\n✓ Sub-period models fit BETTER (AIC improvement: {improvement:.2f})")
    print(f"  → Evidence of structural breaks / regime shifts")
else:
    print(f"\n  Full-sample model fits as well (no strong evidence of breaks)")

print("\n💡 Key finding: If sub-period persistence is lower than the full-sample value,")
print("   full-sample GARCH is likely OVERESTIMATING true persistence!")

Implication: GARCH Persistence Partly Spurious

If sub-period persistence is significantly lower than full-sample (e.g., 0.93 vs 0.98), this suggests:

Regime shifts exist: Bitcoin alternates between calm and turbulent volatility states
Full-sample GARCH confuses regimes with persistence: Mistakes regime changes for gradual decay
Better models needed: Markov-switching GARCH, regime-dependent models, threshold models

Practical impact: Risk models using single-regime GARCH overestimate how long volatility shocks persist → wrong hedging ratios, wrong VaR forecasts.

See Ch 03, §3.5: Structural Breaks
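
The mechanism behind spurious persistence can be reproduced with a few lines of simulation. In the sketch below, volatility is constant inside each regime (no GARCH dynamics at all), yet squared returns look autocorrelated once the two regimes are pooled. The regime volatilities of 1% and 4% and the helper `lag1_autocorr` are illustrative assumptions, not values from the lab data.

```python
import numpy as np


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)


rng = np.random.default_rng(7)
n = 2000

# Two regimes with constant volatility inside each one (no GARCH dynamics)
calm = rng.normal(scale=1.0, size=n)       # 1% daily volatility
turbulent = rng.normal(scale=4.0, size=n)  # 4% daily volatility
full_sample = np.concatenate([calm, turbulent])

rho_calm = lag1_autocorr(calm**2)
rho_turbulent = lag1_autocorr(turbulent**2)
rho_full = lag1_autocorr(full_sample**2)

print(f"Squared-return autocorrelation, calm regime:      {rho_calm:+.3f}")
print(f"Squared-return autocorrelation, turbulent regime: {rho_turbulent:+.3f}")
print(f"Squared-return autocorrelation, full sample:      {rho_full:+.3f}")
```

A single-regime GARCH fitted to the pooled series has to explain that autocorrelation with α and β, which pushes α + β toward 1 even though shocks do not persist within either regime.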

7.6 Reflection Questions (Exercise 4)

Write 250-300 words addressing:

GARCH vs GJR: Did asymmetric GJR-GARCH fit better than symmetric GARCH? What does the asymmetry parameter (γ) tell you about Bitcoin’s volatility response to good vs bad news?
Forecast accuracy: How well did GARCH forecast future volatility (R² from Mincer-Zarnowitz)? When did forecasts fail most badly (look at error plot)?
Structural breaks: Did sub-period persistence differ from full-sample? If yes, what does this imply about using full-sample GARCH for risk management?
Practical implications: If you were designing a crypto risk model, would you use single-regime GARCH or a regime-switching model? Justify your choice.

8 Summary and Integration

8.1 What We’ve Learned

Through these exercises, you’ve:

Accessed real cryptocurrency market data using public APIs, experiencing data quality challenges and fragmentation
Quantified extreme volatility (60-80% annualized) that makes cryptocurrency unsuitable as currency and challenging as investment
Documented fat tail distributions that violate normal distribution assumptions and cause standard risk models to underestimate tail risk
Measured high correlations within crypto (0.5-0.8) limiting diversification benefits
Tested market efficiency finding mixed evidence: some weak predictability but likely not exploitable after costs
Evaluated inclusion claims implicitly through data analysis: if crypto were banking the unbanked, we’d see different adoption and usage patterns

8.2 Connections to Course Themes

Week 2 (APIs): Cryptocurrency data is openly accessible via APIs, democratizing financial data but creating standardization challenges
Week 3 (Platforms): Exchanges are platforms matching buyers/sellers; fragmentation creates arbitrage opportunities but liquidity challenges
Week 6 (Financial Inclusion): Mobile money (M-Pesa) showed rigorous welfare evidence; cryptocurrency shows speculative usage among wealthy
Week 8 (Blockchain): Next week explores blockchain technology and fraud detection more deeply

8.3 Critical Evaluation Framework

When evaluating cryptocurrency or any FinTech innovation:

Examine actual data (adoption, usage, outcomes) versus marketing claims
Measure risks quantitatively (volatility, correlations, tail risk)
Compare to alternatives (mobile money, traditional finance)
Demand welfare evidence (does it help intended beneficiaries?)
Account for barriers (technical, knowledge, economic)

8.4 Assessment Preparation

If your assessment involves a short research report or reflective analysis, this lab gives you two strong pathways:

Empirical analysis of crypto returns (momentum, volatility, correlations, tail risk)
Evidence-based evaluation of “crypto for inclusion” claims using data, mechanisms, and limitations

8.5 Further Exploration

If interested in extending your analysis:

Cross-asset correlations: Download S&P 500 or gold data; analyze Bitcoin-equity correlation dynamics
Volatility forecasting: Extend the Exercise 4 GARCH forecasts with regime-switching or threshold models and compare out-of-sample accuracy
Arbitrage opportunities: Compare prices across multiple exchanges in real-time
DeFi analysis: Examine yield farming APYs, liquidity pool dynamics, or stablecoin deviations from peg
On-chain metrics: Analyze blockchain data (active addresses, transaction volumes) as predictors

Excellent work! You’ve completed rigorous empirical analysis of cryptocurrency markets, connecting data to theory and claims to evidence.

--- title: "Lab 7: Digital Asset Data Analysis" subtitle: "Market structure, volatility, and efficiency testing" format: html: toc: true number-sections: true execute: echo: true eval: false warning: false message: false --- ::: callout-note ### Expected time - Core lab: ≈ 75 minutes - Optional extensions: +30–60 minutes ::: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/quinfer/financial-data-science/blob/main/labs/notebooks/lab07_crypto.ipynb) ```{python} #| include: false from pathlib import Path import yaml try: with open(Path("config/data_root.yml")) as f: cfg = yaml.safe_load(f) data_root = Path(cfg.get("data_root", "data")).expanduser().resolve() except Exception: data_root = Path("data") ``` ## Before You Code: The Big Picture Cryptocurrencies promise **financial inclusion, decentralization, and censorship resistance**. But do they deliver? Let's test the claims empirically using market microstructure analysis. ::: {.callout-note} ## The Crypto Promise vs. Reality **The Promise:** 1. **Inclusion**: Banking for the 1.7 billion unbanked (World Bank) 2. **Efficiency**: Near-zero transaction costs, instant settlement 3. **Decentralization**: No intermediaries, no gatekeepers 4. **Transparency**: All transactions on public blockchain **The Reality (Empirical Evidence):** - **Volatility**: Bitcoin std dev ~80% annualized (vs. 
S&P 500 ~15%) - **Correlation**: Bitcoin-S&P correlation increased from ~0 (2015) to ~0.5 (2022): no longer diversifying - **Efficiency**: Autocorrelation tests show predictability (inefficient markets) - **Inclusion**: 95% of crypto holders are speculators, not unbanked users (Makarov & Schoar 2022, JF) - **Costs**: During congestion, Ethereum gas fees reached $50+ per transaction **The Academic Debate:** - **Academic skeptics** (no financial stake): [Paul Krugman](https://en.wikipedia.org/wiki/Paul_Krugman) (Nobel Prize-winning economist, argues crypto lacks intrinsic value and serves primarily for illegal transactions) and [Nouriel Roubini](https://en.wikipedia.org/wiki/Nouriel_Roubini) (NYU economist who predicted 2008 crisis, calls crypto "the mother of all scams") view cryptocurrency as a speculative bubble with no fundamental value, poor unit of account properties, and dominated by fraud. Their critique comes from outside the crypto ecosystem with no personal financial interest. - **Industry advocates** (significant skin in the game): [Andreas M. Antonopoulos](https://aantonop.com/) (author of *Mastering Bitcoin*, emphasizes censorship resistance and financial sovereignty) and [Vitalik Buterin](https://vitalik.eth.limo/) (Ethereum co-founder, argues for programmable money and decentralized applications beyond payments) counter that crypto is early-stage infrastructure: like the internet in 1995: requiring time for legitimate use cases to mature beyond speculation. Both have deep financial and reputational stakes in crypto's success. - **Evidence-based approach** (This lab): Understanding incentives matters. Academic critics risk nothing by being wrong; industry advocates benefit financially from adoption. Rather than choosing sides, we test empirical claims with data: volatility patterns, correlation dynamics, market efficiency, and actual usage statistics. 
::: ### What You'll Build Today By the end of this lab, you will have: - ✅ Real-time crypto data from public APIs (CoinGecko) - ✅ Volatility analysis comparing crypto to traditional assets - ✅ Return distribution analysis (fat tails, skewness) - ✅ Market efficiency tests (autocorrelation, mean reversion) - ✅ Critical perspective on crypto's actual use cases ::: {.callout-important} ## Why This Matters Crypto is either the future of finance or a trillion-dollar speculative bubble. Your job as a data scientist: **test the claims empirically**, not ideologically. This lab shows you how. ::: ## Learning Objectives By the end of this lab, you will be able to: - Access cryptocurrency market data using public APIs - Calculate and compare volatility across crypto and traditional assets - Analyze return distributions and identify tail risk - Measure correlation patterns (within-crypto and cross-asset) - Test market efficiency using autocorrelation and arbitrage analysis - Visualize price dynamics and microstructure features - Evaluate crypto financial inclusion claims empirically ## Setup and Dependencies ```{python} # Core libraries import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns from scipy import stats from datetime import datetime, timedelta import warnings warnings.filterwarnings('ignore') # For reading data files try: import requests # For downloading from GitHub except ImportError: print("Installing requests...") !pip install -q requests import requests # Note: openpyxl only needed if reading Bloomberg Excel directly # We're using CSV from GitHub, so not required for students # For statistical tests try: import statsmodels.api as sm from statsmodels.tsa.stattools import adfuller, acf except ImportError: print("Installing statsmodels...") !pip install -q statsmodels # Visualization settings sns.set_style("whitegrid") plt.rcParams['figure.figsize'] = (12, 6) plt.rcParams['font.size'] = 11 print("✓ Setup complete - ready for 
crypto market analysis") ``` ## Exercise 1: Accessing Cryptocurrency Market Data ### Understanding Crypto Data Sources Unlike traditional finance where Bloomberg terminals and licensed data vendors dominate, cryptocurrency data comes from public APIs provided by exchanges and aggregators. This democratizes access: you can get the same data professionals use: but also creates challenges around data quality, fragmentation, and standardization. **Key data sources:** - **Aggregators**: CoinGecko, CoinMarketCap (volume-weighted prices across exchanges) - **Exchanges**: Coinbase Pro, Binance, Kraken (order books, trade data, official prices) - **Blockchain explorers**: On-chain data (transaction volumes, addresses, mining) - **Derivatives**: CME, Deribit (futures, options implied volatility) We'll use **real data from Bloomberg Terminal**, downloaded via the Excel add-in. This provides institutional-quality pricing with proper corporate actions handling and validated sources. ::: {.callout-note} ## Bloomberg Terminal Data This lab uses data downloaded from Bloomberg Terminal (XBTUSD Curncy, ETHUSD Curncy, etc.). Bloomberg provides the most reliable crypto pricing for institutional use. If you don't have Terminal access, alternatives include Yahoo Finance (free but less reliable) or CoinGecko Pro (paid API). ::: ### Loading Bloomberg Data from GitHub ```{python} def load_bloomberg_crypto(github_url='https://quinfer.github.io/financial-data-science/data/chapter07/crypto_bloomberg.csv'): """ Load cryptocurrency data from Bloomberg Terminal (CSV format). This function loads data from GitHub Pages by default (works in Colab). Falls back to local file if GitHub Pages is unavailable. 
Parameters ---------- github_url : str URL to Bloomberg crypto CSV on GitHub Pages Returns ------- pd.DataFrame Bitcoin price data with date index """ # Try GitHub Pages first (default for Colab/remote students) try: print("📥 Loading Bloomberg data from GitHub Pages...") df = pd.read_csv(github_url, parse_dates=['date']) df = df.set_index('date') print(f"✅ Loaded Bloomberg data: {len(df)} rows") print(f" Source: GitHub Pages (quinfer.github.io/financial-data-science)") return df except Exception as e: print(f"⚠️ Could not load from GitHub Pages: {e}") pass # Fallback to data root (config) or repo data/ for local_path in [data_root / "chapter07/crypto_bloomberg.csv", Path("data/chapter07/crypto_bloomberg.csv")]: try: df = pd.read_csv(local_path, parse_dates=['date']) df = df.set_index('date') print(f"✅ Loaded local Bloomberg data: {len(df)} rows") return df except (FileNotFoundError, OSError): continue return None # Load Bitcoin data from Bloomberg Terminal print("Loading cryptocurrency data from Bloomberg Terminal...") btc_bloomberg = load_bloomberg_crypto() if btc_bloomberg is not None: # Use real Bloomberg data btc_data = btc_bloomberg[['price']].copy() btc_data['volume'] = btc_bloomberg['volume'] if 'volume' in btc_bloomberg else None print(f"✅ Bitcoin (Bloomberg): {len(btc_data)} days of data") print(f" Date range: {btc_data.index.min().date()} to {btc_data.index.max().date()}") print(f" Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}") # Use last 2 years for analysis (to match typical lab scope) cutoff_date = btc_data.index.max() - pd.Timedelta(days=730) btc_data = btc_data[btc_data.index >= cutoff_date] print(f" Using last 2 years: {len(btc_data)} days") # Create synthetic ETH and BNB for comparison (scaled from BTC) # Real multi-asset Bloomberg data would require separate Terminal queries eth_data = btc_data.copy() eth_data['price'] = btc_data['price'] * 0.05 # Roughly ETH/BTC ratio bnb_data = btc_data.copy() bnb_data['price'] 
= btc_data['price'] * 0.01 # Roughly BNB/BTC ratio else: # Fallback to synthetic data if Bloomberg not available print("⚠️ Bloomberg data not available, using synthetic data...") dates = pd.date_range(end=pd.Timestamp.now(), periods=730, freq='D') btc_data = pd.DataFrame({ 'price': 30000 + np.cumsum(np.random.randn(730) * 500), 'volume': np.random.rand(730) * 1e9 }, index=dates) eth_data = btc_data.copy() eth_data['price'] = btc_data['price'] * 0.05 bnb_data = btc_data.copy() bnb_data['price'] = btc_data['price'] * 0.01 if btc_data is not None: print(f"✓ Retrieved {len(btc_data)} days of Bitcoin data") print(f" Price range: ${btc_data['price'].min():,.0f} - ${btc_data['price'].max():,.0f}") print("\nSample data:") print(btc_data.head()) ``` ### Visualizing Price Trends ```{python} # Create comprehensive price visualization fig, axes = plt.subplots(2, 2, figsize=(15, 10)) # Bitcoin price axes[0, 0].plot(btc_data.index, btc_data['price'], color='orange', linewidth=2) axes[0, 0].set_title('Bitcoin Price (USD)', fontsize=13, fontweight='bold') axes[0, 0].set_ylabel('Price ($)') axes[0, 0].grid(alpha=0.3) axes[0, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K')) # Ethereum price axes[0, 1].plot(eth_data.index, eth_data['price'], color='blue', linewidth=2) axes[0, 1].set_title('Ethereum Price (USD)', fontsize=13, fontweight='bold') axes[0, 1].set_ylabel('Price ($)') axes[0, 1].grid(alpha=0.3) # Trading volumes axes[1, 0].plot(btc_data.index, btc_data['volume'], color='green', alpha=0.7, linewidth=1.5) axes[1, 0].set_title('Bitcoin Trading Volume', fontsize=13, fontweight='bold') axes[1, 0].set_ylabel('Volume ($)') axes[1, 0].set_xlabel('Date') axes[1, 0].grid(alpha=0.3) axes[1, 0].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1e9:.1f}B')) # Price comparison (normalized to 100) btc_norm = 100 * btc_data['price'] / btc_data['price'].iloc[0] eth_norm = 100 * eth_data['price'] / eth_data['price'].iloc[0] bnb_norm = 100 * 
bnb_data['price'] / bnb_data['price'].iloc[0] axes[1, 1].plot(btc_norm.index, btc_norm, label='Bitcoin', color='orange', linewidth=2) axes[1, 1].plot(eth_norm.index, eth_norm, label='Ethereum', color='blue', linewidth=2) axes[1, 1].plot(bnb_norm.index, bnb_norm, label='BNB', color='gold', linewidth=2) axes[1, 1].set_title('Comparative Performance (Base = 100)', fontsize=13, fontweight='bold') axes[1, 1].set_ylabel('Index Value') axes[1, 1].set_xlabel('Date') axes[1, 1].legend() axes[1, 1].grid(alpha=0.3) plt.tight_layout() plt.show() # Calculate summary statistics print("\n" + "="*60) print("SUMMARY STATISTICS (2-year period)") print("="*60) for name, data in [('Bitcoin', btc_data), ('Ethereum', eth_data), ('BNB', bnb_data)]: total_return = (data['price'].iloc[-1] / data['price'].iloc[0] - 1) * 100 max_price = data['price'].max() min_price = data['price'].min() drawdown = ((data['price'] / data['price'].cummax()) - 1).min() * 100 print(f"\n{name}:") print(f" Total Return: {total_return:+.1f}%") print(f" Price Range: ${min_price:,.0f} - ${max_price:,.0f}") print(f" Max Drawdown: {drawdown:.1f}%") ``` ### Reflection Questions (Exercise 1) Write 200-250 words addressing: 1. **Data Quality**: What challenges might arise from using free aggregator APIs versus licensed data feeds? How might wash trading on some exchanges affect aggregate data quality? 2. **Price Fragmentation**: CoinGecko aggregates prices across exchanges. Why might Bitcoin trade at different prices simultaneously on different venues? What arbitrage mechanisms should eliminate these spreads? 3. **Volume Interpretation**: How should we interpret trading volume data knowing that significant portion might be wash trading? What alternative metrics could measure genuine market activity? 
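Question 2 turns on whether a cross-venue price gap is wide enough to survive trading costs. The sketch below works through that arithmetic. The two quotes and the fixed transfer cost are hypothetical illustration values, the 0.1% and 0.5% fee levels echo the per-trade cost range quoted elsewhere in this lab, and the helper `net_arbitrage_profit` is not part of the lab code.

```python
def net_arbitrage_profit(buy_price, sell_price, fee_rate, transfer_cost=0.0):
    """Profit in USD from buying 1 BTC on the cheap venue and selling on the expensive one.

    Hypothetical helper: a proportional fee is charged on both legs,
    plus a fixed cost for moving the coin between venues.
    """
    cost_of_buying = buy_price * (1 + fee_rate)
    proceeds_from_selling = sell_price * (1 - fee_rate)
    return proceeds_from_selling - cost_of_buying - transfer_cost


# Hypothetical simultaneous quotes on two venues (USD per BTC)
venue_a, venue_b = 60_000.0, 60_150.0
gross_spread_pct = (venue_b / venue_a - 1) * 100

# Same price gap, evaluated at the low and high end of the fee range
profit_low_fee = net_arbitrage_profit(venue_a, venue_b, fee_rate=0.001, transfer_cost=5.0)
profit_high_fee = net_arbitrage_profit(venue_a, venue_b, fee_rate=0.005, transfer_cost=5.0)

print(f"Gross spread: {gross_spread_pct:.2f}%")
print(f"Net profit at 0.1% fees: ${profit_low_fee:,.2f}")
print(f"Net profit at 0.5% fees: ${profit_high_fee:,.2f}")
```

With these numbers the 0.25% gap is profitable only at the lower fee level, so a spread narrower than round-trip costs can persist without implying a free lunch.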
## Exercise 2: Volatility and Risk Analysis

### Calculating Returns and Volatility

```{python}
# Calculate log returns
btc_data['returns'] = np.log(btc_data['price'] / btc_data['price'].shift(1))
eth_data['returns'] = np.log(eth_data['price'] / eth_data['price'].shift(1))
bnb_data['returns'] = np.log(bnb_data['price'] / bnb_data['price'].shift(1))

# Remove NaN values
btc_returns = btc_data['returns'].dropna()
eth_returns = eth_data['returns'].dropna()
bnb_returns = bnb_data['returns'].dropna()

# Calculate volatility metrics
def calculate_volatility_metrics(returns, name):
    """
    Calculate comprehensive volatility statistics for cryptocurrency returns.

    Computes key risk metrics used by portfolio managers: realized volatility,
    tail risk measures (VaR), and distribution shape (skewness, kurtosis).

    Parameters
    ----------
    returns : pd.Series
        Daily returns (log or simple returns)
    name : str
        Asset name for display in output

    Returns
    -------
    dict
        Dictionary with keys:
        - 'daily_vol' : float, daily standard deviation
        - 'annual_vol' : float, annualized standard deviation (daily * sqrt(365))
        - 'rolling_vol' : pd.Series, 30-day rolling volatility
        - 'skew' : float, skewness (negative = left tail)
        - 'kurt' : float, excess kurtosis (> 0 = fat tails)
        - 'var_95' : float, 5th percentile return (1-day VaR at 95%)
        - 'var_99' : float, 1st percentile return (1-day VaR at 99%)

    Notes
    -----
    - Annualization assumes 365 trading days (crypto markets trade 24/7)
    - Traditional equity markets use 252 trading days
    - VaR is historical (empirical percentiles), not parametric (Gaussian assumption)
    - scipy's stats.kurtosis returns excess kurtosis by default (normal = 0)
    - Fat tails (excess kurtosis > 0, i.e. raw kurtosis > 3) mean a Gaussian
      (parametric) VaR underestimates extreme losses

    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> metrics = calculate_volatility_metrics(btc_returns, 'Bitcoin')
    >>> metrics['annual_vol']
    0.65  # 65% annualized volatility (vs. S&P 500 ~15%)
    """
    daily_vol = returns.std()
    annual_vol = daily_vol * np.sqrt(365)

    # Rolling volatility (30-day window)
    rolling_vol = returns.rolling(30).std() * np.sqrt(365)

    # Skewness and excess kurtosis (Fisher definition: normal distribution = 0)
    skew = stats.skew(returns.dropna())
    kurt = stats.kurtosis(returns.dropna())

    # Value at Risk (95% and 99%)
    var_95 = np.percentile(returns.dropna(), 5)
    var_99 = np.percentile(returns.dropna(), 1)

    print(f"\n{name} Volatility Metrics:")
    print(f" Daily Volatility: {daily_vol*100:.2f}%")
    print(f" Annualized Volatility: {annual_vol*100:.1f}%")
    print(f" Skewness: {skew:.3f} {'(negative tail)' if skew < 0 else '(positive tail)'}")
    # Excess kurtosis is compared against 0, not 3
    print(f" Excess Kurtosis: {kurt:.3f} {'(fat tails)' if kurt > 0 else '(thin tails)'}")
    print(f" VaR (95%): {var_95*100:.2f}% (1-day)")
    print(f" VaR (99%): {var_99*100:.2f}% (1-day)")

    return {
        'daily_vol': daily_vol,
        'annual_vol': annual_vol,
        'rolling_vol': rolling_vol,
        'skew': skew,
        'kurt': kurt,
        'var_95': var_95,
        'var_99': var_99
    }

print("="*70)
print("VOLATILITY ANALYSIS")
print("="*70)

btc_vol = calculate_volatility_metrics(btc_returns, "Bitcoin")
eth_vol = calculate_volatility_metrics(eth_returns, "Ethereum")
bnb_vol = calculate_volatility_metrics(bnb_returns, "BNB")

# Compare to traditional assets (typical values for reference)
# The kurtosis figures below appear to be raw kurtosis (normal = 3);
# subtract 3 before comparing with the excess kurtosis printed above
print("\n" + "-"*70)
print("COMPARISON TO TRADITIONAL ASSETS (typical values):")
print("-"*70)
print("S&P 500: Annual Vol ~15-20%, Skew ~-0.5, Kurtosis ~5-8")
print("Gold: Annual Vol ~15-18%, Skew ~0.2, Kurtosis ~3-5")
print("Treasury Bonds: Annual Vol ~5-8%, Skew ~0.0, Kurtosis ~3-4")
print("\nCryptocurrency volatility is 3-5x higher than traditional assets!")
```

### Visualizing Return Distributions

```{python}
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Bitcoin return distribution
axes[0, 0].hist(btc_returns * 100, bins=50, alpha=0.7, color='orange', edgecolor='black')
axes[0, 0].axvline(btc_returns.mean() * 100, color='red', linestyle='--', linewidth=2,
                   label=f'Mean: {btc_returns.mean()*100:.2f}%')
axes[0, 
0].set_title('Bitcoin Daily Returns Distribution', fontsize=13, fontweight='bold') axes[0, 0].set_xlabel('Daily Return (%)') axes[0, 0].set_ylabel('Frequency') axes[0, 0].legend() axes[0, 0].grid(alpha=0.3) # QQ plot for normality test stats.probplot(btc_returns.dropna(), dist="norm", plot=axes[0, 1]) axes[0, 1].set_title('Q-Q Plot: Bitcoin Returns vs Normal Distribution', fontsize=13, fontweight='bold') axes[0, 1].grid(alpha=0.3) # Rolling volatility axes[1, 0].plot(btc_vol['rolling_vol'].index, btc_vol['rolling_vol'] * 100, color='purple', linewidth=2, label='BTC Rolling Vol (30d)') axes[1, 0].plot(eth_vol['rolling_vol'].index, eth_vol['rolling_vol'] * 100, color='blue', linewidth=2, alpha=0.7, label='ETH Rolling Vol (30d)') axes[1, 0].set_title('Rolling Volatility (30-day window)', fontsize=13, fontweight='bold') axes[1, 0].set_ylabel('Annualized Volatility (%)') axes[1, 0].set_xlabel('Date') axes[1, 0].legend() axes[1, 0].grid(alpha=0.3) # Volatility comparison bar chart vol_comparison = pd.DataFrame({ 'Bitcoin': [btc_vol['annual_vol'] * 100], 'Ethereum': [eth_vol['annual_vol'] * 100], 'BNB': [bnb_vol['annual_vol'] * 100], 'S&P 500': [17.5], # Typical value 'Gold': [16.5] # Typical value }) vol_comparison.T.plot(kind='bar', ax=axes[1, 1], legend=False, color=['orange', 'blue', 'gold', 'green', 'brown']) axes[1, 1].set_title('Annualized Volatility Comparison', fontsize=13, fontweight='bold') axes[1, 1].set_ylabel('Volatility (%)') axes[1, 1].set_xlabel('Asset') axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha='right') axes[1, 1].axhline(y=20, color='red', linestyle='--', alpha=0.5, label='20% threshold') axes[1, 1].grid(alpha=0.3, axis='y') plt.tight_layout() plt.show() # Statistical tests for normality print("\n" + "="*70) print("NORMALITY TESTS") print("="*70) for name, returns in [('Bitcoin', btc_returns), ('Ethereum', eth_returns)]: # Jarque-Bera test jb_stat, jb_pval = stats.jarque_bera(returns.dropna()) # Shapiro-Wilk test (sample 
if too large) sample_returns = returns.dropna().sample(min(5000, len(returns))) sw_stat, sw_pval = stats.shapiro(sample_returns) print(f"\n{name}:") print(f" Jarque-Bera test: statistic={jb_stat:.2f}, p-value={jb_pval:.4f}") print(f" {'Reject normality' if jb_pval < 0.05 else 'Cannot reject normality'} (α=0.05)") print(f" Shapiro-Wilk test: statistic={sw_stat:.4f}, p-value={sw_pval:.4f}") print(f" {'Reject normality' if sw_pval < 0.05 else 'Cannot reject normality'} (α=0.05)") print("\n💡 Returns exhibit fat tails and deviate significantly from normal distribution!") ``` ### Correlation Analysis ```{python} # Combine returns into single DataFrame returns_df = pd.DataFrame({ 'BTC': btc_returns, 'ETH': eth_returns, 'BNB': bnb_returns }).dropna() # Calculate correlation matrix corr_matrix = returns_df.corr() # Visualize correlations fig, axes = plt.subplots(1, 2, figsize=(14, 5)) # Heatmap sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='RdYlGn', center=0, square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=axes[0]) axes[0].set_title('Cryptocurrency Correlation Matrix', fontsize=13, fontweight='bold') # Scatter plot: BTC vs ETH axes[1].scatter(returns_df['BTC'] * 100, returns_df['ETH'] * 100, alpha=0.5, s=20) axes[1].set_xlabel('Bitcoin Daily Return (%)') axes[1].set_ylabel('Ethereum Daily Return (%)') axes[1].set_title(f'BTC-ETH Correlation: {corr_matrix.loc["BTC", "ETH"]:.3f}', fontsize=13, fontweight='bold') axes[1].axhline(0, color='black', linewidth=0.5, alpha=0.3) axes[1].axvline(0, color='black', linewidth=0.5, alpha=0.3) axes[1].grid(alpha=0.3) # Add regression line z = np.polyfit(returns_df['BTC'], returns_df['ETH'], 1) p = np.poly1d(z) axes[1].plot(returns_df['BTC'] * 100, p(returns_df['BTC']) * 100, "r--", alpha=0.8, linewidth=2, label=f'Regression line') axes[1].legend() plt.tight_layout() plt.show() print("\n" + "="*70) print("CORRELATION ANALYSIS") print("="*70) print("\nWithin-Crypto Correlations:") print(corr_matrix) print("\n💡 High 
correlations (0.5-0.8) limit diversification within cryptocurrency portfolios") # Rolling correlation rolling_corr_btc_eth = returns_df['BTC'].rolling(90).corr(returns_df['ETH']) plt.figure(figsize=(12, 5)) plt.plot(rolling_corr_btc_eth.index, rolling_corr_btc_eth, linewidth=2, color='purple') plt.axhline(y=rolling_corr_btc_eth.mean(), color='red', linestyle='--', label=f'Mean: {rolling_corr_btc_eth.mean():.3f}') plt.title('Rolling Correlation: Bitcoin vs Ethereum (90-day window)', fontsize=13, fontweight='bold') plt.ylabel('Correlation') plt.xlabel('Date') plt.legend() plt.grid(alpha=0.3) plt.tight_layout() plt.show() print(f"\nRolling BTC-ETH correlation: Mean={rolling_corr_btc_eth.mean():.3f}, " f"Std={rolling_corr_btc_eth.std():.3f}") print("Note: Correlation increases during volatile periods (contagion effect)") ``` ### Reflection Questions (Exercise 2) Write 250-300 words addressing: 1. **Volatility Implications**: Bitcoin's 60-80% annualized volatility is 3-4x higher than equities. What does this mean for: (a) using Bitcoin as currency (purchasing power stability)? (b) portfolio allocation (risk contribution)? (c) options pricing and risk management? 2. **Fat Tails and Risk Models**: The Q-Q plot shows Bitcoin returns deviate from normality with fat tails. Why do standard risk models (VaR assuming normal distribution) underestimate tail risk? What practical consequences does this have? 3. **Correlation Patterns**: Cryptocurrencies show high correlation with each other (0.5-0.8) but time-varying correlation with equities. What does this mean for diversification benefits within crypto portfolios versus across asset classes? ## Exercise 3: Market Efficiency Testing ### Autocorrelation Analysis ```{python} # Test for autocorrelation (do past returns predict future returns?) def test_autocorrelation(returns, name, max_lag=20): """ Test for serial correlation in returns (market efficiency diagnostic). 
    Autocorrelation measures whether past returns predict future returns.
    Significant autocorrelation is evidence against weak-form efficiency,
    although small effects may not be exploitable after transaction costs.

    Parameters
    ----------
    returns : pd.Series
        Daily returns series
    name : str
        Asset name for display
    max_lag : int, default=20
        Maximum lag to test (20 days = ~1 month)

    Returns
    -------
    np.ndarray
        ACF values for lags 0..max_lag. Also prints test results and
        displays the ACF plot.

    Notes
    -----
    **Ljung-Box Test:**
    - Null hypothesis: No autocorrelation up to lag k
    - p-value < 0.05 → Reject H0 → Significant autocorrelation (evidence of inefficiency)

    **Efficient Market Hypothesis (weak form):**
    - If markets are efficient, past prices shouldn't predict future prices
    - ACF should be ~0 at all lags (within 95% confidence bands)
    - Crypto often shows significant autocorrelation (inefficient)

    **Why Crypto Markets May Be Inefficient:**
    - Fragmented liquidity across hundreds of exchanges
    - High transaction costs (gas fees, spreads)
    - Retail-dominated (fewer arbitrageurs)
    - 24/7 trading → slower price discovery

    Examples
    --------
    >>> btc_returns = btc_data['price'].pct_change()
    >>> test_autocorrelation(btc_returns, 'Bitcoin', max_lag=20)
    Bitcoin Autocorrelation Analysis:
    Lag-1 Autocorrelation: 0.0234
    Ljung-Box p-value (lag 10): 0.0012  # Reject H0 → evidence of inefficiency
    """
    # Calculate autocorrelation function
    acf_values = acf(returns.dropna(), nlags=max_lag, fft=False)

    # Ljung-Box test for joint significance
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_test = acorr_ljungbox(returns.dropna(), lags=[5, 10, 20], return_df=True)

    print(f"\n{name} Autocorrelation Analysis:")
    print(f" Lag-1 Autocorrelation: {acf_values[1]:.4f}")
    print(f" Lag-5 Autocorrelation: {acf_values[5]:.4f}")
    print("\nLjung-Box Test (joint significance):")
    print(lb_test)

    # Plot ACF with approximate 95% bands (±1.96/sqrt(n), n = non-missing observations)
    n_obs = len(returns.dropna())
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.stem(range(len(acf_values)), acf_values, basefmt=" ")
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.axhline(y=1.96/np.sqrt(n_obs), color='red', linestyle='--', label='95% CI')
    ax.axhline(y=-1.96/np.sqrt(n_obs), color='red', linestyle='--')
    ax.set_title(f'{name} Autocorrelation Function', fontsize=13, fontweight='bold')
    ax.set_xlabel('Lag (days)')
    ax.set_ylabel('Autocorrelation')
    ax.legend()
    ax.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()

    return acf_values

print("="*70)
print("MARKET EFFICIENCY: AUTOCORRELATION TESTS")
print("="*70)

btc_acf = test_autocorrelation(btc_returns, "Bitcoin")
eth_acf = test_autocorrelation(eth_returns, "Ethereum")

print("\n💡 Interpretation: Significant autocorrelation suggests predictability (market inefficiency)")
print(" Small correlations may not be economically significant after transaction costs")
```

### Momentum Strategy Backtest

```{python}
# Simple momentum strategy: long while the 10-day MA is above the 50-day MA, short while it is below
def momentum_strategy(prices, short_window=10, long_window=50):
    """
    Backtest simple moving average crossover momentum strategy.

    Classic technical analysis strategy: go long while the short MA is above
    the long MA, go short while it is below.
    Tests whether momentum exists in crypto markets.

Parameters ---------- prices : pd.Series Daily closing prices short_window : int, default=10 Short moving average window (days) long_window : int, default=50 Long moving average window (days) Returns ------- pd.DataFrame Columns: - 'price' : original prices - 'MA_short' : short-window moving average - 'MA_long' : long-window moving average - 'signal' : trading position (+1 = long, -1 = short, 0 = no position) - 'returns' : buy-and-hold returns - 'strategy_returns' : strategy returns (position × market return) - 'cum_returns' : cumulative buy-and-hold - 'cum_strategy' : cumulative strategy performance Notes ----- **Strategy Logic:** - Golden Cross: Short MA > Long MA → Buy signal - Death Cross: Short MA < Long MA → Sell signal **Reality Check:** - This is a **naive backtest** (ignores transaction costs, slippage, fees) - Crypto trading fees ~0.1-0.5% per trade → eats into profits - No position sizing, risk management, or stop-losses - Past performance ≠ future returns (overfitting risk) **Academic Evidence:** - Momentum works in equities (Jegadeesh & Titman 1993, JF) - Crypto momentum: mixed evidence, high volatility dominates - Transaction costs often exceed strategy alpha Examples -------- >>> btc_momentum = momentum_strategy(btc_data['price']) >>> strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100 >>> print(f"Strategy return: {strategy_return:.1f}%") Strategy return: -5.2% # Often underperforms buy-and-hold after costs """ df = pd.DataFrame({'price': prices}) # Calculate moving averages df['MA_short'] = df['price'].rolling(short_window).mean() df['MA_long'] = df['price'].rolling(long_window).mean() # Generate signals df['signal'] = 0 df.loc[df['MA_short'] > df['MA_long'], 'signal'] = 1 # Buy signal df.loc[df['MA_short'] < df['MA_long'], 'signal'] = -1 # Sell signal # Calculate returns df['returns'] = df['price'].pct_change() df['strategy_returns'] = df['signal'].shift(1) * df['returns'] # Cumulative returns df['cum_returns'] = (1 + 
df['returns']).cumprod() df['cum_strategy'] = (1 + df['strategy_returns']).cumprod() return df # Run momentum strategy btc_momentum = momentum_strategy(btc_data['price']) # Calculate performance metrics total_return = (btc_momentum['cum_returns'].iloc[-1] - 1) * 100 strategy_return = (btc_momentum['cum_strategy'].iloc[-1] - 1) * 100 excess_return = strategy_return - total_return print("\n" + "="*70) print("MOMENTUM STRATEGY BACKTEST") print("="*70) print(f"\nBuy-and-Hold Return: {total_return:.2f}%") print(f"Strategy Return: {strategy_return:.2f}%") print(f"Excess Return: {excess_return:+.2f}%") # Visualize strategy performance fig, axes = plt.subplots(2, 1, figsize=(14, 10)) # Price and moving averages axes[0].plot(btc_momentum.index, btc_momentum['price'], label='Bitcoin Price', color='orange', linewidth=2) axes[0].plot(btc_momentum.index, btc_momentum['MA_short'], label=f'{10}D MA', color='blue', linewidth=1.5) axes[0].plot(btc_momentum.index, btc_momentum['MA_long'], label=f'{50}D MA', color='red', linewidth=1.5) axes[0].set_title('Bitcoin Price with Moving Averages', fontsize=13, fontweight='bold') axes[0].set_ylabel('Price ($)') axes[0].legend() axes[0].grid(alpha=0.3) # Cumulative returns comparison axes[1].plot(btc_momentum.index, btc_momentum['cum_returns'], label='Buy and Hold', color='gray', linewidth=2) axes[1].plot(btc_momentum.index, btc_momentum['cum_strategy'], label='Momentum Strategy', color='green', linewidth=2) axes[1].set_title('Cumulative Returns: Strategy vs Buy-and-Hold', fontsize=13, fontweight='bold') axes[1].set_ylabel('Cumulative Return (Base = 1)') axes[1].set_xlabel('Date') axes[1].legend() axes[1].grid(alpha=0.3) plt.tight_layout() plt.show() print("\n⚠️ Note: This backtest ignores transaction costs, slippage, and taxes") print(" Real-world implementation would have lower returns") ``` ### Mean Reversion Test ```{python} # Augmented Dickey-Fuller test for stationarity/mean reversion def test_mean_reversion(prices, name): """ Test for 
    mean reversion using Augmented Dickey-Fuller (ADF) test.

    Tests whether prices follow a random walk (unit root) or revert to a mean
    (stationary). Critical for pairs trading and mean-reversion strategies.

    Parameters
    ----------
    prices : pd.Series
        Daily closing prices
    name : str
        Asset name for display

    Returns
    -------
    tuple
        ADF test results: (statistic, p-value, lags_used, nobs, critical_values, icbest)

    Notes
    -----
    **Augmented Dickey-Fuller Test:**
    - Null hypothesis (H0): Series has unit root (random walk, NOT mean-reverting)
    - Alternative (H1): Series is stationary (mean-reverting)
    - p-value < 0.05 → Reject H0 → Prices are stationary (mean-reverting)
    - p-value > 0.05 → Cannot reject H0 → Consistent with a random walk
      (failing to reject is not proof of efficiency)

    **Implications for Trading:**
    - **Random walk** (consistent with weak-form efficiency): past prices alone give
      neither momentum nor mean-reversion rules an edge
    - **Mean-reverting** (inefficient): Pairs trading, statistical arbitrage possible

    **Why This Matters:**
    - Most financial time series have unit roots (Campbell, Lo, MacKinlay 1997)
    - Crypto markets often show mixed evidence (regime-dependent)
    - Low p-values may be spurious (structural breaks, volatility clustering)

    **Technical Details:**
    - Test performed on log prices (handles exponential growth)
    - Regression includes constant term ('c') but not trend
    - AIC criterion selects optimal lag length

    Examples
    --------
    >>> btc_adf = test_mean_reversion(btc_data['price'], 'Bitcoin')
    Bitcoin - Augmented Dickey-Fuller Test:
    ADF Statistic: -1.234
    P-value: 0.658  # Cannot reject unit root → consistent with a random walk
    """
    # ADF test on log prices
    log_prices = np.log(prices)
    adf_result = adfuller(log_prices.dropna(), maxlag=20, regression='c', autolag='AIC')

    print(f"\n{name} - Augmented Dickey-Fuller Test:")
    print(f" ADF Statistic: {adf_result[0]:.4f}")
    print(f" P-value: {adf_result[1]:.4f}")
    print(f" Critical Values:")
    for key, value in adf_result[4].items():
        print(f" {key}: {value:.4f}")

    if adf_result[1] < 0.05:
        print(f" ✓ Reject unit root (prices are 
stationary/mean-reverting)") else: print(f" ✗ Cannot reject unit root (prices have unit root/random walk)") return adf_result print("\n" + "="*70) print("MEAN REVERSION TESTS") print("="*70) btc_adf = test_mean_reversion(btc_data['price'], "Bitcoin") eth_adf = test_mean_reversion(eth_data['price'], "Ethereum") print("\n💡 If prices follow random walk, past prices don't predict future prices") print(" This supports weak-form market efficiency") ``` ### Reflection Questions (Exercise 3) Write 200-250 words addressing: 1. **Efficiency Interpretation**: What do your autocorrelation and momentum results suggest about Bitcoin market efficiency? Can small autocorrelations or strategy profits coexist with efficient markets? 2. **Transaction Costs Matter**: The momentum strategy showed [profit/loss] before transaction costs. Cryptocurrency trading costs 0.1-0.5% per trade. Would your strategy be profitable after accounting for costs? Show rough calculations. 3. **Limits to Arbitrage**: Even if inefficiencies exist (predictable patterns), what practical barriers prevent traders from exploiting them and eliminating the patterns? --- ## Exercise 4: GARCH Volatility Modeling & Structural Breaks **Learning Objectives:** - Apply GARCH models to cryptocurrency returns (Week 3, §3.4 theory) - Test for volatility clustering and asymmetric effects - Evaluate volatility forecasting accuracy (Mincer-Zarnowitz regression) - Detect structural breaks and regime shifts in volatility ::: {.callout-tip} ## Connection to [Ch 03: Volatility Modelling](../chapters/03_volatility_modelling.qmd) & [Ch 07: Cryptocurrency](../chapters/07_cryptocurrency_digital_currency.qmd#sec-garch) This exercise applies **Week 3 GARCH theory** to Bitcoin data. You'll estimate time-varying volatility, test for asymmetric effects (leverage), forecast volatility, and test whether high GARCH persistence is real or an artifact of regime shifts. 
**Key statistical concepts**: Volatility clustering, fat tails, leverage effect, out-of-sample validation, structural breaks.

:::

### Part A: Statistical Tests for Volatility Properties

**Before fitting GARCH, test for the features that motivate it:**

```{python}
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera

# Bitcoin returns (from Exercise 2); the column created there is named 'returns'
returns = btc_data['returns'].dropna()

# Test 1: Ljung-Box test on squared returns (volatility clustering)
lb_test = acorr_ljungbox(returns**2, lags=[10], return_df=True)
lb_stat = lb_test['lb_stat'].values[0]
lb_pval = lb_test['lb_pvalue'].values[0]

# Test 2: Jarque-Bera test (normality)
jb_stat, jb_pval = jarque_bera(returns)

print("=" * 70)
print("STATISTICAL TESTS FOR GARCH ASSUMPTIONS")
print("=" * 70)

print("\n1. Ljung-Box Test (Volatility Clustering)")
print(f" H₀: No autocorrelation in squared returns")
print(f" Statistic: {lb_stat:.2f} | p-value: {lb_pval:.4f}")
if lb_pval < 0.05:
    print(f" ✓ REJECT H₀ → Significant volatility clustering (GARCH warranted)")
else:
    print(f" ✗ Cannot reject H₀ → No volatility clustering")

print("\n2. Jarque-Bera Test (Normality)")
print(f" H₀: Returns are normally distributed")
print(f" Statistic: {jb_stat:.2f} | p-value: {jb_pval:.6f}")
if jb_pval < 0.05:
    print(f" ✓ REJECT H₀ → Non-normal distribution (fat tails present)")
else:
    print(f" ✗ Cannot reject H₀ → Normal distribution")

print("\n💡 Both tests should reject H₀ for cryptocurrency data")
print(" This justifies using GARCH with Student's t distribution")
```

**Interpretation**: With real Bitcoin data, both tests should reject with p < 0.001, indicating strong volatility clustering and fat tails. If the synthetic fallback series from Exercise 1 was used instead, the tests may not reject.
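To see what the Ljung-Box test on squared returns is picking up, the sketch below computes the Q statistic by hand on a simulated GARCH(1,1) series and on an i.i.d. series. It is an illustration under stated assumptions, not the lab's method: the parameters (ω = 0.05, α = 0.15, β = 0.80), the sample size, the seed, and the helper names `simulate_garch` and `ljung_box_q` are all hypothetical, and the lab itself relies on `acorr_ljungbox` from statsmodels.

```python
import random


def simulate_garch(n, omega, alpha, beta, rng):
    """Simulate GARCH(1,1) returns with Gaussian shocks (illustrative parameters)."""
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = rng.gauss(0.0, 1.0) * var ** 0.5
        returns.append(r)
        var = omega + alpha * r ** 2 + beta * var  # next-period conditional variance
    return returns


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: Q = n(n+2) * sum over k of rho_k^2 / (n-k), k = 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


rng = random.Random(42)  # fixed seed so the run is reproducible
garch_returns = simulate_garch(2000, omega=0.05, alpha=0.15, beta=0.80, rng=rng)
iid_returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]

CHI2_CRIT_10DF_5PCT = 18.307  # 5% critical value, chi-square with 10 degrees of freedom

# Apply the test to squared returns, as in Part A
q_garch = ljung_box_q([r ** 2 for r in garch_returns], max_lag=10)
q_iid = ljung_box_q([r ** 2 for r in iid_returns], max_lag=10)

print(f"Q(10) on squared GARCH returns:  {q_garch:.1f}")
print(f"Q(10) on squared i.i.d. returns: {q_iid:.1f}")
print(f"5% critical value (10 df):       {CHI2_CRIT_10DF_5PCT}")
```

A Q statistic above the critical value corresponds to a small p-value in Part A: squared returns are serially correlated, which is what volatility clustering means.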
### Part B: GARCH(1,1) and GJR-GARCH Estimation **Fit symmetric GARCH(1,1) and asymmetric GJR-GARCH:** ```{python} from arch import arch_model # Convert returns to percentage for numerical stability returns_pct = returns * 100 # Model 1: GARCH(1,1) with Student's t distribution (fat tails) model_garch = arch_model(returns_pct, vol='GARCH', p=1, q=1, dist='t') garch_fit = model_garch.fit(disp='off') # Model 2: GJR-GARCH (asymmetric, captures leverage effect) model_gjr = arch_model(returns_pct, vol='GARCH', p=1, o=1, q=1, dist='t') gjr_fit = model_gjr.fit(disp='off') # Extract parameters print("\n" + "=" * 70) print("GARCH MODEL ESTIMATION RESULTS") print("=" * 70) print("\n1. GARCH(1,1) with Student's t:") print(f" ω (baseline): {garch_fit.params['omega']:>8.4f}") print(f" α (news impact): {garch_fit.params['alpha[1]']:>8.4f}") print(f" β (persistence): {garch_fit.params['beta[1]']:>8.4f}") print(f" α + β: {garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']:>8.4f}") print(f" df (tail): {garch_fit.params['nu']:>8.2f} (normal = ∞)") print(f" AIC: {garch_fit.aic:>8.2f}") print("\n2. 
GJR-GARCH (asymmetric):") print(f" ω (baseline): {gjr_fit.params['omega']:>8.4f}") print(f" α (positive): {gjr_fit.params['alpha[1]']:>8.4f}") print(f" γ (asymmetry): {gjr_fit.params['gamma[1]']:>8.4f}") print(f" β (persistence): {gjr_fit.params['beta[1]']:>8.4f}") print(f" α + γ (negative): {gjr_fit.params['alpha[1]'] + gjr_fit.params['gamma[1]']:>8.4f}") print(f" df (tail): {gjr_fit.params['nu']:>8.2f}") print(f" AIC: {gjr_fit.aic:>8.2f}") # Model comparison if gjr_fit.aic < garch_fit.aic: improvement = garch_fit.aic - gjr_fit.aic print(f"\n✓ GJR-GARCH preferred (AIC lower by {improvement:.2f})") print(f" Negative shocks increase volatility {(gjr_fit.params['alpha[1]'] + gjr_fit.params['gamma[1]']) / gjr_fit.params['alpha[1]']:.2f}× more") else: print(f"\n GARCH(1,1) preferred (symmetric effects)") # Visualize conditional volatility fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True) # Panel 1: Returns with ±1σ GARCH bands conditional_vol_garch = garch_fit.conditional_volatility axes[0].plot(returns_pct.index, returns_pct, linewidth=0.5, alpha=0.7, label='Returns', color='blue') axes[0].fill_between(returns_pct.index, -conditional_vol_garch, conditional_vol_garch, alpha=0.2, color='red', label='±1σ (GARCH volatility)') axes[0].set_ylabel('Return (%)', fontsize=11) axes[0].set_title('Bitcoin Returns with GARCH(1,1) Conditional Volatility', fontsize=13) axes[0].legend(fontsize=10) axes[0].grid(alpha=0.3) # Panel 2: GARCH vs GJR volatility over time conditional_vol_gjr = gjr_fit.conditional_volatility axes[1].plot(conditional_vol_garch.index, conditional_vol_garch, linewidth=1.5, color='blue', label='GARCH(1,1)', alpha=0.7) axes[1].plot(conditional_vol_gjr.index, conditional_vol_gjr, linewidth=1.5, color='red', label='GJR-GARCH', alpha=0.7) axes[1].set_xlabel('Date', fontsize=11) axes[1].set_ylabel('Conditional Volatility (%)', fontsize=11) axes[1].set_title('Time-Varying Volatility: GARCH vs GJR-GARCH', fontsize=13) axes[1].legend(fontsize=10) 
axes[1].grid(alpha=0.3) plt.tight_layout() plt.show() print("\n💡 GARCH captures volatility spikes (2018 crash, 2020 COVID, 2021 bull run)") print(" GJR-GARCH shows asymmetry: bad news increases volatility more") ``` ::: {.callout-tip} ## Interpreting GARCH Parameters **Persistence (α + β)**: - **~0.95-0.99**: High persistence: volatility shocks decay slowly (typical for crypto) - **Half-life** = ln(0.5) / ln(α + β). If α+β=0.98, half-life ≈ 35 days **Asymmetry (γ in GJR)**: - **γ > 0**: Negative shocks (bad news) increase volatility more than positive shocks - **Leverage effect**: -5% drop increases vol more than +5% rally (typical for all assets) **Degrees of freedom (df)**: - **df < 10**: Very fat tails (extreme events common) - **df → ∞**: Normal distribution (no fat tails) ::: ### Part C: News Impact Curves (Asymmetry Visualization) **Visualize how shocks of different sizes/signs affect volatility:** ```{python} # Generate news impact curves shocks = np.linspace(-10, 10, 200) # -10% to +10% returns # GARCH(1,1) impact (symmetric) alpha_garch = garch_fit.params['alpha[1]'] impact_garch = alpha_garch * shocks**2 # GJR-GARCH impact (asymmetric) alpha_gjr = gjr_fit.params['alpha[1]'] gamma_gjr = gjr_fit.params['gamma[1]'] impact_gjr = alpha_gjr * shocks**2 + gamma_gjr * (shocks < 0) * shocks**2 # Plot plt.figure(figsize=(12, 7)) plt.plot(shocks, impact_garch, linewidth=2.5, linestyle='--', color='blue', label='GARCH (symmetric)', alpha=0.8) plt.plot(shocks, impact_gjr, linewidth=2.5, color='red', label='GJR-GARCH (asymmetric)') # Highlight key points plt.axvline(0, color='black', linestyle=':', linewidth=1.5, alpha=0.5) plt.axvline(-5, color='red', linestyle=':', linewidth=1, alpha=0.5, label='Example: -5% shock') plt.axvline(+5, color='green', linestyle=':', linewidth=1, alpha=0.5, label='Example: +5% shock') # Annotate asymmetry neg5_impact = alpha_gjr * 25 + gamma_gjr * 25 pos5_impact = alpha_gjr * 25 plt.scatter([-5, 5], [neg5_impact, pos5_impact], s=150, 
c=['red', 'green'], edgecolors='black', zorder=5, alpha=0.8) plt.text(-5, neg5_impact + 0.5, f'Impact: {neg5_impact:.2f}', ha='center', fontsize=10, fontweight='bold') plt.text(5, pos5_impact + 0.5, f'Impact: {pos5_impact:.2f}', ha='center', fontsize=10, fontweight='bold') plt.xlabel('News Shock (% return)', fontsize=12) plt.ylabel('Impact on Conditional Variance', fontsize=12) plt.title('News Impact Curves: How Shocks Affect Volatility', fontsize=14, fontweight='bold') plt.legend(fontsize=11) plt.grid(alpha=0.3) plt.tight_layout() plt.show() # Calculate asymmetry ratio asymmetry_ratio = neg5_impact / pos5_impact print(f"\nAsymmetry Ratio:") print(f" -5% shock impact / +5% shock impact = {asymmetry_ratio:.2f}") print(f" → Bad news increases volatility {asymmetry_ratio:.1f}× more than good news") ``` **Interpretation**: For Bitcoin, negative shocks typically increase volatility ~1.5-2× more than positive shocks. ### Part D: Volatility Forecasting & Out-of-Sample Validation **Test if GARCH forecasts future volatility accurately (Mincer-Zarnowitz regression):** ```{python} from scipy.stats import linregress # Rolling-window forecast forecast_horizon = 22 # days (1 month ahead) train_size = 252 * 2 # 2 years training window forecasts_garch = [] realized_vols = [] print("\n" + "=" * 70) print("ROLLING-WINDOW VOLATILITY FORECASTING") print("=" * 70) print(f"Training window: {train_size} days | Forecast horizon: {forecast_horizon} days") for start in range(train_size, len(returns_pct) - forecast_horizon, forecast_horizon): # Train GARCH on historical data train_data = returns_pct.iloc[start - train_size:start] model_train = arch_model(train_data, vol='GARCH', p=1, q=1, dist='t') fit_train = model_train.fit(disp='off') # Forecast next month's volatility forecast = fit_train.forecast(horizon=forecast_horizon) forecast_vol = np.sqrt(forecast.variance.values[-1, :].mean()) # Average over horizon # Realized volatility (actual) test_data = returns_pct.iloc[start:start + 
forecast_horizon] realized_vol = test_data.std() forecasts_garch.append(forecast_vol) realized_vols.append(realized_vol) forecasts_garch = np.array(forecasts_garch) realized_vols = np.array(realized_vols) # Mincer-Zarnowitz regression: Realized = α + β × Forecast + ε slope, intercept, r_value, p_value, std_err = linregress(forecasts_garch, realized_vols) # Calculate RMSE rmse = np.sqrt(((realized_vols - forecasts_garch)**2).mean()) print(f"\nNumber of forecasts: {len(forecasts_garch)}") print(f"\nMincer-Zarnowitz Regression Results:") print(f" Intercept (α): {intercept:>8.3f} (ideal = 0)") print(f" Slope (β): {slope:>8.3f} (ideal = 1)") print(f" R²: {r_value**2:>8.3f}") print(f" RMSE: {rmse:>8.2f}%") if abs(intercept) < 1 and abs(slope - 1) < 0.2: print(f"\n✓ Forecast is approximately unbiased (α≈0, β≈1)") else: print(f"\n⚠️ Forecast shows bias (α≠0 or β≠1)") # Visualize fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) # Panel 1: Mincer-Zarnowitz scatter ax1.scatter(forecasts_garch, realized_vols, alpha=0.6, s=80, edgecolor='black', linewidth=0.5) ax1.plot([forecasts_garch.min(), forecasts_garch.max()], [forecasts_garch.min(), forecasts_garch.max()], 'r--', linewidth=2.5, label='Perfect forecast (45° line)') ax1.plot(forecasts_garch, intercept + slope * forecasts_garch, 'b-', linewidth=2.5, label=f'Fitted: y={intercept:.2f}+{slope:.2f}x (R²={r_value**2:.2f})') ax1.set_xlabel('GARCH Forecast Volatility (%)', fontsize=12) ax1.set_ylabel('Realized Volatility (%)', fontsize=12) ax1.set_title('Mincer-Zarnowitz: Forecast Accuracy', fontsize=13) ax1.legend(fontsize=10) ax1.grid(alpha=0.3) # Panel 2: Forecast errors over time errors = realized_vols - forecasts_garch ax2.plot(errors, linewidth=1.5, color='red', alpha=0.7) ax2.axhline(0, color='black', linestyle='--', linewidth=1.5) ax2.fill_between(range(len(errors)), 0, errors, alpha=0.3, color='red') ax2.set_xlabel('Forecast Period', fontsize=12) ax2.set_ylabel('Forecast Error (Realized - Forecast, %)', fontsize=12) 
ax2.set_title('Forecast Errors Over Time', fontsize=13)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 GARCH forecasts Bitcoin volatility reasonably well (R²~0.5-0.7)")
print(" But underestimates during extreme events (crashes, manias)")
```

::: {.callout-important}
## Connection to [Week 1, §0.6: Out-of-Sample Validation](../chapters/01_foundations.qmd#sec-model-selection)

**Mincer-Zarnowitz regression** tests forecast unbiasedness:

- **α = 0**: No systematic over/under-prediction
- **β = 1**: Forecast correctly captures volatility scale
- **High R²**: Forecast explains realized volatility well

This is **honest evaluation**: train on past, test on future (no look-ahead bias).
:::

### Part E: Structural Breaks Detection

**Test if high GARCH persistence (α+β ≈ 0.98) is real or an artifact of regime shifts:**

```{python}
# Step 1: Visual regime identification (rolling volatility)
rolling_vol = returns_pct.rolling(window=30).std() * np.sqrt(252)

plt.figure(figsize=(14, 7))
plt.plot(rolling_vol.index, rolling_vol, linewidth=1.5, color='blue',
         label='30-day rolling volatility')

# Regime thresholds
calm_threshold = 30
turbulent_threshold = 60

plt.axhline(calm_threshold, color='green', linestyle='--', linewidth=2, alpha=0.7,
            label=f'Calm threshold ({calm_threshold}%)')
plt.axhline(turbulent_threshold, color='red', linestyle='--', linewidth=2, alpha=0.7,
            label=f'Turbulent threshold ({turbulent_threshold}%)')
plt.fill_between(rolling_vol.index, 0, calm_threshold,
                 alpha=0.1, color='green', label='Calm regime')
plt.fill_between(rolling_vol.index, turbulent_threshold, rolling_vol.max(),
                 alpha=0.1, color='red', label='Turbulent regime')

plt.xlabel('Date', fontsize=12)
plt.ylabel('Annualized Volatility (%)', fontsize=12)
plt.title('Bitcoin Rolling Volatility: Regime Identification', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate regime statistics (share of days with a valid rolling estimate)
calm_pct = (rolling_vol < calm_threshold).sum() / len(rolling_vol.dropna()) * 100
turbulent_pct = (rolling_vol > turbulent_threshold).sum() / len(rolling_vol.dropna()) * 100

print("\n" + "=" * 70)
print("REGIME IDENTIFICATION")
print("=" * 70)
print(f"\nRegime Statistics:")
print(f" Calm regime (<{calm_threshold}% vol): {calm_pct:>6.1f}% of days")
print(f" Normal regime ({calm_threshold}-{turbulent_threshold}% vol): {100 - calm_pct - turbulent_pct:>6.1f}% of days")
print(f" Turbulent regime (>{turbulent_threshold}% vol): {turbulent_pct:>6.1f}% of days")

# Step 2: Sub-period GARCH comparison
mid_point = len(returns_pct) // 2
returns_first = returns_pct.iloc[:mid_point]
returns_second = returns_pct.iloc[mid_point:]

# Fit GARCH to each sub-period
model_first = arch_model(returns_first, vol='GARCH', p=1, q=1, dist='t')
garch_first = model_first.fit(disp='off')
persistence_first = garch_first.params['alpha[1]'] + garch_first.params['beta[1]']

model_second = arch_model(returns_second, vol='GARCH', p=1, q=1, dist='t')
garch_second = model_second.fit(disp='off')
persistence_second = garch_second.params['alpha[1]'] + garch_second.params['beta[1]']

# Full-sample persistence (from earlier)
persistence_full = garch_fit.params['alpha[1]'] + garch_fit.params['beta[1]']

print("\n" + "=" * 70)
print("STRUCTURAL BREAKS TEST: SUB-PERIOD GARCH PERSISTENCE")
print("=" * 70)
print(f"\nFull sample persistence: α+β = {persistence_full:.4f}")
print(f"First half persistence: α+β = {persistence_first:.4f}")
print(f"Second half persistence: α+β = {persistence_second:.4f}")
print(f"Absolute difference: Δ = {abs(persistence_first - persistence_second):.4f}")

if abs(persistence_first - persistence_second) > 0.05:
    print(f"\n⚠️ LARGE difference suggests REGIME SHIFTS, not true persistence!")
    print(f" Full-sample GARCH overestimates persistence by confusing regime changes with gradual decay.")
else:
    print(f"\n✓ Similar persistence suggests GARCH model is stable across periods.")

# Model comparison (AIC)
print(f"\nModel Comparison (AIC, lower = better):")
print(f" Full-sample GARCH: {garch_fit.aic:.2f}")
print(f" Sub-periods total: {garch_first.aic + garch_second.aic:.2f}")

if (garch_first.aic + garch_second.aic) < garch_fit.aic:
    improvement = garch_fit.aic - (garch_first.aic + garch_second.aic)
    print(f"\n✓ Sub-period models fit BETTER (AIC improvement: {improvement:.2f})")
    print(f" → Evidence of structural breaks / regime shifts")
else:
    print(f"\n Full-sample model fits as well (no strong evidence of breaks)")

print("\n💡 Key finding: If sub-period persistence is lower than full-sample persistence,")
print(" full-sample GARCH is likely OVERESTIMATING true persistence!")
```

::: {.callout-warning}
## Implication: GARCH Persistence Partly Spurious

If sub-period persistence is substantially lower than the full-sample value (e.g., 0.93 vs 0.98), this suggests:

1. **Regime shifts exist**: Bitcoin alternates between calm and turbulent volatility states
2. **Full-sample GARCH confuses regimes with persistence**: It mistakes regime changes for gradual decay
3. **Better models needed**: Markov-switching GARCH, regime-dependent models, threshold models

**Practical impact**: Risk models using single-regime GARCH **overestimate** how long volatility shocks persist → wrong hedge ratios, wrong VaR forecasts.

See [Ch 03, §3.5: Structural Breaks](../chapters/03_volatility_modelling.qmd#sec-structural-breaks)
:::

### Reflection Questions (Exercise 4)

Write 250-300 words addressing:

1. **GARCH vs GJR**: Did asymmetric GJR-GARCH fit better than symmetric GARCH? What does the asymmetry parameter (γ) tell you about Bitcoin's volatility response to good vs bad news?
2. **Forecast accuracy**: How well did GARCH forecast future volatility (R² from Mincer-Zarnowitz)? When did forecasts fail most badly (look at the error plot)?
3. **Structural breaks**: Did sub-period persistence differ from full-sample? If yes, what does this imply about using full-sample GARCH for risk management?
4. **Practical implications**: If you were designing a crypto risk model, would you use single-regime GARCH or a regime-switching model? Justify your choice.

---

## Summary and Integration

### What We've Learned

Through these exercises, you've:

1. **Accessed real cryptocurrency market data** using public APIs, experiencing data quality challenges and fragmentation
2. **Quantified extreme volatility** (60-80% annualized) that makes cryptocurrency unsuitable as a currency and challenging as an investment
3. **Documented fat tail distributions** that violate normal distribution assumptions and cause standard risk models to underestimate tail risk
4. **Measured high correlations** within crypto (0.5-0.8) limiting diversification benefits
5. **Tested market efficiency** finding mixed evidence: some weak predictability but likely not exploitable after costs
6. **Evaluated inclusion claims** implicitly through data analysis: if crypto were banking the unbanked, we'd see different adoption and usage patterns

### Connections to Course Themes

- **Week 2 (APIs)**: Cryptocurrency data is openly accessible via APIs, democratizing financial data but creating standardization challenges
- **Week 3 (Platforms)**: Exchanges are platforms matching buyers/sellers; fragmentation creates arbitrage opportunities but also liquidity challenges
- **Week 6 (Financial Inclusion)**: Mobile money (M-Pesa) showed rigorous welfare evidence; cryptocurrency shows speculative usage among wealthy
- **Week 8 (Blockchain)**: Next week explores blockchain technology and fraud detection more deeply

### Critical Evaluation Framework

When evaluating cryptocurrency or any FinTech innovation:

1. **Examine actual data** (adoption, usage, outcomes) versus marketing claims
2. **Measure risks quantitatively** (volatility, correlations, tail risk)
3. **Compare to alternatives** (mobile money, traditional finance)
4. **Demand welfare evidence** (does it help intended beneficiaries?)
5. **Account for barriers** (technical, knowledge, economic)

### Assessment Preparation

If your assessment involves a short research report or reflective analysis, this lab gives you two strong pathways:

- Empirical analysis of crypto returns (momentum, volatility, correlations, tail risk)
- Evidence-based evaluation of “crypto for inclusion” claims using data, mechanisms, and limitations

### Further Exploration

If interested in extending your analysis:

- **Cross-asset correlations**: Download S&P 500 or gold data; analyze Bitcoin-equity correlation dynamics
- **Volatility forecasting**: Implement GARCH models to forecast future volatility
- **Arbitrage opportunities**: Compare prices across multiple exchanges in real-time
- **DeFi analysis**: Examine yield farming APYs, liquidity pool dynamics, or stablecoin deviations from peg
- **On-chain metrics**: Analyze blockchain data (active addresses, transaction volumes) as predictors

---

**Excellent work! You've completed rigorous empirical analysis of cryptocurrency markets, connecting data to theory and claims to evidence.**
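
Optional follow-up to Part E: the persistence gap quoted there (for example 0.93 vs 0.98) is easier to interpret as a shock half-life. In a GARCH(1,1), the gap between the conditional variance forecast and its long-run level shrinks by a factor of α+β each day, so the half-life is ln(0.5) / ln(α+β). The sketch below is a minimal illustration under that assumption; `shock_half_life` is a hypothetical helper name, and the two inputs are the illustrative values from the Part E callout, not estimates from your data.

```python
import math


def shock_half_life(persistence: float) -> float:
    """Days until a variance shock has decayed halfway back to its long-run level.

    Assumes a stationary GARCH(1,1), where `persistence` is alpha + beta.
    """
    if not 0 < persistence < 1:
        raise ValueError("persistence (alpha + beta) must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)


# Illustrative values only (taken from the Part E callout, not from fitted models)
for p in (0.93, 0.98):
    print(f"alpha+beta = {p:.2f} -> half-life ≈ {shock_half_life(p):.1f} days")
```

With these inputs the half-life is roughly 10 days at 0.93 and roughly 34 days at 0.98, which is why a seemingly small difference in persistence matters for multi-day VaR horizons. To apply it to your own results, pass in `persistence_first`, `persistence_second`, and `persistence_full` from Part E.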