Causal Time Series Analysis for Financial Data

Author

Professor Barry Quinn

Financial time series analysis has evolved beyond traditional correlation-based methods to embrace causal reasoning. This chapter combines the technical mastery of time series analysis from “Python for Finance” with the analytical rigor of causal inference from “Causal AI.” You’ll learn not only to identify patterns in financial time series but also to understand the underlying causal mechanisms that drive market movements, a crucial skill for final-year finance students entering an AI-driven industry.

0.1 The Evolution of Financial Time Series Analysis

Traditional time series analysis focuses on patterns, trends, and correlations. While powerful, these methods can lead to spurious relationships and overfitted models. Modern financial analysis requires a deeper understanding of causality:

  • Traditional Question: “What patterns can we find in stock prices?”
  • Causal Question: “What actually causes stock price movements, and how can we distinguish real drivers from spurious correlations?”
Why Causal Time Series Analysis Matters

Consider the 2008 financial crisis: Many models predicted continued growth based on historical correlations. A causal approach would have questioned whether these correlations reflected true causal relationships or were artifacts of a specific economic regime. Understanding causality helps build more robust models that perform better during regime changes.

0.2 Integrated Approach: Technical Skills + Causal Reasoning

This chapter integrates two complementary approaches:

0.2.1 Technical Foundation (Python for Finance - Chapter 8)

  • Master pandas for time series manipulation
  • Implement ARIMA, GARCH, and advanced models
  • Build high-performance time series pipelines
  • Create professional financial visualizations

0.2.2 Causal Enhancement (Causal AI - Chapters 3-4)

  • Build causal graphs (DAGs) for financial relationships
  • Test causal assumptions in time series data
  • Implement interventional analysis
  • Distinguish correlation from causation over time

0.3 Practical Example: Traditional vs. Causal Time Series Analysis

Let’s demonstrate the integration using a real financial scenario: analyzing the relationship between interest rates, inflation, and stock market performance.

0.3.1 Setup: Data Acquisition and Preparation

# Import libraries from both textbooks
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from datetime import datetime, timedelta

# Causal inference libraries
import dowhy
from dowhy import CausalModel
from pgmpy.models import BayesianNetwork
# Note: pgmpy has no plotting module; DAGs can be drawn with networkx if needed

# Time series libraries
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, grangercausalitytests
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Download financial time series data
def get_financial_data():
    """
    Download key financial time series for causal analysis
    """
    # Note: newer yfinance versions auto-adjust prices by default; request the
    # unadjusted columns so 'Adj Close' is present, and squeeze to a Series
    # Stock market index (S&P 500)
    sp500 = yf.download('^GSPC', start='2010-01-01', end='2024-01-01',
                        auto_adjust=False)['Adj Close'].squeeze()
    
    # VIX (Volatility Index)
    vix = yf.download('^VIX', start='2010-01-01', end='2024-01-01',
                      auto_adjust=False)['Adj Close'].squeeze()
    
    # 10-Year Treasury Rate (proxy for interest rates)
    treasury = yf.download('^TNX', start='2010-01-01', end='2024-01-01',
                           auto_adjust=False)['Adj Close'].squeeze()
    
    # Combine into single DataFrame
    data = pd.DataFrame({
        'SP500': sp500,
        'VIX': vix,
        'Treasury_Rate': treasury
    }).dropna()
    
    # Calculate returns and changes
    data['SP500_Returns'] = data['SP500'].pct_change()
    data['VIX_Change'] = data['VIX'].pct_change()
    data['Rate_Change'] = data['Treasury_Rate'].diff()
    
    return data.dropna()

# Load and prepare data
financial_data = get_financial_data()
print("Data loaded successfully!")
print(f"Data shape: {financial_data.shape}")
print(f"Date range: {financial_data.index.min()} to {financial_data.index.max()}")

0.3.2 Traditional Time Series Analysis

# Traditional approach: Correlation and Granger Causality
def traditional_analysis(data):
    """
    Perform traditional time series analysis
    """
    # 1. Correlation Analysis
    correlation_matrix = data[['SP500_Returns', 'VIX_Change', 'Rate_Change']].corr()
    
    plt.figure(figsize=(12, 5))
    
    # Plot 1: Correlation heatmap
    plt.subplot(1, 2, 1)
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
    plt.title('Traditional Correlation Analysis')
    
    # Plot 2: Time series
    plt.subplot(1, 2, 2)
    plt.plot(data.index, data['SP500_Returns'], label='S&P 500 Returns', alpha=0.7)
    plt.plot(data.index, data['VIX_Change']/10, label='VIX Change (scaled)', alpha=0.7)
    plt.plot(data.index, data['Rate_Change']*10, label='Rate Change (scaled)', alpha=0.7)
    plt.legend()
    plt.title('Financial Time Series')
    plt.xticks(rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # 2. Granger Causality Tests
    print("\\n=== TRADITIONAL GRANGER CAUSALITY TESTS ===")
    
    # Test if VIX changes Granger-cause S&P 500 returns
    granger_data = data[['SP500_Returns', 'VIX_Change']].dropna()
    granger_result = grangercausalitytests(granger_data, maxlag=5, verbose=False)
    
    print("VIX -> S&P 500 Returns:")
    for lag in range(1, 6):
        p_value = granger_result[lag][0]['ssr_ftest'][1]
        print(f"  Lag {lag}: p-value = {p_value:.4f}")
    
    return correlation_matrix

# Run traditional analysis
traditional_results = traditional_analysis(financial_data)

0.3.3 Enhanced Causal Analysis

# Causal approach: Directed Acyclic Graph (DAG) and Causal Inference
def causal_analysis(data):
    """
    Perform causal analysis using DAGs and causal inference
    """
    # 1. Define causal graph based on economic theory
    #    (node names must match the column names in the analysis DataFrame)
    causal_graph = """
    digraph {
        "Economic_Conditions" -> "Rate_Change";
        "Economic_Conditions" -> "Market_Sentiment";
        "Market_Sentiment" -> "VIX_Change";
        "Market_Sentiment" -> "SP500_Returns";
        "Rate_Change" -> "SP500_Returns";
        "VIX_Change" -> "SP500_Returns";
    }
    """
    
    # 2. Prepare data for causal analysis
    # Add simulated economic conditions (in practice, use real economic indicators)
    causal_data = data.copy()
    causal_data['Economic_Conditions'] = np.random.normal(0, 1, len(data))
    causal_data['Market_Sentiment'] = np.random.normal(0, 1, len(data))
    
    # Select relevant columns
    analysis_data = causal_data[['SP500_Returns', 'VIX_Change', 'Rate_Change', 
                                'Economic_Conditions', 'Market_Sentiment']].dropna()
    
    # 3. Build causal model
    model = CausalModel(
        data=analysis_data,
        treatment='VIX_Change',
        outcome='SP500_Returns',
        graph=causal_graph
    )
    
    # 4. Identify causal effect
    identified_estimand = model.identify_effect()
    print("\\n=== CAUSAL ANALYSIS RESULTS ===")
    print("Causal Identification:")
    print(identified_estimand)
    
    # 5. Estimate causal effect
    causal_estimate = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.linear_regression"
    )
    
    print(f"\\nCausal Effect of VIX on S&P 500 Returns: {causal_estimate.value:.6f}")
    print(f"Traditional Correlation: {analysis_data['VIX_Change'].corr(analysis_data['SP500_Returns']):.6f}")
    
    # 6. Validate causal model
    refutation = model.refute_estimate(
        identified_estimand, 
        causal_estimate, 
        method_name="random_common_cause"
    )
    print(f"\\nRobustness Check - Random Common Cause:")
    print(f"New Effect: {refutation.new_effect:.6f}")
    
    return model, causal_estimate

# Run causal analysis
causal_model, causal_results = causal_analysis(financial_data)

0.3.4 Comparison and Insights

def compare_approaches(traditional_corr, causal_effect):
    """
    Compare traditional and causal approaches
    """
    print("\\n=== COMPARISON: TRADITIONAL vs CAUSAL ===")
    
    vix_sp_corr = traditional_corr.loc['VIX_Change', 'SP500_Returns']
    
    print(f"Traditional Correlation (VIX ↔ S&P 500): {vix_sp_corr:.6f}")
    print(f"Causal Effect (VIX → S&P 500): {causal_effect.value:.6f}")
    
    print("\\nKey Insights:")
    print("1. Correlation measures statistical association")
    print("2. Causal effect measures actual impact of intervention")
    print("3. They can differ significantly due to confounding factors")
    print("4. Causal analysis helps identify true drivers vs. spurious correlations")
    
    # Practical implications
    print("\\nPractical Implications for Finance:")
    print("• Portfolio Management: Focus on causal drivers, not just correlations")
    print("• Risk Management: Identify true risk sources")
    print("• Trading Strategies: Build strategies on causal relationships")
    print("• Regulatory Compliance: Explain model decisions with causal reasoning")

# Compare results
compare_approaches(traditional_results, causal_results)
Integration Benefits for Final Year Students

This integrated approach provides several advantages:

  1. Technical Proficiency: Master industry-standard time series methods
  2. Analytical Rigor: Understand when correlations reflect true causation
  3. Career Preparation: Develop skills valued by modern financial institutions
  4. Research Skills: Conduct more robust financial research
  5. Regulatory Compliance: Meet increasing demands for explainable AI

0.4 Advanced Applications: Structural Breaks and Regime Changes

One area where causal analysis particularly shines is in understanding structural breaks and regime changes in financial markets:

# Example: Analyzing COVID-19 impact using causal intervention analysis
def analyze_structural_break():
    """
    Demonstrate causal analysis of structural breaks
    """
    # This would typically involve:
    # 1. Identifying break points using statistical tests
    # 2. Building separate causal models for different regimes
    # 3. Analyzing how causal relationships changed
    # 4. Predicting future behavior under different scenarios
    
    print("Advanced Topic: Structural Break Analysis")
    print("• Traditional methods: Identify break points statistically")
    print("• Causal enhancement: Understand WHY relationships changed")
    print("• Applications: COVID-19, financial crises, policy changes")

analyze_structural_break()
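
To make this concrete, here is a minimal sketch of step 2 above: splitting the sample at an assumed break date (2020-03-01, around the COVID-19 shock, chosen for illustration rather than estimated statistically) and comparing the VIX-to-S&P 500 relationship across the two regimes. It assumes the financial_data DataFrame from the setup section is still in scope.

# Minimal sketch: compare the VIX -> S&P 500 relationship before and after a
# hypothetical break date. The break date is illustrative, not estimated.
import statsmodels.api as sm

break_date = '2020-03-01'
pre = financial_data.loc[:break_date]
post = financial_data.loc[break_date:]

def fit_regime(df, label):
    """Fit a simple OLS of S&P 500 returns on VIX changes for one regime."""
    X = sm.add_constant(df['VIX_Change'])
    res = sm.OLS(df['SP500_Returns'], X).fit()
    print(f"{label}: beta(VIX_Change) = {res.params['VIX_Change']:.4f} "
          f"(p = {res.pvalues['VIX_Change']:.4f}, n = {int(res.nobs)})")
    return res

pre_res = fit_regime(pre, "Pre-break ")
post_res = fit_regime(post, "Post-break")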

1 Why Study Financial Time Series Econometrics?

At its core, financial time series data is a collection of observations recorded sequentially over time. It encompasses a broad spectrum of data types, including daily stock prices, monthly interest rates, annual GDP figures, and more. Each data point in a time series bears a timestamp, reflecting its unique position in the temporal sequence. This inherent time-dependency is what sets financial time series apart from other statistical data, introducing complexities like trends, seasonality, and autocorrelation.

Time series analysis is the linchpin of economic forecasting. By dissecting historical data, analysts unlock patterns and rhythms—trends, seasonal effects, and cycles. These insights are instrumental in projecting future economic scenarios, informing decisions in areas like portfolio management, risk mitigation, and economic policy development.

The financial markets are a fertile ground for the application of time series analysis. Techniques like ARIMA modeling, volatility forecasting, and cointegration analysis are employed to predict stock prices, evaluate risks, and unearth trading signals. Traders scrutinize past price trajectories to anticipate future movements, while risk managers use time series data to gauge market volatility and shield against potential downturns.

Time series analysis is a cornerstone of modern investment strategy. Investors and portfolio managers rely on these analyses to track market trends, gauge asset performance, and time their buy-and-sell decisions. Sophisticated techniques like GARCH models for volatility forecasting and VAR models for understanding the dynamic interplay between multiple financial variables are integral in shaping well-informed, resilient investment portfolios.

1.1 Challenges and Considerations

While financial time series analysis provides powerful insights, it comes with challenges. Financial markets are influenced by a myriad of factors - economic indicators, political events, investor sentiment - making modeling and prediction complex. Analysts must be wary of overfitting models and remain vigilant to changing market dynamics. Moreover, the assumption of stationarity in time series data often requires careful examination and potential transformation of the data.

Financial time series data is a gateway to deeper insights into the financial universe. Its analysis, through a blend of statistical techniques and domain expertise, equips finance professionals with the tools to navigate the complexities of financial markets. From predicting stock prices to understanding economic trends, time series analysis is an indispensable part of financial decision-making.

In this chapter, we will delve deeper into the methodologies and tools of financial time series analysis. We will explore various models, from simple moving averages to complex ARIMA and GARCH models, and discuss their applications in real-world financial scenarios. The goal is to equip readers with a comprehensive understanding of time series analysis, enabling them to apply these concepts effectively in their professional endeavors in finance.

2 Characteristics of Financial Time Series Data

Financial time series data exhibit unique characteristics that differentiate them from other types of data, making their analysis and modeling crucial for anyone involved in financial markets research or practice. Understanding these features is essential as they not only describe the behavior of financial data but also guide the selection of appropriate quantitative methods.

2.1 Volatility Clustering (VC)

One of the most prominent features of financial time series is volatility clustering (VC): periods of high volatility tend to be followed by further high-volatility periods, and calm periods by further calm periods (Engle, 1995). This phenomenon can be captured by the GARCH model, a generalization of the Autoregressive Conditional Heteroscedasticity (ARCH) model, which allows volatility to vary over time. VC is particularly evident in stock market data (Bollerslev & Engle, 1992), where large price changes are often followed by changes of similar magnitude.

Behavioral Economics Perspective: Volatility clustering can be influenced by investor reactions to news or market events, where overreactions or underreactions to new information can lead to periods of heightened or reduced volatility. This behavioral response is often modeled through investor sentiment and its impact on market dynamics.

2.2 Leverage Effects (LE)

Leverage effects (LE) occur when negative asset returns increase volatility more than positive returns of the same magnitude. This asymmetric volatility challenges the assumption of constant volatility in traditional financial models. LE can be explained by the relationship between firm leverage and stock volatility: declining stock prices increase the debt-to-equity ratio, making the firm riskier.

Behavioral Economics Perspective: Leverage effects can be partially attributed to investor psychology, where negative news or declining prices trigger more emotional responses than positive news, leading to increased trading activity and higher volatility during market downturns.
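
A common empirical check for leverage effects is to fit an asymmetric volatility model such as GJR-GARCH and inspect the asymmetry term. The sketch below is illustrative rather than definitive: the ticker (AAPL) and sample window are assumptions, and a positive, significant gamma coefficient would merely be consistent with a leverage effect.

# Minimal sketch: test for leverage effects with a GJR-GARCH(1,1,1) model.
# A positive, significant 'gamma[1]' term is consistent with negative returns
# raising volatility more than positive returns of the same size.
import yfinance as yf
from arch import arch_model

prices = yf.download('AAPL', start='2020-01-01', end='2024-01-01',
                     auto_adjust=False)['Adj Close'].squeeze()
returns_pct = prices.pct_change().dropna() * 100  # percent returns for numerical stability

gjr = arch_model(returns_pct, vol='GARCH', p=1, o=1, q=1).fit(disp='off')
print(gjr.params)   # inspect the 'gamma[1]' asymmetry coefficient
print(gjr.pvalues)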

2.3 Heavy Tails and Excess Kurtosis

Financial returns often exhibit heavy tails, meaning that extreme events (both positive and negative) occur more frequently than would be predicted by a normal distribution. This characteristic is captured by excess kurtosis, where the distribution has “fatter” tails than the normal distribution. Heavy tails are crucial for risk management, as they indicate that extreme losses (or gains) are more likely than traditional models suggest.

Behavioral Economics Perspective: Heavy tails in financial returns can be linked to herding behavior and market panics, where investors collectively react to extreme events, amplifying their impact and creating more frequent extreme outcomes than would occur under purely rational decision-making.

2.4 Mean Reversion

Mean reversion is the tendency for asset prices or returns to move back toward their long-term average over time. This characteristic suggests that periods of unusually high or low performance are often followed by periods of more typical performance. Mean reversion is particularly important for long-term investment strategies and portfolio rebalancing decisions.

Behavioral Economics Perspective: Mean reversion can be influenced by investor overconfidence and momentum trading, where initial price movements are amplified by trend-following behavior, but eventually, fundamental values reassert themselves as rational analysis prevails.
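
A simple first check for short-horizon mean reversion is to regress returns on their own lag: a significantly negative slope is consistent with reversion, while a positive slope points to momentum. The sketch below is illustrative only; the ticker and sample window are assumptions, and daily returns of a single stock are a noisy test of the idea.

# Minimal sketch: AR(1)-style regression of returns on lagged returns.
# A significantly negative slope suggests short-horizon mean reversion.
import yfinance as yf
import statsmodels.api as sm

prices = yf.download('AAPL', start='2020-01-01', end='2024-01-01',
                     auto_adjust=False)['Adj Close'].squeeze()
returns = prices.pct_change().dropna()

lagged = returns.shift(1).dropna()          # yesterday's return
aligned = returns.loc[lagged.index]         # today's return, aligned to the lag

X = sm.add_constant(lagged)
res = sm.OLS(aligned, X).fit()
print(f"AR(1) slope: {res.params.iloc[1]:.4f}  (p-value: {res.pvalues.iloc[1]:.4f})")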

2.5 Non-stationarity

Most financial time series exhibit non-stationarity, meaning their statistical properties (mean, variance, covariance) change over time. Non-stationarity can manifest as trends, structural breaks, or changing volatility patterns. This characteristic poses challenges for traditional statistical models that assume stationarity and requires special treatment through techniques like differencing or cointegration analysis.

Behavioral Economics Perspective: Non-stationarity in financial markets can reflect changing investor preferences, evolving market structures, regulatory changes, and shifts in economic fundamentals, all of which influence how investors process information and make decisions over time.

2.6 Time Series Modeling Approaches

In financial data analysis, time series data often exhibit patterns, trends, and fluctuations that require appropriate modeling and processing techniques to extract meaningful insights. Two commonly used approaches are ARIMA (Autoregressive Integrated Moving Average) modeling and smoothing techniques.

2.6.1 ARIMA Modeling

ARIMA models are a class of statistical models widely used for time series forecasting and analysis. These models aim to describe the autocorrelations in the data by combining autoregressive (AR) and moving average (MA) components, along with differencing to handle non-stationarity.

The key aspects of ARIMA modeling are:

  1. Stationarity: ARIMA models require the series (after any differencing) to be stationary, meaning that its statistical properties (mean, variance, and autocorrelation) remain constant over time. If the data is non-stationary, differencing is applied to achieve stationarity.

  2. Autocorrelation: ARIMA models capture the data’s autocorrelation structure, where future values are influenced by past values and/or past errors.

  3. Model Identification: The ARIMA model is specified by three parameters: p (order of the autoregressive component), d (degree of differencing), and q (order of the moving average component). These parameters are determined through an iterative model identification, estimation, and diagnostic checking process.

  4. Forecasting: Once an appropriate ARIMA model is identified and estimated, it can generate forecasts for future periods.

ARIMA models are suitable when the goal is to capture the underlying patterns and dynamics of the time series data, including trends, seasonality, and autocorrelation structures. They are widely used in finance for forecasting stock prices, exchange rates, and economic indicators.

2.6.2 Smoothing Techniques

Smoothing techniques reduce the noise or irregularities in time series data, revealing the underlying trend or signal. These techniques do not explicitly model the autocorrelation structure but rather apply filters or weighted averages to smooth out the fluctuations.

Standard smoothing techniques include:

  1. Simple Moving Average (SMA):
    • Description: Calculates the average of a fixed number of data points over a specified window
    • Formula: SMA(t) = (y(t) + y(t-1) + … + y(t-n+1)) / n
    • Applications: Widely used in technical analysis for identifying trends and generating trading signals
    • Advantages: Easy to understand and implement, effective for removing high-frequency noise
    • Limitations: Introduces lag in the smoothed series, sensitive to outliers
  2. Exponential Moving Average (EMA):
    • Description: Assigns exponentially decreasing weights to older data points, giving more importance to recent observations
    • Formula: EMA(t) = α * y(t) + (1 - α) * EMA(t-1)
    • Applications: Commonly used in technical analysis, forecasting, and signal processing
    • Advantages: Responds more quickly to changes, has less lag than SMA
    • Limitations: Requires tuning the smoothing parameter (α)
  3. Weighted Moving Average (WMA):
    • Description: Assigns different weights to data points within the window
    • Applications: Used in technical analysis and trend analysis
    • Advantages: Allows for customized weighting schemes
    • Limitations: Requires careful selection of weights
  4. Kalman Filter:
    • Description: A recursive algorithm that estimates the unobserved state of a dynamic system from noisy observations
    • Applications: Portfolio optimization, risk management, and forecasting
    • Advantages: Optimal for linear systems with Gaussian noise, handles missing data
    • Limitations: Assumes linear system model and Gaussian noise

The choice between ARIMA modeling and smoothing techniques depends on the specific objectives and characteristics of the financial time series data. ARIMA models are more appropriate if the goal is to forecast future values while accounting for autocorrelation and capturing underlying patterns. However, smoothing techniques may be more suitable if the focus is on denoising the data and revealing the underlying trend or signal.
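
To make the smoothing techniques listed above concrete, the sketch below computes a simple, an exponential, and a weighted moving average of a price series with pandas. The ticker, window length, and linearly increasing weights are illustrative assumptions, not recommendations.

# Minimal sketch: SMA, EMA, and WMA smoothing of a price series with pandas.
import numpy as np
import pandas as pd
import yfinance as yf

prices = yf.download('AAPL', start='2022-01-01', end='2024-01-01',
                     auto_adjust=False)['Adj Close'].squeeze()

window = 20
sma = prices.rolling(window=window).mean()            # simple moving average
ema = prices.ewm(span=window, adjust=False).mean()    # exponential moving average

# Weighted moving average with linearly increasing weights (most recent heaviest)
weights = np.arange(1, window + 1, dtype=float)
wma = prices.rolling(window=window).apply(
    lambda x: np.dot(x, weights) / weights.sum(), raw=True
)

smoothed = pd.DataFrame({'Price': prices, 'SMA': sma, 'EMA': ema, 'WMA': wma})
print(smoothed.tail())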

2.7 Stationarity and Unit Roots in Financial Time Series

Understanding the concepts of stationarity and unit roots is fundamental in financial time series analysis. These concepts are critical in selecting appropriate models for analysis and ensuring the reliability of statistical inferences.

2.7.1 Stationarity

  • Definition: A time series is said to be stationary if its statistical properties, such as mean, variance, and autocorrelation, are constant over time. In finance, this implies that the distribution of the series does not shift over time, for example through trends or changing volatility.
  • Importance: Stationarity is a key assumption in many time series models. Non-stationary data can lead to spurious results in statistical tests and forecasts.
  • Testing for Stationarity: Common tests include the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.

2.7.2 Unit Roots

  • Definition: A unit root is a characteristic of a time series that makes it non-stationary. The presence of a unit root means that shocks have permanent effects, as in a random walk (possibly with drift).
  • Detection: Unit roots can be detected using tests such as the ADF test, where the null hypothesis is that the time series has a unit root (see the simulated example after this list).
  • Implications: Time series with unit roots require differencing or other transformations to achieve stationarity before further analysis.
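
The contrast between a unit-root process and a stationary one is easiest to see on simulated data, where the truth is known. The following sketch applies the ADF test to a simulated random walk and to a stationary AR(1) series; the sample size and AR coefficient are arbitrary choices for illustration.

# Minimal sketch: ADF test on a simulated random walk (unit root) versus a
# stationary AR(1) process. The random walk should fail to reject the null of
# a unit root; the AR(1) series should reject it.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
n = 1000
shocks = rng.normal(size=n)

random_walk = np.cumsum(shocks)     # y_t = y_{t-1} + e_t  (unit root)
ar1 = np.zeros(n)                   # y_t = 0.5 * y_{t-1} + e_t  (stationary)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

for name, series in [('Random walk', random_walk), ('AR(1), phi=0.5', ar1)]:
    stat, pvalue, *_ = adfuller(series, autolag='AIC')
    print(f"{name}: ADF statistic = {stat:.3f}, p-value = {pvalue:.4f}")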

2.8 Theory Behind Reproducibility and Replication

2.8.1 Replication

Replicability refers to the ability to duplicate the results of a study by using the same methodology but with different data sets. In other words, if other researchers follow the same procedures and methods but use new data, they should arrive at similar findings or conclusions. In financial data analytics, this is particularly important because financial models and algorithms should be robust and consistent across different data sets. For instance, a risk assessment model should yield reliable and consistent risk evaluations across various market conditions or customer profiles.

2.8.2 Reproducibility

Reproducibility, on the other hand, refers to the ability to recreate the results of a study by using the same methodology and the same data. It’s about the precision in the replication of the original study’s setup, including the data and the computational procedures. In economics and finance, reproducibility ensures that if another researcher or practitioner uses the same data and follows the same steps, they would arrive at the same results. This is crucial for validating the findings of financial models, statistical analyses, or data-driven research.

2.8.2.1 Nuances and Differences

  1. Data Used: The key difference lies in the data used. Replicability involves different datasets, whereas reproducibility uses the original dataset.

  2. Purpose:

    • Replicability tests the generalizability and robustness of the findings or models across different scenarios or datasets.
    • Reproducibility ensures the accuracy and reliability of the original study’s findings and methods.
  3. Challenges:

    • Replicability can be challenging due to varying data quality, different market conditions, or evolving financial landscapes.
    • Reproducibility can be hindered by incomplete documentation, software version differences, or proprietary data access issues.
  4. Importance in Finance:

    • Replicability is vital for model validation and ensuring that financial strategies work across different market conditions.
    • Reproducibility is essential for regulatory compliance, peer review, and building trust in financial research and analysis.

Both replicability and reproducibility are fundamental to the credibility and reliability of research and analysis in economics and finance. They ensure that findings are not just one-time occurrences but represent genuine insights that can be trusted and built upon by the broader academic and professional community.
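
In practice, reproducibility is largely a matter of pinning down the sources of variation: fixing random seeds and recording the data window and the package versions used. The sketch below shows one way to do that housekeeping; the libraries listed are simply those used in this chapter.

# Minimal sketch: record the environment and fix seeds so an analysis can be
# re-run exactly. Recording versions helps diagnose differences when results
# cannot be reproduced on another machine.
import sys
import numpy as np
import pandas as pd
import statsmodels

np.random.seed(42)  # fix the NumPy seed used for any simulated inputs

print(f"Python:      {sys.version.split()[0]}")
print(f"numpy:       {np.__version__}")
print(f"pandas:      {pd.__version__}")
print(f"statsmodels: {statsmodels.__version__}")
print("Data window: 2020-01-01 to 2024-01-01")  # document the sample period used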

2.9 Practical Implementation with Python

To demonstrate these concepts, let’s consider a practical example using Python’s powerful ecosystem for time series analysis.

Suppose we want to analyze the daily closing prices of a stock (e.g., Apple Inc.). We can employ time series models to forecast future prices, assess volatility, or identify trends.

# Essential imports for time series analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Statistical and time series libraries
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from arch import arch_model
import scipy.stats as stats

# Set random seed for reproducibility
np.random.seed(42)

# Fetch Apple stock data
print("Fetching AAPL stock data...")
# auto_adjust=False keeps the 'Adj Close' column available in newer yfinance versions
aapl = yf.download('AAPL', start='2020-01-01', end='2024-01-01', auto_adjust=False)
aapl_close = aapl['Adj Close'].squeeze()

print(f"Data shape: {aapl_close.shape}")
print(f"Date range: {aapl_close.index.min()} to {aapl_close.index.max()}")
# Basic time series visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# Plot closing prices
axes[0].plot(aapl_close.index, aapl_close.values, linewidth=1.5, color='blue')
axes[0].set_title('AAPL Closing Prices', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].grid(True, alpha=0.3)

# Calculate and plot returns
aapl_returns = aapl_close.pct_change().dropna()
axes[1].plot(aapl_returns.index, aapl_returns.values, linewidth=0.8, color='red', alpha=0.7)
axes[1].set_title('AAPL Daily Returns', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Daily Return')
axes[1].set_xlabel('Date')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Basic statistics
print("AAPL Returns Statistics:")
print(f"Mean daily return: {aapl_returns.mean():.6f}")
print(f"Daily volatility: {aapl_returns.std():.6f}")
print(f"Annualized return: {aapl_returns.mean() * 252:.4f}")
print(f"Annualized volatility: {aapl_returns.std() * np.sqrt(252):.4f}")

2.10 Characteristics of Financial Time Series

2.10.1 1. Volatility Clustering

One of the most prominent features of financial time series is volatility clustering: periods of high volatility tend to be followed by further high-volatility periods, and calm periods by further calm periods.

def demonstrate_volatility_clustering():
    """
    Demonstrate volatility clustering in financial returns
    """
    # Calculate rolling volatility
    window = 20
    rolling_vol = aapl_returns.rolling(window=window).std() * np.sqrt(252)
    
    # Create visualization
    fig, axes = plt.subplots(3, 1, figsize=(14, 12))
    
    # Returns
    axes[0].plot(aapl_returns.index, aapl_returns.values, alpha=0.7, linewidth=0.8)
    axes[0].set_title('Daily Returns', fontweight='bold')
    axes[0].set_ylabel('Return')
    axes[0].grid(True, alpha=0.3)
    
    # Absolute returns
    abs_returns = np.abs(aapl_returns)
    axes[1].plot(abs_returns.index, abs_returns.values, color='orange', alpha=0.7, linewidth=0.8)
    axes[1].set_title('Absolute Returns (Proxy for Volatility)', fontweight='bold')
    axes[1].set_ylabel('|Return|')
    axes[1].grid(True, alpha=0.3)
    
    # Rolling volatility
    axes[2].plot(rolling_vol.index, rolling_vol.values, color='red', linewidth=1.5)
    axes[2].set_title(f'{window}-Day Rolling Annualized Volatility', fontweight='bold')
    axes[2].set_ylabel('Volatility')
    axes[2].set_xlabel('Date')
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Statistical test for volatility clustering
    from statsmodels.stats.diagnostic import acorr_ljungbox
    
    # Test autocorrelation in squared returns (sign of volatility clustering)
    squared_returns = aapl_returns ** 2
    lb_test = acorr_ljungbox(squared_returns, lags=10, return_df=True)
    
    print("Ljung-Box Test for Volatility Clustering (Squared Returns):")
    print(f"First lag p-value: {lb_test.iloc[0]['lb_pvalue']:.6f}")
    if lb_test.iloc[0]['lb_pvalue'] < 0.05:
        print("✓ Evidence of volatility clustering detected")
    else:
        print("✗ No significant evidence of volatility clustering")

demonstrate_volatility_clustering()

2.10.2 2. Heavy Tails and Non-Normality

Financial returns often exhibit heavy tails, meaning extreme events occur more frequently than predicted by a normal distribution.

def analyze_return_distribution():
    """
    Analyze the distribution characteristics of financial returns
    """
    # Remove any infinite or NaN values
    clean_returns = aapl_returns.dropna()
    clean_returns = clean_returns[np.isfinite(clean_returns)]
    
    # Calculate distribution statistics
    mean_ret = np.mean(clean_returns)
    std_ret = np.std(clean_returns)
    skewness = stats.skew(clean_returns)
    kurt = stats.kurtosis(clean_returns)
    
    print("Distribution Analysis:")
    print(f"Mean: {mean_ret:.6f}")
    print(f"Standard Deviation: {std_ret:.6f}")
    print(f"Skewness: {skewness:.4f}")
    print(f"Excess Kurtosis: {kurt:.4f}")
    
    # Normality tests
    jb_stat, jb_pvalue = stats.jarque_bera(clean_returns)
    sw_stat, sw_pvalue = stats.shapiro(clean_returns[:5000])  # Shapiro-Wilk for smaller sample
    
    print(f"\nNormality Tests:")
    print(f"Jarque-Bera test p-value: {jb_pvalue:.2e}")
    print(f"Shapiro-Wilk test p-value: {sw_pvalue:.2e}")
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Histogram with normal overlay
    axes[0,0].hist(clean_returns, bins=100, density=True, alpha=0.7, color='skyblue', edgecolor='black')
    
    # Overlay normal distribution
    x_normal = np.linspace(clean_returns.min(), clean_returns.max(), 100)
    normal_pdf = stats.norm.pdf(x_normal, mean_ret, std_ret)
    axes[0,0].plot(x_normal, normal_pdf, 'r-', linewidth=2, label='Normal Distribution')
    axes[0,0].set_title('Return Distribution vs Normal')
    axes[0,0].set_xlabel('Return')
    axes[0,0].set_ylabel('Density')
    axes[0,0].legend()
    axes[0,0].grid(True, alpha=0.3)
    
    # Q-Q plot
    stats.probplot(clean_returns, dist="norm", plot=axes[0,1])
    axes[0,1].set_title('Q-Q Plot vs Normal Distribution')
    axes[0,1].grid(True, alpha=0.3)
    
    # Box plot
    axes[1,0].boxplot(clean_returns, vert=True)
    axes[1,0].set_title('Box Plot of Returns')
    axes[1,0].set_ylabel('Return')
    axes[1,0].grid(True, alpha=0.3)
    
    # Comparison with t-distribution
    # Fit t-distribution
    t_params = stats.t.fit(clean_returns)
    t_pdf = stats.t.pdf(x_normal, *t_params)
    
    axes[1,1].hist(clean_returns, bins=100, density=True, alpha=0.7, color='lightgreen', edgecolor='black')
    axes[1,1].plot(x_normal, normal_pdf, 'r-', linewidth=2, label='Normal')
    axes[1,1].plot(x_normal, t_pdf, 'b-', linewidth=2, label='t-distribution')
    axes[1,1].set_title('Distribution Comparison')
    axes[1,1].set_xlabel('Return')
    axes[1,1].set_ylabel('Density')
    axes[1,1].legend()
    axes[1,1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return clean_returns

clean_returns = analyze_return_distribution()

2.10.3 3. Autocorrelation Analysis

def autocorrelation_analysis(returns, lags=40):
    """
    Analyze autocorrelation in returns and squared returns
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # ACF of returns
    plot_acf(returns, lags=lags, ax=axes[0,0], title='ACF of Returns')
    axes[0,0].grid(True, alpha=0.3)
    
    # PACF of returns
    plot_pacf(returns, lags=lags, ax=axes[0,1], title='PACF of Returns')
    axes[0,1].grid(True, alpha=0.3)
    
    # ACF of squared returns (volatility clustering)
    squared_returns = returns ** 2
    plot_acf(squared_returns, lags=lags, ax=axes[1,0], title='ACF of Squared Returns')
    axes[1,0].grid(True, alpha=0.3)
    
    # ACF of absolute returns
    abs_returns = np.abs(returns)
    plot_acf(abs_returns, lags=lags, ax=axes[1,1], title='ACF of Absolute Returns')
    axes[1,1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Statistical tests
    from statsmodels.stats.diagnostic import acorr_ljungbox
    
    # Test for serial correlation in returns
    lb_returns = acorr_ljungbox(returns, lags=10, return_df=True)
    lb_squared = acorr_ljungbox(squared_returns, lags=10, return_df=True)
    
    print("Autocorrelation Tests:")
    print(f"Returns - First lag p-value: {lb_returns.iloc[0]['lb_pvalue']:.6f}")
    print(f"Squared Returns - First lag p-value: {lb_squared.iloc[0]['lb_pvalue']:.6f}")

autocorrelation_analysis(clean_returns)

2.11 Stationarity Testing

Stationarity is a crucial concept in time series analysis. A stationary time series has statistical properties that don’t change over time.

def stationarity_tests(series, name="Series"):
    """
    Perform comprehensive stationarity tests
    """
    print(f"Stationarity Tests for {name}:")
    print("=" * 50)
    
    # Augmented Dickey-Fuller test
    adf_result = adfuller(series, autolag='AIC')
    print(f"ADF Test:")
    print(f"  ADF Statistic: {adf_result[0]:.6f}")
    print(f"  p-value: {adf_result[1]:.6f}")
    print(f"  Critical Values:")
    for key, value in adf_result[4].items():
        print(f"    {key}: {value:.6f}")
    
    if adf_result[1] <= 0.05:
        print("  ✓ Reject null hypothesis - Series is stationary")
    else:
        print("  ✗ Fail to reject null hypothesis - Series may have unit root")
    
    print()
    
    # KPSS test
    kpss_result = kpss(series, regression='c', nlags="auto")
    print(f"KPSS Test:")
    print(f"  KPSS Statistic: {kpss_result[0]:.6f}")
    print(f"  p-value: {kpss_result[1]:.6f}")
    print(f"  Critical Values:")
    for key, value in kpss_result[3].items():
        print(f"    {key}: {value:.6f}")
    
    if kpss_result[1] >= 0.05:
        print("  ✓ Fail to reject null hypothesis - Series is stationary")
    else:
        print("  ✗ Reject null hypothesis - Series is non-stationary")
    
    print("\n" + "="*50)

# Test stationarity of prices and returns
print("Testing AAPL Prices:")
stationarity_tests(aapl_close.dropna(), "AAPL Prices")

print("\nTesting AAPL Returns:")
stationarity_tests(clean_returns, "AAPL Returns")

2.12 ARIMA Modeling

ARIMA (AutoRegressive Integrated Moving Average) models are fundamental tools for time series forecasting.

def arima_modeling(series, max_p=5, max_d=2, max_q=5):
    """
    Automatic ARIMA model selection and fitting
    """
    from statsmodels.tsa.arima.model import ARIMA
    from itertools import product
    
    # Grid search for best ARIMA parameters
    best_aic = np.inf
    best_order = None
    best_model = None
    
    # Create parameter combinations
    p_values = range(0, max_p + 1)
    d_values = range(0, max_d + 1)
    q_values = range(0, max_q + 1)
    
    print("Searching for optimal ARIMA parameters...")
    
    for p, d, q in product(p_values, d_values, q_values):
        try:
            model = ARIMA(series, order=(p, d, q))
            fitted_model = model.fit()
            
            if fitted_model.aic < best_aic:
                best_aic = fitted_model.aic
                best_order = (p, d, q)
                best_model = fitted_model
                
        except Exception:
            # skip parameter combinations that fail to converge or raise errors
            continue
    
    print(f"Best ARIMA order: {best_order}")
    print(f"Best AIC: {best_aic:.4f}")
    
    # Model diagnostics
    print("\nModel Summary:")
    print(best_model.summary())
    
    # Residual analysis
    residuals = best_model.resid
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Residuals plot
    axes[0,0].plot(residuals)
    axes[0,0].set_title('Residuals')
    axes[0,0].grid(True, alpha=0.3)
    
    # Residuals histogram
    axes[0,1].hist(residuals, bins=30, density=True, alpha=0.7)
    axes[0,1].set_title('Residuals Distribution')
    axes[0,1].grid(True, alpha=0.3)
    
    # Q-Q plot of residuals
    stats.probplot(residuals, dist="norm", plot=axes[1,0])
    axes[1,0].set_title('Q-Q Plot of Residuals')
    axes[1,0].grid(True, alpha=0.3)
    
    # ACF of residuals
    plot_acf(residuals, ax=axes[1,1], lags=20, title='ACF of Residuals')
    axes[1,1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Ljung-Box test on residuals
    from statsmodels.stats.diagnostic import acorr_ljungbox
    lb_test = acorr_ljungbox(residuals, lags=10, return_df=True)
    print(f"\nLjung-Box test p-value: {lb_test.iloc[0]['lb_pvalue']:.6f}")
    
    if lb_test.iloc[0]['lb_pvalue'] > 0.05:
        print("✓ Residuals appear to be white noise")
    else:
        print("✗ Residuals show significant autocorrelation")
    
    return best_model, best_order

# Fit ARIMA model to returns
arima_model, arima_order = arima_modeling(clean_returns)
# Forecasting with ARIMA
def arima_forecast(model, steps=30):
    """
    Generate forecasts using fitted ARIMA model
    """
    # Generate forecasts
    forecast_result = model.forecast(steps=steps)
    conf_int = model.get_forecast(steps=steps).conf_int()
    
    # Create forecast dates
    last_date = clean_returns.index[-1]
    forecast_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), 
                                 periods=steps, freq='D')
    
    # Visualization
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Plot historical data (last 100 points)
    historical = clean_returns.tail(100)
    ax.plot(historical.index, historical.values, label='Historical Returns', color='blue')
    
    # Plot forecasts
    ax.plot(forecast_dates, forecast_result, label='Forecast', color='red', linestyle='--')
    
    # Plot confidence intervals
    ax.fill_between(forecast_dates, 
                   conf_int.iloc[:, 0], 
                   conf_int.iloc[:, 1], 
                   color='red', alpha=0.2, label='95% Confidence Interval')
    
    ax.set_title('ARIMA Forecast of AAPL Returns')
    ax.set_xlabel('Date')
    ax.set_ylabel('Return')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return forecast_result, conf_int

forecast_returns, forecast_conf_int = arima_forecast(arima_model)

2.13 GARCH Modeling for Volatility

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models are essential for modeling time-varying volatility.

def garch_modeling(returns):
    """
    Fit GARCH models to financial returns
    """
    from arch import arch_model
    
    # Convert returns to percentage for numerical stability
    returns_pct = returns * 100
    
    # Fit different GARCH specifications
    models = {
        'GARCH(1,1)': arch_model(returns_pct, vol='Garch', p=1, q=1),
        'EGARCH(1,1)': arch_model(returns_pct, vol='EGARCH', p=1, q=1),
        'GJR-GARCH(1,1)': arch_model(returns_pct, vol='GARCH', p=1, o=1, q=1)
    }
    
    results = {}
    
    print("GARCH Model Comparison:")
    print("=" * 50)
    
    for name, model in models.items():
        try:
            fitted_model = model.fit(disp='off')
            results[name] = fitted_model
            
            print(f"{name}:")
            print(f"  Log-Likelihood: {fitted_model.loglikelihood:.4f}")
            print(f"  AIC: {fitted_model.aic:.4f}")
            print(f"  BIC: {fitted_model.bic:.4f}")
            print()
            
        except Exception as e:
            print(f"Error fitting {name}: {e}")
    
    # Select best model based on AIC
    best_model_name = min(results.keys(), key=lambda x: results[x].aic)
    best_garch_model = results[best_model_name]
    
    print(f"Best model: {best_model_name}")
    print(f"AIC: {best_garch_model.aic:.4f}")
    
    # Extract conditional volatility
    conditional_volatility = best_garch_model.conditional_volatility
    
    # Visualization
    fig, axes = plt.subplots(3, 1, figsize=(14, 12))
    
    # Returns
    axes[0].plot(returns.index, returns.values, alpha=0.7, linewidth=0.8)
    axes[0].set_title('Daily Returns')
    axes[0].set_ylabel('Return')
    axes[0].grid(True, alpha=0.3)
    
    # Conditional volatility
    axes[1].plot(conditional_volatility.index, conditional_volatility.values, color='red', linewidth=1.5)
    axes[1].set_title('Conditional Volatility (GARCH)')
    axes[1].set_ylabel('Volatility (%)')
    axes[1].grid(True, alpha=0.3)
    
    # Standardized residuals
    standardized_residuals = best_garch_model.std_resid
    axes[2].plot(standardized_residuals.index, standardized_residuals.values, alpha=0.7, linewidth=0.8, color='green')
    axes[2].set_title('Standardized Residuals')
    axes[2].set_ylabel('Standardized Residual')
    axes[2].set_xlabel('Date')
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Model diagnostics
    print("\nGARCH Model Summary:")
    print(best_garch_model.summary())
    
    return best_garch_model, conditional_volatility

garch_model, cond_volatility = garch_modeling(clean_returns)

2.14 Advanced Time Series Techniques

2.14.1 1. Volatility Forecasting

def volatility_forecasting(garch_model, horizon=30):
    """
    Generate volatility forecasts using GARCH model
    """
    # Generate volatility forecasts
    forecasts = garch_model.forecast(horizon=horizon)
    
    # Extract forecast variance and convert to volatility
    forecast_variance = forecasts.variance.iloc[-1].values
    forecast_volatility = np.sqrt(forecast_variance)
    
    # Create forecast dates
    last_date = garch_model.conditional_volatility.index[-1]
    forecast_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), 
                                 periods=horizon, freq='D')
    
    # Visualization
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Plot historical volatility (last 100 points)
    historical_vol = garch_model.conditional_volatility.tail(100)
    ax.plot(historical_vol.index, historical_vol.values, label='Historical Volatility', color='blue')
    
    # Plot forecasted volatility
    ax.plot(forecast_dates, forecast_volatility, label='Volatility Forecast', 
           color='red', linestyle='--', linewidth=2)
    
    ax.set_title('GARCH Volatility Forecast')
    ax.set_xlabel('Date')
    ax.set_ylabel('Volatility (%)')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Average forecasted volatility: {np.mean(forecast_volatility):.4f}%")
    print(f"Volatility range: {np.min(forecast_volatility):.4f}% - {np.max(forecast_volatility):.4f}%")
    
    return forecast_volatility, forecast_dates

vol_forecast, vol_dates = volatility_forecasting(garch_model)

2.14.2 2. Rolling Window Analysis

def rolling_window_analysis(returns, window_size=252):
    """
    Perform rolling window analysis of time series properties
    """
    # Calculate rolling statistics
    rolling_mean = returns.rolling(window=window_size).mean() * 252  # Annualized
    rolling_std = returns.rolling(window=window_size).std() * np.sqrt(252)  # Annualized
    rolling_skew = returns.rolling(window=window_size).skew()
    rolling_kurt = returns.rolling(window=window_size).kurt()
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Rolling mean
    axes[0,0].plot(rolling_mean.index, rolling_mean.values, color='blue', linewidth=2)
    axes[0,0].set_title(f'Rolling Mean ({window_size}-day window)')
    axes[0,0].set_ylabel('Annualized Return')
    axes[0,0].grid(True, alpha=0.3)
    axes[0,0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
    
    # Rolling volatility
    axes[0,1].plot(rolling_std.index, rolling_std.values, color='red', linewidth=2)
    axes[0,1].set_title(f'Rolling Volatility ({window_size}-day window)')
    axes[0,1].set_ylabel('Annualized Volatility')
    axes[0,1].grid(True, alpha=0.3)
    
    # Rolling skewness
    axes[1,0].plot(rolling_skew.index, rolling_skew.values, color='green', linewidth=2)
    axes[1,0].set_title(f'Rolling Skewness ({window_size}-day window)')
    axes[1,0].set_ylabel('Skewness')
    axes[1,0].set_xlabel('Date')
    axes[1,0].grid(True, alpha=0.3)
    axes[1,0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
    
    # Rolling kurtosis
    axes[1,1].plot(rolling_kurt.index, rolling_kurt.values, color='purple', linewidth=2)
    axes[1,1].set_title(f'Rolling Kurtosis ({window_size}-day window)')
    axes[1,1].set_ylabel('Excess Kurtosis')
    axes[1,1].set_xlabel('Date')
    axes[1,1].grid(True, alpha=0.3)
    axes[1,1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print(f"Rolling Statistics Summary ({window_size}-day window):")
    print("=" * 60)
    print(f"Mean annualized return: {rolling_mean.mean():.4f} ± {rolling_mean.std():.4f}")
    print(f"Mean annualized volatility: {rolling_std.mean():.4f} ± {rolling_std.std():.4f}")
    print(f"Mean skewness: {rolling_skew.mean():.4f} ± {rolling_skew.std():.4f}")
    print(f"Mean kurtosis: {rolling_kurt.mean():.4f} ± {rolling_kurt.std():.4f}")

rolling_window_analysis(clean_returns)

2.15 Multi-Asset Time Series Analysis

def multi_asset_analysis():
    """
    Analyze multiple assets simultaneously
    """
    # Fetch data for multiple assets
    tickers = ['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'SPY']
    
    print("Fetching multi-asset data...")
    multi_data = yf.download(tickers, start='2020-01-01', end='2024-01-01')['Adj Close']
    
    # Calculate returns
    multi_returns = multi_data.pct_change().dropna()
    
    # Correlation analysis
    correlation_matrix = multi_returns.corr()
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Correlation heatmap
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
                square=True, ax=axes[0,0], cbar_kws={'shrink': 0.8})
    axes[0,0].set_title('Return Correlation Matrix')
    
    # Cumulative returns
    cumulative_returns = (1 + multi_returns).cumprod()
    for ticker in tickers:
        axes[0,1].plot(cumulative_returns.index, cumulative_returns[ticker], 
                      label=ticker, linewidth=2)
    axes[0,1].set_title('Cumulative Returns')
    axes[0,1].set_ylabel('Cumulative Return')
    axes[0,1].legend()
    axes[0,1].grid(True, alpha=0.3)
    
    # Rolling correlations (AAPL vs SPY)
    rolling_corr = multi_returns['AAPL'].rolling(window=60).corr(multi_returns['SPY'])
    axes[1,0].plot(rolling_corr.index, rolling_corr.values, linewidth=2, color='purple')
    axes[1,0].set_title('Rolling Correlation: AAPL vs SPY (60-day window)')
    axes[1,0].set_ylabel('Correlation')
    axes[1,0].set_xlabel('Date')
    axes[1,0].grid(True, alpha=0.3)
    axes[1,0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
    
    # Volatility comparison
    rolling_vol = multi_returns.rolling(window=30).std() * np.sqrt(252)
    for ticker in tickers:
        axes[1,1].plot(rolling_vol.index, rolling_vol[ticker], label=ticker, alpha=0.8)
    axes[1,1].set_title('Rolling Volatility (30-day window)')
    axes[1,1].set_ylabel('Annualized Volatility')
    axes[1,1].set_xlabel('Date')
    axes[1,1].legend()
    axes[1,1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Cointegration test (example: AAPL vs MSFT)
    from statsmodels.tsa.stattools import coint
    
    aapl_prices = multi_data['AAPL'].dropna()
    msft_prices = multi_data['MSFT'].dropna()
    
    # Align the series
    common_index = aapl_prices.index.intersection(msft_prices.index)
    aapl_aligned = aapl_prices.loc[common_index]
    msft_aligned = msft_prices.loc[common_index]
    
    coint_stat, coint_pvalue, coint_crit = coint(aapl_aligned, msft_aligned)
    
    print(f"\nCointegration Test (AAPL vs MSFT):")
    print(f"Cointegration statistic: {coint_stat:.4f}")
    print(f"p-value: {coint_pvalue:.4f}")
    print(f"Critical values: {coint_crit}")
    
    if coint_pvalue < 0.05:
        print("✓ Evidence of cointegration - prices move together in long run")
    else:
        print("✗ No evidence of cointegration")
    
    return multi_returns, correlation_matrix

multi_returns, corr_matrix = multi_asset_analysis()

2.16 Practical Exercises

2.16.1 Exercise 1: Complete Time Series Analysis Pipeline

def time_series_exercise():
    """
    Exercise: Complete time series analysis of a chosen stock
    
    Tasks:
    1. Fetch data for a stock of your choice
    2. Perform stationarity tests
    3. Fit ARIMA model
    4. Fit GARCH model
    5. Generate forecasts
    6. Evaluate model performance
    """
    
    # Choose a stock ticker
    ticker = 'NVDA'  # Example: NVIDIA
    
    print(f"Time Series Analysis Exercise: {ticker}")
    print("=" * 50)
    
    # Step 1: Fetch data (auto_adjust=False keeps the 'Adj Close' column available)
    data = yf.download(ticker, start='2020-01-01', end='2024-01-01', auto_adjust=False)
    prices = data['Adj Close'].squeeze()
    returns = prices.pct_change().dropna()
    
    print(f"Data fetched: {len(prices)} price observations")
    print(f"Returns calculated: {len(returns)} return observations")
    
    # Step 2: Basic statistics
    print(f"\nBasic Statistics:")
    print(f"Mean daily return: {returns.mean():.6f}")
    print(f"Daily volatility: {returns.std():.6f}")
    print(f"Annualized return: {returns.mean() * 252:.4f}")
    print(f"Annualized volatility: {returns.std() * np.sqrt(252):.4f}")
    print(f"Skewness: {stats.skew(returns):.4f}")
    print(f"Kurtosis: {stats.kurtosis(returns):.4f}")
    
    # Step 3: Stationarity tests
    print(f"\nStationarity Test Results:")
    adf_stat, adf_pvalue, _, _, adf_crit, _ = adfuller(returns)
    print(f"ADF test p-value: {adf_pvalue:.6f}")
    if adf_pvalue < 0.05:
        print("✓ Returns are stationary")
    else:
        print("✗ Returns may not be stationary")
    
    # Step 4: Simple ARIMA model
    try:
        arima_model = ARIMA(returns, order=(1, 0, 1))
        arima_fitted = arima_model.fit()
        print(f"\nARIMA(1,0,1) AIC: {arima_fitted.aic:.4f}")
    except Exception:
        print("\nFailed to fit ARIMA model")
    
    # Step 5: Simple GARCH model
    try:
        garch_model = arch_model(returns * 100, vol='Garch', p=1, q=1)
        garch_fitted = garch_model.fit(disp='off')
        print(f"GARCH(1,1) AIC: {garch_fitted.aic:.4f}")
    except Exception:
        print("Failed to fit GARCH model")
    
    # Step 6: Visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Price chart
    axes[0,0].plot(prices.index, prices.values)
    axes[0,0].set_title(f'{ticker} Price')
    axes[0,0].grid(True, alpha=0.3)
    
    # Returns
    axes[0,1].plot(returns.index, returns.values, alpha=0.7)
    axes[0,1].set_title(f'{ticker} Returns')
    axes[0,1].grid(True, alpha=0.3)
    
    # Returns distribution
    axes[1,0].hist(returns, bins=50, alpha=0.7, density=True)
    axes[1,0].set_title('Returns Distribution')
    axes[1,0].grid(True, alpha=0.3)
    
    # ACF of returns
    plot_acf(returns, ax=axes[1,1], lags=20, title='ACF of Returns')
    axes[1,1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    return prices, returns

# Run the exercise
exercise_prices, exercise_returns = time_series_exercise()

2.17 Summary and Key Takeaways

This comprehensive chapter has covered the essential aspects of time series analysis for financial data using Python:

  1. Financial Time Series Characteristics: Understanding volatility clustering, heavy tails, and autocorrelation patterns
  2. Stationarity Testing: Using ADF and KPSS tests to determine if differencing is needed
  3. ARIMA Modeling: Systematic approach to modeling and forecasting returns
  4. GARCH Modeling: Capturing time-varying volatility in financial returns
  5. Advanced Techniques: Rolling window analysis and multi-asset considerations

2.17.1 Key Python Libraries for Time Series:

  • statsmodels: ARIMA models, statistical tests, diagnostics
  • arch: GARCH models and volatility modeling
  • yfinance: Financial data acquisition
  • pandas: Time series data manipulation
  • matplotlib/seaborn: Visualization
  • scipy: Statistical functions and tests

2.17.2 Best Practices:

  1. Always test for stationarity before modeling
  2. Examine residuals for model adequacy
  3. Use information criteria (AIC, BIC) for model selection
  4. Consider volatility clustering in financial data
  5. Validate models with out-of-sample testing
  6. Document assumptions and limitations

This foundation in time series analysis provides the essential skills needed for advanced financial modeling, risk management, and algorithmic trading applications.