Datasets
Reliable sources and quick-start code
Financial Datasets: Quick Start
Use these beginner-friendly data sources. Start small (a few tickers, short date ranges) to keep things fast and clear.
Yahoo Finance (Free)
Suitable for equities, indices, ETFs.
import yfinance as yf
# Minimal example
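# auto_adjust=False keeps a separate "Adj Close" column; recent yfinance releases default to auto_adjust=True and omit it
df = yf.download(["AAPL", "MSFT"], start="2022-01-01", end="2023-01-01", auto_adjust=False)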
df["Adj Close"].head()
# Simple return calculation
returns = df["Adj Close"].pct_change().dropna()
returns.tail()
Notes: data may be adjusted and occasionally revised; verify for assessment work.
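To make the return calculation concrete, here is a minimal sketch on made-up prices. The series name `PRICE` and the three values are illustrative assumptions, not market data; they are chosen so the arithmetic can be checked by hand. It shows that `pct_change()` produces simple period-over-period returns and how those returns compound over the window.

```python
import pandas as pd

# Hypothetical prices, chosen so the arithmetic is easy to verify by hand
prices = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
    name="PRICE",
)

# Simple return: r_t = P_t / P_{t-1} - 1; the first row is NaN and is dropped
returns = prices.pct_change().dropna()

# Compound the simple returns to get the cumulative return over the window
cumulative = (1 + returns).prod() - 1

print(returns.round(4).tolist())  # [0.1, -0.1]
print(round(cumulative, 4))       # -0.01
```

A +10% move followed by a -10% move leaves the price 1% below where it started, which is why simple returns are compounded rather than summed.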
FRED (Macro/Economic)
Federal Reserve economic series (GDP, CPI, rates).
import pandas_datareader.data as web
gdp = web.DataReader("GDP", "fred", start="2018-01-01")
gdp.tail()
CSV Fallback (Offline-friendly)
If network access is blocked, work from local CSVs.
import pandas as pd
prices = pd.read_csv("prices_sample.csv", parse_dates=["Date"], index_col="Date")
prices.head()
How to create a CSV quickly:
# After pulling with yfinance, save for later offline use
df["Adj Close"].to_csv("prices_sample.csv")
Good practices
- Start with one or two tickers and short windows
- Check .info() or the documentation for series definitions
- Keep a small “data” folder with versioned CSV snapshots
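One possible way to keep versioned CSV snapshots is to stamp each file with a date and always load the newest one. The sketch below is an assumption about how this could be organised, not a prescribed layout: the helper names `save_snapshot` and `load_latest_snapshot`, the `<name>_<YYYY-MM-DD>.csv` naming scheme, and the demo values are all hypothetical. It assumes the DataFrame index is named `Date`, as in the CSV example above.

```python
import tempfile
from datetime import date
from pathlib import Path

import pandas as pd


def save_snapshot(df, name, folder="data", as_of=None):
    """Write df to <folder>/<name>_<YYYY-MM-DD>.csv and return the path."""
    as_of = as_of or date.today()
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}_{as_of.isoformat()}.csv"
    df.to_csv(path)
    return path


def load_latest_snapshot(name, folder="data"):
    """Load the newest snapshot; ISO dates in file names sort chronologically."""
    candidates = sorted(Path(folder).glob(f"{name}_????-??-??.csv"))
    if not candidates:
        raise FileNotFoundError(f"no snapshots for {name!r} in {folder}")
    return pd.read_csv(candidates[-1], parse_dates=["Date"], index_col="Date")


# Demo with made-up prices, written to a temporary folder
demo = pd.DataFrame(
    {"AAPL": [150.0, 151.5]},
    index=pd.DatetimeIndex(["2022-01-03", "2022-01-04"], name="Date"),
)
with tempfile.TemporaryDirectory() as tmp:
    save_snapshot(demo, "prices", folder=tmp, as_of=date(2023, 1, 1))
    latest_path = save_snapshot(demo * 2, "prices", folder=tmp, as_of=date(2023, 2, 1))
    latest = load_latest_snapshot("prices", folder=tmp)

print(latest_path.name)  # prices_2023-02-01.csv
```

Dating the file name rather than overwriting one CSV means earlier results can still be reproduced after the source data is revised.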
JKP Global Factor Data (Replication resources)
The JKP initiative (Jensen–Kelly–Pedersen) provides a curated, global factor dataset, documentation, and analysis tools. The full dataset (~170 MB) is stored in the OneDrive folder teaching-data/Global-Factor-Data to avoid GitHub size limits. Scripts (create_jkp_master_global.py, etc.) resolve this path automatically; you can override it by setting jkp_data_path in config/data_root.yml.
- Portal: https://jkpfactors.com
- Documentation (factor definitions, availability): https://jkpfactors.s3.amazonaws.com/documents/Documentation.pdf
- JKP/WRDS Guide: https://jkpfactors.com/jkp-wrds-guide
- GitHub (related research replication): https://github.com/bkelly-lab/ReplicationCrisis
Notes and usage
- Access may require registration and/or institutional subscriptions (e.g., WRDS). Follow the portal’s terms and documentation.
- For coursework, prefer small, well‑documented slices (few factors, limited horizon) and record exactly which series/versions you used.
- Context papers: Jensen, Kelly, and Pedersen (2024); methodology links to Kelly, Malamud, and Zhou (2024) and Gu, Kelly, and Xiu (2020) for model design and evaluation.
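The advice to work with small slices and record what was used could be sketched as follows. Everything here is an assumption for illustration: the column names `name`, `date`, and `ret`, the factor labels, the source string, and the retrieval date are placeholders, so check the JKP documentation for the actual schema before adapting it. The helpers `slice_factors` and `provenance_record` are hypothetical and not part of any JKP tooling.

```python
import json

import pandas as pd


def slice_factors(factors, names, start, end):
    """Keep only the chosen factors with dates in [start, end], inclusive."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = factors["name"].isin(names) & factors["date"].between(start, end)
    return factors.loc[mask].reset_index(drop=True)


def provenance_record(names, start, end, source, retrieved):
    """Describe exactly which slice was used, for a write-up or sidecar file."""
    return {
        "source": source,
        "factors": list(names),
        "start": start,
        "end": end,
        "retrieved": retrieved,
    }


# Made-up long-format factor returns (placeholder names and values)
factors = pd.DataFrame(
    {
        "name": ["factor_a", "factor_a", "factor_b", "factor_c"],
        "date": pd.to_datetime(
            ["2019-01-31", "2021-01-31", "2019-02-28", "2019-01-31"]
        ),
        "ret": [0.01, -0.02, 0.005, 0.03],
    }
)

chosen = ["factor_a", "factor_b"]
subset = slice_factors(factors, chosen, "2019-01-01", "2019-12-31")
record = provenance_record(
    chosen,
    "2019-01-01",
    "2019-12-31",
    source="local JKP extract (placeholder description)",
    retrieved="2024-01-15",  # illustrative date, not a real download
)
print(json.dumps(record, indent=2))
```

Saving the printed record next to the sliced CSV gives a marker enough information to rebuild the same subset.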