Statistical Science for Finance

Rigorous Methods for Technology-Enabled Markets

Author

Professor Barry Quinn

Published

May 8, 2026

Statistical Science for Finance: Signal vs Noise :  Generalisation vs Overfitting

Welcome

We take data science to mean the disciplined study of variation and uncertainty in data. Variation is what we model: differences across units, over time, between groups. Uncertainty is what we quantify: the limits of what we can know from finite samples and noisy measurements. Everything in this course builds from that foundation.

Acting in the Dark

Fischer Black, President of the American Finance Association, 1985. Source: Black (1986)

“We are forced to act largely in the dark.” : Fischer Black, Presidential Address (1986)

Financial data is fundamentally noisy. Black (1986) observed that “noise makes it very difficult to test either practical or academic theories about the way that financial markets work” and “keeps us from knowing the expected return on a stock or portfolio.” People trade on noise as if it were information; our observations are imperfect; certainty is illusory.

What does this mean quantitatively? A rigorous way to measure signal is to ask: how much of return variance is predictable? We fit Bayesian AR(1) models to daily returns from our Bloomberg database, where the R² = ρ² (squared autocorrelation) gives the fraction of variance predictable from lagged returns. The Bayesian approach yields not just point estimates but posterior distributions that quantify our uncertainty about predictability.

Figure 1: Posterior distributions of predictable variance (R²) from Bayesian AR(1) models fitted to daily returns. Shaded regions show posterior uncertainty; dashed lines mark medians; brackets show 95% credible intervals. For all assets, R² is below 5%: over 95% of variance is unpredictable noise.

Even accounting for uncertainty, the predictable fraction is consistently under 5%: typically 0.5–2%. This is consistent with academic findings on return predictability (Goyal and Welch 2008; Campbell and Thompson 2008). Notice that the credible intervals are wide (e.g., SPY: 0.08–6.2%): but this width is itself informative. It reveals that we cannot even precisely measure how little predictability there is because the signal is so weak relative to noise. Even at the optimistic upper bounds, over 93% of variance remains unpredictable. The wide intervals do not undermine the conclusion; they reinforce it. This is the fundamental nature of financial markets, not a data quality problem to be solved with better APIs.

The Three Prediction Problems

This near-zero predictability in returns is not a failure of our models: it is a success of markets. Competition arbitrages away predictable patterns in expected returns, leaving the conditional mean unforecastable. But this applies specifically to the mean. Financial prediction actually divides into three distinct problems with different signal strengths:

Problem Target Signal Course Focus
The Mean Future returns ~1-2% R² Accept randomness; naive often wins
The Variance Volatility ~15-40% R² GARCH models (Week 4); economically valuable
The Cross-Section Which assets ~5-15% R² Factor models, ML (Weeks 10-11); alpha generation

This hierarchy shapes everything we teach. Traditional textbooks present ARIMA as a prediction tool for returns, implying more sophisticated models yield better predictions. But ARIMA targets the conditional mean: precisely where competition has eliminated signal. The economically valuable prediction happens elsewhere: in variance (volatility clustering is real and exploitable) and cross-section (which stocks outperform is partially predictable). We structure the course around this insight: acknowledge the limits of return prediction, master volatility modelling where signal exists, and learn factor/ML methods for cross-sectional alpha where the opportunity lies.

The Challenge of Inference

Yet we must make inferences, build models, and guide decisions despite this irreducible uncertainty. Gelman, Hill, and Vehtari (2020) frames our challenge through three types of generalisation: from sample to population, from observed to counterfactual, from measurement to construct. Each appears constantly in finance.

When we observe 100 stocks and attempt to say something about “the market,” we generalise from sample to population. When performance improves after adopting a strategy, we wonder whether the strategy caused improvement: generalising from observed to counterfactual. When we quantify risk using historical volatility, we generalise from imperfect measurement to underlying construct.

Kelly, Malamud, and Zhou (2024) recently demonstrated that while “complex models can severely outperform simple models in return prediction,” the practical gains from complexity are often modest. Feasible three-variable solutions achieve 100% out-of-sample effectiveness compared to theoretically optimal high-dimensional models. This validates Black’s 1986 insight: noise limits the benefits of sophistication.

Course Philosophy

This course develops the capabilities needed for responsible data science in technology-enabled financial services: statistical foundations that distinguish signal from noise through proper validation and uncertainty quantification, professional competence with Bloomberg Terminal data and production-ready implementations, and critical thinking that balances sophisticated methods with intellectual humility about their limitations.

Following Box and Draper (1987) wisdom that “all models are wrong, but some are useful,” we emphasize rigorous validation over performance theatre, causal reasoning over correlation mining, and honest evaluation over technological narratives. From Black’s insight about acting “in the dark,” through Gelman’s framework for generalisation under incomplete information, to modern machine learning: sophisticated tools require sophisticated understanding of their limitations.

Note on targets: We model and evaluate returns (simple/log, often excess returns) rather than prices, for stationarity and comparability. See the Primer’s “Financial Returns: Targets and Notation” for definitions and a short example.

Course Approach

Integrated Analytical Framework

  • Statistical Foundations: Distinguishing signal from noise through bias-variance tradeoff, regularisation, and validation
  • Causal Reasoning: Moving beyond correlation to understand mechanisms and counterfactuals
  • Technical Implementation: Python programming for production-ready financial systems with Bloomberg Terminal integration
  • Professional Practice: Rigorous workflows for model development, testing, and monitoring
  • FinTech Applications: Applying these methods to open banking, digital credit, robo-advisors, cryptocurrency, and algorithmic finance

Contact

Professor Barry Quinn
Professor of Finance and Financial Technology | Director, Centre for Finance and Responsible Technology
📧 b.quinn1@ulster.ac.uk | 🏢 Room BC-08-205B
Office Hours: By appointment
📄 Full CV/Resume


Ulster University Business School | BSc Finance and Investment Management/MSc FinTech Management

References

Black, Fischer. 1986. “Noise.” The Journal of Finance 41 (3): 528–43. https://doi.org/10.1111/j.1540-6261.1986.tb04513.x.
Box, George E. P., and Norman R. Draper. 1987. Empirical Model-Building and Response Surfaces. John Wiley & Sons.
Brooks, Chris. 2019. Introductory Econometrics for Finance. 4th ed. Cambridge, UK: Cambridge University Press.
Campbell, John Y., and Samuel B. Thompson. 2008. “Predicting Excess Stock Returns Out of Sample: Can Anything Beat the Historical Average?” Review of Financial Studies 21 (4): 1509–31. https://doi.org/10.1093/rfs/hhm055.
Efron, Bradley, and Trevor Hastie. 2016. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press. https://hastie.su.stanford.edu/CASI/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge, UK: Cambridge University Press. https://avehtari.github.io/ROS-Examples/.
Goyal, Amit, and Ivo Welch. 2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21 (4): 1455–1508. https://doi.org/10.1093/rfs/hhm014.
Kelly, Bryan T., Semyon Malamud, and Kangying Zhou. 2024. “The Virtue of Complexity in Return Prediction.” Journal of Finance 79 (1): 459–503. https://doi.org/10.1111/jofi.13298.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press.
Prado, Marcos López de. 2018. Advances in Financial Machine Learning. John Wiley & Sons.
Tsay, Ruey S. 2010. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: Wiley.