Lab 8: Fraud Detection as Rare-Event Classification

From synthetic transactions to real Bitcoin data (Elliptic dataset)

Expected Time

Part A (synthetic data): ~60 minutes
Part B (Elliptic Bitcoin): ~45 minutes
Extension (graph analysis): +30 minutes

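Open in Colab: https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab08_fraud_detection.ipynb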

1 Learning Objectives

By the end of this lab you will be able to:

Demonstrate why accuracy is meaningless for rare-event classification
Build a supervised fraud detection pipeline with stratified CV and class weighting
Select a cost-sensitive decision threshold that minimises expected cost
Compare Isolation Forest (unsupervised) with supervised and hybrid approaches
Run walk-forward temporal validation on real Bitcoin transaction data
Quantify the look-ahead bias gap between shuffled and temporal CV

2 Connection to the slides

This lab directly implements the exercises from the Week 8 slides. Part A uses synthetic transaction data to teach the statistical principles (accuracy trap, cost-sensitive thresholds, temporal CV, hybrid methods). Part B applies the same pipeline to the Elliptic Bitcoin dataset (Weber et al. 2019), which exhibits genuine temporal drift: shuffled CV overstates AUC by about 0.065 relative to walk-forward validation (the look-ahead bias gap).

3 Data

Part A generates synthetic data inline (no external files needed).

Part B uses two parquet files in data/elliptic/:

elliptic_labelled.parquet (~25 MB): 46,564 labelled Bitcoin transactions with 166 anonymised features and 49 time steps
elliptic_edges_labelled.parquet (~0.4 MB): 36,624 directed edges between labelled transactions

See data/elliptic/README.md for the full dataset description and download instructions.

4 Part A: Synthetic Transaction Data

4.1 Exercise 1: Generate the dataset

Generate 50,000 synthetic card transactions with a ~1% fraud rate and temporal drift. Features mimic what a real fraud team works with: amount, hour, velocity, foreign merchant flag, account age, and spend ratio.
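One way such a generator could look is sketched here. Everything in it is an assumption made for illustration: the function name `make_transactions`, the feature distributions, the risk-score weights, and the choice to model drift as a growing weight on the foreign merchant flag. The lab notebook may generate its data differently.

```python
import numpy as np
import pandas as pd


def make_transactions(n=50_000, fraud_rate=0.01, seed=42):
    """Simulate card transactions whose fraud pattern drifts over time.

    Hypothetical helper: all distributions and weights are illustrative.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n) / n                      # 0 = oldest, close to 1 = newest
    amount = rng.lognormal(3.5, 1.0, n)       # transaction value
    hour = rng.integers(0, 24, n)             # hour of day, 0-23
    velocity = rng.poisson(2.0, n)            # recent transactions on the card
    foreign = rng.binomial(1, 0.05, n)        # foreign merchant flag
    account_age = rng.exponential(36.0, n)    # months since account opened
    spend_ratio = rng.lognormal(0.0, 0.5, n)  # amount relative to usual spend

    # Risk score: the weight on `foreign` grows with t, which creates the drift.
    score = (0.4 * np.log(amount) + 0.8 * (hour < 6) + 0.3 * velocity
             + (0.5 + 1.5 * t) * foreign - 0.01 * account_age
             + 0.6 * np.log(spend_ratio))
    weight = np.exp(score)
    # Rescale so the average fraud probability is close to `fraud_rate`.
    p_fraud = np.clip(fraud_rate * weight / weight.mean(), 0.0, 1.0)

    return pd.DataFrame({
        "t": t, "amount": amount, "hour": hour, "velocity": velocity,
        "foreign": foreign, "account_age": account_age,
        "spend_ratio": spend_ratio, "fraud": rng.binomial(1, p_fraud),
    })


df = make_transactions()
print(f"rows={len(df)}  fraud rate={df['fraud'].mean():.4f}")
```

Rescaling the weights by their mean keeps the overall fraud rate near the target regardless of how the risk score is defined, which makes it easy to experiment with different drift patterns.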

4.2 Exercise 2: The accuracy trap

Build the simplest possible model (predict every transaction as legitimate). Calculate accuracy and recall. This demonstrates the base rate fallacy from Week 1 and Chapter 05.
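The baseline can be checked in a few lines. This is a minimal sketch on illustrative labels drawn at a 1% fraud rate (an assumption standing in for the Exercise 1 data), not the lab's own solution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative labels: roughly 1% of 50,000 transactions are fraud.
y_true = (rng.random(50_000) < 0.01).astype(int)
# The "model": predict every transaction as legitimate (class 0).
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
true_pos = int(((y_pred == 1) & (y_true == 1)).sum())
recall = true_pos / max(int(y_true.sum()), 1)

print(f"accuracy={accuracy:.3f}  recall={recall:.3f}  fraud caught={true_pos}")
```

Accuracy equals the share of legitimate transactions, so it is high by construction, while recall is zero because no transaction is ever flagged.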

4.3 Exercise 3: Supervised pipeline

Train logistic regression with class_weight='balanced' using 5-fold stratified CV. Report both AUC and Average Precision.
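A possible shape for this pipeline is sketched below. The data come from scikit-learn's `make_classification` as a stand-in for the synthetic transactions, and the added `StandardScaler` step is an assumption rather than something the exercise specifies.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data with about 1% positives (assumption: not the lab's dataset).
X, y = make_classification(
    n_samples=20_000, n_features=6, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0.0, random_state=0,
)

# class_weight="balanced" reweights errors inversely to class frequency.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
# Stratification keeps the fraud share roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["roc_auc", "average_precision"])

auc = scores["test_roc_auc"].mean()
ap = scores["test_average_precision"].mean()
print(f"AUC={auc:.3f}  AP={ap:.3f}")
```

Reporting both metrics matters because a random classifier scores 0.5 on AUC but only about the fraud rate on Average Precision, so AP is the harsher yardstick when positives are rare.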

4.4 Exercise 4: Default threshold disaster

Fit an unweighted logistic regression and predict with the default 0.5 threshold. Observe that the model catches zero fraud: with a ~1% base rate, almost no predicted probability reaches 0.5.
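The effect of the threshold can be inspected directly from predicted probabilities. The sketch below uses stand-in data from `make_classification`; the helper `caught` and the comparison threshold of 0.02 are hypothetical choices for illustration, and the exact counts will differ from the lab's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data with about 1% positives and weak class separation (assumption).
X, y = make_classification(
    n_samples=20_000, n_features=6, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0.0, class_sep=0.5,
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# No class_weight: the model is free to ignore the rare class.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]


def caught(threshold):
    """Return (true positives, false positives) at a given threshold."""
    pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    return int(tp), int(fp)


tp_default, fp_default = caught(0.5)
tp_low, fp_low = caught(0.02)
print(f"threshold 0.50: TP={tp_default}, FP={fp_default}")
print(f"threshold 0.02: TP={tp_low}, FP={fp_low}")
```

Lowering the threshold can only add flagged transactions, so true positives and false positives both rise or stay the same; the next exercise puts a price on that trade-off.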

4.5 Exercise 5: Cost-sensitive threshold

Sweep thresholds from 0.005 to 0.50, plot the expected cost curve (£20 per false alarm, £1,000 per missed fraud), and find the optimal threshold.
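The sweep itself needs only NumPy. In this sketch the function name `expected_cost_curve` and the toy scores are assumptions; the two cost figures and the threshold range are taken from the exercise.

```python
import numpy as np

COST_FALSE_ALARM = 20.0      # £ per legitimate transaction flagged
COST_MISSED_FRAUD = 1000.0   # £ per fraud that is not flagged


def expected_cost_curve(y_true, proba, thresholds):
    """Total cost of false alarms plus missed fraud at each threshold."""
    y_true = np.asarray(y_true).astype(bool)
    proba = np.asarray(proba)
    costs = []
    for t in thresholds:
        flagged = proba >= t
        false_alarms = np.sum(flagged & ~y_true)
        missed = np.sum(~flagged & y_true)
        costs.append(COST_FALSE_ALARM * false_alarms
                     + COST_MISSED_FRAUD * missed)
    return np.array(costs)


# Toy scores (assumption): fraud cases score slightly higher on average.
rng = np.random.default_rng(1)
y = (rng.random(20_000) < 0.01).astype(int)
proba = np.clip(rng.beta(1, 60, size=y.size) + 0.05 * y, 0.0, 1.0)

thresholds = np.linspace(0.005, 0.50, 100)
costs = expected_cost_curve(y, proba, thresholds)
best = thresholds[np.argmin(costs)]
print(f"lowest-cost threshold={best:.3f}  cost=£{costs.min():,.0f}")
```

As a sanity check on whatever the sweep finds: if the probabilities were perfectly calibrated, flagging would pay off whenever p × 1,000 > (1 − p) × 20, that is, for p above 20 / 1,020 ≈ 0.0196. Plotting `costs` against `thresholds` gives the cost curve the exercise asks for.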

4.6 Exercise 6: Isolation Forest

Fit Isolation Forest with 2% contamination. Plot anomaly score distributions for fraud vs legitimate. Report precision and recall.
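A minimal sketch of the fitting and scoring steps follows. The Gaussian toy data, in which fraud rows are shifted away from the bulk, and `n_estimators=200` are assumptions; only the 2% contamination comes from the exercise.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

# Toy data (assumption): six Gaussian features, ~1% fraud shifted by 1.5.
rng = np.random.default_rng(0)
n = 10_000
y = (rng.random(n) < 0.01).astype(int)
X = rng.normal(size=(n, 6))
X[y == 1] += 1.5

# The labels are never shown to the model: this is unsupervised.
iso = IsolationForest(n_estimators=200, contamination=0.02,
                      random_state=0).fit(X)
flagged = (iso.predict(X) == -1).astype(int)   # predict() returns -1 for anomalies
anomaly_score = -iso.score_samples(X)          # negated so higher = more anomalous

precision = precision_score(y, flagged, zero_division=0)
recall = recall_score(y, flagged, zero_division=0)
print(f"flagged={flagged.mean():.3f}  precision={precision:.3f}  recall={recall:.3f}")
```

Histograms of `anomaly_score` split by `y` give the distribution plot the exercise asks for. Because contamination is 2% while fraud is about 1%, at least half of the flagged transactions must be legitimate even in the best case.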

4.7 Exercise 7: Hybrid model

Add the Isolation Forest anomaly score as a feature to the supervised model. Measure the AUC lift.
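The stacking step might be sketched as follows. The toy data (fraud rows are more dispersed than legitimate ones), the helper name `with_score`, and the single train/test split are all assumptions; the size and sign of the lift depend entirely on the data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy data (assumption): fraud rows are shifted and more spread out.
rng = np.random.default_rng(0)
n = 10_000
y = (rng.random(n) < 0.01).astype(int)
X = rng.normal(size=(n, 6))
X[y == 1] = X[y == 1] * 2.0 + 0.5

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Fit the forest on training rows only so test rows do not leak into the feature.
iso = IsolationForest(contamination=0.02, random_state=0).fit(X_tr)


def with_score(features):
    """Append the anomaly score (higher = more anomalous) as an extra column."""
    return np.column_stack([features, -iso.score_samples(features)])


base = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
hybrid = LogisticRegression(class_weight="balanced", max_iter=1000).fit(
    with_score(X_tr), y_tr)

auc_base = roc_auc_score(y_te, base.predict_proba(X_te)[:, 1])
auc_hybrid = roc_auc_score(y_te, hybrid.predict_proba(with_score(X_te))[:, 1])
print(f"base AUC={auc_base:.3f}  hybrid AUC={auc_hybrid:.3f}  "
      f"lift={auc_hybrid - auc_base:+.3f}")
```

The anomaly score acts as a non-linear summary of how isolated a point is, which a linear model cannot construct from the raw features on its own.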

5 Part B: Elliptic Bitcoin Data

5.1 Exercise 8: Load and explore

Load the labelled parquet. Plot the illicit rate by time step to visualise the temporal drift.
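The aggregation could be sketched as below. The column names `time_step` and `label` (with 1 meaning illicit and 0 licit) are assumptions, as is the helper name `illicit_rate_by_step`; check `data/elliptic/README.md` for the actual schema. The toy frame stands in for the parquet file so the snippet runs anywhere.

```python
import pandas as pd


def illicit_rate_by_step(df, time_col="time_step", label_col="label"):
    """Share of illicit transactions per time step (label 1 = illicit)."""
    return df.groupby(time_col)[label_col].mean().sort_index()


# With the real data one would load the file first, for example:
#   df = pd.read_parquet("data/elliptic/elliptic_labelled.parquet")
# Toy stand-in with the assumed column names:
toy = pd.DataFrame({
    "time_step": [1, 1, 1, 1, 2, 2, 3, 3],
    "label":     [0, 0, 0, 1, 0, 1, 1, 1],
})
rate = illicit_rate_by_step(toy)
print(rate)
# rate.plot(kind="bar") would draw the chart (requires matplotlib).
```

A rate that moves substantially from one time step to the next is the temporal drift that motivates walk-forward validation in the next exercise.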

5.2 Exercise 9: Shuffled CV vs walk-forward validation

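Run 5-fold shuffled stratified CV and walk-forward validation (train on past time steps, test on future ones). Compare the AUCs: shuffled CV lets the model train on transactions from later time steps than those it is tested on, so it scores about 0.065 higher. This gap is the strongest evidence for temporal CV in the entire course.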
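The two validation schemes can be compared with a small split generator. This sketch uses toy data with a drifting signal in place of the Elliptic features; the function name `walk_forward_splits`, the expanding-window design, and the `min_train_steps` parameter are assumptions, and the gap it prints is not the lab's result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score


def walk_forward_splits(time_steps, min_train_steps=3):
    """Yield (train_idx, test_idx): train on all earlier steps, test on the next."""
    steps = np.unique(time_steps)
    for i in range(min_train_steps, len(steps)):
        yield (np.flatnonzero(time_steps < steps[i]),
               np.flatnonzero(time_steps == steps[i]))


# Toy data (assumption): 10 time steps; the signal in feature 0 fades and flips.
rng = np.random.default_rng(0)
n_steps, per_step = 10, 400
time_steps = np.repeat(np.arange(1, n_steps + 1), per_step)
y = (rng.random(time_steps.size) < 0.1).astype(int)
X = rng.normal(size=(time_steps.size, 4))
X[:, 0] += y * (1.5 - 2.0 * time_steps / n_steps)
X[:, 1] += y * 1.0


def new_model():
    return LogisticRegression(class_weight="balanced", max_iter=1000)


wf_aucs = []
for tr, te in walk_forward_splits(time_steps):
    # AUC is undefined when a fold contains a single class, so skip those.
    if len(np.unique(y[tr])) < 2 or len(np.unique(y[te])) < 2:
        continue
    clf = new_model().fit(X[tr], y[tr])
    wf_aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))

shuffled = cross_val_score(
    new_model(), X, y, scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

print(f"shuffled AUC={shuffled.mean():.3f}  walk-forward AUC={np.mean(wf_aucs):.3f}  "
      f"gap={shuffled.mean() - np.mean(wf_aucs):+.3f}")
```

The expanding window mirrors deployment, where a model only ever sees the past. A rolling window of fixed length is a reasonable alternative when old data are believed to be stale.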

5.3 Exercise 10 (Extension): Graph exploration

Load the edge list. Build a directed graph with NetworkX. Compute degree centrality. Test whether high-centrality nodes are more likely to be illicit.
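A descriptive first pass at the centrality question is sketched below. The toy edge list and labels, the helper name `illicit_rate_by_centrality`, and the top-fraction cut-off are assumptions; it compares illicit rates between high- and low-centrality nodes rather than running a formal significance test.

```python
import networkx as nx


def illicit_rate_by_centrality(G, labels, top_frac=0.2):
    """Illicit rate among the top `top_frac` nodes by degree centrality vs the rest."""
    cent = nx.degree_centrality(G)   # for a DiGraph: (in + out degree) / (n - 1)
    ranked = sorted(cent, key=cent.get, reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    top, rest = ranked[:k], ranked[k:]

    def rate(nodes):
        return sum(labels[v] for v in nodes) / len(nodes) if nodes else float("nan")

    return rate(top), rate(rest)


# Toy graph (assumption): node 1 is a hub and the only illicit node.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (5, 1)]
labels = {1: 1, 2: 0, 3: 0, 4: 0, 5: 0}   # 1 = illicit, 0 = licit

G = nx.DiGraph()
G.add_edges_from(edges)
cent = nx.degree_centrality(G)
top_rate, rest_rate = illicit_rate_by_centrality(G, labels)
print(f"illicit rate: top={top_rate:.2f}  rest={rest_rate:.2f}")
```

On the real edge list the same function applies once the graph is built from the parquet file; a rank-based test on the centrality values of illicit versus licit nodes would be a natural next step.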

6 Summary

| Exercise | Key result | Lesson |
|----------|-----------|--------|
| 2. Accuracy trap | 99% accurate, 0 fraud caught | Accuracy is useless for rare events |
| 3. Supervised pipeline | AUC ~0.78, AP ~0.04 | AUC flatters; AP tells the truth |
| 4. Default threshold | TP = 0 at threshold 0.5 | Default threshold catches nothing |
| 5. Cost-sensitive | Optimal ~0.015, catches ~50% | Threshold is a business decision |
| 6. Isolation Forest | Low precision, low recall | Unusual ≠ fraudulent |
| 7. Hybrid model | +0.02 AUC lift | Stack unsupervised into supervised |
| 9. Elliptic walk-forward | +0.065 look-ahead bias gap | Temporal CV matters on real data |

7 References

Weber, Mark, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics.” In KDD Workshop on Anomaly Detection in Finance. https://arxiv.org/abs/1908.02591.
