Lab 8: Fraud Detection as Rare-Event Classification

From synthetic transactions to real Bitcoin data (Elliptic dataset)

Note: Expected Time
  • Part A (synthetic data): ~60 minutes
  • Part B (Elliptic Bitcoin): ~45 minutes
  • Extension (graph analysis): +30 minutes

1 Learning Objectives

By the end of this lab you will be able to:

  • Demonstrate why accuracy is meaningless for rare-event classification
  • Build a supervised fraud detection pipeline with stratified CV and class weighting
  • Select a cost-sensitive decision threshold that minimises expected cost
  • Compare Isolation Forest (unsupervised) with supervised and hybrid approaches
  • Run walk-forward temporal validation on real Bitcoin transaction data
  • Quantify the look-ahead bias gap between shuffled and temporal CV

2 Connection to the slides

This lab directly implements the exercises from the Week 8 slides. Part A uses synthetic transaction data to teach the statistical principles (accuracy trap, cost-sensitive thresholds, temporal CV, hybrid methods). Part B applies the same pipeline to the Elliptic Bitcoin dataset (Weber et al. 2019), which exhibits genuine temporal drift: shuffled CV overstates AUC by about 0.065 relative to walk-forward validation (the look-ahead bias gap).

3 Data

Part A generates synthetic data inline (no external files needed).

Part B uses two parquet files in data/elliptic/:

  • elliptic_labelled.parquet (~25 MB): 46,564 labelled Bitcoin transactions with 166 anonymised features and 49 time steps
  • elliptic_edges_labelled.parquet (~0.4 MB): 36,624 directed edges between labelled transactions

See data/elliptic/README.md for the full dataset description and download instructions.

4 Part A: Synthetic Transaction Data

4.1 Exercise 1: Generate the dataset

Generate 50,000 synthetic card transactions with a ~1% fraud rate and temporal drift. Features mimic what a real fraud team works with: amount, hour, velocity, foreign merchant flag, account age, and spend ratio.
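One way such a generator might look is sketched below. Every distribution, parameter, and feature definition here is an assumption made for illustration (the lab's own generator may differ): the fraud rate drifts linearly from 0.5% to 1.5% so that it averages about 1%, and the amount gap between fraud and legitimate transactions shrinks over time to imitate drift.

```python
import numpy as np

FEATURES = ["amount", "hour", "velocity", "foreign", "account_age", "spend_ratio"]

def make_transactions(n=50_000, seed=42):
    """Return (X, y, t): feature matrix, fraud labels, time index in [0, 1].

    All distributions below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)               # transactions ordered in time

    # Fraud rate drifts from 0.5% to 1.5%, averaging about 1%.
    y = (rng.random(n) < 0.005 + 0.010 * t).astype(int)

    # Fraud amounts start larger than legitimate ones; the gap shrinks over time.
    amount = rng.lognormal(mean=3.5 + y * (1.0 - 0.6 * t), sigma=1.0)
    # Fraud clusters in the small hours; legitimate activity peaks mid-afternoon.
    hour = np.where(y == 1, rng.normal(3, 3, n), rng.normal(14, 5, n)) % 24
    # Number of recent transactions on the same card.
    velocity = rng.poisson(1.0 + 2.0 * y)
    foreign = (rng.random(n) < np.where(y == 1, 0.30, 0.05)).astype(int)
    # Account age in days; fraudulent accounts skew newer.
    account_age = rng.exponential(np.where(y == 1, 200.0, 900.0))
    # Amount relative to the customer's typical spend.
    spend_ratio = rng.lognormal(mean=0.8 * y, sigma=0.5)

    X = np.column_stack([amount, hour, velocity, foreign, account_age, spend_ratio])
    return X, y, t

X, y, t = make_transactions()
print(f"shape={X.shape}, fraud rate={y.mean():.3%}")
```

Keeping the time index `t` alongside the features makes it possible to split by time later instead of at random.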

4.2 Exercise 2: The accuracy trap

Build the simplest possible model (predict every transaction as legitimate). Calculate accuracy and recall. This demonstrates the base rate fallacy from Week 1 and Chapter 05.
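The arithmetic can be checked in a few lines. This sketch assumes exactly 500 fraud cases in 50,000 transactions (a 1% rate) rather than the lab's generated labels.

```python
import numpy as np

n, n_fraud = 50_000, 500                 # assumed: exactly 1% fraud
y_true = np.zeros(n, dtype=int)
y_true[:n_fraud] = 1

y_pred = np.zeros(n, dtype=int)          # "model": everything is legitimate

accuracy = (y_pred == y_true).mean()
true_pos = int(((y_pred == 1) & (y_true == 1)).sum())
recall = true_pos / y_true.sum()

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}, fraud caught={true_pos}")
# accuracy=0.990, recall=0.000, fraud caught=0
```

The accuracy equals the share of legitimate transactions, so it says nothing about whether any fraud was detected.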

4.3 Exercise 3: Supervised pipeline

Train logistic regression with class_weight='balanced' using 5-fold stratified CV. Report both AUC and Average Precision.
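A minimal sketch of this pipeline with scikit-learn follows. The data is a stand-in produced by `make_classification` with about 1% positives (an assumption, not the lab's transaction set), and the scaler is an added convenience rather than something the exercise requires.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): ~1% positives, not the lab's transactions.
X, y = make_classification(
    n_samples=20_000, n_features=6, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0.0, random_state=0,
)

# Scaling lives inside the pipeline so it is refit on each training fold.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["roc_auc", "average_precision"])

auc = scores["test_roc_auc"]
ap = scores["test_average_precision"]
print(f"AUC {auc.mean():.3f} +/- {auc.std():.3f}")
print(f"AP  {ap.mean():.3f} +/- {ap.std():.3f}")
```

Stratification keeps roughly the same fraud share in every fold, which matters when a fold might otherwise contain only a handful of positives.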

4.4 Exercise 4: Default threshold disaster

Fit unweighted logistic regression. Predict using the default 0.5 threshold. Observe that the model catches zero fraud.
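The effect can be explored with a sketch like the one below. It uses stand-in data with a deliberately weak fraud signal (an assumption), so the exact counts will differ from the lab's. The point is that with a 1% base rate, predicted probabilities rarely approach 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 1% fraud with a deliberately weak signal.
rng = np.random.default_rng(0)
n = 20_000
y = (rng.random(n) < 0.01).astype(int)
X = rng.normal(size=(n, 4))
X[:, :2] += 0.8 * y[:, None]             # fraud shifts two features slightly

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # no class weighting
proba = clf.predict_proba(X_te)[:, 1]

def true_positives(threshold):
    """Count fraud cases whose predicted probability reaches the threshold."""
    return int(((proba >= threshold) & (y_te == 1)).sum())

# clf.predict(X_te) corresponds to the 0.5 row below (up to ties).
print("largest predicted fraud probability:", round(float(proba.max()), 3))
for thr in (0.5, 0.1, 0.02):
    print(f"threshold {thr:>4}: caught {true_positives(thr)} of {int(y_te.sum())}")
```

Lowering the threshold can only keep or increase the number of fraud cases caught, at the price of more false alarms.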

4.5 Exercise 5: Cost-sensitive threshold

Sweep thresholds from 0.005 to 0.50, plot the expected cost curve (£20 per false alarm, £1,000 per missed fraud), and find the optimal threshold.
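A sketch of the sweep is given below, using the two costs stated above. The scores are simulated calibrated probabilities (an assumption) rather than output from the lab's model, and the helper names are hypothetical. As a sanity check, if probabilities are well calibrated and correct decisions cost nothing, flagging pays off whenever p × 1,000 > (1 − p) × 20, which gives a break-even threshold of 20 / 1,020 ≈ 0.0196.

```python
import numpy as np

COST_FP, COST_FN = 20.0, 1000.0          # GBP per false alarm / missed fraud

def expected_cost(y_true, proba, threshold):
    """Total cost of flagging every transaction with proba >= threshold."""
    flagged = proba >= threshold
    false_alarms = np.sum(flagged & (y_true == 0))
    missed_fraud = np.sum(~flagged & (y_true == 1))
    return COST_FP * false_alarms + COST_FN * missed_fraud

def best_threshold(y_true, proba, grid=np.linspace(0.005, 0.50, 100)):
    """Return (threshold, cost) minimising the cost over the grid."""
    costs = np.array([expected_cost(y_true, proba, thr) for thr in grid])
    i = int(np.argmin(costs))
    return float(grid[i]), float(costs[i])

# Stand-in scores (assumption): calibrated probabilities with a ~1% mean.
rng = np.random.default_rng(0)
proba = rng.beta(0.5, 49.5, size=50_000)
y_true = (rng.random(50_000) < proba).astype(int)

thr, cost = best_threshold(y_true, proba)
print(f"best threshold {thr:.3f}, cost GBP {cost:,.0f}")
print(f"cost at 0.50: GBP {expected_cost(y_true, proba, 0.50):,.0f}")
```

Plotting the `costs` array against `grid` gives the cost curve; the minimum moves if either cost changes, which is why the threshold is a business decision rather than a modelling one.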

4.6 Exercise 6: Isolation Forest

Fit Isolation Forest with 2% contamination. Plot anomaly score distributions for fraud vs legitimate. Report precision and recall.
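A minimal sketch with scikit-learn's `IsolationForest` follows. The data is a stand-in (an assumption) in which fraud is shifted away from the legitimate cloud, so the printed metrics will not match the lab's. The plotting step is left out to keep the block dependency-light.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Stand-in data (assumption): 1% fraud, shifted away from the legitimate cloud.
rng = np.random.default_rng(0)
n = 20_000
y = (rng.random(n) < 0.01).astype(int)
X = rng.normal(size=(n, 6))
X[:, :2] += 2.0 * y[:, None]

iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
iso.fit(X)                                # labels are never shown to the model

flagged = iso.predict(X) == -1            # -1 marks predicted anomalies
anomaly_score = -iso.score_samples(X)     # negate so higher = more anomalous

tp = int((flagged & (y == 1)).sum())
precision = tp / max(int(flagged.sum()), 1)
recall = tp / max(int(y.sum()), 1)
print(f"flagged {flagged.mean():.1%} of transactions")
print(f"precision={precision:.3f}, recall={recall:.3f}")
print(f"mean score: fraud {anomaly_score[y == 1].mean():.3f}, "
      f"legit {anomaly_score[y == 0].mean():.3f}")
```

Because `contamination=0.02` flags about 2% of rows while only about 1% are fraud, precision is capped near 50% even if every fraud case is found.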

4.7 Exercise 7: Hybrid model

Add the Isolation Forest anomaly score as a feature to the supervised model. Measure the AUC lift.
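The stacking step could be sketched as follows. The data is a stand-in (an assumption) built so that fraud lies far from the centre in either direction, a pattern a linear model cannot read from the raw features; the size of any lift on the lab's data will differ. The forest is fitted on training rows only so the added feature does not leak test information.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): fraud has inflated spread in two features.
rng = np.random.default_rng(0)
n = 20_000
y = (rng.random(n) < 0.01).astype(int)
X = rng.normal(size=(n, 4))
X[y == 1, :2] *= 3.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the forest on training rows only so test rows cannot leak into the score.
iso = IsolationForest(n_estimators=100, random_state=0).fit(X_tr)

def add_anomaly_score(X_part):
    """Append the negated Isolation Forest score as an extra column."""
    return np.column_stack([X_part, -iso.score_samples(X_part)])

def fit_and_score(X_train, X_test):
    """Train class-weighted logistic regression and return test AUC."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(X_train, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_test)[:, 1])

auc_base = fit_and_score(X_tr, X_te)
auc_hybrid = fit_and_score(add_anomaly_score(X_tr), add_anomaly_score(X_te))
print(f"supervised only: {auc_base:.3f}")
print(f"hybrid:          {auc_hybrid:.3f}")
print(f"AUC lift:        {auc_hybrid - auc_base:+.3f}")
```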

5 Part B: Elliptic Bitcoin Data

5.1 Exercise 8: Load and explore

Load the labelled parquet. Plot the illicit rate by time step to visualise the temporal drift.
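The per-step aggregation might look like the sketch below. The column names `time_step` and `label` are placeholders (the real schema is described in data/elliptic/README.md), the label is assumed to be 1 for illicit and 0 for licit, and a tiny hand-made frame stands in for the parquet file so the block runs on its own.

```python
import pandas as pd

def illicit_rate_by_step(df, time_col="time_step", label_col="label"):
    """Return transaction count and illicit share per time step.

    Column names are placeholders; label_col is assumed to be 1 for
    illicit and 0 for licit.
    """
    return (
        df.groupby(time_col)[label_col]
          .agg(n_tx="size", illicit_rate="mean")
          .sort_index()
    )

# In the lab this frame would come from
#   pd.read_parquet("data/elliptic/elliptic_labelled.parquet")
# A tiny hand-made frame keeps the sketch runnable without the file.
toy = pd.DataFrame({
    "time_step": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "label":     [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
})
rates = illicit_rate_by_step(toy)
print(rates)
# rates["illicit_rate"].plot(marker="o") would draw the drift curve
# (requires matplotlib).
```

Reporting the count next to the rate helps distinguish real drift from noise in time steps with few labelled transactions.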

5.2 Exercise 9: Shuffled CV vs walk-forward validation

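Run 5-fold shuffled stratified CV and walk-forward validation (train on past time steps, test on future ones). Compare AUC: shuffled CV scores about 0.065 higher because its training folds contain transactions from the future. This gap is the strongest evidence for temporal CV in the entire course.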
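The two validation schemes can be contrasted with a sketch like this one. The splitter `walk_forward_splits` is a hypothetical helper, and the data is a stand-in (an assumption) with 10 time steps and a signal that migrates between features to imitate drift, so the printed gap is not the Elliptic figure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

def walk_forward_splits(time_step, first_test_step):
    """Yield (step, train_idx, test_idx): train on earlier steps, test on one."""
    for step in np.unique(time_step):
        if step < first_test_step:
            continue
        train_idx = np.flatnonzero(time_step < step)
        test_idx = np.flatnonzero(time_step == step)
        yield int(step), train_idx, test_idx

# Stand-in data (assumption): 10 time steps, ~10% positives, and a signal
# that moves from feature 0 to feature 1 over time.
rng = np.random.default_rng(0)
steps = np.repeat(np.arange(1, 11), 1_000)
y = (rng.random(steps.size) < 0.10).astype(int)
X = rng.normal(size=(steps.size, 3))
w = (steps - 1) / 9.0                      # 0 at the first step, 1 at the last
X[:, 0] += 2.0 * y * (1.0 - w)
X[:, 1] += 2.0 * y * w

model = LogisticRegression(max_iter=1000)

# Shuffled CV mixes past and future rows in every training fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_shuffled = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

# Walk-forward only ever trains on rows that precede the test step.
aucs = []
for step, tr, te in walk_forward_splits(steps, first_test_step=6):
    model.fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))
auc_walk = float(np.mean(aucs))

print(f"shuffled CV AUC:  {auc_shuffled:.3f}")
print(f"walk-forward AUC: {auc_walk:.3f}")
print(f"gap:              {auc_shuffled - auc_walk:+.3f}")
```

An expanding training window is used here; a fixed-length rolling window is a reasonable alternative when old data is believed to be stale.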

5.3 Exercise 10 (Extension): Graph exploration

Load the edge list. Build a directed graph with NetworkX. Compute degree centrality. Test whether high-centrality nodes are more likely to be illicit.
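A sketch of the graph steps with NetworkX is shown below. The edge list and labels are toy values (an assumption), the helper `illicit_rate_by_centrality` is hypothetical, and comparing the top fraction of nodes against the rest is only one simple way to run the test; a rank-based significance test would be a natural refinement.

```python
import networkx as nx

def illicit_rate_by_centrality(G, is_illicit, top_frac=0.10):
    """Compare the illicit share of the most central nodes with the rest.

    is_illicit maps node -> bool. Returns (rate_top, rate_rest).
    """
    centrality = nx.degree_centrality(G)   # (in-degree + out-degree) / (n - 1)
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    k = max(1, int(round(top_frac * len(ranked))))
    top, rest = ranked[:k], ranked[k:]

    def rate(nodes):
        return sum(is_illicit[v] for v in nodes) / max(len(nodes), 1)

    return rate(top), rate(rest)

# Toy edge list (assumption); in the lab the pairs would come from
# data/elliptic/elliptic_edges_labelled.parquet.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a"), ("e", "a")]
G = nx.DiGraph()
G.add_edges_from(edges)

is_illicit = {"a": True, "b": False, "c": False, "d": False, "e": True}
centrality = nx.degree_centrality(G)
rate_top, rate_rest = illicit_rate_by_centrality(G, is_illicit, top_frac=0.2)
print(centrality)
print(f"illicit rate: top 20% = {rate_top:.2f}, rest = {rate_rest:.2f}")
```

For a directed graph, `nx.in_degree_centrality` and `nx.out_degree_centrality` separate receiving from sending, which may be more informative for payment flows.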

6 Summary

| Exercise | Key result | Lesson |
|---|---|---|
| 2. Accuracy trap | 99% accurate, 0 fraud caught | Accuracy is useless for rare events |
| 3. Supervised pipeline | AUC ~0.78, AP ~0.04 | AUC flatters; AP tells the truth |
| 4. Default threshold | TP = 0 at threshold 0.5 | Default threshold catches nothing |
| 5. Cost-sensitive | Optimal ~0.015, catches ~50% | Threshold is a business decision |
| 6. Isolation Forest | Low precision, low recall | Unusual ≠ fraudulent |
| 7. Hybrid model | +0.02 AUC lift | Stack unsupervised into supervised |
| 9. Elliptic walk-forward | +0.065 look-ahead bias gap | Temporal CV matters on real data |

7 References

Weber, Mark, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics.” In KDD Workshop on Anomaly Detection in Finance. https://arxiv.org/abs/1908.02591.