Lab 02: Survivorship Bias in UK Banking

In-Class Bloomberg Terminal Session : Week 2

Author

Professor Barry Quinn

Published

May 8, 2026

In-Class Lab Structure

This is the in-class Bloomberg lab for Week 2, conducted in the Financial Innovation Lab.

Before class: Complete the homework (yfinance pipeline exercise) to familiarise yourself with data pipeline concepts.

In class: We use professional Bloomberg data to examine survivorship bias using real UK banking crisis data. You can run the analysis part (Part 2) in Colab after exporting your Excel sheet: see Step 5 below and the Colab link.

Open in Colab

Lab Overview

Type: In-class lab (Bloomberg Terminal, supervised)

Duration: 2 hours

Location: Financial Innovation Lab (~20 Bloomberg terminals)

Learning Objectives:

Using professional-grade data, you will:

Extract historical price data for delisted securities from Bloomberg
Quantify survivorship bias using real UK banking crisis data
Compare your results with simulated estimates from Chapter 2
Understand how professional data differs from free APIs

Prerequisites:

Complete Homework 02 (yfinance pipeline) : reinforces pipeline concepts
Read Chapter 02: Data and Measurement in Finance
Bring laptop for Python analysis (Bloomberg extraction done on terminals)

Background: The UK Banking Crisis of 2008

The 2008 financial crisis provides a stark illustration of survivorship bias. If you look at UK banking returns today using only currently-listed banks, you miss the catastrophic losses from institutions that failed or were rescued.

Bank	What Happened	Approximate Loss
Northern Rock	Nationalised Feb 2008	~95% loss
Bradford & Bingley	Nationalised Sep 2008	~98% loss
HBOS	Acquired by Lloyds Jan 2009	~82% loss before rescue
Royal Bank of Scotland	Government bailout	~90% loss at nadir

Students analysing “UK bank performance” using only current FTSE 100 banks would completely miss these catastrophic events.

Part 1: Bloomberg Data Extraction (60 minutes)

Step 1: Identify Securities to Extract

We’ll extract monthly adjusted prices for January 2005 to December 2010 for the following UK banks:

Survivors (current FTSE 100 banks):

Ticker	Name
`BARC LN Equity`	Barclays
`LLOY LN Equity`	Lloyds Banking Group
`HSBA LN Equity`	HSBC Holdings
`STAN LN Equity`	Standard Chartered
`RBS LN Equity`	Royal Bank of Scotland (now NatWest Group)

Failed banks (delisted during crisis):

Ticker	Name	Delisted	End date for data
`NRK LN Equity`	Northern Rock	Feb 2008	22 Feb 2008
`BB/ LN Equity`	Bradford & Bingley	Sep 2008	29 Sep 2008
`HBOS LN Equity`	HBOS	Jan 2009	16 Jan 2009

Handling Delisted Securities

Bloomberg retains historical data for delisted securities, but you must specify the correct date range ending at the delisting date (not today’s date). The Historical Time Series Wizard handles this automatically when you set custom date ranges.

Step 2: Bloomberg Setup

Complete these steps each time you use the Bloomberg Terminal:

Log in to the Bloomberg Terminal
Open Excel then close it again
Install the Bloomberg Excel Add-in (right-hand side of home screen : required each session)
Open Excel and begin extraction

Step 3: Extract Data Using the Historical Time Series Wizard

Bloomberg’s Historical Time Series Wizard provides a point-and-click interface for data extraction (much easier than writing BDH formulas manually!).

3.1 Launch the Wizard

In Excel, go to the Bloomberg tab in the ribbon.
Click “Function Build Wizard” (or similar: look for a wizard icon).
A wizard panel appears on the right side of the screen.
In the wizard, click “Historical Time Series” (listed as a new function).

3.2 Extract Survivor Banks (All 5, Same Date Range)

The wizard takes you through 4 steps:

Step 1: Securities

Click the search box and find each survivor bank:
- BARC LN Equity (Barclays)
- LLOY LN Equity (Lloyds)
- HSBA LN Equity (HSBC)
- STAN LN Equity (Standard Chartered)
- RBS LN Equity (Royal Bank of Scotland)
The wizard will show the Bloomberg IDs as you search.
Click Next.

Step 2: Fields

Choose PX_LAST (last price, adjusted for splits/dividends).
Click Next.

Step 3: Time Period and Frequency

Start date: 01/01/2005
End date: 31/12/2010
Frequency: Monthly
Click Next.

Step 4: Output Format

The default format is usually fine. The wizard will show a preview of the layout (dates in rows, securities in columns).
You can see examples of this layout in the raw Bloomberg .xlsx files in this project’s data folder.
Click Next.

Step 5: Output to Excel

Before clicking Finish: Select the top-left cell (e.g., A1) where you want the data to appear.
Click Finish (or equivalent button). The data will populate starting from the selected cell.

Your Excel sheet should now have columns: Date, BARC LN Equity, LLOY LN Equity, HSBA LN Equity, STAN LN Equity, RBS LN Equity.

3.3 Extract Failed Banks (3 Securities, Different Date Ranges)

For the failed banks, run the wizard separately for each because they have different delisting dates:

Northern Rock (NRK LN Equity):

Run the wizard again (Steps 1-4).
Securities: NRK LN Equity
Fields: PX_LAST
Time period: 01/01/2005 to 22/02/2008 (delisting date), Monthly
Output to a new column next to your survivor data (e.g., column G).

Bradford & Bingley (BB/ LN Equity):

Run the wizard again.
Securities: BB/ LN Equity
Fields: PX_LAST
Time period: 01/01/2005 to 29/09/2008, Monthly
Output to the next column (e.g., column H).

HBOS (HBOS LN Equity):

Run the wizard again.
Securities: HBOS LN Equity
Fields: PX_LAST
Time period: 01/01/2005 to 16/01/2009, Monthly
Output to the next column (e.g., column I).

Why Separate Extractions for Failed Banks?

Each failed bank has a different delisting date, so they need different end dates in the wizard. Running the wizard three times (once per failed bank) ensures you get data up to each bank’s delisting date without errors.

3.4 Verify Your Data

Your Excel sheet should now have:

Columns: Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS
Rows: Monthly dates from Jan 2005 to Dec 2010
Missing values: NRK, BB, and HBOS will have blank cells after their delisting dates (this is correct: they stopped trading).

Troubleshooting

Wizard not appearing? Make sure you installed the Bloomberg Excel Add-in (Step 2.2).
“Security not found” error? Bloomberg IDs are case-sensitive and require exact formatting (e.g., BARC LN Equity not BARC LN). Use the wizard’s search to ensure correct IDs.
Data not populating? Check that you selected a cell before clicking Finish in Step 5.

Advanced: Manual BDH Formulas (Optional)

If you’re comfortable with Excel formulas, you can use Bloomberg’s BDH() function directly instead of the wizard. This is faster if you know the exact syntax:

Survivors (all have same date range):

=BDH("BARC LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A")
=BDH("LLOY LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A")
=BDH("HSBA LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A")
=BDH("STAN LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A")
=BDH("RBS LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A")

Failed banks (different end dates):

=BDH("NRK LN Equity","PX_LAST","2005-01-01","2008-02-22","Per=M","Days=A")
=BDH("BB/ LN Equity","PX_LAST","2005-01-01","2008-09-29","Per=M","Days=A")
=BDH("HBOS LN Equity","PX_LAST","2005-01-01","2009-01-16","Per=M","Days=A")

Type these formulas in cells, and Bloomberg will populate the data automatically.

Step 4: Save Your Data

Export to CSV with the following structure:

Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS
2005-01-31, 523.5, 445.2, 847.0, 1023.5, 1845.0, 987.5, 423.0, 845.0
...

Save as: uk_banks_2005_2010_with_failures.csv

Step 5: Import your sheet into Colab (or Python)

If you ran Part 1 in the Financial Innovation Lab and have your data in Excel, use these steps to run the analysis in Colab (or on your laptop).

5.1 Export from Excel to CSV

In Excel, with your Bloomberg data sheet active: File → Save As (or Save a Copy).
Choose CSV (Comma delimited) (*.csv) as the file type.
Save as uk_banks_2005_2010_with_failures.csv (or keep your filename: you’ll upload it in the next step).
Ensure the first row is the header: Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS (column names may have spaces; the code below handles common variants).

A quirk of Bloomberg exports: delisted banks

If you use a pre-extracted Bloomberg file (e.g. the raw xlsx on Blackboard), you may notice that failed banks still have prices all the way to the end of the sample. Bloomberg sometimes carries forward the last known price for delisted securities, rather than leaving cells blank. That looks like “complete” data: but it masks the delisting event. For survivorship bias analysis, we need NA (or blank) after each bank’s delisting date. Northern Rock delisted Feb 2008, Bradford & Bingley Sep 2008, HBOS Jan 2009. Part of your data wrangling is to validate the data and set values to NA after those dates. This is exactly the kind of data quality check that separates professional analysis from blindly trusting a spreadsheet.

5.2 In Colab: upload your file

Run the next cell. If you are in Google Colab, it will prompt you to Choose Files and upload your CSV. Upload the file you just exported. If you are running locally and your CSV is in the same folder as the notebook, you can skip the upload and the code will look for uk_banks_2005_2010_with_failures.csv in the current directory. If no file is found, the lab uses sample data so you can still complete the analysis (e.g. if you missed the Bloomberg session).

import pandas as pd
import numpy as np
from pathlib import Path

# In Colab: prompt for file upload. Otherwise we'll try local path or use sample data.
IN_COLAB = False
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    pass

uploaded_path = None
if IN_COLAB:
    from google.colab import files
    uploaded = files.upload()  # Opens file picker; upload uk_banks_2005_2010_with_failures.csv
    if uploaded:
        # Use the first (and usually only) uploaded file
        uploaded_path = list(uploaded.keys())[0]
        print(f"Uploaded: {uploaded_path}")
else:
    for name in ["uk_banks_2005_2010_with_failures.csv", "uk_banks_2005_2010_with_failures (1).csv"]:
        if Path(name).exists():
            uploaded_path = name
            print(f"Using local file: {uploaded_path}")
            break
if uploaded_path is None:
    print("No CSV uploaded or found. The next cell will use sample data so you can still run the analysis.")

No CSV uploaded or found. The next cell will use sample data so you can still run the analysis.

5.3 Load your data (or sample data)

The next cell loads your CSV if you uploaded one, or creates sample data with the same structure so you can complete the lab.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def load_uk_banks_csv(path):
    """Load and normalise UK banks CSV from Bloomberg Excel export."""
    df = pd.read_csv(path)
    # Normalise column names (strip spaces, match expected symbols)
    df.columns = df.columns.str.strip()
    # Date column may be named 'Date' or first column
    date_col = 'Date' if 'Date' in df.columns else df.columns[0]
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.set_index(date_col)
    # Ensure we have expected tickers (allow slight name variants)
    renames = {}
    for c in df.columns:
        c_upper = c.upper().strip()
        if c_upper in ('BARC', 'LLOY', 'HSBA', 'STAN', 'RBS', 'NRK', 'BB', 'HBOS'):
            renames[c] = c_upper if c_upper != 'BB' else 'BB'
    df = df.rename(columns=renames)
    return df.sort_index()

def make_sample_data():
    """Sample data matching UK banking crisis patterns (for use without Bloomberg data)."""
    dates = pd.date_range('2005-01-01', '2010-12-31', freq='ME')
    np.random.seed(42)
    def create_price_path(start, crisis_low, end, crisis_month=44):
        n = len(dates)
        pre_crisis = np.linspace(start, start * 1.3, crisis_month)
        crisis = np.linspace(start * 1.3, crisis_low, 6)
        recovery = np.linspace(crisis_low, end, n - crisis_month - 6)
        return np.concatenate([pre_crisis, crisis, recovery])
    return pd.DataFrame({
        'BARC': create_price_path(100, 15, 45),
        'LLOY': create_price_path(100, 12, 35),
        'HSBA': create_price_path(100, 55, 85),
        'STAN': create_price_path(100, 45, 90),
        'RBS': create_price_path(100, 8, 25),
        'NRK': np.concatenate([create_price_path(100, 5, 5)[:38], [np.nan]*34]),
        'BB': np.concatenate([create_price_path(100, 2, 2)[:45], [np.nan]*27]),
        'HBOS': np.concatenate([create_price_path(100, 18, 18)[:48], [np.nan]*24]),
    }, index=dates)

# Load uploaded/local CSV or fall back to sample data
if 'uploaded_path' in dir() and uploaded_path is not None:
    try:
        prices = load_uk_banks_csv(uploaded_path)
        print("Loaded your Bloomberg export.")
    except Exception as e:
        print(f"Could not load uploaded file: {e}. Using sample data.")
        prices = make_sample_data()
else:
    # Try local path (e.g. when running from repo)
    local = Path("uk_banks_2005_2010_with_failures.csv")
    if not local.exists():
        local = Path("data/bloomberg_database/uk_banks_2005_2010_with_failures.csv")
    if local.exists():
        prices = load_uk_banks_csv(local)
        print("Loaded local Bloomberg data.")
    else:
        prices = make_sample_data()
        print("Using sample data (no CSV found). You can still complete the analysis.")

print(f"Date range: {prices.index[0].strftime('%Y-%m')} to {prices.index[-1].strftime('%Y-%m')}")
print(f"Securities: {list(prices.columns)}")

Using sample data (no CSV found). You can still complete the analysis.
Date range: 2005-01 to 2010-12
Securities: ['BARC', 'LLOY', 'HSBA', 'STAN', 'RBS', 'NRK', 'BB', 'HBOS']

Part 2: Survivorship Bias Analysis (60 minutes)

Calculate Returns

# Calculate monthly returns
returns = prices.pct_change().dropna()

# Identify survivor vs failed groups
survivors = ['BARC', 'LLOY', 'HSBA', 'STAN', 'RBS']
failed = ['NRK', 'BB', 'HBOS']

print("\nMonthly Return Statistics (Full Period):")
print("=" * 50)
print("\nSurvivors:")
print(returns[survivors].describe().round(4))
print("\nFailed Banks (until delisting):")
print(returns[failed].describe().round(4))

Quantify Survivorship Bias

def calculate_portfolio_return(returns_df, columns, equal_weight=True):
    """
    Calculate portfolio returns, handling missing values properly.
    
    When a stock delists, we assume:
    - Final return is the loss to delisting price
    - Capital is then reallocated to remaining stocks
    """
    portfolio_returns = []
    
    for date in returns_df.index:
        # Get returns for this date (only non-null values)
        available = returns_df.loc[date, columns].dropna()
        if len(available) > 0:
            if equal_weight:
                portfolio_returns.append(available.mean())
            else:
                portfolio_returns.append(available.sum() / len(columns))
        else:
            portfolio_returns.append(np.nan)
    
    return pd.Series(portfolio_returns, index=returns_df.index)

# Calculate returns for different portfolios
survivors_only = calculate_portfolio_return(returns, survivors)
all_banks = calculate_portfolio_return(returns, survivors + failed)

# Cumulative wealth (starting with £100)
wealth_survivors = 100 * (1 + survivors_only).cumprod()
wealth_all = 100 * (1 + all_banks).cumprod()

print("\nSurvivorship Bias Analysis")
print("=" * 50)
print(f"\nCumulative Return (Jan 2005 - Dec 2010):")
print(f"  Survivors only: {(wealth_survivors.iloc[-1]/100 - 1)*100:.1f}%")
print(f"  Including failed: {(wealth_all.iloc[-1]/100 - 1)*100:.1f}%")
print(f"\nSurvivorship Bias: {(wealth_survivors.iloc[-1] - wealth_all.iloc[-1]):.1f} percentage points")
print(f"\nAnnualised Return:")
years = 6
print(f"  Survivors only: {((wealth_survivors.iloc[-1]/100)**(1/years) - 1)*100:.2f}% p.a.")
print(f"  Including failed: {((wealth_all.iloc[-1]/100)**(1/years) - 1)*100:.2f}% p.a.")

Visualise the Difference

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Individual stock prices (normalised)
ax1 = axes[0, 0]
normalised = prices.div(prices.iloc[0]) * 100
for col in survivors:
    normalised[col].plot(ax=ax1, label=col, alpha=0.7)
for col in failed:
    normalised[col].dropna().plot(ax=ax1, label=f'{col} (failed)', linestyle='--', alpha=0.7)
ax1.axhline(y=100, color='gray', linestyle=':', alpha=0.5)
ax1.set_ylabel('Price (Jan 2005 = 100)')
ax1.set_title('UK Bank Stock Prices 2005-2010')
ax1.legend(loc='upper left', fontsize=8)
ax1.grid(alpha=0.3)

# 2. Cumulative portfolio wealth
ax2 = axes[0, 1]
wealth_survivors.plot(ax=ax2, label='Survivors Only', color='blue', lw=2)
wealth_all.plot(ax=ax2, label='Including Failed Banks', color='red', lw=2)
ax2.axhline(y=100, color='gray', linestyle=':', alpha=0.5)
ax2.fill_between(wealth_survivors.index, wealth_survivors, wealth_all, 
                  alpha=0.3, color='coral', label='Survivorship Bias')
ax2.set_ylabel('Portfolio Value (£100 initial)')
ax2.set_title('The Cost of Survivorship Bias')
ax2.legend()
ax2.grid(alpha=0.3)

# 3. Rolling 12-month return difference
ax3 = axes[1, 0]
rolling_diff = (survivors_only.rolling(12).mean() - all_banks.rolling(12).mean()) * 100
rolling_diff.plot(ax=ax3, color='coral', lw=2)
ax3.axhline(y=0, color='gray', linestyle='-', lw=0.5)
ax3.fill_between(rolling_diff.index, 0, rolling_diff, alpha=0.3, color='coral')
ax3.set_ylabel('Monthly Return Difference (%)')
ax3.set_title('Rolling 12-Month Survivorship Bias')
ax3.grid(alpha=0.3)

# 4. Return distributions
ax4 = axes[1, 1]
ax4.hist(survivors_only.dropna() * 100, bins=30, alpha=0.6, label='Survivors Only', color='blue')
ax4.hist(all_banks.dropna() * 100, bins=30, alpha=0.6, label='Including Failed', color='red')
ax4.axvline(x=survivors_only.mean()*100, color='blue', linestyle='--', lw=2)
ax4.axvline(x=all_banks.mean()*100, color='red', linestyle='--', lw=2)
ax4.set_xlabel('Monthly Return (%)')
ax4.set_ylabel('Frequency')
ax4.set_title('Return Distribution Comparison')
ax4.legend()
ax4.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('survivorship_bias_uk_banks.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nFigure saved as 'survivorship_bias_uk_banks.png'")

Discussion Questions

Magnitude of Bias: How does the survivorship bias you calculated compare to the ~1% often cited in academic studies? Why might crisis periods show larger bias?
Practical Implications: If you were backtesting a UK banking sector strategy, how would ignoring failed banks affect your:
- Estimated Sharpe ratio?
- Value-at-Risk calculations?
- Confidence in the strategy?
Data Challenges: What practical difficulties did you encounter when trying to get data for delisted securities? How might this affect academic research?
Beyond Banks: Where else might survivorship bias be particularly severe? Consider:
- Emerging market equities
- Cryptocurrency
- Hedge funds
- Small-cap stocks

Deliverables

Submit a short report (max 2 pages) containing:

Data Summary: Table showing the securities analysed, date ranges, and data sources
Key Calculation: Your estimate of survivorship bias in UK banking returns 2005-2010 (annualised percentage)
Visualisation: One chart showing the cumulative wealth paths with and without failed banks
Reflection: One paragraph on implications for your own future analysis work

Extension: Automating Survivorship-Free Databases

For students interested in production systems:

class SurvivorshipFreeDatabase:
    """
    Framework for maintaining survivorship-free financial databases.
    
    Key principles:
    1. Capture all securities at point-in-time
    2. Record delisting events with final prices
    3. Maintain audit trail of universe changes
    """
    
    def __init__(self):
        self.securities = {}  # ticker -> metadata
        self.prices = pd.DataFrame()
        self.events = []  # (date, ticker, event_type, details)
    
    def add_security(self, ticker, metadata):
        """Add a new security to track."""
        self.securities[ticker] = metadata
        self.events.append((pd.Timestamp.now(), ticker, 'ADDED', metadata))
    
    def record_delisting(self, ticker, date, reason, final_price):
        """Record a delisting event with proper handling."""
        self.events.append((date, ticker, 'DELISTED', {
            'reason': reason,
            'final_price': final_price
        }))
        # Mark security as inactive but retain all historical data
        self.securities[ticker]['status'] = 'DELISTED'
        self.securities[ticker]['delisting_date'] = date
    
    def get_universe_at_date(self, date):
        """Return the set of active securities at a specific date."""
        active = []
        for ticker, meta in self.securities.items():
            if meta.get('status') != 'DELISTED':
                active.append(ticker)
            elif meta.get('delisting_date', pd.Timestamp.max) > date:
                active.append(ticker)
        return active
    
    def calculate_returns(self, start_date, end_date, survivorship_free=True):
        """
        Calculate portfolio returns with proper handling of delistings.
        
        If survivorship_free=True, uses point-in-time universe.
        If False, uses only securities that survived to end_date.
        """
        pass  # Implementation left as exercise

# This framework ensures you never accidentally introduce survivorship bias

References

Elton, E.J., Gruber, M.J., & Blake, C.R. (1996). Survivor Bias and Mutual Fund Performance. Review of Financial Studies, 9(4), 1097-1120.
Brown, S.J., Goetzmann, W., Ibbotson, R.G., & Ross, S.A. (1992). Survivorship Bias in Performance Studies. Review of Financial Studies, 5(4), 553-580.
Rohleder, M., Scholz, H., & Wilkens, M. (2011). Survivorship Bias and Mutual Fund Performance: Relevance, Significance, and Methodical Differences. Review of Finance, 15(2), 441-474.

--- title: "Lab 02: Survivorship Bias in UK Banking" subtitle: "In-Class Bloomberg Terminal Session : Week 2" author: "Professor Barry Quinn" date: today format: html: code-fold: false code-tools: true execute: warning: false message: false echo: true eval: true jupyter: fin510 --- ::: {.callout-important} ## In-Class Lab Structure This is the **in-class Bloomberg lab** for Week 2, conducted in the Financial Innovation Lab. **Before class**: Complete the homework (yfinance pipeline exercise) to familiarise yourself with data pipeline concepts. **In class**: We use professional Bloomberg data to examine survivorship bias using real UK banking crisis data. You can run the **analysis part** (Part 2) in Colab after exporting your Excel sheet: see Step 5 below and the Colab link. ::: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab02_survivorship_bias_bloomberg.ipynb) ## Lab Overview **Type**: In-class lab (Bloomberg Terminal, supervised) **Duration**: 2 hours **Location**: Financial Innovation Lab (~20 Bloomberg terminals) **Learning Objectives**: Using professional-grade data, you will: 1. Extract historical price data for delisted securities from Bloomberg 2. Quantify survivorship bias using real UK banking crisis data 3. Compare your results with simulated estimates from Chapter 2 4. Understand how professional data differs from free APIs **Prerequisites**: - Complete Homework 02 (yfinance pipeline) : reinforces pipeline concepts - Read Chapter 02: Data and Measurement in Finance - Bring laptop for Python analysis (Bloomberg extraction done on terminals) ## Background: The UK Banking Crisis of 2008 The 2008 financial crisis provides a stark illustration of survivorship bias. If you look at UK banking returns today using only currently-listed banks, you miss the catastrophic losses from institutions that failed or were rescued. | Bank | What Happened | Approximate Loss | |------|---------------|------------------| | **Northern Rock** | Nationalised Feb 2008 | ~95% loss | | **Bradford & Bingley** | Nationalised Sep 2008 | ~98% loss | | **HBOS** | Acquired by Lloyds Jan 2009 | ~82% loss before rescue | | **Royal Bank of Scotland** | Government bailout | ~90% loss at nadir | Students analysing "UK bank performance" using only current FTSE 100 banks would completely miss these catastrophic events. ## Part 1: Bloomberg Data Extraction (60 minutes) ### Step 1: Identify Securities to Extract We'll extract monthly adjusted prices for **January 2005 to December 2010** for the following UK banks: **Survivors (current FTSE 100 banks):** | Ticker | Name | |--------|------| | `BARC LN Equity` | Barclays | | `LLOY LN Equity` | Lloyds Banking Group | | `HSBA LN Equity` | HSBC Holdings | | `STAN LN Equity` | Standard Chartered | | `RBS LN Equity` | Royal Bank of Scotland (now NatWest Group) | **Failed banks (delisted during crisis):** | Ticker | Name | Delisted | End date for data | |--------|------|----------|-------------------| | `NRK LN Equity` | Northern Rock | Feb 2008 | 22 Feb 2008 | | `BB/ LN Equity` | Bradford & Bingley | Sep 2008 | 29 Sep 2008 | | `HBOS LN Equity` | HBOS | Jan 2009 | 16 Jan 2009 | ::: {.callout-tip} ## Handling Delisted Securities Bloomberg retains historical data for delisted securities, but you must specify the correct date range ending at the delisting date (not today's date). The Historical Time Series Wizard handles this automatically when you set custom date ranges. ::: ### Step 2: Bloomberg Setup **Complete these steps each time you use the Bloomberg Terminal:** 1. Log in to the Bloomberg Terminal 2. Open Excel then close it again 3. Install the **Bloomberg Excel Add-in** (right-hand side of home screen : required each session) 4. Open Excel and begin extraction ### Step 3: Extract Data Using the Historical Time Series Wizard Bloomberg's **Historical Time Series Wizard** provides a point-and-click interface for data extraction (much easier than writing BDH formulas manually!). #### 3.1 Launch the Wizard 1. In Excel, go to the **Bloomberg tab** in the ribbon. 2. Click **"Function Build Wizard"** (or similar: look for a wizard icon). 3. A wizard panel appears on the **right side** of the screen. 4. In the wizard, click **"Historical Time Series"** (listed as a new function). #### 3.2 Extract Survivor Banks (All 5, Same Date Range) **The wizard takes you through 4 steps:** **Step 1: Securities** - Click the search box and find each survivor bank: - `BARC LN Equity` (Barclays) - `LLOY LN Equity` (Lloyds) - `HSBA LN Equity` (HSBC) - `STAN LN Equity` (Standard Chartered) - `RBS LN Equity` (Royal Bank of Scotland) - The wizard will show the Bloomberg IDs as you search. - Click **Next**. **Step 2: Fields** - Choose `PX_LAST` (last price, adjusted for splits/dividends). - Click **Next**. **Step 3: Time Period and Frequency** - Start date: `01/01/2005` - End date: `31/12/2010` - Frequency: **Monthly** - Click **Next**. **Step 4: Output Format** - The default format is usually fine. The wizard will show a preview of the layout (dates in rows, securities in columns). - You can see examples of this layout in the raw Bloomberg `.xlsx` files in this project's data folder. - Click **Next**. **Step 5: Output to Excel** - **Before clicking Finish**: Select the **top-left cell** (e.g., `A1`) where you want the data to appear. - Click **Finish** (or equivalent button). The data will populate starting from the selected cell. Your Excel sheet should now have columns: `Date`, `BARC LN Equity`, `LLOY LN Equity`, `HSBA LN Equity`, `STAN LN Equity`, `RBS LN Equity`. #### 3.3 Extract Failed Banks (3 Securities, Different Date Ranges) For the failed banks, **run the wizard separately for each** because they have different delisting dates: **Northern Rock (NRK LN Equity):** - Run the wizard again (Steps 1-4). - Securities: `NRK LN Equity` - Fields: `PX_LAST` - Time period: `01/01/2005` to `22/02/2008` (delisting date), Monthly - Output to a **new column** next to your survivor data (e.g., column G). **Bradford & Bingley (BB/ LN Equity):** - Run the wizard again. - Securities: `BB/ LN Equity` - Fields: `PX_LAST` - Time period: `01/01/2005` to `29/09/2008`, Monthly - Output to the next column (e.g., column H). **HBOS (HBOS LN Equity):** - Run the wizard again. - Securities: `HBOS LN Equity` - Fields: `PX_LAST` - Time period: `01/01/2005` to `16/01/2009`, Monthly - Output to the next column (e.g., column I). ::: {.callout-note} ## Why Separate Extractions for Failed Banks? Each failed bank has a different delisting date, so they need different end dates in the wizard. Running the wizard three times (once per failed bank) ensures you get data up to each bank's delisting date without errors. ::: #### 3.4 Verify Your Data Your Excel sheet should now have: - **Columns**: Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS - **Rows**: Monthly dates from Jan 2005 to Dec 2010 - **Missing values**: NRK, BB, and HBOS will have blank cells after their delisting dates (this is correct: they stopped trading). ::: {.callout-warning} ## Troubleshooting - **Wizard not appearing?** Make sure you installed the Bloomberg Excel Add-in (Step 2.2). - **"Security not found" error?** Bloomberg IDs are case-sensitive and require exact formatting (e.g., `BARC LN Equity` not `BARC LN`). Use the wizard's search to ensure correct IDs. - **Data not populating?** Check that you selected a cell before clicking Finish in Step 5. ::: ::: {.callout-tip collapse="true"} ## Advanced: Manual BDH Formulas (Optional) If you're comfortable with Excel formulas, you can use Bloomberg's `BDH()` function directly instead of the wizard. This is faster if you know the exact syntax: **Survivors (all have same date range):** ```excel =BDH("BARC LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A") =BDH("LLOY LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A") =BDH("HSBA LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A") =BDH("STAN LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A") =BDH("RBS LN Equity","PX_LAST","2005-01-01","2010-12-31","Per=M","Days=A") ``` **Failed banks (different end dates):** ```excel =BDH("NRK LN Equity","PX_LAST","2005-01-01","2008-02-22","Per=M","Days=A") =BDH("BB/ LN Equity","PX_LAST","2005-01-01","2008-09-29","Per=M","Days=A") =BDH("HBOS LN Equity","PX_LAST","2005-01-01","2009-01-16","Per=M","Days=A") ``` Type these formulas in cells, and Bloomberg will populate the data automatically. ::: ### Step 4: Save Your Data Export to CSV with the following structure: ``` Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS 2005-01-31, 523.5, 445.2, 847.0, 1023.5, 1845.0, 987.5, 423.0, 845.0 ... ``` Save as: `uk_banks_2005_2010_with_failures.csv` ### Step 5: Import your sheet into Colab (or Python) If you ran Part 1 in the Financial Innovation Lab and have your data in Excel, use these steps to run the analysis in Colab (or on your laptop). **5.1 Export from Excel to CSV** 1. In Excel, with your Bloomberg data sheet active: **File → Save As** (or **Save a Copy**). 2. Choose **CSV (Comma delimited) (\*.csv)** as the file type. 3. Save as `uk_banks_2005_2010_with_failures.csv` (or keep your filename: you’ll upload it in the next step). 4. Ensure the first row is the header: `Date, BARC, LLOY, HSBA, STAN, RBS, NRK, BB, HBOS` (column names may have spaces; the code below handles common variants). ::: {.callout-note} ## A quirk of Bloomberg exports: delisted banks If you use a pre-extracted Bloomberg file (e.g. the raw xlsx on Blackboard), you may notice that **failed banks still have prices all the way to the end of the sample**. Bloomberg sometimes carries forward the last known price for delisted securities, rather than leaving cells blank. That looks like "complete" data: but it masks the delisting event. For survivorship bias analysis, we need NA (or blank) after each bank's delisting date. Northern Rock delisted Feb 2008, Bradford & Bingley Sep 2008, HBOS Jan 2009. Part of your data wrangling is to validate the data and set values to NA after those dates. This is exactly the kind of data quality check that separates professional analysis from blindly trusting a spreadsheet. ::: **5.2 In Colab: upload your file** Run the next cell. If you are in **Google Colab**, it will prompt you to **Choose Files** and upload your CSV. Upload the file you just exported. If you are running locally and your CSV is in the same folder as the notebook, you can skip the upload and the code will look for `uk_banks_2005_2010_with_failures.csv` in the current directory. If no file is found, the lab uses **sample data** so you can still complete the analysis (e.g. if you missed the Bloomberg session). ```{python} #| label: upload-csv #| eval: true import pandas as pd import numpy as np from pathlib import Path # In Colab: prompt for file upload. Otherwise we'll try local path or use sample data. IN_COLAB = False try: import google.colab IN_COLAB = True except ImportError: pass uploaded_path = None if IN_COLAB: from google.colab import files uploaded = files.upload() # Opens file picker; upload uk_banks_2005_2010_with_failures.csv if uploaded: # Use the first (and usually only) uploaded file uploaded_path = list(uploaded.keys())[0] print(f"Uploaded: {uploaded_path}") else: for name in ["uk_banks_2005_2010_with_failures.csv", "uk_banks_2005_2010_with_failures (1).csv"]: if Path(name).exists(): uploaded_path = name print(f"Using local file: {uploaded_path}") break if uploaded_path is None: print("No CSV uploaded or found. The next cell will use sample data so you can still run the analysis.") ``` **5.3 Load your data (or sample data)** The next cell loads your CSV if you uploaded one, or creates sample data with the same structure so you can complete the lab. ```{python} #| label: load-data #| eval: true import pandas as pd import numpy as np import matplotlib.pyplot as plt def load_uk_banks_csv(path): """Load and normalise UK banks CSV from Bloomberg Excel export.""" df = pd.read_csv(path) # Normalise column names (strip spaces, match expected symbols) df.columns = df.columns.str.strip() # Date column may be named 'Date' or first column date_col = 'Date' if 'Date' in df.columns else df.columns[0] df[date_col] = pd.to_datetime(df[date_col]) df = df.set_index(date_col) # Ensure we have expected tickers (allow slight name variants) renames = {} for c in df.columns: c_upper = c.upper().strip() if c_upper in ('BARC', 'LLOY', 'HSBA', 'STAN', 'RBS', 'NRK', 'BB', 'HBOS'): renames[c] = c_upper if c_upper != 'BB' else 'BB' df = df.rename(columns=renames) return df.sort_index() def make_sample_data(): """Sample data matching UK banking crisis patterns (for use without Bloomberg data).""" dates = pd.date_range('2005-01-01', '2010-12-31', freq='ME') np.random.seed(42) def create_price_path(start, crisis_low, end, crisis_month=44): n = len(dates) pre_crisis = np.linspace(start, start * 1.3, crisis_month) crisis = np.linspace(start * 1.3, crisis_low, 6) recovery = np.linspace(crisis_low, end, n - crisis_month - 6) return np.concatenate([pre_crisis, crisis, recovery]) return pd.DataFrame({ 'BARC': create_price_path(100, 15, 45), 'LLOY': create_price_path(100, 12, 35), 'HSBA': create_price_path(100, 55, 85), 'STAN': create_price_path(100, 45, 90), 'RBS': create_price_path(100, 8, 25), 'NRK': np.concatenate([create_price_path(100, 5, 5)[:38], [np.nan]*34]), 'BB': np.concatenate([create_price_path(100, 2, 2)[:45], [np.nan]*27]), 'HBOS': np.concatenate([create_price_path(100, 18, 18)[:48], [np.nan]*24]), }, index=dates) # Load uploaded/local CSV or fall back to sample data if 'uploaded_path' in dir() and uploaded_path is not None: try: prices = load_uk_banks_csv(uploaded_path) print("Loaded your Bloomberg export.") except Exception as e: print(f"Could not load uploaded file: {e}. Using sample data.") prices = make_sample_data() else: # Try local path (e.g. when running from repo) local = Path("uk_banks_2005_2010_with_failures.csv") if not local.exists(): local = Path("data/bloomberg_database/uk_banks_2005_2010_with_failures.csv") if local.exists(): prices = load_uk_banks_csv(local) print("Loaded local Bloomberg data.") else: prices = make_sample_data() print("Using sample data (no CSV found). You can still complete the analysis.") print(f"Date range: {prices.index[0].strftime('%Y-%m')} to {prices.index[-1].strftime('%Y-%m')}") print(f"Securities: {list(prices.columns)}") ``` --- ## Part 2: Survivorship Bias Analysis (60 minutes) ### Calculate Returns ```{python} #| label: calculate-returns #| eval: false # Calculate monthly returns returns = prices.pct_change().dropna() # Identify survivor vs failed groups survivors = ['BARC', 'LLOY', 'HSBA', 'STAN', 'RBS'] failed = ['NRK', 'BB', 'HBOS'] print("\nMonthly Return Statistics (Full Period):") print("=" * 50) print("\nSurvivors:") print(returns[survivors].describe().round(4)) print("\nFailed Banks (until delisting):") print(returns[failed].describe().round(4)) ``` ### Quantify Survivorship Bias ```{python} #| label: survivorship-analysis #| eval: false def calculate_portfolio_return(returns_df, columns, equal_weight=True): """ Calculate portfolio returns, handling missing values properly. When a stock delists, we assume: - Final return is the loss to delisting price - Capital is then reallocated to remaining stocks """ portfolio_returns = [] for date in returns_df.index: # Get returns for this date (only non-null values) available = returns_df.loc[date, columns].dropna() if len(available) > 0: if equal_weight: portfolio_returns.append(available.mean()) else: portfolio_returns.append(available.sum() / len(columns)) else: portfolio_returns.append(np.nan) return pd.Series(portfolio_returns, index=returns_df.index) # Calculate returns for different portfolios survivors_only = calculate_portfolio_return(returns, survivors) all_banks = calculate_portfolio_return(returns, survivors + failed) # Cumulative wealth (starting with £100) wealth_survivors = 100 * (1 + survivors_only).cumprod() wealth_all = 100 * (1 + all_banks).cumprod() print("\nSurvivorship Bias Analysis") print("=" * 50) print(f"\nCumulative Return (Jan 2005 - Dec 2010):") print(f" Survivors only: {(wealth_survivors.iloc[-1]/100 - 1)*100:.1f}%") print(f" Including failed: {(wealth_all.iloc[-1]/100 - 1)*100:.1f}%") print(f"\nSurvivorship Bias: {(wealth_survivors.iloc[-1] - wealth_all.iloc[-1]):.1f} percentage points") print(f"\nAnnualised Return:") years = 6 print(f" Survivors only: {((wealth_survivors.iloc[-1]/100)**(1/years) - 1)*100:.2f}% p.a.") print(f" Including failed: {((wealth_all.iloc[-1]/100)**(1/years) - 1)*100:.2f}% p.a.") ``` ### Visualise the Difference ```{python} #| label: visualise-bias #| eval: false fig, axes = plt.subplots(2, 2, figsize=(14, 10)) # 1. Individual stock prices (normalised) ax1 = axes[0, 0] normalised = prices.div(prices.iloc[0]) * 100 for col in survivors: normalised[col].plot(ax=ax1, label=col, alpha=0.7) for col in failed: normalised[col].dropna().plot(ax=ax1, label=f'{col} (failed)', linestyle='--', alpha=0.7) ax1.axhline(y=100, color='gray', linestyle=':', alpha=0.5) ax1.set_ylabel('Price (Jan 2005 = 100)') ax1.set_title('UK Bank Stock Prices 2005-2010') ax1.legend(loc='upper left', fontsize=8) ax1.grid(alpha=0.3) # 2. Cumulative portfolio wealth ax2 = axes[0, 1] wealth_survivors.plot(ax=ax2, label='Survivors Only', color='blue', lw=2) wealth_all.plot(ax=ax2, label='Including Failed Banks', color='red', lw=2) ax2.axhline(y=100, color='gray', linestyle=':', alpha=0.5) ax2.fill_between(wealth_survivors.index, wealth_survivors, wealth_all, alpha=0.3, color='coral', label='Survivorship Bias') ax2.set_ylabel('Portfolio Value (£100 initial)') ax2.set_title('The Cost of Survivorship Bias') ax2.legend() ax2.grid(alpha=0.3) # 3. Rolling 12-month return difference ax3 = axes[1, 0] rolling_diff = (survivors_only.rolling(12).mean() - all_banks.rolling(12).mean()) * 100 rolling_diff.plot(ax=ax3, color='coral', lw=2) ax3.axhline(y=0, color='gray', linestyle='-', lw=0.5) ax3.fill_between(rolling_diff.index, 0, rolling_diff, alpha=0.3, color='coral') ax3.set_ylabel('Monthly Return Difference (%)') ax3.set_title('Rolling 12-Month Survivorship Bias') ax3.grid(alpha=0.3) # 4. Return distributions ax4 = axes[1, 1] ax4.hist(survivors_only.dropna() * 100, bins=30, alpha=0.6, label='Survivors Only', color='blue') ax4.hist(all_banks.dropna() * 100, bins=30, alpha=0.6, label='Including Failed', color='red') ax4.axvline(x=survivors_only.mean()*100, color='blue', linestyle='--', lw=2) ax4.axvline(x=all_banks.mean()*100, color='red', linestyle='--', lw=2) ax4.set_xlabel('Monthly Return (%)') ax4.set_ylabel('Frequency') ax4.set_title('Return Distribution Comparison') ax4.legend() ax4.grid(alpha=0.3) plt.tight_layout() plt.savefig('survivorship_bias_uk_banks.png', dpi=150, bbox_inches='tight') plt.show() print("\nFigure saved as 'survivorship_bias_uk_banks.png'") ``` --- ## Discussion Questions 1. **Magnitude of Bias**: How does the survivorship bias you calculated compare to the ~1% often cited in academic studies? Why might crisis periods show larger bias? 2. **Practical Implications**: If you were backtesting a UK banking sector strategy, how would ignoring failed banks affect your: - Estimated Sharpe ratio? - Value-at-Risk calculations? - Confidence in the strategy? 3. **Data Challenges**: What practical difficulties did you encounter when trying to get data for delisted securities? How might this affect academic research? 4. **Beyond Banks**: Where else might survivorship bias be particularly severe? Consider: - Emerging market equities - Cryptocurrency - Hedge funds - Small-cap stocks --- ## Deliverables Submit a short report (max 2 pages) containing: 1. **Data Summary**: Table showing the securities analysed, date ranges, and data sources 2. **Key Calculation**: Your estimate of survivorship bias in UK banking returns 2005-2010 (annualised percentage) 3. **Visualisation**: One chart showing the cumulative wealth paths with and without failed banks 4. **Reflection**: One paragraph on implications for your own future analysis work --- ## Extension: Automating Survivorship-Free Databases For students interested in production systems: ```{python} #| label: extension #| eval: false class SurvivorshipFreeDatabase: """ Framework for maintaining survivorship-free financial databases. Key principles: 1. Capture all securities at point-in-time 2. Record delisting events with final prices 3. Maintain audit trail of universe changes """ def __init__(self): self.securities = {} # ticker -> metadata self.prices = pd.DataFrame() self.events = [] # (date, ticker, event_type, details) def add_security(self, ticker, metadata): """Add a new security to track.""" self.securities[ticker] = metadata self.events.append((pd.Timestamp.now(), ticker, 'ADDED', metadata)) def record_delisting(self, ticker, date, reason, final_price): """Record a delisting event with proper handling.""" self.events.append((date, ticker, 'DELISTED', { 'reason': reason, 'final_price': final_price })) # Mark security as inactive but retain all historical data self.securities[ticker]['status'] = 'DELISTED' self.securities[ticker]['delisting_date'] = date def get_universe_at_date(self, date): """Return the set of active securities at a specific date.""" active = [] for ticker, meta in self.securities.items(): if meta.get('status') != 'DELISTED': active.append(ticker) elif meta.get('delisting_date', pd.Timestamp.max) > date: active.append(ticker) return active def calculate_returns(self, start_date, end_date, survivorship_free=True): """ Calculate portfolio returns with proper handling of delistings. If survivorship_free=True, uses point-in-time universe. If False, uses only securities that survived to end_date. """ pass # Implementation left as exercise # This framework ensures you never accidentally introduce survivorship bias ``` --- ## References - Elton, E.J., Gruber, M.J., & Blake, C.R. (1996). Survivor Bias and Mutual Fund Performance. *Review of Financial Studies*, 9(4), 1097-1120. - Brown, S.J., Goetzmann, W., Ibbotson, R.G., & Ross, S.A. (1992). Survivorship Bias in Performance Studies. *Review of Financial Studies*, 5(4), 553-580. - Rohleder, M., Scholz, H., & Wilkens, M. (2011). Survivorship Bias and Mutual Fund Performance: Relevance, Significance, and Methodical Differences. *Review of Finance*, 15(2), 441-474.