WquGuru·QuantLearn

Arbitrage · Intermediate

Pair Trading

Market-neutral statistical arbitrage

Historical Backtest Results

Backtest results for the NVIDIA (NVDA) and AMD pair over 2013-2014, demonstrating the strategy in action.

Pair Trading Positions

Trading positions for NVDA vs AMD pair showing entry/exit points based on Z-score signals.

Portfolio Performance

Total portfolio performance showing Z-score oscillations and corresponding asset value changes.

Engle-Granger Test Results

Statistical output from Engle-Granger two-step cointegration test showing significant results.

Statistical Arbitrage Theory

Pair trading is based on cointegration: two asset prices move together in the long run despite short-term divergences, so some linear combination of them is stationary.

The classic analogy is "a drunk man with a dog": each wanders unpredictably, but the invisible leash (the statistical relationship) keeps the two from drifting far apart.

When one asset becomes overvalued relative to its pair, we short the overvalued asset and go long the undervalued one.

The strategy profits when the pair converges back to its historical relationship.

This approach was pioneered by quantitative analysts at Morgan Stanley in the 1980s and remains a cornerstone of statistical arbitrage.

Mathematical Foundation

1. Cointegration Relationship

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

Long-run equilibrium relationship between two assets, where εₜ is the residual that should be stationary.
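
To make the equilibrium relationship concrete, here is a minimal simulation sketch. The parameter values (α = 2, β = 1.5, an AR(1) residual with coefficient 0.8) and the random seed are illustrative assumptions, not values taken from the NVDA/AMD backtest.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
n = 1000
alpha, beta = 2.0, 1.5            # assumed equilibrium parameters

# X_t is a random walk, so it is non-stationary on its own
x = 50 + np.cumsum(rng.normal(0, 1, n))

# epsilon_t is a stationary AR(1) process that keeps pulling back to zero
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + rng.normal(0, 1)

# Y_t follows the cointegration relationship by construction
y = alpha + beta * x + eps

# The spread recovers epsilon_t and stays bounded while Y itself wanders
spread = y - (alpha + beta * x)
print(f"std of Y:      {y.std():.2f}")
print(f"std of spread: {spread.std():.2f}")
```

Both prices drift without bound, yet the spread stays in a narrow band around zero. That bounded spread is what the strategy trades.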

2. Engle-Granger Step 1

$$\hat{\beta} = \frac{\sum_t (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_t (X_t - \bar{X})^2}$$

OLS estimation of the cointegrating coefficient β using historical price data.
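
The closed-form estimator can be checked by hand on a tiny data set. The sketch below uses made-up prices that lie exactly on Y = 1 + 2X; the helper name `ols_hedge_ratio` is hypothetical, and the intercept uses the standard OLS identity α̂ = Ȳ − β̂X̄.

```python
import numpy as np

def ols_hedge_ratio(x, y):
    """Return (alpha_hat, beta_hat) from the closed-form OLS formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared X deviations
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point of means
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Toy prices lying exactly on Y = 1 + 2X
x = [10.0, 11.0, 12.0, 13.0]
y = [21.0, 23.0, 25.0, 27.0]
alpha_hat, beta_hat = ols_hedge_ratio(x, y)
print(alpha_hat, beta_hat)  # intercept 1 and slope 2, up to rounding
```

In practice β̂ doubles as the hedge ratio: roughly β̂ units of X offset one unit of Y.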

3. Augmented Dickey-Fuller Test

$$\Delta\varepsilon_t = \gamma\,\varepsilon_{t-1} + \sum_i \varphi_i\,\Delta\varepsilon_{t-i} + u_t$$

Test for stationarity of residuals. H₀: γ = 0 (unit root), reject if p-value < 0.05.

4. Error Correction Model (Step 2)

$$\Delta Y_t = \alpha_y + \delta_y\,\varepsilon_{t-1} + \sum_i \gamma_{yi}\,\Delta Y_{t-i} + \sum_i \lambda_{yi}\,\Delta X_{t-i} + v_{yt}$$

Adjustment coefficient δ_y must be negative, indicating mean reversion to equilibrium.
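
The sign condition on the adjustment coefficient can be illustrated with a simplified error correction regression, ΔY_t on a constant, ΔX_t and ε_{t-1}, with no lagged difference terms. This is a sketch on simulated data, and the AR(1) coefficient of 0.5 is an assumption. With that setup, ΔY_t = βΔX_t + (0.5 − 1)ε_{t-1} + u_t, so the estimate should land near −0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = 50 + np.cumsum(rng.normal(0, 1, n))

# Residual that closes half of its gap to zero each period
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + rng.normal(0, 1)
y = 2.0 + 1.5 * x + eps

# Step 1: long-run regression and its residuals
beta_hat, alpha_hat = np.polyfit(x, y, 1)
resid = y - (alpha_hat + beta_hat * x)

# Step 2: regress dY_t on [1, dX_t, resid_{t-1}]
dy = np.diff(y)
dx = np.diff(x)
design = np.column_stack([np.ones(n - 1), dx, resid[:-1]])
coefs, *_ = np.linalg.lstsq(design, dy, rcond=None)
delta_y = coefs[2]   # adjustment coefficient on the lagged residual

print(f"estimated adjustment coefficient: {delta_y:.3f}")
```

A negative estimate means Y moves back toward equilibrium after a deviation. A positive one would mean deviations feed on themselves.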

5. Z-Score Calculation

$$Z_t = \frac{\varepsilon_t - \mu_\varepsilon}{\sigma_\varepsilon}$$

Normalized residual used for signal generation. Trading signals are triggered at ±1σ or ±2σ thresholds.
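
A small worked example, using made-up residuals chosen so the arithmetic is easy to follow (mean 0, standard deviation 2). The ±1 threshold and the sign convention (+1 when Z is above the upper band) are assumptions for this sketch.

```python
import numpy as np

residuals = np.array([0.0, 1.0, -1.0, 3.0, -3.0])   # made-up residuals

mu = residuals.mean()      # 0.0
sigma = residuals.std()    # population std (ddof=0): sqrt(20 / 5) = 2.0
z = (residuals - mu) / sigma
# z-scores: 0, 0.5, -0.5, 1.5, -1.5

threshold = 1.0
# +1 above the upper band, -1 below the lower band, 0 inside the band
signals = np.where(z > threshold, 1, np.where(z < -threshold, -1, 0))
print(z)
print(signals)
```

Only the last two observations breach the ±1 band, so only they produce a signal. Widening the threshold to ±2 would produce none on this data.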

Core Algorithm Implementation

The pair trading algorithm consists of three main components: cointegration testing, signal generation, and position management.

Engle-Granger Two-Step Method

python
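import pandas as pd
import statsmodels.api as sm

def EG_method(X, Y, show_summary=False):
    """Engle-Granger two-step test. Returns (is_cointegrated, step-1 model)."""
    # Step 1: estimate the long-run equilibrium Y = alpha + beta * X + epsilon
    model1 = sm.OLS(Y, sm.add_constant(X)).fit()
    epsilon = model1.resid
    if show_summary:
        print(model1.summary())

    # Residuals must be stationary: reject the unit-root null at the 5% level
    if sm.tsa.stattools.adfuller(epsilon)[1] > 0.05:
        return False, model1

    # Step 2: error correction model, regressing dY on [const, dX, epsilon_{t-1}]
    X_dif = sm.add_constant(pd.concat([X.diff(), epsilon.shift(1)], axis=1).dropna())
    Y_dif = Y.diff().dropna()
    model2 = sm.OLS(Y_dif, X_dif).fit()
    if show_summary:
        print(model2.summary())

    # The adjustment coefficient (last parameter, on epsilon_{t-1}) must be negative
    if model2.params.iloc[-1] > 0:
        return False, model1
    return True, model1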

Tests for cointegration using Engle-Granger methodology. Returns True if assets are cointegrated with proper error correction.

Signal Generation Logic

python
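import numpy as np
import pandas as pd

def signal_generation(asset1, asset2, method, bandwidth=250):
    signals = pd.DataFrame()
    signals['asset1'] = asset1['Close']
    signals['asset2'] = asset2['Close']
    signals['signals1'] = 0

    for i in range(bandwidth, len(signals)):
        # Test cointegration on the trailing window (row i excluded, so no look-ahead)
        coint_status, model = method(signals['asset1'].iloc[i-bandwidth:i],
                                     signals['asset2'].iloc[i-bandwidth:i])
        if not coint_status:
            continue  # stay flat while cointegration is not confirmed

        # Z-score of the current residual as a scalar; params are [const, slope]
        intercept, beta = model.params.iloc[0], model.params.iloc[1]
        fitted = intercept + beta * signals['asset1'].iloc[i]
        residual = signals['asset2'].iloc[i] - fitted
        z_score = (residual - np.mean(model.resid)) / np.std(model.resid)

        # Generate signals based on Z-score thresholds
        if z_score > 1:  # Upper threshold: asset2 rich relative to asset1
            signals.at[signals.index[i], 'signals1'] = 1
        elif z_score < -1:  # Lower threshold: asset2 cheap relative to asset1
            signals.at[signals.index[i], 'signals1'] = -1

    # asset2 always takes the opposite side; positions are day-over-day signal changes
    signals['signals2'] = -signals['signals1']
    signals['positions1'] = signals['signals1'].diff()
    signals['positions2'] = signals['signals2'].diff()
    return signals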

Generates trading signals based on Z-score deviations from mean. Signal1=1 means long asset1/short asset2, and vice versa.

Portfolio Management

python
import pandas as pd

def portfolio(data):
    capital0 = 20000
    # Fixed share count per leg, sized so a full position never exceeds the capital
    positions1 = capital0 // max(data['asset1'])
    positions2 = capital0 // max(data['asset2'])

    # positions1/positions2 are daily signal changes; their running sum is the
    # signal currently held. The first diff is NaN, so treat it as no trade.
    trades1 = data['positions1'].fillna(0)
    trades2 = data['positions2'].fillna(0)

    portfolio = pd.DataFrame()
    portfolio['holdings1'] = trades1.cumsum() * data['asset1'] * positions1
    portfolio['cash1'] = capital0 - (trades1 * data['asset1'] * positions1).cumsum()
    portfolio['total_asset1'] = portfolio['holdings1'] + portfolio['cash1']

    # Repeat for asset2 with opposite positions
    portfolio['holdings2'] = trades2.cumsum() * data['asset2'] * positions2
    portfolio['cash2'] = capital0 - (trades2 * data['asset2'] * positions2).cumsum()
    portfolio['total_asset2'] = portfolio['holdings2'] + portfolio['cash2']

    portfolio['total_asset'] = portfolio['total_asset1'] + portfolio['total_asset2']
    return portfolio

Manages portfolio allocation and tracks performance of both assets separately before combining into total portfolio value.

Implementation Steps

  1. Identify two potentially cointegrated assets (same industry, stock vs ETF)
  2. Apply Engle-Granger two-step method to test for cointegration
  3. Calculate the spread between the two assets using OLS regression
  4. Normalize residuals to create Z-score for signal generation
  5. Set trigger conditions based on ±1σ or ±2σ from mean spread
  6. Execute trades when Z-score exceeds threshold levels
  7. Monitor cointegration status on rolling windows (typically 250 days)
  8. Clear positions immediately when cointegration relationship breaks
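
Steps 7 and 8 can be sketched as a mask applied to the raw Z-score signals: wherever the rolling cointegration test fails, the signal is forced to zero, and the resulting change in signal is the trade that closes the position. The helper name `flatten_when_broken` and the input arrays are hypothetical.

```python
import numpy as np

def flatten_when_broken(raw_signals, coint_ok):
    """Zero out signals on days when the rolling cointegration test failed."""
    raw_signals = np.asarray(raw_signals)
    coint_ok = np.asarray(coint_ok, dtype=bool)
    return np.where(coint_ok, raw_signals, 0)

# Made-up daily inputs: raw Z-score signals and rolling test outcomes
raw = [0, 1, 1, -1, -1, 1]
ok = [True, True, False, False, True, True]

signals = flatten_when_broken(raw, ok)
# Day-over-day change in signal = trade to execute (starting from flat)
trades = np.diff(signals, prepend=0)

print(signals)  # held signal per day
print(trades)   # the -1 on day 3 closes the long opened on day 2
```

The trade on the third day is driven purely by the failed test, not by the Z-score returning inside the band.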

Key Metrics

Cointegration test p-value < 0.05 (reject null hypothesis of no cointegration)
Adjustment coefficient δ < 0 (negative for mean reversion)
Half-life of mean reversion (how quickly spreads converge)
Maximum drawdown during divergence periods
Sharpe ratio of market-neutral returns
Hit rate (percentage of profitable trades)
Average holding period per position
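
Several of these metrics are short computations. The sketch below shows one common way to calculate them. The function names are hypothetical, the half-life assumes the spread follows a simple AR(1) process, and the Sharpe ratio assumes a zero risk-free rate and 252 trading days per year.

```python
import numpy as np

def half_life(spread):
    """Periods for a deviation to halve, from an AR(1) fit on the demeaned spread."""
    s = np.asarray(spread, dtype=float)
    s = s - s.mean()
    rho = np.dot(s[:-1], s[1:]) / np.dot(s[:-1], s[:-1])  # AR(1) coefficient
    if not 0 < rho < 1:
        return float("nan")  # half-life is not defined in this simple model
    return -np.log(2) / np.log(rho)

def max_drawdown(equity):
    """Largest peak-to-trough loss as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def hit_rate(trade_pnls):
    """Fraction of trades with strictly positive profit."""
    return float(np.mean(np.asarray(trade_pnls, dtype=float) > 0))

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean/std of per-period returns (risk-free rate assumed zero)."""
    r = np.asarray(returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

# Simulated AR(1) spread with rho = 0.9: theoretical half-life is about 6.6 periods
rng = np.random.default_rng(7)
spread = np.zeros(5000)
for t in range(1, len(spread)):
    spread[t] = 0.9 * spread[t - 1] + rng.normal(0, 1)

hl = half_life(spread)
mdd = max_drawdown([100, 110, 99, 120, 90, 130])   # worst drop: 120 -> 90
hr = hit_rate([10, -5, 3, -2, 8])                  # 3 winners out of 5
sr = sharpe_ratio([0.01, -0.005, 0.02, 0.0])
print(hl, mdd, hr, sr)
```

A half-life far longer than the intended holding period suggests the spread reverts too slowly to trade at the chosen thresholds.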

Risk Considerations

Cointegration relationships can break due to fundamental changes (regime shifts)
Market conditions are dynamic - historical relationships may not persist
Company-specific events (new products, mergers) can permanently alter relationships
Example: NVIDIA vs AMD diverged after AI/crypto boom despite historical correlation
Model assumes Gaussian distribution of residuals, which may not hold during market stress
Transaction costs and slippage can erode profits from frequent trading
Leverage amplifies both profits and losses in divergent markets
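
The effect of transaction costs is simple arithmetic. The sketch below assumes every pair trade touches two legs and each leg is traded twice (open and close), with a flat cost in basis points of notional. All numbers and the function name `net_pnl` are hypothetical.

```python
def net_pnl(gross_pnl_per_trade, n_trades, notional_per_leg, cost_bps):
    """Total profit after costs for n_trades identical round-trip pair trades."""
    # 2 legs x 2 executions (open and close) = 4 fills per round trip
    cost_per_trade = 4 * notional_per_leg * cost_bps / 10_000
    return n_trades * (gross_pnl_per_trade - cost_per_trade)

# 100 round trips, each earning 50 gross on 10,000 notional per leg
cheap = net_pnl(50, 100, 10_000, cost_bps=5)    # cost 20 per trade
costly = net_pnl(50, 100, 10_000, cost_bps=15)  # cost 60 per trade
print(cheap, costly)
```

Under these assumptions a gross profit of 5,000 shrinks to 3,000 at 5 bps and turns into a loss of 1,000 at 15 bps, which is why tight ±1σ thresholds that trade often are especially sensitive to costs.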

Practice Implementation

Prerequisites

Mathematical Background

  • Linear regression and OLS estimation
  • Time series analysis (stationarity, unit roots)
  • Hypothesis testing and p-values
  • Basic econometrics (error correction models)

Technical Skills

  • Python programming (pandas, numpy)
  • Statistical libraries (statsmodels)
  • Data visualization (matplotlib)
  • Financial data handling (yfinance)

Complete Implementation

Access the full Python implementation from the original quantitative trading repository:

bash
# Complete pair trading implementation
git clone https://github.com/je-suis-tm/quant-trading.git
cd quant-trading
python "Pair trading backtest.py"
# Modify tickers and parameters for your own analysis

Learning Checkpoints

1. Understand Cointegration

Can you explain why two assets might be cointegrated and what breaks this relationship?

2. Interpret Statistical Tests

Practice reading ADF test results and understanding when to accept/reject cointegration.

3. Signal Generation

Implement Z-score calculations and understand threshold selection (±1σ vs ±2σ).

4. Risk Management

Understand position sizing, monitoring regime changes, and exit strategies.

Recommended Learning Path

Immediate Actions

  • Download and run the Python script
  • Test with different asset pairs
  • Experiment with threshold parameters

Advanced Studies

  • Learn Johansen cointegration test
  • Study Vector Error Correction Models
  • Explore multiple asset pair trading

Important Disclaimer

This strategy involves significant risk. Historical cointegration relationships can break permanently. Always use proper risk management, position sizing, and never risk more than you can afford to lose. Paper trade extensively before using real capital.
