sports · polymarket · l2 · slippage · backtesting · strategy

Polymarket Sports Trading Strategies: Model Slippage With Historical L2 Data

Use historical L2 order books to build sports-market backtests that include spread shocks, partial fills, and execution-aware position sizing.

PolymarketData Team
Sports stadium under lights representing event-driven market volatility
Image credit: Matt Dodd

Polymarket NFL and NBA game markets are liquid most of the week, and then they aren't. In the 15–30 minutes before kickoff or tip-off, spreads routinely widen 30–50% from their pre-game baseline as liquidity providers pull depth and directional flow picks up. If your backtest treats a 3-cent spread as constant throughout the trading window, you're measuring a market that doesn't exist in the minutes that matter most.

The same pattern — tightening spreads during calm, rapid widening around catalysts — shows up across sports categories. It's not noise; it's a structural feature of event-driven markets where information asymmetry concentrates near event time. Modeling it isn't optional if you want your results to transfer to live trading.

The two-layer framing that matters

Every sports prediction market strategy has two separate problems to solve, and they're easy to conflate. The first is the selection problem: was your directional view on the game outcome any good? The second is the execution problem: could you actually enter and exit at a cost that preserved that edge? A model can be genuinely informative about game outcomes and still fail as a trading strategy if execution costs in the pre-game window eat the entire edge.

Most sports market backtests measure only the selection layer. The execution layer is where the real PnL gets determined.
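A minimal way to keep the two layers separate is to record them as distinct columns in your backtest output, so you can see exactly where the edge went. The numbers below are hypothetical, purely to illustrate the decomposition:

```python
import pandas as pd

# Hypothetical per-trade records: selection edge vs execution costs (toy numbers)
trades = pd.DataFrame({
    "gross_edge_bps": [120, 90, 150, 60],   # selection layer: model's theoretical edge
    "slippage_bps":   [35, 80, 110, 20],    # execution layer: fill cost vs quoted price
    "fees_bps":       [10, 10, 10, 10],
})
trades["net_edge_bps"] = (
    trades["gross_edge_bps"] - trades["slippage_bps"] - trades["fees_bps"]
)

print(trades["gross_edge_bps"].mean())  # selection layer looks healthy in isolation
print(trades["net_edge_bps"].mean())    # execution-adjusted number is what you can trade
```

In this toy example the selection layer averages 105 bps of edge, but the tradable, execution-adjusted number is roughly a third of that — the kind of gap a selection-only backtest never surfaces.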

What historical L2 data shows you that price bars don't

Top-of-book quotes tell you where the market was pricing the outcome. L2 depth tells you how much was actually available at that price — and at what cost your target size could have been filled. The difference matters more as you approach game time.

Here's a pattern you'll see repeatedly in sports markets: a market sitting at 2,500 contracts on the best ask level during morning trading shrinks to 800 contracts two hours before kickoff, even if the quoted price barely moves. Slippage on a 5,000-contract buy goes from 25 bps to 80 bps, not because spreads widened dramatically but because visible depth at each level thinned out. A pure-price backtest misses this entirely.
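To make that concrete, here is a toy book walk showing how thinner depth alone raises the average fill cost even when the best price is unchanged. The prices and sizes are illustrative, not pulled from the API:

```python
def avg_fill_price(asks, size):
    """Walk ask levels [(price, size), ...] and return the size-weighted fill price."""
    remaining, notional, filled = size, 0.0, 0.0
    for price, depth in asks:
        take = min(remaining, depth)
        notional += take * price
        filled += take
        remaining -= take
        if remaining <= 0:
            break
    return notional / filled

# Morning book: deep top level
morning = [(0.52, 2500), (0.53, 2000), (0.54, 1500)]
# Pre-game book: same best price, far less depth per level
pregame = [(0.52, 800), (0.53, 700), (0.54, 600), (0.55, 3000)]

for label, book in [("morning", morning), ("pre-game", pregame)]:
    fill = avg_fill_price(book, 5000)
    slippage_bps = (fill - book[0][0]) / book[0][0] * 10_000
    print(f"{label}: avg fill {fill:.4f}, slippage {slippage_bps:.0f} bps")
```

The quoted best ask is 0.52 in both books, but the 5,000-contract order walks much deeper into the thinned book, and the slippage multiple is what a price-only backtest silently assumes away.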

Pulling sports market data and L2 snapshots

import os
import requests
import pandas as pd

API_KEY = os.environ["POLYMARKETDATA_API_KEY"]
BASE = "https://api.polymarketdata.co/v1"
HEADERS = {"X-API-Key": API_KEY}

# Discover NFL game markets
r = requests.get(
    f"{BASE}/markets",
    headers=HEADERS,
    params={"search": "NFL", "limit": 100},
    timeout=30,
)
r.raise_for_status()
nfl_markets = pd.DataFrame(r.json()["data"])

slug = nfl_markets.iloc[0]["slug"]
print(f"Analyzing: {slug}")

# Pull 5-minute metrics to examine spread behavior around game time
metrics_r = requests.get(
    f"{BASE}/markets/{slug}/metrics",
    headers=HEADERS,
    params={
        "start_ts": "2025-01-05T12:00:00Z",  # morning of game day
        "end_ts": "2025-01-05T23:59:00Z",
        "resolution": "5m",
    },
    timeout=30,
)
metrics_r.raise_for_status()
metrics = pd.DataFrame(metrics_r.json()["data"])
metrics["t"] = pd.to_datetime(metrics["t"], utc=True)

# Examine spread behavior across the day
print(metrics[["t", "spread", "volume", "liquidity"]].describe())

Once you have this data, compare spread levels in the 2 hours before game time versus the 6-hour window before that. The difference is usually obvious in the summary statistics.
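One way to run that comparison, sketched here on a synthetic stand-in for the `metrics` frame so the snippet runs standalone (the spread curve and the assumed 18:00 UTC kickoff are illustrative — substitute your real data and game time):

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute bars across game day, with spreads that widen toward kickoff
kickoff = pd.Timestamp("2025-01-05T18:00:00Z")
times = pd.date_range("2025-01-05T10:00:00Z", "2025-01-05T18:00:00Z", freq="5min")
hours_to_kickoff = (kickoff - times).total_seconds() / 3600
spreads = 0.02 + 0.02 * np.exp(-hours_to_kickoff)  # tight early, wide near kickoff
metrics = pd.DataFrame({"t": times, "spread": spreads})

# Pre-game window (final 2h) vs the calmer window before it
pregame = metrics[metrics["t"] >= kickoff - pd.Timedelta(hours=2)]
earlier = metrics[metrics["t"] < kickoff - pd.Timedelta(hours=2)]

ratio = pregame["spread"].median() / earlier["spread"].median()
print(f"earlier median spread:  {earlier['spread'].median():.4f}")
print(f"pre-game median spread: {pregame['spread'].median():.4f}")
print(f"widening: {ratio:.2f}x")
```

With real data, the same two filters on the `metrics` frame from the API give you the widening ratio for any specific game.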

Building an execution-aware entry signal

The key insight for sports markets is that your signal and your execution window often don't align. You might form a view 4 hours before kickoff based on injury news, but by the time you're ready to size in, the book has thinned and your cost has tripled. Build this distinction into your backtest explicitly.

# Pull 5-minute L2 snapshots
books_r = requests.get(
    f"{BASE}/markets/{slug}/books",
    headers=HEADERS,
    params={
        "start_ts": "2025-01-05T12:00:00Z",
        "end_ts": "2025-01-05T23:59:00Z",
        "resolution": "5m",
    },
    timeout=30,
)
books_r.raise_for_status()
books_raw = books_r.json()["data"]

# Build a time-indexed book store
books_index = {b["t"]: b for b in books_raw}
books_ts = sorted(books_index.keys())

import bisect

def get_nearest_book(signal_ts_str, max_lag_minutes=10):
    """Return the nearest book snapshot within max_lag_minutes of signal_ts."""
    idx = bisect.bisect_left(books_ts, signal_ts_str)
    candidates = books_ts[max(0, idx-1):idx+2]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda t: abs(
        pd.Timestamp(t) - pd.Timestamp(signal_ts_str)
    ))
    lag_min = abs(
        (pd.Timestamp(nearest) - pd.Timestamp(signal_ts_str)).total_seconds() / 60
    )
    if lag_min > max_lag_minutes:
        return None
    return books_index[nearest]


def weighted_fill(levels, target_size):
    remaining = float(target_size)
    filled = notional = 0.0
    for price, size in levels:
        take = min(remaining, float(size))
        notional += take * float(price)
        filled += take
        remaining -= take
        if remaining <= 0:
            break
    if filled == 0:
        return None, 0.0, float(target_size)
    return notional / filled, filled, max(0.0, float(target_size) - filled)

Regime-aware sizing policy

Once you can see slippage by time-of-day and spread level, the execution policy becomes obvious: don't apply uniform sizing across all conditions. A strategy that looks good at a fixed 2,000-contract size under calm spreads often looks very different once sizing adapts to live book conditions — and the adaptive version is the one that resembles what you'd actually run.

def get_execution_params(snap, spread_threshold=0.04, depth_threshold=1500):
    """
    Returns (should_trade, recommended_size_fraction) based on current book conditions.
    Reduce size when spread is wide or top-of-book depth is thin.
    """
    if snap is None:
        return False, 0.0

    asks = snap.get("asks") or []
    bids = snap.get("bids") or []
    if not asks or not bids:   # guard empty sides, not just a missing key
        return False, 0.0

    best_ask = float(asks[0][0])
    best_bid = float(bids[0][0])
    spread = best_ask - best_bid
    top_ask_depth = float(asks[0][1])

    if spread > spread_threshold:
        return False, 0.0   # spread too wide — skip entirely

    if top_ask_depth < depth_threshold:
        return True, 0.5    # thin depth — half size

    return True, 1.0        # normal conditions — full size


# Apply to a set of signal timestamps
BASE_SIZE = 2000
results = []

for signal_ts in metrics["t"].dt.strftime("%Y-%m-%dT%H:%M:%SZ"):
    snap = get_nearest_book(signal_ts)
    should_trade, size_fraction = get_execution_params(snap)

    if not should_trade:
        results.append({"t": signal_ts, "status": "blocked", "size": 0})
        continue

    target_size = int(BASE_SIZE * size_fraction)
    asks = snap["asks"]
    avg_fill, filled, unfilled = weighted_fill(asks, target_size)

    if avg_fill is None:
        continue

    ref_price = float(asks[0][0])
    slippage = (avg_fill - ref_price) / ref_price * 10_000

    results.append({
        "t": signal_ts,
        "status": "executed",
        "size": target_size,
        "avg_fill": avg_fill,
        "slippage_bps": slippage,
        "fill_ratio": filled / target_size if target_size > 0 else 0,
    })

results_df = pd.DataFrame(results)
blocked_pct = (results_df["status"] == "blocked").mean() * 100
print(f"Blocked trades: {blocked_pct:.1f}%")
print(results_df[results_df["status"] == "executed"]["slippage_bps"].describe())

The blocked_pct number matters. If 35% of your signals get blocked by the spread/depth filters, your live strategy will have significantly lower trade frequency than the raw signal suggests. That affects your PnL per unit time, your Sharpe, and how long it takes to build statistical confidence in the results.
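A rough sketch of what that does to time-to-significance. The signal frequency and the number of trades needed are hypothetical placeholders — plug in your own:

```python
# Hypothetical inputs: adjust to your own signal frequency and variance targets
signals_per_week = 25    # raw signals from the selection model
blocked_rate = 0.35      # fraction rejected by the spread/depth filter
trades_needed = 400      # executed trades needed for a stable per-trade edge estimate

executed_per_week = signals_per_week * (1 - blocked_rate)
weeks_to_confidence = trades_needed / executed_per_week

print(f"Executed trades/week: {executed_per_week:.1f}")
print(f"Weeks to {trades_needed} executed trades: {weeks_to_confidence:.1f}")
```

Under these toy numbers the filter stretches the evaluation window from 16 weeks to nearly 25 — a full extra season before you can trust the edge estimate.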

KPIs that reveal whether a sports strategy is actually tradable

Win rate and gross return are necessary but not sufficient. Add these to every strategy evaluation: median slippage in bps, P90 slippage, fill ratio by size bucket (500 / 1,000 / 5,000 contracts), execution-adjusted edge per trade (gross edge minus slippage minus fees), and blocked-trade rate from the risk filter. If slippage and blocked-trade metrics are significantly worse in pre-game windows than during calmer periods, your strategy's behavior around game time is fundamentally different from its behavior the rest of the week — and that's probably the window where it's doing most of its work.
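Given a results frame shaped like the one built above, most of these KPIs reduce to a few lines. Synthetic data is generated here so the snippet runs standalone; the size buckets and distributions are illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for results_df from the backtest loop above
rng = np.random.default_rng(0)
n = 200
results_df = pd.DataFrame({
    "status": rng.choice(["executed", "blocked"], size=n, p=[0.7, 0.3]),
    "size": rng.choice([500, 1000, 5000], size=n),
    "slippage_bps": rng.gamma(2.0, 20.0, size=n),
    "fill_ratio": rng.uniform(0.6, 1.0, size=n),
})
executed = results_df[results_df["status"] == "executed"]

kpis = {
    "median_slippage_bps": executed["slippage_bps"].median(),
    "p90_slippage_bps": executed["slippage_bps"].quantile(0.9),
    "blocked_rate": (results_df["status"] == "blocked").mean(),
}
fill_by_bucket = executed.groupby("size")["fill_ratio"].mean()

print(kpis)
print(fill_by_bucket)
```

Splitting the same computation by a pre-game flag (e.g. within 2 hours of kickoff vs earlier) is the fastest way to see whether the strategy's execution profile degrades exactly when it trades most.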

The single most common mistake in sports-market backtests: reporting gross edge without subtracting execution costs, and discovering the shortfall only in live trading.


All data from the polymarketdata.co API. Sports market coverage and endpoint details at polymarketdata.co/docs.