
How to Backtest Polymarket Strategies with 1-Minute Data in Python

Build an execution-aware Polymarket backtest in Python: align minute bars, apply liquidity filters, and simulate realistic fills from historical L2 depth.

PolymarketData Team

The gap between a backtest that "looks profitable" and one that actually tells you something useful usually comes down to a single question: did you model execution, or did you assume midpoint fills? Midpoint-fill backtests overstate edge on Polymarket, sometimes dramatically — especially in smaller markets or around event catalysts where liquidity thins and spreads widen. A strategy showing +80 bps expected edge can look like +20 bps after realistic fill costs. A +30 bps strategy can go negative.

This walkthrough builds a Python backtest from the ground up with execution realism built in from step one — not bolted on afterward.

The three-layer stack

Think of your research pipeline as three distinct concerns: the signal layer (historical prices, derived features, directional logic), the tradeability layer (is this market actually liquid enough to trade at the moment the signal fires?), and the execution layer (what does it cost to enter and exit at your size?). Most notebooks collapse all three into the signal layer. That's the problem.

These three layers map directly to three API endpoints: /prices for the signal layer, /metrics for tradeability, and /books for execution.

Step 1: Build a reproducible market universe

This step sounds boring but it's where most backtests silently break. If your universe changes between runs, your results aren't comparable.

import os
import requests
import pandas as pd

API_KEY = os.environ["POLYMARKETDATA_API_KEY"]
BASE = "https://api.polymarketdata.co/v1"
HEADERS = {"X-API-Key": API_KEY}

# Fetch markets and snapshot the universe at a fixed point
r = requests.get(
    f"{BASE}/markets",
    headers=HEADERS,
    params={"search": "election", "limit": 200},
    timeout=30,
)
r.raise_for_status()
markets = pd.DataFrame(r.json()["data"])

universe = markets[markets["status"].isin(["open", "resolved"])].copy()

# Save this — your run metadata needs to know exactly which markets you used
universe_snapshot = {
    "built_at": pd.Timestamp.utcnow().isoformat(),
    "slugs": universe["slug"].tolist(),
    "count": len(universe),
}
print(f"Universe: {universe_snapshot['count']} markets as of {universe_snapshot['built_at']}")

Pin the slug list. Run the query again tomorrow and the universe will be different, because markets will have resolved or new ones will have opened. That's fine, but you need to know which version of the universe your results correspond to.
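One lightweight way to make the pin concrete is to fingerprint the slug list and store that fingerprint alongside every result. This is a sketch, not part of the API: `universe_fingerprint` and `save_universe` are hypothetical helpers, and the hash scheme is an illustrative choice.

```python
import hashlib
import json


def universe_fingerprint(slugs):
    """Deterministic ID for a universe: SHA-256 of the sorted slug list.
    Same slugs in any order produce the same fingerprint, so two runs
    are comparable iff their fingerprints match."""
    canonical = json.dumps(sorted(slugs)).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]


def save_universe(snapshot, path):
    """Persist the snapshot plus its fingerprint next to backtest results."""
    snapshot = {**snapshot, "fingerprint": universe_fingerprint(snapshot["slugs"])}
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot["fingerprint"]


# Toy example with made-up slugs
snap = {
    "built_at": "2025-07-01T00:00:00+00:00",
    "slugs": ["us-election-winner", "fed-rate-cut-june"],
    "count": 2,
}
print(universe_fingerprint(snap["slugs"]))
```

Tag every saved PnL number with the fingerprint and you can immediately tell whether two runs were even looking at the same markets.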

Step 2: Pull aligned prices and metrics

Timestamp misalignment is the most common source of false positives. Two rows that should be the same bar aren't, because one endpoint uses the bar open and the other uses the bar close. Inner-join and check the row count.

slug = universe.iloc[0]["slug"]
window = {
    "start_ts": "2025-01-01T00:00:00Z",
    "end_ts": "2025-06-30T23:59:00Z",
    "resolution": "1m",
}

prices_r = requests.get(f"{BASE}/markets/{slug}/prices", headers=HEADERS, params=window, timeout=30)
prices_r.raise_for_status()
prices_df = pd.DataFrame(prices_r.json()["data"])

metrics_r = requests.get(f"{BASE}/markets/{slug}/metrics", headers=HEADERS, params=window, timeout=30)
metrics_r.raise_for_status()
metrics_df = pd.DataFrame(metrics_r.json()["data"])

for df in [prices_df, metrics_df]:
    df["t"] = pd.to_datetime(df["t"], utc=True)

merged = (
    prices_df.rename(columns={"p": "price"})
    .merge(metrics_df, on="t", how="inner")
    .sort_values("t")
    .set_index("t")
)

print(f"Prices rows: {len(prices_df)}, Metrics rows: {len(metrics_df)}, Joined: {len(merged)}")
# If joined << min(prices, metrics), something is off — investigate before continuing

If the join drops more than 5% of rows, don't proceed. Figure out the gap pattern first. Is it a specific time window? Weekends? A market that went inactive? Know the answer before it contaminates your signal.
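One way to answer those questions quickly is an outer join with pandas' `indicator` flag, bucketing unmatched rows by hour of day. `join_gap_report` is a hypothetical helper, shown here on toy frames rather than real API data.

```python
import pandas as pd


def join_gap_report(prices_df, metrics_df, key="t"):
    """Outer-join with merge's indicator column, then bucket the
    unmatched rows by hour of day to expose systematic gap patterns,
    e.g. all right_only rows at hours 0-6 means metrics go missing overnight."""
    both = prices_df.merge(metrics_df, on=key, how="outer", indicator=True)
    gaps = both[both["_merge"] != "both"].copy()
    gaps["hour"] = gaps[key].dt.hour
    return gaps.groupby(["_merge", "hour"], observed=True).size()


# Toy frames: prices has 00:00 that metrics lacks, metrics has 00:02 that prices lacks
p = pd.DataFrame({
    "t": pd.to_datetime(["2025-01-01 00:00", "2025-01-01 00:01"], utc=True),
    "price": [0.40, 0.41],
})
m = pd.DataFrame({
    "t": pd.to_datetime(["2025-01-01 00:01", "2025-01-01 00:02"], utc=True),
    "spread": [0.01, 0.02],
})
print(join_gap_report(p, m))
```

If the report concentrates in a specific hour band or on weekends, you have a structural data gap, not random noise, and the strategy logic needs to account for it.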

Step 3: A signal you can explain in one sentence

Complex signals are hard to debug. Start with something simple enough that you can state its logic in one sentence and confirm by inspection that it's doing what you think.

# Signal: buy when 20-bar return is positive and spread is below threshold
merged["ret_20m"] = merged["price"].pct_change(20)
merged["signal"] = 0

MAX_SPREAD = 0.02  # 2 cents on a binary market — adjust per market type
merged.loc[
    (merged["ret_20m"] > 0) & (merged["spread"] < MAX_SPREAD),
    "signal"
] = 1

# Trade events = signal transitions
trade_events = merged[merged["signal"].diff().fillna(0) != 0].copy()
print(f"Trade events: {len(trade_events)}")
print(f"Blocked by spread filter: {(merged['spread'] >= MAX_SPREAD).sum()} bars")

That last print is useful. If the spread filter is blocking 40% of your bars, your strategy is implicitly targeting a liquidity condition that doesn't hold most of the time. You need to know that.
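Entries and exits are also worth counting separately, because an imbalance means the window ends with an open position that your PnL accounting has to mark to market. A minimal sketch on a toy signal series:

```python
import pandas as pd

# Toy 0/1 signal: two round trips
sig = pd.Series([0, 0, 1, 1, 0, 1, 0])

transitions = sig.diff().fillna(0)
entries = (transitions == 1).sum()   # 0 -> 1: open a position
exits = (transitions == -1).sum()    # 1 -> 0: close it
print(entries, exits)  # 2 2
```

If `entries == exits + 1`, the final bar is held open; decide explicitly whether to close it at the last price or exclude it, rather than letting the backtest decide silently.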

Step 4: Fill simulation from L2 depth

This is the step that separates a backtest from a story. For each trade event, find the nearest historical book snapshot and walk the levels to estimate your actual fill price.

# Fetch 5-minute book snapshots for the same window
books_params = {**window, "resolution": "5m"}
books_r = requests.get(f"{BASE}/markets/{slug}/books", headers=HEADERS, params=books_params, timeout=30)
books_r.raise_for_status()
books_raw = books_r.json()["data"]

# Index books by timestamp for nearest-lookup
books_df = pd.DataFrame([
    {"t": pd.Timestamp(b["t"], tz="UTC"), "asks": b["asks"], "bids": b["bids"]}
    for b in books_raw
]).set_index("t").sort_index()


def weighted_fill(levels, target_size):
    """levels: [[price, size], ...] sorted best-to-worst"""
    remaining = float(target_size)
    filled = notional = 0.0
    for price, size in levels:
        take = min(remaining, float(size))
        notional += take * float(price)
        filled += take
        remaining -= take
        if remaining <= 0:
            break
    if filled == 0:
        return None, 0.0, float(target_size)
    avg_fill = notional / filled
    unfilled = max(0.0, float(target_size) - filled)
    return avg_fill, filled, unfilled


def nearest_book(ts, books_df, side, max_lag_minutes=10):
    """Find the book snapshot closest to ts, within max_lag_minutes."""
    try:
        idx = books_df.index.get_indexer([ts], method="nearest")[0]
        snap = books_df.iloc[idx]
        lag = abs((books_df.index[idx] - ts).total_seconds() / 60)
        if lag > max_lag_minutes:
            return None
        return snap["asks"] if side == "buy" else snap["bids"]
    except Exception:
        return None


# Simulate fills for each trade event
TARGET_SIZE = 1000  # contracts — adjust to your actual trade size
results = []

for ts, row in trade_events.iterrows():
    side = "buy" if row["signal"] == 1 else "sell"
    levels = nearest_book(ts, books_df, side)

    if levels is None:
        continue

    avg_fill, filled, unfilled = weighted_fill(levels, TARGET_SIZE)

    if avg_fill is None:
        continue

    ref_price = row["price"]
    slippage_bps = (
        (avg_fill - ref_price) / ref_price * 10_000 if side == "buy"
        else (ref_price - avg_fill) / ref_price * 10_000
    )

    results.append({
        "t": ts,
        "side": side,
        "ref_price": ref_price,
        "avg_fill": avg_fill,
        "filled": filled,
        "unfilled": unfilled,
        "fill_ratio": filled / TARGET_SIZE,
        "slippage_bps": slippage_bps,
    })

fills_df = pd.DataFrame(results)
print(fills_df[["slippage_bps", "fill_ratio"]].describe())
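Before trusting the simulated fills, it helps to hand-check the level walk on a toy book. The function is repeated below so the example runs standalone; the book levels and sizes are made up.

```python
def weighted_fill(levels, target_size):
    """Same level walk as above, repeated so this example is self-contained."""
    remaining = float(target_size)
    filled = notional = 0.0
    for price, size in levels:
        take = min(remaining, float(size))
        notional += take * float(price)
        filled += take
        remaining -= take
        if remaining <= 0:
            break
    if filled == 0:
        return None, 0.0, float(target_size)
    return notional / filled, filled, max(0.0, float(target_size) - filled)


# Toy ask book: 300 @ 0.52, 500 @ 0.53, 400 @ 0.55; buy 1000 contracts
avg, filled, unfilled = weighted_fill([[0.52, 300], [0.53, 500], [0.55, 400]], 1000)
# 300*0.52 + 500*0.53 + 200*0.55 = 156 + 265 + 110 = 531, so avg = 0.531
print(round(avg, 4), filled, unfilled)  # 0.531 1000.0 0.0
```

Note that the walk takes only 200 of the 400 contracts at 0.55: the order is filled before that level is exhausted, which is exactly the behavior you want to verify by hand at least once.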

Look at the P90 slippage and the fill ratio distribution. If median slippage is 25 bps and P90 is 90 bps, your strategy's expected edge needs to clear that bar consistently, not just on average.
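Remember that a round trip pays slippage twice, once on entry and once on exit, so the edge your signal needs to clear is roughly double the per-leg number at whatever quantile you care about. A quick sketch with made-up slippage samples (in practice, use `fills_df["slippage_bps"]`):

```python
import numpy as np

# Hypothetical per-leg slippage samples in bps
slippage_bps = np.array([12, 18, 25, 25, 31, 40, 55, 70, 88, 95])

per_leg_p90 = np.percentile(slippage_bps, 90)
round_trip_p90 = 2 * per_leg_p90  # entry and exit both pay
print(per_leg_p90, round_trip_p90)
```

If your modeled edge per trade is, say, 60 bps and the round-trip P90 is near 180 bps, the strategy loses money on its worst decile of fills even though the median trade looks fine.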

What to log on every run

If you can't reproduce a result, it didn't happen. Save at minimum:

  • run_id, strategy_version, universe_slug_list, universe_built_at
  • Per-trade: signal_ts, book_ts, requested_size, filled_size, avg_fill, slippage_bps
  • Aggregate: gross_pnl, execution_cost_total, net_pnl, blocked_trade_count

That last field — blocked trades — is easy to skip and important to track. If your risk gate is rejecting 30% of signals, your realized Sharpe will be different from your paper Sharpe. You need to know by how much.
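A JSON-lines file is usually enough for this. The sketch below appends one record per run; the field names follow the checklist above, but the schema itself is an illustrative choice, not an API requirement.

```python
import json
import time
import uuid


def log_run(path, run_meta, trades, aggregates):
    """Append one backtest run as a single JSON line."""
    record = {
        "run_id": str(uuid.uuid4()),
        "logged_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        **run_meta,        # strategy_version, universe_slug_list, universe_built_at
        "trades": trades,  # per-trade dicts: signal_ts, book_ts, sizes, fills, slippage
        **aggregates,      # gross_pnl, execution_cost_total, net_pnl, blocked_trade_count
    }
    with open(path, "a") as f:
        f.write(json.dumps(record, default=str) + "\n")
    return record["run_id"]
```

Append-only JSONL means you never overwrite a prior run, and `grep`-ing a `run_id` out of the file months later is trivial.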

The number that matters

Run your strategy twice: once with midpoint fills, once with L2 fill simulation. Compare net PnL. The gap between those two numbers is the cost of sloppy assumptions. If the L2-aware version is still profitable after execution costs, you have something worth developing further. If it isn't, you found out cheaply — before risking real capital.
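The comparison itself can be as simple as subtracting total execution cost, reconstructed from the fill records, from the midpoint-fill PnL. This is a rough sketch under simplifying assumptions: `fills` is shaped like the rows appended in Step 4, slippage is taken relative to `ref_price`, and the opportunity cost of unfilled size is ignored.

```python
def execution_gap(midpoint_pnl, fills):
    """Dollar cost of execution implied by the fill simulation:
    each fill pays roughly filled_contracts * ref_price * slippage."""
    cost = sum(
        f["filled"] * f["ref_price"] * f["slippage_bps"] / 10_000
        for f in fills
    )
    return {
        "midpoint_pnl": midpoint_pnl,
        "execution_cost": cost,
        "l2_aware_pnl": midpoint_pnl - cost,
    }


# Toy fills: 1000 @ ref 0.50 with 40 bps slippage, 800 @ ref 0.55 with 25 bps
fills = [
    {"filled": 1000, "ref_price": 0.50, "slippage_bps": 40},
    {"filled": 800, "ref_price": 0.55, "slippage_bps": 25},
]
print(execution_gap(250.0, fills))
```

Here the execution cost is about $3.10 against a $250 midpoint PnL, which is survivable; if the cost had been $200, the "profitable" strategy would effectively be paying its entire edge to the book.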


All data from the polymarketdata.co API. Endpoint reference at polymarketdata.co/docs.