Getting Started with Polymarket Historical Data
Set up a clean first workflow with Polymarket historical data: fetch markets, pull minute data, and structure your first reproducible research notebook.

Most first Polymarket notebooks fail for the same three reasons. Timestamps get mixed between UTC and local time, market IDs drift between runs, and the backtest uses price bars alone — no execution model, no spread filter, no sense of whether the trade was actually fillable. You get a result that looks clean in the notebook and breaks immediately in the real market.
This guide lays out a setup that avoids all three. By the end you'll have a reproducible pipeline pulling prices, metrics, and an L2 book snapshot from a single market — the minimum structure for research that holds up.
Start with one market, three data families
Resist the urge to ingest 500 markets on day one. Start with one market and pull three things: metadata (so you can reproduce exactly which market you studied), minute price bars (for signal prototyping), and an L2 book snapshot (for a first pass at execution realism). Everything else can come later, once you know the plumbing works.
import os
import requests
import pandas as pd
API_KEY = os.environ["POLYMARKETDATA_API_KEY"]
BASE = "https://api.polymarketdata.co/v1"
HEADERS = {"X-API-Key": API_KEY}
# 1. Find a market
r = requests.get(
    f"{BASE}/markets",
    headers=HEADERS,
    params={"search": "bitcoin", "limit": 50},
    timeout=30,
)
r.raise_for_status()
markets = pd.DataFrame(r.json()["data"])
slug = markets.iloc[0]["slug"]
print(f"Working with: {slug}")
# 2. Pull one week of minute prices
prices_r = requests.get(
    f"{BASE}/markets/{slug}/prices",
    headers=HEADERS,
    params={
        "start_ts": "2025-01-01T00:00:00Z",
        "end_ts": "2025-01-08T00:00:00Z",
        "resolution": "1m",
    },
    timeout=30,
)
prices_r.raise_for_status()
prices = pd.DataFrame(prices_r.json()["data"])
prices["t"] = pd.to_datetime(prices["t"], utc=True)
prices = prices.sort_values("t").reset_index(drop=True)
print(f"Rows: {len(prices)}, range: {prices['t'].min()} → {prices['t'].max()}")
Before doing anything else, print the row count and timestamp range. If you're getting far fewer rows than expected, you've likely hit a market that was inactive during that window — pick a different one and re-run. This check takes 10 seconds and prevents two hours of confused debugging later.
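The same sanity check can go further than eyeballing: compare the bars you received against the number a full minute grid would contain. A small sketch of that idea, using a locally built `prices` frame with deliberate gaps in place of a real `/prices` pull (the `coverage` helper name is mine, not part of the API):

```python
import pandas as pd

def coverage(prices: pd.DataFrame, start: str, end: str) -> float:
    """Fraction of expected 1-minute bars actually present in [start, end)."""
    expected = pd.date_range(start, end, freq="1min", tz="UTC", inclusive="left")
    present = prices["t"].isin(expected).sum()
    return present / len(expected)

# Synthetic stand-in: a 1-hour window with 10 minutes missing
full = pd.date_range("2025-01-01 00:00", periods=60, freq="1min", tz="UTC")
prices = pd.DataFrame({"t": full.delete(list(range(10, 20))), "p": 0.5})

ratio = coverage(prices, "2025-01-01 00:00", "2025-01-01 01:00")
print(f"Coverage: {ratio:.1%}")  # 50 of 60 expected bars → 83.3%
```

A coverage ratio well below 1.0 is exactly the "inactive market" symptom described above, made quantitative.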
Pull metrics alongside prices — and join them carefully
Price bars alone tell you where the market was. Metrics tell you whether it was tradable. Volume, spread, and liquidity snapshots sit in a separate endpoint; you need to join them before you can reason about execution quality.
# 3. Pull metrics for the same window
metrics_r = requests.get(
    f"{BASE}/markets/{slug}/metrics",
    headers=HEADERS,
    params={
        "start_ts": "2025-01-01T00:00:00Z",
        "end_ts": "2025-01-08T00:00:00Z",
        "resolution": "1m",
    },
    timeout=30,
)
metrics_r.raise_for_status()
metrics = pd.DataFrame(metrics_r.json()["data"])
metrics["t"] = pd.to_datetime(metrics["t"], utc=True)
# Inner join: forces you to see gaps immediately
df = (
    prices.rename(columns={"p": "price"})
    .merge(metrics, on="t", how="inner")
    .sort_values("t")
    .set_index("t")
)
print(f"Joined rows: {len(df)}")
print(df[["price", "spread", "volume"]].describe())
Use an inner join while you're building. Outer joins hide gaps. If the join drops a meaningful fraction of rows, that's important information — it means prices and metrics have different coverage patterns for this market, and you need to understand why before you model anything.
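When the inner join does drop rows, pandas can tell you which side is missing what: an outer merge with `indicator=True` labels each row's origin. A toy sketch with two partially overlapping frames standing in for real prices/metrics pulls:

```python
import pandas as pd

# Two frames whose coverage overlaps only partially
prices = pd.DataFrame({
    "t": pd.date_range("2025-01-01 00:00", periods=6, freq="1min", tz="UTC"),
    "price": [0.51, 0.52, 0.52, 0.53, 0.53, 0.54],
})
metrics = pd.DataFrame({
    "t": pd.date_range("2025-01-01 00:02", periods=6, freq="1min", tz="UTC"),
    "spread": [0.01] * 6,
})

# "_merge" records whether each row came from both frames or only one
diag = prices.merge(metrics, on="t", how="outer", indicator=True)
counts = diag["_merge"].value_counts()
print(counts)  # both: 4, left_only: 2, right_only: 2
```

Run this diagnostic once, understand the gap pattern, then go back to the inner join for actual research.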
Add an L2 snapshot before your first strategy pass
Adding even a single order book pull changes how you think about the data. Once you see that a "0.530 best ask" actually has only 400 contracts at that level before the price walks, midpoint-fill assumptions start to feel dishonest.
# 4. Pull 5-minute L2 snapshots for the same window
books_r = requests.get(
    f"{BASE}/markets/{slug}/books",
    headers=HEADERS,
    params={
        "start_ts": "2025-01-01T00:00:00Z",
        "end_ts": "2025-01-08T00:00:00Z",
        "resolution": "5m",
    },
    timeout=30,
)
books_r.raise_for_status()
books = books_r.json()["data"]
# Inspect the first snapshot
snap = books[0]
print(f"Snapshot at: {snap['t']}")
print(f"Top 3 asks: {snap['asks'][:3]}")
print(f"Top 3 bids: {snap['bids'][:3]}")
# Each level: [price, size]
The response gives you bids and asks as [price, size] arrays, sorted best-to-worst. That's the raw material for fill simulation — you'll use it in every execution-aware backtest from here on.
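A first pass at that fill simulation can be very small. The sketch below walks the ask side of a book until an order is filled, assuming levels are `[price, size]` sorted best-to-worst as described above; the `avg_fill_price` helper and the example sizes are illustrative, not from the API:

```python
def avg_fill_price(levels, qty):
    """Average price paid to buy `qty` contracts against one side of the book."""
    filled, cost = 0.0, 0.0
    for price, size in levels:
        take = min(size, qty - filled)  # consume this level, up to what remains
        filled += take
        cost += take * price
        if filled >= qty:
            break
    if filled < qty:
        raise ValueError("order walks through the visible book")
    return cost / filled

# The "0.530 best ask with only 400 contracts" scenario from the text
asks = [[0.530, 400], [0.535, 1200], [0.540, 5000]]
print(avg_fill_price(asks, 1000))  # 400 @ 0.530 + 600 @ 0.535 = 0.533
```

Compare that 0.533 against the 0.530 a midpoint-fill backtest would assume: three ticks of slippage on a single modest order, visible only because you looked at the book.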
The three mistakes worth avoiding early
Mixing timestamps. Keep everything in UTC throughout your pipeline. Convert to local time only for display. One stray tz_localize(None) or a join on naive vs. aware datetimes will corrupt your results silently — you won't know until your signals are off by hours.
Letting your market universe drift. If you run the same discovery query twice and get different results (because a market resolved or new ones were added), your backtests aren't comparable. Save the exact list of slugs and the timestamp when you built it as part of your run metadata.
Backtesting with price bars only. Even a simple mean-reversion strategy needs a spread filter. Without it, you're modeling entries at prices that were never actually tradable at your size. Pull the metrics, add df.loc[df["spread"] > 0.02, "signal"] = 0, and see how many of your "best" setups disappear. That's not a bug — that's reality.
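The spread filter from the last point takes one line to apply and is easy to measure. A toy demonstration on a synthetic joined frame (the column names match the join built earlier; the numbers are made up):

```python
import pandas as pd

# Toy joined frame: the signal fires on every bar, but two bars are wide
df = pd.DataFrame({
    "spread": [0.005, 0.030, 0.010, 0.050],
    "signal": [1, 1, -1, 1],
})

before = int((df["signal"] != 0).sum())
df.loc[df["spread"] > 0.02, "signal"] = 0  # the filter from the text
after = int((df["signal"] != 0).sum())
print(f"Tradable signals: {after} of {before}")  # 2 of 4
```

Half the "setups" in this toy frame were never fillable at a reasonable cost. On real data the fraction varies, but checking it is the point.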
A useful first milestone
Before you think about model logic, set this as your target: one market, one week of aligned price + metric data, one L2 snapshot examined by hand. Run it twice from scratch and confirm you get identical row counts and identical summary statistics. If you can do that, you have a reliable base. Everything after that is iteration on top of a stable foundation.
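One way to make the run-it-twice check mechanical is to reduce each run to a short fingerprint and compare those. A minimal sketch, assuming the joined frame from earlier; `run_fingerprint` is a hypothetical helper, not part of any library:

```python
import hashlib
import pandas as pd

def run_fingerprint(df: pd.DataFrame) -> str:
    """Hash the row count plus rounded summary stats for run-to-run comparison."""
    summary = f"{len(df)}|{df.describe().round(10).to_json()}"
    return hashlib.sha256(summary.encode()).hexdigest()[:16]

# Two identical "runs" must fingerprint identically
run_a = pd.DataFrame({"price": [0.51, 0.52, 0.53], "spread": [0.01, 0.01, 0.02]})
run_b = pd.DataFrame({"price": [0.51, 0.52, 0.53], "spread": [0.01, 0.01, 0.02]})
assert run_fingerprint(run_a) == run_fingerprint(run_b)
print(run_fingerprint(run_a))
```

Save the fingerprint alongside the slug list and query timestamps in your run metadata, and a silent data drift shows up as a one-line diff instead of a mystery.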
All data from the polymarketdata.co API. Full endpoint reference at polymarketdata.co/docs.