Engineering

I Built a Sports Prediction Bot That Bets Against the Market — Here's the Architecture

Sports prediction is a solved problem for the books. It's wide open on Polymarket. Here's how I built Shiva — a 6-factor probability engine that finds edge in NBA and MLB markets using free public APIs and adaptive weights.

March 4, 2026
8 min read
#engineering #sports-betting #polymarket

The sportsbooks have solved the prediction problem. Their models have forty years of data, proprietary injury feeds, real-time line movement, and entire teams of quantitative analysts. Beating them is a different problem category than the one most people think they're solving.

Polymarket is different. It's a prediction market — decentralized, slow-moving, and priced by retail participants who follow news cycles instead of probability curves. The lines move on sentiment, not on actuarial math. That gap is the edge.

I built Shiva — a sports prediction engine that ingests team statistics, injury reports, rest data, and pre-game media sentiment, then compares its estimated probability against Polymarket's market price. When the gap is large enough, it places a bet. Every decision is logged. Every outcome is tracked. The model learns.

The Architecture Problem Nobody Talks About

Most sports prediction projects fail at the data layer. They either depend on expensive paid APIs (ESPN Synergy, StatHead subscriptions, Basketball Reference bulk exports) or they scrape brittle HTML that breaks with every site redesign.

Shiva uses nothing that costs money. The ESPN Stats API is unofficial but stable — it's what ESPN's own apps use, which makes it unlikely to disappear. The MLB Stats API is official and fully documented. Perplexity's Sonar API handles pre-game media context for lineup confirmations and injury updates that don't appear in the structured data feeds until it's too late.

The Polymarket GAMMA API provides the market universe: 55+ live NBA game events, each with 30-49 sub-markets covering moneylines, point spreads, totals, and individual player props. Every scan cycle takes 30 seconds. The bot evaluates every active market against a price filter (>10%, <90%) that strips out near-resolved positions and focuses on genuinely uncertain outcomes.
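The price filter described above can be sketched in a few lines. This is an illustrative example, not the actual scanner: the `price` field name and market dicts are assumptions standing in for the real GAMMA API response schema.

```python
# Hypothetical sketch: keep only markets whose price sits in the
# genuinely uncertain band (>10%, <90%), dropping near-resolved ones.
# The dict shape and "price" key are illustrative, not the GAMMA schema.

def filter_uncertain(markets, lo=0.10, hi=0.90):
    """Keep markets priced strictly between lo and hi."""
    return [m for m in markets if lo < m["price"] < hi]

markets = [
    {"id": "nba-lal-bos-ml", "price": 0.62},
    {"id": "nba-lal-bos-spread", "price": 0.95},  # near-resolved, dropped
    {"id": "nba-den-phx-ml", "price": 0.08},      # near-resolved, dropped
]
print([m["id"] for m in filter_uncertain(markets)])  # → ['nba-lal-bos-ml']
```

The strictly-between comparison is the point: a market at 95 cents has almost no uncertainty left to price, so it carries no exploitable edge regardless of what the model thinks.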

SHIVA — SPORTS PREDICTION ENGINE
Scan → Evaluate → Execute → Learn · 30s cadence

Data layer — external feeds · no API keys required
- NBA Feed: ESPN · stats + injuries
- MLB Feed: MLB Stats API · pitching + form
- Media Analyzer: Perplexity · lineup news
- Sports Scanner: GAMMA API · 55+ NBA events

Decision engine — probability estimation · 6 weighted factors
- Market Classifier: moneyline · spread · total · props
- Sports Strategy: home court · rest · form · injuries · H2H · sentiment
- Edge Calculator: Kelly criterion · EV · signal: STRONG / MODERATE

Execution layer — risk gates · CLOB orders · outcome claiming
- Risk Gate: daily loss limit · position cap · cooldown
- Polymarket CLOB: FOK market orders · paper / live mode
- Claimer: resolve outcomes · track PnL

Learning layer — adaptive weights · calibration · dashboard visibility
- Signal Logger: SQLite · every evaluated market gets recorded
- Trainer: Brier score · log loss · Kelly factor calibration
- Mission Control: Training tab · accuracy · weight drift · history

Free public APIs (no credentials required) · 6-factor probability model · DRY_RUN=true by default · self-improving via resolved outcomes

Four layers. None of them coupled tightly enough to cascade-fail. Data feeds fail gracefully — if the NBA injury report times out, the model runs without it. The execution layer is permanently sandboxed in paper trade mode until explicitly disabled. The logging layer records everything independently of whether a bet was placed.

The Six-Factor Probability Model

Every market type gets a different estimation function, but they all draw from the same six factors.

Home court advantage — NBA home teams win approximately 57% of games historically. That's the prior. Every estimate starts with home court at +5% before any other data is applied.

Rest — Back-to-back games carry a -8% penalty. A well-rested team facing a back-to-back opponent gets a +6% bonus. These numbers come from the literature on NBA performance degradation under compressed schedules.

Recent form — Last five games, weighted [0.30, 0.25, 0.20, 0.15, 0.10], most recent first. This captures momentum without overweighting single-game variance. A team that went 4-1 over the last five has a different trajectory than one that went 1-4.

Injuries — Structured by severity: Out (-15%), Doubtful (-10%), Questionable (-7%), Day-to-Day (-5%), Probable (-2%). Total impact is capped at -30% to prevent a cascading injury list from pushing a team to near-zero probability. The cap is real — game situations exist where four players are questionable simultaneously.

Head-to-head — Reserved for future integration. The data is available; the implementation is pending enough resolved history to validate the weight before committing.

Media sentiment — Perplexity scans pre-game coverage for each matchup. The output is a sentiment score from -1.0 to 1.0, applied as a small modifier. It's the only soft factor in the model, deliberately weighted low until calibration data justifies increasing it.
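The six factors above can be sketched as a single estimation function. This is a simplified illustration using the numbers from the article; the function name, the form and sentiment scaling constants, and the input shapes are assumptions, not the actual Shiva implementation.

```python
# Illustrative sketch of combining the six factors into a home-win
# probability. Constants mirror the article; the 0.10 form scale and
# 0.02 sentiment scale are assumed for illustration.

HOME_COURT = 0.05          # home-court prior bump
B2B_PENALTY = -0.08        # back-to-back schedule penalty
REST_BONUS = 0.06          # rested team vs. a back-to-back opponent
FORM_WEIGHTS = [0.30, 0.25, 0.20, 0.15, 0.10]  # most recent game first
INJURY_IMPACT = {"out": -0.15, "doubtful": -0.10, "questionable": -0.07,
                 "day-to-day": -0.05, "probable": -0.02}
INJURY_CAP = -0.30         # total injury impact is floored here

def estimate_home_win_prob(last5_home, last5_away, home_injuries,
                           away_injuries, home_b2b=False, away_b2b=False,
                           sentiment=0.0):
    """last5_* are win/loss lists (1/0), most recent first;
    *_injuries are lists of severity strings; sentiment is -1.0..1.0."""
    p = 0.50 + HOME_COURT
    # Recent form: weighted win-rate differential over the last five games.
    form_home = sum(w * r for w, r in zip(FORM_WEIGHTS, last5_home))
    form_away = sum(w * r for w, r in zip(FORM_WEIGHTS, last5_away))
    p += 0.10 * (form_home - form_away)
    # Rest: penalize a back-to-back, reward the rested side of the mismatch.
    if home_b2b:
        p += B2B_PENALTY
    elif away_b2b:
        p += REST_BONUS
    # Injuries: sum severity impacts, capped so a long report can't
    # push a team to near-zero probability.
    inj_home = max(sum(INJURY_IMPACT[s] for s in home_injuries), INJURY_CAP)
    inj_away = max(sum(INJURY_IMPACT[s] for s in away_injuries), INJURY_CAP)
    p += inj_home - inj_away
    p += 0.02 * sentiment  # the one soft factor, deliberately small
    return min(max(p, 0.01), 0.99)

# A 4-1 home team against a 1-4 visitor, no injuries, normal rest:
print(round(estimate_home_win_prob([1, 1, 0, 1, 1],
                                   [0, 0, 1, 0, 0], [], []), 3))  # → 0.61
```

Note how the cap behaves: four players listed as Out would sum to -0.60, but the floor holds the impact at -0.30, keeping one bad injury report from dominating every other factor.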

SIGNAL

The model's prior is deliberately conservative. Edge in prediction markets comes from knowing when the market is wrong, not from having a better model than the books. A 4% edge at even money has positive expected value. That's the bar.
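The edge check and the Kelly sizing from the Edge Calculator can be written out directly. The math below is the standard formulation for binary prediction markets; the 25% Kelly fraction is an illustrative assumption, not a value taken from Shiva.

```python
# Sketch of edge detection and fractional Kelly sizing for a binary
# market. A winning $1 "yes" share pays $1, so the net odds are
# b = (1 - price) / price. The 0.25 Kelly fraction is an assumption.

def edge(model_prob, market_price):
    """Positive when the model thinks 'yes' is underpriced."""
    return model_prob - market_price

def kelly_fraction(model_prob, market_price, fraction=0.25):
    """Fraction of bankroll to stake; zero for negative-edge markets."""
    b = (1 - market_price) / market_price
    f = (model_prob * b - (1 - model_prob)) / b
    return max(0.0, fraction * f)

# A 4% edge at even money: model says 54%, market prices 50%.
print(round(edge(0.54, 0.50), 2))            # → 0.04
print(round(kelly_fraction(0.54, 0.50), 3))  # → 0.02
```

At even money the full Kelly stake would be 8% of bankroll; the fractional multiplier pulls that to 2%, trading growth rate for drawdown protection against model miscalibration.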

For spread markets, the model translates expected point differential into a cover probability using a logistic approximation of the normal CDF with a 12-point standard deviation — the empirical NBA game variance. For totals, the same method applies against the over/under line with a 15-point SD. Player props use stat-type-specific standard deviations (8 for points, 3.5 for rebounds, 2.5 for assists) against a 60/40 blend of season average and recent five-game average.
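The spread-to-probability translation can be sketched with the logistic approximation the text describes. The 1.702 scaling constant is a standard logistic-to-normal-CDF approximation, assumed here; function names are illustrative.

```python
import math

# Sketch: translate an expected point differential into a cover
# probability using a logistic approximation of the normal CDF,
# with SD = 12 for spreads and SD = 15 for totals (per the article).

def norm_cdf_logistic(x):
    """Approximate the standard-normal CDF via a scaled logistic."""
    return 1.0 / (1.0 + math.exp(-1.702 * x))

def cover_prob(expected_diff, spread, sd=12.0):
    """P(home covers): P(actual diff > spread), diff ~ N(expected, sd)."""
    z = (expected_diff - spread) / sd
    return norm_cdf_logistic(z)

def over_prob(expected_total, line, sd=15.0):
    """P(total goes over the line), same method with the totals SD."""
    z = (expected_total - line) / sd
    return norm_cdf_logistic(z)

# Model expects home by 6; market spread is home -3.5.
print(round(cover_prob(6.0, 3.5), 3))  # → 0.588
```

The 12-point SD is doing most of the work: a 2.5-point disagreement with the line only moves the cover probability to about 59%, which is why spread edges are rarer than moneyline edges.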

Paper Trading as a Data Collection Strategy

The model ships with DRY_RUN=true. That's not a safety hedge — it's a data strategy.

The first job of the paper trading loop is not to make money. It's to generate a labeled dataset: thousands of signals with known outcomes, factor values recorded at the time of prediction, and final market resolution stored in SQLite. That dataset is the raw material for calibration.

The training loop runs daily at 5 AM, after NBA game results have propagated through Polymarket's resolution pipeline. It reads resolved trades, computes Brier score (the mean squared difference between predicted probability and actual outcome), log loss, and accuracy by market type. Then it updates factor weights.
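The two calibration metrics are short enough to show in full. This is a minimal sketch of the math only; the list-of-pairs input is illustrative and does not reflect the actual SQLite schema.

```python
import math

# Minimal sketch of the trainer's calibration metrics over resolved
# trades: Brier score and log loss. preds are predicted probabilities,
# outcomes are the resolved 0/1 results.

def brier_score(preds, outcomes):
    """Mean squared difference between predicted prob and actual outcome."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

def log_loss(preds, outcomes, eps=1e-12):
    """Mean negative log-likelihood; eps guards against log(0)."""
    total = 0.0
    for p, o in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(preds)

preds = [0.70, 0.55, 0.80, 0.40]
outcomes = [1, 0, 1, 0]
print(round(brier_score(preds, outcomes), 4))  # → 0.1481
print(round(log_loss(preds, outcomes), 4))     # → 0.4723
```

The two metrics punish different sins: Brier score rewards overall calibration, while log loss punishes confident wrong answers hard, which is exactly the failure mode you want surfaced before enabling live trading.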

ADAPTIVE LEARNING LOOP
Paper trade → Resolve → Calibrate → Update · daily at 5 AM ET

1. Paper trade — the bot evaluates markets every 30s; signals and simulated trades are logged to SQLite (DRY_RUN=true).
2. Market resolves — the claimer polls Polymarket for resolved outcomes; win/loss is recorded against each signal's predicted probability.
3. Calibration run — the trainer computes Brier score and log loss per market type and finds which factors correlated with correct calls.
4. Weight update — factors that predicted wins get a multiplier bump (up to 1.5×); poor predictors drop (floor 0.5×). Saved to adaptive_weights.json.
5. Deploy — the next scan cycle loads the updated weights and the strategy applies them to probability estimates. The loop continues.

Example weights after 200 samples: home_court 1.3×, injury_impact 1.2×, form_weight 1.1×, b2b_penalty 1.0×, media_sentiment 0.9×, rest_bonus 0.7×. Weights start at 1.0, adjust ±10% per cycle, and require a minimum of 20 resolved samples.

Dashboard metrics: Brier score (calibration quality) · log loss (probability accuracy) · accuracy (by market type) · weight drift (factor evolution).

The weight system mirrors what I built for Foresight, the crypto prediction bot. Every factor carries a multiplier, initialized at 1.0 and bounded between 0.5 and 1.5. A factor that correlates with winning signals gets a 10% bump. A factor that correlates with losses gets a 10% reduction. The minimum sample size before any adjustment is 20 resolved trades — not enough data to calibrate is explicitly represented as a system state, not a silent assumption.
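The bounded multiplicative update described above fits in a few lines. The correlation signal, step size, and bounds follow the article; the exact update rule and function name are assumptions for illustration.

```python
# Sketch of the bounded weight update: multipliers start at 1.0, move
# ±10% per calibration cycle based on whether a factor correlated with
# winning signals, and are clamped to [0.5, 1.5]. No update happens
# below the 20-resolved-trade minimum.

MIN_SAMPLES = 20
STEP = 0.10
LO, HI = 0.5, 1.5

def update_weights(weights, factor_correlations, n_resolved):
    """Bump factors that correlated with wins, shrink those that didn't."""
    if n_resolved < MIN_SAMPLES:
        # Not enough data is an explicit state, not a silent assumption.
        return weights
    updated = {}
    for factor, w in weights.items():
        corr = factor_correlations.get(factor, 0.0)
        if corr > 0:
            w *= (1 + STEP)
        elif corr < 0:
            w *= (1 - STEP)
        updated[factor] = min(max(w, LO), HI)
    return updated

weights = {"home_court": 1.0, "media_sentiment": 1.0}
new = update_weights(weights, {"home_court": 0.4, "media_sentiment": -0.2}, 37)
print(new)  # → {'home_court': 1.1, 'media_sentiment': 0.9}
```

The resulting dict is what gets persisted to adaptive_weights.json and loaded on the next scan cycle; the clamp guarantees that even a long losing streak can only halve a factor's influence, never erase it.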

INSIGHT

The adaptive weights don't make the model smarter than its inputs. They make it honest about which inputs are actually informative for sports outcomes on Polymarket, as opposed to which inputs are merely available.

The entire training state surfaces in Mission Control — the dashboard that monitors all active bots in the ecosystem. The Training tab shows calibration curves, factor weight drift over time, accuracy by market type, and training run history. If home court advantage is systematically miscalibrated on Polymarket's NBA lines, that'll appear as a weight pushing toward 1.5× before any human analysis confirms it.

What This Is Actually Measuring

Sports prediction on Polymarket is not the same problem as sports prediction against a sportsbook line.

A sportsbook line is set by professional oddsmakers, sharpened by professional bettors, and adjusted in real time as sophisticated money flows in. Beating it sustainably requires genuine information advantage — proprietary data, faster reaction to injury news, model structures the public doesn't have.

A Polymarket sports line is set by a much thinner market. The volumes on individual NBA game markets are substantial ($500K to $3M daily on high-profile matchups), but the price discovery process is slower and less efficient. Lines move on news cycles and Twitter sentiment more than on model recalibration. The vig is lower than a sportsbook — prediction markets take a smaller cut.

That's the structural edge. Not a better model than the quants. A better model than retail participants pricing NBA lines on a prediction market.


The paper trading loop generates the ground truth on whether that structural edge is real. Six months of logged signals with known outcomes will show whether the 6-factor model has genuine calibration or whether it's running on plausible priors that don't survive contact with actual market data. Both outcomes are useful. One tells you where to deploy capital. The other tells you which factors to rebuild.

Either way, the data exists. And the model learns.


The full architecture and trading logs surface in Mission Control, the command dashboard I built for the Invictus Labs ecosystem.
