Predictive Model Methodology

Full transparency on how PropsBot Golf Intelligence builds matchup win probabilities, scoring props, and confidence scores. Everything below is the actual pipeline — inputs, methods, parameters, and live backtest performance.

What we model

We generate probabilistic forecasts for every PGA Tour event. Some outputs are directly observed (odds, scores, lines); most are derived from a Monte Carlo simulation over player skill distributions.

Derived outputs

Observed inputs we surface

Inputs

Methods

(a) Monte Carlo matchup simulation

We run 10,000 simulated tournaments per event. Each player's per-round score is drawn from a Normal distribution centered on their blended SG projection and widened by per-player variance estimated from history. Scores are integer-rounded (golf is a discrete game) and all players in a group share a round shock term so matchup correlation is preserved on bad-weather days.
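
A minimal sketch of one simulated matchup round, assuming scores are expressed relative to par and the shared shock is a zero-mean Normal. The function name, the `shock_std` value, and the seed handling are illustrative assumptions, not the production code.

```python
import random

def simulate_matchup(mean_a, std_a, mean_b, std_b,
                     shock_std=1.0, n_sims=10_000, seed=0):
    """Estimate (win, tie, loss) probabilities for player A vs. player B.

    Means are expected round scores relative to par (lower is better).
    shock_std is a hypothetical size for the shared round shock.
    """
    rng = random.Random(seed)
    wins = ties = losses = 0
    for _ in range(n_sims):
        # One shock per simulated round, shared by both players in the group.
        shock = rng.gauss(0.0, shock_std)
        # Scores are rounded to whole strokes because golf scores are integers.
        score_a = round(rng.gauss(mean_a, std_a) + shock)
        score_b = round(rng.gauss(mean_b, std_b) + shock)
        if score_a < score_b:
            wins += 1
        elif score_a == score_b:
            ties += 1
        else:
            losses += 1
    return wins / n_sims, ties / n_sims, losses / n_sims
```

Keeping ties as their own outcome, rather than splitting them between the two players, is what lets the EV step treat pushes and dead heats separately.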

(b) Bayesian confidence score

Each projection blends a season prior (long-run SG baseline) with a recent-course-type likelihood (how the player has fared at similar venues). The posterior width determines the confidence score — tight posterior = high confidence, wide posterior = low confidence.

(c) Dead-heat EV (3-ball) & push EV (2-ball)

3-ball markets settle with dead-heat rules on ties, so our EV calculation weights a tied outcome at 1/N payout. 2-ball markets typically push on ties (stake returned) — we account for the tie probability explicitly from the sim instead of rolling it into win probability.
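
The two settlement rules can be sketched as EV formulas per unit stake. This assumes decimal odds and the common dead-heat convention that the stake is divided by the number of tied players and paid at full odds; the function names are illustrative.

```python
def ev_three_ball(p_solo, p_tie2, p_tie3, decimal_odds):
    """EV per unit staked on a 3-ball with dead-heat settlement.

    p_solo: probability of winning outright
    p_tie2: probability of tying for low score with one other player
    p_tie3: probability of a three-way tie
    """
    p_lose = 1.0 - p_solo - p_tie2 - p_tie3
    return (p_solo * (decimal_odds - 1.0)
            + p_tie2 * (decimal_odds / 2.0 - 1.0)   # half the stake is paid
            + p_tie3 * (decimal_odds / 3.0 - 1.0)   # a third of the stake is paid
            - p_lose)

def ev_two_ball(p_win, p_tie, decimal_odds):
    """EV per unit staked on a 2-ball where ties push (stake returned)."""
    p_lose = 1.0 - p_win - p_tie
    # A push contributes zero profit, so p_tie only reduces the other terms.
    return p_win * (decimal_odds - 1.0) - p_lose
```

For example, `ev_two_ball(0.45, 0.10, 2.0)` is zero: with a 10% tie probability, a 45% win probability is exactly break-even at even money.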

(d) Anomaly detection

For every market, we compute a z-score of the book's implied probability against the model's probability across the field. High-|z| entries surface as Edge Finder picks; low-|z| entries are labeled "book-aligned."
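
One way to read this step, as a sketch: take the model-minus-book probability gap for each player and standardize it across the field. The gap definition, the threshold of 2.0, and the function names are assumptions for illustration.

```python
import statistics

def edge_z_scores(model_probs, book_probs):
    """Standardize each player's (model - book) probability gap across the field."""
    gaps = {name: model_probs[name] - book_probs[name] for name in model_probs}
    values = list(gaps.values())
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Every gap is identical, so nothing stands out.
        return {name: 0.0 for name in gaps}
    return {name: (gap - mean) / std for name, gap in gaps.items()}

def label(z, threshold=2.0):
    """Hypothetical cutoff separating Edge Finder picks from book-aligned lines."""
    return "edge" if abs(z) >= threshold else "book-aligned"
```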

(e) Vig removal before EV

Every edge percentage on this site is computed against a de-vigged fair implied probability, not the raw book line. For outright winner markets we sum implied probabilities across the full field (the overround is typically 1.20-1.30 on a PGA event, i.e. the book's probabilities sum to 120-130%, a hold of roughly 17-23%), then divide each player's implied probability by that overround. Without this correction every "+5% edge" we displayed would have been roughly break-even after vig. The current winner-market overround is published live in /golf-data.json under bookVigInfo.winner.overround.
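
A minimal sketch of the proportional de-vig described above, assuming American odds as input. The function names are illustrative.

```python
def american_to_implied(odds):
    """Convert American odds to the book's raw implied probability."""
    if odds > 0:
        return 100.0 / (odds + 100.0)
    return -odds / (-odds + 100.0)

def devig_field(implied):
    """Divide every implied probability by the field overround.

    implied: {player: raw implied probability}
    Returns (fair_probs, overround).
    """
    overround = sum(implied.values())
    fair = {name: p / overround for name, p in implied.items()}
    return fair, overround

def edge_pct(model_prob, fair_prob):
    """Edge in percentage points against the de-vigged line."""
    return 100.0 * (model_prob - fair_prob)
```

With an overround of 1.25, a player listed at a raw implied 10% has a fair probability of 8%, so a model probability of 10% is a 2-point edge rather than zero.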

(f) Learned course fit

Course fit is a sample-size-weighted blend of (1) a hand-curated trait prior — does the course reward power, accuracy, scrambling, putting? — and (2) a learned signal computed from each player's historical strokes-gained residual vs the field at the same course over the last 5 years. The blend uses Bayesian shrinkage with k=2: a player with n=2 prior appearances at this course gets 50% weight on the learned signal, n=5 gets 71%, players with no prior data fall back fully to the trait prior. The residuals are derived from BDL's player_round_stats endpoint with round_number=-1 for tournament totals. The per-player learned-vs-prior breakdown is exposed under player.learnedCourseFit in the JSON.
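
The shrinkage weights quoted above follow from w = n / (n + k). A short sketch, with illustrative function names:

```python
def shrinkage_weight(n_appearances, k=2):
    """Weight on the learned signal; the trait prior gets the remainder."""
    return n_appearances / (n_appearances + k)

def blended_course_fit(trait_prior, learned_signal, n_appearances, k=2):
    """Sample-size-weighted blend of trait prior and learned residual."""
    w = shrinkage_weight(n_appearances, k)
    return w * learned_signal + (1.0 - w) * trait_prior
```

With k=2 this gives 0.50 at n=2 and about 0.71 at n=5, and a player with n=0 gets the trait prior unchanged.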

(g) Isotonic calibration of confScore

The raw confScore is a 12-weight composite signal whose 0-100 range has no inherent probability interpretation. We fit pool-adjacent-violators isotonic regression against historical make-cut outcomes from the history/ snapshot archive, producing a monotonic lookup table that maps raw scores to calibrated make-cut probabilities. The calibration is persisted in model_params.json under calibration.table and applied at scrape time, so every player carries a confScoreCalibratedMakeCutProb field. In the most recent training run, the calibrated Brier score (0.196) beat both the no-information baseline (0.249) and the raw score/100 baseline (0.220) on the same data — measurable predictive skill.
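
For readers unfamiliar with pool-adjacent-violators, here is a compact pure-Python sketch together with the Brier score used to evaluate it. It is an illustration of the technique, not the code in scripts/calibrate.py, and the function names are assumptions.

```python
def pav_isotonic(scores, outcomes):
    """Fit a nondecreasing map from raw score to outcome frequency.

    Returns (sorted_scores, fitted_probs), aligned element by element.
    """
    pairs = sorted(zip(scores, outcomes))
    blocks = []  # each block is [sum_of_outcomes, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds this one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [x for x, _ in pairs], fitted

def brier(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

Each merged block becomes one step of the lookup table, which is why the calibrated output is a monotonic staircase rather than a smooth curve.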

(h) Monte Carlo cut-line prediction

The cut line is simulated, not pulled from history. Every player's 36-hole total is drawn from the same per-round Normal distribution used in the matchup model, and the cut score is the 65th-best total (PGA standard cut size) averaged over 4,000 sims. When Round 1 scores are in, we hold R1 fixed and simulate only R2 — this sharply tightens the prediction Friday morning. The output includes per-player make-cut probabilities, which power the "Cut Bubble" filter on the scorecard view.
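
A sketch of the cut simulation under simplifying assumptions: independent Normal rounds per player, integer rounding, and everyone tied at the cut number making the cut. The function signature and input layout are illustrative.

```python
import random

def simulate_cut(players, r1_scores=None, cut_rank=65, n_sims=4000, seed=0):
    """Simulate 36-hole totals and the resulting cut line.

    players:   {name: (mean, std)} per-round score relative to par
    r1_scores: optional {name: actual R1 score}; when given, only R2 is simulated
    Returns (average_cut, make_cut_probs, cut_lines).
    """
    rng = random.Random(seed)
    names = list(players)
    made = dict.fromkeys(names, 0)
    cut_lines = []
    for _ in range(n_sims):
        totals = {}
        for name in names:
            mean, std = players[name]
            r1 = r1_scores[name] if r1_scores else round(rng.gauss(mean, std))
            r2 = round(rng.gauss(mean, std))
            totals[name] = r1 + r2
        ranked = sorted(totals.values())
        # The cut is the total of the 65th-best player (or the last, in a small field).
        cut = ranked[min(cut_rank, len(ranked)) - 1]
        cut_lines.append(cut)
        for name in names:
            if totals[name] <= cut:
                made[name] += 1
    average_cut = sum(cut_lines) / n_sims
    probs = {name: made[name] / n_sims for name in names}
    return average_cut, probs, cut_lines
```

Passing `r1_scores` removes one of the two random rounds from every total, which is the mechanism behind the tighter Friday-morning prediction.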

(i) Field-strength normalization

Comparing strokes-gained residuals across events is unfair without adjusting for field strength: beating a major field by +0.5 SG is more impressive than the same +0.5 at an opposite-field event. We compute a field-strength multiplier per event from the OWGRs of the top quartile of entrants — strong fields (majors) score above 1.0, opposite-field events below. The multiplier is applied to each player's learned course-fit residual before averaging, so historical residuals are weighted by how hard they were earned. The current event's multiplier is exposed at /golf-data.json under fieldStrength.

(j) Per-player tournament position probabilities

A separate 4-round Monte Carlo (4,000 sims) produces per-player probabilities for win, top-5, top-10, top-20, and make-cut. These are real model probabilities — no derivation from win odds, no heuristic scaling. They feed the scorecard's edge calculations for every market where BDL also ships a real book line (winner, top-5, make-cut). Top-10 and Top-20 don't exist as native BDL futures markets, so we expose our model probability but don't display edge — the comparison line would be our own derivation. Per-player probabilities are attached to each player object as modelTop5Prob, modelTop10Prob, modelTop20Prob, modelMakeCutProb.

(k) Per-hole Monte Carlo (round-score / birdie / bogey / eagle props)

For each player we run a separate per-hole simulation (1,500 sims) that draws each hole's outcome from a 5-way categorical distribution — eagle, birdie, par, bogey, double-plus. The base distribution per hole comes from BDL tournament_course_stats historical scoring counts (when available), adjusted multiplicatively by each player's sgTotal. The skill multipliers are tuned so a +1 SG player picks up roughly 0.5 strokes per round (matching the empirical DataGolf relationship): 25% more birdies, 20% fewer bogeys, 40% more eagles, 35% fewer doubles.
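
A sketch of the per-hole draw. The source gives the multipliers at +1 SG but not how they scale with sgTotal, so this assumes each factor is raised to the power of sgTotal and that par absorbs the remaining probability. Both are assumptions, and this sketch makes no attempt to reproduce the site's strokes-per-round tuning, which depends on the base distribution.

```python
import random

# Multipliers at +1 SG, taken from the text above.
SKILL_FACTOR = {"eagle": 1.40, "birdie": 1.25, "bogey": 0.80, "double_plus": 0.65}
STROKES = {"eagle": -2, "birdie": -1, "par": 0, "bogey": 1, "double_plus": 2}

def adjust_hole_probs(base, sg_total):
    """Scale the non-par categories by skill; par takes what is left (assumption)."""
    adj = {cat: base[cat] * SKILL_FACTOR[cat] ** sg_total for cat in SKILL_FACTOR}
    adj["par"] = max(0.0, 1.0 - sum(adj.values()))
    total = sum(adj.values())
    return {cat: p / total for cat, p in adj.items()}

def simulate_rounds(hole_probs, n_sims=1500, seed=0):
    """Return a list of (round_score_vs_par, birdie_count), one per simulated round."""
    rng = random.Random(seed)
    cats = list(STROKES)
    results = []
    for _ in range(n_sims):
        score = 0
        birdies = 0
        for probs in hole_probs:
            outcome = rng.choices(cats, weights=[probs[c] for c in cats])[0]
            score += STROKES[outcome]
            birdies += outcome == "birdie"
        results.append((score, birdies))
    return results
```

Holes are drawn independently here; double-plus is scored as exactly +2 for simplicity.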

Outputs per player: round-score distribution (P(score ≤ k) for the relevant range), and per-round distributions for birdies / bogeys / eagles / double+ counts (P(count ≥ k)). These are stored in perHoleProps, and summary fields (modelExpectedRoundScore, modelExpectedBirdies, etc.) are attached to each player object.

When BDL's /odds/player_props ships over/under lines for these markets, we compute model fair probability from the distribution, de-vig against the over/under pair, and surface an edge percentage per side — pricing the previously "unmodeled" markets (birdies o/u, bogeys o/u, round-score o/u). Edges live at pricedPlayerProps in the JSON.
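
A sketch of pricing one over/under prop from a simulated count distribution. It assumes decimal odds, a half-point line (so there is no push), and proportional de-vigging of the two sides. The function names and output keys are illustrative.

```python
def devig_two_way(over_odds, under_odds):
    """Fair (over, under) probabilities from a two-sided decimal-odds market."""
    raw_over = 1.0 / over_odds
    raw_under = 1.0 / under_odds
    total = raw_over + raw_under
    return raw_over / total, raw_under / total

def prob_over(count_pmf, line):
    """P(count > line) from a {count: probability} distribution."""
    return sum(p for count, p in count_pmf.items() if count > line)

def price_prop(count_pmf, line, over_odds, under_odds):
    """Edge in percentage points for each side of the prop."""
    fair_over, fair_under = devig_two_way(over_odds, under_odds)
    p_over = prob_over(count_pmf, line)
    return {
        "overEdgePct": 100.0 * (p_over - fair_over),
        "underEdgePct": 100.0 * ((1.0 - p_over) - fair_under),
    }
```

Because both sides are de-vigged against the same pair, the two edges always sum to zero.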

(l) Cut-line confidence interval

The cut Monte Carlo already produces a distribution of cut scores across sims — we now surface the 5th and 95th percentiles as a 90% CI. A tight CI (±1 stroke) reads "high confidence"; a wide one (±4) tells you the cut is genuinely uncertain. Exposed at cutPrediction.predictedCutCI90.
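
The interval is just two percentiles of the simulated cut scores. A minimal sketch, assuming linear interpolation between order statistics (the production percentile method is not stated):

```python
def percentile(sorted_vals, q):
    """Linear-interpolated percentile of an ascending list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def cut_ci90(cut_lines):
    """Return the (5th, 95th) percentile of simulated cut scores."""
    vals = sorted(cut_lines)
    return percentile(vals, 5), percentile(vals, 95)
```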

(m) Bayesian skill update mid-tournament

After each round, the observed tournament SG is combined with the season-SG prior via a conjugate Gaussian update. The prior is Normal, centered on season SG with std 0.5; each round's observation is treated as Normal with noise std 1.5. With these values the posterior weight on the observed tournament average is n/(n+9) after n rounds: about 10% after 1 round and about 31% after 4, so even a full tournament that differs notably from the season baseline moves the estimate only partway. Exposed as sgTotalUpdated + sgTotalUpdatedStd, and every downstream predictor (matchup, cut, position, per-hole) consumes the updated value when present via effective_sg(player).
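
The update is the standard Normal-Normal conjugate formula. A sketch, treating 0.5 and 1.5 as standard deviations; the function names are illustrative.

```python
def update_skill(season_sg, round_sgs, prior_std=0.5, obs_std=1.5):
    """Posterior (mean, std) of skill after observing per-round SG values."""
    n = len(round_sgs)
    if n == 0:
        return season_sg, prior_std
    prior_precision = 1.0 / prior_std ** 2
    obs_precision = n / obs_std ** 2
    post_var = 1.0 / (prior_precision + obs_precision)
    obs_mean = sum(round_sgs) / n
    post_mean = post_var * (prior_precision * season_sg + obs_precision * obs_mean)
    return post_mean, post_var ** 0.5

def observation_weight(n, prior_std=0.5, obs_std=1.5):
    """Share of the posterior mean that comes from the observed average."""
    obs_precision = n / obs_std ** 2
    return obs_precision / (1.0 / prior_std ** 2 + obs_precision)
```

With these defaults the weight on the observed average simplifies to n / (n + 9), and the posterior std shrinks with every round played.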

(n) Matchup decomposition

Predicted matchup edge is decomposed into per-factor contributions in strokes: skill (SG), course fit, recent form, weather. Stored under player.contrib on each matchup output and visualized in the Compare drawer so users see exactly where the edge comes from rather than a black-box win probability.

(o) Gaussian copula for cross-hole correlation

The per-hole simulator now uses a full Gaussian copula rather than a single round-wide momentum factor. The structure:

Defaults: ρ_global = 0.01, ρ_local = 0.06, calibrated empirically so total round-score std lands ~3.5 strokes on a par-72 — within the PGA empirical range (2.9-3.5). Both parameters live in model_params.json and are tunable by the weekly backtest job.

What this unlocks: the per-hole simulator now emits a true joint distribution over per-hole scores, not just round totals. The pricing engine uses it to price (a) front-9 over/unders via frontNine.pLte, (b) back-9 over/unders via backNine.pLte, (c) single-hole over/unders like "Player X on Hole 5 over 3.5" via holes[N].pLte. None of these were priceable with the old shared-momentum approximation because the joint structure wasn't accessible — only round-total marginals were.

(s) Historical calibration backfill

The live confScore calibration trains on whatever history/ snapshots we've archived locally (~3 events at any given time). We backfilled the training set with 2024 + 2025 completed PGA events from BDL: ~50 events × ~150 players each. For each historical event we pull /tournament_results for outcomes and /player_season_stats for the contemporaneous season SG, building a proxy "confScore-equivalent" via 50 + 15·sgTotal (matches the empirical confScore distribution). The isotonic table is refit on the combined live + backfill corpus, extending cleanly into the 80-95 score range where the live-only data previously plateaued. Generated by scripts/backfill_calibration.py; consumed by scripts/calibrate.py.

(t) Per-player empirical round variance

Previously every player used a global baseStd = 2.85 in the matchup Monte Carlo. Real golfers have meaningfully different round-to-round volatility: Scheffler is consistent (~2.4), Wyndham Clark is wild (~3.5). We now pull 1-2 seasons of /player_round_results from BDL, compute the empirical std of par-relative scores per player, and attach it as scoreStd. Predictors consume this when available, falling back to the global base only for players with fewer than 8 historical rounds. This refines matchup edge, cut-line tightness, and Sharpe-adjusted edge rankings.
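
The fallback rule can be sketched as follows, assuming the sample standard deviation is used (the source does not say whether it is sample or population). The constant and function names are illustrative.

```python
import statistics

BASE_STD = 2.85     # global fallback from the text above
MIN_ROUNDS = 8      # below this, the empirical estimate is not trusted

def score_std(par_relative_scores):
    """Empirical round-score std for one player, or the global base if data is thin."""
    if len(par_relative_scores) < MIN_ROUNDS:
        return BASE_STD
    return statistics.stdev(par_relative_scores)
```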

(u) Per-course SG category weights (learned)

Different courses reward different SG categories: Augusta loads heavily onto SG: Approach, Pebble onto Putting, Whistling Straits onto Driving. The hand-curated trait dictionaries capture this as a prior, but we can do better: regress historical finish position onto the four SG categories at each course (5 years × 5 events per course, ridge-regularized least squares with λ=0.5) to get per-course coefficients {ott, app, arg, putt}.

The coefficients are negative when better SG predicts a better finish (which is what we expect); the magnitudes show which SG categories matter most at the venue. They're used to refine the effective-skill computation: a player strong in SG: Approach gets an upward skill bump at Augusta beyond what their overall sgTotal would suggest. Applied via effective_sg(player, course_weights); capped at ±0.5 strokes/round to prevent runaway overlay from noisy small-sample fits. Output surfaced at courseSgWeights[course_key] with r2 for transparency.
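
A dependency-free sketch of the ridge fit and the ±0.5 cap. It solves (XᵀX + λI)w = Xᵀy directly and assumes the inputs are already centered, so no intercept is fitted. How the coefficients are turned into a skill overlay is not specified above, so only the cap itself is shown; the function names are illustrative.

```python
def ridge_fit(X, y, lam=0.5):
    """Ridge coefficients for rows X (one per player-event) and targets y."""
    p = len(X[0])
    # Build the normal equations with the ridge penalty on the diagonal.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(p)]

def cap_overlay(strokes, cap=0.5):
    """Clamp a course-specific skill adjustment to +/- cap strokes per round."""
    return max(-cap, min(cap, strokes))
```

With finish position as the target, a negative coefficient means more strokes gained in that category predicts a better (lower) finish.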

(p) Closing-line value proxy

For every player with meaningful model edge (|edge| ≥ 2), we look back 24 hours in odds_history.json (already snapshotted at every cron run) and check whether the implied probability moved in the model's predicted direction. The aggregate "% CLV-positive picks" is the sharp's metric for predictive skill. It is a stronger signal than win/loss because lines are repriced incrementally by sharp money, whereas individual outcomes are dominated by variance. Sharp models target >55%. Exposed at clvSummary and per-player clvLineMoveBp + clvPositive. This needs zero new API calls, since it relies entirely on existing snapshots.
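
A sketch of the per-pick check and the aggregate. It assumes the edge threshold is in percentage points and that edge is measured against the implied probability from 24 hours earlier; both are assumptions, as are the function names and the `edgePct` key.

```python
def clv_for_pick(model_prob, implied_then, implied_now):
    """Did the line move toward the model over the lookback window?"""
    edge_pct = 100.0 * (model_prob - implied_then)
    move_bp = 10_000.0 * (implied_now - implied_then)
    return {
        "edgePct": edge_pct,
        "clvLineMoveBp": move_bp,
        # Positive when the edge and the line move share the same sign.
        "clvPositive": edge_pct * move_bp > 0,
    }

def clv_summary(picks, min_edge_pct=2.0):
    """Share of qualifying picks whose line moved in the model's direction.

    picks: iterable of (model_prob, implied_then, implied_now) tuples.
    Returns None when no pick clears the edge threshold.
    """
    rows = [clv_for_pick(*pick) for pick in picks]
    rows = [row for row in rows if abs(row["edgePct"]) >= min_edge_pct]
    if not rows:
        return None
    return sum(row["clvPositive"] for row in rows) / len(rows)
```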

(q) Player similarity (kNN on SG profiles)

For each player, the top-5 most similar peers are computed by Euclidean distance on the z-score-normalized vector of (sgOtt, sgApp, sgArg, sgPutt, drivingDistance, scoringAvg). Useful for rookies/first-timers who lack course history, since their similar peers may have it. Attached to each player as similarPlayers: [{name, distance}, ...].
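
A sketch of the nearest-neighbor lookup, assuming features are z-scored across the current field using the population standard deviation. The function name and input layout are illustrative.

```python
import math
import statistics

def top_similar(profiles, target, k=5):
    """Return the k nearest peers to `target` by Euclidean distance on z-scores.

    profiles: {name: [sgOtt, sgApp, sgArg, sgPutt, drivingDistance, scoringAvg]}
    """
    n_features = len(next(iter(profiles.values())))
    columns = [[vec[i] for vec in profiles.values()] for i in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    # Guard against zero spread so a constant feature does not divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    z = {name: [(vec[i] - means[i]) / stds[i] for i in range(n_features)]
         for name, vec in profiles.items()}
    dists = sorted((math.dist(z[target], z[name]), name)
                   for name in profiles if name != target)
    return [{"name": name, "distance": dist} for dist, name in dists[:k]]
```

Normalizing first matters because drivingDistance is in yards while the SG features are in strokes; without it, distance would dominate the metric.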

(r) Sharpe-style edge sorting

The scorecard view's leaderboard adds an "Edge ÷ σ" sort that divides each player's de-vigged edge by the outcome std √(p·(1-p)), boosted by posterior skill uncertainty. Surfaces confident edges over noisy ones — a small edge on a tight prediction can be better than a larger edge on a coin-flip. Pure frontend computation against the model probabilities already in the JSON.
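
The sort key can be sketched as below. The text does not say how posterior skill uncertainty enters, so this assumes it inflates the outcome std by a factor of (1 + skill_std); that form is an assumption. The production version runs in the frontend; Python is used here for consistency with the other sketches.

```python
import math

def sharpe_edge(edge, p, skill_std=0.0):
    """Edge divided by outcome std, with the std inflated by skill uncertainty."""
    if not 0.0 < p < 1.0:
        return 0.0  # a certain outcome has no spread to divide by
    sigma = math.sqrt(p * (1.0 - p)) * (1.0 + skill_std)
    return edge / sigma
```

The same 5-point edge scores higher at p = 0.05 than at p = 0.5, because the outcome std is largest for a coin flip.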

Tunable parameters

These are the actual values driving the current model, fetched live from /model_params.json. They are overwritten weekly by the backtest job when calibration drifts.

Performance (last backtest)

Live backtest output from /backtest-report.json. Runs every Monday at 2AM ET against all completed tournament weeks in history. Brier score below 0.25 means the model beats the 50/50 coin-flip benchmark on matchup markets.

What we don't do (honest accounting)

Update cadence

Scraper (weekly-scrape.yml)

Backtest (weekly-backtest.yml)