The Python code below runs an anonymized implementation of the methodology, described here, that was used in "The Tennis Racket." The methodology contains many important details; please read it before continuing.
The code below excludes opening odds whose implied probabilities were more than 10 percentage points higher or lower than the median of all bookmakers' opening odds for the match. (Otherwise, the return of these odds toward the consensus could be mistaken for a sign of suspicious betting.) The code also excludes matches listed on OddsPortal as "canceled" (typically the result of a pre-match withdrawal) or as a "walkover."
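The two exclusions can be sketched as follows. This is a minimal illustration, not the notebook's code: the record layout (`match_id`, `book`, `open_odds`, `status`), the use of decimal odds quoted for the same player across books, and the conversion of implied probability as `1 / decimal_odds` are all assumptions. The median is taken over implied probabilities rather than raw odds, which can differ slightly when a match has an even number of bookmakers.

```python
from statistics import median

# Assumption: each row is one (match, bookmaker) pair with decimal opening odds
# quoted for the same player; implied probability is taken as 1 / odds.
def implied_prob(decimal_odds: float) -> float:
    return 1.0 / decimal_odds


def drop_outlier_openings(rows, threshold=0.10):
    """Keep rows whose opening implied probability is within `threshold`
    (0.10 = 10 percentage points) of the match's median opening probability."""
    by_match = {}
    for row in rows:
        by_match.setdefault(row["match_id"], []).append(implied_prob(row["open_odds"]))
    medians = {match_id: median(probs) for match_id, probs in by_match.items()}
    return [
        row
        for row in rows
        if abs(implied_prob(row["open_odds"]) - medians[row["match_id"]]) <= threshold
    ]


def drop_unplayed(matches):
    """Drop matches whose (hypothetical) status field marks them as not played."""
    return [m for m in matches if m.get("status") not in {"canceled", "walkover"}]
```

For example, if two books open a player at 2.0 (50%) and a third at 1.25 (80%), the median is 50% and the third book's opening odds are dropped.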
The code below finds the odds movement for a bookmaker in a given match by calculating, for each player, the difference between the chance of winning implied by the opening odds and the chance implied by the final odds.
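A minimal sketch of the movement calculation for one bookmaker and one match follows. It assumes decimal odds and, as an illustrative choice the source does not specify, rescales the two players' raw implied probabilities so they sum to 1 (removing the bookmaker's margin). The function names are hypothetical.

```python
def normalized_probs(odds_a: float, odds_b: float):
    """Implied win chances for players A and B from decimal odds,
    rescaled to sum to 1 (assumed margin removal)."""
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


def movement(open_a, open_b, final_a, final_b):
    """Change in each player's implied chance of winning, final minus opening,
    in probability units (0.10 = 10 percentage points)."""
    open_pa, open_pb = normalized_probs(open_a, open_b)
    final_pa, final_pb = normalized_probs(final_a, final_b)
    return final_pa - open_pa, final_pb - open_pb
```

With opening odds of 2.0 and 2.0 and final odds of 1.5 and 3.0, player A's implied chance rises from 50% to about 66.7%, a movement of roughly 16.7 percentage points, and player B's falls by the same amount.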
The code below selects only matches where, in at least one book, the odds moved more than 10 percentage points. The 10-percentage-point cutoff is based on discussions with sports-betting investigators, who said that movement above this threshold was what prompted them to give greater scrutiny to a match.
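The selection rule can be sketched as a filter over matches. This assumes each match carries a hypothetical `movements` mapping from bookmaker to that book's movement in probability units, and treats a move in either direction as qualifying, which is an assumption rather than something the text states.

```python
def select_high_movement(matches, threshold=0.10):
    """Keep matches where at least one bookmaker's odds moved by more than
    `threshold` (0.10 = 10 percentage points), in either direction (assumed)."""
    return [
        m
        for m in matches
        if any(abs(move) > threshold for move in m["movements"].values())
    ]
```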
Players who lost more than 10 such “high-movement” matches are selected for analysis.
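Counting losses per player is then a tally over the high-movement matches. In this sketch, the `loser` field is a hypothetical name for the losing player's (anonymized) identifier; "more than 10" is read as strictly greater than 10.

```python
from collections import Counter


def players_for_analysis(high_movement_matches, min_losses=10):
    """Return players who lost more than `min_losses` high-movement matches."""
    losses = Counter(m["loser"] for m in high_movement_matches)
    return {player for player, n in losses.items() if n > min_losses}
```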
The code below runs a series of simulations to estimate the unlikelihood of each player's outcomes. Each simulation uses the player's implied chance of winning, based on each match's opening odds, to generate a set of outcomes for the player's string of matches. BuzzFeed News ran the simulation 1 million times per player. The result is the estimated chance that the player would have lost as many high-movement matches as the player did (or more), if the chances implied by the opening odds were correct.
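The simulation step can be sketched with NumPy as below. This is an illustration under stated assumptions, not the notebook's code: `win_probs` is a hypothetical list of the player's opening-odds win chances, one per high-movement match, and the simulations are run in batches only to keep memory use modest at 1 million runs.

```python
import numpy as np


def simulate_loss_likelihood(win_probs, observed_losses, n_sims=1_000_000,
                             seed=0, batch=100_000):
    """Estimate P(losses >= observed_losses) if each match is won independently
    with the probability implied by its opening odds."""
    rng = np.random.default_rng(seed)
    p = np.asarray(win_probs, dtype=float)
    hits = 0
    done = 0
    while done < n_sims:
        size = min(batch, n_sims - done)
        # Each row is one simulated string of matches; a draw >= p is a loss.
        losses = (rng.random((size, p.size)) >= p).sum(axis=1)
        hits += int((losses >= observed_losses).sum())
        done += size
    return hits / n_sims
```

For instance, a player given a 50% chance in each of 10 matches who lost all 10 would get an estimate close to 1/1024, or about 0.001.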
Note on reading the
In some simulation runs, one additional player received an estimated likelihood just barely under 0.05. To be conservative, we do not include that player in our totals.
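A borderline estimate can land on either side of 0.05 from one run to the next because each estimate is itself a simulated proportion. As a rough sketch, using the standard binomial standard error of a proportion (a general statistical formula, not something taken from the notebook), the run-to-run noise at 1 million simulations is small but not zero:

```python
import math


def monte_carlo_se(p_hat: float, n_sims: int) -> float:
    """Standard error of a proportion estimated from n_sims independent simulations."""
    return math.sqrt(p_hat * (1 - p_hat) / n_sims)


se = monte_carlo_se(0.05, 1_000_000)  # roughly 0.0002
```

An estimate of 0.0499, for example, is within about one standard error of 0.05, so a different run could plausibly place that player just above the cutoff.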
The strings below represent the anonymized names of the 28 players flagged in a 2008 report by investigators for the Association of Tennis Professionals. Each anonymized name is the SHA256 hash of the name plus a randomly generated salt.
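The hashing scheme can be sketched with Python's standard library. The concatenation order (name first, then salt), the UTF-8 encoding, and the salt length are assumptions for illustration; the actual salt is not reproduced here, and `anonymize` is a hypothetical helper name.

```python
import hashlib
import secrets


def anonymize(name: str, salt: str) -> str:
    """SHA256 hex digest of the name followed by the salt (order assumed)."""
    return hashlib.sha256((name + salt).encode("utf-8")).hexdigest()


# A randomly generated salt; without it, the hashes cannot be reversed
# by simply hashing a list of known player names.
salt = secrets.token_hex(16)
token = anonymize("Example Player", salt)
```

The same name and salt always produce the same 64-character string, so matches can still be grouped by player without revealing who the player is.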