Understanding Baseball Through Computer Simulation: Evaluating Batting, Steals, and Hot Hand

Accurate baseball analysis is difficult because many variables are at play and several players affect the game at the same time. For example, we cannot truly quantify a batter’s individual contribution to an offense, because scoring runs is a team effort. Analysis is also limited by sample size, which makes judgments about decisions such as stealing third base very imprecise. I can get around these barriers with a computer simulation, in which I can change one variable while holding the rest constant and quickly run millions of trials. The simulation can isolate and evaluate a batter by simulating an offense made up entirely of that player and calculating the runs this team scores per 9 innings. For example, you can simulate an offense of 9 Mike Trouts and determine how many runs it would score per 9 innings, or simulate an offense of hypothetical players who hit a home run 15% of the time and make an out the rest of the time. When tested with real MLB teams, the simulation accurately reproduces their runs per 9 innings, and it yields 4.5434 runs per 9 innings for the 2016 MLB average, just 1% above the true average. This simulation can help improve our understanding of a player’s value as a hitter, the relative value of walks and different types of hits, the hot-hand fallacy, and base-stealing decisions.

To analyze batting, the program simulates only the part of the game in which the team is on offense and assumes that all baserunners behave like the 2016 MLB average baserunner. The program works by randomizing each plate appearance, with the probability of each result based directly on the statistics entered. For example, if the user enters five doubles in 100 plate appearances, the simulated hitter has a 5% chance of hitting a double in each plate appearance. It also includes the possibility of errors, double plays, and sacrifice flies, each based on the league average. The program tracks the number of outs, the runs scored, the inning, and the status of each base. So that the evaluation is accurate and does not vary by chance, it simulates 9 million innings and then returns the number of runs scored per 9 innings.
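The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the author's program: the names `outcome_weights`, `simulate_inning`, and `runs_per_nine` are hypothetical, runners advance exactly as many bases as the hit is worth, and errors, double plays, sacrifice flies, and league-average baserunning are all left out, so it will not reproduce the figures quoted in this article.

```python
import random

OUTCOMES = ("out", "walk", "single", "double", "triple", "home_run")
BASES_FOR_HIT = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def outcome_weights(stats: dict, plate_appearances: int) -> list:
    """Turn counting stats into one weight per outcome; outs are the remainder."""
    reached = sum(stats.get(name, 0) for name in OUTCOMES[1:])
    if reached >= plate_appearances:
        raise ValueError("the hitter must make at least one out, or an inning never ends")
    return [plate_appearances - reached] + [stats.get(name, 0) for name in OUTCOMES[1:]]


def simulate_inning(weights: list, rng: random.Random) -> int:
    """Play one half-inning with simplified baserunning and return the runs scored."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        result = rng.choices(OUTCOMES, weights=weights)[0]
        if result == "out":
            outs += 1
        elif result == "walk":
            # Only forced runners move up on a walk.
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = BASES_FOR_HIT[result]
            new_bases = [False, False, False]
            for i, occupied in enumerate(bases):
                if occupied:
                    if i + n >= 3:
                        runs += 1  # runner reaches home
                    else:
                        new_bases[i + n] = True
            if n == 4:
                runs += 1  # the batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
    return runs


def runs_per_nine(stats: dict, plate_appearances: int,
                  innings: int = 100_000, seed: int = 0) -> float:
    """Average runs per 9 innings for a lineup made up entirely of one hitter."""
    rng = random.Random(seed)
    weights = outcome_weights(stats, plate_appearances)
    total = sum(simulate_inning(weights, rng) for _ in range(innings))
    return 9 * total / innings
```

One sanity check for a sketch like this is the home-run-only hitter: with a 15% home run rate and outs otherwise, the expected runs per inning are 3 * 0.15 / 0.85, or about 4.76 per 9 innings, and `runs_per_nine({"home_run": 15}, 100)` should land close to that value.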

This simulation creates a unique statistic for a player’s worth as a hitter, called simulated runs per game (SRPG). This is the number of runs scored per 9 innings in the simulation by a team made up entirely of that player. It is an effective measure of a hitter because it isolates him from other factors and converts every part of hitting into runs scored, which is what produces wins. It can be used to compare and rank MLB hitters, or hypothetical players, such as a player who walks half the time and makes an out the other half versus a player who hits a home run 15% of the time and otherwise makes an out (the on-base player wins by 3.8 runs). The most common hitting statistics don’t accurately weigh the relative value of walks and different types of hits. Batting average and on-base percentage don’t value a home run any higher than a single, slugging percentage values a home run four times as much as a single, and on-base plus slugging (OPS) simply adds the two statistics together. Through this simulation, we can determine the value of a walk and of each hit type in terms of added runs. I did this by taking the 2016 league average, adding or subtracting an outcome such as walks, and finding the added runs per game per added walk, and then doing the same for each type of hit. Then I divided to find the relative value of each result, setting walks equal to one. A single is worth 1.226 times as much as a walk, a double 1.713 times, a triple 2.211 times, and a home run 2.977 times. Using these values and dividing by plate appearances gives a new stat I’ll call Batting Value, defined as: (Walks + 1.226 * Singles + 1.713 * Doubles + 2.211 * Triples + 2.977 * Home Runs) / Plate Appearances. This is similar to the advanced statistic Weighted On-Base Average (wOBA), created by sabermetrician Tom Tango. wOBA similarly weights results by run value, but it is not based on computer simulation.
The relative value of each hit type compared with a walk in wOBA is 1.29 for singles, 1.84 for doubles, 2.348 for triples, and 3.043 for home runs. These values are very similar to the Batting Value weights, although the wOBA values are slightly higher. While there are slight differences, both wOBA and Batting Value are much more accurate and complete measures of a batter’s worth than the commonly known statistics. Both Batting Value and simulated runs per game can be used to rank the effectiveness of hitters. The 2016 MLB average is 4.5434 for SRPG and 0.4524 for Batting Value. According to SRPG, the best hitter of the 2016 season was Mike Trout, with an SRPG of 9.712871. In terms of Batting Value, David Ortiz was the best hitter at .607, just above Trout’s .603. This difference in ranking makes sense because Ortiz’s power is a major advantage in the context of the MLB average, on which Batting Value is based, whereas Trout’s ability to get on base is a major advantage in the context of the simulated super team of 9 Mike Trouts.
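As a quick illustration, the Batting Value formula can be written as a small Python function. The weights are the ones reported in this article; the function name `batting_value` and its argument names are assumptions made for this sketch.

```python
# Run value of each outcome relative to a walk, as reported in the article.
WEIGHTS = {
    "walks": 1.0,
    "singles": 1.226,
    "doubles": 1.713,
    "triples": 2.211,
    "home_runs": 2.977,
}


def batting_value(walks: int, singles: int, doubles: int, triples: int,
                  home_runs: int, plate_appearances: int) -> float:
    """Weighted sum of walks and hits, divided by plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    weighted = (
        WEIGHTS["walks"] * walks
        + WEIGHTS["singles"] * singles
        + WEIGHTS["doubles"] * doubles
        + WEIGHTS["triples"] * triples
        + WEIGHTS["home_runs"] * home_runs
    )
    return weighted / plate_appearances
```

For the hypothetical hitter who homers in 15 of every 100 plate appearances and is out otherwise, `batting_value(0, 0, 0, 0, 15, 100)` works out to 15 * 2.977 / 100, or about 0.447.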

This simulation also provides insight into the hot hand idea. The program does not account for a hot or cold hand, so, for example, it does not make a pitcher rattled after allowing hits and less effective for the remainder of the inning. The entire program is random, with no notion of momentum, yet the simulation can still accurately reproduce runs per game for MLB teams. This suggests that there is no true hot hand for offenses or pitchers, because if there were, hits would cluster in certain innings more than they do in the simulation, resulting in more runs per game than the simulation predicts. This supports the idea that the hot hand is a fallacy: streaks are simply misread as evidence of a hot hand when they are a plausible random outcome, such as flipping heads three times in a row.
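The coin-toss comparison can be made concrete with a short Python sketch. It is not part of the author's program; the helper names `longest_run` and `prob_streak` are hypothetical. It enumerates every possible sequence of fair coin flips and counts how many contain a streak of heads of a given length, which shows how often streaks appear with no hot hand at all.

```python
from itertools import product


def longest_run(flips, target="H") -> int:
    """Length of the longest unbroken run of `target` in a sequence of flips."""
    best = current = 0
    for flip in flips:
        current = current + 1 if flip == target else 0
        best = max(best, current)
    return best


def prob_streak(n_flips: int, streak_len: int) -> float:
    """Exact chance that n fair flips contain at least streak_len heads in a row."""
    sequences = product("HT", repeat=n_flips)
    hits = sum(1 for seq in sequences if longest_run(seq) >= streak_len)
    return hits / 2 ** n_flips
```

For example, `prob_streak(10, 3)` is 520/1024, so a little over half of all purely random 10-flip sequences contain three heads in a row.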

When the simulation is used to evaluate batting, all baserunning is simulated at the 2016 MLB average; for example, runners score from second base only about 60% of the time. However, an additional feature of the program lets the user enter success rates for stealing second and third, and the program then adds stolen-base attempts accordingly. This can be used to determine what steal success rate adds to runs per game and therefore helps the team. The answer is found by testing different success rates until the equilibrium (break-even) rate is found. For stealing second with all other bases empty, the overall equilibrium rate is 76.5%. Broken down by the number of outs, the equilibrium rates are 79.5% with 0 outs, 74.4% with 1 out, and 69.5% with 2 outs. For stealing third with all other bases empty, the overall equilibrium rate is 77%: 76% with 0 outs, 74% with 1 out, and 84% with 2 outs. This confirms the conventional wisdom that the best time to steal third is with 1 out and the worst time is with 2 outs. This can help determine whether a steal attempt is a good idea by estimating whether the runner’s success rate is greater than the situational equilibrium rate. However, other factors still need to be considered, such as the type of hitters behind the runner, the score, and the inning.
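The search for an equilibrium rate can be sketched as a bisection in Python. This is one possible way to automate "testing different success rates", not necessarily how the author did it. The function `find_break_even` is hypothetical, and `toy_run_gain` is a made-up stand-in (its weights are not baseball data) for "runs per game with steals at this success rate, minus runs per game without steals", which the real simulation would supply.

```python
from typing import Callable


def find_break_even(run_gain: Callable[[float], float],
                    lo: float = 0.0, hi: float = 1.0,
                    tol: float = 1e-4) -> float:
    """Bisect for the success rate at which run_gain crosses zero.

    Assumes run_gain increases with the success rate, is not positive at lo,
    and is not negative at hi.
    """
    if run_gain(lo) > 0 or run_gain(hi) < 0:
        raise ValueError("the break-even rate is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if run_gain(mid) < 0:
            lo = mid  # stealing still costs runs: the break-even rate is higher
        else:
            hi = mid  # stealing already adds runs: the break-even rate is lower
    return (lo + hi) / 2


def toy_run_gain(rate: float) -> float:
    """Made-up linear model: a steal gains 0.2 runs, a failed attempt costs 0.45."""
    return 0.2 * rate - 0.45 * (1 - rate)
```

With the toy model, `find_break_even(toy_run_gain)` converges to 0.45 / 0.65, roughly 0.692. With a real simulation behind `run_gain`, each evaluation is noisy, so enough innings have to be simulated per rate for the sign of the gain to be trustworthy.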

In conclusion, this program can carry out experiments that are impossible in real life, giving us a new way of analyzing baseball. With it, we can isolate one game variable from all the others in real baseball and test its effect on runs scored. This includes eliminating all other players to build a team made up entirely of one hitter, adjusting the probability of a certain outcome such as a home run, and adding stolen-base attempts in a specific situation. The simulation accurately determines runs per 9 innings for MLB teams and for the 2016 MLB average, which provides evidence against the hot hand.
