Monday, April 04, 2005
Trent at Lookout Landing has used Ichiro's career batting average and at bats per game to determine that Ichiro's career would have to span over 110 full seasons before a hit streak of 56 consecutive games would be more likely than not to have occurred.
Using a similar approach, I've calculated the probability of tying or breaking DiMaggio's record in a single season for every player who has appeared in 56 or more games in a season since 1941. (My methods are explained in the comments for this post; the short version is that number of games played, batting average, and at bats per game are the key factors.) Here are the ten seasons in which a hitting streak of 56 games or more was most likely to have occurred:
1) Ichiro, 2004, 4.05% probability
2) Rod Carew, 1977, 1.84%
3) Ichiro, 2001, 1.12%
4) Darin Erstad, 2000, 1.03%
5) Wade Boggs, 1985, 0.79%
6) Stan Musial, 1948, 0.77%
7) George Brett, 1980, 0.67%
8) Tony Gwynn, 1994, 0.66%
9) Tony Gwynn, 1997, 0.61%
10) Ralph Garr, 1974, 0.58%
54) Joe DiMaggio, 1941, 0.13%
I think it speaks volumes about the difficulty of hitting safely in 56 consecutive games that, in the season in which it actually happened, it had only a 0.13% probability of taking place. It also says a lot about Tony Gwynn's 1994 season that he still placed in the top ten despite losing games to the 1994 players' strike.
It makes intuitive sense that the 2004 version of Ichiro should be at the top of the list. He established a new record for hits in a season, played in 161 games, and was among the league leaders in at bats per game. Taking every player's probability into account, DiMaggio's record has had a 27.21% probability of being equalled or bettered since it was established in 1941. It seems that there's actually a reasonable chance that someone breaks the unbreakable record sometime in my lifetime. Why not in 2005?
Three statistics are used to generate these probabilities: games G, at bats AB, and batting average AVG. From these we can determine AB/G to generate the probability p that a player gets a hit in any individual game:
p = 1 - (1-AVG)^(AB/G)
The part after the "1 -" is the chance of failing to get a hit in that game, treating every at bat as an independent try with a hit probability of AVG. Subtracting the chance of failure from one leaves the chance of success.
The probability P that a player begins a 56-game (or better) hitting streak in any given game is
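As a quick illustration of that formula, here is a minimal Python sketch. The function name hit_probability and the sample numbers (a .300 hitter averaging exactly 4 at bats per game) are my own and are not taken from the calculations behind the list above.

```python
def hit_probability(avg, ab_per_game):
    """Chance of at least one hit in a game: p = 1 - (1 - AVG)^(AB/G).

    Assumes every at bat is an independent try that succeeds with
    probability avg (the simplification used in the formula above).
    """
    chance_of_no_hit = (1.0 - avg) ** ab_per_game
    return 1.0 - chance_of_no_hit


# Hypothetical hitter: .300 average, 4 at bats per game.
# 1 - 0.7^4 = 1 - 0.2401 = 0.7599
p = hit_probability(0.300, 4.0)
print(f"{p:.4f}")
```

Even a .300 hitter goes hitless in roughly one game out of four, which is why long streaks are so rare.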
P = p^56
and the probability q of failing to start a 56-game hit streak (what high standards!) is
q = 1 - P
Now, to simplify matters greatly, I decided that a hitting streak doesn't count if it spans more than a single season. So if you haven't started a 56-game hitting streak by the 56th-to-last game in which you played in a season, you'll just have to wait until the next Opening Day to try again. So, in a given season a player has n chances, with n given by
n = G - 55
to start a 56-game hit streak. Treating those n chances as independent of one another, the chance F of going the whole season without a 56-game hit streak is found using
F = q^n
Subtracting F from 1 gives the probability of a 56-game hit streak occurring, based on a player's batting statistics that season.
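The steps above can be strung together in a few lines of Python. This is only a sketch of the formulas as written here, not the actual spreadsheet or program used to build the list. The function names are made up for illustration, and the sample inputs are Ichiro's 2004 line as I have it (161 G, 704 AB, .372 AVG), so treat them as assumed values.

```python
def hit_probability(avg, ab_per_game):
    """p = 1 - (1 - AVG)^(AB/G): chance of at least one hit in a game."""
    return 1.0 - (1.0 - avg) ** ab_per_game


def streak_probability(games, at_bats, avg, streak_length=56):
    """Chance of a hit streak of streak_length games or more in one season.

    Follows the steps above: P = p^56, q = 1 - P, n = G - 55, F = q^n,
    and the answer is 1 - F. The n starting points are treated as
    independent, which is a simplification.
    """
    if games < streak_length:
        return 0.0  # not enough games to fit a streak that long
    p = hit_probability(avg, at_bats / games)
    P = p ** streak_length          # streak starts in a given game
    q = 1.0 - P                     # streak fails to start in a given game
    n = games - streak_length + 1   # n = G - 55 for a 56-game streak
    F = q ** n                      # no streak starts all season
    return 1.0 - F


# Assumed inputs: Ichiro, 2004 (161 G, 704 AB, .372 AVG).
ichiro_2004 = streak_probability(games=161, at_bats=704, avg=0.372)
print(f"{ichiro_2004:.2%}")  # roughly 4%
```

With those inputs the result comes out at roughly 4%, in line with the figure at the top of the list.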