56: Joe DiMaggio and the Last Magic Number in Sports
Authors: Kostya Kennedy
The streak has been studied by mathematicians, economists, professors of finance, Ivy League sociologists and by a man who would later become the deputy director of the National Economic Council under George W. Bush. The Nobel Prize–winning physicist Edward M. Purcell took a long and detailed look at the streak, as did an evolutionary biologist so accomplished that the Society for the Study of Evolution has an award named after him—the Stephen Jay Gould Prize.
In other words, some wicked smart people have gotten into this stuff.
The streak examiners have devised algorithms, run computer simulations, and enlisted teams of undergraduate students to perform dice-rolling experiments. Some folks used a player’s game-by-game results as a basis for inquiry, others relied on a player’s cumulative statistics over the course of a season (DiMaggio in 1941 etc.), or over several seasons or a career. For some, DiMaggio’s batting average served as a point of reference; others factored in the walks that he drew. One man attempted to predict the probability of a hitter who is on a 30-game streak maintaining the streak for 26 more games by drawing upon the past performances of all players during long streaks. A paper on the general streakiness (or lack thereof) among ballplayers sought to weigh game conditions such as runners on base and the handedness of pitchers.
The results? Well, a 1994 study deduced that a streak like DiMaggio’s will occur once every 746 years. A rebuttal to that one put it at once every 18,519 years. Other estimates have ranged from a probability of about .001 (or 1 in 1,000) that DiMaggio would have hit in 56 straight in 1941 to .000054 (a little more than 5 in 100,000). Another assessment said that while the chances of DiMaggio himself having hit in 56 straight were only 1 in 3,394, the likelihood that some major leaguer at some time, somewhere, would have done it along the way is a robust 1 in 16. A few years ago a student and a professor working together at Cornell suggested that there was a 42% chance of such a streak having occurred at some point in baseball history, an overly generous estimate that upon further review the professor himself has felt inclined to question.
The data and statistics can be parsed and deployed in any number of ways to compute the likelihood of a hitting streak. All things considered there is only slightly more consensus on the probability of Joe DiMaggio, or of any major leaguer, having achieved a 56-game hitting streak than there is on the probability of life existing on other planets. Which factors are truly relevant and which are not? Given all the physical and psychological factors that come into play on a baseball field, who really knows? To conundrums such as the Fermi Paradox and the Drake Equation, let us add The DiMaggio Enigma.
THE SAME BREED of scholars who’ve pored over DiMaggio’s streak have also explored related concepts such as whether an athlete can get “hot” and whether or not there is such a thing as a ballplayer who is predictably good in “clutch” situations. The analysts are looking at areas in which they can measure “the influence of chance or random events on the outcomes observed,” as the Harvard sociologist Stanley Lieberson wrote in his 1997 paper “Modeling Social Processes: Some Lessons from Sports.” As a whole, sports provides an attractive arena for such study because the results tend to be easily measurable and quantifiable (baskets sunk or not sunk, home runs hit or not hit, etc.) and because the games abide by governing rules that provide a degree of order and some limits upon what can and cannot occur. (A basketball player can’t play more than 48 minutes in a regulation game; a pitcher must stand 60 feet, 6 inches away from home plate and so on.) Compared to a lot of what makes up everyday life, ball games are a controlled experiment.
Study of the “hot hand” in sports grew out of work done in the 1970s by the psychologists Daniel Kahneman (also a Nobel laureate) and Amos Tversky that examined the way that humans perceive and judge probability. Then in 1985 Tversky collaborated with another psychologist, Thomas Gilovich, on a paper that charted shooting patterns in actual NBA games. The study showed that a player having what we might call a “hot hand” had no bearing upon his chances of making a basket. That is, a shooter who has sunk, say, six of seven shots in a game is no more or less likely to sink his eighth attempt than if he had been successful on only one of seven attempts to that point. Just as flipping a coin and getting heads six of seven times does not make the coin any more or less likely to land on heads on the eighth flip than if it has landed on heads just once in seven previous flips. The odds, assuming the coin is perfectly balanced, are 50-50 either way. Similarly, a player’s likelihood of sinking that eighth basket (either from the field or from the free throw line; both situations were tested) corresponds to his or her overall shooting percentage.
Subsequent studies tended to echo Tversky and Gilovich’s conclusions and similar work was done in baseball. A 1993 paper by Indiana University professor S. Christian Albright charted batters’ results over four major league seasons and “failed to find convincing evidence in support of wide-scale streakiness.” Sure, a .300 hitter will sometimes have a stretch of 11 hits in 20 at bats, and other times have a stretch of just one hit in 20 at bats, but that is what is to be expected by pure chance or random variation—again, in the same way that flipping a coin 1,000 times will give you some sequences of 15 heads in 20 flips and other sequences of five heads in 20 flips. (I know; I tried this.) Over a long enough trial, though, the proven .300 hitter will hit around .300 and the coin will land on heads roughly 50% of the time. (I got 509 tails and 491 heads over the course of my 1,000 flips.) Each at bat—like each coin flip and each foul shot—is by this reckoning an independent event unrelated to the at bats that come before or after.
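The coin-flip experiment is easy to rerun on a computer. What follows is a minimal sketch, not the procedure used in any of the studies cited: it flips a simulated fair coin 1,000 times, then slides a 20-flip window across the results to find the richest and poorest stretches of heads. The function name `window_extremes` and the seed are illustrative assumptions.

```python
import random

def window_extremes(flips, width=20):
    """Return (fewest, most) heads seen in any run of `width` consecutive flips."""
    heads = sum(flips[:width])          # heads in the first window
    fewest = most = heads
    for i in range(width, len(flips)):
        # Slide the window one flip to the right.
        heads += flips[i] - flips[i - width]
        fewest = min(fewest, heads)
        most = max(most, heads)
    return fewest, most

rng = random.Random(56)                 # arbitrary seed, chosen only for repeatability
flips = [rng.randint(0, 1) for _ in range(1000)]   # 1 = heads, 0 = tails

fewest, most = window_extremes(flips)
print(f"heads overall: {sum(flips)} of {len(flips)}")
print(f"20-flip windows ranged from {fewest} to {most} heads")
```

Change the seed and the extremes move around, but the overall share of heads should stay close to half, which is the point of the paragraph above.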
So, did Joe DiMaggio, a career .343 hitter entering the 1941 season, get “hot” when he batted .440 over the final 35 games of his hitting streak? Do any players truly have hot and cold stretches, or are they simply the beneficiaries and victims of randomness? Objective analyses say that hot streaks and slumps are a myth derived from faulty perception. “Ninety-nine percent of what observers see as a player being hot or cold is an illusion,” says Red Sox adviser Bill James, baseball’s emperor of statistical analysis. “There may be rare cases when a player makes an adjustment or is bothered by an injury, or some other factor enters in that actually changes performance beyond what is expected. But otherwise a hot streak simply is not real.”
Of course if you make such a suggestion to a ballplayer he’ll look at you as if you have cream cheese on your face. “Whoever says something like that needs to get out from behind his calculator and play some ball,” says former NL batting champion Keith Hernandez. “When you are hot you feel hot. When you’re in a slump you feel lousy.”
“There is no question that sometimes you’re going well and other times not so well,” says Wade Boggs, a career .328 hitter and a five-time batting champ. “That’s just how it is. You try to be consistent but sometimes your mechanics will get messed up or something will get in your way. When you’re hot you want to ride it as long as you can.”
Batters swear that they go through stretches when the baseball looks to be the size of a grapefruit coming in to the plate, and endure other times when the ball resembles an aspirin. And they aren’t the only ones who believe this. Managers regularly strategize around the belief that an opposing batter is “hot,” and try to avoid pitching to him. As one of innumerable examples, the Los Angeles Angels felt this way about the Yankees’ Alex Rodriguez during the 2009 ALCS. A-Rod had a playoff batting average of .429 and had hit four home runs in 21 playoff at bats before the Angels walked him seven times in his final 18 trips to the plate.
So why would some scientists tend to trust statistical analysis over reports from the front lines? Why rely on untethered data and probability theory rather than on testimony from the players experiencing the events? This lack of trust in the participants brings to mind the old story of Ludwig Wittgenstein, who one day approached an acquaintance with a question.
“Tell me,” Wittgenstein began, “why do people say that it was natural for man to assume that the sun went around the earth rather than that the earth was rotating?”
His friend answered: “Well, obviously because it looks as if the sun is going around the earth.”
And Wittgenstein replied: “But, what would it have looked like if it had looked as though the earth were rotating?”
Thus, it may look as though a hitter who knocks out a string of home runs and multihit games is in a hot streak, or that a hitter who strikes out repeatedly while taking a succession of oh-for-fours is in a slump, but how would it look if these stretches were simply the result of random, independent events and chance?
THE MOST COMMON approach to computing the probability of a hitting streak involves the application of two statistics: 1) the frequency with which a player gets a hit per time he comes to the plate and 2) the average number of times he comes to the plate each game. The studies that have relied on a player’s batting average to determine that first statistic, hit-frequency, are clearly in error. Batting average does not take into account unofficial at bats such as walks and sacrifice hits, even though each of those at bats is in fact an opportunity to get a hit. In many real-life game situations a walk is as good as a hit, as Little League coaches like to say. In determining a player’s batting average a walk is as good as a nothing. But in figuring the likelihood of getting a hit in a given game, a walk is as good as an out. It is a missed opportunity.
Joe DiMaggio’s career batting average was .325. He had 2,214 career hits in 6,821 official at bats. But when you divide those hits by his career plate appearances (7,671), you get .288. (That is, he got a hit 28.8% of the time he came to the plate.) I’ll call this his career streak average. More relevant to determining DiMaggio’s likelihood of putting together a long streak in 1941 is to look at his statistics from ’36 through the end of the ’40 season. In that five-year stretch his batting average was .343, his streak average .311.
The accounting for walks (and other unofficial plate appearances) is simple but crucial. Some fans wonder why DiMaggio’s peer Ted Williams, whose career .344 batting average is the seventh highest alltime—and much higher than DiMaggio’s—never put together a streak of even 25 games. Williams was actually a much weaker candidate for a hitting streak than DiMaggio was because Williams walked so frequently. With 2,654 hits in 9,791 career plate appearances, Williams’s streak average is just .271. Even in 1941 when Williams batted .406 to DiMaggio’s .357, Williams’s streak average was just .305 to DiMaggio’s .311. Scores of players are or were better hitting streak candidates than the immortal Williams, including lesser lights such as current Phillies infielder Placido Polanco with a career batting average of just .303, but a streak average of .277.
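The difference between the two averages comes down to the denominator. As a quick illustration, using only the career totals quoted above (the helper name `rate` is an assumption made for this sketch), the arithmetic looks like this:

```python
def rate(hits, opportunities):
    """Hits per opportunity, whether the opportunities are at bats or plate appearances."""
    return hits / opportunities

# Career totals as quoted in the text.
dimaggio_batting_avg = rate(2214, 6821)     # hits / official at bats
dimaggio_streak_avg = rate(2214, 7671)      # hits / plate appearances
williams_streak_avg = rate(2654, 9791)      # hits / plate appearances

print(f"DiMaggio batting average: {dimaggio_batting_avg:.3f}")
print(f"DiMaggio streak average:  {dimaggio_streak_avg:.3f}")
print(f"Williams streak average:  {williams_streak_avg:.3f}")
```

Each result should land within a point of the figure given in the text; any difference in the third decimal place is a matter of rounding.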
Let’s do this very briefly. Once you have established a player’s streak average and his average plate appearances per game (for DiMaggio that number was 4.54 from 1936 up to the start of the 1941 season) you can figure out the likelihood that he will get one or more hits in a game by using a basic formula of probability theory. In DiMaggio’s case that likelihood was a hair below 82%—extremely high although hardly unprecedented. To figure the likelihood that he will get a hit in two specific consecutive games, you simply multiply .82 by itself and get about .67 or 67%. Multiply this by .82 again and you get 55% as the probability of a hit in three specific consecutive games and so on. Working it out this way the chances for a 10-game streak are 13.74%, for a 20-gamer 1.89%, for a 30-gamer .26%, for a 40-gamer .0357% and for 56 straight .0015%.
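The text does not spell out the "basic formula," so the sketch below assumes the usual one: treat each plate appearance as an independent trial, so the chance of at least one hit in a game is one minus the chance of going hitless in every trip to the plate. Using a fractional number of plate appearances (4.54) as the exponent is a further simplifying assumption, and the function names are illustrative.

```python
def per_game_hit_prob(streak_avg, pa_per_game):
    """Chance of at least one hit in a game, assuming independent plate appearances."""
    return 1 - (1 - streak_avg) ** pa_per_game

def specific_streak_prob(game_prob, n_games):
    """Chance of a hit in each of n_games specific consecutive games."""
    return game_prob ** n_games

# DiMaggio's 1936-40 streak average and plate appearances per game, from the text.
p_game = per_game_hit_prob(0.311, 4.54)
print(f"per-game hit probability: {p_game:.4f}")

# The text rounds the per-game figure to .82 before multiplying, so do the same here.
for n in (2, 3, 10, 20, 30, 40, 56):
    print(f"{n:>2} games: {specific_streak_prob(0.82, n):.4%}")
```

Under these assumptions the per-game figure comes out just under .82, and the repeated multiplication reproduces the percentages quoted above for streaks of 10, 20, 30, 40 and 56 games.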
But that’s clearly not our answer. The above formula is for a specific 56-game stretch. What we want to know is the likelihood of a given streak at any time over the course of a season, that is, in any of the available 56-game stretches, not a particular one. A batter could hit in games 1 through 56, or games 2 through 57, or games 3 through 58 and so on. There are 99 possible ways to hit in 56 straight over a 154-game season. Many of those sequences are overlapping. Now, figuring the probability becomes more complicated. In a 2002 paper published in the Baseball Research Journal, Michael Freiman employed a recursive algebraic formula to resolve this issue and ultimately came up with .01%, or one in 10,000, as DiMaggio’s chances for hitting in 56 straight at some point in the 1941 season.
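One way to handle the overlapping stretches is to walk through the season game by game, keeping track of how long the current run of games with a hit is. The sketch below does that. It is not Freiman's formula, which is not reproduced here, and it assumes every game is independent with the same chance of a hit. The function name and the inputs (.82 per game, 154 games) are illustrative, and because Freiman's inputs may have differed, the output need not match his .01%.

```python
def prob_streak_at_least(p, games, length):
    """Probability of at least one run of `length` straight games with a hit
    somewhere in `games` independent games, each with hit probability p."""
    # state[j] = probability of being on a j-game run with no full streak so far
    state = [1.0] + [0.0] * (length - 1)
    done = 0.0                              # probability the streak has already happened
    for _ in range(games):
        nxt = [0.0] * length
        nxt[0] = (1 - p) * sum(state)       # a hitless game resets the run to zero
        for j in range(length - 1):
            nxt[j + 1] = p * state[j]       # a hit extends the run by one game
        done += p * state[length - 1]       # a hit completes a run of `length`
        state = nxt
    return done

specific = 0.82 ** 56                       # one particular 56-game stretch
anywhere = prob_streak_at_least(0.82, 154, 56)
print(f"specific stretch:       {specific:.6%}")
print(f"anywhere in 154 games:  {anywhere:.6%}")
```

With only 56 games in the season the function collapses to the simple repeated multiplication, which is a handy check that the bookkeeping is right.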