Ballpark Estimates
Jason Michael Barker

While browsing through newsgroups (known as "Usenet" to the tragically hip) earlier this week, I ran across an interesting formula for estimating a pitcher's ERA. It seems that a few years ago, Bill James discovered that given a pitcher's on-base percentage and slugging percentage allowed, you can estimate fairly accurately what his ERA is going to be.

The formula itself is really quite simple: multiply OBP times SLG, then multiply the resulting number by 31. For example, if Pitcher X allowed a .300 OBP and a .400 SLG, the formula would predict his ERA to be 3.72 -- not bad in this era of big offense. If opponents hit him a bit harder, say .350 OBP and .450 SLG, that works out to a 4.88 ERA.
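For anyone who would rather let a computer do the multiplying, here is a minimal Python sketch of the formula. The function name `estimated_era` and the `factor` parameter are my own labels for illustration, not anything from the original newsgroup post.

```python
def estimated_era(obp, slg, factor=31):
    """Estimate ERA from on-base and slugging percentage allowed.

    `factor` is the constant 31 from the formula as described above;
    the parameter name itself is an assumption made for this sketch.
    """
    return obp * slg * factor


# Pitcher X from the example: .300 OBP and .400 SLG allowed.
print(f"{estimated_era(0.300, 0.400):.2f}")  # prints 3.72
```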

Curious about how accurate this tool was, I ran the numbers for various Seattle Mariner pitchers (this was in the Mariners' newsgroup, after all). Much to my surprise it was quite accurate, albeit with a small sample of pitchers. Of the seven pitchers I looked at (I told you it was a small sample), each one's estimated ERA was within half a run of his actual ERA, and most were much closer.

At this point I was more than a bit intrigued. A few questions popped into my head: What might cause a pitcher's actual ERA to vary from his estimated ERA? What sort of conclusions should we draw about a pitcher who drastically overperforms (or underperforms) the estimate? And finally, what am I going to eat for dinner this evening? (Mac and cheese, if you must know.)

Obviously, seven Seattle pitchers do not a meaningful query make, so I decided to look at all the pitchers who threw at least 200 innings last season (the chart below is excerpted from that list).

OBP is the On-Base Percentage allowed, SLG is the Slugging Percentage allowed, and eERA is Estimated ERA based on our formula (OBP*SLG*31). aERA is the pitcher's actual 1999 ERA. eERA - aERA is the difference between the two ERAs, where a positive number indicates that the calculation was too high. The chart runs from the estimates that came in too high at the top to those that came in too low at the bottom. % Difference shows how far off Estimated ERA was, as a percentage of actual ERA.

Pitcher             Team  OBP   SLG   eERA  aERA  eERA - aERA  % Difference
Juan Guzman         CIN   .332  .419  4.31  3.74   0.57         15.3%
Brad Radke          MIN   .314  .443  4.31  3.75   0.56         15.0%
Mike Mussina        BAL   .312  .411  3.98  3.50   0.48         13.6%
Greg Maddux         ATL   .323  .403  4.04  3.57   0.47         13.0%
Ismael Valdes       LA    .321  .446  4.44  3.98   0.46         11.5%
Randy Johnson       ARI   .266  .335  2.76  2.48   0.28         11.4%
Mike Hampton        HOU   .322  .324  3.23  2.98   0.25          8.5%
Pedro Martinez      BOS   .248  .288  2.21  2.07   0.14          7.0%
Freddy Garcia       SEA   .345  .398  4.26  4.07   0.19          4.6%
Dave Burba          CLE   .336  .421  4.39  4.25   0.14          3.2%
Andy Ashby          SD    .311  .406  3.91  3.80   0.11          3.0%
Sterling Hitchcock  SD    .320  .425  4.22  4.11   0.11          2.6%
Shane Reynolds      HOU   .303  .418  3.93  3.85   0.08          2.0%
Tom Glavine         ATL   .346  .390  4.18  4.12   0.06          1.5%
Pedro Astacio       COL   .343  .479  5.09  5.04   0.05          1.1%
Kevin Millwood      ATL   .258  .337  2.70  2.68   0.02          0.6%
Scott Erickson      BAL   .358  .432  4.79  4.81  -0.02         -0.3%
Omar Daal           ARI   .308  .378  3.61  3.65  -0.04         -1.1%
Charles Nagy        CLE   .344  .458  4.88  4.95  -0.07         -1.3%
Jamie Moyer         SEA   .311  .394  3.80  3.87  -0.07         -1.8%
Bartolo Colon       CLE   .314  .398  3.87  3.95  -0.08         -1.9%
Kevin Brown         LA    .273  .336  2.84  3.00  -0.16         -5.2%
Kevin Appier        OAK   .349  .448  4.85  5.17  -0.32         -6.2%
Orlando Hernandez   NYY   .311  .392  3.78  4.12  -0.34         -8.3%
Gil Heredia         OAK   .318  .441  4.35  4.81  -0.46         -9.6%
David Wells         TOR   .320  .438  4.34  4.82  -0.48         -9.9%
Chuck Finley        ANA   .330  .386  3.95  4.43  -0.48        -10.9%
Eric Milton         MIN   .299  .406  3.76  4.49  -0.73        -16.2%
Steve Trachsel      CHC   .330  .457  4.68  5.56  -0.88        -15.9%
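
If you want to check the chart's arithmetic yourself, the three derived columns can be rebuilt from OBP, SLG and actual ERA. The sketch below does that for three rows taken from the chart. The helper name `chart_columns` is hypothetical, and I am assuming the percentage is taken from the unrounded gap, which appears to match the printed figures.

```python
# (pitcher, OBP allowed, SLG allowed, actual 1999 ERA), copied from the chart.
rows = [
    ("Juan Guzman", 0.332, 0.419, 3.74),
    ("Pedro Martinez", 0.248, 0.288, 2.07),
    ("Steve Trachsel", 0.330, 0.457, 5.56),
]


def chart_columns(obp, slg, actual_era, factor=31):
    """Return (eERA, eERA - aERA, % Difference) for one pitcher."""
    estimate = obp * slg * factor
    gap = estimate - actual_era       # positive: the estimate was too high
    pct = 100 * gap / actual_era      # gap as a percentage of actual ERA
    return estimate, gap, pct


for name, obp, slg, era in rows:
    estimate, gap, pct = chart_columns(obp, slg, era)
    print(f"{name:<16}{estimate:5.2f}{era:6.2f}{gap:+6.2f}{pct:+7.1f}%")
```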

Some remarks:
-- Even with this larger sample, the formula appears to be quite accurate: with just four exceptions, every pitcher's Estimated ERA was within half a run of his actual ERA. Overall, estimates were off an average of .27 runs per pitcher.

-- In trying to come up with some reasoning behind the estimates that are off, the first thought that comes to mind is "bullpen." Pitchers often leave games with runners on base, and it's up to the bullpen whether those runners score or not. The "lucky" pitchers at the top of the list, who performed better than their estimates, might have had a higher percentage of their runners stranded, while the "unlucky" pitchers at the bottom of the list might have had a large percentage of their runners come around to score.

-- There are certainly other reasons why the estimates might be off, but bullpen performance is the most intuitive to me. I'd be interested in hearing your ideas on the subject.

-- In terms of long-term conclusions I'm inclined to think that, much like run support, these things will average out over time. So, similar to a pitcher who posts a gaudy won-loss record despite mediocre performance, a pitcher who performed better than his estimate shouldn't expect to be so lucky the next year. And like the hard-luck pitcher who goes 8-15 despite a league-average ERA, pitchers who perform worse than their estimate should expect to catch a few breaks the next season.
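
The half-run check from the remarks above can be reproduced from the chart's eERA - aERA column. This is a rough sketch that uses only the 29 excerpted rows; the names `gaps`, `misses` and `mean_abs_gap` are made up for illustration. Since the chart is an excerpt, the average computed here need not match the .27 quoted for the full list.

```python
# eERA - aERA for each pitcher in the excerpted chart, top to bottom.
gaps = [
    0.57, 0.56, 0.48, 0.47, 0.46, 0.28, 0.25, 0.14, 0.19, 0.14,
    0.11, 0.11, 0.08, 0.06, 0.05, 0.02, -0.02, -0.04, -0.07, -0.07,
    -0.08, -0.16, -0.32, -0.34, -0.46, -0.48, -0.48, -0.73, -0.88,
]

# Estimates that missed the actual ERA by more than half a run, either way.
misses = [g for g in gaps if abs(g) > 0.5]

# Average size of the miss, ignoring direction.
mean_abs_gap = sum(abs(g) for g in gaps) / len(gaps)

print(f"{len(misses)} of {len(gaps)} estimates missed by more than half a run")
print(f"average miss: {mean_abs_gap:.2f} runs")
```

On the excerpted rows this finds four misses of more than half a run and an average miss of about 0.28 runs.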

So here's a tip for all you fantasy leaguers out there -- look for pitchers who were unlucky in 1999 (both in terms of won-loss record and estimated ERA), and assume they're going to have better luck next year.

about the author

Jason Michael Barker is much too young to have the nickname "Bud," so don't even try it, mister. Suggest "Sparky" at jmb@strikethree.com.
