Steal Away

Dave Paisley

A few weeks ago, my esteemed colleague Derek Zumsteg, in retaliation for the Seattle Mariners picking up noted speedster Brian Hunter from the Tigers, waxed lyrical on just how overrated the stolen base is. In Hunter's case, it's very overrated because he just doesn't get on base enough to make it worthwhile, but that's a story for another day.

In response to Derek's article, a reader took him to task.

Kevin wrote:

Hi Derek, Re-read your piece on stolen bases, and how they have zero correlation (well, 5%) with runs scored. Er, yes, and that's one of the problems with just applying stats analysis without thinking the problem through.

Ooh, them's fighting words right there. Kevin continues:

Say, for example, you are managing the '75 Boston Red Sox, with Rice, Lynn, Fisk, Yaz, Hobson, etc. This team, of course, scored many runs and didn't steal all that many bases. Why? Because they had so many good hitters, it didn't make sense to take the chance to run themselves out of an inning, particularly at Fenway, when Dewey or Carbo or whoever was likely to hit a homer in the next at-bat. So, the '75 Sox are going to score a lot of runs, have no SB's, and add an 'outlier' data point to your chart.

The 'Billyball' Oakland A's were probably at the opposite end of the spectrum; not a lot of power, not a lot of high average hitters, so you'd better scrape for runs. However, since the first two items (power, average) have more effect on runs scored, again, the SB factor might be dwarfed - so here, you would have a team with high SB's and low runs scored, and another 'outlier'.

Your mission (should you decide to accept it) is to find teams that had basically similar BA/SLG/OBP totals, but different SB attempts and success rates. For example, here are last year's totals for the Orioles and Indians:

Team     AVE    R     H   HR   SB   CS  SB%   SLG   OBP
O's     .273  817  1520  214   86   48  64%  .447  .347
Tribe   .272  850  1530  198  143   60  70%  .448  .347

Now, given the O's 16 more homers, they should automatically have a leg up on runs scored over the Indians. However, despite numbers that were eerily similar in BA/SLG/OBP, the Indians scored 4% more runs, and surprise! they had a big edge in both the number of SB's, and their success rate. I'm sure if you download Sean Lahman's whole database, and do some sorts, you'll find more pairs like this. And then, when you've corrected for obvious factors like BA, SLG, and OBP, you can really examine SB's and see if they do contribute to runs scored. Obviously in the sample above, it appears they do, but we both know that data points of 1 are basically useless. However, I think if you have the time to look, you'll find more instances that support my case.

Regards, Kevin

So there's the setup. However, rather than wade through acres of data in the database referred to, I thought I'd use a little mathematical analysis instead. (By the way, Sean Lahman's database and much other useful stuff is on his website at www.baseball1.com.)

The guru of baseball statistics is, of course, the legendary Bill James, and so I thought I'd turn to James' analysis to delve into the mystery of the value of the stolen base. James is the inventor of Runs Created, a comprehensive measure of the total contribution to runs scored derived from raw offensive stats. I find it very unwieldy, and impossible to figure out without a spreadsheet handy, so I don't use it much. However, it comes in handy here.

The full runs created formula looks like this:

RC = A x B / C

Where:
A = Hits + Walks + Hit by Pitch - Caught Stealing - Grounded into DP

B = Total Bases + .52 x (Stolen Bases + Sacrifices + Sac Flies) + .26 x (Walks + Hit by Pitch - Intentional Walks)

C = At Bats + Walks + Hit by Pitch + Sacrifices + Sac Flies
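For anyone without a spreadsheet handy, here is the full formula as a small Python function. It is a straight transcription of the A, B and C terms above; the parameter names are my own shorthand, and the usage line feeds it invented round-number totals rather than any real team's stats.

```python
def runs_created(h, bb, hbp, cs, gidp, tb, sb, sh, sf, ibb, ab):
    """Full Runs Created: RC = A x B / C, term by term as defined above."""
    # A: times on base, less runners erased by caught stealing or double plays
    a = h + bb + hbp - cs - gidp
    # B: advancement, i.e. total bases plus partial credit for steals,
    # sacrifices, and unintentional walks / hit batsmen
    b = tb + 0.52 * (sb + sh + sf) + 0.26 * (bb + hbp - ibb)
    # C: opportunities
    c = ab + bb + hbp + sh + sf
    return a * b / c


# Hypothetical round-number team totals, for illustration only
rc = runs_created(h=1500, bb=500, hbp=50, cs=50, gidp=100, tb=2400,
                  sb=100, sh=50, sf=50, ibb=40, ab=5500)
print(round(rc))  # 815
```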

See what I mean about unwieldy? There is also a simplified version that ignores many of the minor effects:

RC' = A x B / C

Where:
A = Hits + Walks - Caught Stealing

B = Total Bases + .52 x (Stolen Bases) + .26 x (Walks)

C = At Bats + Walks
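Here is RC' in the same style, a sketch with parameter names of my own choosing. Note that A subtracts caught stealing, just as the full formula does: a runner thrown out stealing is a baserunner erased. The totals in the usage line are invented for illustration.

```python
def runs_created_simple(h, bb, cs, tb, sb, ab):
    """Simplified Runs Created (RC'), with caught stealing subtracted in A."""
    a = h + bb - cs                  # baserunners, less those caught stealing
    b = tb + 0.52 * sb + 0.26 * bb   # advancement
    c = ab + bb                      # opportunities
    return a * b / c


# Hypothetical round-number totals, for illustration only
print(round(runs_created_simple(h=1500, bb=500, cs=50, tb=2400, sb=100, ab=5500)))  # 839
```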

If we were to ignore base running and drop the Walks part from B, the formula would be pretty close to OBP x SLG x AB. Anyway, the bottom line here is that we have a mathematical model of run creation that has been tried, tested and blessed by statheads worldwide, and that includes the effect of stolen bases. You may want to argue about the accuracy of the correlation, but if you do, I suggest you take it up with the resident RC gurus on news:rec.sport.baseball. Good luck.
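The "pretty close" becomes exact if OBP is taken in its simplified form, (H + BB) / (AB + BB), ignoring hit batsmen and sacrifice flies: A / C is then OBP, and B is just total bases, which equals SLG x AB. A quick numerical check, using made-up totals and function names of my own:

```python
import math


def rc_without_running_or_walk_credit(h, bb, tb, ab):
    # RC' with SB, CS and the .26 x Walks term all dropped
    return (h + bb) * tb / (ab + bb)


def obp_times_slg_times_ab(h, bb, tb, ab):
    obp = (h + bb) / (ab + bb)  # simplified OBP: no HBP or sac flies
    slg = tb / ab
    return obp * slg * ab


totals = dict(h=1500, bb=500, tb=2400, ab=5500)  # hypothetical
print(rc_without_running_or_walk_credit(**totals))  # 800.0
print(math.isclose(rc_without_running_or_walk_credit(**totals),
                   obp_times_slg_times_ab(**totals)))  # True
```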

The one thing many people forget when discussing stolen bases is the success rate. The formula above shows that getting caught stealing erases a baserunner, cancelling out a hit or walk in A, while a stolen base adds only about half a base to B (.52, to be precise). That makes sense: getting on base is more valuable than gaining an extra base once you're on, and making an out on the bases is, of course, a very bad thing.

Just by tinkering with the numbers above a little, you can see that a team has to steal about two bases for every time it is caught stealing just to break even, a success rate of about 67% (it actually works out to more like 69%). Now, there aren't many players who do better than about 75%, and many do worse. Here are some notables, with career numbers before this season:

Player                SB    CS   SB%
Brian L. Hunter      177    47   79%
Tim Raines           803   145   85%
Rickey Henderson    1297   301   81%
ML 1998             3284  1505   69%
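The SB% column follows directly from the other two as SB / (SB + CS). A quick Python check of the table (the dictionary and function names are just for illustration):

```python
# Career SB and CS before this season, plus 1998 major-league totals (table above)
steals = {
    "Brian L. Hunter": (177, 47),
    "Tim Raines": (803, 145),
    "Rickey Henderson": (1297, 301),
    "ML 1998": (3284, 1505),
}


def success_rate(sb, cs):
    """Fraction of steal attempts that succeed."""
    return sb / (sb + cs)


for name, (sb, cs) in steals.items():
    # prints 79%, 85%, 81% and 69%, matching the table
    print(f"{name:<18}{success_rate(sb, cs):.0%}")
```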

Yes, the star base stealers do beat the odds, but baseball as a whole just about breaks even, meaning that, if Bill James is right, base stealing is mostly a non-value-added activity. All of which supports Derek's conclusion that stealing bases has little to do with run scoring in the grand scheme of things.
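The article doesn't show the tinkering behind the break-even figure, so what follows is one assumed way to do it with RC', not necessarily the author's. A stolen base adds 0.52 to B, which raises RC' by 0.52 x A / C; a caught stealing takes 1 off A, which lowers RC' by B / C. Setting the expected gain from one more attempt to zero gives a break-even success rate of B / (B + 0.52 x A). The naive version, which treats the caught stealing and the .52 as like-for-like units, gives 1 / 1.52, about 66%; the marginal version depends on the team's own totals and comes out higher whenever B exceeds A. The function name and the totals below are hypothetical.

```python
def break_even_rate(h, bb, cs, tb, sb):
    """Steal success rate at which one more attempt leaves RC' unchanged.

    Uses the marginal effects of RC' = A * B / C: an extra SB adds
    0.52 * A / C runs, an extra CS subtracts B / C. C cancels out.
    """
    a = h + bb - cs
    b = tb + 0.52 * sb + 0.26 * bb
    return b / (b + 0.52 * a)


naive = 1 / 1.52
print(f"{naive:.0%}")  # 66%
# Hypothetical team totals; the answer shifts with the team's A and B
print(f"{break_even_rate(h=1500, bb=500, cs=50, tb=2400, sb=100):.0%}")
```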

But what of Kevin's example teams from last year? For one thing, predicting runs and correlating them with raw offensive stats isn't an exact science, and the 33-run difference isn't that significant in the first place. Plugging the actual numbers into the RC formula predicts a difference of only 25 runs, leaving 8 runs that the formula doesn't account for; we can put those down to random fluctuation.

Of the 25 runs the formula does account for, 5 come from the Indians' 10 extra hits (in slightly more at bats), 14 from their extra walks and the extra plate appearances that came with them, and the remaining 6 from their extra stolen bases and, more importantly, their higher stolen base success rate. Had the Indians succeeded at the Orioles' rate of 64%, they would have given back some of those runs.
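Kevin's table gives enough to see what "the same rate as the Orioles" would mean in practice: the Indians' 203 attempts split at the Orioles' 86-for-134 rate. Turning that shift into runs needs the Indians' full A, B and C, which aren't reproduced here, so that step is left as a hypothetical helper built on the marginal effects of RC' (an extra SB adds 0.52 x A / C, an extra CS subtracts B / C). Both function names are my own.

```python
def split_attempts(attempts, rate):
    """Split steal attempts into (SB, CS) at a given success rate."""
    sb = attempts * rate
    return sb, attempts - sb


def marginal_rc_change(d_sb, d_cs, a, b, c):
    """First-order change in RC' = A * B / C from extra SB (in B) and CS (in A)."""
    return 0.52 * d_sb * a / c - d_cs * b / c


tribe_sb, tribe_cs = 143, 60         # Indians (Kevin's table)
orioles_rate = 86 / (86 + 48)        # Orioles: about 64%
sb_alt, cs_alt = split_attempts(tribe_sb + tribe_cs, orioles_rate)
print(round(sb_alt), round(cs_alt))  # 130 73
# d_sb = sb_alt - tribe_sb and d_cs = cs_alt - tribe_cs (about -13 and +13)
# can then go into marginal_rc_change with the Indians' own A, B and C.
```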

Is this going to change anybody's mind about stolen bases? Probably not. The inflated opinion of them notwithstanding, stolen bases are exciting to watch and do have value if used judiciously and practiced by experts.

about the author

Dave Paisley just loves a good chat about the value of statistics. Send your bouquets, brickbats and arguments along to him at drdjp@strikethree.com.
