On Monday, E.J. discussed recent work published by Bill Petti at FanGraphs. For over a decade, we've tried to quantify a part of baseball that Bill James failed to address with his Pythagorean Expectation. His method uses runs scored and runs allowed to estimate the number of games a team should have won and lost. With a big enough sample size, the Pythagorean Expectation has been wildly successful, but that doesn't mean it's perfect. Earlier this year, Petti posted an article on FanGraphs to encourage efforts to quantify consistency. How would this affect Pythagorean win-loss records? The idea is that a team would prefer a player who gets one hit in three at-bats in each of three games over one who gets three hits in three at-bats in one game and then goes hitless over the next two. While the three hits in one game help a team tremendously in that game, that player fails to contribute in the next two.
At the end of the year, both of these players will have the same statistics, and the number of runs scored will remain the same. The problem is that the Pythagorean record won't reflect whether those runs were bunched into one out of every three games or spread out across the entire year. In theory, consistent offenses should perform better than streaky ones over a full season.
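To make that blind spot concrete, here is a minimal sketch in Python. It assumes James' original form of the formula, with an exponent of 2, and the three-game run lines are made up purely for illustration. The function names are my own, not anything from Petti's or James' work.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from total runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def actual_wins(scored_by_game, allowed_by_game):
    """Count the games in which the team outscored its opponent."""
    return sum(s > a for s, a in zip(scored_by_game, allowed_by_game))


# Made-up three-game stretch: both offenses score 9 runs in total,
# and the opponent scores 2 runs in every game.
steady = [3, 3, 3]
streaky = [9, 0, 0]
allowed = [2, 2, 2]

steady_pyth = pythagorean_win_pct(sum(steady), sum(allowed))
streaky_pyth = pythagorean_win_pct(sum(streaky), sum(allowed))

print(f"steady:  expected {steady_pyth:.3f}, won {actual_wins(steady, allowed)} of 3")
print(f"streaky: expected {streaky_pyth:.3f}, won {actual_wins(streaky, allowed)} of 3")
```

Both offenses get the same expected winning percentage because the formula only sees the totals, yet the steady one wins all three games and the streaky one wins just one.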
This is where Monday's article comes into play. Petti, with the help of Baseball Prospectus' Matt Swartz, improved upon a statistic he calls volatility (VOL). The statistic measures how consistent a player is, and the lower, the better. In his piece, he calculated the volatility of every hitter in 2012 with 300 or more plate appearances. As I mentioned, E.J. pointed out the significance of his findings, among them that Derek Jeter ranks as the least volatile (or most consistent) hitter since 1974.
It's pretty amazing stuff, but it had me thinking about the Yankees' offensive struggles in 2012. Although the team had the highest wOBA in MLB and scored the second-most runs behind the Rangers, the Yankees certainly lacked something last season. We heard the term "RISP fail" thrown around quite a bit, but by the end of the year, the team ranked ninth in baseball with a .788 OPS with runners in scoring position. According to the data, the Yankees didn't struggle with runners in scoring position. However, it's hard to overlook the near-collapse in August and September and the ridiculously cold offense during the playoffs.
If someone forced me to make a subjective observation about this team's offense, I would call them too streaky. Fortunately, the Yankees have the most consistent player in baseball since 1974 on their team, but what about the rest? How did the Yankees' volatility rank amongst other teams? And did consistency actually affect James' Pythagorean Expectation over the course of a full season?
I decided the best way to figure this all out was to load Petti's numbers into an SQL database. I matched his data up with a table that accounted for teams, and then I did some basic math. In short, I found the average team volatility in 2012, taking into account each player's number of plate appearances. Please note that only players with at least 300 plate appearances are included, and the data leaves out players who served partial seasons with teams.
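For anyone who wants to try something similar, here is a rough Python sketch of that calculation. It assumes "taking plate appearances into account" means a plain PA-weighted average, which is one reasonable reading rather than a documented formula. The team codes, plate appearance counts, and VOL values below are invented placeholders, not Petti's data.

```python
from collections import defaultdict


def team_volatility(players, min_pa=300):
    """PA-weighted average VOL per team.

    players: iterable of (team, plate_appearances, vol) tuples.
    Players below min_pa are skipped, mirroring the 300 PA cutoff.
    """
    weighted_sum = defaultdict(float)
    total_pa = defaultdict(int)
    for team, pa, vol in players:
        if pa < min_pa:
            continue
        weighted_sum[team] += pa * vol
        total_pa[team] += pa
    return {team: weighted_sum[team] / total_pa[team] for team in total_pa}


# Invented sample rows: (team, plate appearances, VOL)
sample = [
    ("AAA", 600, 0.50),
    ("AAA", 400, 0.60),
    ("AAA", 150, 0.90),  # under the cutoff, so ignored
    ("BBB", 500, 0.55),
]

result = team_volatility(sample)
for team, vol in sorted(result.items()):
    print(f"{team}: {vol:.3f}")
```

Weighting by plate appearances keeps an everyday player from counting the same as someone who barely cleared the cutoff.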
| Team | Team VOL | Players | Pythagorean | Record | Pyth Diff |
Unsurprisingly, the Yankees had the eighth-highest volatility in baseball and the second-highest in the American League. More volatility means less consistency, and this matches up with exactly what we saw on the field. The Yankees were very good at putting runs on the board, but a lot of it happened in streaks.
You might speculate that their home-run-dependent offense somehow created unstable run production, but in theory, power actually helps a team become more consistent. With more doubles, triples, and home runs, fewer hits are required to score runs.
Although the sample size is far too small, and I'm far from a statistics expert, I found the data interesting when matched with a team's Pythagorean record and actual record. Petti pointed out that the point of volatility was to quantify an aspect of baseball that James' Pythagorean Expectation overlooks. When I compared the cumulative team VOL with the difference between Pythagorean and actual records, there appeared to be a relationship. The 10 teams with the lowest VOL averaged 1.67 wins above their expected Pythagorean record, the 10 teams in the middle averaged 0.2 wins above their expected record, and the 10 teams with the highest VOL averaged 2 wins below their expected record. Of course, it's important to note that obvious outliers like the Orioles remained in the data.
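The tier comparison is simple enough to sketch in Python. This is only an illustration of the approach: the function name is mine, and the six teams and their numbers are made up, so the output says nothing about the real 2012 results.

```python
def average_diff_by_tier(teams, tiers=3):
    """Average (actual wins - Pythagorean wins) per VOL tier.

    teams: list of (name, team_vol, pyth_diff) tuples.
    Returns one average per tier, ordered from lowest VOL to highest.
    """
    if len(teams) < tiers:
        raise ValueError("need at least one team per tier")
    ranked = sorted(teams, key=lambda t: t[1])  # most consistent first
    size = len(ranked) // tiers
    averages = []
    for i in range(tiers):
        start = i * size
        # The last tier absorbs any leftover teams.
        end = start + size if i < tiers - 1 else len(ranked)
        group = ranked[start:end]
        averages.append(sum(t[2] for t in group) / len(group))
    return averages


# Invented league of six teams: (name, team VOL, wins above Pythagorean)
league = [
    ("T5", 0.60, -3),
    ("T1", 0.50, 2),
    ("T4", 0.57, 1),
    ("T2", 0.52, 1),
    ("T6", 0.62, -1),
    ("T3", 0.55, 0),
]

tier_averages = average_diff_by_tier(league)
print(tier_averages)
```

With 30 teams and three tiers, this gives the same groups of 10 I used above. A correlation across all 30 teams would be a stricter check, but the tiers are easier to read at a glance.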
Back to the Yankees. While the team ranked low in offensive consistency, their Pythagorean Expectation was identical to their actual record. Perhaps low pitcher volatility made up for the high offensive volatility. It's something to explore as we try to better predict baseball. From a Yankee fan's standpoint, if you were one of those who threw around the term "RISP fail", you're allowed to feel validated: the offense was inconsistent.