Right after the 2005 season, my now-departed (from the board, not from this mortal coil, so far as I know) and incredibly thin-skinned friend JayhawkBill and I got into a debate regarding what we could expect from Manny over the remainder of his contract.
The conversation begins in earnest here, and is composed of 2 separate components:
Component 1: Historical Comps and Projections
JHB contended that PECOTA/BR’s historical comps – which included Albert Belle and Bo Jackson – would provide useful barometers of what to expect from Manny moving forward. In the process, he essentially dismissed those of us who pointed out the flaws in comparison-based methodology as being some sort of anti-intellectual Luddites.
In short, my response to JHB was:
QUOTE
I'll tell you what, JHB, name a reasonable bet, and I'll take the over on those projections. 79 HR in the next 4 years? He'll do that in the next 2 +. 372 hits in 4 years? Ditto.
In the process, JHB dismissed arguments with snarky tidbits such as “Cool. I've got numbers” and “I have something, and you have nothing. Feel free to post your analysis when you have time. Until then, something is better than nothing,” the latter of which flagrantly ignored the substance of opposition to his viewpoint.
Well guess what, JHB (wherever you are)? I have time. I think I’ll post an analysis.
First of all, let’s look at BR’s top 10 similar players to Manny back then and now:
CODE
2005                          2008
 1. Juan Gonzalez (913)        1. Ken Griffey (866)
 2. Jim Thome (859)            2. Jeff Bagwell (847)
 3. Frank Thomas (852)         3. Frank Robinson (842) *
 4. Albert Belle (851)         4. Barry Bonds (839)
 5. Johnny Mize (848)          5. Frank Thomas (836)
 6. Larry Walker (841)         6. Gary Sheffield (829)
 7. Jose Canseco (830)         7. Chipper Jones (819)
 8. Duke Snider (827)          8. Willie Mays (818) *
 9. Hank Greenberg (824)       9. Jim Thome (816)
10. Joe DiMaggio (818)        10. Mickey Mantle (811) *
A couple of interesting things to take away from this. First, Manny’s 2008 comps are better players than his 2005 comps. Historically good players. Hall-of-Famers (with the possible exception of Thome and Sheff – and I think they’ll both make it).
Second, the comparison scores of Manny’s top comps are even lower than they were in 2005.
Both findings support my contention at the time regarding the flaws of using low comparison score players to predict another player:
QUOTE
(this methodology is)…by definition flawed once any given player has fewer players that compare to him. Tony Graffanino might have 500 legit comps in the history of baseball (probably more). Manny Ramirez might have 10 (perhaps less).
Small n = large variability = massive error of estimate.
…and…
QUOTE
So not only do the comp players not "seem" like Manny on the surface (injury, skillset, era, etc.), but they're not especially good statistical matches for his career. Like I said earlier, Manny is a unique cat. Since there's such a small pool of players anywhere near Manny in baseball history, there is a shitload of error involved in using the "top 10" to project Manny.
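To put a rough number on the "small n" point, here is a minimal sketch using the textbook standard error of a mean, sd / sqrt(n). The pool sizes (10 and 500) come from the quote above; the spread of 30 is a made-up figure purely for illustration, and the sketch ignores the separate problem that low similarity scores make each comp less relevant in the first place.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean estimated from n observations with spread sd."""
    return sd / math.sqrt(n)


# ASSUMPTION: a hypothetical spread in the comps' future totals (not a real figure).
ASSUMED_SD = 30.0

se_small_pool = standard_error(ASSUMED_SD, 10)    # a player with ~10 legit comps
se_large_pool = standard_error(ASSUMED_SD, 500)   # a player with ~500 legit comps

print(f"n=10:  standard error = {se_small_pool:.2f}")
print(f"n=500: standard error = {se_large_pool:.2f}")
print(f"ratio (small pool / large pool) = {se_small_pool / se_large_pool:.2f}")
```

Whatever spread you plug in, the ratio depends only on the pool sizes: a 10-player pool carries roughly seven times the sampling error of a 500-player pool.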
So, let’s look at the numbers.
First, the counting stats. In the table below, the JHB 4-Year columns refer to the projections made by BR back in 2005. Note that these are 4-year projections, and that Manny has had (as of today) 2.18 years to amass further stats. The 2+ x 2005 columns are Manny’s 2005 numbers multiplied by (as of today) 2.18.

We see that while HR’s have dropped off from 2005, Manny is performing at or better than 2005 levels in hits, doubles, triples, and walks. He is striking out a little less. He has ALREADY outperformed JHB’s 4-YEAR projections in doubles (and in K’s, unfortunately), and is at or above 77% of the way in most other counting stats.
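For anyone who wants to check the pace arithmetic, here is a rough sketch. The 4-year window, the 2.18 elapsed years, and the 79 HR / 372 hit projections are taken from the post; the function names are my own, and Manny's actual totals are not reproduced here, so the code only computes what "on pace" would look like.

```python
WINDOW_YEARS = 4.0     # length of the projection window
ELAPSED_YEARS = 2.18   # time Manny has had to accumulate stats so far

# 4-year projections quoted in the original bet
projections_4yr = {"HR": 79, "H": 372}

fraction_elapsed = ELAPSED_YEARS / WINDOW_YEARS


def on_pace_total(projected_total: float, fraction: float) -> float:
    """Total a player would have by now if he were exactly on the projected pace."""
    return projected_total * fraction


def pace_ratio(share_of_projection_reached: float, fraction: float) -> float:
    """How fast a player is accumulating relative to the projected rate (1.0 = on pace)."""
    return share_of_projection_reached / fraction


print(f"Fraction of window elapsed: {fraction_elapsed:.1%}")
for stat, total in projections_4yr.items():
    print(f"{stat}: on-pace total so far = {on_pace_total(total, fraction_elapsed):.1f}")

# Being 77% of the way to a 4-year projection with this much of the window gone:
print(f"Pace vs. projection at 77% reached: {pace_ratio(0.77, fraction_elapsed):.2f}x")
```

In other words, a stat that is 77% of the way home with only about 55% of the window elapsed is running at roughly 1.4 times the projected rate.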
Next, the rate stats. Keep in mind that using rate stats may be a bit unfair, as they may decline between now and the end of JHB’s 4-year projection window:

AVG and OBP are BETTER than in 2005, SLG is lower (due to the observed drop in HR from ’05), and OPS is pretty much a wash (8 points out of nearly a thousand).
Conclusion for this portion of the argument: I believe the data pretty clearly illustrate that my conclusions about the flaws of BR’s methodology in this particular case were justified.
Component 2: Using Regression to Predict Player Performance
The next portion of the debate centered on JHB’s use of a regression equation built on 5 (yes, 5) data points to predict fielding stats (specifically, ZR).
I pointed out that:
QUOTE
I would argue that there are floor effects that render your linear prediction untenable…I would be quite surprised to see his performance continue to decline in such a steep linear pattern.
…and…
QUOTE
As I see it, the major problem with using regression in this context is that it assumes an infinite linear relationship between year and (in this case) ZR, and as I pointed out earlier, there may be a "floor" to how low even Manny's ZR is likely to go.
Even after explaining the inappropriate use of regression to JHB, he clung to these 3-year projections of Manny’s ZR:
CODE
Year   Predicted   Actual   %error
2006   .687        .694      1.0%
2007   .655        .713      8.9%
2008   .622        .778     25.1%
Note that, as I said back in 2005, the regression assumes a linear relationship that gets further and further from reality whenever a floor effect sets in. Manny is basically the same fielder as he was in 2005. Granted, this is not a good thing – but FAR better than the regression predicted – because regression was an inappropriate tool.
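The %error column can be reproduced with a few lines. This is a sketch assuming the error is measured relative to the predicted value, which is the convention that matches the figures in the table; the variable and function names are mine. It also prints the year-over-year change in each series, which shows the prediction marching steadily downward while the actual numbers do not.

```python
# (predicted ZR, actual ZR) by season, copied from the table
zr = {
    2006: (0.687, 0.694),
    2007: (0.655, 0.713),
    2008: (0.622, 0.778),
}


def pct_error(predicted: float, actual: float) -> float:
    """Miss as a percentage of the predicted value (assumed convention)."""
    return (actual - predicted) / predicted * 100


errors = {year: round(pct_error(p, a), 1) for year, (p, a) in zr.items()}
for year, err in errors.items():
    print(f"{year}: {err}%")

# Year-over-year change in each series
years = sorted(zr)
for prev, curr in zip(years, years[1:]):
    predicted_step = zr[curr][0] - zr[prev][0]
    actual_step = zr[curr][1] - zr[prev][1]
    print(f"{prev}->{curr}: predicted {predicted_step:+.3f}, actual {actual_step:+.3f}")
```

A straight-line fit can only keep subtracting about the same amount every year, so once the real numbers level off, each additional season pushes the miss wider.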
Conclusion
The point of this post was not to belittle JHB, who I like a lot. Rather, it is to illustrate why it is equally important to recognize the limitations of statistics as it is to recognize their strengths. By focusing too much on the strengths, JHB fell into the trap of ignoring the flaws.