Brewer Fanatic

Stats Are Evil


rluzinski
Quote:
The thing that bothers me about this entire argument is when people point to the exceptions and say "See! Stats are totally unreliable".

If you were referring to my post, that wasn't my intent. It just seems that when stats don't predict something correctly, rather than anyone saying, "Yes, the stats were wrong; there were things at work in that case that stats do not encompass," the miss is just disregarded and somehow doesn't count, because that wasn't the "real" JJ Hardy. There are a lot of psychological and human elements to baseball.


I don't see how anyone could be surprised that JJ struggled at first when he had only 100 AB above AA and had missed almost a whole year.

 

I will say that all those player production predictors didn't take into account Hardy's time off, giving a good example of a model not taking into account all variables.


Quote:
I will say that all those player production predictors didn't take into account Hardy's time off, giving a good example of a model not taking into account all variables.

Precisely. And now that Hardy has gotten his feet wet, he seems to be on track to put up numbers similar to what the production predictors would have estimated.

 

A wonderful example of stats and non-stats living in perfect harmony...


The whole point of my post was to show that even a robot hitter with a 25% chance of getting a hit in every AB wouldn't necessarily bat .250, because of the natural spread of probabilistic events. Therefore, even the very best simulator in the world, one that could somehow magically KNOW a player's exact abilities for the next year, would STILL be off as often and by as much as my chart shows.
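A quick way to see the robot-hitter point is to simulate it. This is a minimal sketch, not the method behind the chart: it assumes every at-bat is an independent hit with probability .250 and assumes a 550-AB season, and the function names are illustrative. Under that binomial model the season average has a standard deviation of sqrt(p(1 - p)/n), roughly .018 here, so misses of 20 points are not unusual even when the true ability is known exactly.

```python
import math
import random
import statistics


def analytic_sd(p: float, at_bats: int) -> float:
    """Standard deviation of a season batting average when every at-bat is an
    independent hit with probability p (a binomial model)."""
    return math.sqrt(p * (1 - p) / at_bats)


def simulate_seasons(p: float, at_bats: int, seasons: int, seed: int = 0) -> list[float]:
    """Simulate many seasons for a 'robot hitter' whose true hit probability is p."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    averages = []
    for _ in range(seasons):
        hits = sum(1 for _ in range(at_bats) if rng.random() < p)
        averages.append(hits / at_bats)
    return averages


if __name__ == "__main__":
    p, at_bats = 0.250, 550  # 550 AB is an assumed full-season workload
    averages = simulate_seasons(p, at_bats, seasons=2000)
    off_by_20 = sum(1 for avg in averages if abs(avg - p) >= 0.020) / len(averages)
    print(f"analytic SD:  {analytic_sd(p, at_bats):.4f}")
    print(f"simulated SD: {statistics.stdev(averages):.4f}")
    print(f"share of seasons off by 20+ points: {off_by_20:.1%}")
```

The simulated spread should land close to the analytic one; the point is that the scatter comes from the coin flips themselves, not from any flaw in knowing the hitter's ability.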

 

I was responding to this:

 

Just because a statistical prediction is off by 20 points on a batting average doesn't make it wrong. Even if you knew the EXACT ability of the player you wouldn't be exactly right. I think people need to keep that in mind.

 

I guess I just don't understand why we should use this chart, then. By your own admission, even the most accurate simulation wouldn't be completely accurate, even by the guidelines of the chart. So why would we use it when we don't know the actual ability of the player and don't take age or injury into consideration? If even the most accurate simulation would be off, then once you factor in all of the things I just listed, this chart becomes totally unreliable.

 

So what I'm saying is that, when you said

 

"Just because a statistical prediction is off by 20 points on a batting average doesn't make it wrong."

 

It actually really is wrong for what it is being used for. Maybe not for the parameters by which it was constructed, but for all uses in baseball when trying to predict future events, it isn't reliable.


Isn't that a prime example of how unreliable stats can be? Because they don't take things like this into account? That's an entire half-season's worth of baseball that stats would never have predicted due to the "human element."

 

I think I'm not explaining my position very well. Part of the problem may be the framing of this discussion as "stats vs. non-stats." What I'm saying is: Stats are incredibly valuable as predictors. In making a prediction, you have to read stats in context. A lot of contextual factors can be quantified; information can always be made more complete.

 

Nobody here is arguing for robotic application of stats. At the beginning of this year, I reacted to the BP projection about Hardy approximately like this: "Hmm. They think, based on his minor league performance, that he's a solid major league hitter. That seems like a valid assessment of his ability. I wonder how recovery from the injury will affect his performance?" That isn't some nebulous "human element," in the way I think you mean -- something that nobody could ever hope to quantify. Obviously it involves a human element, but just as obviously you can predict a lot of human behavior. If people are starving, they will probably steal food. That's as human as it gets, but if I owned a grocery store, I could make a pretty good estimate of how that human tragedy would affect my inventory and revenues. If the BP folks had focused on everything we knew about JJ Hardy, they could have adjusted their numbers based on the fact that he sat out a year. It isn't like he's the first person who ever had that problem, which means we have information that we can use to adjust the numbers. BP didn't do that, because they use a consistent method across the book.

 

As it turned out, recovery from the injury (plus dumb luck) affected JJ's performance a whole lot. In June, if you had asked me, "Doesn't this debacle prove JJ Hardy isn't a good major league hitter?" I would have said, "I don't believe it does. His minor league stats still make me confident that he'll hit, once he's cleaned off the rust from missing most of 2004." I was right about that, because of the statistical info. That's all I'm saying.

 

I can't put it any more simply than that. To me, the thought process I just described is a statistical analysis. It's not a robotic statistical analysis that says the numbers tell you everything without thought, assumption, or adjustment. But I have to disagree with you that something like the injury recovery, or the random variance that Russ and James have both talked about, undermines the value of stats. Ultimately, we aren't talking about "stats"; we're talking about the belief that concrete information has great value for predicting future events.

 

Some predictable events, like injury rust as a drain on performance, are harder to quantify than others. Even predictable events are subject to random variance. I think you and I agree on those two facts. If you want to keep shouting to the hills that I'm disingenuously denying the imperfection of stats, I can't stop you. But with all respect, I think I'm not doing that. I hope my clarification here will at least convince you of my good faith in this discussion.

 

Greg.


Brewer Fanatic Contributor
Quote:
The thing that bothers me about this entire argument is when people point to the exceptions and say "See! Stats are totally unreliable".

 

This happens all too often on these boards, which is unfortunate at times. Many good discussions quickly deteriorate into a few people shooting off post after post about the 'exception' or whatever, completely losing the original idea of the thread.


I know you are trying to say that Ichiro is worth more to a team because the fans love him, and you are probably right. While that type of worth may show up on an accountant's spreadsheet, SABR fellas are focused on a player's worth in winning games.

 

In a way, that's like me saying that since I can't do OXS without putting pen to paper, I can't see its value.

 

Everything about the sport is derived so that the owners make money. Everything.

 

Winning and fielding winning teams is a huge, monstrous part of that, but not the only, "end-all" part.

 

For example, the Cubs, who are just rolling in cash with no effort whatsoever, finally realized in 2003, with 80-share TV ratings on playoff games, that they could be making a whole lot "more" money by winning. Sadly, they had no plan for making that happen, so they have regressed, but at least you can see the team has that goal now.

 

Conversely, the Royals, who have as loyal and hardcore a fanbase as any team, have pissed it all away by driving the team into the ground with mismanagement. If they have a goal other than staying out of bankruptcy, I can't see it.

 

On another topic . . .

 

My problem with SABR has always been its arrogance.

 

Using Brian's example of smoking, the SABR argument would go like this.

______________

SABR: 8 out of 10 people die from smoking.

 

Me: What do the other 2 people die from?

 

SABR: It doesn't matter, stupid! 8 out of 10 people die from smoking!

 

Me: I'm just curious. What outside factors led to their deaths?

 

SABR: Listen MORON! Numbers don't LIE. YOU ARE GOING TO DIE FROM SMOKING, GET IT? YOU ARE JUST PATHETIC AND CLUELESS!!

 

Me: I don't smoke.

________________

 

It's kind of like the whole "traditional wisdom" vs bloggers thing.

 

Right now the "traditional wisdom" is coming out of the stats community in an endless stream, and the "fringe" opinions are coming out of moron talk radio hosts and Joe Morgan. Yet all we hear is how resistant baseball is to these ideas.

 

Believe me, if Howard Lincoln could milk a few more nickels from his paying customers by using statistical analysis, then the M's would hire Mat Olkin right away.

 

Oh wait they already did? Now if they would only listen to him.


Quote:
I hope my clarification here will at least convince you of my good faith in this discussion

Absolutely. I wasn't really trying to point out an "exception" to stats being useful either, and I apologize if I misconstrued what you were trying to say. If anyone were to look through my post history, they'd see that while I at times think stats can be unreliable (stating the obvious there, I guess), I have never denied their usefulness, and I realize they are the best tool we have.

 

You did bring up another interesting point, though. Are things like recovery from injury and time off/rustiness at all quantifiable? You posted that the projections for Hardy could have been altered to take these things into account. What would be the best way to go about doing this? Are there actual formulas that have been developed based on type of injury or time away? Considering how different each player's mental and physical makeup is, and how different each individual injury can be, it seems like those are things that would be nearly impossible to quantify, particularly if that player has never suffered a major injury before and there is no past information to reference.


My problem with SABR has always been its arrogance.

 

Fair enough. But I must ask you to please resist the urge to bias your opinion of statistical analysis because you don't generally care for the people who are conducting the analysis. I certainly hope I don't come across that way (too often, at least), because I'm the first to admit I'm learning as I go.

 

I guess I just don't understand why we should use this chart, then.

 

You know about the "error" of a statistic in general? For example, you might hear a poll proclaiming that grape jelly is more popular than strawberry. They'll say 53% of all people polled liked grape jelly and 47% liked strawberry jelly. If that poll has an error of +/- 4%, though, then the poll can't conclude anything, because grape's true share could be as low as 49%. If, however, 56% of the people sampled liked grape jelly, then even the low end of the range (52%) is a majority, and you can safely conclude that grape jelly is, in fact, more popular (shame on them).
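The jelly example is the standard margin-of-error check. A minimal sketch, assuming a simple random sample and the usual normal approximation with z = 1.96 for 95% confidence; the function names and the sample size of 600 are illustrative assumptions, not figures from the post.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


def clearly_ahead(share: float, moe: float) -> bool:
    """In a two-option poll, call the leader only if its share minus the
    margin of error still clears 50%."""
    return share - moe > 0.5


if __name__ == "__main__":
    moe = 0.04  # the +/- 4% from the example
    print(clearly_ahead(0.53, moe))  # 53% - 4% = 49%: can't call it
    print(clearly_ahead(0.56, moe))  # 56% - 4% = 52%: grape wins
    print(f"{margin_of_error(0.5, 600):.3f}")  # roughly 600 people gives about +/- 4%
```

The same arithmetic is what the batting-average chart does: it asks whether a gap is bigger than the noise you'd expect from the sample size alone.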

 

It actually really is wrong for what it is being used for. Maybe not for the parameters by which it was constructed, but for all uses in baseball when trying to predict future events, it isn't reliable.

 

All it's being used for is to account for the uncertainty in a number that is binomially distributed. That's really it. If a career .250 hitter hits .260 one year, it could certainly be the result of luck alone.
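To put a number on "could certainly be a result of luck alone," the exact binomial tail can be computed directly. This sketch assumes independent at-bats, a true .250 ability, and an assumed 550 at-bats; the function names are illustrative.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k hits in n at-bats) for independent at-bats with hit probability p.
    Computed in log space so the huge binomial coefficients don't overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def prob_at_least(avg: float, n: int, p: float) -> float:
    """P(season average >= avg) for a hitter whose true hit probability is p."""
    min_hits = math.ceil(avg * n - 1e-9)  # small epsilon guards against float round-up
    return sum(binom_pmf(k, n, p) for k in range(min_hits, n + 1))


if __name__ == "__main__":
    # A true .250 hitter over an assumed 550 AB: how often does he hit .260 or better?
    print(f"{prob_at_least(0.260, 550, 0.250):.3f}")
```

Under these assumptions the answer comes out a little over 0.3, so a .260 season from a true .250 hitter needs no explanation beyond chance.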


I've been meaning to write a magnum opus about luck (well, not so much a magnum opus as a well-thought-out monograph). What Russ is talking about can be called luck, but really isn't. All he's saying is that even a season isn't necessarily a large enough sample size to guarantee that a .250 hitter hits .250 for the year.
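How large a sample would be "large enough"? A rough sketch, assuming the same binomial model and the normal-approximation sample-size formula n = z^2 p(1 - p) / E^2 at 95% confidence; the function name and the chosen margins are illustrative.

```python
import math


def at_bats_needed(p: float, margin: float, z: float = 1.96) -> int:
    """At-bats needed so a 95% interval around a true average p is +/- margin wide,
    using the normal approximation n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


if __name__ == "__main__":
    for margin in (0.020, 0.010):
        print(f"+/- {margin:.3f}: about {at_bats_needed(0.250, margin)} AB")
```

For a .250 hitter this works out to roughly 1,800 AB for +/- 20 points and roughly 7,200 AB for +/- 10 points, i.e. several full seasons, which is why a single season can't pin down a hitter's true level.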

[ You can say the same for the minor league careers of Bill Hall and Steve Scarborough too. Yes...scary. ]

 

Hall was pushed, Scar was not. That's the major difference in my mind between their numbers.

 

The thing about Billy Hall is that I have _faith_ that he can be a good hitter based on the fact that when we see him approach at-bats differently, the results are great. It's not just an issue of him getting hot.

 

Now, if he can learn how to keep that approach consistent over the year, then I'd be a happy Bill Hall 2.0 camper.


Isn't that a prime example of how unreliable stats can be? Because they don't take things like this into account? That's an entire half-season's worth of baseball that stats would never have predicted due to the "human element."

Yeah, and that's why, like rluz pointed out, only an idiot will rely COMPLETELY on stats, as there is more to be said than just the raw numbers. Injuries happen, there are differences in age, platoons that allow for only favorable matchups, ability to play multiple/difficult positions, etc. But when it comes down to it, after looking at those other factors, stats will make the picture much clearer, or those factors will make the picture much clearer after looking at the stats. Look at Rick Helling: dude has one of the better ERAs on the team, so why aren't people getting excited about those numbers like they are about Eveland's, which aren't as good? Well, Helling is 35 versus Eveland's 21 and has had injury problems, and thus doesn't factor into our long-term plans, as he isn't expected to improve like Eveland. So there will always be a human component to complement or clarify the stats.

 

The thing that bothers me about this entire argument is when people point to the exceptions and say "See! Stats are totally unreliable".

 

The point of statistical analysis isn't to predict everything with absolute certainty. It's to look at trends and say "what is the most likely thing to happen".

Yeah, and stats, as a predictor, can be somewhat shaky because they don't account for things like injury, maturation, and decline. There are always unforeseen extraneous variables, like the smokers getting hit by buses. They do, however, serve much better as a descriptor when looking back on performances and analyzing them. Hardy, for example, is easy to look back on: you can see the progression from bad to very solid numbers as he became comfortable, recovered from injury, or both, while stats didn't accurately predict his performance this year because of those factors. They were, however, much more accurate once he appeared comfortable and healthy, in line with the numbers he had previously produced that led to that prediction. The point is, it is very rare that we have a large enough sample to predict what will happen when an extraneous variable pops up. For example, if a guy was always a strict platoon player, it will be very hard to extrapolate those numbers to predict production in an everyday role. And if a guy is injured, it is next to impossible to accurately predict what will happen, since you don't exactly have a mountain of data showing how he'll play with a right labrum that is at 79.8% of its full strength.


If, however, 56% of the people sampled liked grape jelly, then you can safely conclude that grape jelly is, in fact, more popular (shame on them).

 

No kidding, everyone knows strawberry jelly is the best! Morons!

 

I understand what you're saying Russ. Thanks for the clarification.

 

Hall was pushed, Scar was not. That's the major difference in my mind between their numbers.

 

Scar: Rookie League in 1999, Indy by 2002.

Hall: Rookie League in 1998, Indy by 2002.

 

If anything, Scar was pushed equally if not more than Hall was. And the numbers for each are identical.


Brewer Fanatic Contributor
Don't forget that Hall was drafted out of HS and Scarborough played at Texas A&M. That, in all likelihood, made Steve more ready to advance quickly through the system, as he was not only older but had played baseball at a higher level than Hall.

Chris

-----

"I guess underrated pitchers with bad goatees are the new market inefficiency." -- SRB


You did bring up another interesting point, though. Are things like recovery from injury and time off/rustiness at all quantifiable? You posted that the projections for Hardy could have been altered to take these things into account. What would be the best way to go about doing this? Are there actual formulas that have been developed based on type of injury or time away? Considering how different each player's mental and physical makeup is, and how different each individual injury can be, it seems like those are things that would be nearly impossible to quantify, particularly if that player has never suffered a major injury before and there is no past information to reference.

 

I wish I actually knew enough to give a good answer to this very good question; all I can do is take a general stab at it. When you're trying to reason from evidence, which is all statistical analysis really does, you figure out what information is available and how salient it is to the problem at hand. Can we find other players who have missed substantial time and then come back? If so, that's helpful, and we can use that info; but, as you point out, it still leaves a lot of variables out there. Maybe we can actually find a bunch of truly similar cases -- guys who have missed a year right before their rookie season. If not, maybe we can refine the numbers with other "disruptive" rookie experiences, like guys who were pushed to the majors too quickly.
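One way to make the "find comparable cases" idea concrete is a simple ratio adjustment: look at how comparable players' rate stats changed after a similar layoff, and scale the baseline projection by the average change. This is a hypothetical sketch, not BP's or PECOTA's method; the `Comparable` class and `layoff_adjusted_projection` function are illustrative names, and no real player data is included. It also reports the spread among the comparables, since the adjustment brings its own error with it.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Comparable:
    """Hypothetical comparable player: a rate stat (e.g. OPS) before a long
    layoff and in the first season back."""
    before: float
    after: float


def layoff_adjusted_projection(projection: float, comps: list[Comparable]) -> tuple[float, float]:
    """Scale a baseline projection by the average after/before ratio of the
    comparable players.

    Returns (adjusted projection, spread), where spread is the standard
    deviation of the ratios times the projection: a rough gauge of how much
    extra uncertainty the adjustment brings with it.
    """
    if len(comps) < 2:
        raise ValueError("need at least two comparable players to estimate a spread")
    ratios = [c.after / c.before for c in comps]
    adjusted = projection * statistics.mean(ratios)
    spread = projection * statistics.stdev(ratios)
    return adjusted, spread
```

With real comparables, the first value would serve as the rust-adjusted projection and the second would show how much those cases disagree; with only a handful of comps the spread will usually be wide, which matches the caveat that such an adjustment adds a larger measure of likely error.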

 

Like I said, I'm just engaging in general guesswork here. I suspect that, even if a more knowledgeable person could come up with a specific model for incorporating something like JJ's injury rust into the projection, that person would have to acknowledge that the incorporation added a larger measure of likely error to the quantitative exercise. Still, my sense is that a general adjustment of this sort would be possible. Maybe I'm wrong; maybe a better approach would be to take BP's PECOTA projection as a good assessment of JJ's "established level of ability" and then have a qualitative, rather than quantitative, discussion of caveats -- remember that he was out for a year, so we should expect him to underperform his established level of ability, but who knows by how much. That would just be a different way of following the same basic thought process.

 

Greg.


Don't forget that Hall was drafted out of HS and Scarborough played at Texas A&M. That, in all likelihood, made Steve more ready to advance quickly through the system, as he was not only older but had played baseball at a higher level than Hall.

 

But they were both rushed. Neither guy's stats warranted promotions except for their seasons in High Desert. And they were both promoted after one year at each level whether or not they succeeded, as their numbers make obvious.


[ But they were both rushed. Neither guy's stats warranted promotions except for their seasons in High Desert. And they were both promoted after one year at each level whether or not they succeeded, as their numbers make obvious. ]

 

Age makes a big difference when you're evaluating minor league talent and what level they should be at.


Age makes a big difference when you're evaluating minor league talent and what level they should be at.

 

They are only a year apart, and neither of them has deserved to be at the levels they were at throughout their careers.

 

Both of them had one good minor league year...at High Desert of course.


They are 2 years apart. Hall was 18 when he started rookie ball in '98, and Scar was 21 when he started rookie ball in '99. Hall was 22 when he started AAA in '02; Scar was 24 the same year, and it appears he didn't get to AAA until July.

Both of them had one good minor league year...at High Desert of course.

 

Although he's one of my least favorite players, Scar certainly earned his promotion from Huntsville to Indy. He had an amazing first half of the season here. Then he was promoted to Indy, and the wheels came off. He didn't hit at all, and even after he was sent back down to Huntsville later that year, he didn't recover his form.

Despite his slump after his return, he still put up an 820 OPS in AA, more than reasonable for a 2B/SS as he was then. For comparison, Weeks put up a 742 and still got his promotion to AAA.


[ Are we talking about talent or athleticism? There is a difference. ]

 

If you look back at my message in its entirety, I tried to make it obvious that I was changing the topic, because I didn't feel like arguing tooth and nail over two players' minor league stats based on age differences and prior baseball experience.


