Brewer Fanatic

Baseball stats for dummies part 1: OPS


Yeah, you can generally find me over on BN (who are you?). Hopefully our conversations over there have been pleasant. :)

 

I understand the stats frustration. It's not for everyone. I guess it makes it all the more sick that stats is more of a hobby for me then, huh? I've taken a few PhD stats courses and have taken my share of lumps at conferences. I discovered that I hated the PhD, but loved the stats.


I'm the one who reminds badgermaniac that the Yankees don't hit more HRs just because they think Bill James was on to something when he said HRs are the way to win games....Every team would love more homeruns, the Yankees just have more talent. Agrostis

 

Since you love stats so much, I'm sending you a couple years of data to play with. ;) It's fun, right?


I tried my hand at this in THIS thread and picciolo and end warned me of my mistakes then. After doing my best to learn some things about statistics I've learned that I have a lot more to learn. :) The fact that OBP and SLG aren't independent probably makes the exercise doomed at the start. For what it's worth, this was the equation:

 

Runs = (1444 x OBP) + (898 x SLG) - 443

 

I now understand that making a blanket statement like "OBP is more important than SLG" and supporting it by comparing ratios is clearly wrong. Would it be incorrect to say it like this, however:

 

In the current environment, an increase in OBP of one point has the same effect on a team's runs scored as an increase in SLG of 1.6 points.

 

According to the equation, this is a fact:

OBP     SLG     Runs
0.300   0.450   394.3
0.310   0.450   408.7
0.300   0.466   408.7

Provided that we relate the ratio back to a "per unit" of the variable, it seems to keep us out of trouble, no?
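For anyone who wants to check the arithmetic, here is a minimal sketch that just plugs numbers into the equation above. The coefficients are copied from the post; the function names are made up for illustration, and nothing here says whether the regression itself is sound.

```python
# Coefficients copied from the equation in the post.
OBP_COEF = 1444
SLG_COEF = 898
INTERCEPT = -443


def predicted_runs(obp, slg):
    """Runs predicted by the fitted line for a given team OBP and SLG."""
    return OBP_COEF * obp + SLG_COEF * slg + INTERCEPT


def slg_points_per_obp_point():
    """SLG points needed to match the run gain from one OBP point."""
    return OBP_COEF / SLG_COEF


if __name__ == "__main__":
    # The three rows from the table in the post.
    for obp, slg in [(0.300, 0.450), (0.310, 0.450), (0.300, 0.466)]:
        print(f"{obp:.3f}  {slg:.3f}  {predicted_runs(obp, slg):.1f}")
    print(f"SLG points per OBP point: {slg_points_per_obp_point():.2f}")
```

The ratio is simply 1444 / 898, which is where the "1.6 points" figure comes from.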

 

Since I did that mini-study, I've read a million studies taking different approaches to compare the relative value of OBP and SLG. One study took the very straightforward approach of using linear weights to estimate what a team with a particular kind of OBP and SLG would score (think it was in the last "By The Numbers"). They would add or subtract various baseball events until they would get the OBP and SLG they wanted, then look at the estimated runs scored. It came up with a 1.3 OBP/SLG ratio, if I recall correctly.


Quote:
In the current environment, an increase in OBP of one point has the same effect on a team's runs scored as an increase in SLG of 1.6 points.

No. Again, the scale effect, among other things. When interpreting a linear regression, we say that "holding all else constant, a one point increase in X increases Y by ___."

 

The "all else constant" phrase covers both the variables in the equation and other variables not in the equation. Going back to my earlier example, adding a pound of weight (holding balls in play constant) will be related to a smaller increase in home runs than a one point change in balls in play (holding weight constant), simply because of the sheer size of weight relative to a percentage.

 

Another reason we can't make the statement is multicollinearity. As you know, OBP and SLG will be highly correlated with each other. The goal in designing a model is to have independent variables that are highly correlated with the Y variable, not each other. Multicollinearity makes the individual coefficient estimates unreliable: their standard errors balloon, so the estimates can swing wildly or even flip sign. As an example, I would be willing to bet any amount of money that I could do a study that shows that smoking decreases the chances of developing lung cancer. How? I'd just add a bunch of variables that are related to smoking, such as:

 

Number of lighters owned

Cases of Milwaukee's Best Light drank per week

Number of bowling and dart leagues they belong to

Social status

Lives in a trailer park (Y or N)

Parents smoke (Y or N)

Number of trips to the local smoke shop

etc.

 

Now if I add a variable such as "number of cigs smoked per day" to this equation...I have no idea what will show up. It would be just as likely that smoking would show up as negatively related to cancer as positively related to cancer because the coefficients will all be screwed up.
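To put a rough number on how much correlated predictors hurt, one common yardstick is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1 / (1 - r^2) and r is the correlation between the two predictors. The r values in the loop are hypothetical, not measured from any OBP/SLG or smoking data, and the function name is made up for illustration.

```python
def variance_inflation_factor(r):
    """VIF for either predictor in a two-predictor regression.

    r is the correlation between the two predictors. The sampling
    variance of each coefficient is multiplied by 1 / (1 - r**2)
    relative to the case where the predictors are uncorrelated.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Hypothetical correlations, chosen only to show the trend.
    for r in (0.0, 0.5, 0.9, 0.99):
        print(f"r = {r:.2f}  VIF = {variance_inflation_factor(r):.1f}")
```

As r approaches 1 the factor blows up, which is the "no idea what will show up" situation described above.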

 

I'm not familiar with the other studies you mention. I'd have to look at those sometime. Off the top of my head, the only way to compare OBP and SLG would be an F-test, which would tell you whether one variable is significantly different from the other in terms of explanatory power. It wouldn't give relative weights, though. In statistics, even if we have two variables with different coefficients (assuming the exact same scale and no multicollinearity), it doesn't necessarily mean that they are statistically different. An F-test would show whether OBP is statistically different from SLG in explaining run production. I have to work on updating my SAS license, or I'd run the test right now.
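One standard way to set up such a test (an assumption about the kind of F-test meant here, not something spelled out in the post) is to compare two fits: an unrestricted model with separate OBP and SLG coefficients, and a restricted model that forces the two coefficients to be equal, which amounts to regressing runs on OBP + SLG, i.e. OPS. The sketch below only computes the F statistic from the two residual sums of squares. The function name and the numbers in the usage example are hypothetical, not results from real data.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_obs, n_params,
                n_restrictions=1):
    """F statistic for a restricted vs. unrestricted linear model.

    n_params counts the coefficients in the unrestricted model,
    intercept included (3 for intercept + OBP + SLG).
    """
    df_resid = n_obs - n_params
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator


if __name__ == "__main__":
    # Made-up residual sums of squares for 30 team-seasons,
    # used only to exercise the function.
    f = f_statistic(rss_restricted=120.0, rss_unrestricted=100.0,
                    n_obs=30, n_params=3)
    print(f"F = {f:.2f} on (1, 27) degrees of freedom")
```

The statistic would then be compared against an F distribution with (1, n_obs - 3) degrees of freedom. A small F means the data can't tell the two coefficients apart, whatever their point estimates look like.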

 

As a side note, in multiple regression, the only way to compare two independent variable coefficients is if they are of the same scale, there is zero multicollinearity, and the model has an R-squared of 1.


Quote:
Another reason we can't make the statement is multicollinearity.

 

I conceded as much when I wrote, "The fact that OBP and SLG aren't independent probably makes the exercise doomed at the start." I appreciate that because OBP and SLG are very dependent on each other (sharing the BA component) you can't use multiple linear regression on them at all. The rest of my post was meant to be more of a "what if they were independent" exercise.

 

Quote:
Someone that weighs 200 lbs and puts 50% in play still works out to 32 HR's. But now, each percentage point increase is only worth 1.6 times a pound of weight. All I did was change the scale. This is why you can't use coefficients as relative weights. It's apples & oranges.

 

Now this is where you lose me. A unit of BIP is worth 160 times a pound of weight if BIP is in the form "0.xx" and 1.6 times a pound of weight if it's in the form "xx.0". If you relate it back to the units, both are true statements.

 

Now, if one makes the illogical jump to suggest one variable influences the equation more or is more "important" than the other irrespective of each variable's scale, they are missing the boat. Scale does matter then. If player weight varied from 1,500 - 10,000 lbs, the BIP contribution to HR would be minuscule. You wouldn't recognize that without looking at scale.
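The units point can be checked with the Runs equation from earlier in the thread, since the weight/BIP coefficients aren't shown here. A minimal sketch, assuming OBP is re-expressed as a percentage (30.0 instead of 0.300); the function names are made up for illustration. The predictions don't move, but the ratio of the two coefficients changes by a factor of 100.

```python
# Coefficients copied from the Runs equation earlier in the thread.
OBP_COEF, SLG_COEF, INTERCEPT = 1444, 898, -443


def runs_decimal(obp, slg):
    """OBP and SLG both entered as decimals, e.g. 0.300 and 0.450."""
    return OBP_COEF * obp + SLG_COEF * slg + INTERCEPT


def runs_obp_percent(obp_pct, slg):
    """Same fitted line, but OBP entered as a percentage, e.g. 30.0."""
    return (OBP_COEF / 100) * obp_pct + SLG_COEF * slg + INTERCEPT


if __name__ == "__main__":
    print(f"decimal OBP: {runs_decimal(0.300, 0.450):.1f} runs")
    print(f"percent OBP: {runs_obp_percent(30.0, 0.450):.1f} runs")
    print(f"coefficient ratio, decimal OBP: {OBP_COEF / SLG_COEF:.3f}")
    print(f"coefficient ratio, percent OBP: {(OBP_COEF / 100) / SLG_COEF:.5f}")
```

Both ratios are true "per unit" statements, but neither is a scale-free measure of which variable matters more.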


If you do decide to try something, I have SAS on my computer...boss said it was mandatory. Be nice to finally use it for something fun.

 

I'm not exactly a master SAS-code man, but if you get it pretty close, I can usually look through the log and tinker with the errors to make it work.

 

So if you do want to run something, I'd be more than happy to try and run it...pretty sure I can get the output in HTML so it could probably be easily posted somewhere...not sure if BF.net is a possibility.
