Brewer Fanatic

Small sample alert


Hammer
How many ABs are required to drop the "small sample" tag? 50? 75? 100? I've noticed some trends from the Brewers hitters this year and would like to comment on them but don't want the "small sample size police" to arrest me...
@BrewCrewCritic on Twitter "Racing Sausages" - "Huh?"

Comment away. No matter what your opinion is, on any topic, someone is going to disagree with you vehemently. That's part of the board. I, for one, have never cared what others think about my opinions.

It depends on what you're trying to determine with the sample. :)

 

If you're trying to determine a player's value for contract purposes, you need hundreds of ABs, maybe 1500 or more. If you're deciding that your well-established cleanup hitter isn't performing and should temporarily be dropped to seventh in the order, maybe a couple hundred ABs will do.

 

Often, you can simply enlarge the sample. Look at the small current sample and relate it to the one that's more meaningful, e.g. the last year or last three years or whatever. If you expect an .850 OPS from someone and he's hanging around .680 for a month, there's probably not much reason for concern.

 

The thing to avoid is looking at the small sample in isolation. For instance, nobody's going to recommend making George Kottaras the cleanup hitter based on this season's ABs.

That’s the only thing Chicago’s good for: to tell people where Wisconsin is.

-- Sigmund Snopek


I don't think there is a number. For example, we all knew Prince started the season slow. I don't think many would argue that he should have been dropped three weeks into the season, because eventually he was going to come out of it.

 

I drafted Mark Reynolds relatively high and he has really struggled. About 5% of ESPN owners have already dropped him. I think it's way too early to freak out, and I wouldn't be too concerned until June.


There's really no hard and fast rule. Based on his track record, everyone thought Adam Dunn would eventually hit, but last year he never did and he recorded one of the worst seasons in MLB history. Had he not had an extensive record, he would not have continued to play.

 

The old bromide that a player will perform to the numbers on the back of his bubble gum card over the course of a season usually works but not always.

 

I think most rookies and first-time starters, though, have to do something in their first 150 or so plate appearances to justify staying at the big league level. Some, like Hardy in his rookie year, were given more time.


In the spirit of early-season, small sample analysis:

 

Here are some Brewers team stats for 2012.

 

Innings 1 through 6 - Offense

 

.215 / .276 / .391 = .667 OPS in 279 AB (#14 in NL for OPS)

 

7th Inning or Later

 

.264 / .344 / .458 = .802 OPS in 144 AB (#1 in NL for OPS)

 

Overall

 

.232 / .299 / .414 = .713 OPS in 423 AB (#7 in NL for OPS)
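For anyone who wants to sanity-check lines like these, here is a rough Python sketch. It only uses the slash lines and AB totals quoted above; the variable and function names are my own. AVG and SLG both have AB as the denominator, so the overall figures should be the AB-weighted average of the two splits. OBP is not checked the same way because its denominator is closer to plate appearances, which the post does not give.

```python
# Split lines quoted in the post: (AVG, OBP, SLG, AB)
splits = {
    "innings_1_6": (0.215, 0.276, 0.391, 279),
    "inning_7_plus": (0.264, 0.344, 0.458, 144),
}

def ops(obp, slg):
    """OPS is simply on-base percentage plus slugging percentage."""
    return obp + slg

total_ab = sum(ab for *_, ab in splits.values())

def ab_weighted(index):
    """AB-weighted average of one column (0 = AVG, 2 = SLG) across the splits."""
    return sum(line[index] * line[3] for line in splits.values()) / total_ab

overall_avg = ab_weighted(0)
overall_slg = ab_weighted(2)

for name, (avg, obp, slg, ab) in splits.items():
    print(f"{name}: OPS {ops(obp, slg):.3f} in {ab} AB")
print(f"overall: AVG {overall_avg:.3f}, SLG {overall_slg:.3f} in {total_ab} AB")
```

The weighted AVG and SLG land on the overall line quoted in the post, which is a decent sign the splits were copied correctly.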

 

Overall Team Pitching

 

5.05 ERA in 114 IP (Last in NL for ERA)

 

1.46 WHIP (#15 in NL for WHIP)


This may help

 

Even a full season isn't enough in many cases. You may have to adjust up or down a bit based on the current season and age, but you probably want 1500ish PA to have a good idea what that starting point should be. I would guess a month or two for the stats to show a difference in approach.

Fan is short for fanatic.

I blame Wang.


Put it this way: I'm not exactly worried that Ryan Braun is hitting .261 in his first 46 at-bats. Instead, looking at his Large Sample career stats, I know a Hot Streak is imminent.
The David Stearns era: Controllable Young Talent. Watch the Jedi work his magic!

Put it this way: I'm not exactly worried that Ryan Braun is hitting .261 in his first 46 at-bats. Instead, looking at his Large Sample career stats, I know a Hot Streak is imminent.

 

People thought a hot streak was imminent for Andruw Jones in 2007/08, but it never came.

 

I know it's a pretty safe bet to say that Braun will hit better than .261 or that Greinke will have an ERA below 5.00 by the time the season is over, but I have watched enough baseball (and sports in general) to know not to assume anything. It's kind of like investments. If you pick all of your investments based solely on past performance, you are probably going to have some disappointing returns.

User in-game thread post in 1st inning of 3rd game of the 2022 season: "This team stinks"


Pittsburgh is on a pace to score 351 runs this season which I think would be the lowest of all time, and yet for some reason the sample size doesn't bother me in that case. :tongue

 

They're also on pace to finish third in the division. If they improbably pulled off both feats it would, in many ways, be one of the most remarkable seasons in baseball history.


When I put the post up I was particularly focusing on the amazingly high K rates for a couple of everyday players. While the season is young, I was looking specifically at Weeks having a 36% strikeout rate over his first 50 at-bats when he is at around 26.5% for his career. While the whole "a K isn't necessarily a bad thing" argument is an entirely different can of worms, it just feels like he is striking out at an amazingly high rate. I'll revisit the topic in another month and a half when he is getting closer to the 200 AB range and see how wrong I was to wonder about this.
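One rough way to put a number on "how unusual is 36% in 50 at-bats for a 26.5% guy" is a plain binomial tail probability. The sketch below is only that: it assumes every at-bat is an independent trial at the career rate, and that the career rate is measured on the same per-AB basis, neither of which is strictly true. The function name is my own.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance of k or more 'successes' in n independent trials with rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

career_k_rate = 0.265   # career strikeout rate quoted in the post
at_bats = 50            # sample size quoted in the post
strikeouts = 18         # 36% of 50 at-bats

tail = prob_at_least(strikeouts, at_bats, career_k_rate)
print(f"P(at least {strikeouts} K in {at_bats} AB at {career_k_rate:.1%}) = {tail:.3f}")
```

If the printed probability is not tiny, a stretch like this is the kind of thing an unchanged hitter will produce every so often purely by chance.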

You should generally expect regression to the mean rather than a hot or cold streak to "make up for" small samples that seem to stick out.
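A back-of-the-envelope version of that idea, as a Python sketch: blend the small-sample rate with the longer-run rate, weighting the long-run rate as if it were worth some fixed number of extra at-bats. The Weeks figures (36% in 50 AB, 26.5% career) come from earlier in the thread; the weight of 150 is purely an illustrative assumption on my part, and the function name is made up.

```python
def regress_to_mean(observed_rate, n, prior_rate, prior_weight):
    """Blend an observed rate over n trials with a prior rate.

    prior_weight acts like a number of phantom trials at the prior rate:
    the bigger it is, the harder the estimate is pulled toward the prior.
    """
    return (observed_rate * n + prior_rate * prior_weight) / (n + prior_weight)

observed = 0.36      # strikeout rate so far (from the thread)
sample = 50          # at-bats so far (from the thread)
career = 0.265       # career strikeout rate (from the thread)
weight = 150         # ASSUMPTION: illustrative value only

estimate = regress_to_mean(observed, sample, career, weight)
print(f"blended estimate going forward: {estimate:.3f}")
```

Note that the blended number sits between the career rate and the hot/cold stretch. It never drops below the career rate to "pay back" the early sample.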


When I put the post up I was particularly focusing on the amazingly high K rates for a couple of everyday players. While the season is young, I was looking specifically at Weeks having a 36% strikeout rate over his first 50 at-bats when he is at around 26.5% for his career. While the whole "a K isn't necessarily a bad thing" argument is an entirely different can of worms, it just feels like he is striking out at an amazingly high rate. I'll revisit the topic in another month and a half when he is getting closer to the 200 AB range and see how wrong I was to wonder about this.

 

 

I posted in one of the in-game threads this year how impatient everyone looks at the plate. I don't think the Brewers, in general, have been known to be exactly selective at the plate over the past 3 or 4 years (sorry, don't have numbers to back this up), but it just really seems to stand out to me in this young season.


How many ABs are required to drop the "small sample" tag? 50? 75? 100? I've noticed some trends from the Brewers hitters this year and would like to comment on them but don't want the "small sample size police" to arrest me...

 

If you are referring to scouting-based observations (e.g. Is player so-and-so taking a different approach at the plate?), comment away. If you are referring to strictly statistical trends, it's simply a matter of how confident you are that the trends you see represent reality. If we want to use a player or team's performance over a period of time to estimate expected future performance, the sample size plays a huge role in knowing how confident we can be in that estimate. Any statistical estimate should really include confidence intervals. Using the simplest example, if I have a special coin that lands heads some fixed but unknown percentage of the time, anywhere from 0% to 100%, we can estimate that true percentage by flipping the coin. The more times we flip it, the closer the observed percentage of heads tends to be to the actual percentage.
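To make the coin example concrete, here is a small Python sketch that computes an approximate 95% confidence interval (the Wilson score interval, one common choice among several) for the coin's true heads rate at a few sample sizes. The 30% heads rate and the sample sizes are made-up illustration values, and the function name is my own.

```python
from math import sqrt

def wilson_interval(heads, flips, z=1.96):
    """Approximate 95% Wilson score interval for a coin's true heads rate."""
    if flips <= 0:
        raise ValueError("need at least one flip")
    p_hat = heads / flips
    denom = 1 + z**2 / flips
    center = (p_hat + z**2 / (2 * flips)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / flips + z**2 / (4 * flips**2)) / denom
    return center - half_width, center + half_width

# Hypothetical coin that has come up heads 30% of the time at each sample size.
widths = {}
for flips in (50, 200, 1500):
    low, high = wilson_interval(round(0.30 * flips), flips)
    widths[flips] = high - low
    print(f"{flips:>5} flips: true rate plausibly between {low:.3f} and {high:.3f}")
```

The same observed 30% gives a much wider range at 50 flips than at 1500, which is the whole small-sample problem in one picture.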

 

In the baseball world, it's a lot more complicated of course. We have to deal with many other variables and we also have to consider regression to the mean. There are articles that have been written on this topic that explain all this much better than I could, but if you want to know how relevant this year's stats are to a player's expected future performance, I have a trick for you. For a particular player, just compare his preseason projection to an updated projection (one that includes his performance so far this year). FanGraphs has the ZiPS preseason projections as well as a rest-of-season (ROS) projection. Here's an example:

 

Aramis Ramirez:

ZiPS preseason: .278/.340/.476

2012: .174/.235/.283

ZiPS ROS: .271/.333/.465

 

http://www.fangraphs.com/statss.aspx?playerid=1002&position=3B

 

That is not an insignificant difference but it's not exactly predicting a terrible season going forward, either.
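If you want to turn that comparison into a single number, here is a rough Python sketch using only the three slash lines quoted above. It measures how far the projection's OPS moved toward the current-season OPS. This is just arithmetic on the published lines, not a description of how ZiPS works internally, and the names are my own.

```python
# Slash lines quoted in the post: (AVG, OBP, SLG)
preseason = (0.278, 0.340, 0.476)
season_so_far = (0.174, 0.235, 0.283)
rest_of_season = (0.271, 0.333, 0.465)

def ops(line):
    """OPS = OBP + SLG for an (AVG, OBP, SLG) tuple."""
    return line[1] + line[2]

shift = ops(preseason) - ops(rest_of_season)   # how much the projection dropped
gap = ops(preseason) - ops(season_so_far)      # how far below projection he is
fraction = shift / gap

print(f"preseason OPS {ops(preseason):.3f}, ROS OPS {ops(rest_of_season):.3f}")
print(f"projection moved {fraction:.0%} of the way toward the 2012 line")
```

By this crude measure the early-season slump moved the projection only a small fraction of the way toward what he has actually hit so far.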


I remember this article from FanGraphs a couple years ago (the same one Logan mentioned). The statistics that went into these numbers are way over my head... but if you are curious, they are here. I had to find them in an archive of the website. I see in the comments that they have a similar one for pitching too. The results of both of these studies are below.

For Hitters

For Pitchers

 

Batters

50 PA: Swing %

100 PA: Contact Rate

150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA

200 PA: Walk Rate, Groundball Rate, GB/FB

250 PA: Flyball Rate

300 PA: Home Run Rate, HR/FB

500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate

550 PA: ISO

 

Pitchers

50 BF: nothing

100 BF: nothing

150 BF: K/PA, grounder rate, line drive rate

200 BF: flyball rate, GB/FB

250 BF: nothing

300 BF: nothing

350 BF: nothing

400 BF: nothing

450 BF: nothing

500 BF: K/BB, pop up rate

550 BF: BB/PA

600 BF: nothing

650 BF: nothing

700 BF: nothing

750 BF: nothing
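If it helps, the two lists above can be dropped into a small lookup so you can ask "at this many PA (or BF), which stats have hit their threshold?" This Python sketch just transcribes the numbers from this post; the dictionary and function names are my own, and the thresholds should be read as rough guides, not hard cutoffs.

```python
# Thresholds transcribed from the lists in this post (PA for batters, BF for pitchers).
BATTER_THRESHOLDS = {
    50: ["Swing %"],
    100: ["Contact Rate"],
    150: ["Strikeout Rate", "Line Drive Rate", "Pitches/PA"],
    200: ["Walk Rate", "Groundball Rate", "GB/FB"],
    250: ["Flyball Rate"],
    300: ["Home Run Rate", "HR/FB"],
    500: ["OBP", "SLG", "OPS", "1B Rate", "Popup Rate"],
    550: ["ISO"],
}
PITCHER_THRESHOLDS = {
    150: ["K/PA", "grounder rate", "line drive rate"],
    200: ["flyball rate", "GB/FB"],
    500: ["K/BB", "pop up rate"],
    550: ["BB/PA"],
}

def stabilized(sample_size, thresholds):
    """Return the stats whose listed threshold is at or below sample_size."""
    return [
        stat
        for needed, stats in sorted(thresholds.items())
        if sample_size >= needed
        for stat in stats
    ]

print(stabilized(160, BATTER_THRESHOLDS))
print(stabilized(200, PITCHER_THRESHOLDS))
```

For example, a hitter at 160 PA clears the listed thresholds for swing rate, contact rate, strikeout rate, line drive rate and pitches per PA, but nothing involving walks or power yet.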

 

I guess it's not that surprising that pitching takes larger samples to become reliable. I do find it amazing that stats like BA and BABIP, which a lot of people use to evaluate players, don't actually stabilize in a season.


It really depends on the individual player/situation. McGehee is a great example. Some people kept believing all year that Casey would "come around" and get close to his historical mean. The problem is that even his 1 1/2+ seasons of history weren't really enough to make those numbers a reliable baseline.

 

Then you have guys who have pretty drastic swings from year to year. You can average those years out, but that really won't be very accurate, because history has shown they'll do either better or worse than the average.

 

So how much history you have, and how consistent that history has been, determines how small a sample you need to draw any conclusions. For example, with Ramirez and Braun you can almost guarantee their OPS will go up. With Aoki, it's almost impossible to tell.


Good stuff. It should be noted that those numbers are somewhat narrowly useful, in that they are premised on the sample sizes needed for comparing a player to himself. Quite often that's exactly what you want to be doing, but it leaves open the very important question of variance that is due to the player himself (age, health and a ton of other factors).

 

I guess it's not that surprising that pitching takes larger samples to become reliable. I do find it amazing that stats like BA and BABIP, which a lot of people use to evaluate players, don't actually stabilize in a season.

 

The fact that BABIP doesn't stabilize in a season is exactly how people use it to evaluate a player. Zack Greinke's career BABIP is about .300 (pretty much standard) but last season it was .320 (and much higher than that until the last 6 weeks or so), so people use BABIP to say that Greinke could have been unlucky last season and was likely a better pitcher than his results (ERA and WHIP) were showing. Or for a hitter, Mark Teixeira had one of his worst seasons last year. He has a career BABIP of around .300 but last season it was like .240, so that can be used to show that he was unlucky, is likely a better hitter than his results showed, and could bounce back this year. Or the inverse for a guy like Emilio Bonifacio, who had a good season last year but was likely buoyed by a BABIP around .375.
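For anyone who hasn't seen how BABIP is calculated, here is a minimal Python sketch of the standard formula, (H - HR) / (AB - K - HR + SF). The stat line in the example is completely made up for illustration and is not any of the players mentioned above.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play

# Hypothetical season line, invented purely for illustration.
example = babip(hits=150, home_runs=20, at_bats=550, strikeouts=100, sac_flies=5)
print(f"BABIP: {example:.3f}")
```

Because strikeouts and home runs are stripped out, what is left depends heavily on where batted balls happen to land, which is why a single season of it bounces around so much.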


Archived

This topic is now archived and is closed to further replies.
