
What Do You Mean by Average?

Describing something as average is typically innocuous.
But sometimes it can be deceiving. Look up the definition of average and you'll find phrases like "a typical amount" or "common, ordinary." When it comes to numbers and data, average is also tied to what is called the arithmetic mean (often just "the mean"): add up a bunch of values and divide by the number of values you have. The key questions: When do these ideas line up, and when do they fail?
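For reference, here is the arithmetic mean written out in standard notation, where x_1 through x_n are the n values being averaged (for yards per carry, the yards gained on each of n carries):

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```

Yards per carry is exactly this: total rushing yards divided by the number of carries.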

To get to the bottom of this, at least from a statistical point of view, we first need to talk about resistant statistics: summaries of data that are not heavily influenced by any individual value. A quick reminder about the median, too: it is the value that splits the data set into an upper and lower half when the data is ordered. The mean of a data set is not resistant to extreme values, while the median is. We'll look into why this is the case and what it has to do with fantasy football and sports in general. The table below shows what you'd see in the box scores for Lamar Miller in Week 8 and Melvin Gordon in Week 6, both from this past season.
Player          Week   Rushing Yards   Carries   Yards per Carry (YPC)
Melvin Gordon   6      132             18        7.33
Lamar Miller    8      133             18        7.39
That table makes it seem like the two players had pretty much the same game, and in terms of fantasy points from rushing, that is true. Miller averaged 7.39 yards per carry and Gordon averaged 7.33. Recall how we described average at the beginning: the typical amount. Does each of these YPC values represent the typical carry in each player's game? Spoiler alert: Nope. Take a look at the distribution of each player's yards over their 18 carries. Each player's mean and median are also marked on the plot.

What seems to be the difference between Miller's and Gordon's yardage? As noted, the two players have essentially the same mean, but Gordon's median is much higher. We also see that Miller had one long run of 58 yards. That single run is why his YPC is so high compared to his median and ends up matching Gordon's: the mean (YPC) is not resistant to a value that is extreme compared to the rest of the data. Interpreting the median is straightforward for each player: on half of his carries Gordon gained at least 7 yards, while on half of his carries Miller gained under 3.
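Since YPC is just total yards divided by carries, the box-score numbers alone are enough to see how much that one run matters. Here is a quick Python sketch (the `yards_per_carry` helper is a hypothetical name used only for illustration) that takes Miller's 58-yard run and its carry out and recomputes:

```python
def yards_per_carry(total_yards, carries):
    """YPC is the arithmetic mean of the per-carry yardages: total / count."""
    return total_yards / carries

# Box-score totals from the table above.
gordon_ypc = yards_per_carry(132, 18)
miller_ypc = yards_per_carry(133, 18)

# Drop Miller's single 58-yard run and recompute over his remaining 17 carries.
miller_without_long_run = yards_per_carry(133 - 58, 18 - 1)

print(round(gordon_ypc, 2), round(miller_ypc, 2), round(miller_without_long_run, 2))
# 7.33 7.39 4.41
```

Without that one carry, Miller's other 17 carries average about 4.41 yards, roughly 3 yards less than his box-score YPC.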

So why does this matter? You may say, "Hey, I'll take that 58-yard gain. What's the big deal?" Sure, I'd take a 50+ yard rush as well. However, context is important. Which value better represents Lamar Miller's typical carry that day: 7.39 yards (the mean, or YPC) or 2.5 yards (the median)? Looking at their rushing production through that lens makes things seem a lot different than the box scores and YPC do. Say each running back faces the same 3rd and 3. We just saw that on half of Miller's carries he was held under 3 yards. What percentage of Gordon's carries went for under 3 yards that game? Only 16.7% (3 of his 18), meaning the other 83.3% went for 3 or more yards. We often hear announcers and fans say things like "He's averaging 7 yards per carry, why wouldn't they run it!" Based on the median for Lamar Miller's Week 8, handing him the ball on 3rd and 3 looks a lot less like a sure thing. Obviously play-callers weigh a bunch of circumstances when making their choices, but the main point is that relying on a single summary statistic (in this case the mean) can be very misleading.
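Counting how often a back falls short of a given distance takes only a line of code. Below is a minimal sketch; `share_below` is a hypothetical helper, and the 18-carry list is made up for illustration (it is not Gordon's actual game), chosen so that 3 of the 18 carries go for under 3 yards, the same 16.7% as above.

```python
def share_below(carries, threshold=3):
    """Fraction of carries that gained fewer than `threshold` yards (e.g. short of a 3rd-and-3 line)."""
    return sum(1 for yards in carries if yards < threshold) / len(carries)

# Hypothetical 18-carry game, invented for illustration only.
example_game = [1, 2, -1, 3, 4, 5, 6, 7, 7, 8, 8, 9, 10, 4, 5, 6, 12, 14]

pct = 100 * share_below(example_game)
print(f"{pct:.1f}% under 3 yards, {100 - pct:.1f}% for 3 or more")
# 16.7% under 3 yards, 83.3% for 3 or more
```

Unlike YPC, this number doesn't care whether the long carries went for 12 yards or 58; it only asks how often the back came up short.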

"How misleading can it be, Jerome?" is probably what you're screaming at this moment. "I want to test is out!", you exclaim. Luckily for you I made another interactive visual that can help show how resistant the median is compared to the mean and how sample size has an affect on this. Below you can set a sample size for the number of carries you want to test, then click the button below that to generate a random sample of yardages. The sample is drawn from every running play of the 2018 regular season. Once that pops up, just click on the plot to add a new point of that yardage to the data and see how that changes both the mean and the median. The slider will help zoom in and out if you want, and the "Clear Added Points" will remove what you added (Duh). The default number of carries is set at 20, which can be seen to represent one game. Add a 90 yard carry to see how much that changes the mean and median, respectively. Change the number of Rush Attempts to something like 250 to show represent a season.

Hopefully playing around with this has shed some light on how vulnerable YPC, and any mean, is to extreme values. Is the median always better than the mean? Not necessarily. Keep in mind what each measures and what information is needed to calculate it. To find the mean, all you need is the total and the number of values, and adding one more value will almost always change it (unless the new value equals the current mean). To find the median, you need the whole data set, yet adding a value might not change the median at all. The conclusion is that it's better to know both rather than just one. So the next time someone tries to make a big point using a player's YPC, Points Per Game, or any other "something-per-something," ask if they know the median. If they say no, tell them that since the mean isn't a resistant statistic and is heavily influenced by extreme values, you can't buy what they're selling.
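The difference in what each statistic needs shows up clearly in code. In this minimal sketch, `RunningMean` is a hypothetical helper (not from any library) that keeps only a total and a count, while the median comes from Python's standard `statistics.median`, which needs the full list of values. The carries are made up for illustration.

```python
from statistics import median

class RunningMean:
    """Tracks a mean using only a running total and a count; no individual values are stored."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    @property
    def value(self):
        return self.total / self.count

# Hypothetical carries, for illustration only.
carries = [4, 1, 7, 3, 2]
running = RunningMean()
for yards in carries:
    running.add(yards)

# The median, by contrast, needs every value so the data can be sorted.
print(running.value, median(carries))  # 3.4 3

# Add a carry equal to the current median: the median stays put, the mean still moves.
running.add(3)
print(round(running.value, 2), median(carries + [3]))  # 3.33 3.0
```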

As always, let me know what you think by email or Twitter. If the interactive plot isn't working well on your device, try getting to it directly using the link below. Hope you enjoyed this venture into statistics! And if I just tricked you into learning more about stats, then sorry, not sorry.
