Welcome back to yet another blog analysis piece!
This post kicks off the sticky stat series. I am going to attempt to find sticky stats at each position group.
Sticky stats are stats that are consistent and predictable from year to year. So, we want to find a positive correlation between a stat in one year and a stat in the next: if one increases, so does the other. A positive correlation suggests that a good performance in one year is likely to be followed by a good performance in the following year.
As a rule of thumb, a strong correlation is between 0.60 and 0.80, and a very strong one is from 0.80 to 1.0. So, we'll be on the lookout for year-to-year correlation values greater than 0.60.
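For reference, here is a minimal sketch of how a correlation value like this can be computed. It assumes the Pearson correlation coefficient, which this post does not name explicitly, and the yardage numbers are made up purely for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two series move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical passing-yard totals (not real data): previous vs. current season.
prev_yards = [4100, 3600, 2900, 4500, 3200]
curr_yards = [3900, 3800, 3100, 4300, 3000]
r = pearson_r(prev_yards, curr_yards)
print(f"r = {r:.2f}")
```

A value near 1 means players who were high the previous season tend to be high again; a value near 0 means last season tells you little.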
We'll dive in by taking a look at the year-to-year correlation between passing stats from 2012 to 2022. The stats we'll be starting off with are: passing yards, passing attempts, total EPA (see definition below), passing touchdowns, and completions.
We'll start by taking a look at every pass that occurred during that time span.
Note: EPA stands for Expected Points Added. This is a commonly used advanced statistic in football. In short, this stat measures how well a team or player performs on a play-by-play basis. The more a play increases the likelihood of a score, the higher the EPA value. So, if a player catches an 80-yard bomb to set up a touchdown drive, but does not actually score the touchdown themselves, the EPA for that play would still be high, as that catch played a significant part in the touchdown. The idea behind EPA is that the touchdown is not worth all the marbles of a drive, and each play in a drive has some value toward a score.
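To make that concrete, EPA for a play is the expected points after the play minus the expected points before it. The sketch below uses hypothetical expected-points values for a four-play touchdown drive (they are not taken from any real model) to show how credit is spread across plays:

```python
def epa_per_play(expected_points):
    """EPA for each play, given the expected points before every play
    followed by the value after the final play."""
    return [after - before
            for before, after in zip(expected_points, expected_points[1:])]

# Hypothetical expected-points values (illustrative only):
# drive start, after play 1, after the long catch, after play 3, after the score.
drive_ep = [0.4, 1.1, 5.9, 6.3, 7.0]

epa = epa_per_play(drive_ep)
for play_number, value in enumerate(epa, start=1):
    print(f"play {play_number}: EPA {value:+.1f}")

# The per-play values add up to the total change over the drive.
print(f"drive total: {sum(epa):+.1f}")
```

In this made-up drive the long catch (play 2) carries most of the value, even though the touchdown itself comes on the last play.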
The above heat map shows the correlation stats between the previous season statistics and the current season statistics during the decade we are examining.
So, each correlation value in the heat map comes from comparing the 2013 value of a stat to the 2012 value, the 2014 value to the 2013 value, and so on, until our final comparison of the data from 2022 to 2021.
This is a pretty encouraging start. We see that there generally is a strong correlation across the board, not only for a stat compared to its own value the previous year, but also for each stat compared to each of the other stats in the previous year.
Another thing I would like to point out here: while passing touchdowns are technically not as sticky as passing yards (0.74 vs. 0.79, respectively), they're almost equally sticky.
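The pairing described above can be sketched as follows. This is a plain-Python illustration rather than the exact code behind the heat map: the field names (player, season, passing_yards, passing_tds) and the sample rows are hypothetical, and Pearson correlation is assumed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def pair_consecutive_seasons(rows):
    """Return (previous, current) rows for the same player in back-to-back seasons."""
    by_key = {(row["player"], row["season"]): row for row in rows}
    pairs = []
    for (player, season), current in sorted(by_key.items()):
        previous = by_key.get((player, season - 1))
        if previous is not None:
            pairs.append((previous, current))
    return pairs

def lagged_correlation_grid(rows, stats):
    """Correlate every previous-season stat with every current-season stat."""
    pairs = pair_consecutive_seasons(rows)
    grid = {}
    for prev_stat in stats:
        for curr_stat in stats:
            xs = [previous[prev_stat] for previous, _ in pairs]
            ys = [current[curr_stat] for _, current in pairs]
            grid[(prev_stat, curr_stat)] = pearson_r(xs, ys)
    return grid

# Hypothetical season totals (not real data).
rows = [
    {"player": "QB A", "season": 2012, "passing_yards": 4000, "passing_tds": 30},
    {"player": "QB A", "season": 2013, "passing_yards": 4200, "passing_tds": 28},
    {"player": "QB A", "season": 2014, "passing_yards": 3900, "passing_tds": 31},
    {"player": "QB B", "season": 2012, "passing_yards": 3000, "passing_tds": 18},
    {"player": "QB B", "season": 2013, "passing_yards": 3300, "passing_tds": 22},
    {"player": "QB B", "season": 2014, "passing_yards": 3100, "passing_tds": 17},
    {"player": "QB C", "season": 2013, "passing_yards": 3600, "passing_tds": 25},
    {"player": "QB C", "season": 2014, "passing_yards": 3500, "passing_tds": 20},
]

pairs = pair_consecutive_seasons(rows)
grid = lagged_correlation_grid(rows, ["passing_yards", "passing_tds"])
for (prev_stat, curr_stat), r in grid.items():
    print(f"previous {prev_stat} vs. current {curr_stat}: {r:.2f}")
```

Each entry of the grid corresponds to one square of a heat map: rows are previous-season stats, columns are current-season stats. A player who misses a season simply contributes no pair across the gap.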
This is a little strange... aren't touchdowns supposed to be more unreliable than yards? What's going on here?
Well, if we plot the data, we see that using all pass attempts from 2012 to 2022 yields graphs like this:
And this:
So while you can see a positive correlation between stats, there do seem to be a lot of outliers. We don't want trick plays, injuries, etc. to muddle our data and lead us to incorrect conclusions.
So, while at a glance passing stats are sticky, we must dig further to find something closer to the truth!
To try and eliminate potential noise or data that can mislead our takeaways, I filtered away players from the data set above that were non-starters. So, the data below - correlations and graphs - will be on only starters from 2012 to 2022.
I want to make clear what I define a starter to be: a player who has played at least 12 games and has had 1500 yards in consecutive seasons. I used this definition as a proxy to avoid massive jumps in the data that were due to injury and journeyman players catching fire, as I believe both of these cases would add noise to the data set.
This means the data will be missing out on quarterbacks who sat and barely played their rookie season. Additionally, it will phase out quarterbacks with massive declines. However, I think there's an argument that can be made that these cases should roughly cancel, so including them would only blur the data a bit. Furthermore, they could be seen as outliers - as a rookie season and a sharply declined quarterback season are not indicative of a player's career.
Without further ado, the correlation heat map for starters:
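A minimal sketch of that filter, assuming the thresholds apply to both seasons of a year-over-year pair and that "1500 yards" means at least 1500 passing yards. The field names and sample rows are hypothetical:

```python
MIN_GAMES = 12    # from the starter definition above
MIN_YARDS = 1500  # assumed to mean "at least 1500 passing yards"

def is_starter_season(row):
    """True if a single season meets both starter thresholds."""
    return row["games"] >= MIN_GAMES and row["passing_yards"] >= MIN_YARDS

def starter_pairs(rows):
    """Keep (previous, current) season pairs where both seasons qualify."""
    by_key = {(row["player"], row["season"]): row for row in rows}
    kept = []
    for (player, season), current in sorted(by_key.items()):
        previous = by_key.get((player, season - 1))
        if previous is None:
            continue
        if is_starter_season(previous) and is_starter_season(current):
            kept.append((previous, current))
    return kept

# Hypothetical rows (not real data).
rows = [
    {"player": "QB A", "season": 2012, "games": 16, "passing_yards": 4000},
    {"player": "QB A", "season": 2013, "games": 16, "passing_yards": 4100},
    # Injury-shortened previous season: pair is dropped.
    {"player": "QB B", "season": 2012, "games": 5, "passing_yards": 900},
    {"player": "QB B", "season": 2013, "games": 16, "passing_yards": 3800},
    # Enough games but too few yards in the previous season: pair is dropped.
    {"player": "QB C", "season": 2012, "games": 13, "passing_yards": 1400},
    {"player": "QB C", "season": 2013, "games": 14, "passing_yards": 3000},
]

kept = starter_pairs(rows)
print([(previous["player"], current["season"]) for previous, current in kept])
```

Filtering on pairs rather than single seasons is what removes the large jumps: a season only counts if the season next to it also looks like a starter's season.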
Immediately, we see that all of the stats are far less correlated to their previous iteration. There still is some moderate correlation - which is defined as a correlation value between 0.40 and 0.60 - and even hints of strong correlation for some stats.
We can see that completions (0.63), passing yards (0.57), and passing attempts (0.58) are all still moderately to strongly sticky. So, NFL starters still behave the way I expected, albeit to a weaker degree than I was anticipating.
Moreover, we can see that passing touchdowns are far harder to replicate year-to-year, with a correlation of 0.36. This does confirm the belief that passing touchdowns are a more finicky number for quarterbacks to repeat.
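The bands used in this post can be written down as a small helper. Two things here are assumptions: how the exact boundaries are handled (0.60 counts as strong, 0.80 as very strong), and the label "weak" for anything under 0.40, since the post uses that word without giving a cutoff.

```python
def correlation_strength(r):
    """Label a correlation using the post's rule-of-thumb bands."""
    size = abs(r)  # the bands describe strength, not direction
    if size > 1:
        raise ValueError("a correlation must be between -1 and 1")
    if size >= 0.80:
        return "very strong"
    if size >= 0.60:
        return "strong"
    if size >= 0.40:
        return "moderate"
    return "weak"  # assumed label for values below 0.40

# Year-over-year values for starters quoted above.
starter_values = {
    "completions": 0.63,
    "passing attempts": 0.58,
    "passing yards": 0.57,
    "passing touchdowns": 0.36,
}
for stat, r in starter_values.items():
    print(f"{stat}: {r:.2f} ({correlation_strength(r)})")
```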
Out of curiosity, let's take a look at what some of the year-to-year graphs look like, starting with our strongest correlation stat, completions:
This actually looks pretty linear, with the outlier of Kaepernick at the bottom left of the graph messing things up a bit. I would consider him an outlier, as while he technically hit the threshold of a starter in 2012, he only started 7 of the 13 games he played in.
Removing him from the data set yields the following new completion graph:
This seems to be a better-faith representation of what we call starters. It's possible there are other situations I'm still failing to filter out, but that data point seemed very off, and after further inspection, it should not have been included.
That being said, this doesn't change much from a correlation standpoint:
If anything, our moderate-to-strong correlations got minimally weaker and our weak correlations got minimally stronger. Additionally, the belief that yardage is more reliable than touchdowns also continues to stand.
In fact, completions, passing attempts, and yardage all correlate pretty well with each other and are all sticky together, which makes sense when you think about it. You need attempts to get completions, and more attempts should lead to more catches. Catches lead to passing yards, so more attempts lead to more completions, which lead to more yards.
Unfortunately, EPA is also not very sticky and it also doesn't correlate well with the other stats. This is a shame, as EPA is a good proxy for fantasy points in a season, so having a way to know if EPA will be high would likely mean knowing, relatively reliably, if fantasy points will be high.
For a better visualization of one of the weaker correlations, here is what the correlation now looks like for EPA:
Also touchdowns:
As indicated by the correlation heat map, there is a slightly positive relationship in both cases, but it is rather weak.
At this point, I was messing around with the data to see if I could find anything interesting or valuable.
I happened to notice quite the difference in the strength of correlation between old starters and young starters. I define old as players who were 30 in the previous season, and young as players who were 30 in the current season. The contrast surprised me.
The correlation heat map for our savvy vets:
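One way to split season pairs by age is sketched below. The exact cutoffs are an assumption on my part for illustration: "old" takes pairs where the quarterback was 30 or older in the previous season, and "young" takes pairs where he was under 30 in the current season. The age field and the sample rows are hypothetical.

```python
AGE_CUTOFF = 30  # assumed cutoff, applied as described in the comments below

def split_pairs_by_age(pairs, cutoff=AGE_CUTOFF):
    """Split (previous, current) season pairs into old and young groups."""
    old, young = [], []
    for previous, current in pairs:
        if previous["age"] >= cutoff:
            # Assumption: "old" = at or past the cutoff in the previous season.
            old.append((previous, current))
        elif current["age"] < cutoff:
            # Assumption: "young" = still under the cutoff in the current season.
            young.append((previous, current))
        # A pair that crosses the cutoff (29 -> 30) lands in neither group.
    return old, young

# Hypothetical season pairs (not real data).
pairs = [
    ({"player": "QB A", "season": 2015, "age": 33},
     {"player": "QB A", "season": 2016, "age": 34}),
    ({"player": "QB B", "season": 2015, "age": 24},
     {"player": "QB B", "season": 2016, "age": 25}),
    ({"player": "QB C", "season": 2015, "age": 29},
     {"player": "QB C", "season": 2016, "age": 30}),
]

old_pairs, young_pairs = split_pairs_by_age(pairs)
print(len(old_pairs), "old pair(s),", len(young_pairs), "young pair(s)")
```

Each group's year-over-year correlations would then be computed separately on its own pairs.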
And the correlation heat map for our young bucks:
That's a pretty staggering difference for what we called sticky stats for starters (passing yards, completions and attempts). All of the old quarterback correlations for these stats are, at best, weak-to-moderately strong. On the other hand, all of these stats correlate moderately-to-strongly for young quarterbacks.
This is a pretty neat trend, as it shows a pattern that quarterbacks consistently improve their numbers year over year, up until about 30.
It probably would be a bit too strong to say that the best time to cash in for a quarterback would be around 30, based off of just this data, but this hints at potential signs of decay for the position coming sooner than one may have expected.
Some final findings to conclude this piece that you may find interesting as well.
The heat map below shows the year-over-year correlation values for the average time to throw, average intended air yards, average intended air yards (a PFF metric for how aggressive a quarterback is), average expected completion percentage, average air yard distance, and season total fantasy points.
Note: these stats are from 2016-2022, as the data set I am using for these somewhat advanced stats does not go back further than 2016.
Please forgive the avg and mean redundancy in the heat map below, as it came from a Python data cleaning hack:
I chose these stats amongst starters as they are the remaining ones that I found to have at least moderate-to-strong correlations from year to year.
Some observations and ideas to ponder:
I was most surprised by the average time to throw increasing.
Here's the graph of that stat to better visualize it:
That's all for this one. Here are the quick takeaways, as per usual:
Thanks for tuning into this piece!
Cheers,
Alex