In the following analysis I look into how sticky rushing stats are, primarily focusing on season totals.
The optimistic goal is to find stats that serve as strong and consistent predictors for rushing success, for both quarterbacks and running backs. Practically speaking, I shall be surveying the landscape of rushing numbers in hopes of discovering patterns (or lack thereof).
This investigation is a continuation of my research into how replicable (fantasy relevant) football stats are year-over-year. To better grasp the bigger picture, I encourage you to check out part one and part two, each an examination of quarterback-related stats.
One last note before diving in: the data behind the correlation heat maps and scatter plots covers 2012-2022, unless stated otherwise.
To get a sense of what the data looks like, we start without any restrictions on the data set.
This yields the following correlation heat map:
From an unfiltered, general perspective, there is less to rely upon with rushing numbers compared to passing stats.
That being said, there still is some strong year-to-year correlation with rushing yards (0.74), carries (0.76), and fantasy points (0.74). Touchdowns are far less reliable, with a year-over-year correlation of 0.60. This matches the expectation: rushing touchdowns are a rather fluky stat.
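For readers who want to see what a "year-to-year correlation" means mechanically, here is a minimal sketch in plain Python. It pairs each player-season with the same player's previous season and then computes the Pearson correlation of the pairs. The toy rows and the helper names (`pair_seasons`, `pearson`) are hypothetical and only for illustration; they are not the data or code behind the heat maps.

```python
from math import sqrt

# Hypothetical toy rows: (player, season, carries). Not real data.
rows = [
    ("A", 2020, 250), ("A", 2021, 270), ("A", 2022, 240),
    ("B", 2020, 120), ("B", 2021, 150),
    ("C", 2021, 40), ("C", 2022, 60),
    ("D", 2022, 15),  # no previous season on record, so D drops out
]

def pair_seasons(rows):
    """Pair each player-season with the same player's previous season."""
    by_key = {(player, season): value for player, season, value in rows}
    pairs = []
    for (player, season), current in by_key.items():
        previous = by_key.get((player, season - 1))
        if previous is not None:
            pairs.append((previous, current))  # (carries_last, carries)
    return pairs

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

pairs = pair_seasons(rows)
r = pearson(pairs)
print(f"{len(pairs)} season pairs, r = {r:.2f}")
```

Note that a player only contributes a data point when two consecutive seasons are present, which is why every filter in this analysis is phrased in terms of back-to-back seasons.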
To get a better sense of the data we are working with, the following scatter plot shows the strongest year-to-year correlation from the previous heat map - carries vs. carries_last:
There is a lot of noise towards the bottom left of the graph. Most of those data points offer little value, in terms of helping us find stats to serve as strong predictors of fantasy performance.
We will need to apply a filter to obtain information more relevant to fantasy production. However, it is nice to know that, generally speaking, rushing numbers are not totally random - in fact, the fantasy relevant ones are somewhat-to-very (positively) predictable.
Our first filter will be to look exclusively at the running back position.
The following heat map displays the season-to-season correlation numbers:
While correlations from year-to-year are definitely weaker, there is more cross-correlation amongst stats. To me, this shows that running back numbers across the board are moderately predictable, and having just one (outside of yards-per-carry) can paint the majority of the picture.
I want to highlight that rushing touchdowns, year-over-year, have a correlation of 0.53. Not random, but not a strong predictor either. This is far weaker than the 0.74 correlation for passing touchdowns in the general passing data set. The numbers continue to reflect the story line I have been told.
Before applying another filter, here is the scatter plot of the strongest correlation from the heat map - carries vs. carries_last:
We still have a lot of noise for trying to decipher the data for fantasy relevant players. That being said, I want to repeat that it is a good thing that the general numbers are not nonsensical.
The next filter we apply is a proxy for starting running backs, or at least backs in a committee. Backs in the following analysis must have had at least 100 carries in back to back seasons:
Well... there goes any sense of predictability.
To see what we're working with, this is the scatter plot of the strongest correlation from that heat map - carries vs. carries_last:
The data is definitely all over the place. Not ideal with regard to finding a pattern amongst starting or fantasy relevant backs.
What happens if we strengthen our filter to 200 carries in consecutive years?
The bellcow heat map ends up looking like so:
Outside of touchdowns, pretty much every correlation across the board is worse. I guess there's a little safety in knowing that your bellcow back probably won't regress much when it comes to scoring touchdowns? The sample size is 77 players too, so I would not discount these results.
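The 100- and 200-carry filters used above can be sketched as a single function with a carries threshold. This is a minimal illustration on made-up rows, and the function name `consecutive_seasons` is hypothetical. I use "at least" (`>=`) here as an assumption, since the text alternates between "at least" and "more than"; swap in `>` if you prefer the stricter reading.

```python
# Hypothetical toy rows: (player, season, carries). Not real data.
rows = [
    ("A", 2020, 250), ("A", 2021, 270), ("A", 2022, 190),
    ("B", 2020, 120), ("B", 2021, 150),
    ("C", 2021, 40), ("C", 2022, 210),
]

def consecutive_seasons(rows, min_carries):
    """Keep (player, season, carries_last, carries) tuples where both the
    current and the previous season reach min_carries."""
    carries = {(player, season): c for player, season, c in rows}
    kept = []
    for (player, season), current in carries.items():
        last = carries.get((player, season - 1))
        if last is not None and last >= min_carries and current >= min_carries:
            kept.append((player, season, last, current))
    return kept

committee = consecutive_seasons(rows, min_carries=100)  # starter/committee proxy
bellcows = consecutive_seasons(rows, min_carries=200)   # bellcow proxy
print(len(committee), len(bellcows))
```

Because the threshold must hold in both seasons, raising it shrinks the sample quickly, which is worth keeping in mind when reading the stricter heat maps.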
Before moving on to quarterbacks, I want to leave off on this:
The bar graph above shows the number of bellcows each season from 2013-2022. Again, I define a bellcow as a running back with more than 200 carries in consecutive seasons. So, the 11 players that appear on the first bar also had over 200 carries in 2012.
I find this graph interesting, as it seems to me that the death of the bellcow has been exaggerated. From 2013-2017 there were 42 running back seasons that qualified as "bellcow workload". From 2018-2022 that number diminished to 35 such seasons, a drop from 8.4 to 7.0 per year.
Bellcows might not be easy to predict, but they aren't quite dead yet!
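The bar graph and the two five-year totals boil down to counting qualifying seasons per year. Here is a small sketch of that tally, assuming the consecutive-season filter has already been applied. The qualifying list is made up and `window_total` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (player, season) pairs that already passed the
# "200 carries in this and the previous season" filter. Not real data.
qualifying = [
    ("A", 2013), ("B", 2013), ("A", 2014), ("C", 2017),
    ("D", 2018), ("D", 2019), ("E", 2022),
]

# One bar per season: how many backs qualified that year.
per_season = Counter(season for _, season in qualifying)

def window_total(per_season, first, last):
    """Total qualifying seasons between first and last, inclusive."""
    return sum(n for season, n in per_season.items() if first <= season <= last)

early = window_total(per_season, 2013, 2017)
late = window_total(per_season, 2018, 2022)
print(dict(per_season), early, late)
```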
Onto quarterbacks, we start with the general data at the position:
The rushing stats are not quite as predictable across the board for quarterbacks, compared to running backs. However, quarterbacks seem to be quite reliable in the rushing yardage department, with a super strong correlation of 0.77. This is only slightly less than the correlation for passing yards year-over-year (which was 0.79), from the unfiltered data set.
Quarterbacks also have a stronger correlation with carries and fantasy points season-to-season, at 0.79 and 0.70, respectively. Unfortunately, rushing touchdowns have only a moderate year-over-year correlation of 0.49, which is weaker than the correlation for passing touchdowns year-over-year (0.74).
Despite being less predictable than passing stats, this trend for the position is quite surprising to me, as I was under the impression rushing stats for quarterbacks would be at least as fluky as running backs.
We do need to dive deeper, as this heat map takes into account all quarterback data points from 2012-2022, as seen from the scatter plot of the strongest correlation stat from the heat map, carries vs. carries_last:
Again, quite a bit of noise when trying to evaluate the data from a fantasy perspective. Far too many non-rushing quarterbacks are included in the data set.
This is what the heat map looks like for rushing quarterbacks (I define a quarterback to be rushing when they have more than 20 carries in consecutive seasons):
The correlations do weaken, but rushing yards and carries are still quite reliable stats. In fact, their year-over-year correlations are stronger than those for passing yards (0.57) and passing attempts (0.58) for starting quarterbacks during this same time period.
While this might not be the most apt comparison, I still find it somewhat eye-opening.
I believe this should be taken with a grain of salt, as the average rushing yardage per season for the rushing quarterbacks above is 264.67. Meanwhile, the average passing yardage for starting quarterbacks per season, during that same time frame, is 4010.67.
What does that mean?
Well, while rushing quarterbacks likely improve their rushing yardage at a better rate than starting quarterbacks improve their passing yardage, the fantasy gains from the rushing side are probably not as valuable as those from the passing side.
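To put rough numbers on that, here is a back-of-the-envelope sketch. The scoring values are an assumption on my part (a common default of 1 point per 10 rushing yards and 1 point per 25 passing yards; your league may differ), and the 10% improvement is purely hypothetical. Only the two yardage averages come from the analysis above.

```python
# Assumed scoring, not taken from the analysis: common default values.
POINTS_PER_RUSH_YARD = 0.1    # 1 point per 10 rushing yards
POINTS_PER_PASS_YARD = 0.04   # 1 point per 25 passing yards

AVG_RUSH_YARDS = 264.67    # rushing quarterbacks, per season (from the text)
AVG_PASS_YARDS = 4010.67   # starting quarterbacks, per season (from the text)

def points_gained(avg_yards, points_per_yard, improvement):
    """Fantasy points added by a fractional improvement in yardage."""
    return avg_yards * improvement * points_per_yard

improvement = 0.10  # hypothetical 10% year-over-year gain
rush_gain = points_gained(AVG_RUSH_YARDS, POINTS_PER_RUSH_YARD, improvement)
pass_gain = points_gained(AVG_PASS_YARDS, POINTS_PER_PASS_YARD, improvement)
print(f"rushing: +{rush_gain:.1f} pts, passing: +{pass_gain:.1f} pts")
```

Under these assumptions, the same percentage improvement is worth several times more on the passing side, simply because the passing baseline is so much larger.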
It still is exciting to see that rushing numbers are replicable when additional filters are applied. Rushing quarterbacks, with the evidence we have so far, can repeat their success on a yearly basis.
Note, the sample size is still decent enough with this definition for rushing quarterbacks, as we have 43 such qualifying seasons.
I would like to also point out that fantasy points only have a moderate correlation (0.49) and rushing touchdowns are essentially noise (0.15).
In this section, we go further into the rushing stats behind both the running back and quarterback position.
One motif that appeared in the quarterback passing stability analysis was the difference in correlation values between young and old quarterbacks.
Unfortunately, no such pattern appeared in this case, for backs that had consecutive seasons of over 100 carries.
Young - 26 and under for current season - is on the left, and old - 26 and over for the previous season - is on the right:
The same applies when using 28 as a threshold too (young on the left and old on the right again):
Not really much predictability to go off. Running back appears to be a very noisy position, with little consistency from year-to-year, when it comes to fantasy relevancy.
The only heat map I was able to create with some strong correlations for rushing yards and fantasy points had the following filter: over 200 carries in consecutive seasons and 28 years old in the previous season.
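Since the young/old definitions use different seasons as the reference point, a short sketch may help. It assumes each data point carries the player's age in both the current and previous season. The rows and the `split_by_age` helper are hypothetical.

```python
# Hypothetical season pairs that already passed the carries filter. Not real data.
pairs = [
    {"player": "A", "age": 24, "age_last": 23},
    {"player": "B", "age": 26, "age_last": 25},
    {"player": "C", "age": 27, "age_last": 26},
    {"player": "D", "age": 30, "age_last": 29},
]

def split_by_age(pairs, threshold):
    """Young: at most `threshold` years old in the current season.
    Old: at least `threshold` years old in the previous season."""
    young = [p["player"] for p in pairs if p["age"] <= threshold]
    old = [p["player"] for p in pairs if p["age_last"] >= threshold]
    return young, old

young, old = split_by_age(pairs, threshold=26)
print(young, old)
```

As long as a player ages one year between seasons, the two groups do not overlap: "old" ends up meaning at least one year past the threshold in the current season.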
The resulting heat map:
That's a super strong correlation for rushing yards, at 0.83, and pretty strong for fantasy points at 0.68. Maybe we finally have something?
Looking further, in the scatter plot for rushing yards vs. rushing yards last, we see that our sample size is, unfortunately, quite limited:
Only ten such seasons fulfill the criteria from the filter applied. Moreover, only five backs make up the list:
I guess I could see Henry staying relevant and making this list if he continues to get 200 carries a season, which may happen for another year or two. Otherwise, it's not a super helpful heat map and correlation find - these players all had really solid-to-fantastic careers.
From a basic stat perspective, predicting running back (fantasy) success appears to be pretty futile.
If we apply an even stronger filter to what a rushing quarterback is - more than 40 carries in consecutive seasons - we see that rushing correlations weaken considerably, but there still exists a moderate correlation for rushing yards (0.43) and carries (0.47) from season-to-season:
The sample size for this applied filter is 20 such seasons, so this subset of the data is probably just large enough to have some semblance of reliability.
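One way to gauge how much weight a correlation from a small sample can bear is an approximate confidence interval via the Fisher z-transform. This is a standard textbook technique that I am adding as a sketch; it was not part of the original analysis. It assumes the season pairs are independent and roughly bivariate normal, which is optimistic here since the same quarterback can appear more than once.

```python
from math import atanh, sqrt, tanh

def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r from n pairs."""
    z = atanh(r)                       # Fisher z-transform of r
    half_width = z_crit / sqrt(n - 3)  # standard error is 1 / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)

# Rushing yards correlation and sample size for the 40-carry filter.
low, high = fisher_interval(r=0.43, n=20)
print(f"r = 0.43, n = 20 -> ({low:.2f}, {high:.2f})")
```

The interval widens quickly as the sample shrinks, so it is worth running this for any of the heavily filtered heat maps before leaning on a single number.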
Here's the scatter plot for the strongest correlation in the heat map above, rushing yards vs. rushing yards last:
I want to note that the oldest any quarterback was in the current season of the data points above was 26. Time will tell, in this new era of quarterback play, whether rushing is sustainable into late age.
Aside from potential longevity concerns, it is good to know that there exists some consistency in the realm of rushing quarterbacks, even if it is a bit weaker than passing stats for starting quarterbacks.
While probably not that useful for fantasy predictions, another interesting stat I noticed is that rushing quarterbacks (quarterbacks with more than 20 carries in consecutive seasons) have a higher rate of a successful play compared to bellcow running backs (backs with more than 200 carries in consecutive seasons).
Rushing quarterbacks had a successful play rate of 55.47%, whereas bellcows had a successful play rate of 39.37%.
Combine that with the fact that rushing quarterbacks average 5.24 yards a carry, compared to 4.30 for bellcows, and I can see why everyone wants a rushing quarterback. Additionally, I figure this adds to the narrative that the bellcow is dead.
Two final graphs for you, showing the same correlations on running back data (from 2016-2022) with different filters. The first shows a correlation heat map of backs with over 100 carries in consecutive seasons:
The second shows a correlation heat map of backs with over 200 carries in consecutive seasons:
In both cases, the standout correlations are avg_time_to_los_mean vs. avg_time_to_los_mean_last (0.59 and 0.59) and rush_yards_over_expected_sum vs. rush_yards_over_expected_sum_last (0.48 and 0.55). Neither is a particularly strong correlation, but it is interesting to see that good backs improve upon exceeding expectations in the running game, and that good backs take longer and longer to get to the line of scrimmage.
For more information on the advanced stats in the heat map above, I recommend checking out this data dictionary.
With all of this digging for stats that are predictable year-over-year, how does this relate back to fantasy?
Well, ideally, we want to find stats that are predictable and positively so (sticky stats) from year-to-year, that also contribute (significantly) to fantasy production.
For example, we saw that rushing yards vs. rushing yards last had a correlation of 0.74, when looking at the unfiltered rushing data. Now, if rushing yards correlates strongly with fantasy production, this would mean that rushing yardage would be a pretty good indicator of fantasy performance.
The heat maps below show the correlation between fantasy points (current and previous season) and rushing stats that had at least a moderate correlation (at least 0.40) with themselves on a year-to-year basis.
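The selection step described above is a simple threshold on each stat's year-over-year self-correlation. Here is a minimal sketch using the running-back-only values quoted in this article. The dictionary keys and the `sticky_stats` helper are hypothetical names, not the column names in the underlying data.

```python
STICKY_THRESHOLD = 0.40  # "at least moderate", as defined in the text

# Year-over-year self-correlations quoted in this article for the
# running-back-only data set. Key names are hypothetical.
self_corr = {
    "rushing_yards": 0.64,
    "carries": 0.65,
    "successful_plays": 0.65,
    "rushing_tds": 0.53,
}

def sticky_stats(self_corr, threshold=STICKY_THRESHOLD):
    """Return the stats whose self-correlation meets the threshold."""
    return [stat for stat, r in self_corr.items() if r >= threshold]

kept = sticky_stats(self_corr)                     # candidates for the heat map
stricter = sticky_stats(self_corr, threshold=0.60)
print(kept, stricter)
```

Only the stats that survive this step are then compared against fantasy points. Raising the threshold to 0.60, for example, would drop rushing touchdowns from the running back heat map.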
The first one is generated from the unfiltered data set of rushing stats from 2012-2022:
I find it strange that fumbles might end up being the best predictor for fantasy points, from a rushing perspective. I believe these strange results are coming from the grouping of running backs and quarterbacks.
Filtering the data for only running backs yields the following graph:
There are far more stats that have at least moderate year-to-year correlation and also correlate well with fantasy points in either the current or previous season. The three strongest predictors, and they are very strong, appear to be rushing yards, successful plays, and carries (current or previous seasons).
Furthermore, each stat has a decently strong correlation with itself, year-over-year: 0.64 for rushing yards, 0.65 for carries, and 0.65 for successful plays. Exciting to see that rushing stats for all running backs are predictable.
What becomes disappointing is that there is no stat that has at least a moderate correlation (0.40) with itself on a year-to-year basis and a moderate correlation with fantasy points simultaneously. It seems fair to expect backs to improve over time, but once they receive a fantasy relevant workload (over 100 carries), all bets are off. This is a point I intend to revisit later, with other statistical techniques, to see if there might be something useful lurking under the surface.
Let's take a look at only quarterbacks now:
None of these predictors are particularly strong. Perhaps the situation will improve as we remove the noise from quarterbacks who do not rush much.
Applying a filter of over 20 carries to quarterbacks:
It's a little better, but still not spectacular. Finally, the filter of over 40 carries applied to quarterbacks:
This is the best yet! The correlations between fantasy points and successful plays, and between fantasy points and carries, are moderate-to-strong. Additionally, carries has an okay year-to-year correlation with itself of 0.47. The same correlation for successful plays is 0.43. Not great, not bad.
I would say that on this evidence, in context with the rest of this research, rushing numbers for quarterbacks are sustainable year-over-year and for fantasy success. However, passing stats are still more reliable on a year-to-year basis, and as indicators of fantasy success.
On that note, I'll wrap up the analysis with the main takeaways:
Thanks for reading!
Cheers,
Alex
Term definitions below:
For more definitions and data tables, please check out this set of data dictionary definitions.
Unfiltered data from 2012-2022, on correlation year-to-year on simple stats:
Unfiltered data from 2016-2022, on correlation year-to-year on simple and advanced stats:
I would recommend taking advantage of the graph's interactivity to explore this heat map.
Unfiltered data from 2016-2022, on all stats - correlations amongst current and previous season stats:
As above, I would recommend taking advantage of the graph's interactivity to explore this heat map.