Welcome back to another analysis piece!
This time, I'm revisiting running back metrics to see if there's something I missed in either the sticky analysis or yards per carry and carries pieces.
Yet again, the goal is to see if we can find any stat or set of stats that can be used as indicators of future fantasy performance. I will continue to use the 2012-2022 data set for this exploration.
The new angle on the same data takes into account the receiving work for backs. In previous explorations on this topic, I examined only the rushing stats of backs, which I now realize is a little silly, because fantasy points for backs are split fairly evenly between rushing and receiving.
If this surprises you, it surprised me as well, and this bar graph captures the motivation quite well for this piece:
Generally speaking, receiving work matters for a back's fantasy relevance. In fact, the split of fantasy production between rushing and receiving is about 50-50 when looking at all running back seasons from 2012-2022. The graph above seems to reflect this, with younger backs producing more on the ground and older ones through the air.
For those who would prefer this type of visualization to directly compare where the fantasy points are coming from for backs:
Before moving on, I do want to emphasize that these bar graphs are displaying the results across the whole data set.
However, if we dig further and only take a look at fantasy relevant backs, we see that the receiving is slightly less important:
The split is now about 60-40 in favor of rushing. Still, receiving production accounts for a significant portion of fantasy points, even for fantasy relevant backs.
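For readers who want to reproduce a rushing versus receiving split like the ones above, here is a minimal sketch. It assumes standard PPR scoring (0.1 points per yard, 6 per touchdown, 1 per reception) and ignores fumbles. The column names (`rush_yds`, `rush_td`, `rec`, `rec_yds`, `rec_td`) and the two toy rows are placeholders of mine, not the actual data set.

```python
import pandas as pd

def ppr_split(df: pd.DataFrame) -> pd.DataFrame:
    """Split PPR points into rushing and receiving parts (fumbles ignored)."""
    out = df.copy()
    # Standard PPR scoring: 0.1 per yard, 6 per touchdown, 1 per reception.
    out["rush_pts"] = 0.1 * out["rush_yds"] + 6 * out["rush_td"]
    out["rec_pts"] = out["rec"] + 0.1 * out["rec_yds"] + 6 * out["rec_td"]
    total = out["rush_pts"] + out["rec_pts"]
    out["rush_share"] = out["rush_pts"] / total
    out["rec_share"] = out["rec_pts"] / total
    return out

# Toy rows (made up): one rushing-heavy back, one receiving-heavy back.
seasons = pd.DataFrame({
    "player": ["A", "B"],
    "rush_yds": [1000, 400],
    "rush_td": [10, 2],
    "rec": [20, 60],
    "rec_yds": [150, 500],
    "rec_td": [1, 3],
})
split = ppr_split(seasons)
print(split[["player", "rush_share", "rec_share"]].round(2))
```

Averaging `rush_share` and `rec_share` over a filtered set of seasons would give a split comparable to the 50-50 and 60-40 figures, under the scoring assumptions stated here.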
At this point, we want to re-calculate correlations of stats for backs to see if any of the totals or receiving stats 1) self-correlate and 2) correlate to fantasy production. Before moving on, I want to note that all of the bar graphs below are ordered from highest self-correlation (left) to lowest self-correlation (right).
Here are the general correlations for all data points from 2012-2022:
That's a bit intense, so let's take a look at a more focused version of the bar graph above - keeping only stats that have values of at least 0.5 for self and fantasy correlation:
The stats above are scattered between rushing, total, and receiving, and there are some strong correlations present. This is another validity check that we are on the right path, especially compared to previous analysis deep dives!
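The self-correlation, the fantasy correlation, and the 0.5 cutoff behind the focused view can be sketched roughly as follows. This is an illustration under my own assumptions: one row per player-season, Pearson correlation, and hypothetical column names (`player`, `season`, `ppr`, `targets`, `noise`). The toy numbers are invented.

```python
import pandas as pd

def correlation_table(df: pd.DataFrame, stats: list, target: str = "ppr") -> pd.DataFrame:
    # Shift each row forward one season so it lines up with the
    # same player's following season.
    prev = df[["player", "season"] + stats].copy()
    prev["season"] += 1
    pairs = df.merge(prev, on=["player", "season"], suffixes=("", "_prev"))
    rows = []
    for stat in stats:
        rows.append({
            "stat": stat,
            # Year-over-year correlation of the stat with itself.
            "self_corr": pairs[stat].corr(pairs[f"{stat}_prev"]),
            # Same-season correlation of the stat with fantasy points.
            "fantasy_corr": df[stat].corr(df[target]),
        })
    # Highest self-correlation first, matching the bar graph ordering.
    table = pd.DataFrame(rows).sort_values("self_corr", ascending=False)
    return table.reset_index(drop=True)

def focused(table: pd.DataFrame, cutoff: float = 0.5) -> pd.DataFrame:
    keep = (table["self_corr"] >= cutoff) & (table["fantasy_corr"] >= cutoff)
    return table[keep].reset_index(drop=True)

# Toy data (made up): "targets" is stable year to year, "noise" is not.
toy = pd.DataFrame({
    "player": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "season": [2020, 2021, 2022] * 3,
    "targets": [10, 20, 30, 40, 50, 60, 70, 80, 90],
    "noise": [5, 1, 4, 2, 6, 3, 4, 2, 5],
})
toy["ppr"] = 2 * toy["targets"]
table = correlation_table(toy, ["noise", "targets"])
focused_table = focused(table)
print(focused_table)
```

The inner merge keeps only player-seasons that have a previous season on record, so players without consecutive seasons drop out of the self-correlation.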
Let's apply a stronger filter to remove noise from our data set: at least 10 games played in both season X (current) and season X - 1 (previous).
Again, the focused view:
This is another checkpoint confirming that our investigation is heading in the right direction, as there is a balance of rushing, total, and receiving stats.
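The 10-game filter used for this view could look something like the sketch below. The column names (`player`, `season`, `games`) and the toy rows are assumptions for illustration only.

```python
import pandas as pd

def both_seasons_min_games(df: pd.DataFrame, min_games: int = 10) -> pd.DataFrame:
    # Attach each player's games played in season X - 1 to their season X row.
    prev = df[["player", "season", "games"]].copy()
    prev["season"] += 1
    prev = prev.rename(columns={"games": "games_prev"})
    merged = df.merge(prev, on=["player", "season"], how="inner")
    # Require the threshold in both the current and the previous season.
    keep = (merged["games"] >= min_games) & (merged["games_prev"] >= min_games)
    return merged[keep].reset_index(drop=True)

# Toy rows (made up): only player A clears 10 games in both seasons.
toy_games = pd.DataFrame({
    "player": ["A", "A", "B", "B", "C"],
    "season": [2021, 2022, 2021, 2022, 2022],
    "games": [16, 12, 8, 15, 16],
})
filtered = both_seasons_min_games(toy_games)
print(filtered)
```

Player C is dropped because there is no previous season to check, and player B is dropped because the previous season falls short of 10 games.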
Finally, narrowing the data set to our fantasy relevant backs:
The focused view for additional clarity:
Fascinating! When filtering our data to just fantasy relevant backs, the stats that matter most for predicting fantasy football production all appear to be receiving stats. That's not what I was expecting at all, and it makes me wonder how a model built on this information would perform on backs like Henry. I will follow up this investigation with an article on predictive modeling for backs that builds off the ideas found in this piece.
Upon this discovery, I immediately plotted some of the highest self-correlating stats, their combined values, and fantasy performance against age. I do want to note that age is calculated slightly differently in this section. Rather than keeping the starting age value, if a player changes age during the season (e.g., started the season at 24 and then turned 25 before the season ended), their age is calculated as 24.5. Moreover, in order for an age group (e.g., 24, 24.5, 25) to be considered, there must be at least ten samples of that age group performing at fantasy relevant standards from 2012-2022.
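The half-year age rule and the ten-sample minimum described above can be sketched like this. The function names, the `age` column, and the example dates are hypothetical choices of mine. The season start and end dates in particular would need to come from the real schedule.

```python
from datetime import date
import pandas as pd

def whole_years(birth: date, on: date) -> int:
    # Completed years of age on a given date.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def season_age(birth: date, season_start: date, season_end: date) -> float:
    start_age = whole_years(birth, season_start)
    end_age = whole_years(birth, season_end)
    # A birthday during the season puts the player in the ".5" bucket.
    return start_age + 0.5 if end_age > start_age else float(start_age)

def keep_common_ages(df: pd.DataFrame, min_samples: int = 10) -> pd.DataFrame:
    # Count rows per age group and keep groups with enough samples.
    counts = df["age"].map(df["age"].value_counts())
    return df[counts >= min_samples]

# Example dates (made up): an in-season birthday versus an offseason one.
in_season = season_age(date(1998, 10, 15), date(2022, 9, 8), date(2023, 1, 8))
offseason = season_age(date(1998, 3, 1), date(2022, 9, 8), date(2023, 1, 8))

# Toy frame (made up): ten rows at 24.5 and only three at 30.0.
toy_ages = pd.DataFrame({"age": [24.5] * 10 + [30.0] * 3, "ppr": range(13)})
common = keep_common_ages(toy_ages)
print(in_season, offseason, sorted(common["age"].unique()))
```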
The following graph contains wopr_y, tgt_sh, and receptions, all of which are combined to create the new-metric. These four lines are then plotted with fantasy points, against age:
I encourage you to use the full interactive capabilities of the graph above. You can (de-)select lines on the graph by clicking on them on the legend. To restore the graph to its original state, double click on a line in the legend.
The three stats are practically mirrors of each other, so I recommend disabling them to see just the new-metric and fantasy points. While these two lines are not identical to each other, the new-metric does a decent job of mimicking the trends in the fantasy points line.
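This piece does not spell out how the three stats are combined into the new-metric, so the sketch below shows just one plausible approach and should not be read as the actual formula. It puts each stat on a common scale with z-scores and averages them. The `combined_metric` helper, the equal weighting, and the toy values are all assumptions.

```python
import pandas as pd

def combined_metric(df: pd.DataFrame, cols: list) -> pd.Series:
    # Assumption: standardize each stat so that receptions (tens) and
    # target share (fractions) contribute on the same scale.
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)
    # Assumption: equal weights, so the metric is the mean z-score.
    return z.mean(axis=1)

# Toy rows (made up) using the stat names from the graph.
toy_stats = pd.DataFrame({
    "wopr_y": [0.2, 0.4, 0.6],
    "tgt_sh": [0.05, 0.10, 0.15],
    "receptions": [20, 40, 60],
})
toy_stats["new_metric"] = combined_metric(toy_stats, ["wopr_y", "tgt_sh", "receptions"])
print(toy_stats)
```

Adding another stat such as carries would only mean passing one more column name, which makes it easy to compare variants of the metric against fantasy points.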
Well, what happens if we include the rushing stat that most closely followed fantasy points? If we add carries as part of the new-metric, we get the following plot:
Again, I encourage you to take advantage of the interactivity of the graph. If we compare the freshly calculated new-metric against fantasy points, we see that adding carries improves the quality of our predictive indicator (look closely between the ages of 24.5 and 25.5).
Going further, we see that this new-metric correlates strongly with itself year-over-year, with fantasy points in the same season, and, using its previous season's value, with fantasy points in the current season:
Before diving into the results of the figure, I want to clarify the graph above. The blue bar shows the correlation of the stat on the x-axis with fantasy points (PPR) in the same season. The red bar shows the correlation of the x-axis stat measured in the previous season (or season X - 1) and fantasy points (PPR) in the current season (or season X). The green bar shows the correlation of the x-axis stat to itself, year-over-year.
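As a rough illustration of the red bar, the sketch below pairs a stat from season X - 1 with PPR points from season X and takes the Pearson correlation. The column names (`player`, `season`, `ppr`, `new_metric`) and the toy rows are hypothetical.

```python
import pandas as pd

def lagged_fantasy_corr(df: pd.DataFrame, stat: str, target: str = "ppr") -> float:
    prev = df[["player", "season", stat]].copy()
    # Shift season X - 1 rows forward so they line up with season X.
    prev["season"] += 1
    prev = prev.rename(columns={stat: f"{stat}_prev"})
    pairs = df.merge(prev, on=["player", "season"])
    # Previous-season stat versus current-season fantasy points.
    return pairs[f"{stat}_prev"].corr(pairs[target])

# Toy rows (made up) where last year's metric lines up perfectly
# with this year's points.
toy_lag = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B"],
    "season": [2020, 2021, 2022, 2020, 2021],
    "new_metric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ppr": [100.0, 110.0, 120.0, 130.0, 140.0],
})
red_bar = lagged_fantasy_corr(toy_lag, "new_metric")
print(round(red_bar, 3))
```

The blue and green bars follow the same pattern: correlate the stat with `ppr` in the unshifted frame for blue, and correlate the stat with its own `_prev` column for green.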
For fantasy indicator purposes, we are most interested in the red bar. While the new-metric has a strong correlation of about 0.82, it isn't the strongest of the x-axis stats.
This suggests that there might be an even better metric of fantasy performance, since I doubt a single stat would outperform a set of stats in this task. This is similar to the metrics I looked into that were better than the trinity score for wide receivers.
I am quite excited by the prospects of this and look forward to exploring the possibilities for a more optimal metric. I'll be sharing that, along with some experimental modeling, in the follow-up to this piece.
Thanks to all for joining me on another analysis piece! Once again, the recap:
Thanks for reading!
Cheers,
Alex
Term definitions below:
For stat definitions on the graphs, please check out this previous glossary and these vignettes.