Competent quarterback play is a prerequisite for success in the NFL. Of the last 35 Super Bowls, 32 have featured at least one current Pro Football Hall of Fame quarterback, or a QB who seems likely to eventually be inducted (we're operating with the belief that Tom Brady, Aaron Rodgers, Ben Roethlisberger, Drew Brees and Patrick Mahomes are Canton-bound).
Nevertheless, football remains the ultimate team sport, and attributing a team's overall success solely to who aligns under center is an exercise rife with problematic assumptions. Yet, isolating a quarterback's play from his supporting cast (and his play-caller) remains elusive, statistically speaking.
Since the inception of the Next Gen Stats project, our focus has been to generate insightful analysis from cutting-edge metrics derived from on-field tracking technology. We started with statistics like how far the ball traveled in the air (i.e., air distance), the distance between a receiver and the nearest defender (i.e., separation) and even how fast the passer was traveling when he let go of the pass (i.e., quarterback speed). As our toolbox of statistics grew, so too did the complexity of our metrics. In 2018, we debuted completion probability, a predictive model that estimates the chances of a completed pass (using metrics like air distance, separation and quarterback speed as features in the model). That same year, our team developed a different model to determine how many yards a receiver should be expected to gain after he catches the ball (expected yards after catch). These models help contextualize outstanding performances, though each one, in isolation, paints an incomplete statistical picture.
Enter the Next Gen Stats Passing Score powered by AWS.
We teamed up with the AWS Proserve data science group to develop a more comprehensive metric for evaluating passing performance. Built on seven different AWS-powered machine-learning models, the NGS Passing Score seeks to assess a quarterback's execution on every pass attempt and transform that evaluation into a digestible score ranging from 50 to 99. The score can be aggregated over any sample of pass attempts while still maintaining validity in rank order (more on this later). Before we dive into the passing score formula, it is important to remember why such a metric is needed.
A brief history of advanced passing statistics
Reducing passing performance to a single number is not a new concept. Almost exactly a half century ago, then-NFL Commissioner Pete Rozelle asked a league committee to create a metric that could be used to crown the best passer in the league. Thus, passer rating was born. To this day, passer rating still reigns supreme as the most widely used metric to represent passing performance. The problem? Well, there are just too many to list. Chief among the criticisms of the metric: the formula hasn't changed over the last 50 years while the league's passing environment has continued to evolve.
Over the last few decades, there have been several worthy attempts to improve upon the validity of passer rating. Pro Football Focus has an army of charters who famously grade every play, which feeds into aggregate grades of their own. ESPN's Stats & Info group created QBR using the context of expected points. Aaron Schatz of Football Outsiders developed Defense-adjusted Value Over Average (DVOA for short), which adjusts for the strength of the opponent in the play-aggregation formula.
All of these statistical endeavors share a common goal: to isolate the contributions of a quarterback from the team's collective passing production and efficiency. Regardless of the difficulty of a throw, traditional box score statistics (like yards and touchdowns) treat a 72-yard touchdown pass the same whether it was thrown 40 yards in the air (hitting the receiver in stride) or thrown to a wide-open receiver 6 yards downfield (who then outran the entire defense). The NGS Passing Score, like other metrics before it, seeks to improve on the limitations of the traditional box score.
A model to predict the value of a pass, before the throw
We mentioned that the Next Gen Stats Passing Score is derived from a combination of seven different machine learning models. They are (listed in order of development): (I) completion probability, (II) expected yards after catch, (III) expected points, (IV) win probability -- and our newest models -- (V) interception probability (or pass outcome probability), (VI) predicted yards and (VII) predicted expected points added. A majority of these were already a part of the NGS toolbox. However, there was still a missing piece. A quarterback makes his decision to throw a pass not knowing the precise location of where his receivers (and opposing coverage defenders) will be when the pass arrives. So, we set out to replicate the quarterback's decision process, and predict the value of a pass attempt before the ball is thrown. By estimating the expected value of a pass attempt, we can more effectively evaluate quarterback efficiency -- relative to a league-average baseline -- and, in future iterations of the score, quarterback decision-making.
The output of our new predicted yards model serves as the basis for our play expectation metric. That is, since we can measure how well an average quarterback would perform in a given situation (by leveraging our completion probability model), we can control for the difficulty of each pass. But that's not all: By going deeper into the outputs of our collective NGS predictive models, we find valuable data points that serve as the components for our new passing score.
How the NGS Passing Score works
Instead of simply awarding all passing yards, touchdowns and interceptions to the quarterback, the NGS Passing Score equation leverages the outputs of our models to form the components that best ...
- Evaluate passing performance relative to a league-average expectation.
- Isolate the factors that the quarterback can control.
- Represent the most indicative features of winning football games.
- Encompass passing performance in a single composite score (ranging from 50 to 99).
- Generate valid scores at any sample size of pass attempts.
With a collection of powerful AI-driven tools in hand, it is only fitting that we put the pieces together to form the seven measurable components that make up our new passing score (listed in order of weight in the formula):
(I) Expected Points Added Over Expected (EPAOE) accounts for 46 percent of the passing score. EPAOE measures production relative to an expected value (using our new predicted yards model) and is calculated as the difference between the actual value of a pass and its predicted value before the ball is thrown, accounting for the probability of each pass outcome (e.g., completion, incompletion or interception).
(II) Expected Points Added (EPA) accounts for 18 percent of the passing score. Instead of quantifying the success of a play in terms of yards gained, EPA represents success in terms of points added relative to the current play.
(III) Completion Percentage Over Expected (CPOE) accounts for 11 percent of the passing score. CPOE is a derivative of completion probability, which measures the success of a pass relative to the difficulty of the throw. The CPOE feature used in the score does adjust for dropped passes.
(IV) Interception Probability (INT Probability) accounts for 11 percent of the passing score. INT Probability measures the likelihood that a pass will be intercepted if thrown.
(V) Air Expected Points Added (Air EPA) accounts for 7 percent of the passing score. Air EPA is equal to the value of a completion plus the yards a receiver would be expected to gain after the catch. Air EPA is a proxy for the optimal reward of a pass within the control of the quarterback.
(VI) Expected Air EPA (xAir EPA) accounts for 7 percent of the passing score. xAir EPA is equal to the value of a completion (plus expected YAC), relative to the likelihood of a completion (i.e., completion probability).
(VII) Win Probability (WP) is not a feature in the model, but it is used as an aggregation play-weight. On any given play, the offense's pre-snap win probability is used as a weight in the passing score formula: a win probability near 50 percent receives a weight of one, while a win probability near 10 or 90 percent receives a weight of about 0.6.
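Two of the per-play components, EPAOE (built on EPA) and CPOE, reduce to simple arithmetic once the model outputs are in hand. The sketch below is illustrative only and rests on two assumptions not spelled out above: that the pre-throw value is a probability-weighted average of each outcome's EPA, and that the drop adjustment in CPOE credits a dropped pass to the passer as a completion. All function names are hypothetical, and the probabilities and EPA values would come from the NGS models, which are not reproduced here.

```python
import math
from typing import Mapping, Optional, Sequence


def epa(ep_before: float, ep_after: float) -> float:
    """Expected points added: expected points after the play minus before it."""
    return ep_after - ep_before


def expected_pass_value(outcome_probs: Mapping[str, float],
                        outcome_epa: Mapping[str, float]) -> float:
    """Assumed pre-throw value: each outcome's EPA weighted by its probability."""
    if not math.isclose(sum(outcome_probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * outcome_epa[name] for name, p in outcome_probs.items())


def epaoe(actual_epa: float, outcome_probs: Mapping[str, float],
          outcome_epa: Mapping[str, float]) -> float:
    """EPA over expected: realized EPA minus the pre-throw expectation."""
    return actual_epa - expected_pass_value(outcome_probs, outcome_epa)


def cpoe(completed: Sequence[bool], completion_prob: Sequence[float],
         dropped: Optional[Sequence[bool]] = None) -> float:
    """Completion percentage over expected, in percentage points.

    Assumption: a dropped pass counts as a completion for the passer.
    """
    if dropped is None:
        dropped = [False] * len(completed)
    if not (len(completed) == len(completion_prob) == len(dropped)):
        raise ValueError("inputs must have equal length")
    if not completed:
        raise ValueError("need at least one pass attempt")
    # 1.0 if the pass was caught (or, by assumption, dropped), else 0.0
    credited = [1.0 if c or d else 0.0 for c, d in zip(completed, dropped)]
    residuals = [y - p for y, p in zip(credited, completion_prob)]
    return 100.0 * sum(residuals) / len(residuals)


# Hypothetical inputs for one pass attempt (not real NGS model output).
probs = {"complete": 0.60, "incomplete": 0.37, "interception": 0.03}
values = {"complete": 1.2, "incomplete": -0.5, "interception": -4.0}
print(round(epaoe(2.5, probs, values), 3))
```

A positive EPAOE means the pass produced more value than a league-average expectation for that throw; a positive CPOE means the passer completed more throws than their difficulty would predict.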
Each component is converted to a standardized z-score based on the population of all pass attempts from the 2018 through 2021 seasons (n = 70,439). To keep any one component from dominating the score, each z-score is clipped at 3 or 4 standard deviations (depending on the component) below and above the mean. The linear combination of these components makes up an individual play score that ranges from 50 to 100.
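The standardize, clip and combine steps can be sketched as follows. This is a minimal illustration, assuming the listed percentages are used directly as linear weights, that interception probability enters with a negative sign (a higher chance of a pick is worse), and a hypothetical default clip of 3 standard deviations that can be raised to 4 per component. The component keys and function names are hypothetical. The final mapping from this composite onto the 50-to-100 play-score scale is not described, so the sketch stops at the weighted composite.

```python
from statistics import mean, pstdev

# Component weights from the list above (they sum to 1.0).
WEIGHTS = {
    "epaoe": 0.46, "epa": 0.18, "cpoe": 0.11,
    "int_prob": 0.11, "air_epa": 0.07, "xair_epa": 0.07,
}
# Assumption: higher interception probability is worse, so its z-score is negated.
SIGNS = {name: -1.0 if name == "int_prob" else 1.0 for name in WEIGHTS}


def fit_standardizer(population):
    """Population mean and standard deviation of each component."""
    stats = {}
    for name in WEIGHTS:
        values = [play[name] for play in population]
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"component {name!r} has zero variance")
        stats[name] = (mean(values), sd)
    return stats


def composite_z(play, stats, clip_limits=None):
    """Weighted sum of clipped z-scores for one pass attempt.

    clip_limits maps a component to its clip in standard deviations; the
    3.0 default is a hypothetical stand-in for the "3 or 4" in the text.
    """
    clip_limits = clip_limits or {}
    total = 0.0
    for name, weight in WEIGHTS.items():
        mu, sd = stats[name]
        z = (play[name] - mu) / sd
        limit = clip_limits.get(name, 3.0)
        z = max(-limit, min(limit, z))  # clip extreme values
        total += weight * SIGNS[name] * z
    return total
```

Clipping caps how far a single freak value can move the composite: with a 3-SD clip, EPAOE alone can contribute at most 0.46 × 3 = 1.38.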
Now onto the aggregation step: Take the average of individual play scores, weighted by the offense's pre-play win probability using a parabolic shape centered at 0.5. In other words, plays in close games carry more weight in the formula than plays closer to the extremes. Plays with less than 10 percent or greater than 90 percent win probability are worth roughly 60 percent less in the aggregation formula than a play in a game at 50-50 odds.
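A parabolic weight centered at 0.5 and the weighted average it feeds can be sketched like this. The exact curve is not published; as an assumption, the sketch uses w(p) = 1 - 2.5(p - 0.5)², calibrated to the weights stated for component (VII): 1.0 at 50 percent and 0.6 at 10 or 90 percent. The function names are hypothetical.

```python
def wp_weight(win_prob: float) -> float:
    """Parabolic play weight centered at 0.5 (hypothetical calibration).

    w(p) = 1 - 2.5 * (p - 0.5) ** 2 gives 1.0 at p = 0.5 and 0.6 at p = 0.1 or 0.9.
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win probability must be between 0 and 1")
    return 1.0 - 2.5 * (win_prob - 0.5) ** 2


def aggregate_scores(play_scores, win_probs):
    """Win-probability-weighted average of individual play scores."""
    if len(play_scores) != len(win_probs) or not play_scores:
        raise ValueError("need one win probability per play score")
    weights = [wp_weight(p) for p in win_probs]
    return sum(w * s for w, s in zip(weights, play_scores)) / sum(weights)


# Two hypothetical plays: one at 50-50 odds, one at a 10 percent win probability.
print(aggregate_scores([90.0, 70.0], [0.5, 0.1]))
```

Here the 90 from the close-game play pulls the average to 82.5 rather than the unweighted 80, because the lopsided-game play counts only 0.6 as much.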
But that's not all. To account for small-sample bias, the passing score aggregation formula uses the James-Stein estimator to "shrink" estimates toward the population average. This Bayesian approach is a well-known technique for handling small-sample issues, famously applied to predicting batting averages in baseball. Because the NGS Passing Score leverages this solution, we can evaluate quarterback play at the season, game and situational levels while still maintaining a consistent distribution shape of scores.
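For readers unfamiliar with the estimator, here is the textbook positive-part James-Stein form for k ≥ 4 group means that share a known sampling variance `sigma2`, shrinking each mean toward the grand mean. How the NGS formula estimates that variance, or handles players with different numbers of attempts, is not described, so treat this as an illustration of the technique rather than the score's implementation; `james_stein` is a hypothetical name.

```python
from statistics import fmean


def james_stein(means, sigma2):
    """Positive-part James-Stein shrinkage toward the grand mean.

    Assumes every observed mean has the same known sampling variance sigma2.
    """
    k = len(means)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 groups")
    grand = fmean(means)
    spread = sum((x - grand) ** 2 for x in means)
    if spread == 0:
        return [grand] * k
    # Fraction of each deviation from the grand mean that survives shrinkage.
    keep = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [grand + keep * (x - grand) for x in means]
```

The larger `sigma2` is relative to the observed spread of the means, the harder every estimate is pulled toward the average; with enough noise, all estimates collapse to the grand mean.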
A passing stat that correlates with wins
So how well does the NGS Passing Score correlate with winning football games?
We took a look at 202 individual seasons from 88 different quarterbacks over the last four years, grouped each season into five-point buckets (95-plus, 90 to 95, 85 to 90, etc.), and compared the win-loss record and percentage of playoff berths in each bucket.
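The bucketing can be sketched as below. It is a minimal version, assuming lower bucket bounds are inclusive (a score of exactly 90 lands in "90 to 95"), that ties are left out of win percentage, and that each season is a hypothetical `(score, wins, losses, made_playoffs)` tuple; the function names are hypothetical too.

```python
import math
from collections import defaultdict


def bucket_label(score: float, width: int = 5, top: int = 95) -> str:
    """Five-point bucket for a season score; lower bounds are inclusive (assumed)."""
    if score >= top:
        return f"{top}-plus"
    low = math.floor(score / width) * width
    return f"{low} to {low + width}"


def summarize_buckets(seasons):
    """Win percentage and playoff rate per bucket.

    seasons: iterable of (score, wins, losses, made_playoffs); ties are ignored.
    """
    totals = defaultdict(lambda: {"wins": 0, "losses": 0, "playoffs": 0, "seasons": 0})
    for score, wins, losses, made_playoffs in seasons:
        row = totals[bucket_label(score)]
        row["wins"] += wins
        row["losses"] += losses
        row["playoffs"] += int(made_playoffs)
        row["seasons"] += 1
    summary = {}
    for label, row in totals.items():
        games = row["wins"] + row["losses"]
        summary[label] = {
            "win_pct": row["wins"] / games if games else float("nan"),
            "playoff_pct": row["playoffs"] / row["seasons"],
            "seasons": row["seasons"],
        }
    return summary
```

Pooling wins and losses within a bucket (rather than averaging per-season percentages) weights every game equally; either choice is defensible.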
The relationship between a player's single-season NGS Passing Score and winning percentage is quite strong. A score around 85 indicates a winning percentage near the .500 mark. Above 85, a team is more than likely winning with, rather than in spite of, its quarterback. A score of 90-plus -- those are the players you win because of.
Grouping single-season scores into five-point buckets reveals clear thresholds for quality of play. The distribution of passing scores points to 80 as a rough Mendoza Line for starting-level QB performance. Quarterbacks falling below that line are often young players acclimating to the league, or replacement-level talent that teams will look to upgrade on in the following season. Quarterbacks with a score between 80 and 90 are a mix of guys you win in spite of and guys you win with. Finally, passing scores above 90 can be roughly considered elite: the players you win because of. Over the last four seasons, 10 quarterbacks have finished the season with a score of 95-plus. Only two (Deshaun Watson in 2020 and Derek Carr in 2019) played for teams that failed to make the playoffs in the season in which they reached that mark.
The tradeoff between correlation and stability
So how does our passing score compare with the PFF passing grade and ESPN's QBR in validity and reliability? Across identical samples, we investigated the correlation between each metric and measures of winning football games (represented by win percentage, playoff percentage and whether the team had a winning season). In addition to correlation tests, we set out to evaluate the year-over-year stability of each metric. That is, for a specific player, how similar was the player's metric from one season to the next?
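Both checks come down to a correlation coefficient. The article does not say which coefficient was used, so the sketch below assumes Pearson's r: year-over-year stability correlates each player's metric in one season with the same player's metric in the next, and correlation with winning pairs the metric with win percentage (with a 0/1 outcome such as "winning season," Pearson's r is the point-biserial correlation). The data layout, a dict keyed by `(player, season)`, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_over_year_pairs(scores):
    """Pair each player's metric in season t with his metric in season t + 1.

    scores: dict mapping (player, season) -> metric value.
    """
    current, following = [], []
    for (player, season), value in sorted(scores.items()):
        nxt = scores.get((player, season + 1))
        if nxt is not None:
            current.append(value)
            following.append(nxt)
    return current, following
```

Stability is then `pearson(*year_over_year_pairs(scores))`, and players with no consecutive seasons in the sample simply drop out of the pairing.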
The NGS Passing Score has a stronger correlation with win percentage, making the playoffs and finishing with a winning season than both the PFF passing grade and QBR, while the PFF passing grade tops the chart in year-over-year stability. There is a trade-off between correlation and stability, and a higher number is not always better. A metric can be highly correlated with winning games but not representative of the individual quarterback's skill level. Conversely, a metric high in stability might not correlate with any meaningful outcome of value. Nevertheless, the decisions made during the modeling and aggregation process were aimed at maximizing both objectives.
More scores to come
This is just the beginning. We will continue to iterate and apply our scoring methodology to other advanced stats and position groups. The NGS Passing Score is one component of an even bigger score -- the Quarterback Score. That larger score requires components representing rushing performance, the ability to avoid pressure and sacks, and even the elusive analysis of determining the optimal target at every time stamp.