Welcome to EvanMiya College Basketball Analytics! The main objective of our work is to assess college basketball team and player strength. We have created an advanced statistical metric, Bayesian Performance Rating (BPR), which quantifies how effective a team or player is, using advanced box-score metrics and play-by-play data. This metric is predictive in nature, which means that each rating is fine-tuned to predict performance in future games.
Note: Some of the methodology is slightly outdated, and will be updated soon.
There are several pages of analysis (plus several more that appear when appropriate):
Now for some more detail into how we get these numbers:
We have box score data available for every game played in each college basketball season, along with play-by-play data, which includes substitutions. The possession-by-possession data is the main component used to drive our analysis.
One key step that we take to get the best predictions from our data is to only look at possessions in a game that “mattered”. Analyzing possessions when the game is already well out of hand isn't as valuable to us as analyzing possessions when the winner hasn't been decided yet. We use the in-game naive win probability (which assumes that teams are equally matched) to assess when a game was out of hand. Once a team's win probability reaches 99%, we start down-weighting possessions, and once it exceeds 99.99%, we discard possessions entirely. In the rare situation where the trailing team mounts a comeback and the leading team's win probability sinks back below 99%, we start giving each possession full weight again.
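The weighting rule above can be sketched as a function of the leading team's naive win probability. This is a minimal sketch: the 99% and 99.99% thresholds come from the text, but the shape of the down-weighting between them is not specified, so the linear ramp (and the name `possession_weight`) is an assumption for illustration only.

```python
def possession_weight(win_prob: float,
                      start: float = 0.99,
                      cutoff: float = 0.9999) -> float:
    """Weight for one possession given either team's naive win probability.

    ASSUMPTION: the weight falls linearly from 1 at `start` to 0 at `cutoff`;
    the actual down-weighting curve is not described in the source.
    """
    # Whichever team is ahead has the larger win probability.
    leader = max(win_prob, 1.0 - win_prob)
    if leader < start:
        return 1.0  # game still in question (including after a comeback)
    if leader > cutoff:
        return 0.0  # game out of hand: possession is discarded
    # Between the thresholds: partial weight that shrinks as the lead grows.
    return (cutoff - leader) / (cutoff - start)
```

Because the weight depends only on the current win probability, a comeback that drops the leader back under 99% automatically restores full weight, matching the behavior described above.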
From a coach's perspective, every possession matters, even when your team has seemingly won or lost with minutes to spare. However, for predictive purposes, we can't properly assess the strength of a team when the teams aren't playing their normal lineups or aren't playing as hard as they would if the outcome of the game were still in question.
The purpose behind the Bayesian Performance Rating (BPR) at a team level is to give each team a true offensive and a true defensive rating that best explain all of the real game results we observed during the season. These can be used, along with the BPR ratings of the opposing team, to estimate each team's expected offensive and defensive efficiency (points scored and points allowed per 100 possessions, respectively) in a game. Taking the possession-by-possession results from each game, and adjusting for home-court advantage (more on that in a moment), we run a Bayesian regression to find the offensive (OBPR) and defensive (DBPR) coefficients for each team. These coefficients are designed to have 0 as the national average. Thus, very good teams have large positive offensive and defensive ratings. A team's overall BPR is simply the sum of its OBPR and DBPR.
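The regression described above can be illustrated with a minimal sketch, assuming a Gaussian prior centered at 0 on every coefficient. Under that assumption the posterior mode is a weighted ridge regression, which is what this code solves; it is a generic stand-in, not the site's actual model. The tuple layout of `possessions`, the single shared home-court coefficient (the site uses team-specific ones), and `prior_precision` are hypothetical choices. The offense gets +1 in its OBPR column and the defense gets -1 in its DBPR column, so a positive DBPR means fewer points allowed.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_team_bpr(possessions, teams, prior_precision=1.0):
    """Hypothetical team-rating fit.

    possessions: (offense, defense, points, weight, home) tuples, where
    home is +1 if the offense is at home, -1 if away, 0 if neutral.
    Returns ({team: (obpr, dbpr)}, home_court_coefficient).
    """
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    k = 2 * n + 1  # offense columns, defense columns, one home-court column
    wsum = sum(p[3] for p in possessions)
    # Weighted national average, in points per 100 possessions.
    mean = sum(100.0 * p[2] * p[3] for p in possessions) / wsum
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for off, dfn, pts, w, home in possessions:
        x = {idx[off]: 1.0, n + idx[dfn]: -1.0}
        if home:
            x[k - 1] = float(home)
        y = 100.0 * pts - mean  # centered so ratings average near 0
        for i, xi in x.items():
            xty[i] += w * xi * y
            for j, xj in x.items():
                xtx[i][j] += w * xi * xj
    # Gaussian prior at 0 adds its precision to the diagonal (ridge penalty).
    for i in range(k):
        xtx[i][i] += prior_precision
    beta = solve(xtx, xty)
    return {t: (beta[idx[t]], beta[n + idx[t]]) for t in teams}, beta[-1]
```

The `weight` field is where garbage-time down-weighting would plug in, and a larger `prior_precision` pulls every rating harder toward the national average.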
For example, in the 2019-20 season, 4th-ranked Baylor had an OBPR of 30.2 and a DBPR of 35.9. On the other hand, 319th-ranked Idaho had an OBPR of -23.0 and a DBPR of -13.5.
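Since a team's overall BPR is the sum of its OBPR and DBPR, the example ratings above combine as follows (the helper name `overall_bpr` is just for illustration):

```python
def overall_bpr(obpr: float, dbpr: float) -> float:
    """A team's overall BPR is the sum of its offensive and defensive ratings."""
    return obpr + dbpr


# 2019-20 ratings quoted in the text
ratings = {"Baylor": (30.2, 35.9), "Idaho": (-23.0, -13.5)}
for team, (obpr, dbpr) in ratings.items():
    print(f"{team}: BPR = {overall_bpr(obpr, dbpr):+.1f}")
# Baylor: BPR = +66.1
# Idaho: BPR = -36.5
```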
Our team ratings also incorporate team-specific home-court advantages and adjust for each team's pace of play. Pace is captured by the True Tempo metric, which is each team's adjusted pace of play: the pace it would be expected to play against a D1 team of average tempo.
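The text does not spell out the tempo model, so the following is only a sketch under an assumed convention: a game's possession count is modeled as `tempo_a * tempo_b / league_avg`, which makes a team's tempo equal to the pace it would play against an average-tempo opponent. The function name `true_tempo`, the input format, and the fixed-point iteration are all hypothetical and not the site's confirmed method.

```python
from collections import defaultdict


def true_tempo(games, iterations=50):
    """Estimate adjusted tempos by fixed-point iteration.

    games: list of (team_a, team_b, possessions) tuples, one per game.
    ASSUMED model: possessions = tempo_a * tempo_b / league_avg.
    """
    league_avg = sum(g[2] for g in games) / len(games)
    teams = {t for a, b, _ in games for t in (a, b)}
    tempo = {t: league_avg for t in teams}  # start everyone at the average
    for _ in range(iterations):
        total = defaultdict(float)
        count = defaultdict(int)
        for a, b, poss in games:
            # Rescale the observed pace to what each team "would have" played
            # against an opponent of exactly average tempo.
            total[a] += poss * league_avg / tempo[b]
            total[b] += poss * league_avg / tempo[a]
            count[a] += 1
            count[b] += 1
        tempo = {t: total[t] / count[t] for t in teams}
    return tempo
```

This runs a fixed number of iterations for simplicity; a fuller version would stop once the tempos change by less than some tolerance.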
In the Bayesian Performance Rating for players, each player has an Offensive BPR and a Defensive BPR, which are added together to make the player's overall BPR. Player BPR has two components: player impact and player efficiency.
The player impact part of BPR attempts to quantify a player's value to his team by looking at how efficiently his team performed on offense and defense for every possession he played. In addition, we want to adjust for the strength of his teammates on the court with him, along with the strength of the opposing players for each possession he was on the court. There are some good existing advanced metrics that attempt to do this, such as Adjusted Plus-Minus (APM). This type of metric focuses on the idea that a player's contribution to his team's margin of victory matters most. APM does not use any individual player statistics; instead, it uses the score outcome of each possession to determine which players are best at positively affecting the outcome of the game, in terms of offensive and defensive efficiency. Our player impact ratings are created in a similar fashion, but we make a few adjustments to address some of the weaknesses of this type of model, which we explain later on.
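In an APM-style setup, each possession becomes one row of a regression: the five offensive players each get +1 in their offensive-rating column, and the five defenders each get -1 in their defensive-rating column, so that a good defender ends up with a positive rating. A minimal sketch of building one such sparse row, where `possession_row` and the column layout are hypothetical:

```python
def possession_row(offense, defense, player_index):
    """Sparse design-matrix row (column -> value) for one possession.

    offense, defense: the five player IDs on the court for each side.
    player_index: maps each player ID to 0..n-1.
    Columns 0..n-1 hold offensive ratings; n..2n-1 hold defensive ratings.
    """
    n = len(player_index)
    row = {player_index[p]: 1.0 for p in offense}
    # Defenders get -1 so that a positive defensive rating means fewer points allowed.
    row.update({n + player_index[p]: -1.0 for p in defense})
    return row


players = [f"P{i}" for i in range(10)]
idx = {p: i for i, p in enumerate(players)}
row = possession_row(players[:5], players[5:], idx)
```

Pairing each row with that possession's centered points-per-100 outcome gives the data for a regression like the team-level one, just with player columns instead of team columns.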
Similar to the BPR team ratings, we want to assign a “true” offensive and defensive rating to each player, which indicates his value to his team when he is on the court. Very good players will have higher positive offensive and defensive ratings, with the average D1 player OBPR and DBPR being set at 0.
The main draw of this type of model is that it not only assesses a player's value to his team, but also accounts for the strength of the teammates he shares the court with and the strength of the opposing players he faces. A cruder measure of player impact, like plus-minus or basic team efficiency when he is on the floor, can be helpful, but it doesn't answer questions such as “Did he play with good teammates or bad teammates?” and “Did he play so well because he only played in garbage time against inferior opponents?” By using a model that adjusts for the strength of all players on the court, we can more accurately assess the value that a player brings to his team when he is on the court.
There are a few shortcomings to this model as things currently stand. One issue is that there is a lot of “noise” in this data. Due to the randomness of basketball possessions, it can be difficult to know whether a player's rating estimate reflects his true ability or is due to random chance. The model can “overfit” the data, leading to conclusions about players that just don't make sense when compared to the eye test. For example, a deep-bench player who happened to be on the court for a handful of minutes while his team outscored the opponent 20-0 could be given an incredibly high rating because it appears that his presence made the difference for his team. To account for this, we take a Bayesian approach by setting a prior distribution centered at 0 for each player's OBPR and DBPR, so that players who don't play many minutes will have ratings near 0, while those with more substantial playing time can have their ratings move away from 0 as more information about their impact accrues throughout the season. The informativeness of the prior distribution was chosen using cross-validation.
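The shrinkage effect of a prior centered at 0 can be seen in a deliberately simplified case: one player's on-court results in isolation, ignoring teammates and opponents, under a conjugate normal model. The posterior mean is then the sum of his results divided by his possession count plus `k`, where `k` (noise variance over prior variance) behaves like `k` extra possessions at the prior mean. The function name, the value `k=1000`, and the player data are all hypothetical; in the text, the prior's strength is what cross-validation selects.

```python
def shrunk_rating(net_per_100, prior_mean=0.0, k=1000.0):
    """Posterior mean of one player's rating under a conjugate normal model.

    net_per_100: per-possession net outcomes while he was on court,
                 scaled to points per 100 possessions.
    k: noise variance / prior variance (HYPOTHETICAL value); acts like k
       "phantom" possessions at prior_mean. Larger k means more shrinkage.
    """
    n = len(net_per_100)
    return (sum(net_per_100) + k * prior_mean) / (n + k)


bench = [100.0] * 20      # hypothetical: +100 per 100 over only 20 possessions
starter = [8.0] * 1500    # hypothetical: steady +8 per 100 over a full season
print(round(shrunk_rating(bench), 2), round(shrunk_rating(starter), 2))
# 1.96 4.8
```

The bench player's raw +100 collapses to about +2, while the starter keeps most of his +8, which is the behavior the prior is meant to produce.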
Another issue with the player impact model is that it relies heavily on the assumption that players will frequently rotate in and out of the game, so that we see lots of different lineup combinations and can distinguish each individual's impact on his team from that of his teammates. These player ratings become less reliable when a pair of teammates is almost always on the court together, or rarely ever shares possessions. When player A and player B are on the court together 95% of the time, it is difficult to tell which of them is having the larger impact for his team.
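One hypothetical way to spot the problem described above is to measure how strongly two players' on-court indicators move together across possessions. A correlation near +1 (almost always on together) or near -1 (almost never on together) means the regression has little information to separate them. This diagnostic is an illustration only, not something the text says the site computes; `on_court_correlation`, `flag_entangled_pairs`, and the 0.9 threshold are assumptions.

```python
from itertools import combinations
from math import sqrt


def on_court_correlation(lineups, a, b):
    """Pearson correlation of players a and b's 0/1 on-court indicators.

    lineups: one set of on-court player IDs per possession.
    Returns NaN if either player is on (or off) for every possession.
    """
    xa = [1.0 if a in lu else 0.0 for lu in lineups]
    xb = [1.0 if b in lu else 0.0 for lu in lineups]
    n = len(lineups)
    ma, mb = sum(xa) / n, sum(xb) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(xa, xb)) / n
    va = sum((u - ma) ** 2 for u in xa) / n
    vb = sum((v - mb) ** 2 for v in xb) / n
    if va == 0 or vb == 0:
        return float("nan")
    return cov / sqrt(va * vb)


def flag_entangled_pairs(lineups, threshold=0.9):
    """Pairs whose on-court patterns are too correlated to separate well."""
    players = sorted(set().union(*lineups))
    return [(a, b) for a, b in combinations(players, 2)
            if abs(on_court_correlation(lineups, a, b)) > threshold]


# Hypothetical: A and B share 60 possessions and are apart for only 2.
lineups = [{"A", "B"}] * 60 + [{"A"}, {"B"}] + [{"C"}] * 38
```

Here the correlation between A and B is about 0.96, so their individual ratings would rest on just two possessions where they were apart.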
This technique has proven incredibly effective at generating player ratings that more accurately represent both the value and the skill of each player at the offensive and defensive ends. An example is 2018-2019 Brandon Clarke, who had a tremendous season for Gonzaga before becoming a first-round draft pick. In the player impact ratings, he ranks 10th in the country for 2018-2019. However, once we use his high Box BPR rating to inform his prior distribution for offensive and defensive ratings, he finishes 2nd in the country in our final BPR, behind Zion Williamson. His teammate Zach Norvell sees his ranking drop from 7th to 48th once we incorporate his much lower Box BPR for the year.
Using Box BPR to influence our ratings doesn't change the fact that we can still easily detect good performances from players who may not fill up the stat sheet. A prime example of an underrated “intangibles” guy is 2019-2020 Alabama forward Herbert Jones, who had the highest DBPR and third-highest overall BPR that year, despite having a much lower Box BPR. The degree to which he elevated his team's performance when he was on the court was astronomical compared to Alabama's numbers without him.
The Team Breakdown tool is used to gain detailed insights into the performance of a team, broken down player by player. This is especially helpful when trying to explain the offensive and defensive ratings assigned to each player.
Here is the recommended approach for using the Team Breakdown:
My name is Evan Miyakawa. I have a master's degree in statistics and am currently finishing my doctorate in statistics at Baylor University. I graduated with my bachelor's degree in 2017 from Taylor University. You can find out more on my LinkedIn page.
My college basketball research has been featured in articles by Sports Illustrated, CBS Sports, ESPN, the AP, and others.
Feel free to email me at firstname.lastname@example.org with any questions or requests. I occasionally appear on radio shows and podcasts to talk about college basketball. I also work with college basketball coaches who want to utilize analytics for their teams.
You can also follow me on Twitter.
Access the EvanMiya blog here.
Randy Kennedy Show - Sports Talk 995
Left Coast Sports
Country Roads Confidential - West Virginia 247Sports
Spartan Radio Network
Hoopin' With Hoops - Greg Peterson
Upper Left Sports
Locked On Zags
Our Daily Bears
Hope & Rauf Presented by Heat Check CBB
Sports Illustrated: Forde Minutes: What's Plaguing College Basketball's Powerhouses? - Pat Forde
Associated Press: NCAA teams hit by COVID pauses take hope from antibodies - David Skretta
Sports Illustrated: Five Tips for Filling Out Your NCAA Tournament Bracket - Molly Geary
Spokesman Review: Numbers are adding up for No. 1 Gonzaga, but more data is needed - Jim Meehan
The Definitive Guide to Covid Pause Effect for CBB Teams - Evan Miyakawa
Spokesman Review: Rested or rusted? Analytics have helped Washington State gauge performance following COVID-19 layoffs - Theo Lawson
Kentucky Sports Radio: Advanced statistics implies that Kentucky’s Davion Mintz needs more minutes
A Sea of Blue: How to fix Kentucky