My new office introduced me to a betting game I wasn’t previously familiar with: Super Bowl squares. It’s played with a ten-by-ten grid, like this one from :

Each row and column gets an assortment of digits from 0-9 representing each team’s score, and each person playing the game (after putting in some money) adds their initials to one of the boxes. Then the winner is picked based on the least significant digit of each team’s score. If (for example) the game ended up being Eagles 20-Patriots 24, the player who had picked the row with 0 and the column with 4 would win.

If your team sets the digits in advance, you can get a statistical edge. Since not every score is equally likely, the least-significant digits aren’t equally likely either. Here I’ll take a data-driven approach to find the most common digits, and pairs of digits, in NFL scores. By the end of tonight you can see how my predictions did!

Most common digits

Like in most of my posts, I’ll be using the R language and the tidyverse collection of packages.

We could use just the Super Bowl scores, perhaps scraped from Wikipedia. In a perfect world we would use that data, since Super Bowl scores are likely to be subtly different from regular-season scores. But there’ve been only 51, so the data is rather noisy. Instead, we’ll look at all NFL games since 1978, thanks to this helpful GitHub dataset from James Every.

```r
library(tidyverse)
theme_set(theme_light())

# Git clone first from
# NOTE: the file-listing step is missing from this copy of the post.
# data_dir and the list.files() call are placeholders, not the original code.
data_dir <- "path/to/cloned/repo"
games <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(read_csv)

# Gather into one tall data frame of scores, and calculate the digit
# The dataset has a couple games from before 1978, but only a few per year
scores <- games %>%
  filter(season >= 1978) %>%
  gather(type, score, home_score, visitors_score) %>%
  mutate(digit = score %% 10)

scores
# A tibble: 18,148 x 8
#   season week kickoff home_team visitin… type score digit
```

This gives us a set of over 18,000 football scores (from over 9,000 games) we can analyze.

First we could ask what the most common scores are. (If you’re a football fan, you may want to try guessing before you look at them!)

```r
scores %>%
  count(score, sort = TRUE)
# A tibble: 60 x 2
```

The most common (by a wide margin) is 17. This could come from two touchdowns (each with an extra point, or one with a two-point conversion and the other with none) and a field goal. 27 is also common, so 7 as the least significant digit is already looking like a solid bet.

So let’s create a histogram of the most common final digits. (I’ll do this one in silver for the Eagles).

```r
ggplot(scores, aes(digit)) +
  geom_histogram(fill = "#8D9093", binwidth = .5) +
  scale_x_continuous(breaks = 0:9) +
  labs(title = "Frequency of final digits in NFL games since 1978")
```

As we suspected, the most common digit is 7. It’s closely followed by 0, and 3 and 4 are pretty common as well. But you really don’t want to bet on 2 or 5.

It’s likely that each team’s strategy influences the other (and both may be influenced by common factors such as the weather or referee), meaning that the two digits in a score aren’t statistically independent. So let’s consider the most common pairs of scores. For this graph we’ll use red and white for the Patriots.

```r
games %>%
  # Same 1978 cutoff as the scores data, so the legend label is accurate
  filter(season >= 1978) %>%
  count(home_digit = home_score %% 10,
        visitors_digit = visitors_score %% 10) %>%
  mutate(percent = n / sum(n)) %>%
  ggplot(aes(home_digit, visitors_digit, fill = percent)) +
  geom_tile() +
  scale_x_continuous(breaks = 0:9) +
  scale_y_continuous(breaks = 0:9) +
  scale_fill_gradient2(high = "#C8102E", low = "white",
                       labels = scales::percent_format()) +
  theme_minimal() +
  labs(x = "Last digit of home team score",
       y = "Last digit of visitor team score",
       title = "Common pairs of NFL scores",
       fill = "% of games since 1978")
```

(Super Bowl games don’t actually have a home and away team, but it’s a reasonable way to visualize this data).
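One way to probe whether the two digits of a square are independent is to put the observed share of each digit pair next to the share you would expect if the digits were independent (the product of the two marginal shares). The block below is only a minimal sketch of that comparison: `toy_games` is a made-up handful of scores, not the NFL dataset, and the names `pairs`, `home_marginal`, `visitors_marginal` and `comparison` are my own. It assumes the same `home_score` and `visitors_score` columns used elsewhere in this post.

```r
library(dplyr)

# Hypothetical toy scores for illustration only (NOT from the NFL dataset)
toy_games <- tibble(
  home_score     = c(24, 17, 27, 20, 14, 31, 17, 10),
  visitors_score = c(20, 14, 17, 17, 27, 24, 13, 7)
)

# The winning square for each game is the last digit of each team's score
pairs <- toy_games %>%
  mutate(home_digit = home_score %% 10,
         visitors_digit = visitors_score %% 10)

# Marginal share of each final digit, computed separately for each side
home_marginal <- pairs %>%
  count(home_digit) %>%
  mutate(p_home = n / sum(n)) %>%
  select(-n)

visitors_marginal <- pairs %>%
  count(visitors_digit) %>%
  mutate(p_visitors = n / sum(n)) %>%
  select(-n)

# Observed share of each pair next to the share independence would predict
comparison <- pairs %>%
  count(home_digit, visitors_digit) %>%
  mutate(observed = n / sum(n)) %>%
  left_join(home_marginal, by = "home_digit") %>%
  left_join(visitors_marginal, by = "visitors_digit") %>%
  mutate(expected_if_independent = p_home * p_visitors) %>%
  arrange(desc(observed))

comparison
```

With the real data you would swap `toy_games` for the filtered `games` data frame. Large gaps between `observed` and `expected_if_independent` would suggest the digits are not independent, though with only a few games per square some gap is expected from noise alone.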