Hundred Game Division Winners

As we get closer to the end of the baseball season, the San Francisco Giants and Los Angeles Dodgers both have over 100 wins. A friend asked whether there had ever been a season where two 100-win teams were in the same division AND no team in any other division won 100 games. It’s kind of a weird question, but it inspired me to look at the Lahman database to find an answer.

library(Lahman)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

The Lahman database contains a table called Teams with one row per team per season, covering team statistics from 1871 through the most recent season included in the package.

data(Teams)
str(Teams)
## 'data.frame':    2955 obs. of  48 variables:
##  $ yearID        : int  1871 1871 1871 1871 1871 1871 1871 1871 1871 1872 ...
##  $ lgID          : Factor w/ 7 levels "AA","AL","FL",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ teamID        : Factor w/ 149 levels "ALT","ANA","ARI",..: 24 31 39 56 90 97 111 136 142 8 ...
##  $ franchID      : Factor w/ 120 levels "ALT","ANA","ARI",..: 13 36 25 56 70 85 91 109 77 9 ...
##  $ divID         : chr  NA NA NA NA ...
##  $ Rank          : int  3 2 8 7 5 1 9 6 4 2 ...
##  $ G             : int  31 28 29 19 33 28 25 29 32 58 ...
##  $ Ghome         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ W             : int  20 19 10 7 16 21 4 13 15 35 ...
##  $ L             : int  10 9 19 12 17 7 21 15 15 19 ...
##  $ DivWin        : chr  NA NA NA NA ...
##  $ WCWin         : chr  NA NA NA NA ...
##  $ LgWin         : chr  "N" "N" "N" "N" ...
##  $ WSWin         : chr  NA NA NA NA ...
##  $ R             : int  401 302 249 137 302 376 231 351 310 617 ...
##  $ AB            : int  1372 1196 1186 746 1404 1281 1036 1248 1353 2571 ...
##  $ H             : int  426 323 328 178 403 410 274 384 375 753 ...
##  $ X2B           : int  70 52 35 19 43 66 44 51 54 106 ...
##  $ X3B           : int  37 21 40 8 21 27 25 34 26 31 ...
##  $ HR            : int  3 10 7 2 1 9 3 6 6 14 ...
##  $ BB            : int  60 60 26 33 33 46 38 49 48 29 ...
##  $ SO            : int  19 22 25 9 15 23 30 19 13 28 ...
##  $ SB            : int  73 69 18 16 46 56 53 62 48 53 ...
##  $ CS            : int  16 21 8 4 15 12 10 24 13 18 ...
##  $ HBP           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ SF            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ RA            : int  303 241 341 243 313 266 287 362 303 434 ...
##  $ ER            : int  109 77 116 97 121 137 108 153 137 166 ...
##  $ ERA           : num  3.55 2.76 4.11 5.17 3.72 4.95 4.3 5.51 4.37 2.9 ...
##  $ CG            : int  22 25 23 19 32 27 23 28 32 48 ...
##  $ SHO           : int  1 0 0 1 1 0 1 0 0 1 ...
##  $ SV            : int  3 1 0 0 0 0 0 0 0 1 ...
##  $ IPouts        : int  828 753 762 507 879 747 678 750 846 1548 ...
##  $ HA            : int  367 308 346 261 373 329 315 431 371 573 ...
##  $ HRA           : int  2 6 13 5 7 3 3 4 4 3 ...
##  $ BBA           : int  42 28 53 21 42 53 34 75 45 63 ...
##  $ SOA           : int  23 22 34 17 22 16 16 12 13 77 ...
##  $ E             : int  243 229 234 163 235 194 220 198 218 432 ...
##  $ DP            : int  24 16 15 8 14 13 14 22 20 22 ...
##  $ FP            : num  0.834 0.829 0.818 0.803 0.84 0.845 0.821 0.845 0.85 0.83 ...
##  $ name          : chr  "Boston Red Stockings" "Chicago White Stockings" "Cleveland Forest Citys" "Fort Wayne Kekiongas" ...
##  $ park          : chr  "South End Grounds I" "Union Base-Ball Grounds" "National Association Grounds" "Hamilton Field" ...
##  $ attendance    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ BPF           : int  103 104 96 101 90 102 97 101 94 106 ...
##  $ PPF           : int  98 102 100 107 88 98 99 100 98 102 ...
##  $ teamIDBR      : chr  "BOS" "CHI" "CLE" "KEK" ...
##  $ teamIDlahman45: chr  "BS1" "CH1" "CL1" "FW1" ...
##  $ teamIDretro   : chr  "BS1" "CH1" "CL1" "FW1" ...

All I needed was the year, league, team, division, and wins. I also selected the rank, games played, losses, and team name for completeness.

teams <- Teams %>% select(yearID, lgID, teamID, divID, Rank, G, W, L, name)
table(teams$divID, teams$lgID, useNA = "ifany")
##       
##         AA  AL  FL  NA  NL  PL  UA
##   C      0 135   0   0 150   0   0
##   E      0 302   0   0 286   0   0
##   W      0 283   0   0 282   0   0
##   <NA>  85 560  16  50 786   8  12

I filtered for 100-win teams, grouped them by year, league, and division, and counted the teams in each group. Then I kept only the groups with more than one team that had won at least 100 games. Seasons before divisional play began in 1969 have a missing divID, so for those years the grouping is effectively by league.

teams %>% filter(W >= 100) %>% 
  group_by(yearID, lgID, divID) %>%
  summarize(n_100_win_teams = n()) %>% 
  filter(n_100_win_teams > 1)
## `summarise()` has grouped output by 'yearID', 'lgID'. You can override using the `.groups` argument.
## # A tibble: 10 × 4
## # Groups:   yearID, lgID [10]
##    yearID lgID  divID n_100_win_teams
##     <int> <fct> <chr>           <int>
##  1   1909 NL    <NA>                2
##  2   1915 AL    <NA>                2
##  3   1942 NL    <NA>                2
##  4   1954 AL    <NA>                2
##  5   1961 AL    <NA>                2
##  6   1962 NL    <NA>                2
##  7   1980 AL    E                   2
##  8   1993 NL    W                   2
##  9   2001 AL    W                   2
## 10   2018 AL    E                   2
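The friend's question has a second condition: no 100-win team anywhere else that year. A minimal sketch of that check follows, comparing each division's count of 100-win teams with the season-wide count. To keep it self-contained it runs on a small data frame (`sample_teams`, a name made up for this example) whose rows are copied from the outputs in this post; with the full data, the `teams` data frame built earlier could be substituted.

```r
library(dplyr)

# Hypothetical sample: 100-win teams from 1993, 2001, and 2018,
# copied from the query outputs shown in this post.
sample_teams <- data.frame(
  yearID = c(2018, 2018, 2018, 2001, 2001, 1993, 1993),
  lgID   = c("AL", "AL", "AL", "AL", "AL", "NL", "NL"),
  teamID = c("BOS", "HOU", "NYA", "SEA", "OAK", "ATL", "SFN"),
  divID  = c("E", "W", "E", "W", "W", "W", "W"),
  W      = c(108, 103, 100, 116, 102, 104, 103)
)

only_one_division <- sample_teams %>%
  filter(W >= 100) %>%
  # Number of 100-win teams across all leagues and divisions that year
  add_count(yearID, name = "n_100_in_year") %>%
  # Ignore pre-division seasons, where divID is missing
  filter(!is.na(divID)) %>%
  # Number of 100-win teams in each division
  count(yearID, lgID, divID, n_100_in_year, name = "n_100_in_division") %>%
  # Keep divisions holding two or more 100-win teams, and all of that year's
  filter(n_100_in_division > 1, n_100_in_division == n_100_in_year)

only_one_division
```

On this sample the filter keeps the 1993 NL West and 2001 AL West and drops 2018, because Houston's 103 wins came in a different division from Boston and New York.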

In 2018 the AL East had two 100-win teams, the Red Sox and Yankees (108 and 100 wins respectively). However, a third team, the Houston Astros of the AL West, also won at least 100 games (103), so that season doesn’t quite fit my friend’s question.

teams %>% filter(yearID == 2018 & W >= 100) %>% arrange(-W)
##   yearID lgID teamID divID Rank   G   W  L             name
## 1   2018   AL    BOS     E    1 162 108 54   Boston Red Sox
## 2   2018   AL    HOU     W    1 162 103 59   Houston Astros
## 3   2018   AL    NYA     E    2 162 100 62 New York Yankees

In 2001, exactly two teams won 100 games, the Mariners and Athletics, both in the AL West. The Mariners had 116 wins in Ichiro Suzuki’s rookie year, then lost to the 95-win Yankees in the ALCS. The A’s won 102 games but still finished 14 games behind the Mariners, and they lost to the Yankees in the division series. The Yankees then lost the World Series to the Diamondbacks in 7 games.

teams %>% filter(yearID == 2001 & W>=100) %>% arrange(-W)
##   yearID lgID teamID divID Rank   G   W  L              name
## 1   2001   AL    SEA     W    1 162 116 46  Seattle Mariners
## 2   2001   AL    OAK     W    2 162 102 60 Oakland Athletics

Another example is 1993, when the Braves, in their last year in the NL West, beat out the Giants by a single game, 104 wins to 103. Neither made it to the World Series, which the Blue Jays won over the Phillies on Joe Carter’s walk-off home run in Game 6.

teams %>% filter(yearID == 1993 & W>=100) %>% arrange(-W)
##   yearID lgID teamID divID Rank   G   W  L                 name
## 1   1993   NL    ATL     W    1 162 104 58       Atlanta Braves
## 2   1993   NL    SFN     W    2 162 103 59 San Francisco Giants

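The 1962 season was also a cool story, with the Giants and Dodgers both finishing the 162-game schedule with 101 wins. This was before there were divisions in baseball, and only one team from each league made the World Series. As a result, the two teams played an additional 3-game tiebreaker series, which the Giants won 2-1. Those games counted as regular-season games, which is why the table below shows 165 games played, with 103 wins for the Giants and 102 for the Dodgers. The second query shows the 1961 American League, also from before divisions, where the 109-win Yankees finished ahead of the 101-win Tigers.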

teams %>% filter(yearID == 1962 & W >= 100) %>% arrange(-W)
##   yearID lgID teamID divID Rank   G   W  L                 name
## 1   1962   NL    SFN  <NA>    1 165 103 62 San Francisco Giants
## 2   1962   NL    LAN  <NA>    2 165 102 63  Los Angeles Dodgers
teams %>% filter(yearID == 1961 & W >= 100) %>% arrange(-W)
##   yearID lgID teamID divID Rank   G   W  L             name
## 1   1961   AL    NYA  <NA>    1 163 109 53 New York Yankees
## 2   1961   AL    DET  <NA>    2 163 101 61   Detroit Tigers

Another question is how often a team wins 100 games at all.

First we can count the number of 100-win teams in each season.

teams %>% mutate(w100 = (W>=100)) %>% 
  group_by(yearID) %>% 
  summarize(n_teams = n(), w100teams = sum(w100)) %>%
  arrange(-w100teams)
## # A tibble: 150 × 3
##    yearID n_teams w100teams
##     <int>   <int>     <int>
##  1   2019      30         4
##  2   1942      16         3
##  3   1977      26         3
##  4   1998      30         3
##  5   2002      30         3
##  6   2003      30         3
##  7   2017      30         3
##  8   2018      30         3
##  9   1909      16         2
## 10   1910      16         2
## # … with 140 more rows
teams %>% mutate(w100 = (W>=100)) %>% 
  group_by(yearID) %>% 
  summarize(n_teams = n(), w100teams = sum(w100)) %>% 
  summarize(n_seasons = sum(n_teams), w100teams = sum(w100teams))
## # A tibble: 1 × 2
##   n_seasons w100teams
##       <int>     <int>
## 1      2955       109

There have been 109 hundred-win seasons among the 2955 team seasons in the database.

teams %>% mutate(w100 = (W>=100)) %>% 
  group_by(yearID) %>% 
  summarize(n_teams = n(), w100teams = sum(w100)) %>% 
  summarize(n_seasons = sum(n_teams), w100teams = sum(w100teams)) %>% 
  summarize(w100teams/n_seasons)
## # A tibble: 1 × 1
##   `w100teams/n_seasons`
##                   <dbl>
## 1                0.0369

That works out to about 3.7% of all team seasons. With 30 teams in the league today, that rate implies roughly one team reaching the 100-win plateau per year.
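The arithmetic behind that estimate can be spelled out in a few lines. This is only a rough sketch using the two totals reported above, and it assumes the historical rate applies evenly to a 30-team league. That is a simplification, since the database includes early seasons with far fewer teams and much shorter schedules (the 1871 teams played about 30 games each).

```r
# Totals taken from the summary output above
n_team_seasons    <- 2955  # all team seasons in the Teams table
n_100_win_seasons <- 109   # team seasons with W >= 100

# Share of team seasons that reached 100 wins
rate <- n_100_win_seasons / n_team_seasons

# Expected 100-win teams per year, ASSUMING a 30-team league and
# that the historical rate holds for every team (a simplification)
expected_per_year <- 30 * rate

round(rate, 4)
round(expected_per_year, 2)
```

The expected count comes out slightly above one team per year, which matches the rough estimate in the text.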