It’s fantasy baseball season again, and it was time for my kids to do their draft with my friend’s kids. None of them know anything about baseball players except for the very top stars (Mookie Betts, Mike Trout), so they rely a lot on analyst predictions. Of course, one person’s predictions are not always that great, so inspired by Nate Silver, I set out to build a combination of several models to see if I could get a more accurate outcome.
There are several prediction algorithms available on Fangraphs (ZIPS, Steamer, Depth Charts, etc). They all take into account various player characteristics and performance metrics to predict what a player will do in the next season. I used the Fangraphs auction calculator to calculate values for each player based on the unique parameters of the kids’ league. There are 4 teams, each with one player per fielding position and 5 starting pitchers. The point totals have not changed much lately but tend to reward total bases and strikeouts more than stolen bases and ERA.
The site allows you to export one table for batting predictions and one table for pitching predictions. I downloaded both sets of predictions for the following prediction models: ZIPS, Steamer, Fans, Depth Charts.
Data Munging
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dc1 <- tbl_df(read.csv("../datasets/dc1.csv", colClasses = "character"))
dc2 <- tbl_df(read.csv("../datasets/dc2.csv", colClasses = "character"))
fans1 <- tbl_df(read.csv("../datasets/fans1.csv", colClasses = "character"))
fans2 <- tbl_df(read.csv("../datasets/fans2.csv", colClasses = "character"))
steamer1 <- tbl_df(read.csv("../datasets/steamer1.csv", colClasses = "character"))
steamer2 <- tbl_df(read.csv("../datasets/steamer2.csv", colClasses = "character"))
zips1 <- tbl_df(read.csv("../datasets/zips1.csv", colClasses = "character"))
zips2 <- tbl_df(read.csv("../datasets/zips2.csv", colClasses = "character"))
I appended a column to each to label which prediction model the prediction had come from.
dc1 <- dc1 %>% mutate(pred_model = "dc")
dc2 <- dc2 %>% mutate(pred_model = "dc")
fans1 <- fans1 %>% mutate(pred_model = "fans")
fans2 <- fans2 %>% mutate(pred_model = "fans")
steamer1 <- steamer1 %>% mutate(pred_model = "steamer")
steamer2 <- steamer2 %>% mutate(pred_model = "steamer")
zips1 <- zips1 %>% mutate(pred_model = "zips")
zips2 <- zips2 %>% mutate(pred_model = "zips")
Then I appended them all together in long form to make it easier to manipulate in R’s dplyr package.
m1 <- bind_rows(dc1, dc2, fans1, fans2, steamer1, steamer2, zips1, zips2)
glimpse(m1)
## Observations: 3,231
## Variables: 11
## $ PlayerName <chr> "Mike Trout", "Mookie Betts", "Jose Ramirez", "Gary S…
## $ Team <chr> "Angels", "Red Sox", "Indians", "Yankees", "Phillies"…
## $ POS <chr> "OF", "OF", "3B", "C", "C", "3B", "OF/DH", "1B", "OF"…
## $ ADP <chr> "1.1", "2.0", "5.0", "53.4", "42.4", "7.1", "6.1", "2…
## $ PA <chr> "658", "679", "651", "543", "544", "651", "651", "658…
## $ rPTS <chr> "1187.0", "1114.0", "1050.0", "766.0", "726.0", "990.…
## $ PTS <chr> "$252.8", "$176.2", "$108.9", "($189.4)", "($231.4)",…
## $ aPOS <chr> "$7.6", "$7.6", "$50.7", "$342.7", "$342.7", "$50.7",…
## $ Dollars <chr> "$261.5", "$184.8", "$160.7", "$154.4", "$112.3", "$9…
## $ pred_model <chr> "dc", "dc", "dc", "dc", "dc", "dc", "dc", "dc", "dc",…
## $ IP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
I converted each column to an appropriate data type for analysis.
m1$PlayerName <- factor(m1$PlayerName)
m1$Team <- factor(m1$Team)
m1$POS <- factor(m1$POS)
m1$ADP <- as.numeric(m1$ADP)
m1$PA <- as.integer(m1$PA)
m1$rPTS <- as.numeric(m1$rPTS)
m1$pred_model <- factor (m1$pred_model)
m1$IP <- as.numeric(m1$IP)
However the dollar value columns had a dollar sign and parentheses to signify negative values. There’s probably an R function to do this, but I was not able to look it up on the airplane as I was writing this post. I wrote my own.
dol.to.num <- function(x) {
x <- gsub("\\$", "", x)
x <- gsub("\\(", "-", x)
x <- gsub("\\)", "", x)
x <- as.numeric(x)
}
m1$PTS <- dol.to.num(m1$PTS)
m1$aPOS <- dol.to.num(m1$aPOS)
m1$Dollars <- dol.to.num(m1$Dollars)
glimpse(m1)
## Observations: 3,231
## Variables: 11
## $ PlayerName <fct> Mike Trout, Mookie Betts, Jose Ramirez, Gary Sanchez,…
## $ Team <fct> Angels, Red Sox, Indians, Yankees, Phillies, Rockies,…
## $ POS <fct> OF, OF, 3B, C, C, 3B, OF/DH, 1B, OF, 1B, 1B, OF, 2B/3…
## $ ADP <dbl> 1.1, 2.0, 5.0, 53.4, 42.4, 7.1, 6.1, 21.8, 16.3, 19.3…
## $ PA <int> 658, 679, 651, 543, 544, 651, 651, 658, 623, 658, 651…
## $ rPTS <dbl> 1187, 1114, 1050, 766, 726, 990, 1019, 975, 1011, 964…
## $ PTS <dbl> 252.8, 176.2, 108.9, -189.4, -231.4, 45.9, 76.4, 30.2…
## $ aPOS <dbl> 7.6, 7.6, 50.7, 342.7, 342.7, 50.7, 7.6, 47.6, 7.6, 4…
## $ Dollars <dbl> 261.5, 184.8, 160.7, 154.4, 112.3, 97.6, 85.0, 78.7, …
## $ pred_model <fct> dc, dc, dc, dc, dc, dc, dc, dc, dc, dc, dc, dc, dc, d…
## $ IP <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
Looks pretty good!
Player Values
Here I grouped the players by name and then took a basic mean of the 4 models’ predictions for their values.
s1 <- m1 %>% group_by(PlayerName, POS) %>%
summarize(mean_Dollars = mean(Dollars)) %>%
arrange(-mean_Dollars)
Overall
as.data.frame(s1)[1:60,]
## PlayerName POS mean_Dollars
## 1 Max Scherzer SP 331.87500
## 2 Jacob deGrom SP 276.32500
## 3 Chris Sale SP 241.45000
## 4 Justin Verlander SP 235.27500
## 5 Gerrit Cole SP 205.00000
## 6 Mike Trout OF 195.00000
## 7 Jose Ramirez 3B 178.40000
## 8 Mookie Betts OF 166.25000
## 9 Corey Kluber SP 145.27500
## 10 Blake Snell SP 138.52500
## 11 Nolan Arenado 3B 119.87500
## 12 Aaron Nola SP 109.82500
## 13 Trevor Bauer SP 101.85000
## 14 J.T. Realmuto C 97.75000
## 15 Bryce Harper OF 79.10000
## 16 Christian Yelich OF 75.70000
## 17 Luis Severino SP 74.97500
## 18 Freddie Freeman 1B 73.22500
## 19 Paul Goldschmidt 1B 71.05000
## 20 Gary Sanchez C 68.55000
## 21 German Marquez SP 62.87500
## 22 Carlos Carrasco SP 59.90000
## 23 Jose Altuve 2B 57.45000
## 24 Anthony Rizzo 1B 57.17500
## 25 Ronald Acuna Jr. OF 54.35000
## 26 Kris Bryant 3B/OF 52.62500
## 27 Patrick Corbin SP 45.27500
## 28 Giancarlo Stanton OF/DH 43.10000
## 29 Juan Soto OF 37.12500
## 30 J.D. Martinez OF/DH 35.90000
## 31 Trea Turner SS 34.10000
## 32 Francisco Lindor SS 32.92500
## 33 Javier Baez 2B/3B/SS 32.00000
## 34 Charlie Blackmon OF 31.85000
## 35 Yasmani Grandal C 31.25000
## 36 Rhys Hoskins OF 23.75000
## 37 Jose Berrios SP 22.17500
## 38 Chris Archer SP 20.55000
## 39 Alex Bregman 3B/SS 18.45000
## 40 Cody Bellinger 1B/OF 16.95000
## 41 Aaron Judge OF 15.40000
## 42 Andrew Benintendi OF 9.15000
## 43 Whit Merrifield 2B/OF 8.80000
## 44 Joey Votto 1B 7.97500
## 45 Jack Flaherty SP 6.57500
## 46 Salvador Perez C/DH 6.20000
## 47 Trevor Story SS 4.62500
## 48 Noah Syndergaard SP 3.50000
## 49 Zack Greinke SP -2.27500
## 50 Walker Buehler SP -7.22500
## 51 Stephen Strasburg SP -8.30000
## 52 Jonathan Villar 2B -10.16667
## 53 Eugenio Suarez 3B -10.85000
## 54 Manny Machado SS -11.10000
## 55 Matt Carpenter 1B/3B -11.85000
## 56 James Paxton SP -11.90000
## 57 Buster Posey C -15.15000
## 58 Khris Davis DH -16.40000
## 59 Ozzie Albies 2B -18.15000
## 60 Madison Bumgarner SP -20.10000
Here they are by position.
Catcher
head(s1[grep("C", s1$POS), ])
## # A tibble: 6 x 3
## # Groups: PlayerName [6]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 J.T. Realmuto C 97.8
## 2 Gary Sanchez C 68.6
## 3 Yasmani Grandal C 31.2
## 4 Salvador Perez C/DH 6.2
## 5 Buster Posey C -15.2
## 6 Willson Contreras C -24.0
First Base
head(s1[grep("1B", s1$POS), ])
## # A tibble: 6 x 3
## # Groups: PlayerName [6]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 Freddie Freeman 1B 73.2
## 2 Paul Goldschmidt 1B 71.0
## 3 Anthony Rizzo 1B 57.2
## 4 Cody Bellinger 1B/OF 17.0
## 5 Joey Votto 1B 7.98
## 6 Matt Carpenter 1B/3B -11.9
Second Base
head(s1[grep("2B", s1$POS), ])
## # A tibble: 6 x 3
## # Groups: PlayerName [6]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 Jose Altuve 2B 57.4
## 2 Javier Baez 2B/3B/SS 32
## 3 Whit Merrifield 2B/OF 8.8
## 4 Jonathan Villar 2B -10.2
## 5 Ozzie Albies 2B -18.2
## 6 Rougned Odor 2B -26.8
Third Base
head(s1[grep("3B", s1$POS), ])
## # A tibble: 6 x 3
## # Groups: PlayerName [6]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 Jose Ramirez 3B 178.
## 2 Nolan Arenado 3B 120.
## 3 Kris Bryant 3B/OF 52.6
## 4 Javier Baez 2B/3B/SS 32
## 5 Alex Bregman 3B/SS 18.5
## 6 Eugenio Suarez 3B -10.9
Shortstop
head(s1[grep("SS", s1$POS), ])
## # A tibble: 6 x 3
## # Groups: PlayerName [6]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 Trea Turner SS 34.1
## 2 Francisco Lindor SS 32.9
## 3 Javier Baez 2B/3B/SS 32
## 4 Alex Bregman 3B/SS 18.5
## 5 Trevor Story SS 4.62
## 6 Manny Machado SS -11.1
Outfield
head(s1[grep("OF", s1$POS), ], 20)
## # A tibble: 20 x 3
## # Groups: PlayerName [20]
## PlayerName POS mean_Dollars
## <fct> <fct> <dbl>
## 1 Mike Trout OF 195
## 2 Mookie Betts OF 166.
## 3 Bryce Harper OF 79.1
## 4 Christian Yelich OF 75.7
## 5 Ronald Acuna Jr. OF 54.4
## 6 Kris Bryant 3B/OF 52.6
## 7 Giancarlo Stanton OF/DH 43.1
## 8 Juan Soto OF 37.1
## 9 J.D. Martinez OF/DH 35.9
## 10 Charlie Blackmon OF 31.8
## 11 Rhys Hoskins OF 23.8
## 12 Cody Bellinger 1B/OF 17.0
## 13 Aaron Judge OF 15.4
## 14 Andrew Benintendi OF 9.15
## 15 Whit Merrifield 2B/OF 8.8
## 16 Joey Gallo 1B/OF -43.4
## 17 Wil Myers 3B/OF -62.0
## 18 Mitch Haniger OF -67.1
## 19 Nicholas Castellanos OF -70.6
## 20 Michael Conforto OF -79.6
Pitcher
as.data.frame(s1[grep("SP", s1$POS), ])[1:25, ]
## PlayerName POS mean_Dollars
## 1 Max Scherzer SP 331.87500
## 2 Jacob deGrom SP 276.32500
## 3 Chris Sale SP 241.45000
## 4 Justin Verlander SP 235.27500
## 5 Gerrit Cole SP 205.00000
## 6 Corey Kluber SP 145.27500
## 7 Blake Snell SP 138.52500
## 8 Aaron Nola SP 109.82500
## 9 Trevor Bauer SP 101.85000
## 10 Luis Severino SP 74.97500
## 11 German Marquez SP 62.87500
## 12 Carlos Carrasco SP 59.90000
## 13 Patrick Corbin SP 45.27500
## 14 Jose Berrios SP 22.17500
## 15 Chris Archer SP 20.55000
## 16 Jack Flaherty SP 6.57500
## 17 Noah Syndergaard SP 3.50000
## 18 Zack Greinke SP -2.27500
## 19 Walker Buehler SP -7.22500
## 20 Stephen Strasburg SP -8.30000
## 21 James Paxton SP -11.90000
## 22 Madison Bumgarner SP -20.10000
## 23 Robbie Ray SP -21.10000
## 24 Jameson Taillon SP -21.76667
## 25 Mike Clevinger SP -26.07500