I was evolving some Pokemon in Pokemon Go and wondered how much the Pokemon’s attributes change after evolution. The attributes of interest were combat power (CP) and hit points (HP). I focused my analysis from Pidgeys because I had the most data points for this species.
I collected data from a couple of days’ evolution, available for viewing here. Both times I was using a Lucky Egg.
x <- read.csv("../datasets/evolution2.csv")
x <- tbl_df(x)
pidgeys <- filter(x, pokemon == "Pidgey")
pidgeys
## # A tibble: 55 × 7
## pokemon CP_pre HP_pre kg_pre CP_post HP_post kg_post
## <fctr> <int> <int> <dbl> <int> <int> <dbl>
## 1 Pidgey 270 NA NA 533 NA NA
## 2 Pidgey 267 NA NA 515 NA NA
## 3 Pidgey 259 NA NA 526 NA NA
## 4 Pidgey 212 NA NA 413 NA NA
## 5 Pidgey 209 43 2.20 403 66 36.70
## 6 Pidgey 203 41 1.18 395 64 19.75
## 7 Pidgey 201 43 1.70 392 66 28.38
## 8 Pidgey 198 41 2.26 396 65 37.72
## 9 Pidgey 191 NA NA 370 NA NA
## 10 Pidgey 163 42 1.08 322 63 18.07
## # ... with 45 more rows
Question 1: What is the relationship between CP before and CP after evolution?
To explore what happened to CP before and after evolution, I plotted these on a graph.
ggplot(pidgeys, aes(x = CP_pre, y = CP_post)) + geom_point(shape = 1) +
ggtitle("CP pre and post evolution")
The relationship was roughly linear. I modeled the relationship using simple linear regression.
pidgey_CP_model <- lm(CP_post ~ CP_pre, data = pidgeys)
summary(pidgey_CP_model)
##
## Call:
## lm(formula = CP_post ~ CP_pre, data = pidgeys)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8820 -3.3266 -0.9626 3.2191 19.6111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.780734 1.573649 0.496 0.622
## CP_pre 1.952155 0.008776 222.438 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.153 on 53 degrees of freedom
## Multiple R-squared: 0.9989, Adjusted R-squared: 0.9989
## F-statistic: 4.948e+04 on 1 and 53 DF, p-value: < 2.2e-16
Based on the data, the estimated multipler was 1.95 with a standard deviation of 0.009. The model explained roughly 99.9% of the variation in CP after evolution. Here is the model in equation form.
\[ CP_{post} = 1.925 \times CP_{pre} + 0.781 + \epsilon\]
Question 2: How does evolution affect HP?
To view the relationship between pre and post HP, I plotted HP before and after evolution.
ggplot(pidgeys, aes(x = HP_pre, y = HP_post)) + geom_point(shape = 1) +
ggtitle("HP pre and post evolution")
## Warning: Removed 5 rows containing missing values (geom_point).
There was also a roughly linear relationship but it did not appear to be as tightly linear as the CP before and after.
pidgey_HP_model <- lm(HP_post ~ HP_pre, data = pidgeys)
summary(pidgey_HP_model)
##
## Call:
## lm(formula = HP_post ~ HP_pre, data = pidgeys)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2680 -0.6441 -0.0057 0.8595 2.5623
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.12035 0.51552 -0.233 0.816
## HP_pre 1.53883 0.01414 108.806 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.245 on 48 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.996, Adjusted R-squared: 0.9959
## F-statistic: 1.184e+04 on 1 and 48 DF, p-value: < 2.2e-16
From these observations it looks like the model is:
\[HP_{post} = 1.539 \times HP_{pre} -0.120 + \epsilon\]
This model explained >99% of the variance in HP after evolution.
Question 3: What is the relationship between CP and HP?
I plotted the relationship between CP and HP for Pidgeys.
ggplot(pidgeys, aes(x = CP_pre, y = HP_pre)) + geom_point(shape = 1) +
ggtitle("Pidgey CP and HP Relationship")
## Warning: Removed 5 rows containing missing values (geom_point).
This did not look like a linear relationship at all. I tried to fit several models. First we tried the linear one as a baseline. This was actually pretty good, although the diagnostics don’t look great.
pidgey_CP_HP_model <- lm(HP_pre ~ CP_pre, data = pidgeys)
summary(pidgey_CP_HP_model)
##
## Call:
## lm(formula = HP_pre ~ CP_pre, data = pidgeys)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.105 -1.619 0.620 1.872 5.252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.821364 0.764216 20.70 <2e-16 ***
## CP_pre 0.128385 0.004448 28.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.965 on 48 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.9455, Adjusted R-squared: 0.9444
## F-statistic: 832.9 on 1 and 48 DF, p-value: < 2.2e-16
plot(pidgey_CP_HP_model)
I next tried a logarithmic model. The diagnostics on this weren’t great either. The R squared was actually even worse than the linear model.
pidgey_logCP_HP_model <- lm(HP_pre ~ log(CP_pre), data = pidgeys)
summary(pidgey_logCP_HP_model)
##
## Call:
## lm(formula = HP_pre ~ log(CP_pre), data = pidgeys)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.224 -2.286 -1.111 2.047 10.300
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -32.8371 2.8319 -11.60 1.6e-15 ***
## log(CP_pre) 14.3065 0.5944 24.07 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.514 on 48 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.9235, Adjusted R-squared: 0.9219
## F-statistic: 579.2 on 1 and 48 DF, p-value: < 2.2e-16
plot(pidgey_logCP_HP_model)
I then tried a square root model. The R squared was much better, and the diagnostics looked better.
pidgey_sqrtCP_HP_model <- lm(HP_pre ~ I(CP_pre^(0.5)), data = pidgeys)
summary(pidgey_sqrtCP_HP_model)
##
## Call:
## lm(formula = HP_pre ~ I(CP_pre^(0.5)), data = pidgeys)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.424 -1.178 -0.116 1.035 3.611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01716 0.66486 -0.026 0.98
## I(CP_pre^(0.5)) 3.04014 0.05548 54.799 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.593 on 48 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.9843, Adjusted R-squared: 0.9839
## F-statistic: 3003 on 1 and 48 DF, p-value: < 2.2e-16
plot(pidgey_sqrtCP_HP_model)
Here’s the model in equation form:
\[HP = 3.04 \times \sqrt{CP} - 0.017 + \epsilon\] Those numbers were so close to round numbers that we could say:
\[HP \approx 3 \times \sqrt{CP}\]
Discussion
There appears to be a linear relationship between pre and post evolution CP and pre and post evolution HP. The relationship between CP and HP appears to be exponential.
There are some other people who have done similar analyses on the relationship between pre and post evolution CP. There are even calculators that provide estimates of the post evolution CP. Here’s one example.
The original version of this paper had a negative intercept in the CP evolution model. This model’s intercept is not significantly different from 0. The implication of the original negative intercept was that one might have a Pigeotto that had lower CP than the original Pidgey. With the updated model, this is no longer possible.
I was not able to find other HP evolution analysis online. It appears that at least for Pidgeys, there is a simple linear relationship between pre and post evolution HP.
Here’s a data set that could be used for cross validation: https://www.openintro.org/stat/data/?data=pokemon