Pidgey Evolution: Effects on Combat Power and Hit Points

I was evolving some Pokemon in Pokemon Go today and wondered how much a Pokemon’s attributes change after evolution. The attributes of interest were combat power (CP) and hit points (HP). I focused my analysis on Pidgeys because I had the most data points for this species.

I collected data from one day’s evolution, available for viewing here.

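library(dplyr)  # provides filter() and as_tibble()

# Read the evolution log and keep only the Pidgey rows
x <- read.csv("../datasets/evolution.csv")
x <- as_tibble(x)  # replaces tbl_df(), which is deprecated in current dplyr
pidgeys <- filter(x, pokemon == "Pidgey")
pidgeys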
## # A tibble: 12 × 7
##    pokemon CP_pre HP_pre kg_pre CP_post HP_post kg_post
##     <fctr>  <int>  <int>  <dbl>   <int>   <int>   <dbl>
## 1   Pidgey    270     NA     NA     533      NA      NA
## 2   Pidgey    267     NA     NA     515      NA      NA
## 3   Pidgey    259     NA     NA     526      NA      NA
## 4   Pidgey    212     NA     NA     413      NA      NA
## 5   Pidgey    209     43   2.20     403      66   36.70
## 6   Pidgey    203     41   1.18     395      64   19.75
## 7   Pidgey    201     43   1.70     392      66   28.38
## 8   Pidgey    198     41   2.26     396      65   37.72
## 9   Pidgey    191     NA     NA     370      NA      NA
## 10  Pidgey    163     42   1.08     322      63   18.07
## 11  Pidgey    154     35   1.35     295      55   22.47
## 12  Pidgey    152     35   2.50     293      54   41.63

Question 1: What is the relationship between CP before and CP after evolution?

To explore what happened to CP before and after evolution, I plotted these on a graph.

library(ggplot2)
ggplot(pidgeys, aes(x = CP_pre, y = CP_post)) + geom_point(shape = 1) + 
  ggtitle("CP pre and post evolution")

The relationship was roughly linear with apparently random variations from the line.

I modeled the relationship using simple linear regression.

pidgey_CP_model <- lm(CP_post ~ CP_pre, data = pidgeys)
summary(pidgey_CP_model)
## 
## Call:
## lm(formula = CP_post ~ CP_pre, data = pidgeys)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.796  -2.865  -1.633   1.562  15.409 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -14.03520   11.58352  -1.212    0.253    
## CP_pre        2.02558    0.05509  36.770 5.27e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.484 on 10 degrees of freedom
## Multiple R-squared:  0.9927, Adjusted R-squared:  0.9919 
## F-statistic:  1352 on 1 and 10 DF,  p-value: 5.266e-12

Based on the data, the estimated multiplier was 2.026 with a standard error of 0.055. The model explained roughly 99% of the variation in CP after evolution. Here is the model in equation form.

\[ CP_{post} = 2.02558 \times CP_{pre} - 14.0352 + \epsilon\]

Those estimates are close enough to 2 and -14 to speculate that the Pokemon Go programmers used whole numbers to determine the CP after evolution.
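One way to check whether whole numbers are at least compatible with the fit is to look at confidence intervals for the two coefficients. The sketch below is only an illustration: it re-enters the 12 CP pairs from the table above by hand, and `in_interval()` is a hypothetical helper introduced here, not part of the original analysis.

```r
# CP values copied from the 12-row Pidgey table above.
pidgey_cp <- data.frame(
  CP_pre  = c(270, 267, 259, 212, 209, 203, 201, 198, 191, 163, 154, 152),
  CP_post = c(533, 515, 526, 413, 403, 395, 392, 396, 370, 322, 295, 293)
)

# Same simple linear regression as pidgey_CP_model.
cp_fit <- lm(CP_post ~ CP_pre, data = pidgey_cp)

# 95% confidence intervals for the intercept and the slope.
cp_ci <- confint(cp_fit, level = 0.95)
print(cp_ci)

# Hypothetical helper: does the interval for `term` contain `value`?
in_interval <- function(ci, term, value) {
  ci[term, 1] <= value && value <= ci[term, 2]
}

in_interval(cp_ci, "CP_pre", 2)          # is a multiplier of exactly 2 plausible?
in_interval(cp_ci, "(Intercept)", -14)   # is an offset of exactly -14 plausible?
```

Keep in mind that the intercept is estimated imprecisely (its p-value in the summary is 0.253), so an interval that contains -14 also contains 0. The data are compatible with the whole-number guess but do not single it out.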

There is a maximum CP for Pokemon, but this model might suggest that there is also a minimum CP. If we assume that the programmers would not want the evolved form to have a lower CP than the original form, we could solve for the theoretical minimum CP of a Pidgey.

We set the \(CP_{post}\) to be equal to the \(CP_{pre}\) in the equation and solve for \(CP_{pre}\). Ignoring \(\epsilon\) for simplicity we get:

\[ CP_{pre} = 2.02558 \times CP_{pre} - 14.0352\]

\[ -1.02558 \times CP_{pre} = - 14.0352\]

\[ CP_{pre} = 13.68513\]

Since collecting the original data, I have caught more Pidgeys with lower CP (minimum 21). I will update the analysis once I’ve done their evolutions.
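The break-even calculation above can be reproduced in a couple of lines of R. This is a minimal sketch: `break_even_cp()` is a hypothetical helper name, and the coefficients are typed in from the regression summary rather than pulled from the fitted model object.

```r
# Hypothetical helper: solve cp = slope * cp + intercept for cp.
# Rearranged, cp = intercept / (1 - slope); undefined when slope is 1.
break_even_cp <- function(intercept, slope) {
  stopifnot(slope != 1)
  intercept / (1 - slope)
}

# Using the fitted coefficients from the summary output.
fitted_min <- break_even_cp(intercept = -14.03520, slope = 2.02558)

# Using the speculated whole-number coefficients instead.
rounded_min <- break_even_cp(intercept = -14, slope = 2)

print(c(fitted = fitted_min, rounded = rounded_min))
```

With the whole-number guess the break-even point is exactly 14, so a Pidgey with CP 21 would still sit above it under either version.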

Question 2: How does evolution affect HP?

To view the relationship between pre and post HP, I plotted HP before and after evolution. I did not have as many data points on this attribute.

ggplot(pidgeys, aes(x = HP_pre, y = HP_post)) + geom_point(shape = 1) + 
  ggtitle("HP pre and post evolution")
## Warning: Removed 5 rows containing missing values (geom_point).

The relationship was again roughly linear, but the points did not follow a line as closely as they did for CP.

pidgey_HP_model <- lm(HP_post ~ HP_pre, data = pidgeys)
summary(pidgey_HP_model)
## 
## Call:
## lm(formula = HP_post ~ HP_pre, data = pidgeys)
## 
## Residuals:
##       5       6       7       8      10      11      12 
## -0.1544  0.7104 -0.1544  1.7104 -1.7220  0.3050 -0.6950 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.5598     5.5276   0.825 0.446970    
## HP_pre        1.4324     0.1377  10.400 0.000142 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.185 on 5 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.9558, Adjusted R-squared:  0.947 
## F-statistic: 108.2 on 1 and 5 DF,  p-value: 0.0001416

From these observations it looks like the model is:

\[HP_{post} = 1.4324 \times HP_{pre} + 4.5598 + \epsilon\]
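To show how this equation could be used, here is a small sketch that refits the HP model on the seven complete rows from the table and predicts the post-evolution HP of one Pidgey, with a 95% prediction interval. The `HP_pre = 40` value is a made-up example, not an observed Pidgey.

```r
# HP values copied from the seven rows of the table that have HP recorded.
pidgey_hp <- data.frame(
  HP_pre  = c(43, 41, 43, 41, 42, 35, 35),
  HP_post = c(66, 64, 66, 65, 63, 55, 54)
)

# Same simple linear regression as pidgey_HP_model.
hp_fit <- lm(HP_post ~ HP_pre, data = pidgey_hp)

# Hypothetical new Pidgey with 40 HP before evolution.
new_pidgey <- data.frame(HP_pre = 40)

# Point prediction plus a 95% prediction interval for a single new evolution.
hp_pred <- predict(hp_fit, newdata = new_pidgey,
                   interval = "prediction", level = 0.95)
print(hp_pred)
```

A prediction interval is used rather than a confidence interval because the question is about one new Pokemon, not about the average. With only seven points, the interval should be read as a rough guide.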

As I wrote above for the CP analysis, I have collected more Pidgeys now and will update this analysis once I have evolved them.

Discussion

There are some other people who have done similar analyses on the relationship between pre and post evolution CP. There are even calculators that provide estimates of the post evolution CP. Here’s one example.

I was not able to find HP evolution analysis, so expanding this analysis could be of novel interest.

Here’s a data set that could be used for cross validation: https://www.openintro.org/stat/data/?data=pokemon
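Until the CP model is checked against an external data set, a rough internal check is possible with leave-one-out cross validation on the 12 Pidgeys already collected. This is a different technique from validating on independent data, and the sketch below is only an illustration using the CP pairs re-entered from the table.

```r
# CP values copied from the 12-row Pidgey table above.
pidgey_cp <- data.frame(
  CP_pre  = c(270, 267, 259, 212, 209, 203, 201, 198, 191, 163, 154, 152),
  CP_post = c(533, 515, 526, 413, 403, 395, 392, 396, 370, 322, 295, 293)
)

# For each Pidgey: fit on the other 11, then predict the held-out one.
loo_errors <- vapply(seq_len(nrow(pidgey_cp)), function(i) {
  fit <- lm(CP_post ~ CP_pre, data = pidgey_cp[-i, ])
  held_out <- pidgey_cp[i, , drop = FALSE]
  held_out$CP_post - predict(fit, newdata = held_out)
}, numeric(1))

# Out-of-sample error versus the in-sample error of the full fit.
loo_rmse <- sqrt(mean(loo_errors^2))
full_fit <- lm(CP_post ~ CP_pre, data = pidgey_cp)
in_sample_rmse <- sqrt(mean(residuals(full_fit)^2))

print(c(loo = loo_rmse, in_sample = in_sample_rmse))
```

The leave-one-out error is never smaller than the in-sample error for a least-squares fit, so the gap between the two numbers gives a sense of how optimistic the in-sample fit is.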