library(duplicator)

Introduction

This is the 4th installment in our series replicating the core components of Missirian and Schenkler’s 2017: Asylum applications respond to temperature fluctuations. In our first entry, we introduced the premise of their analysis and several outstanding questions after reading the supplementary materials describing the methods. In the second installment we carried out data pre-processing by calculating weighted mean surface temperature and total precipitation by country and summarized outgoing annual asylum application from the authors study countries. In the third entry we carried out some visual exploration of the data to mimic some of the author’s manuscript figures and verify we generated a similar dataset after pre-processing. In our final entry in the replication series, we will attempt to reproduce the author’s core model presented in the manuscript and provide some additional goodness of fit tests.

Missirian and Schenkler’s Model

Similar to other sections of the author’s supplementary materials, there is an extensive narrative describing the procedures, yet no explicit specification of the detailed methods. The authors detail the economic model of migratory decisions that formulates the theoretical backdrop for their applied model, however, they do not specify the parameterization of the core model presented in the main article.

The supplementary materials states:

We run a fixed effect model linking asylum applications to the European Union to weather in the source country. Specifically, the model is:

\[y_{st} = \alpha_s + \sum_{\tau=0}^l W_s(t-\tau)\beta_\tau+\gamma_\tau+\epsilon_{st}\]

Log asylum applications \(y_{st}\) from source country \(s\) to any member of the European Union in year \(t\) are regressed on source-country fixed effects αs to account for baseline differences as well the weather \(W_{st} = [W_{st1}, . . . ,W_{st2}]\) (\(k\) weather variables) in source country \(s\) in year \(t\). Since we are including source-country fixed effects, our regression only uses the random and exogenous year-to-year variation in weather (anomalies) in the identification but does not rely on baseline difference (e.g., different forms of government might result in different average number of refugees fleeing a country). This is preferable for obtaining causal estimates of the relationship…We include year fixed effects \(\gamma_\tau\) as temporal controls to adjust for common shocks like the 2008 financial crisis. Error term \(\epsilon_{st}\) uses White’s correction (robust standard errors).

My interpretation of this model given the metrics we calculated during our data pre-processing that comprise our model.dat object is:

\[y_{st} = apps\]

\[\alpha_{st} = apps.base, temp.base, precip.base\]

\[\sum_{\tau=0}^l W_s(t-\tau) = temp.anom, precip.anom, temp.sq.anom, precip.sq.anom\]

\[\gamma_\tau = year\]

While \(\epsilon_{st}\) are model estimated “robust” errors. The error term leads us to our next question. Despite my intuition we can not be certain of their actual model specification, nor the programs and packages used to implement the model. Not all models are implemented the same across common statistical software. Although this is more common with more complex frameworks like random effects and time series models, even simple linear and general linearized models may have varying default setting and options. In this particular instance, “robust” errors are a default feature of STATA that require more effort to replicate in R. “Robust errors” has multiple meanings; they most commonly refer to heteroskedasticity and White’s Standard Errors, but the term is also used to describe autocorrelated errors, or both heteroskedastic and autocorrelated errors. The authors do not provide residual plots, predicted vs. observed, or out of sample predictive plots so we have little information regarding the goodness of fit or appropriateness of their model and the effect of utilizing “robust” error structures. The only modeling diagnostics are p-values and R2 values listed in tables S1–S6; both of which are widely regarded by the modern applied statistical community as being largely worthless. Table S1 only presents model estimated coefficients and significance levels for the temperature and precipitation terms while not presenting any information pertaining to the additional terms implied by the narrative of their econometric model of migration. Lastly, it’s also unclear why Table S1 that while the title and narrative focus of the manuscript focuses on temperature anomalies, the presented final model curves and coefficient tables only present mean temperature and precipitaion values. This is not meant to be a criticism of the methods they employ, but rather to present the impossibility of replication with the information provided.

Implementing Their Model With R

We’ll start by attempting to implement and evaluate Missirian and Schenkler’s model with base R’s lm() function.

core.lm <- lm(apps ~ temp.mean + temp.sq.mean + year + temp.base + apps.base, data = duplicator::model.dat)
summary(core.lm)
#> 
#> Call:
#> lm(formula = apps ~ temp.mean + temp.sq.mean + year + temp.base + 
#>     apps.base, data = duplicator::model.dat)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.5854 -0.4129 -0.0109  0.4035  2.4936 
#> 
#> Coefficients:
#>                Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  -7.138e+01  8.158e+00  -8.750   <2e-16 ***
#> temp.mean     1.519e-02  4.228e-02   0.359    0.719    
#> temp.sq.mean  1.000e-04  5.651e-04   0.177    0.860    
#> year          3.559e-02  4.059e-03   8.768   <2e-16 ***
#> temp.base    -1.965e-02  3.493e-02  -0.563    0.574    
#> apps.base     9.998e-01  8.762e-03 114.112   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.6803 on 1539 degrees of freedom
#> Multiple R-squared:  0.8986, Adjusted R-squared:  0.8983 
#> F-statistic:  2728 on 5 and 1539 DF,  p-value: < 2.2e-16
plot(core.lm)