Introduction

Note: This vignette was updated in August 2020 to reflect changes to the UNHCR Data API, the available datasets, and the subsequent streamlining of the untools package.

The Office of the United Nations High Commissioner for Refugees (UNHCR) provides several data sets describing annual movements of populations of concern. These include asylum seekers, asylum application data, asylum application decisions, refugees, internally displaced persons (UNHCR and IDMC tracked), returned refugees, returned internally displaced persons, stateless persons, Palestinian refugees, Venezuelans displaced abroad, and other populations of concern. The UNHCR Refugee Data Finder web portal serves as a central hub for several data sets summarizing these populations by year, month, gender, age, origin, and destination. The web portal is a fine exploratory tool, but it can be cumbersome for research purposes. In this vignette we summarize and explore the most commonly used UNHCR population dataset, which tracks refugees and asylum seekers. This dataset consists of annual dyadic flows for all populations of concern between countries of origin (citizenship) and countries of destination (asylum/residency). Although the records begin in 1951, exercise caution when performing analysis and causal inference for years prior to 1990.

Getting Started

Installing `untools`

The Populations of Concern dataset can be acquired directly using the getUNref() function from the untools package. You can install the current release of untools from GitLab with the devtools package.

devtools::install_gitlab("dante-sttr/untools", dependencies=TRUE)

Acquiring the Data

Load both untools and data.table for some light data wrangling. Then use getUNref() to download the most recent data from the UNHCR API.

library('data.table')
library('untools')

unhcr.ts<-untools::getUNref()

In addition to the getUNref() function demonstrated in this vignette, the dataset is available for download directly from the UNHCR Refugee Data Finder web portal. The Time Series dataset is one of the cleanest data sets provided by the UNHCR; however, working with it might still be challenging for a beginning programmer. The untools package is designed to provide simplified tools for data acquisition, processing, and visualization for popular United Nations data sets.

year	coo_id	coo_name	coo	coo_iso	coa_id	coa_name	coa	coa_iso	refugees	vda
1951	262	Unknown	UKN	NULL	11	Australia	AUL	AUS	180000	NA
1951	262	Unknown	UKN	NULL	12	Austria	AUS	AUT	282000	NA
1951	262	Unknown	UKN	NULL	17	Belgium	BEL	BEL	55000	NA
1951	262	Unknown	UKN	NULL	33	Canada	CAN	CAN	168511	NA
1951	262	Unknown	UKN	NULL	50	Denmark	DEN	DNK	2000	NA

This is a fairly simple dataset, consisting of the year of observation (year), country of origin (coo, coo_name), destination/asylum country (coa, coa_name), and fields for the different population types (refugees, asylum_seekers, returned_refugees, idps, returned_idps, stateless, ooc, and vda). Previous iterations of this dataset included numerous special characters, unusual formatting, and other issues not conducive to programmatic research and analysis; however, the July 2020 revision of the UNHCR datasets has largely removed these concerns. Please refer to the getUNref help file for more information on the different populations of concern and on additional fields not addressed in this vignette.

names(unhcr.ts)
#>  [1] "year"              "coo_id"            "coo_name"         
#>  [4] "coo"               "coo_iso"           "coa_id"           
#>  [7] "coa_name"          "coa"               "coa_iso"          
#> [10] "refugees"          "asylum_seekers"    "returned_refugees"
#> [13] "idps"              "returned_idps"     "stateless"        
#> [16] "ooc"               "vda"

untools Functions for UNHCR Data

The prepUNref Function

The untools package provides several functions for processing and visualizing UNHCR data. The prepUNref() function helps process raw UNHCR time series data by converting it to wide or long form, selecting years of interest, selecting populations of interest, and summing across groups. Calling prepUNref() with no additional parameters subsets the data to only the refugees and asylum_seekers groups and converts the data from wide to long form, which is more conducive to visualization and analysis.

unhcr.ts.dante<-prepUNref(unhcr.ts)

year	coo_id	coo_name	coo	coo_iso	coa_id	coa_name	coa	coa_iso	type	persons
1951	262	Unknown	UKN	NULL	11	Australia	AUL	AUS	refugees	180000
1951	262	Unknown	UKN	NULL	12	Austria	AUS	AUT	refugees	282000
1951	262	Unknown	UKN	NULL	17	Belgium	BEL	BEL	refugees	55000
1951	262	Unknown	UKN	NULL	33	Canada	CAN	CAN	refugees	168511
1951	262	Unknown	UKN	NULL	50	Denmark	DEN	DNK	refugees	2000
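To make the wide-to-long conversion concrete, here is a minimal base R sketch of the reshaping idea. It is not the package's implementation: the toy table reuses two of the 1951 rows shown above, and the asylum_seekers values are made-up zeros added only so that there are two population columns to stack.

```r
# Toy wide table: two 1951 rows from the output above.
# The asylum_seekers column holds made-up zeros for illustration only.
wide <- data.frame(
  year = c(1951, 1951),
  coo = c("UKN", "UKN"),
  coa = c("AUL", "AUS"),
  refugees = c(180000, 282000),
  asylum_seekers = c(0, 0),
  stringsAsFactors = FALSE
)

pop_cols <- c("refugees", "asylum_seekers")

# Stack each population column into (type, persons) pairs,
# repeating the identifying columns for every population type.
long <- do.call(rbind, lapply(pop_cols, function(p) {
  data.frame(
    wide[, c("year", "coo", "coa")],
    type = p,
    persons = wide[[p]],
    stringsAsFactors = FALSE
  )
}))

print(long)
```

Each origin-destination-year record now appears once per population type, which is the shape most plotting and modeling functions expect.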

There are several records with Unknown as the country of origin. While these are not trivial, for this exploration we will focus on known dyadic flows between countries.

unhcr.ts.dante<-unhcr.ts.dante[!(coo=='UKN')]

year	coo_id	coo_name	coo	coo_iso	coa_id	coa_name	coa	coa_iso	type	persons
1960	6	Angola	ANG	AGO	41	Dem. Rep. of the Congo	COD	COD	refugees	150000
1961	161	Rwanda	RWA	RWA	16	Burundi	BDI	BDI	refugees	30000
1961	6	Angola	ANG	AGO	41	Dem. Rep. of the Congo	COD	COD	refugees	150000
1961	161	Rwanda	RWA	RWA	41	Dem. Rep. of the Congo	COD	COD	refugees	53000
1961	161	Rwanda	RWA	RWA	186	United Rep. of Tanzania	TAN	TZA	refugees	12000
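Before dropping the unknown-origin records it can be useful to check how many persons they represent. The following is a small base R sketch of that check and of the same filter, using three rows taken from the tables above; the object names are illustrative and not part of untools.

```r
# Three rows taken from the tables above (identifiers trimmed for brevity).
flows <- data.frame(
  year = c(1951, 1960, 1961),
  coo = c("UKN", "ANG", "RWA"),
  coa = c("AUL", "COD", "BDI"),
  persons = c(180000, 150000, 30000),
  stringsAsFactors = FALSE
)

# Total persons recorded with an unknown country of origin.
unknown_total <- sum(flows$persons[flows$coo == "UKN"])

# Keep only dyads with a known origin (base R equivalent of the filter above).
known <- flows[flows$coo != "UKN", ]

print(unknown_total)
print(known)
```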

By default, prepUNref() processes all available years, but the user can specify populations and years of interest with the groups and range options. For example, specifying groups = c('refugees') and range = c(2000, 2017) will only process refugees between 2000 and 2017.

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('refugees'), range = c(2000, 2017))
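Conceptually, the groups and range options amount to a row filter on population type and year. The base R sketch below illustrates that idea on made-up numbers; it assumes the year bounds are inclusive, which is an assumption about prepUNref() rather than something confirmed here.

```r
# Toy long-form data with made-up person counts.
long <- data.frame(
  year = c(1999, 2005, 2005, 2018),
  type = c("refugees", "refugees", "asylum_seekers", "refugees"),
  persons = c(10, 20, 5, 40),
  stringsAsFactors = FALSE
)

groups <- c("refugees")
range <- c(2000, 2017)

# Keep rows whose type is requested and whose year falls inside the range
# (bounds treated as inclusive; this is an assumption).
keep <- long$type %in% groups &
  long$year >= range[1] &
  long$year <= range[2]

subset_long <- long[keep, ]
print(subset_long)
```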

Lastly, prepUNref() provides two additional logical switches: wide and sum_groups. By default, prepUNref() returns long data frames. This is most convenient for plotting and modeling; however, sometimes it’s interesting to explore data in wide form, especially time series data sets. The sum_groups option aggregates the totals across all specified groups. Let’s use these two switches to look at the sum of Syrian refugee and asylum-seeker outflows to Germany from 2014 to 2017 using wide = TRUE.

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('refugees', 'asylum_seekers'), range = c(2014, 2017), sum_groups=TRUE, wide=TRUE)

coo_id	coo_name	coo	coo_iso	coa_id	coa_name	coa	coa_iso	2014	2015	2016	2017
185	Syrian Arab Rep.	SYR	SYR	72	Germany	GFR	DEU	70585	197186	475649	567507
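The two switches can be pictured as an aggregation followed by a reshape. The base R sketch below shows one way to do this; the person counts are made up and are not UNHCR figures, and the approach is an illustration rather than the package's own code.

```r
# Toy long-form data for one dyad; counts are made up.
long <- data.frame(
  coo = "SYR",
  coa = "GFR",
  year = c(2014, 2014, 2015, 2015),
  type = c("refugees", "asylum_seekers", "refugees", "asylum_seekers"),
  persons = c(100, 50, 300, 200),
  stringsAsFactors = FALSE
)

# Sum across population types within each origin-destination-year.
summed <- aggregate(persons ~ coo + coa + year, data = long, FUN = sum)

# Cast to wide form: one row per dyad, one column per year.
wide <- reshape(
  summed,
  idvar = c("coo", "coa"),
  timevar = "year",
  direction = "wide"
)

print(wide)
```

Note that reshape() names the year columns persons.2014 and persons.2015, whereas the table above uses bare years as column names.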

Static Grouped Flows

With more than 100,000 unique country-country-year records, outflows, inflows, and varying populations of interest, visualizing the UNHCR data can be overwhelming. The untools package provides multiple default plotting methods for objects produced by the prepUNref() function. An easy launching point for investigating flows between countries is a static barplot of dyadic flows into or out of a target country. Using plot() on an object produced by prepUNref() with sum_groups = TRUE will produce a barplot for the target country and the top 8 destination or origin countries. The user specifies the country of interest, a year of interest, and whether they want to view inflows (mode = 'in') or outflows (mode = 'out'). Let’s start by viewing combined refugee and asylum-seeker inflows to the United States in 2013.

unhcr.ts.dante<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), sum_groups = TRUE, range = c(2012, 2017))
usa.in<-plot(unhcr.ts.dante, country = 'USA', mode = 'in', yr = c(2013, 2013))

Somewhat surprisingly, China tops the list, while Central America rounds out the rest of the top 5. By default, plot() will list up to 8 countries and will use the maximum year in the prepared dataset if no other year(s) are specified. We can view asylum seeking outflows from the Philippines in 2019 with a simple call.

unhcr.ts.dante<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers'), range = c(2012, 2019))
phl.out<-plot(unhcr.ts.dante, country = 'PHL', mode = 'out')
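The ranking behind such a barplot can be sketched in a few lines of base R: filter to the target country and year, sort by persons, and keep the largest few. The origin codes and counts below are placeholders, and top_n is a hypothetical name; this is an illustration of the idea, not the plot method's actual code.

```r
# Toy inflow records for one destination and year; origins and counts are placeholders.
flows <- data.frame(
  coo = c("A", "B", "C", "D"),
  coa = "USA",
  year = 2013,
  persons = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)

top_n <- 3  # the plot method described above lists up to 8

# mode = 'in' corresponds to filtering on the destination country.
inflow <- flows[flows$coa == "USA" & flows$year == 2013, ]

# Sort from largest to smallest and keep the top origins.
inflow <- inflow[order(-inflow$persons), ]
top <- head(inflow, top_n)

print(top)
```

For outflows the filter would be applied to coo instead of coa, and the ranking would be over destination countries.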

Stacked Static Flows by Population Type

Up until this point we’ve visualized cumulative migrant flows across all groups or a single group, but it may be of interest to examine the relative proportions of asylum seekers, refugees, and stateless persons. The untools package provides default plotting functions to visualize stacked bar charts of migrant inflows or outflows by group. Let’s re-examine inflows of migrants to the USA in 2017, but this time include breakdowns by type. To maintain the affected population breakdowns, specify sum_groups = FALSE.

unhcr.stacked<-untools::prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees','stateless'), range = c(2000, 2017), sum_groups = FALSE)
usa.stacked.in<-plot(unhcr.stacked, country = 'USA', mode = 'in')
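The quantities a stacked bar chart displays are per-origin counts split by population type. As a rough base R sketch with placeholder origins and made-up counts, the counts and their within-origin shares could be tabulated as follows.

```r
# Toy long-form inflows for one destination and year; values are made up.
long <- data.frame(
  coo = c("A", "A", "B", "B"),
  type = c("refugees", "asylum_seekers", "refugees", "asylum_seekers"),
  persons = c(30, 10, 5, 15),
  stringsAsFactors = FALSE
)

# Cross-tabulate persons: rows are population types, columns are origins.
counts <- xtabs(persons ~ type + coo, data = long)

# Share of each population type within each origin (columns sum to 1).
shares <- prop.table(counts, margin = 2)

print(counts)
print(shares)
```

A table shaped like counts is also the form that base R's barplot() stacks by default, with one bar per column.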

Plotting Time Series

Although static plots of migrant flows are interesting, it’s often more illuminating to examine time series data for migrant inflows and outflows. The untools package also provides default plotting functions to visualize time series of migrant flows for data frames produced with prepUNref() using sum_groups = TRUE. The default plotting function will produce a plot for all years present in the prepared data using the 5 countries with the highest totals in the maximum year of the dataset. Let’s view annual cumulative refugee and asylum-seeker inflows to the USA from 2000 to 2017.

unhcr.ts.dante<-prepUNref(unhcr.ts, groups = c('asylum_seekers', 'refugees'), range = c(2000, 2017), sum_groups = TRUE)
usa.ts.in<-plot(unhcr.ts.dante, country = 'USA', mode = 'in')
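The country selection described above (rank by the final year, then keep every year for the selected countries) can be sketched in base R as follows. The origins and counts are placeholders, n_keep is a hypothetical name, and the sketch keeps 2 countries instead of 5 to stay small.

```r
# Toy summed inflows for three placeholder origins over two years.
flows <- data.frame(
  coo = rep(c("A", "B", "C"), each = 2),
  year = rep(c(2016, 2017), times = 3),
  persons = c(5, 50, 40, 20, 10, 30),
  stringsAsFactors = FALSE
)

n_keep <- 2  # the default described above is 5

# Rank origins by their total in the maximum year only.
last <- flows[flows$year == max(flows$year), ]
top_coo <- head(last$coo[order(-last$persons)], n_keep)

# Keep the full series for the selected origins.
series <- flows[flows$coo %in% top_coo, ]

print(top_coo)
print(series)
```

One consequence of ranking on the final year is that a country with large flows early in the period but small flows at the end (origin B here) drops out of the plot.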

Lastly, similar to the static default plotting functions, we can specify mode = 'out' to view outflows from a given country.

phl.ts.out<-plot(unhcr.ts.dante, country = 'PHL', mode = 'out')

Exploring United Nations Refugee & Asylum Data With the untools Package

Joshua Brinks

2021-02-05
