Adults and Children on the Move

Visualising UNICEF Data on Migration in Africa
Data Analysis
Web Scraping
Networks
R
Author: Gaspare Tortorici
Published: August 7, 2024

UNICEF Data and Migration

According to the International Organization for Migration (IOM), there were approximately 280 million international migrants in the world in 2020. Roughly 36 million were children. That’s a lot of kids on the move! This post illustrates their spatial distribution and presents some broad-brush descriptions of international migration patterns in Africa, a continent expected to experience substantial demographic growth in the coming decades. In particular, we will focus on where international migrants are located within Africa rather than on the destinations of African migrants across the globe.

I use country-level data from the UNICEF Data Warehouse. I chose it because it allows me to separate total migration into two components: below and above 18 years of age. I think this distinction is useful because children are somewhat overshadowed in migration debates: they are a vulnerable population that mostly moves as a result of somebody else’s decision, yet they bear the brunt of it. The data report migrant stocks (not flows) at 5-year intervals from 1990 to 2020. The difference between flows and stocks is relevant in many economic disciplines: imagine stocks as a series of pictures that freeze a moment in time, and think of flows as what changes between these pictures, pretty much like a spot-the-difference puzzle!
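
To make the stock/flow distinction concrete, here is a minimal sketch with made-up numbers (the `stocks` data frame is hypothetical, not UNICEF data). Differencing consecutive snapshots gives the net change in the stock between two dates, which is only a rough proxy for flows: arrivals, departures and deaths all net out, so the same change is compatible with very different underlying movements.

```r
# HYPOTHETICAL migrant stocks (thousands) for one country, one snapshot every 5 years
stocks <- data.frame(
  year  = seq(1990, 2020, by = 5),
  stock = c(300, 350, 420, 400, 480, 560, 600)
)

# Net change between consecutive snapshots: what "changed between the pictures"
net_change <- diff(stocks$stock)
names(net_change) <- paste(head(stocks$year, -1), tail(stocks$year, -1), sep = "-")

net_change
```

A negative entry means the stock shrank over that period, even though people may still have arrived during it.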

After some light computations, I reshape the data into the following format. I also pull information on the size of each country’s international migrant community relative to its population.

Code
# getwd() # Know what your working directory is 
# setwd("/Users/gasparetortorici/Desktop/")  # Set a working directory
 
library(tidyverse)  
library(tidytext)  
library(rvest)  
library(wordcloud)  
library(RColorBrewer)  
library(widyr)  
library(textstem)  
library(data.table)  
library(sf)  
library(ggthemes)  
 
remove(list = ls(all = T)) # Clean up the environment 
 
a <- fread("africa_df.csv") # UNICEF data is chunky! Read it fast with fread() from the data.table package

|   | Country Code | Country | Year | Unit | Age | Value | Where |
|---|---|---|---|---|---|---|---|
| 1 | KEN | Kenya | 2020 | Absolute | Total | 1050 | Africa |
| 2 | KEN | Kenya | 2020 | Absolute | Under 18 | 297 | Africa |
| 3 | KEN | Kenya | 2020 | Absolute | Over 18 | 753 | Africa |
| 4 | KEN | Kenya | 2020 | % Population | Total | 2 | Africa |
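
A quick sanity check on this long format is that the two age groups add up to the total. The sketch below rebuilds the three absolute Kenya rows shown above, assuming the raw file uses the lower-case names that the code in the rest of this post refers to (`country_code`, `age`, `value`, and so on) rather than the prettified headers of the table.

```r
library(dplyr)
library(tidyr)

# The three absolute rows for Kenya in 2020 (values in thousands), typed in by hand
kenya <- data.frame(
  country_code = "KEN",
  year         = 2020,
  unit         = "absolute",
  age          = c("total", "under_18", "over_18"),
  value        = c(1050, 297, 753)
)

# One column per age group, then the gap between the total and the sum of its parts
check <- kenya %>%
  pivot_wider(names_from = age, values_from = value) %>%
  mutate(gap = total - (under_18 + over_18))

check
```

A non-zero `gap` would point to rounding or to missing age groups in the source.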

It is worth mentioning that these data have a few limitations. They do not take into account internal migrants, which understates the actual number of migrants within each country, especially in urban areas. The below/above 18 categorisation is quite rough, because 2-year-olds invariably move with their guardians while 16- or 17-year-olds may move independently. There is, unfortunately, no gender dimension. As far as I understand, refugees are not included in these figures, although the lines can sometimes be blurred.

An Aggregate Picture

Code
# Keep rows whose code has at most 3 characters (ISO3 country codes)
a <- a %>% 
     filter(nchar(country_code) < 4) 

total <- a %>%
         filter(unit == "absolute") %>% # Do not add percentages to head counts
         group_by(year, age, where) %>%
         summarise(sum = sum(value, na.rm = TRUE)) %>%
         ungroup() %>%
         pivot_wider(names_from = age, values_from = sum) %>%
         mutate(across(matches("18"), ~ . / total * 100, .names = "share_{.col}")) %>%
         mutate(ratio_under_over = under_18/over_18) %>%
         mutate(across(c("over_18", "total", "under_18"), ~ ./1000)) # Thousands to millions

total$where <- ifelse(total$where == "africa", "Africa", "Rest of the world")

total <- total %>%
         select(- ratio_under_over) %>%
         pivot_longer(names_to = "key", values_to = "value", 3:7) %>%
         mutate(type = ifelse(grepl("share", key), "Shares", "Absolute Numbers")) %>%
         filter(key != "total") %>%
         mutate(key = case_when(key == "over_18" ~ "Over 18",
                                key == "share_over_18" ~ "Share over 18",
                                key == "share_under_18" ~ "Share under 18",
                                TRUE ~ "Under 18"))

total$key <- fct_relevel(total$key, "Over 18", "Under 18", "Share over 18", "Share under 18")

total %>%
    ggplot(aes(x = year, y = value, fill = key)) +
      geom_bar(stat = "identity", position = "stack", alpha = 0.75, size = .15) +
      scale_fill_manual(values = c("#003e1f", "#b2df8a",  "#f9c22e", "#001524")) +
      labs(x = "Year", y = "Percent | Million International Migrants", fill = "") +
      theme_minimal() +
      theme(legend.position = "bottom") +
      facet_wrap(type ~ where, scales = "free_y")

We divide the data into African and non-African countries. While international migration is increasing globally, African countries still host a relatively small share of the world’s international migrants: only about 8.5% in 2020, as can be seen by comparing the scales in the top panels. For context, the United States alone hosts approximately 45 million foreign-born individuals.

Interestingly, migration trends for adults and children tend to align, as families often migrate together. However, in Africa, children make up a larger proportion of international migrants (25% to 30%) than in the rest of the world. This share shows a slight but noticeable decline over time. I included both absolute and relative stacked bars to illustrate this trend better.
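
To see what the `across()` call in the code above does, here is the same share computation on a single row, using the Kenya 2020 figures from the sample table (in thousands). The one-row `kenya_wide` data frame is just an illustration; the real `total` object aggregates all countries first.

```r
library(dplyr)

# Kenya 2020, in thousands, already in wide format (one column per age group)
kenya_wide <- data.frame(year = 2020, total = 1050, under_18 = 297, over_18 = 753)

# For every column whose name contains "18", add a share_* column: group / total * 100
kenya_shares <- kenya_wide %>%
  mutate(across(matches("18"), ~ .x / total * 100, .names = "share_{.col}"))

kenya_shares
```

For Kenya the share of children falls inside the 25% to 30% range mentioned above.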

Adult and Child Migrants Across Africa

We now generate two choropleth maps of Africa, one for adult and one for child migrants in 2020. The goal is to examine how closely the two groups track each other across the continent. Countries with larger migrant populations are darker.

Code
crs <- 4326 

shp_africa <- st_read(dsn = "world-administrative-boundaries/world-administrative-boundaries.shp", quiet = T) %>%
              st_transform(crs = crs) %>%
              filter(continent == "Africa") %>%
              rename(country_code = iso3)

shp_world <- st_read(dsn = "world-administrative-boundaries/world-administrative-boundaries.shp", quiet = T) %>%
             st_transform(crs = crs) 

# Over 18

africa_over <- a %>%
               filter(year == 2020, age == "over_18", unit != "percent", where == "africa") %>%
               mutate(value = value/1000)

shp_africa$value <- africa_over$value[match(shp_africa$country_code, africa_over$country_code, nomatch = NA)]

ggplot() +
  geom_sf(data = shp_africa, size = .25, color = "black", aes(fill = value)) +
  geom_sf(data = shp_world, size = .25, color = "black", fill = NA) +
  scale_fill_gradient("Million International Migrants (Adults)", low = "#b2df8a", high = "#003e1f") +
  theme_map() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "pt"),
        legend.position = "bottom") +
  coord_sf(crs = "+proj=laea +lat_0=0 +lon_0=20 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs")
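
A note on the `match()` line: it works like a left join that keeps the shapefile's row order and returns `NA` where a country has no data. Here is a minimal sketch on plain data frames, using 2020 totals (in thousands) that appear in the tables of this post; `ESH` is included only as an example of a code without a matching row.

```r
# Stand-in for the shapefile's attribute table (no geometry needed for the lookup)
shapes <- data.frame(country_code = c("KEN", "UGA", "ZAF", "ESH"))

# Stand-in for the migration data, deliberately in a different order
stocks <- data.frame(
  country_code = c("ZAF", "KEN", "UGA"),
  value        = c(2860, 1050, 1720)
)

# match() returns, for each shape, the row position of its code in `stocks` (or NA)
shapes$value <- stocks$value[match(shapes$country_code, stocks$country_code)]

shapes
```

The same result could be obtained with `left_join()`; countries left at `NA` are drawn in the default grey by `geom_sf()`.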

Code
# Under 18

africa_under <- a %>% 
                filter(year == 2020, age == "under_18", unit != "percent", where == "africa") %>%
                mutate(value = value/1000)

shp_africa$value <- africa_under$value[match(shp_africa$country_code, africa_under$country_code, nomatch = NA)]

ggplot() +
  geom_sf(data = shp_africa, size = .25, color = "black", aes(fill = value)) +
  geom_sf(data = shp_world, size = .25, color = "black", fill = NA) +
  scale_fill_gradient("Million International Migrants (Children)", low = "#b2df8a", high = "#003e1f") +
  theme_map() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "pt"),
        legend.position = "bottom") +
  coord_sf(crs = "+proj=laea +lat_0=0 +lon_0=20 +x_0=0 +y_0=0 +ellps=GRS80 +units=m +no_defs")

While most countries are shaded similarly in both maps, there are some glaring exceptions. This is interesting because it may hint at large numbers of unaccompanied children: Uganda is a notable example, where the difference is particularly evident.
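
One way to back up the visual comparison with numbers is to compute the correlation between the two groups and the share of children in each country. The sketch below uses invented figures for five hypothetical countries (A to E), and the 40% threshold is arbitrary. Note that a high share of children does not by itself prove that children are unaccompanied; it only flags countries worth a closer look.

```r
# HYPOTHETICAL stocks (thousands) for five made-up countries
toy <- data.frame(
  country  = c("A", "B", "C", "D", "E"),
  over_18  = c(2000, 1500, 900, 700, 400),
  under_18 = c(600, 450, 800, 200, 120)
)

# Rank correlation between adult and child stocks across countries
rho <- cor(toy$over_18, toy$under_18, method = "spearman")

# Share of children among all migrants, in percent
toy$child_share <- toy$under_18 / (toy$under_18 + toy$over_18) * 100

# Countries whose child share exceeds an arbitrary 40% threshold
flagged <- as.character(toy$country[toy$child_share > 40])

rho
flagged
```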

Absolute and Relative

Would the picture change if absolute numbers were scaled by population? The stock of migrants in South Africa and Côte d’Ivoire is comparable, but the latter has roughly half the population. Which countries have the largest share of migrants as a percentage of their population? This variable is only readily available at the aggregate level. Let us plot it for 2020.
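
The scaling itself is a one-liner. In the sketch below the migrant stocks are the 2020 figures reported in this post, while the population figures are rough round numbers that I am assuming for illustration (about 59 million for South Africa and 26 million for Côte d’Ivoire), so the results need not match the UNICEF percentages exactly.

```r
# Migrant stocks in millions (2020, from the table of absolute numbers in this post)
stock_mln <- c(ZAF = 2.860, CIV = 2.565)

# ASSUMPTION: rough population figures in millions, for illustration only
pop_mln <- c(ZAF = 59, CIV = 26)

# Migrants as a percentage of the resident population
share_pct <- stock_mln / pop_mln * 100

round(share_pct, 1)
```

Similar stocks translate into very different percentages once population is taken into account.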

Code
# Relative to population

africa_comparison <- a %>% filter(year == 2020, age == "total", where == "africa")

africa_comparison <- shp_africa %>%
                     select(- value) %>%
                     full_join(africa_comparison, by = "country_code") %>%
                     mutate(value = ifelse(country_code == "MYT", NA, value)) %>%
                     st_as_sf()

# We need to split the data because scales are different (absolute and percent)
# Facets won't do the job

percentage <- africa_comparison %>% filter(unit == "percent")

ggplot() +
  geom_sf(data = percentage %>% filter(!is.na(unit)), size = .25, color = "black", aes(fill = value)) +
  geom_sf(data = shp_world %>% filter(continent == "Africa"), size = .25, color = "black", fill = NA) +
  scale_fill_gradient("Percentage of Country's Population", low = "#f9c22e", high = "#001524") +
  theme_map() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "pt"),
        legend.position = "bottom") 

These are the countries sorted by the percentage of immigrants relative to population.

| Rank | Year | Country | Percentage of Country’s Population |
|---|---|---|---|
| 1 | 2020 | Gabon | 19% |
| 2 | 2020 | Equatorial Guinea | 16% |
| 3 | 2020 | Seychelles | 13% |
| 4 | 2020 | Djibouti | 12% |
| 4 | 2020 | Libya | 12% |
| 5 | 2020 | Côte d’Ivoire | 10% |
| 6 | 2020 | Gambia | 9% |
| 7 | 2020 | South Sudan | 8% |
| 8 | 2020 | Congo | 7% |
| 9 | 2020 | Botswana | 5% |

Code
absolute <- africa_comparison %>% filter(unit == "absolute") %>% mutate(value = value/1000)

ggplot() +
  geom_sf(data = absolute %>% filter(!is.na(unit)), size = .25, color = "black", aes(fill = value)) +
  geom_sf(data = shp_world %>% filter(continent == "Africa"), size = .25, color = "black", fill = NA) +
  scale_fill_gradient("Million International Migrants", low = "#f9c22e", high = "#001524") +
  theme_map() +
  theme(plot.margin = unit(c(0, 0, 0, 0), "pt"),
        legend.position = "bottom") 

The following table lists the top 10 countries by absolute number of international migrants.

Code
africa_comparison <- a %>% filter(year == 2020, age == "total", where == "africa")

first_absolute <- africa_comparison %>%
                  filter(unit == "absolute") %>% 
                  mutate(ranking = dense_rank(desc(value))) %>%
                  arrange(ranking) %>% 
                  slice(1:10)

first_relative <- africa_comparison %>%
                  filter(unit == "percent") %>% 
                  filter(country_code != "MYT") %>%
                  filter(country_code != "REU") %>%
                  mutate(ranking = dense_rank(desc(value))) %>%
                  arrange(ranking) %>% 
                  slice(1:10)

| Rank | Year | Country | Million International Immigrants |
|---|---|---|---|
| 1 | 2020 | South Africa | 2.860 |
| 2 | 2020 | Côte d’Ivoire | 2.565 |
| 3 | 2020 | Uganda | 1.720 |
| 4 | 2020 | Sudan | 1.379 |
| 5 | 2020 | Nigeria | 1.309 |
| 6 | 2020 | Ethiopia | 1.086 |
| 7 | 2020 | Kenya | 1.050 |
| 8 | 2020 | Democratic Republic of the Congo | 0.953 |
| 9 | 2020 | South Sudan | 0.882 |
| 10 | 2020 | Libya | 0.827 |
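
The repeated rank in the percentage table (Djibouti and Libya are both 4th, and the list stops at rank 9) comes from `dense_rank()`, which gives ties the same rank and leaves no gaps afterwards. The sketch below compares it with two other dplyr ranking functions on the ten percentages listed earlier.

```r
library(dplyr)

# The ten percentages from the table of migrants relative to population
pct <- c(19, 16, 13, 12, 12, 10, 9, 8, 7, 5)

ranks <- data.frame(
  pct     = pct,
  dense   = dense_rank(desc(pct)),  # ties share a rank, no gap afterwards
  minimum = min_rank(desc(pct)),    # ties share a rank, gap afterwards
  rownum  = row_number(desc(pct))   # ties broken by order of appearance
)

ranks
```

With `min_rank()` the country after the tie would be 6th and the list would end at rank 10.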

Visualising Things Differently: A Waffle Chart

In my experience, whenever one has to deal with more than 5 to 10 countries at a time, graphs get messy real quick. As we have seen above, maps are a great tool: not only do they look great, but they also make it possible to detect geographical patterns that might be missed upon first inspection. However, unless one uses facets or plotly/shiny animations, the time dimension is hard to capture. Waffle charts can be quite handy in these cases. Imagine (ehm…) a waffle: each row is one of the 5-year snapshots, each column is a country. If you shade squares according to how many migrants there are in each period across countries, you end up with a neat visualization of how countries compare to each other over time, and how migration has evolved. If you want to squeeze in another bit of useful information, you can reorder the horizontal axis by the size of the migrant stock. Here is the result.

Code
waffle <- a %>% 
          filter(unit == "absolute", age == "total", where == "africa") %>%
          select(country_string, year, value) 

# lapply(waffle, class)

custom_levels <- c(unique(waffle$country_string))

waffle$country_string <- factor(waffle$country_string, levels = custom_levels)

ranking <- waffle %>%
           filter(year == 2020) %>%
           arrange(desc(value)) %>%
           mutate(ranking = dense_rank(desc(value)))

waffle <- waffle %>%
          mutate(ranking = ranking$ranking[match(country_string, ranking$country_string, nomatch = NA)]) %>%
          mutate(Value = cut(value, 
                             breaks = quantile(value, probs = seq(0, 1, by = 0.1), na.rm = TRUE),
                             labels = paste0("Q", 1:10),
                             include.lowest = TRUE))

color_scale <- colorRampPalette(c("#b2df8a", "#003e1f"))(10)

gh_waffle <- function(data){

  p <- ggplot() +
    geom_tile(data = data,
              aes(y = factor(year), x = reorder(factor(country_string), ranking), fill = Value),
              color = 'white', size = .15) +
    scale_fill_manual(values = color_scale, guide = guide_legend(nrow = 1)) +
    theme_tufte(base_family = 'Palatino') +
    theme(axis.title = element_blank(),
          axis.ticks = element_blank(),
          legend.position = 'bottom',
          axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    coord_fixed()

  print(p)}

# gh_waffle(waffle)
# ggsave("waffle.png", height = 6, width = 10)
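
One detail worth spelling out is the fill variable: `cut()` with `quantile()` breaks turns each value into a decile label (Q1 to Q10) computed over all countries and years pooled together, so colours show relative position rather than absolute size. Here is a minimal sketch on made-up numbers (the `stock` vector is hypothetical).

```r
# HYPOTHETICAL, right-skewed migrant stocks: 100 distinct values
stock <- (1:100)^2

# Eleven break points (0%, 10%, ..., 100%) delimit ten bins
breaks <- quantile(stock, probs = seq(0, 1, by = 0.1), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum inside Q1 instead of turning it into NA
decile <- cut(stock, breaks = breaks, labels = paste0("Q", 1:10), include.lowest = TRUE)

table(decile)
```

If many values are identical, `quantile()` can return duplicate breaks and `cut()` stops with an error, so this approach assumes enough distinct values.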

Citation

BibTeX citation:
@online{tortorici2024,
  author = {Tortorici, Gaspare},
  title = {Adults and {Children} on the {Move}},
  date = {2024-08-07},
  url = {https://www.gasparetortorici.info/posts/07_08_2024_post/website_article_unicef.html},
  langid = {en}
}
For attribution, please cite this work as:
Tortorici, Gaspare. 2024. “Adults and Children on the Move.” August 7, 2024. https://www.gasparetortorici.info/posts/07_08_2024_post/website_article_unicef.html.