# packages used in this post (assumed to be loaded in a setup chunk not shown)
library(tidyverse)  # read_rds, ggplot2, dplyr, %>%
library(ggrepel)    # geom_text_repel
library(scales)     # dollar_format, comma_format
library(rvest)      # read_html, html_nodes, html_table
library(DT)         # datatable
library(plotly)     # ggplotly
library(pacman)     # p_load

df_plotly <- read_rds("data/Sweden_trade_exchange/plotly_data.rds")

c <- df_plotly %>%
  ggplot(aes(trade_value, exchange_students)) +
  geom_point(aes(size = pop_est, colour = Continent)) +
  geom_smooth(group = 1, method = "lm", se = FALSE) +
  geom_text_repel(aes(trade_value, exchange_students, label = name), alpha = .7, cex = 3) +
  scale_y_log10() +
  scale_x_log10(labels = dollar_format(), limits = c(100000, NA)) +
  scale_size_continuous(labels = scales::comma_format()) +
  scale_color_brewer(palette = "Paired") +
  geom_hline(yintercept = 0, lty = 2) +
  labs(title = "Correlation between Sweden's imports of goods and exchange students",
       x = "Value of imports into Sweden in 2018 (USD)",
       y = "Number of exchange students in Lund Uni Whatsapp group",
       size = "Population")

c
Swedish trade in goods and exchange students
I wanted to see if there was an association between the volume of trade between a country and Sweden and the number of exchange students who have come to Lund University from that country this semester.
Purpose
In a presentation last week organized by the Laboratory for the Economics of Africa’s Past, I learned about the persistence of economic connectivity between areas from the Roman era until today. It was fascinating. The presenter, Prof. Erik Hornung, mentioned offhand that the choice of where to spend a semester exchange was likely a function of the economic and social connections between your home country and your host country: whether you know someone there, have family there, or find the place familiar because of cultural similarities.
I am in Sweden this semester, on an exchange at Lund University, in the south of the country. This is due in part to my supervisor’s connections with the economic historians here.
I wanted to see if I could back up this assertion about exchange students with some data. More specifically, I wanted to see if there was an association between the volume of trade between a country and Sweden, and the number of exchange students that have come to Lund from there, this semester.
As you see in the graphic below, there appears to be a strong positive correlation between Swedish imports of goods and imports of exchange students at Lund.
The most populous European nations of Germany and France lead the rankings, with 46 and 37 exchange students respectively. The largest non-European senders are China and India, followed by the United States. South Africa has the largest representation for Africa, with 4 students.
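To put a number on an association like this, one option is to compute the correlation and a linear fit on the log10 scale, which mirrors the log axes and the `geom_smooth(method = "lm")` line in the graphic. The sketch below uses a small made-up data frame (`toy`), not the values behind the plot, so the numbers it produces say nothing about the real data.

```r
# Hypothetical toy data, NOT the values behind the graphic
toy <- data.frame(
  trade_value       = c(1e6, 1e7, 1e8, 1e9, 1e10),
  exchange_students = c(1, 3, 8, 20, 45)
)

# Pearson correlation on the log10 scale, matching the log axes of the plot
log_cor <- cor(log10(toy$trade_value), log10(toy$exchange_students))

# Linear fit on the log10 scale; the slope reads as an elasticity:
# a 1% rise in trade value goes with roughly `slope`% more students
fit   <- lm(log10(exchange_students) ~ log10(trade_value), data = toy)
slope <- unname(coef(fit)[2])

log_cor
slope
```

Countries with zero students in the sample would need separate handling, since the log of zero is not finite.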
Read on to see the creation of the graphic.
Data
To answer my question, I link two pieces of data: trade statistics on Sweden’s imports, and the nationalities of the incoming exchange students at Lund.
Trade data
The first I downloaded from an amazing trade tool called the Observatory of Economic Complexity (Linked here). It has wonderful visualizations of trade by type of goods and by country. Have a look at this tree map below for an example.
The tree map shows the origins of Sweden’s imports by value in 2018. Sweden cares about limiting carbon emissions, and so it makes sense that the majority of its imports are sourced within Europe.
Data on exchange students
The second data source is a rough proxy for the nationalities of my colleagues. I collected phone numbers from a big WhatsApp group called “Lund University ’21” and extracted the international dialing codes. There are several shortcomings to this data source: self-selection into WhatsApp use may differ by country, as might the desire to be part of a large group. Further, WhatsApp groups are capped at 256 members, just over half the total number of exchange students at Lund this semester. My sample is unlikely to perfectly represent my population of interest, but it is a good enough starting point.
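As a rough illustration of the extraction step, here is a minimal sketch in base R. It assumes the numbers are written in international format with a space after the country code (for example "+46 70 ..."); the phone numbers and the helper `extract_dialing_code()` are made up for this example and are not the code used for the post.

```r
# Hypothetical phone numbers in international format; none of these are real
phone_numbers <- c(
  "+46 70 000 00 01",
  "+49 151 0000002",
  "+49 160 0000003",
  "+33 6 00 00 00 04",
  "+27 82 000 0005"
)

# Assumption: the dialing code is the run of digits between "+" and the
# first space. Numbers that do not match this pattern become NA.
extract_dialing_code <- function(x) {
  matched <- grepl("^\\+[0-9]+ ", x)
  code <- sub("^\\+([0-9]+) .*$", "\\1", x)
  code[!matched] <- NA_character_
  code
}

dialing_codes <- extract_dialing_code(phone_numbers)

# Count group members per dialing code, largest first
code_counts <- sort(table(dialing_codes), decreasing = TRUE)
code_counts
```

One more caveat of this approach is that some dialing codes are shared between countries (+1 covers both the United States and Canada, for instance), so a code does not always identify a single country.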
Data processing
In the chunks of code below, I scrape a list of international dialing codes from the web, along with other country-level information including GDP per capita and land area.
# website with data on dialing codes
url <- "https://countrycode.org/"

# scrape table with rvest
table <- read_html(url) %>%
  html_nodes("table") %>%
  html_table()

# processing
table <- table[[1]] %>% as_tibble(.name_repair = "minimal") %>% unnest()

table <- table %>% as_tibble() %>% janitor::clean_names()

# cleaning names and formatting columns as numbers rather than characters
table_df <- table %>%
  mutate(population = parse_number(population),
         area_km2 = parse_number(area_km2),
         gdp_usd = parse_number(gdp_usd),
         # note: 10e9 equals 1e10 (ten billion); use 1e9 instead if the
         # scraped figures are reported in billions
         gdp_usd = gdp_usd*10e9)
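A single multiplier only gives sensible values if every row of the GDP column uses the same unit. If the scraped strings mix units, a unit-aware conversion is safer. The sketch below is one way to do that in base R; the example strings, the set of unit words, and the helper `parse_gdp()` are assumptions for illustration, since the exact format on the live page is not shown here.

```r
# Hypothetical examples of how a scraped GDP column might look
gdp_raw <- c("20.65 Billion", "16.72 Trillion", "789 Million", NA)

# Multiplier for each unit word (assumed set of units)
unit_multiplier <- c(Million = 1e6, Billion = 1e9, Trillion = 1e12)

parse_gdp <- function(x) {
  # numeric part: drop thousands separators, keep the leading number
  value <- as.numeric(sub("^([0-9.,]+).*$", "\\1", gsub(",", "", x)))
  # unit word: whatever follows the leading number and whitespace
  unit <- sub("^[0-9.,]+[[:space:]]*", "", x)
  # strings with a missing or unrecognised unit become NA
  unname(value * unit_multiplier[unit])
}

gdp_usd <- parse_gdp(gdp_raw)
gdp_usd
```

Returning `NA` for unrecognised units makes it easy to spot rows that need a closer look, rather than silently scaling them by the wrong factor.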
Here is a table of the data scraped from countrycode.org.
# display table
datatable(table_df)
We can show the regions of the world by the first digit of their dialing code in a neat map. Expand the chunk by clicking code to see how easy it is to make an interactive graphic with ggplotly.
# packages for map plotting and matching countries.
p_load(rnaturalearth, countrycode)

# extracting first digit of dialing code
table_df <- table_df %>%
  mutate(iso_a3 = countrycode(country, origin = "country.name", destination = "iso3c")) %>%
  mutate(first_digit = substring(country_code, 1, 1))

# creating dataframe with mapping geometry
world <- ne_countries(scale = "medium", returnclass = "sf")

# joining up to table of dialing codes
map_df <- world %>% as_tibble() %>% left_join(table_df, by = "iso_a3")

# creating plot
a <- map_df %>%
  filter(!is.na(first_digit)) %>%
  ggplot(aes(geometry = geometry, fill = first_digit)) +
  geom_sf() +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Countries coloured by first digit of international dialing code",
       fill = "First digit")

# display interactive plot
ggplotly(a)