Data
Below I describe the variables used in this analysis, along with their sources and what each variable measures. To build my dataset, I merge six datasets, each of which is explained in more depth below.
Table 1: Sectors and subsectors examined
- Economic infrastructure and services
  - Transport, communications
  - Energy
  - Banking, business and other services
- Humanitarian
- Production
  - Agriculture, forestry and fishing
  - Industry, mining and construction
  - Trade and tourism
- Social infrastructure and services
  - Education
  - Water supply and sanitation
Table 2: Variable Sources and Notes
- Development Assistance Committee (DAC) bilateral aid: OECD. This dataset comes from the DAC, a forum within the OECD that promotes development aid cooperation and other policies that contribute to the Sustainable Development Goals. Its members are 30 of the world’s wealthiest donor countries in the OECD (footnote 2). Per the OECD’s criteria, member countries need to have existing strategies, policies and institutional frameworks for development cooperation, a history of giving aid, and systems that promote accountability for the aid given. More importantly, the data are broken out into detailed categories. The data, drawn from the Query Wizard for International Statistics (QWIDS), which preselects a general aid dataset to use, are summarized in the tables below. The query selected the biggest categories and a selection of their subcategories. For example, Education falls under the Social Infrastructure and Services umbrella. There are more subcategories of aid than the QWIDS query included. A comprehensive analysis of the remaining subcategories has potential for new research, which is discussed in the Further Thoughts section of this paper. The data span 2007 to 2015 and include only donor country data; organizations like the UN or the Bill and Melinda Gates Foundation are excluded. To merge this dataset with the others, I define the donor country as the destination and the recipient country as the origin. Under this definition, the recipient (origin) countries are not DAC countries. DAC countries do offer aid to each other, and there is a considerable amount of migration within this group, but my dataset does not capture these migration stocks. Omitting this group should not sway the results, as the DAC countries are among the wealthiest in the world and often the most politically stable.
- GDP per capita in 2011 PPP: World Bank. Purchasing power parity (PPP) is a method of standardizing the cost of goods across countries. It is calculated by pricing a similar “basket of goods” in one country and comparing it with the price of the same basket in another country.
- Migrant Stock by Origin and Destination: UN. The United Nations Population Division gathers data on the number of migrants in a given country. Where data are missing, it projects the number of migrants using trends. The UN defines the international migrant stock as people “born in a country other than that in which they reside.” These estimates are compiled every five years, so this dataset only has migrant stock by origin and destination for 2010 and 2015.
- Distance between origin and destination, Common language, Colonized by destination country: United States International Trade Commission. This dataset describes characteristics of, and relationships between, pairs of countries.
- Free, Partly Free, Not Free Status: Freedom House. Freedom House is an independent organization dedicated to the expansion of freedom and democracy in the world. It analyzes the state of political and civil rights around the world and constructs three measures to score and rate the extent of freedom in each country and territory. I use its Status variable, which is a composite of a country’s political and civil rights scores. Status has three levels: Free, Partly Free and Not Free. For this analysis, I make Status a binary variable in which Free and Partly Free are grouped together as “Free”, denoted by 1, and countries classified as “Not Free” are denoted by 0.
- Conflict: Uppsala Conflict Data Program (UCDP). I use the UCDP/PRIO Armed Conflict Dataset. I use the incompatibility variable to capture the existence of conflict using a binary variable. An incompatibility is the use of armed force between two parties, with the government of a state being a counterparty, that results in at least 25 battle-related deaths in a calendar year. The existence of a conflict, as defined by UCDP/PRIO is denoted by 1 and no conflict is denoted by 0.
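The two binary recodings described in the list above (Status and Conflict) can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the function names and the set-of-pairs input for conflict are assumptions made for the example.

```python
# Hypothetical helpers; the paper does not show its own recoding code.

def free_dummy(status):
    """Map a Freedom House Status label to the binary Free variable.

    "Free" and "Partly Free" become 1; "Not Free" becomes 0.
    """
    label = status.strip().lower()
    if label in ("free", "partly free"):
        return 1
    if label == "not free":
        return 0
    raise ValueError(f"unknown status: {status!r}")


def conflict_dummy(country, year, conflict_country_years):
    """Return 1 if the country-year has a recorded conflict, else 0.

    conflict_country_years is assumed to be a set of (country, year)
    pairs taken from the conflict dataset.
    """
    return int((country, year) in conflict_country_years)
```

For example, `free_dummy("Partly Free")` returns 1, and a country-year that is absent from the conflict set is coded 0.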
Table 3: Categorized Summary Statistics, by Period
Period 1
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
---|---|---|---|---|---|---|---|
MigStock | 3,235 | 55,535.210 | 403,251.200 | 1.000 | 495.000 | 20,630.00 | 12,168,662.000 |
distance | 3,990 | 7,339.731 | 3,546.232 | 345.373 | 4,589.460 | 9,618.853 | 18,708.700 |
lnGDPperCap_o | 3,904 | 8.430 | 0.952 | 6.487 | 7.594 | 9.206 | 10.728 |
lnGDPperCap_d | 3,990 | 10.645 | 0.215 | 10.194 | 10.501 | 10.717 | 11.491 |
Bilateral | 3,990 | 48.962 | 185.723 | 0.000 | 1.200 | 29.457 | 4,077.910 |
Economic Infrastructure & Services | 821 | 13.910 | 104.659 | 0.000 | 0.020 | 2.040 | 2,387.990 |
Humanitarian | 699 | 6.293 | 30.301 | 0.000 | 0.130 | 2.840 | 446.460 |
Production Sector | 823 | 3.786 | 15.218 | 0.000 | 0.070 | 2.195 | 254.710 |
Social Infrastructure & Services | 1,504 | 16.366 | 80.846 | 0.000 | 0.140 | 7.110 | 1,625.090 |
Period 2
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
---|---|---|---|---|---|---|---|
MigStock | 3,883 | 62,699.470 | 433,066.100 | 1.000 | 567.00 |  | 12,275,876.000 |
distance | 4,836 | 7,346.041 | 3,601.207 | 345.373 | 4,582.014 | 9,737.048 | 18,708.700 |
lnGDPperCap_o | 16,560 | 9.025 | 1.165 | 6.326 | 8.107 | 9.864 | 11.815 |
lnGDPperCap_d | 8,258 | 9.992 | 1.088 | 6.326 | 9.469 | 10.682 | 11.815 |
Bilateral | 28,084 | 53.251 | 185.240 | 0.000 | 1.103 | 33.245 | 4,862.170 |
Economic Infrastructure & Services | 913 | 16.361 | 104.482 | 0.000 | 0.010 | 2.100 | 2,364.600 |
Humanitarian | 882 | 7.634 | 35.289 | 0.000 | 0.070 | 3.558 | 768.230 |
Production Sector | 1,031 | 4.643 | 26.046 | 0.000 | 0.050 | 2.390 | 642.280 |
Social Infrastructure & Services | 1,916 | 15.209 | 73.499 | 0.000 | 0.120 | 7.565 | 1,865.410 |
Table 4: Uncategorized Summary Statistics, by Period
Period 1
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
---|---|---|---|---|---|---|---|
MigStock | 9,019 | 67,395.620 | 385,492.900 | 1.000 | 757.000 | 31,296.000 | 12,168,662.000 |
distance | 11,582 | 7,421.602 | 3,544.230 | 345.373 | 4,666.997 | 9,845.781 | 18,215.300 |
lnGDPperCap_o | 11,317 | 8.233 | 0.932 | 6.515 | 7.386 | 8.981 | 10.728 |
lnGDPperCap_d | 11,582 | 10.610 | 0.188 | 10.194 | 10.497 | 10.677 | 11.461 |
Bilateral | 11,582 | 77.600 | 243.546 | 0.000 | 3.493 | 55.590 | 4,077.910 |
Agriculture, Forestry & Fishing | 1,981 | 4.215 | 20.074 | 0.000 | 0.050 | 2.020 | 404.190 |
Education | 3,046 | 4.907 | 18.795 | 0.000 | 0.060 | 2.450 | 307.120 |
Energy | 1,044 | 9.810 | 48.366 | 0.000 | 0.010 | 1.320 | 813.400 |
Food | 615 | 4.027 | 7.798 | 0.000 | 0.060 | 4.450 | 68.460 |
Industry, Mining and Construction | 1,207 | 1.980 | 22.074 | 0.000 | 0.010 | 0.520 | 678.770 |
Trade & Tourism | 957 | 0.679 | 3.227 | 0.000 | 0.010 | 0.320 | 77.080 |
Transport & Communications | 1,149 | 11.225 | 57.157 | 0.000 | 0.000 | 0.670 | 829.650 |
Water Supply & Sanitation | 1,583 | 5.819 | 27.407 | 0.000 | 0.010 | 1.320 | 470.850 |
Period 2
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
---|---|---|---|---|---|---|---|
MigStock | 19,856 | 91,270.390 | 599,836.300 | 1.000 | 940.000 | 38,756.000 | 12,275,876.000 |
distance | 25,720 | 7,380.477 | 3,514.563 | 345.373 | 4,688.914 | 9,739.940 | 19,314.750 |
lnGDPperCap_o | 37,019 | 8.749 | 1.064 | 6.326 | 7.879 | 9.513 | 11.815 |
lnGDPperCap_d | 29,142 | 10.439 | 0.667 | 6.326 | 10.485 | 10.696 | 11.815 |
Bilateral | 44,524 | 50.761 | 185.575 | 0.000 | 0.510 | 28.980 | 4,862.170 |
Agriculture, Forestry & Fishing | 4,403 | 3.489 | 12.815 | 0.000 | 0.040 | 1.950 | 285.860 |
Education | 7,350 | 4.045 | 15.971 | 0.000 | 0.050 | 1.990 | 485.590 |
Energy | 2,418 | 12.559 | 63.122 | 0.000 | 0.000 | 1.320 | 919.590 |
Food | 888 | 3.720 | 7.958 | 0.000 | 0.060 | 3.893 | 94.520 |
Industry, Mining and Construction | 2,758 | 1.780 | 16.226 | 0.000 | 0.010 | 0.530 | 566.820 |
Trade & Tourism | 2,029 | 0.813 | 3.314 | 0.000 | 0.010 | 0.320 | 70.090 |
Transport & Communications | 2,422 | 12.749 | 99.906 | 0.000 | 0.000 | 0.610 | 2,861.340 |
Water Supply & Sanitation | 3,452 | 5.571 | 31.167 | 0.000 | 0.010 | 1.240 | 900.100 |
From the summary statistics, I observe that the dataset used for Period 1 (2007-2010) has a total of 3,990 observations, with 23 OECD destination countries and 124 origin countries. The Period 2 (2011-2015) dataset has 3,698 observations using 81 countries’ data with the categorized aid types (Categorized Summary Statistics). When looking at the data by selected subcategory (the uncategorized analysis), there are 9,019 observations and the same number of OECD destination and origin countries in Period 1. Period 2 has 19,856 observations in the uncategorized analysis (Uncategorized Summary Statistics). Some data are missing, either because they are unavailable or because a condition does not apply to the specific country-year-aid pair. I omit the missing data in the regressions. The data covers three years using the 2010 estimate of migrant stock, so I can look at the effects of the various types of aid over the course of three years. The standard deviations of Bilateral Aid and all the aid types (Energy, Food Aid, Industry, Mining and Construction, Trade and Tourism, Transport and Communications and Water Supply and Sanitation) are very high because aid is highly variable and depends on relations between the destination and origin country, such as distance, migrant stock and population (Gurevich and Herman 2018). Overall, mean migrant stock is higher in the second period, which is consistent with the increasing pace of migration globally. Social Infrastructure and Services aid receives considerable focus, measured by number of observations, in both periods. The remaining categories have roughly similar numbers of observations.
Figure 1: Number of observations by aid type
The following maps show the top donor countries in this dataset, which are the United States, Japan and the United Kingdom. Bilateral aid is summed over the years covered in each period. The darker the shading, the more bilateral aid donated. The top donors are unchanged over the two periods.
Figure 2: Map of Top Donor Countries, Period 1 (2007-2010)
Figure 3: Map of Top Donor Countries, Period 2 (2011-2015)
The following data visualizations provide a more detailed look at the amount of aid received in a specific year during each time period. Notably, Afghanistan and Iraq show up most frequently in the period from 2007 to 2010. In the second period (2011 to 2015), there does not appear to be any trend in which countries appear in the top 30 recipients by aid type each year. This could be due to geopolitical events such as South Sudan’s separation from Sudan in 2011 (CIA 2018) and the Russian Federation’s annexation of the Crimean Peninsula from Ukraine in 2014 (Treisman 2016).
Figure 4: Top Recipients by Aid Type and Amount
2 Note that there are fewer than 30 donor countries in this dataset, as some countries elect not to report these figures or only participate in other forms of aid types that are omitted here such as Multi Sector or Program aid.
Gravity Model
I use a gravity model to determine the relationship between migration and development aid. The gravity model looks at the interaction between migration and its drivers, in this case the types of development aid. The gravity model is the best choice for this analysis because the OECD development finance dataset reports bilateral net aid, and gravity models can incorporate country and time fixed effects into a regression on country pairs. The time effects are particularly helpful in this instance because amounts of bilateral aid vary over time as countries’ situations change. I estimate the following model:
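$$
\begin{aligned}
\ln(\text{Migrant Stock}_{i,j,t}) ={}& \beta_1 \ln(\text{GDP per Capita}_{i,t}) + \beta_2 \ln(\text{GDP per Capita}_{j,t}) + \beta_3 \text{Conflict}_{i,t} + \beta_4 \text{Free}_{i,t} \\
&+ \beta_5 \ln(\text{Distance}_{i,j}) + \beta_6 \text{Landlocked}_{i} + \beta_7 \text{Common Language}_{i,j} + \beta_8 \text{Former Colony}_{i,j} \\
&+ \beta_9 \ln(1 + \text{Bilateral Aid}_{j \to i,t}) + \beta_{10} \ln(1 + \text{Aid Type}_{j \to i,t}) + \delta_i + \delta_j + \delta_t + \varepsilon_{i,j,t}
\end{aligned}
$$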
Continuous variables are expressed in natural logs so that the results can be interpreted as percent changes from proportional movements in the explanatory variables (World Bank 2018). The dependent variable is the migrant stock that originated in country i and is resident in country j at time t. I use migrant stock instead of migrant flow because it lets me control for the unique characteristics of specific time periods, destinations and origins. Migrant flow measures the number of individuals entering or leaving a country during a specific period, whereas migrant stock captures the number of migrants at a given point in time. The research question is whether aid is effective in reducing the drivers of migration, but the decision to migrate depends on time, origin and available destination countries. Using migrant flow would wash out these variables in this analysis. The migrant stock data has only 2010 and 2015 estimates, while the OECD’s aid data covers 2007 to 2015. To extract the most data and acknowledge that some time is needed to see aid’s effects, I split the aid data into two periods, 2007 to 2010 and 2011 to 2015, and use the corresponding UN migrant stock from 2010 and 2015 as the dependent variable. The time periods are unfortunately unevenly split, with period 1 having 3 years while period 2 has 4 years. Thus, the time periods are not meant for comparison, but rather provide additional information on the aid-migration link in two time periods.
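The period split described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the analysis: the record layout (dictionaries with `origin`, `destination`, `year` and `aid` keys), the function names and the placeholder country codes are all hypothetical.

```python
# Period boundaries and the migrant-stock year paired with each period,
# as described in the text.
PERIODS = {1: (2007, 2010), 2: (2011, 2015)}
STOCK_YEAR = {1: 2010, 2: 2015}


def period_of(year):
    """Return 1 or 2 for a year inside a period, otherwise None."""
    for period, (start, end) in PERIODS.items():
        if start <= year <= end:
            return period
    return None


def attach_migrant_stock(aid_rows, stock):
    """Pair each aid record with the migrant stock for its period.

    aid_rows: iterable of dicts with origin, destination, year, aid.
    stock: dict mapping (origin, destination, stock_year) to a count.
    Records outside both periods, or without a matching stock
    estimate, are dropped.
    """
    merged = []
    for row in aid_rows:
        period = period_of(row["year"])
        if period is None:
            continue
        key = (row["origin"], row["destination"], STOCK_YEAR[period])
        if key not in stock:
            continue
        merged.append({**row, "period": period, "mig_stock": stock[key]})
    return merged
```

Dropping unmatched records here mirrors the choice, stated in the Data section, to omit missing data from the regressions.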
To control for each year’s unique characteristics within each time period, such as the presence of more conflict or uneven crop yields, I add a time fixed effect δt. I also use fixed effects for origin (δi) and destination (δj). I add the term ln(1 + Bilateral Aid), from donor j to recipient i at time t, to capture the network effects derived from bilateral aid from one country to another. The ln(GDP per Capita) terms for origin i and destination j are added to reflect the abundant literature which says economic potential is a key driver of migration. GDP per capita is a proxy for the wages migrants may earn in the destination country and reflects the economic condition of countries. I add a constant of 1 to GDP per Capita, Bilateral Aid and Aid Type (footnote 3) to work around zero and very small values of these variables, whose logs would be undefined or negative. These aid amounts are expressed in 2016 constant USD millions, so a constant of 1 will affect the results negligibly. I also separate my analysis into the categorized aid groups and selected subcategories within the larger categories to further examine the differences between aid types. In addition to the aid type as a regressor, I add traditional variables of a gravity model to determine the pull factors of migration: distance between countries, the status of political and civil freedoms in a country, and dummy variables for conflict, landlocked status, common language and colonial history. These variables have been shown to be strong determinants of migration. This is further evidenced by the regression tables below, which show strong statistical significance of these variables in predicting migration.
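The effect of the added constant can be checked numerically. The sketch below is illustrative only: the helper name is hypothetical, and the sample values are the 25th percentile and maximum of Bilateral and Economic Infrastructure aid reported in the summary statistics.

```python
import math


def log_one_plus(amount):
    """Return ln(1 + amount) for a non-negative amount.

    Rejecting negative inputs is an assumption of this sketch, based on
    the minimum of 0.000 reported for the aid variables.
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return math.log1p(amount)


# A zero amount maps to exactly 0 instead of being undefined.
zero_case = log_one_plus(0.0)

# A small amount (0.02) has a negative plain log but a positive ln(1 + x).
small_plain = math.log(0.02)
small_shifted = log_one_plus(0.02)

# For a large amount (4,077.91) the shift barely changes the result.
large_gap = log_one_plus(4077.91) - math.log(4077.91)
```

The gap for large values equals ln(1 + 1/x), which shrinks toward zero as x grows. For small values the shift is not negligible, so it matters mostly for observations near zero.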
I use a quasi-Poisson model with a log link, also known as a Poisson Pseudo Maximum Likelihood (PPML) model. I use PPML rather than ordinary least squares (OLS) based on evidence that OLS overestimates many determinants of migration, such as geographic distance between countries (Silva and Tenreyro 2006). OLS exhibits this behavior because migration data are often heteroskedastic, meaning the variance of the outcome is unequal across the range of the predictors. Migration patterns also depend on many factors, and there is often zero migration between a specific country pair, which a log-linear OLS model cannot accommodate because the log of zero is undefined.
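To make the estimator concrete, here is a minimal sketch of a Poisson pseudo-maximum-likelihood fit with a log link. It is not the specification estimated in this paper: it uses an intercept and a single regressor, omits the fixed effects, and solves the Poisson score equations by Newton-Raphson in plain Python. The function name and sample values are hypothetical, and in practice a statistical package would be used.

```python
import math


def fit_ppml(x, y, iterations=50):
    """Fit E[y | x] = exp(a + b * x) by Poisson pseudo-maximum likelihood.

    The outcome y enters in levels, so zero values are allowed.
    Returns the pair (a, b).
    """
    # Start from the intercept-only solution: a = ln(mean(y)), b = 0.
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector: sum of (y - mu) and sum of x * (y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian) entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g in closed form.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


# Hypothetical country pairs: x stands in for log distance, y for migrant
# stock, including one pair with zero migrants.
a_hat, b_hat = fit_ppml([0.0, 1.0, 2.0, 3.0], [4.0, 2.0, 1.0, 0.0])
```

Because the outcome is modeled in levels through exp(a + b * x), the zero observation stays in the sample, whereas a regression on ln(y) would have to drop it.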
3 Note that Aid is multiplied by a million to convert it to millions for an apples-to-apples comparison with the rest of aid and population statistics