Working with Migration Data (BIOIMMIG)

With its migration and refugee samples, SOEP provides a broad spectrum of information on persons with a refugee and migration background.

The BIOIMMIG data set contains edited information on the history of flight and migration, such as motives for fleeing and migrating, the circumstances after arrival in Germany, relatives in the country of origin, and the desire to return to the country of origin. For more information about this data set and a list of the variables it contains, see the BIOIMMIG Documentation.

In the following, we use this data set and other information from the SOEP to create a status variable that distinguishes whether or not people with a migration background also have a refugee background.

Create an exercise path with four subfolders:

../_images/uebungspfade.PNG

Example:

  • H:/material/exercises/do
  • H:/material/exercises/output
  • H:/material/exercises/temp
  • H:/material/exercises/log

These folders store your do-files, output data sets, temporary data sets, and log files. Open an empty do-file and define the paths you created as globals:

***********************************************
* Set paths to the working directory
***********************************************
global AVZ 	"H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\complete\soep-core\soep.v33.2\stata_en\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, and “MY_OUT_TEMP” point to its subfolders. The global “MY_IN_PATH” contains the path to the SOEP data you ordered.
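
The four subfolders can also be created from within Stata. The following is a minimal sketch, assuming a hypothetical root folder named exercises below the current working directory instead of the H: drive used above. capture suppresses the error that mkdir raises when a folder already exists.

```stata
* Hypothetical root folder below the current working directory;
* replace it with your own path, e.g. "H:/material/exercises".
global AVZ "`c(pwd)'/exercises"

capture mkdir "$AVZ"                 // no error if the folder already exists
foreach sub in do output temp log {
    capture mkdir "$AVZ/`sub'"       // one subfolder per purpose
}
```

Forward slashes also work in Stata under Windows. They avoid one pitfall of backslashes: a backslash directly in front of a macro (for example "\$AVZ") prevents the macro from being expanded.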

Task 1: Preparation of BIOIMMIG

a) In which variable can you find information about the status of each person when they immigrated to Germany?

Open the data set or browse the BIOIMMIG documentation and search for a variable describing the immigration status. The appropriate variable is biimgrp from the BIOIMMIG data set.

*** Exercise 1 ******************************************************************

/*
a)	In which variable can you find information about the status of each person when they immigrated to Germany?
*/

* Immigration status is stored in the variable biimgrp.

use $MY_IN_PATH\bioimmig.dta, clear

b) Identify this variable in the BIOIMMIG data set and load it from the data set, together with the person number and the survey year.

Open your data set only with the required variables to maintain clarity in your analysis data set.

/*
b)	Identify this variable in the BIOIMMIG data set and load it from the data set, together with the person number and the survey year.
*/

use persnr syear biimgrp using $MY_IN_PATH\bioimmig.dta, clear
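
Parts a) and b) can be tried out without access to the SOEP files. The following sketch uses a small made-up stand-in for BIOIMMIG; the values, the extra variable other, and the variable label are hypothetical. describe using lists the variables of a file without loading it, and lookfor searches variable names and labels for a keyword.

```stata
* Toy stand-in for BIOIMMIG (made-up values and label, not SOEP data)
clear
input long persnr int syear byte biimgrp byte other
101 2013 5 0
102 2016 2 1
end
label variable biimgrp "Immigrant group (hypothetical label)"

tempfile toy
save "`toy'"

describe using "`toy'"                          // list variables without loading the data
use persnr syear biimgrp using "`toy'", clear   // load only the three required variables
lookfor immig                                   // search variable names and labels
```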

c) What are the values of this variable?

Familiarize yourself with your research-relevant analysis variable and check coding and case numbers.

/*
c)	What are the values of this variable? 
*/

tab biimgrp, m //Characteristics of the variable are examined.
../_images/mig_1.PNG

d) On the basis of this variable, generate the variable “Escape”, which only distinguishes between three groups:

  • 0 = Cases where no information is available
  • 1 = All persons without refugee background
  • 2 = Asylum seekers / refugees

After you have familiarized yourself with the research-relevant analysis variable, recode the variable to suit your project. Then check the case numbers of your generated variable against the source variable.

/*
d)	On the basis of this variable, generate the variable "Escape", which only distinguishes between three groups:
    0 = Cases where no information is available
    1 = All persons without refugee background
    2 = Asylum seekers / refugees
*/

recode biimgrp (-5 -2 -1 = 0 "No Answer") (1 2 3 4 6 = 1 "no Escape") (5 = 2 "Escape"), gen(Escape)
tab biimgrp Escape, m // biimgrp and Escape are compared.
../_images/mig_2.PNG
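
Besides the visual comparison in the cross table, the recoding can be checked with assert, which stops the do-file if a condition is violated. This is a minimal sketch on made-up biimgrp values, not SOEP data. The last assert would fail if the source variable contained a code that the recode rules do not cover, because recode leaves such values unchanged.

```stata
* Toy data with made-up biimgrp values
clear
input long persnr int biimgrp
1 -5
2  1
3  5
4  6
5 -1
end

recode biimgrp (-5 -2 -1 = 0 "No Answer") (1 2 3 4 6 = 1 "no Escape") (5 = 2 "Escape"), gen(Escape)

assert Escape == 0 if biimgrp < 0         // missing codes end up in 0
assert Escape == 2 if biimgrp == 5        // only code 5 is recoded to 2
assert inlist(Escape, 0, 1, 2)            // no uncovered source codes remain
```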

e) It may happen that no information on the immigration status is available initially, but that this changes in a later year. Restrict the data set to the last available observation for each person, so that the entry with the most information is used.

/*
e)	It may happen that initially there is no information on the status of
*   immigration, but this will change in a later year. Limit the data set to
*   the last observation that is available for the respective person, since this
*   way the specification with the most information content is used.
*/

bysort persnr: egen syear_max = max(syear) //A variable is created, which shows the last existing yearly observation
keep if syear_max == syear //Annual observations which are not the last observation are deleted.
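
An alternative way to keep the last observation per person is to sort within persnr by syear and keep the final row. The sketch below uses made-up person-years. isid then confirms that persnr uniquely identifies the remaining rows, which the 1:1 merge in Task 2 requires.

```stata
* Toy person-year data (made-up values)
clear
input long persnr int syear int biimgrp
1 2013 -1
1 2015  5
2 2016  2
3 2014 -1
3 2016 -1
end

bysort persnr (syear): keep if _n == _N   // last survey year per person
isid persnr                               // error if persnr is not unique
list
```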

f) Temporarily save the generated data set on your personal drive.

/*
f)	Temporarily save the generated data set on your personal drive.
*/

save $MY_OUT_TEMP\biimgrp.dta, replace

Task 2: Add basic variables from PPFAD and weights

a) Load the following information from PPFAD:

If you want to familiarize yourself with the PPFAD data set, visit the chapter Working with Tracking Data (PPFAD).

/*
a)	Use the following information from PPFAD: 
  - Never changing person ID "persnr"
  - Household number "hhnr" and the current household number "bghhnr"
  - The net variable with information about the interview type "bgnetto"
  - The sex of the person "sex"
  - The year of birth "gebjahr"
  - Variables on the migration background "migback", "germborn", "corigin", "immiyear"
  - Information about the survey status: "bgnetto" and "psample"
*/

use persnr hhnr bghhnr bgnetto psample sex gebjahr germborn corigin immiyear migback  using $MY_IN_PATH\ppfad.dta, clear

b) Merge the previously generated data set using the person number.

If you are unsure how to create your own cross-sectional data set, see the chapter Generating a cross-section Data Set.

/*
b)	Merge the previously generated data record using the person number.
*/

merge 1:1 persnr using $MY_OUT_TEMP\biimgrp.dta, nogen
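
The nogen option suppresses the _merge variable, which records where each row came from. While developing a do-file it can help to keep it for a moment and inspect the match. The following sketch uses made-up data and a temporary file in place of PPFAD and biimgrp.dta.

```stata
* Toy "using" data: Escape information for two persons (made-up values)
clear
input long persnr byte Escape
1 2
3 1
end
tempfile escape_tmp
save "`escape_tmp'"

* Toy "master" data standing in for PPFAD
clear
input long persnr byte germborn
1 2
2 1
3 2
end

merge 1:1 persnr using "`escape_tmp'"
tab _merge                  // 1 = master only, 2 = using only, 3 = matched
assert _merge != 2          // every Escape record found a person in the master data
drop _merge
```

Persons who appear only in the master data (here persnr 2) keep a missing value in Escape.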

c) Add the corresponding person extrapolation factors to the data record.

/*
c)	Add the corresponding person extrapolation factors to the data set.
*/

merge 1:1 persnr using $MY_IN_PATH\phrf.dta, keepus(bgphrf) nogen

d) Only keep respondents for whom a youth or individual questionnaire was realized in 2016.

Use the net code from PPFAD to restrict the sample, for example to exclude children, who have not provided any information on their immigration status. Only keep persons who completed an individual or youth interview.

/*
d)	Only keep individuals for whom a youth or personal questionnaire was realized in 2016.
*/

tab bgnetto, m //Variable values are displayed

keep if inrange(bgnetto, 10, 19) // People who have a code between 10 and 19 will be kept.
../_images/mig_3.PNG

Task 3: Generate a status variable with the following categories:

  • No immigrant background
  • Migration 2nd generation
  • Immigration without information
  • Immigration, not flight
  • Immigration, Flight

To generate this status variable, check the contents of the existing migration variables from PPFAD (migback germborn).

/*
Generate a status variable with the following categories:
*/

tab migback
../_images/mig_4.PNG
tab germborn
../_images/mig_5.PNG

Use the migration variables from PPFAD (migback, germborn) and link this information with your previously generated escape variable to build the described status variable from Task 3.

gen Status = 0 // All persons will first receive the missing code for "no info".
replace Status = 1 if migback == 1 & germborn == 1 // "no migback"
replace Status = 2 if migback == 3                 // "2nd generation" (2nd-generation migrants are by definition born in Germany, so "& germborn == 1" is unnecessary here)
replace Status = 3 if germborn == 2 & Escape == 0  // "Immigrants without information" 
replace Status = 4 if germborn == 2 & Escape == 1  // "Immigrants, no escape"
replace Status = 5 if germborn == 2 & Escape == 2  // "Immigrant, escape"

label def Statuslbl 0 "no info" 1 "no migback" 2 "2. Generation" 3 "Immigrants without information" 4 "Immigrants, no escape" 5 "Immigrant, escape"
label val Status Statuslbl // Values of the status variable receive labels
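
To see how the chain of replace commands assigns the categories, it can be run on a handful of made-up persons. In this sketch, person 6 is foreign-born but has no BIOIMMIG record, so Escape is missing after the merge and the person stays in category 0.

```stata
* Toy data: one made-up person per constellation
clear
input long persnr byte migback byte germborn byte Escape
1 1 1 .
2 3 1 .
3 2 2 0
4 2 2 1
5 2 2 2
6 2 2 .
end

gen Status = 0
replace Status = 1 if migback == 1 & germborn == 1
replace Status = 2 if migback == 3
replace Status = 3 if germborn == 2 & Escape == 0
replace Status = 4 if germborn == 2 & Escape == 1
replace Status = 5 if germborn == 2 & Escape == 2

list persnr migback germborn Escape Status
```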

Task 4: Content analysis:

a) How many refugees (foreign-born with refugee/asylum titles) are now in your data set?

Look at the status variable generated in Task 3 to answer the question.

*** Exercise 4 ******************************************************************

/*
a)	How many refugees (foreign-born with refugee/asylum titles) are now in your record?
*/

tab Status, m //Display Generated Status Variable
../_images/mig_6.PNG

All 4,514 respondents who received the value 5 for the generated status variable have a direct migration background (migback==2), were not born in Germany (germborn==2) and fled their home country (Escape==2, i.e. biimgrp==5).

b) How many are there if you take the person extrapolation factors into account? Interpret the results.

Look at the status variable generated in Task 3 to answer the question.

/*
b)	How many are there if you take the person extrapolation factors into account? Interpret the results.
*/

tab Status [aw=bgphrf], m  //Display generated status variable weighted with analytic weights
../_images/mig_7.PNG

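After weighting, the table shows only about 675 refugees. Stata rescales analytic weights so that they sum to the number of observations, so this figure expresses the weighted share of refugees in terms of the sample size, not a population count. The weighting thus corrects the number of refugees downwards.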
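
The rescaling of analytic weights can be reproduced by hand. This is a minimal sketch with made-up weights in a hypothetical variable w: each weight is multiplied by the number of observations divided by the sum of all weights, so the rescaled weights add up to the sample size.

```stata
* Toy data: made-up weights, two of five persons in Status 5
clear
input byte Status double w
5  100
5  100
1  900
1  900
1 1000
end

* Rescale by hand so that the weights sum to the number of observations
quietly summarize w
gen double w_rescaled = w * _N / r(sum)
quietly summarize w_rescaled
assert abs(r(sum) - _N) < 1e-8

tab Status [aw=w]          // frequencies add up to the 5 observations
```

Two of the five toy persons are in category 5, but they carry only 200 of the 3,000 weight units, so their weighted frequency falls well below 2.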

c) How many persons are represented by the sample taking the extrapolation factors into account?

Frequency weights in Stata must be integers. Create an integer frequency weight from the weighting factor provided so that you can make statements about the population. Then take a look at the new results.

/*
c)	How many persons are represented by the sample taking the extrapolation factors into account?
*/

gen fweight = round(bgphrf) // Frequency weights in Stata must be integers
tab Status [fw=fweight], m  //Display generated status variable weighted with frequency weights
../_images/mig_8.PNG

Around 1,600,000 people are represented.
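
Rounding changes each weight slightly, so the total of the rounded weights differs a little from the total of the original factors. The sketch below uses made-up weights to compare both totals. As an alternative that is not part of the exercise, tabulate also accepts importance weights (iw), which do not have to be integers.

```stata
* Toy data with made-up, non-integer weighting factors
clear
input byte Status double bgphrf
5  350.4
5  410.4
1 1200.4
end

gen long fweight = round(bgphrf)          // integer version for [fw=]

quietly summarize fweight
display "Sum of rounded weights:  " r(sum)
quietly summarize bgphrf
display "Sum of original weights: " r(sum)

tab Status [fw=fweight]                   // integer frequency weights
tab Status [iw=bgphrf]                    // unrounded weights as importance weights
```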

d) What is the proportion of people over 40 years of age among the refugees?

Since the data in this exercise come from wave “bg”, the survey year is 2016; for a description of the wave designations, see the chapter Naming Convention of Data Sets and Variables. To generate a suitable age variable, you can use the year of birth (gebjahr). In the survey year 2016, all persons born in 1976 or earlier were at least 40 years old by the end of the year. Generate a suitable age variable and look at the weighted proportion of refugees over 40 years of age:

/*
d)	What is the proportion of people over 40 years of age among the refugees?
*/

gen ue_40 = 0
replace ue_40 = 1 if gebjahr <= 1976 // Persons receive the value 1 if they were born in 1976 or earlier.

tab Status ue_40 [aw=bgphrf], m row nofreq
../_images/mig_9.PNG

The proportion of refugees over 40 years of age is about 47%.
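
The condition gebjahr <= 1976 is also true for negative values. The following sketch assumes that gebjahr can contain negative missing codes, as is common for SOEP variables; whether this affects any case in the analysis sample is not checked here. It sets the indicator to missing for such cases instead of counting them as over 40. The data are made up.

```stata
* Toy data: made-up birth years, including a negative missing code
clear
input long persnr int gebjahr
1 1976
2 1977
3 1950
4   -1
end

* 1 = born in 1976 or earlier, 0 = born later, . = no valid year of birth
gen byte ue_40 = inrange(gebjahr, 1, 1976) if gebjahr > 0 & !missing(gebjahr)

list
```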