Working with Tracking Data (PPFAD)

For every year since 1984, the PPFAD data set contains information on all persons who have ever lived in a SOEP household at the time of a survey (i.e. all respondents, but also children under 17 years of age and persons who have never given an interview). PPFAD is essential for identifying the units of analysis (persons), especially in longitudinal analyses. In addition, paneldata.org uses PPFAD to differentiate the study population.

Time-constant information on persons:

  • Never changing Person ID (adults, adolescents, children)
  • Original Household Number
  • Gender, year of birth, month of birth, year of death if applicable
  • Migrant Background
  • Sample Membership (psample)

Time-varying information on persons:

  • Current Household Number: if a person moves to another household, the household number changes (hhnrakt or $hhnr)
  • Survey Status ($netto, $netold)
  • Population Membership (private household, institutional households)
  • Survey Region (East or West Germany)

The data set is explained in more detail in the PPFAD documentation.

Create an exercise path with four subfolders:

../_images/uebungspfade.PNG

Example:

  • H:/material/exercises/do
  • H:/material/exercises/output
  • H:/material/exercises/temp
  • H:/material/exercises/log

These are used to store your script, log files, datasets and temporary datasets. Open an empty do file and define your created paths with globals:

***********************************************
* Set paths to the working directory
***********************************************
global AVZ 	"H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\complete\soep-core\soep.v33.2\stata_en\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA” and “MY_OUT_TEMP” point to its subfolders, and the global “MY_IN_PATH” contains the path to the data you ordered.
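As a minimal sketch of how such globals might be used, the following lines create the four subfolders and open a log file. This is an illustration under assumptions: it takes the current working directory as the root instead of H:\material\exercises, it uses forward slashes (which Stata accepts on all platforms), and the log file name ppfad_exercise.log is purely hypothetical.

```stata
* Sketch only: root folder and log file name are assumptions
global AVZ "`c(pwd)'/exercises"
capture mkdir "$AVZ"
foreach sub in do output temp log {
    capture mkdir "$AVZ/`sub'"      // capture: no error if the folder already exists
}
global MY_LOG_OUT "$AVZ/log/"

capture log close                   // close a log that may still be open
log using "${MY_LOG_OUT}ppfad_exercise.log", replace text
display "Logging to ${MY_LOG_OUT}ppfad_exercise.log"
log close
```

Writing the global as ${MY_LOG_OUT} with braces makes it clear where the macro name ends when a file name follows directly.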

Based on the data in PPFAD, answer the following questions:

1. Look at the two people with the person ID (variable persnr) 2102 and 19202

a) What gender are they? When were they born and, if applicable, when did they die?

Open the PPFAD dataset. Search the data set for variables that describe gender, year of birth and year of death. Display the information of the variables for persons 2102 and 19202.

use "${MY_IN_PATH}ppfad.dta", clear

* a) What gender are they? When were they born and, if applicable, when did they die?
list persnr sex gebjahr todjahr if persnr == 2102 | persnr == 19202
../_images/aufgabe_1.a.PNG
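The "search the data set" step can be sketched with Stata's lookfor command, which searches variable names and variable labels for keywords. The example below runs on a small toy data set, not on PPFAD: the person IDs, the values and the variable labels are invented for illustration, and the real labels in PPFAD may be worded differently.

```stata
* Toy data (hypothetical persons and values, not taken from PPFAD)
clear
input persnr sex gebjahr todjahr
1 1 1950 -2
2 2 1930 2001
3 1 1975 -2
end
label variable sex     "Gender"
label variable gebjahr "Year of birth"
label variable todjahr "Year of death"

* Search variable names and labels for keywords
lookfor birth death
local found "`r(varlist)'"          // variables whose name or label matched

* Show the matching variables for selected persons
list persnr `found' if inlist(persnr, 1, 2), noobs
```

inlist() is a compact alternative to chaining several persnr == ... conditions with |.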

b) Were these people and their parents born in Germany?

In the data set, search for a variable that describes the migration background. Display the information of the variable for persons 2102 and 19202.

* b) Were these people and their parents born in Germany?
list persnr migback if persnr == 2102 | persnr == 19202
../_images/aufgabe_1.b.PNG

c) If they have immigrated: In which year and from which country?

Search the data set for a variable that describes the country of birth and the year of moving to Germany. Display the information of the variables for persons 2102 and 19202.

*c) If they have immigrated: In which year and from which country?
list persnr immiyear corigin if persnr  == 2102 | persnr == 19202
../_images/aufgabe_1.c.PNG

d) Are these people from East or West Germany?

Search the data set for a variable that describes east-west affiliation. Display the information of the variables for persons 2102 and 19202.

*d) Are these people from East or West Germany?
list persnr loc1989 psample if persnr  == 2102 | persnr == 19202
../_images/aufgabe_1.d.PNG

e) From which sources does the information on the migration background and the year of death come?

Search the data set for info variables that show you sources of information for the year of death and the migration background. Display the information of the variables for persons 2102 and 19202.

*e) From which sources does the information on the migration background and the year of death come?
list miginfo todinfo if persnr  == 2102 | persnr == 19202
../_images/aufgabe_1.e.PNG

2. How many people lived in a realised private household in 2016 and answered the individual questionnaire?

Remember that SOEP abbreviates each survey year with a wave letter. SOEP started in 1984 (wave “a”) and reached wave “bg” in 2016. For more information on this topic, please refer to the DTC subchapter Naming Convention of Data Sets and Variables.
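The mapping from survey year to wave prefix can be sketched in a few lines. The sketch assumes the pattern visible in this exercise: single letters “a” to “z” for 1984 to 2009, then “ba”, “bb”, ... from 2010 onwards. The local names year, idx and wave are illustrative only.

```stata
* Derive the wave prefix from a survey year (sketch, valid for 1984-2035)
local year 2016
local idx = `year' - 1984                    // 0 corresponds to wave a
local letters "abcdefghijklmnopqrstuvwxyz"

if `idx' < 26 {
    * 1984-2009: single letter a ... z
    local wave = substr("`letters'", `idx' + 1, 1)
}
else {
    * from 2010: second round of letters, prefixed with b
    local wave = "b" + substr("`letters'", `idx' - 26 + 1, 1)
}
display "`year' -> wave `wave'"
```

With year set to 2000, 2011 or 2014, the same lines yield the prefixes q, bb and be used in the later exercises.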

Since you are interested in the 2016 survey year, look for variables with the prefix “bg”. Search the data set for “bg” variables that describe the population membership and display their values:

********************************************************************************
*** Exercise 2) ***
* How many people lived in a realised private household in 2016 and answered the 
* personal questionnaire?

********************************************************************************

* Information from:
* 2016 -> Wave bg
* private household -> bgpop
* Individual questionnaire -> bgnetto

tab bgpop
../_images/aufgabe_2_1.PNG

Values 1 and 2 are relevant to answer the question because they describe realized households. Search the data set for variables with the abbreviation “bg” that describe the survey status. Display the characteristics of the survey status:

tab bgnetto
../_images/aufgabe_2_2.PNG

Respondents with survey status between 10 and 15 or survey status 19 completed the individual questionnaire. Cross-tab the variables bgpop and bgnetto with an appropriate restricting condition to answer the question.

tab bgnetto bgpop if ((bgnetto >= 10 & bgnetto <= 15) | bgnetto==19) & (bgpop==1 | bgpop==2)
../_images/aufgabe_2_3.PNG
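The same restriction can be written more compactly with inrange() and inlist(), and count returns the number of cases directly. The following sketch uses toy data with invented values. Only the code ranges (10 to 15 or 19 for the survey status, 1 or 2 for the population) are taken from the text above.

```stata
* Toy data: hypothetical persons with survey status and population codes
clear
input persnr bgnetto bgpop
1 10 1
2 12 2
3 19 1
4 20 1
5 15 4
6 30 2
end

* Completed individual questionnaire AND realised private household
count if (inrange(bgnetto, 10, 15) | bgnetto == 19) & inlist(bgpop, 1, 2)
scalar n_resp = r(N)
display "Number of persons: " n_resp
```

In the toy data, persons 1 to 3 meet both conditions. Person 4 has no individual questionnaire code and person 5 has a different population code.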

3. PPFAD allows you to see which populations can be viewed from a longitudinal perspective:

a) How many people who answered the individual questionnaire in 2000 also took part in the survey in 2014?

As before, the survey years are abbreviated with wave letters (see the subchapter Naming Convention of Data Sets and Variables). The survey years 2000 and 2014 correspond to the waves “q” (2000) and “be” (2014). Search the data set for variables with the prefixes “q” and “be” that describe the survey status. Tabulate the survey status under the condition that the individual questionnaire was answered in both years:

* a)How many people who answered the personal questionnaire in 2000 also took 
*   part in the survey in 2014?

* Information from:
*	2000 -> wave q
*  	2014 -> wave be     
* 	Individual questionnaire -> $netto

tab qnetto benetto  if qnetto>=10 & qnetto<=19 & benetto>=10 & benetto<=19
*or:
//fre qnetto benetto  if qnetto>=10 & qnetto<=19 & benetto>=10 & benetto<=19
../_images/aufgabe_3_a.PNG

A total of 7639 respondents completed the individual questionnaire in both 2000 and 2014.

b) How many people answered the individual questionnaire every year from 2000 to 2014?

The survey years include the wave designations from “q”(2000) to “be”(2014). View the relevant survey status codes to answer the question. Please consider all persons who have answered the individual questionnaire:

* b) How many people answered the individual questionnaire every year from 2000 
*    to 2014?

/* to see all the codes */
lab list bgnetto
../_images/aufgabe_3_b.PNG

Define a variable list that contains the survey status variables ($netto) of all 15 survey waves considered.

local v "netto"
local vlist "q`v' r`v' s`v' t`v' u`v' v`v' w`v' x`v' y`v' z`v' ba`v' bb`v' bc`v' bd`v' be`v'"  
/* --> 15 waves */

Generate a variable that shows the number of waves of completed person interviews. Note that the values 10,12,13,14,15,16,18,19 of the $netto variable mean realized interviews.

capture drop h1
egen h1 = anycount(`vlist'), values(10 12 13 14 15 16 18 19)

Tabulate the newly generated variable, restricted to persons with a completed interview in all 15 waves.

tab h1 if h1 == 15
../_images/aufgabe_3_b4.PNG

A total of 6665 people completed the individual questionnaire every year over the period 2000-2014.
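How egen's anycount() function produces this count can be seen on a small example. The sketch below uses only three waves and four invented persons. The list of codes that count as realised interviews is the one given above, and the name n_int is illustrative.

```stata
* Toy data: survey status in three waves for four hypothetical persons
clear
input persnr qnetto rnetto snetto
1 10 10 10
2 10 20 10
3 12 19 18
4 -2 10 10
end

local vlist "qnetto rnetto snetto"
local nwaves : word count `vlist'           // number of waves in the list

* Per person: in how many of the listed variables does a realised-interview code occur?
egen n_int = anycount(`vlist'), values(10 12 13 14 15 16 18 19)

* Persons interviewed in every wave
count if n_int == `nwaves'
scalar n_all = r(N)
list persnr `vlist' n_int, noobs
```

Persons 1 and 3 have a realised interview in all three waves, while persons 2 and 4 each have only two. Deriving the number of waves from the list itself avoids a hard-coded 15 when the variable list changes.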

c) How many people who turned 15 in 2011 and lived as children in a survey household took part in the survey in 2016?

The survey year 2011 corresponds to wave “bb” and the survey year 2016 to wave “bg”. To answer the question, you need a variable that identifies people who turned 15 in 2011. Age can be derived from the year of birth, and children can be identified through the survey status code ($netto). Generate a variable that flags people who turned 15 in 2011 and lived in a survey household as a child.

* c) How many people who turned 15 in 2011 and lived as children in a survey 
*    household took part in the survey in 2016?

*   Information from:
*  	2011 -> wave bb
*	Age  -> 15  
*	Child -> bbnetto   
*	2016 -> wave bg
* 	Individual Questionnaire -> bgnetto

/* People who turned 15 in 2011 and lived in a survey household as a child...*/
capture drop a15kind
gen a15kind = 1 if 2011-gebjahr == 15 & bbnetto >= 20 & bbnetto < 30

To identify all persons who turned 15 in 2011, lived in a survey household as a child and completed the individual questionnaire in 2016, use the survey status codes again. Tabulate the 2016 survey status, restricted to these cases.

// fre bgnetto if a15kind == 1 & bgnetto >= 10 & bgnetto < 20
* or:
tab bgnetto if a15kind == 1 & bgnetto >= 10 & bgnetto < 20

../_images/aufgabe_3_c2.PNG

In 2016, a total of 309 people who turned 15 in 2011 and were part of a survey household as a child that year completed an individual interview.
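The two-step logic (flag the cohort, then restrict by the 2016 survey status) can be sketched on toy data. As a variation on the exercise code, the flag is built here as a 0/1 indicator rather than 1/missing, and inrange() replaces the >= and < comparisons. This is equivalent only under the assumption that the status codes are integers. All values are invented.

```stata
* Toy data: hypothetical persons, not taken from PPFAD
clear
input persnr gebjahr bbnetto bgnetto
1 1996 20 10
2 1996 20 30
3 1995 20 10
4 1996 10 10
end

* 1 = turned 15 in 2011 and had a child status code (20-29) in wave bb, else 0
gen byte a15kind = (2011 - gebjahr == 15) & inrange(bbnetto, 20, 29)

* Of these, how many have an individual questionnaire code (10-19) in wave bg?
count if a15kind == 1 & inrange(bgnetto, 10, 19)
scalar n_a15 = r(N)
display "Number of persons: " n_a15
```

Only person 1 is counted. Person 2 has no interview code in 2016, person 3 was born a year too early, and person 4 did not have a child status code in 2011.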

d) The person with persnr=588010 was born in 1984 in a panel household and was still part of the sample in 2009. The person has changed households twice during this time. In which years?

To identify how often and when a person has changed households, display all household numbers available in PPFAD for person 588010.

* d) The person with persnr=588010 was born in 1984 in a panel household and was
* still part of the sample in 2009. The person has changed households twice during
* this time. In which years?

* Information from:
* -> household numbers

list *hhnr if persnr == 588010
/* -> changed household 
 in year d (1987)
 in year y (2008)
 no participation since bb (2011) 
*/
../_images/aufgabe_3_d.PNG

Person 588010 has been part of the survey since wave “b” (1985), initially in household 58807. From wave “d” (1987) to wave “x” (2007) the person lived in household 73407, and from wave “y” (2008) onwards in household 132608.
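Instead of reading the changes off the listing by eye, successive household number variables can be compared in a loop. The following is a minimal sketch on toy data with one invented person and four waves. It assumes that years outside the sample carry a negative code such as -2, which should be checked against the value labels in the real data.

```stata
* Toy data: one hypothetical person, waves a-d; -2 = not in the sample (assumption)
clear
input persnr ahhnr bhhnr chhnr dhhnr
1 -2 100 100 200
end

scalar nchanges = 0
local prev ""
foreach v of varlist ahhnr bhhnr chhnr dhhnr {
    if "`prev'" != "" {
        * a change: both household numbers are valid (> 0) and differ
        quietly count if `v' != `prev' & `v' > 0 & `prev' > 0
        if r(N) > 0 {
            display "Household number changes from `prev' to `v'"
            scalar nchanges = nchanges + r(N)
        }
    }
    local prev "`v'"
}
display "Number of changes: " nchanges
```

In the toy data the only change happens between waves c and d. The step from wave a to wave b is not counted because the person was not yet in the sample in wave a.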