Working with Tracking Data (PPATH / PPFAD)

For all years since 1984, the PPATH data set contains information on all persons who have ever lived in a SOEP household when a survey was conducted (i.e., all adult respondents as well as children under 17 years of age and household members who have never given an interview). PPATH is important in distinguishing research units (persons), especially for longitudinal analysis. In addition, paneldata.org uses PPATH to differentiate the study population.

Time-constant information on individuals:

  • Permanent Individual ID (adults, adolescents, children)

  • Original Household Number

  • Gender, year of birth, month of birth, year of death if applicable

  • Migration Background

  • Sample Membership (psample)

Time-varying information on individuals:

  • Current Household Number (hhnrakt or $hhnr): changes when a person moves to another household

  • Survey Status ($netto, $netold)

  • Population Membership (private household, institutional household)

  • Survey Region (East or West Germany)

The data set is explained in more detail in the following documentation:

PPATH documentation

Create an exercise path with four subfolders:


Example:

  • H:/material/exercises/do

  • H:/material/exercises/output

  • H:/material/exercises/temp

  • H:/material/exercises/log

These are used to store your script, log files, datasets and temporary datasets. Open an empty do-file and define your paths with globals:

***********************************************
* Set relative paths to the working directory
***********************************************
global AVZ 	"H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\complete\soep-core\soep.v33.2\stata_en\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

Attention

Please note that as of version 34 (v34), PPFAD was renamed PPATH. The following exercises use version 33.2 (v33.2), in which the tracking file was still named PPFAD.

The global “AVZ” defines the main path. The main path is subdivided using the globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, and “MY_OUT_TEMP”. The global “MY_IN_PATH” contains the path to the data you ordered.
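
As a rough illustration of how these globals might be used, the following sketch opens a log file in the log folder and loads the tracking file from the input path. The log file name is a hypothetical example and not part of the original exercise; the data file name follows the v33.2 naming (ppfad.dta).

```stata
* Sketch only: the log file name "ppath_exercises.log" is a hypothetical example
capture log close                                    // close a log that may still be open
log using "${MY_LOG_OUT}ppath_exercises.log", text replace

use "${MY_IN_PATH}ppfad.dta", clear                  // tracking file (PPFAD in v33.2)
describe, short                                      // quick check that the file was loaded

* ... run the exercises here ...

log close
```

Writing the path as "${MY_IN_PATH}ppfad.dta" works because each global already ends with a backslash.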

Based on the data in PPATH, answer the following questions:

1. Look at the two people with the individual IDs (variable persnr) 2102 and 19202

a) What sex are they? When were they born and, if applicable, when did they die?

Open the PPATH dataset. Search the data set for variables that describe sex, year of birth, and year of death. Display the information from the variables for individuals 2102 and 19202.

use "${MY_IN_PATH}ppfad.dta", clear

* a) What sex are they? When were they born and, if applicable, when did they die?
list persnr sex gebjahr todjahr if persnr == 2102 | persnr == 19202
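
The variable search described above can be sketched with Stata's lookfor command, which scans variable names and variable labels for the given strings. The search terms below are assumptions about how the English labels are worded, so they may need adjusting. The sketch also assumes that ppfad.dta is already in memory.

```stata
* Sketch: search variable names and labels for keywords
* (the keywords are assumptions about the wording of the English labels)
lookfor sex birth death

* Inspect the candidates found above before listing individual cases
describe sex gebjahr todjahr
```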

b) Were these people and their parents born in Germany?

In the data set, search for a variable that describes the migration background. Display the information from this variable for individuals 2102 and 19202.

* b) Were these people and their parents born in Germany?
list persnr migback if persnr == 2102 | persnr == 19202

c) If they immigrated to Germany, in what year and from what country?

Search the data set for variables that describe the country of origin and the year of immigration to Germany. Display the information from these variables for individuals 2102 and 19202.

*c) If they have immigrated: In which year and from which country?
list persnr immiyear corigin if persnr  == 2102 | persnr == 19202

d) Are these people from East or West Germany?

Search the data set for a variable that tells whether respondents are from the East or West. Display the information from the variables for individuals 2102 and 19202.

*d) Are these people from East or West Germany?
list persnr loc1989 psample if persnr  == 2102 | persnr == 19202

e) What sources provide the information on the migration background and year of death?

Search the data set for variables that give you the sources of information for year of death and migration background. Display the information from the variables for individuals 2102 and 19202.

*e) From which sources does the information on the migration background and the year of death come?
list miginfo todinfo if persnr  == 2102 | persnr == 19202

2. How many people lived in a private household that was interviewed in 2016 and completed the individual questionnaire?

Remember that the wave-specific survey year in SOEP is abbreviated with letters. SOEP started with wave “a” in 1984 and had reached wave “bg” in 2016. For more information on this topic, please refer to the DTC subchapter Naming Convention of Data Sets and Variables.

For the 2016 survey year, the relevant variables carry the wave abbreviation “bg”. Search the data set for variables with the abbreviation “bg” that describe population membership. Display the values of the population variable:

********************************************************************************
*** Exercise 2) ***
* How many people lived in a realised private household in 2016 and answered the
* personal questionnaire?
********************************************************************************

* Information from:
* 2016 -> wave bg
* Private household -> bgpop
* Individual questionnaire -> bgnetto

tab bgpop

Values 1 and 2 are relevant for answering the question because they describe realized households. Next, search the data set for variables with the abbreviation “bg” that describe the survey status. Display the values of the survey status variable:

tab bgnetto

Respondents with survey status between 10 and 15 or survey status 19 completed the individual questionnaire. Cross-tabulate the variables bgpop and bgnetto with an appropriate restricting condition to answer the question.

tab bgnetto bgpop if ((bgnetto >= 10 & bgnetto <= 15) | bgnetto==19) & (bgpop==1 | bgpop==2)
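
If only the number of cases is needed, the same restriction can be sketched with count instead of a cross-tabulation. This is an alternative formulation, not the original solution. It uses Stata's built-in inrange() and inlist() functions and assumes that ppfad.dta is already in memory.

```stata
* Sketch: count the cases directly instead of reading the total off the table
* (assumes ppfad.dta is already in memory)
count if (inrange(bgnetto, 10, 15) | bgnetto == 19) & inlist(bgpop, 1, 2)

* count stores the number of matching observations in r(N)
display "Cases meeting both conditions in 2016: " r(N)
```

Because inrange() and inlist() return 0 for missing values, observations with a missing status are excluded automatically.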

3. PPATH allows you to see which populations can be viewed from a longitudinal perspective:

a) How many people who answered the individual questionnaire in 2000 also took part in the survey in 2014?

As noted above, survey years in SOEP are abbreviated with wave letters (see the subchapter Naming Convention of Data Sets and Variables). The survey years 2000 and 2014 correspond to the waves “q” (2000) and “be” (2014). Search the data set for variables with the abbreviations “q” and “be” that describe the survey status. Display the values of the survey status, restricted to cases in which the individual questionnaire was answered:

* a) How many people who answered the personal questionnaire in 2000 also took
*    part in the survey in 2014?

* Information from:
*   2000 -> wave q
*   2014 -> wave be
*   Individual questionnaire -> $netto

tab qnetto benetto if qnetto>=10 & qnetto<=19 & benetto>=10 & benetto<=19
* or:
// fre qnetto benetto if qnetto>=10 & qnetto<=19 & benetto>=10 & benetto<=19

A total of 7,639 respondents completed the individual questionnaire in both 2000 and 2014.

b) How many people answered the individual questionnaire every year from 2000 to 2014?

The survey years include the wave designations from “q”(2000) to “be”(2014). View the relevant survey status codes to answer the question. Please consider all individuals who completed the individual questionnaire:

* b) How many people answered the individual questionnaire every year from 2000 
*    to 2014?

/* to see all the codes */
lab list bgnetto

Define a variable list containing the survey status variables ($netto) of all 15 survey waves under consideration.

local v "netto"
local vlist "q`v' r`v' s`v' t`v' u`v' v`v' w`v' x`v' y`v' z`v' ba`v' bb`v' bc`v' bd`v' be`v'"  
/* --> 15 waves */

Generate a variable that counts the number of waves with a completed individual interview. Note that the values 10, 12, 13, 14, 15, 16, 18, and 19 of the $netto variable indicate realized interviews.

capture drop h1
egen h1 = anycount(`vlist'), values(10 12 13 14 15 16 18 19)

Tabulate the newly generated variable, restricted to people with 15 completed interviews.

tab h1 if h1 == 15

A total of 6,665 people completed the individual questionnaire every year over the period 2000-2014.
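
The counting step can also be sketched with an explicit loop over the wave abbreviations, which makes the logic behind egen's anycount() visible. This is an alternative formulation, not the original solution. The variable name n_int is a hypothetical choice, and the sketch assumes that ppfad.dta is already in memory.

```stata
* Sketch: count realised individual interviews per person with a loop
* (n_int is a hypothetical variable name; assumes ppfad.dta is in memory)
local waves "q r s t u v w x y z ba bb bc bd be"   // 2000 to 2014, 15 waves

capture drop n_int
gen n_int = 0
foreach w of local waves {
    * inlist() returns 1 if the status code marks a realised interview, else 0
    replace n_int = n_int + inlist(`w'netto, 10, 12, 13, 14, 15, 16, 18, 19)
}

* People with a realised interview in every one of the 15 waves
count if n_int == 15
```

If h1 from the egen step still exists, both variables should hold the same values for every person.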

c) How many people who turned 15 in 2011 and spent at least part of their childhood in a SOEP household took part in the survey in 2016?

The survey year 2011 is represented by wave “bb” and the survey year 2016 by wave “bg”. To answer the question, generate a variable that identifies people who turned 15 in 2011. Age can be derived from the year of birth, and children can be identified using the survey status code ($netto). Generate a variable that flags people who turned 15 in 2011 and lived in a survey household as a child.

* c) How many people who turned 15 in 2011 and lived as children in a survey
*    household took part in the survey in 2016?

* Information from:
*   2011 -> wave bb
*   Age -> 15
*   Child -> bbnetto
*   2016 -> wave bg
*   Individual questionnaire -> bgnetto

/* People who turned 15 in 2011 and lived in a survey household as a child */
capture drop a15kind
gen a15kind = 1 if 2011-gebjahr == 15 & bbnetto >= 20 & bbnetto < 30

In order to identify all persons who were 15 years old in 2011, lived in a survey household as a child, and completed the individual questionnaire in 2016, you must use the net codes again. Create a table using the net code from 2016 to narrow down the cases appropriately.

// fre bgnetto if a15kind == 1 & bgnetto >= 10 & bgnetto < 20
* or:
tab bgnetto if a15kind == 1 & bgnetto >= 10 & bgnetto < 20

In 2016, a total of 309 people who turned 15 in 2011 and were part of a survey household as children in that year completed an individual interview.

d) The person with persnr=588010 was born in 1984 in a panel household and was still part of the sample in 2009. The person changed households twice during this time. In which years?

To identify how often and when a person changed households, display all available household numbers in PPATH for person 588010.

* d) The person with persnr=588010 was born in 1984 in a panel household and was
* still part of the sample in 2009. The person has changed households twice during
* this time. In which years?

* Information from:
* -> household numbers

list *hhnr if persnr == 588010
/* -> changed household
 in wave d (1987)
 in wave y (2008)
 no participation since bb (2011)
*/

Person 588010 has been part of the survey since wave “b” (1985) as a member of household 58807. From wave “d” (1987) to wave “x” (2007), the person was in household 73407; from wave “y” (2008) on, the person was in household 132608.