Longitudinal Data Analysis

Simple cross-sectional analyses show that married people have higher life satisfaction than singles. You want to check this on the basis of longitudinal analysis with the SOEP.

Create an exercise path with four subfolders:

../_images/uebungspfade.PNG

Example:

  • H:/material/exercises/do

  • H:/material/exercises/output

  • H:/material/exercises/temp

  • H:/material/exercises/log

These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define the paths you created with globals:

***********************************************
* Set some useful commands
***********************************************
version 13
clear all
set more off
**increase buffer size
set scrollbufsize 2000000
**now restart stata!

***********************************************
* Set relative paths to the working directory
***********************************************
global AVZ 	"H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\distribution\soep-long\soep.v33.1\stata_en\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, and “MY_OUT_TEMP” point to its four subfolders. The global “MY_IN_PATH” contains the path to the data you ordered.
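
As a minimal sketch of how such nested globals expand, the following lines define two of them with forward slashes (which Stata also accepts on Windows) and display a resulting file path. The file name soep_exercise.log is a hypothetical placeholder and not part of the exercise.

```stata
* Sketch: how nested globals expand (file name is a hypothetical placeholder)
global AVZ "H:/material/exercises"
global MY_LOG_OUT "$AVZ/log/"

* Curly braces mark where the macro name ends and the file name begins
display "${MY_LOG_OUT}soep_exercise.log"
```

Writing the macro as ${MY_LOG_OUT} rather than $MY_LOG_OUT keeps Stata from reading the following characters as part of the macro name.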

Create a master file that uses the important variables from ppathl.

You should always add some variables from PPATHL to your dataset by default. Load the following information from PPATHL:

  • Individual identifier "pid"

  • Household identifier "hid"

  • Survey year "syear"

  • The net variable with information on the interview type "netto"

  • The weighting variable "phrf"

  • The sex of the person "sex"

  • The migration background "migback"

*-------------------------------------------------------------------------------
*** Step 1) Start with basic information from PPFADL ***

use pid hid syear netto phrf migback sex using ${MY_IN_PATH}\ppfadl.dta 

Attention

Please note that since version 34 (v34), PPFADL has been renamed PPATHL. The following exercises are done with version 33.1 (v33.1), where the tracking file was named PPFADL.

Search for matching variables and add them to your data set

To perform your analysis, you need different SOEP variables. The SOEP offers various options for searching for variables.

In this case, you need the variables "pgfamstd" (marital status) and "plh0182" (life satisfaction).

*-------------------------------------------------------------------------------
*** Step 2) Add the relevant variables: here: family status and life satisfaction ***
merge 1:1 pid syear using ${MY_IN_PATH}\pgen, keepusing(pgfamstd) keep(1 3) nogen

		// merges family status from pgen
		// Documentation for PGEN can be found here:
		// http://panel.gsoep.de/soep-docs/surveypapers/diw_ssp0307.pdf

		
*describe using pl (directory)
		// for checking out variable names without opening the dataset
		
merge 1:1 pid syear using ${MY_IN_PATH}\pl, keepusing(plh0182) keep(1 3) nogen
		// merges life satisfaction from pl 

save $MY_OUT_DATA\ppfad.dta, replace

Clean and inspect the data

Recode all SOEP missing codes (the negative values -8 to -1) to Stata's system missing value (.).

*-------------------------------------------------------------------------------
*** Step 3) Clean and inspect the data
mvdecode _all, mv(-8/-1)
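
To see what mvdecode does, here is a small sketch on made-up toy data (the values are invented for illustration and are not SOEP data). Negative codes become system missing, while valid values stay untouched.

```stata
* Toy data (invented values): -1 and -5 stand in for SOEP missing codes
clear
input pid syear plh0182
1 2010 7
1 2011 -1
2 2010 -5
2 2011 8
end

* Turn the codes -8 to -1 into Stata's system missing value (.)
mvdecode plh0182, mv(-8/-1)

* Two observations are now missing, two keep their valid values
count if missing(plh0182)
list, sepby(pid)
```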

Because the analysis focuses on individual characteristics, drop all observations that are not based on a successful individual interview.

tab netto
drop if netto>19
../_images/SOEPlong_01.PNG

How many people contribute measurements and what is the proportion of people contributing at least 10 measurements?

Define the data set as a panel data set.

**define the data set as panel data
xtset pid syear
xtdes
../_images/SOEPlong_02.PNG

86,079 respondents have contributed information in waves a (1984) to bg (2016) and 75% of the 86,079 respondents have provided information for at least 10 waves.
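
If you prefer to count directly rather than read the numbers off the xtdes output, the following sketch shows one possible way on made-up toy data. The variable names n_obs and person are assumptions introduced here, and the threshold of 2 only fits the toy data; with the SOEP data you would use 10.

```stata
* Toy panel (invented): pid 1 has 3 observations, pid 2 has 1, pid 3 has 2
clear
input pid syear
1 2010
1 2011
1 2012
2 2010
3 2011
3 2012
end

* Number of observations per person
bysort pid: generate n_obs = _N

* Flag exactly one row per person so that persons, not rows, are counted
egen person = tag(pid)

count if person                 // number of persons
count if person & n_obs >= 2    // persons with at least 2 observations
```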

How many people took part in the survey in 2010 and contributed to continuous measurements up to 2014?

xtdes if syear>=2010 & syear<=2014
../_images/SOEPlong_03.PNG

14,673 respondents provided continuous information from 2010 to 2014.
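
A direct count of persons with complete participation in a time window could look like the sketch below. It uses made-up toy data and a shortened window (2010 to 2012) and assumes that each pid-syear combination occurs only once. The names n_win and person are introduced here for illustration.

```stata
* Toy panel (invented): only pid 1 and pid 3 are observed in every year 2010-2012
clear
input pid syear
1 2010
1 2011
1 2012
2 2010
2 2012
3 2009
3 2010
3 2011
3 2012
end

* Restrict to the window (with real data, wrap this in preserve / restore)
keep if inrange(syear, 2010, 2012)

* A person is observed continuously if all 3 window years are present
bysort pid: generate n_win = _N
egen person = tag(pid)
count if person & n_win == 3
```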

Univariate inspection & analysis

How does the mean of life satisfaction change over time?

*-------------------------------------------------------------------------------
*** Step 4) univariate inspection & analysis
table syear, content (mean plh0182)
../_images/SOEPlong_04.PNG

What proportion of people in 2014 a) are married or b) have a migration background? Compare weighted with unweighted frequency tables: Who is overrepresented in the SOEP?

tab1 pgfamstd migback if syear==2014
tab pgfamstd [aw=phrf] if syear==2014
tab migback [aw=phrf] if syear==2014
../_images/SOEPlong_05.PNG
../_images/SOEPlong_06.PNG

The data show that married people are overrepresented in the SOEP and single people are underrepresented. Applying the weights corrects for this, so that the weighted results are representative of Germany.

../_images/SOEPlong_05b.PNG
../_images/SOEPlong_07.PNG

In the SOEP sample, respondents with a direct or indirect migration background are overrepresented.
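
The logic of comparing weighted and unweighted shares can be sketched with made-up toy data. The variables married and w are invented for illustration; w plays the role of phrf. A group that is overrepresented in the sample carries small weights, so its weighted share is lower than its unweighted share.

```stata
* Toy data (invented): 3 of 4 sample members are married, but each of them
* stands for fewer people in the population than the one unmarried person
clear
input married w
1 1
1 1
1 1
0 3
end

tabulate married            // unweighted: 75% married
tabulate married [aw=w]     // weighted: 50% married

* The same comparison as means of the 0/1 indicator
summarize married
summarize married [aw=w]
```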

How many of those persons who reported a life satisfaction scale value of 7 in one survey year also indicated the scale value of 7 in the following survey year?

xttrans plh0182
../_images/SOEPlong_08.PNG

34.57% of the respondents who reported a life satisfaction of 7 again reported a value of 7 in the following year.

Is it more likely that a highly dissatisfied person (value: 0) will be less dissatisfied the following year or that a very satisfied (value: 10) person will be less satisfied the following year?

xttrans plh0182
../_images/SOEPlong_08.PNG

The rows reflect the initial values, and the columns reflect the final values. Around 20% of those who were completely dissatisfied (value: 0) in the base year remained completely dissatisfied in the following year. About 80% of these completely dissatisfied people from the base year were more satisfied in the following year. Of the completely satisfied persons (value: 10), about 37% remained just as satisfied in the following year, but 63% became less satisfied. It is more likely that a completely dissatisfied person will become more satisfied in the following year than that a completely satisfied person will become less satisfied.
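
To make the reading of such a transition table concrete, the sketch below builds the row percentages by hand on made-up toy data. The variables sat and sat_next are invented for illustration; the cross-tabulation of the current value with the next year's value should show the same row percentages as xttrans.

```stata
* Toy panel (invented values)
clear
input pid syear sat
1 2010 7
1 2011 7
1 2012 8
2 2010 7
2 2011 8
2 2012 8
end
xtset pid syear

* Transition table as produced by xttrans
xttrans sat

* The same by hand: value in the following year, then row percentages
generate sat_next = f.sat
tabulate sat sat_next, row nofreq
```

In the toy data, three transitions start at the value 7, and one of them ends at 7 again, so the cell (7, 7) shows one third of the row.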

Which transitions in marital status can be observed particularly frequently in the data?

xttrans pgfamstd
../_images/SOEPlong_09.PNG

The transition from “married, but separated” in the base year to “divorced” in the following year can be observed particularly frequently (about 19%).

Simple cross-sectional analyses

You now want to find the correlation between marital status and life satisfaction. Is there an effect of marriage on life satisfaction? And if so, is it a sustained effect?

First, examine the association between family status and life satisfaction from a cross-sectional perspective for 2010: Are married people happier than singles?

*-------------------------------------------------------------------------------
*** Step 5)simple cross sectional analyses
table pgfamstd if syear==2010, content (mean plh0182)
../_images/SOEPlong_10.PNG

At first glance, married people seem happier than singles.

Now generate a variable that indicates a transition from “single” to “married”.

How many such transitions can you find in the data?

***perform longitudinal analysis
**define event: transition to marriage	
generate to_mar=1 if pgfamstd==1 & l.pgfamstd==3
tab to_mar
../_images/SOEPlong_11.PNG

A total of 4,834 transitions from single to married can be observed.
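
The lag operator l. refers to the previous survey year, not to the previous row. The following sketch on made-up toy data illustrates what this means for the event definition: a person with a gap in the year before the observed marriage is not counted as a transition, because the status in the previous year is unknown.

```stata
* Toy panel (invented): 1 = married, 3 = single; pid 3 has no observation in 2011
clear
input pid syear pgfamstd
1 2010 3
1 2011 3
1 2012 1
1 2013 1
2 2010 1
2 2011 1
3 2010 3
3 2012 1
end
xtset pid syear

* Same event definition as in the exercise
generate to_mar = 1 if pgfamstd == 1 & l.pgfamstd == 3

* Only pid 1 in 2012 qualifies; for pid 3, l.pgfamstd is missing in 2012
count if to_mar == 1
list, sepby(pid)
```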

What is the average level of life satisfaction immediately after the transition to marriage (i.e., in the first survey in which the transition can be observed) and how high is life satisfaction immediately before the transition to marriage?

**standard way of life-event analysis
sum plh0182 if to_mar==1
sum l.plh0182 if to_mar==1

**alternative way
generate dif_sat= plh0182- l.plh0182
mean dif_sat if to_mar==1
../_images/SOEPlong_12.PNG
../_images/SOEPlong_13.PNG

Before the transition to marriage, the average life satisfaction of the respondents is 7.54. In the following year, that is, after the transition to marriage, the average life satisfaction of the respondents is 7.65. It can be seen that with the transition to marriage, average life satisfaction rises slightly by 0.11.

Map the complete satisfaction history around the “marriage entry” event [3 years before; 3 years after].

**preparing illustration of trajectory
generate t=0 if to_mar==1 & l.to_mar~=1 & l2.to_mar~=1 & l3.to_mar~=1 ///
	& l4.to_mar~=1 & l5.to_mar~=1 & l6.to_mar~=1 & l7.to_mar~=1 ///
	& l8.to_mar~=1 & l9.to_mar~=1 & l10.to_mar~=1 & l11.to_mar~=1 ///
	& l12.to_mar~=1 & l13.to_mar~=1 & l14.to_mar~=1
replace t=1 if l.t==0
replace t=2 if l2.t==0
replace t=3 if l3.t==0
replace t=-1 if f.t==0
replace t=-2 if f2.t==0
replace t=-3 if f3.t==0

table t, content (mean plh0182 n plh0182)
../_images/SOEPlong_14.PNG
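
As an alternative sketch, and not the approach used in this exercise, event time can also be derived from the calendar distance to the first observed transition. The variable names first_mar, dist, and t_alt are assumptions introduced here, and the toy data are invented.

```stata
* Toy panel (invented): pid 1 marries in 2010, pid 2 never marries
clear
input pid syear to_mar
1 2008 .
1 2009 .
1 2010 1
1 2012 .
2 2010 .
2 2011 .
end

* Year of the first observed transition per person (missing if there is none)
bysort pid: egen first_mar = min(cond(to_mar == 1, syear, .))

* Years relative to that transition, kept only within the window [-3, 3]
generate dist = syear - first_mar
generate t_alt = dist if inrange(dist, -3, 3)

list pid syear to_mar t_alt, sepby(pid)
```

This variant anchors every person on the first observed transition only, so it can differ from the lag-based definition above for persons with more than one transition.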

Choose a suitable presentation for your results and let Stata create a graphic.

** Preparing graph of event analysis												
sort t
cap drop meanplh0182
by t: egen meanplh0182 = mean(plh0182)

cap drop upper
gen upper = .
forval i = -3/3{ 
	su plh0182 if t == `i'
	replace upper = r(mean) + 1.96 * r(sd)/sqrt(r(N)) if t == `i'
}

cap drop lower
gen lower = .
forval i = -3/3{ 
	su plh0182 if t == `i'
	replace lower = r(mean) - 1.96 * r(sd)/sqrt(r(N)) if t == `i'
}

twoway (line meanplh0182 t) (rcap upper lower t, lcolor("red")), ///
	title("Satisfaction with life relative to year of marriage") ///
	legend(label(1 "Avg. life satisfaction") label(2 "95% Conf. interval")) ///
	scheme(s1mono) xtitle("Years relative to marriage") ///
	ytitle("Avg. life satisfaction")
../_images/SOEPlong_15.PNG

The graph shows that life satisfaction rises around the transition from single to married. In the following years of marriage, life satisfaction declines again and approaches its level before the marriage.
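
The confidence bounds in the graph are the mean plus or minus 1.96 standard errors, where the standard error is sd/sqrt(N). As a compact alternative sketch (not the code used above), collapse can compute the mean and its standard error per event year in one step. The toy data and the names m, se, upper, and lower in the collapsed dataset are assumptions for illustration.

```stata
* Toy data (invented): two observations per event year
clear
input t plh0182
0 6
0 8
1 7
1 7
end

* One row per event year with mean and standard error of the mean
* (collapse replaces the data in memory; with real data, use preserve / restore)
collapse (mean) m=plh0182 (semean) se=plh0182, by(t)

* Normal-approximation 95% confidence bounds
generate upper = m + 1.96 * se
generate lower = m - 1.96 * se

list
```

The collapsed dataset can be passed to twoway directly, with m in place of meanplh0182.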