Longitudinal Data Analysis

Simple cross-sectional analyses show that married people report higher life satisfaction than singles. You want to check this finding with longitudinal analyses of the SOEP data.

Create an exercise folder with four subfolders:

../_images/uebungspfade.PNG

Example:

  • H:/material/exercises/do
  • H:/material/exercises/output
  • H:/material/exercises/temp
  • H:/material/exercises/log

These subfolders store your do-files, output datasets, temporary datasets and log files, respectively. Open an empty do-file and define the paths you created as globals:

***********************************************
* Set some useful commands
***********************************************
version 13
clear all
set more off
**increase buffer size (the new size only takes effect after a restart)
set scrollbufsize 2000000
**now restart Stata!

***********************************************
* Define the working paths as globals (absolute paths)
***********************************************
global AVZ 	"H:\material\exercises"
* path to the SOEP data (v33.1, Stata, English labels)
global MY_IN_PATH "\\hume\rdc-prod\distribution\soep-long\soep.v33.1\stata_en\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path, and the globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA” and “MY_OUT_TEMP” point to its subfolders. The global “MY_IN_PATH” contains the path to the SOEP data you ordered.

Create a master file that contains the key variables from the tracking file PPFADL (called PPATHL since v34).

By default, you should always add some variables from the tracking file to your data set. Load the following information from PPFADL:

*-------------------------------------------------------------------------------
*** Step 1) Start with basic information from PPFADL ***

use pid hid syear netto phrf migback sex using ${MY_IN_PATH}\ppfadl.dta 

Attention

Please note that since version 34 (v34), PPFADL has been renamed PPATHL. The following exercises use version 33.1 (v33.1), in which the tracking file was still named PPFADL.

Search for matching variables and add them to your data set

To perform your analysis, you need different SOEP variables. The SOEP offers various options for a variable search:

In this case, you need the variables "pgfamstd" (marital status, from PGEN) and "plh0182" (life satisfaction, from PL).

*-------------------------------------------------------------------------------
*** Step 2) Add the relevant variables: here, family status and life satisfaction ***
merge 1:1 pid syear using ${MY_IN_PATH}\pgen, keepusing(pgfamstd) keep(1 3) nogen
		// merges family status from pgen
		// keep(1 3): keeps master-only and matched observations
		// Documentation for PGEN can be found here:
		// http://panel.gsoep.de/soep-docs/surveypapers/diw_ssp0307.pdf

*describe using ${MY_IN_PATH}\pl
		// lists the variable names of pl without opening the dataset

merge 1:1 pid syear using ${MY_IN_PATH}\pl, keepusing(plh0182) keep(1 3) nogen
		// merges life satisfaction from pl

save $MY_OUT_DATA\ppfad.dta, replace
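The option keep(1 3) decides which rows survive each merge. The following Python sketch, on made-up toy data rather than SOEP files, mimics `merge 1:1 pid syear ..., keepusing(...) keep(1 3)`: master-only rows (_merge==1) are kept with a missing value, matched rows (_merge==3) get the variable, and using-only rows (_merge==2) are dropped. The function name and the use of None for Stata's "." are illustrative assumptions.

```python
# Toy illustration of: merge 1:1 pid syear using ..., keepusing(var) keep(1 3)


def merge_1to1_keep_1_3(master, using, keepusing, key=("pid", "syear")):
    """Left-join `using` onto `master`: master-only rows (_merge==1) and
    matched rows (_merge==3) are kept; using-only rows (_merge==2) are dropped."""

    def key_of(row):
        return tuple(row[k] for k in key)

    lookup = {}
    for row in using:
        if key_of(row) in lookup:
            raise ValueError("key not unique in using data: 1:1 merge not possible")
        lookup[key_of(row)] = row

    seen = set()
    merged = []
    for row in master:
        k = key_of(row)
        if k in seen:
            raise ValueError("key not unique in master data: 1:1 merge not possible")
        seen.add(k)
        match = lookup.get(k)
        new_row = dict(row)
        for var in keepusing:
            # unmatched master rows get a missing value (None plays the role of ".")
            new_row[var] = match[var] if match is not None else None
        merged.append(new_row)
    return merged


ppfad = [
    {"pid": 1, "syear": 2010},
    {"pid": 1, "syear": 2011},
    {"pid": 2, "syear": 2010},  # no match in pgen -> kept with missing pgfamstd
]
pgen = [
    {"pid": 1, "syear": 2010, "pgfamstd": 3},
    {"pid": 1, "syear": 2011, "pgfamstd": 1},
    {"pid": 3, "syear": 2010, "pgfamstd": 4},  # using-only -> dropped
]

result = merge_1to1_keep_1_3(ppfad, pgen, keepusing=["pgfamstd"])
for row in result:
    print(row)
```

Like Stata's 1:1 merge, the sketch refuses to run when pid and syear do not uniquely identify rows in either file.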

Clean and inspect the data

Recode all SOEP missing codes (-8 to -1) into Stata's system missing value (.).

*-------------------------------------------------------------------------------
*** Step 3) Clean and inspect the data
mvdecode _all, mv(-8/-1)
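What `mvdecode _all, mv(-8/-1)` does can be sketched in a few lines of Python on toy records (not SOEP data): every value that equals one of the integers -8, -7, ..., -1 becomes missing, represented here by None. The helper name is hypothetical.

```python
# Toy illustration of: mvdecode _all, mv(-8/-1)
SOEP_MISSING_CODES = set(range(-8, 0))  # the numlist -8/-1 = -8, -7, ..., -1


def mvdecode(rows, codes=SOEP_MISSING_CODES):
    """Return copies of `rows` with every value in `codes` replaced by None."""
    cleaned = []
    for row in rows:
        cleaned.append(
            {var: (None if val in codes else val) for var, val in row.items()}
        )
    return cleaned


raw = [
    {"pid": 1, "syear": 2010, "plh0182": 7},
    {"pid": 1, "syear": 2011, "plh0182": -1},
    {"pid": 2, "syear": 2010, "plh0182": -8},
]
clean = mvdecode(raw)
print(clean)
```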

Since your analysis concerns individual characteristics, delete all observations that are not based on a successful personal interview.

tab netto
drop if netto>19
../_images/SOEPlong_01.PNG

How many people contribute measurements and what is the proportion of people contributing at least 10 measurements?

Define the data set as a panel data set.

**define the data set as panel data
xtset pid syear
xtdes
../_images/SOEPlong_02.PNG

86079 respondents contributed information between wave a (1984) and wave bg (2016). According to the distribution of T_i reported by xtdes, 75% of them provided information for at least 10 waves.
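The two numbers asked for above, the number of persons and the share with at least 10 waves, can be computed directly from a long-format panel. The Python sketch below uses a hypothetical four-person toy panel and assumes, as xtset does, that pid and syear uniquely identify rows; it illustrates the counting logic, not the output of xtdes itself.

```python
from collections import Counter


def panel_summary(rows, min_waves=10):
    """Number of persons and share of persons with at least `min_waves` observations.

    Assumes (pid, syear) uniquely identifies rows, as required by xtset.
    """
    waves_per_person = Counter(row["pid"] for row in rows)
    n_persons = len(waves_per_person)
    n_long = sum(1 for n in waves_per_person.values() if n >= min_waves)
    return n_persons, n_long / n_persons


# Hypothetical toy panel: pid -> years with an interview
toy = {1: range(1990, 2002), 2: range(2014, 2017), 3: range(2000, 2010), 4: [2016]}
rows = [{"pid": pid, "syear": y} for pid, years in toy.items() for y in years]

n, share = panel_summary(rows)
print(n, share)
```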

How many people took part in the survey in 2010 and were observed continuously until 2014?

xtdes if syear>=2010 & syear<=2014
../_images/SOEPlong_03.PNG

14673 respondents provided continuous information from 2010 to 2014.
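In the xtdes output, continuous participation from 2010 to 2014 corresponds to the participation pattern 11111. A minimal Python sketch of the same check on hypothetical toy data: a person counts only if every year from 2010 to 2014 is observed.

```python
def continuous_participants(rows, first=2010, last=2014):
    """pids observed in every year from `first` to `last` (pattern 11111 in xtdes)."""
    required = set(range(first, last + 1))
    years = {}
    for row in rows:
        if first <= row["syear"] <= last:
            years.setdefault(row["pid"], set()).add(row["syear"])
    return sorted(pid for pid, ys in years.items() if ys == required)


toy = {
    1: range(2009, 2016),           # observed 2010-2014 without gaps
    2: [2010, 2011, 2013, 2014],    # gap in 2012
    3: range(2011, 2015),           # enters only in 2011
}
rows = [{"pid": pid, "syear": y} for pid, ys in toy.items() for y in ys]
participants = continuous_participants(rows)
print(participants)
```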

Univariate inspection & analysis

How does the mean of life satisfaction change over time?

*-------------------------------------------------------------------------------
*** Step 4) Univariate inspection & analysis
table syear, contents(mean plh0182)
../_images/SOEPlong_04.PNG
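The table above is a group mean of plh0182 by survey year in which missing values are ignored. A short Python sketch of that computation on hypothetical toy data (None stands for a missing value):

```python
def mean_by(rows, group, var):
    """Mean of `var` within each value of `group`; None (missing) is ignored."""
    sums, counts = {}, {}
    for row in rows:
        value = row[var]
        if value is None:
            continue
        g = row[group]
        sums[g] = sums.get(g, 0) + value
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


rows = [
    {"syear": 2010, "plh0182": 7},
    {"syear": 2010, "plh0182": 8},
    {"syear": 2010, "plh0182": None},
    {"syear": 2011, "plh0182": 6},
    {"syear": 2011, "plh0182": 8},
]
means = mean_by(rows, "syear", "plh0182")
print(means)
```

The same logic, grouped by pgfamstd instead of syear, underlies the cross-sectional comparison by family status in Step 5.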

What proportion of people a) were married in 2014 or b) have a migration background? Compare weighted and unweighted frequency tables: which groups are overrepresented in the SOEP?

tab1 pgfamstd migback if syear==2014
tab pgfamstd [aw=phrf] if syear==2014
tab migback [aw=phrf] if syear==2014
../_images/SOEPlong_05.PNG
../_images/SOEPlong_06.PNG

The comparison shows that married people are overrepresented in the unweighted SOEP sample and singles are underrepresented. Weighting with phrf restores a distribution that is representative of Germany.

../_images/SOEPlong_05b.PNG
../_images/SOEPlong_07.PNG

In the SOEP sample, respondents with a direct or indirect migration background are overrepresented.
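Why weighting shifts the percentages can be seen in a small Python sketch on a hypothetical sample (not SOEP values) in which married people (code 1) are overrepresented and singles (code 3) receive larger weights. Stata rescales analytic weights ([aw=...]) to sum to the number of observations; this changes the weighted frequencies but not the percentages computed here.

```python
def shares(values, weights=None):
    """Relative frequencies of `values`, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    acc = {}
    for v, w in zip(values, weights):
        acc[v] = acc.get(v, 0.0) + w
    return {v: w / total for v, w in sorted(acc.items())}


# Hypothetical sample: married (1) overrepresented, singles (3) get larger weights
famstd = [1, 1, 1, 3]
phrf = [1.0, 1.0, 1.0, 5.0]
unweighted = shares(famstd)
weighted = shares(famstd, phrf)
print(unweighted)
print(weighted)
```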

What share of the persons who report a life satisfaction of 7 in one survey year also report a value of 7 in the following survey year?

xttrans plh0182
../_images/SOEPlong_08.PNG

34.57% of the respondents who reported a life satisfaction of 7 again reported a value of 7 in the following year.

Is it more likely that a highly dissatisfied person (value: 0) will be less dissatisfied the following year, or that a very satisfied (value: 10) person will be less satisfied the following year?

xttrans plh0182
../_images/SOEPlong_08.PNG

The rows show the values in the base year, and the columns show the values in the following year. About 20% of the people who were completely dissatisfied (value 0) in the base year are still completely dissatisfied in the following year; the remaining 80% report higher life satisfaction. Of the completely satisfied people (value 10), about 37% remain just as satisfied in the following year, while life satisfaction declines for the other 63%. A completely dissatisfied person (value 0) is therefore more likely to become more satisfied in the following year than a completely satisfied person (value 10) is to become less satisfied.
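The row percentages of a transition table can be sketched in Python as follows, on hypothetical toy data. This sketch assumes that a transition is a pair of consecutive calendar years of the same person and skips pairs with a missing value; how xttrans itself treats gaps between waves should be checked in its documentation.

```python
def transition_matrix(rows, var):
    """Row percentages for transitions of `var` from year t to year t+1.

    Rows = value in the base year, columns = value in the following year.
    Only pairs of consecutive calendar years of the same person are used,
    and pairs with a missing value (None) are skipped.
    """
    value = {(r["pid"], r["syear"]): r[var] for r in rows}
    counts = {}
    for (pid, year), origin in value.items():
        dest = value.get((pid, year + 1))
        if origin is None or dest is None:
            continue
        row = counts.setdefault(origin, {})
        row[dest] = row.get(dest, 0) + 1
    return {
        origin: {dest: 100 * n / sum(row.values()) for dest, n in sorted(row.items())}
        for origin, row in sorted(counts.items())
    }


rows = [
    {"pid": 1, "syear": 2010, "plh0182": 7},
    {"pid": 1, "syear": 2011, "plh0182": 7},
    {"pid": 1, "syear": 2012, "plh0182": 8},
    {"pid": 2, "syear": 2010, "plh0182": 7},
    {"pid": 2, "syear": 2011, "plh0182": 6},
    {"pid": 2, "syear": 2013, "plh0182": 6},  # 2012 missing: no 2011->2012 pair
]
matrix = transition_matrix(rows, "plh0182")
print(matrix)
```

Each row of the result sums to 100, which is why the share staying at a value and the share leaving it add up to 100%, as in the 20%/80% and 37%/63% comparisons above.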

Which transitions in marital status can be observed particularly frequently in the data?

xttrans pgfamstd
../_images/SOEPlong_09.PNG

Among the changes of family status, the transition from "married, separated" in the base year to "divorced" in the following year is particularly frequent: about 19% of the respondents who were married but separated report being divorced one year later.

Simple cross sectional analyses

You now want to examine the relationship between marital status and life satisfaction. Does marriage affect life satisfaction, and if so, is the effect lasting?

First, compare the mean life satisfaction by family status in the 2010 cross-section: are married people happier than singles?

*-------------------------------------------------------------------------------
*** Step 5) Simple cross-sectional analyses
table pgfamstd if syear==2010, contents(mean plh0182)
../_images/SOEPlong_10.PNG

At first glance, married people seem happier than singles.

Now generate a variable that indicates a transition from “single” to “married”.

How many such transitions can you find in the data?

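***perform longitudinal analysis
**define event: transition to marriage
* pgfamstd: 1 = married, living together with spouse; 3 = single
* l. refers to the previous survey year (syear-1) as defined by xtset;
* if that year is not observed, l.pgfamstd is missing and no transition is flagged
generate to_mar=1 if pgfamstd==1 & l.pgfamstd==3
tab to_mar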
../_images/SOEPlong_11.PNG

A total of 4834 transitions from single to married can be observed.
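The event definition depends on the lag operator referring to the previous calendar year. The Python sketch below, on hypothetical toy data, flags a transition only when a person is single (3) in syear-1 and married (1) in syear; a person whose previous year is not observed is not flagged, just as l.pgfamstd would be missing in Stata.

```python
def flag_transitions(rows, var="pgfamstd", origin=3, dest=1):
    """(pid, syear) pairs where `var` == dest and `var` in syear-1 == origin.

    Mirrors `generate to_mar=1 if pgfamstd==1 & l.pgfamstd==3`: the lag is the
    previous calendar year, so after a gap year no transition is flagged.
    """
    value = {(r["pid"], r["syear"]): r[var] for r in rows}
    return {
        (pid, year)
        for (pid, year), v in value.items()
        if v == dest and value.get((pid, year - 1)) == origin
    }


rows = [
    {"pid": 1, "syear": 2010, "pgfamstd": 3},
    {"pid": 1, "syear": 2011, "pgfamstd": 1},  # single -> married: flagged
    {"pid": 2, "syear": 2010, "pgfamstd": 3},
    {"pid": 2, "syear": 2012, "pgfamstd": 1},  # 2011 not observed: not flagged
    {"pid": 3, "syear": 2010, "pgfamstd": 4},
    {"pid": 3, "syear": 2011, "pgfamstd": 1},  # divorced -> married: not flagged
]
events = flag_transitions(rows)
print(events)
```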

What is the average level of life satisfaction immediately after the transition to marriage (i.e. in the first survey in which the transition can be observed) and how high is life satisfaction immediately before the transition to marriage?

**standard way of life-event analysis
sum plh0182 if to_mar==1
sum l.plh0182 if to_mar==1

**alternative way
generate dif_sat= plh0182- l.plh0182
mean dif_sat if to_mar==1
../_images/SOEPlong_12.PNG
../_images/SOEPlong_13.PNG

Before the transition to marriage, the average life satisfaction of the respondents is 7.54; in the following year, i.e. after the transition to marriage, it is 7.65. With the transition to marriage, average life satisfaction thus rises slightly, by 0.11 points.
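The "standard" and the "alternative" way only agree when the same observations enter both means. `sum l.plh0182` and `sum plh0182` each drop their own missing values, whereas `dif_sat` is missing whenever either year is missing. The Python sketch below uses hypothetical toy data in which one event lacks the previous year's value, so the difference of the two means differs from the mean of the within-person differences.

```python
def before_after(rows, events, var="plh0182"):
    """Mean of `var` in the year before and the year of each event, and the
    mean within-person difference. Missing values (None) are dropped."""
    value = {(r["pid"], r["syear"]): r[var] for r in rows}

    def mean(xs):
        return sum(xs) / len(xs)

    after = [value.get((p, y)) for p, y in events]
    before = [value.get((p, y - 1)) for p, y in events]
    diffs = [a - b for a, b in zip(after, before) if a is not None and b is not None]
    return (
        mean([b for b in before if b is not None]),
        mean([a for a in after if a is not None]),
        mean(diffs),
    )


rows = [
    {"pid": 1, "syear": 2010, "plh0182": 7},
    {"pid": 1, "syear": 2011, "plh0182": 8},
    {"pid": 2, "syear": 2014, "plh0182": 6},
    {"pid": 2, "syear": 2015, "plh0182": 7},
    {"pid": 3, "syear": 2005, "plh0182": 10},  # year before (2004) not observed
]
events = [(1, 2011), (2, 2015), (3, 2005)]
m_before, m_after, m_diff = before_after(rows, events)
print(m_before, m_after, m_diff)
```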

Map the complete life-satisfaction trajectory around the marriage event, from 3 years before to 3 years after the transition.

**preparing illustration of trajectory
* t = 0 marks the year of the transition to marriage; only transitions without
* another transition in the previous 14 years are used
generate t = 0 if to_mar==1
forvalues k = 1/14 {
	replace t = . if L`k'.to_mar==1
}
replace t=1 if l.t==0
replace t=2 if l2.t==0
replace t=3 if l3.t==0
replace t=-1 if f.t==0
replace t=-2 if f2.t==0
replace t=-3 if f3.t==0

table t, contents(mean plh0182 n plh0182)
../_images/SOEPlong_14.PNG
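The event-time variable t can be sketched in Python on hypothetical toy data. As in the Stata code, a transition is used only if no other transition occurred in the previous 14 years, and only observed person-years within 3 years of the event receive a value of t; the table of means and counts by t then follows as a group summary.

```python
def event_time(rows, transitions, window=3, lookback=14):
    """Assign years relative to the event (t = -window..window) to observed rows.

    Only transitions without another transition in the previous `lookback`
    years are used, like the conditions on L1.to_mar ... L14.to_mar.
    """
    first = {
        (p, y)
        for p, y in transitions
        if not any((p, y - k) in transitions for k in range(1, lookback + 1))
    }
    observed = {(r["pid"], r["syear"]) for r in rows}
    t = {}
    for p, y0 in first:
        for k in range(-window, window + 1):
            if (p, y0 + k) in observed:
                t[(p, y0 + k)] = k
    return t


def mean_by_event_time(rows, t, var="plh0182"):
    """Mean and N of `var` for each event year, like table t, contents(mean ... n ...)."""
    groups = {}
    for r in rows:
        k = t.get((r["pid"], r["syear"]))
        if k is not None and r[var] is not None:
            groups.setdefault(k, []).append(r[var])
    return {k: (sum(v) / len(v), len(v)) for k, v in sorted(groups.items())}


sat = {2008: 7, 2009: 7, 2010: 7, 2011: 8, 2012: 8, 2013: 7, 2014: None}
rows = [{"pid": 1, "syear": y, "plh0182": s} for y, s in sat.items()]
rows += [{"pid": 2, "syear": y, "plh0182": 6} for y in (2011, 2012, 2013)]
transitions = {(1, 2011), (2, 2005), (2, 2012)}  # (2, 2012) follows (2, 2005)

t = event_time(rows, transitions)
means = mean_by_event_time(rows, t)
print(means)
```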

Choose a suitable presentation for your results and let Stata create a graphic.

** Preparing graph of event analysis
sort t
cap drop meanplh0182
by t: egen meanplh0182 = mean(plh0182)

* 95% confidence interval per event year: mean +/- 1.96 * sd/sqrt(N)
cap drop upper
cap drop lower
gen upper = .
gen lower = .
forval i = -3/3 {
	quietly su plh0182 if t == `i'
	local m  = r(mean)
	local se = r(sd)/sqrt(r(N))
	replace upper = `m' + 1.96 * `se' if t == `i'
	replace lower = `m' - 1.96 * `se' if t == `i'
}

twoway (line meanplh0182 t) (rcap upper lower t, lcolor("red")), ///
	title("Satisfaction with life relative to year of marriage") ///
	legend(label(1 "Avg. life satisfaction") label(2 "95% Conf. interval")) ///
	scheme(s1mono) xtitle("Years relative to marriage") ytitle("Avg. life satisfaction")
../_images/SOEPlong_15.PNG

The graph shows a rise in life satisfaction when the family status changes from single to married. In the following years of marriage, life satisfaction declines again and approaches the level observed before the marriage.
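The confidence bands in the graph use the normal approximation mean ± 1.96 · sd/√N for each event year. A minimal Python sketch of that formula on a hypothetical set of four satisfaction values; `summarize` reports the sample standard deviation (divisor N-1), which `statistics.stdev` also computes. Stata's `ci means` would use t-distribution critical values instead; for large groups the difference is negligible.

```python
import math
import statistics


def normal_ci(values, z=1.96):
    """Mean +/- z * sd / sqrt(N), as computed for upper and lower in the loop."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divisor n-1), as reported by summarize
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


lower, upper = normal_ci([6, 7, 8, 9])
print(round(lower, 3), round(upper, 3))
```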