Working with SOEP data in csv format

SOEP offers the data in statistical program specific file formats (e.g.: Stata .dta) and also as comma-separated values FIle (csv). With these csvs you can read the non-formatted information directly into a statistical program of your choice.

This example shows how to open SOEP data of data version v.36 in csv format with an old Stata version (12) and how to prepare the data in an efficient way.

Create an exercise path with four subfolders:

../_images/uebungspfade.png

Example:

  • H:/material/exercises/do

  • H:/material/exercises/output

  • H:/material/exercises/temp

  • H:/material/exercises/log

These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define the paths you created with globals:

1***********************************************
2* Set relative paths to the working directory
3***********************************************
4global AVZ 	"H:\material\exercises"
5global MY_IN_PATH "\\hume\rdc-prod\distribution\soep-core\soep.v35\csv"
6global MY_DO_FILES "$AVZ\do\"
7global MY_LOG_OUT "$AVZ\log\"
8global MY_OUT_DATA "$AVZ\output\"
9global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The main paths are subdivided using the globals “MY_IN_PATH”, “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, “MY_OUT_TEMP”. The global “MY_IN_PATH” contains the path to your ordered data.

For the following script to work, the global “MY_IN_PATH” must contain the folder path to the SOEP csv files of all datasets. The csv files for each data set should always consist of three csvs. If we want to import and prepare the dataset jugendl in csv format, we need the following csv Files:

  • jugendl.csv

  • jugendl_variables.csv

  • jugendl_values.csv

In the SOEP, the csv of each data set contains the variables as columns and their numerical values. Variables and Values csvs contain the variable labels and the value labels for the data set. First some packages for Stata have to be installed so that the process can start.

 1* Import and Labeling of SOEP csv-Files
 2clear
 3set more off
 4
 5* Load ados
 6capture which adolist
 7if _rc==111{ 
 8	ssc install adolist
 9}
10quietly adolist list
11local allAdos `r(names)'
12foreach package in fre labutil2 chardef labundef saveascii useold {
13	if !regexm("`r(names)'", " `package' ") {
14		display as result "Paket " as error  "`package'" as result " wird versucht über SSC-Server zu installieren"
15		ssc install `package'
16	} 
17} 	

Once the packages are installed, you will need to define the following functions to be able to label your dataset later. We define the function soeplabelsvars for linking the variables to the variable labels.

 1* Assign German variable labels from *_variables.csv
 2capture program drop soeplabelsvars
 3program soeplabelsvars
 4version 12
 5syntax , varlabels(string)
 6    preserve
 7        insheet using "`varlabels'", clear names
 8		putmata varLab = (variable label_de) ,replace 
 9	restore
10	foreach variable of varlist * {
11		label variable `variable' ""
12	}
13
14	mata: st_local("n", strofreal(rows(varLab)))
15	forvalues i = 1/`n' {
16		mata: st_local("varName",varLab[`i',1])
17		mata: st_local("varLabel",varLab[`i',2])
18		capture confirm variable  `varName'
19		if !_rc {
20			di "Variable: `varName' mit -`varLabel'- gelabelt"
21			label variable `varName' "`varLabel'"	
22		}
23		else di "Variable " as error "`varName'" as result " nicht vorhanden"
24	}
25end

The soeplabelvals function links the information in the data set with valuelabels.

 1* Assign German value labels from *_values.csv
 2capture program drop soeplabelsvals
 3program soeplabelsvals
 4version 12
 5syntax , vallabels(string)
 6    quietly label drop _all
 7	quietly labundef , detach
 8	preserve
 9        insheet using "`vallabels'", clear names
10    	quietly tostring value, replace
11    	putmata valLab = (variable value label_de) ,replace 
12    	quietly levelsof variable, local(variables) clean
13	restore
14	foreach variable in `variables' {
15			di "------------"
16			di "Variable `variable' wird gelabelt"
17            mata:  valLabVar= select(valLab, valLab[.,1]:=="`variable'")
18			mata: st_vlmodify("`variable'",  strtoreal(valLabVar[.,2]) ,valLabVar[.,3])
19			capture confirm variable  `variable'
20			if !_rc label value `variable' `variable' , nofix
21            else di "Variable " as error "`variable'" as result " nicht vorhanden"
22	}
23end	

After both functions have been loaded we can define in a local the dataset we want to import and prepare as csv. We load the csv via the insheet command. Then we use the defined functions and use the variables.csv and values.csv provided by SOEP to label the data.

1* import and label dataset
2local dataset = "jugendl"
3insheet using "$MY_IN_PATH/`dataset'.csv", clear names
4soeplabelsvars, varlabels("$MY_IN_PATH/`dataset'_variables.csv")
5soeplabelsvals, vallabels("$MY_IN_PATH/`dataset'_values.csv")

Congratulations you should now have a fully labeled dataset!

Last change: Feb 21, 2024