Working with SOEP data in csv format

SOEP offers the data in statistical program specific file formats (e.g.: Stata .dta) and also as comma-separated values FIle (csv). With these csvs you can read the non-formatted information directly into a statistical program of your choice.

This example shows how to open SOEP data of data version v.36 in csv format with an old Stata version (12) and how to prepare the data in an efficient way.

Create an exercise path with four subfolders:

../_images/uebungspfade.PNG

Example:

  • H:/material/exercises/do

  • H:/material/exercises/output

  • H:/material/exercises/temp

  • H:/material/exercises/log

These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define the paths you created with globals:

1
2
3
4
5
6
7
8
9
***********************************************
* Set relative paths to the working directory
***********************************************
global AVZ 	"H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\distribution\soep-core\soep.v35\csv"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The main paths are subdivided using the globals “MY_IN_PATH”, “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, “MY_OUT_TEMP”. The global “MY_IN_PATH” contains the path to your ordered data.

For the following script to work, the global “MY_IN_PATH” must contain the folder path to the SOEP csv files of all datasets. The csv files for each data set should always consist of three csvs. If we want to import and prepare the dataset jugendl in csv format, we need the following csv Files:

  • jugendl.csv

  • jugendl_variables.csv

  • jugendl_values.csv

In the SOEP, the csv of each data set contains the variables as columns and their numerical values. Variables and Values csvs contain the variable labels and the value labels for the data set. First some packages for Stata have to be installed so that the process can start.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
* Import and Labeling of SOEP csv-Files
clear
set more off

* Load ados
capture which adolist
if _rc==111{ 
	ssc install adolist
}
quietly adolist list
local allAdos `r(names)'
foreach package in fre labutil2 chardef labundef saveascii useold {
	if !regexm("`r(names)'", " `package' ") {
		display as result "Paket " as error  "`package'" as result " wird versucht über SSC-Server zu installieren"
		ssc install `package'
	} 
} 	

Once the packages are installed, you will need to define the following functions to be able to label your dataset later. We define the function soeplabelsvars for linking the variables to the variable labels.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
* Assign German variable labels from *_variables.csv
capture program drop soeplabelsvars
program soeplabelsvars
version 12
syntax , varlabels(string)
    preserve
        insheet using "`varlabels'", clear names
		putmata varLab = (variable label_de) ,replace 
	restore
	foreach variable of varlist * {
		label variable `variable' ""
	}

	mata: st_local("n", strofreal(rows(varLab)))
	forvalues i = 1/`n' {
		mata: st_local("varName",varLab[`i',1])
		mata: st_local("varLabel",varLab[`i',2])
		capture confirm variable  `varName'
		if !_rc {
			di "Variable: `varName' mit -`varLabel'- gelabelt"
			label variable `varName' "`varLabel'"	
		}
		else di "Variable " as error "`varName'" as result " nicht vorhanden"
	}
end

The soeplabelvals function links the information in the data set with valuelabels.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
* Assign German value labels from *_values.csv
capture program drop soeplabelsvals
program soeplabelsvals
version 12
syntax , vallabels(string)
    quietly label drop _all
	quietly labundef , detach
	preserve
        insheet using "`vallabels'", clear names
    	quietly tostring value, replace
    	putmata valLab = (variable value label_de) ,replace 
    	quietly levelsof variable, local(variables) clean
	restore
	foreach variable in `variables' {
			di "------------"
			di "Variable `variable' wird gelabelt"
            mata:  valLabVar= select(valLab, valLab[.,1]:=="`variable'")
			mata: st_vlmodify("`variable'",  strtoreal(valLabVar[.,2]) ,valLabVar[.,3])
			capture confirm variable  `variable'
			if !_rc label value `variable' `variable' , nofix
            else di "Variable " as error "`variable'" as result " nicht vorhanden"
	}
end	

After both functions have been loaded we can define in a local the dataset we want to import and prepare as csv. We load the csv via the insheet command. Then we use the defined functions and use the variables.csv and values.csv provided by SOEP to label the data.

1
2
3
4
5
* import and label dataset
local dataset = "jugendl"
insheet using "$MY_IN_PATH/`dataset'.csv", clear names
soeplabelsvars, varlabels("$MY_IN_PATH/`dataset'_variables.csv")
soeplabelsvals, vallabels("$MY_IN_PATH/`dataset'_values.csv")

Congratulations you should now have a fully labeled dataset!

Last change: May 04, 2021