Working with SOEP data in csv format¶
SOEP offers the data in statistical program specific file formats (e.g.: Stata .dta) and also as comma-separated values FIle (csv). With these csvs you can read the non-formatted information directly into a statistical program of your choice.
This example shows how to open SOEP data of data version v.36 in csv format with an old Stata version (12) and how to prepare the data in an efficient way.
Create an exercise path with four subfolders:
Example:
H:/material/exercises/do
H:/material/exercises/output
H:/material/exercises/temp
H:/material/exercises/log
These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define the paths you created with globals:
1***********************************************
2* Set relative paths to the working directory
3***********************************************
4global AVZ "H:\material\exercises"
5global MY_IN_PATH "\\hume\rdc-prod\distribution\soep-core\soep.v35\csv"
6global MY_DO_FILES "$AVZ\do\"
7global MY_LOG_OUT "$AVZ\log\"
8global MY_OUT_DATA "$AVZ\output\"
9global MY_OUT_TEMP "$AVZ\temp\"
The global “AVZ” defines the main path. The main paths are subdivided using the globals “MY_IN_PATH”, “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, “MY_OUT_TEMP”. The global “MY_IN_PATH” contains the path to your ordered data.
For the following script to work, the global “MY_IN_PATH” must contain the folder path to the SOEP csv files of all datasets. The csv files for each data set should always consist of three csvs. If we want to import and prepare the dataset jugendl in csv format, we need the following csv Files:
jugendl.csv
jugendl_variables.csv
jugendl_values.csv
In the SOEP, the csv of each data set contains the variables as columns and their numerical values. Variables and Values csvs contain the variable labels and the value labels for the data set. First some packages for Stata have to be installed so that the process can start.
1* Import and Labeling of SOEP csv-Files
2clear
3set more off
4
5* Load ados
6capture which adolist
7if _rc==111{
8 ssc install adolist
9}
10quietly adolist list
11local allAdos `r(names)'
12foreach package in fre labutil2 chardef labundef saveascii useold {
13 if !regexm("`r(names)'", " `package' ") {
14 display as result "Paket " as error "`package'" as result " wird versucht über SSC-Server zu installieren"
15 ssc install `package'
16 }
17}
Once the packages are installed, you will need to define the following functions to be able to label your dataset later. We define the function soeplabelsvars for linking the variables to the variable labels.
1* Assign German variable labels from *_variables.csv
2capture program drop soeplabelsvars
3program soeplabelsvars
4version 12
5syntax , varlabels(string)
6 preserve
7 insheet using "`varlabels'", clear names
8 putmata varLab = (variable label_de) ,replace
9 restore
10 foreach variable of varlist * {
11 label variable `variable' ""
12 }
13
14 mata: st_local("n", strofreal(rows(varLab)))
15 forvalues i = 1/`n' {
16 mata: st_local("varName",varLab[`i',1])
17 mata: st_local("varLabel",varLab[`i',2])
18 capture confirm variable `varName'
19 if !_rc {
20 di "Variable: `varName' mit -`varLabel'- gelabelt"
21 label variable `varName' "`varLabel'"
22 }
23 else di "Variable " as error "`varName'" as result " nicht vorhanden"
24 }
25end
The soeplabelvals function links the information in the data set with valuelabels.
1* Assign German value labels from *_values.csv
2capture program drop soeplabelsvals
3program soeplabelsvals
4version 12
5syntax , vallabels(string)
6 quietly label drop _all
7 quietly labundef , detach
8 preserve
9 insheet using "`vallabels'", clear names
10 quietly tostring value, replace
11 putmata valLab = (variable value label_de) ,replace
12 quietly levelsof variable, local(variables) clean
13 restore
14 foreach variable in `variables' {
15 di "------------"
16 di "Variable `variable' wird gelabelt"
17 mata: valLabVar= select(valLab, valLab[.,1]:=="`variable'")
18 mata: st_vlmodify("`variable'", strtoreal(valLabVar[.,2]) ,valLabVar[.,3])
19 capture confirm variable `variable'
20 if !_rc label value `variable' `variable' , nofix
21 else di "Variable " as error "`variable'" as result " nicht vorhanden"
22 }
23end
After both functions have been loaded we can define in a local the dataset we want to import and prepare as csv. We load the csv via the insheet command. Then we use the defined functions and use the variables.csv and values.csv provided by SOEP to label the data.
1* import and label dataset
2local dataset = "jugendl"
3insheet using "$MY_IN_PATH/`dataset'.csv", clear names
4soeplabelsvars, varlabels("$MY_IN_PATH/`dataset'_variables.csv")
5soeplabelsvals, vallabels("$MY_IN_PATH/`dataset'_values.csv")
Congratulations you should now have a fully labeled dataset!
Last change: Jan 13, 2025