Data Distribution File
In the SOEP, each survey year is allocated to a data wave, which is abbreviated using letters of the alphabet. One data wave may be released in several versions. A version is labeled with a "v" for version, followed by the version number. The version number corresponds to the number of survey years since the survey began in 1984, and the SOEP has recently published the 38th version. Within a data wave, updates may be released over time, indicated by a decimal suffix such as v34.1. If updates are released, users are informed through various channels and asked to order the data again. Once ordered, the data are delivered to you as a zip file.
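To make the numbering concrete, the following Python sketch derives the last survey year contained in a release from its version label. It assumes that version N covers the survey years 1984 through 1983 + N, so v38 ends with 2021. It also assumes that updates are written as a decimal suffix, as in v34.1. The function names are hypothetical and are not part of any SOEP tool.

```python
import re

FIRST_SURVEY_YEAR = 1984  # the SOEP began in 1984

def parse_version(label: str) -> tuple[int, int]:
    """Split a label such as 'v34' or 'v34.1' into (version number, update number)."""
    match = re.fullmatch(r"v(\d+)(?:\.(\d+))?", label.strip().lower())
    if match is None:
        raise ValueError(f"not a version label: {label!r}")
    version = int(match.group(1))
    update = int(match.group(2)) if match.group(2) else 0  # no suffix -> update 0
    return version, update

def last_survey_year(label: str) -> int:
    """Assumption: one survey year per version number, counted inclusively from 1984."""
    version, _ = parse_version(label)
    return FIRST_SURVEY_YEAR + version - 1

print(last_survey_year("v38"))  # 2021
print(parse_version("v34.1"))   # (34, 1)
```

An update such as v34.1 does not change the covered survey years under this assumption. It only revises the data of the same wave.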
This zip file contains various datasets, a "raw" subdirectory, and an "eu-silc-like-panel" subdirectory.
The datasets in the soepdata folder are a condensed, easy-to-analyze version of the SOEP data.
Note
SOEP strongly recommends working with the datasets in the soepdata top-level folder.
The data in SOEP-Core are no longer provided only as wave-specific individual files. Instead, they are pooled across all available years (in "long" format). In some cases, variables are harmonized so that they are defined consistently over time. For example, income information collected up to 2001 in Deutsche Mark is provided in euros. Likewise, answer categories are adjusted where the questionnaire changed between years. The longitudinal nature of the data is one of the biggest assets of the SOEP, which is why we provide longitudinal datasets such as PL and HL. With these datasets, longitudinal analyses can be carried out with little effort.
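The following Python sketch illustrates why the long format simplifies longitudinal work. When each row is one person in one survey year, within-person changes take only a sort and a pairwise difference. The variable names `pid` (person identifier) and `syear` (survey year) are assumed here for illustration. `income` and all values are hypothetical placeholders, not actual SOEP variables or data.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per person (pid) and survey year (syear).
records = [
    {"pid": 101, "syear": 2019, "income": 2000},
    {"pid": 101, "syear": 2020, "income": 2100},
    {"pid": 102, "syear": 2019, "income": 1800},
    {"pid": 101, "syear": 2021, "income": 2050},
    {"pid": 102, "syear": 2021, "income": 1900},
]

def year_to_year_changes(rows):
    """Return {pid: [(syear, change versus the previous observed year), ...]}."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["pid"]].append(row)
    changes = {}
    for pid, person_rows in by_person.items():
        person_rows.sort(key=lambda r: r["syear"])  # order each person's years
        changes[pid] = [
            (curr["syear"], curr["income"] - prev["income"])
            for prev, curr in zip(person_rows, person_rows[1:])
        ]
    return changes

print(year_to_year_changes(records))
# {101: [(2020, 100), (2021, -50)], 102: [(2021, 100)]}
```

Person 102 has no record for 2020, so the change is computed against the previous *observed* year. A real analysis would need to decide how to treat such gaps.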
If you need more information about the "long" data structure, see the chapter Data Structure in "Long" Format (long).
Core Datasets
The soepdata folder contains the following datasets:
| Tracking Data | Original Data | Survey Data | Generated Data | Spell Data |
|---|---|---|---|---|
| hbrutt | abroad | design | bioagel | artkalen |
| hbrutto | biol | pbr_hhch | biobirth | biocouplm |
| hpath | hl |  | bioedu | biocouply |
| hpathl | jugendl |  | bioimmig | biomarsm |
| pbr_exit | kidlong |  | biojob | biomarsy |
| pbrutto | more_docu |  | bioparen | lifespell |
| ppath | more_local |  | biopupil | migspell |
| ppathl | pl |  | biosib | pbiospe |
| instrumentation | plueckel |  | biotwin | refugspell |
|  | vpl |  | camces | sozkalen |
|  |  |  | cog_refu |  |
|  |  |  | cogdj |  |
|  |  |  | cognit |  |
|  |  |  | gripstr |  |
|  |  |  | hconsum |  |
|  |  |  | health |  |
|  |  |  | hgen |  |
|  |  |  | hwealth |  |
|  |  |  | interviewer |  |
|  |  |  | mihinc |  |
|  |  |  | pequiv |  |
|  |  |  | pflege |  |
|  |  |  | pgen |  |
|  |  |  | pkal |  |
|  |  |  | pwealth |  |
|  |  |  | timepref |  |
|  |  |  | trust |  |
Raw Datasets
In the "raw" directory, you will find all wave-specific datasets that were used to generate the long datasets in the soepdata folder described above.
Attention
Please note that the datasets in the soepdata folder are fully sufficient for your data analysis. The datasets used to generate the SOEP-Core data can be found in the raw subdirectory. Detailed information about the raw datasets can be found in the chapter Raw Data.
Within the "raw" directory, each wave is identified by letters of the alphabet: the first wave in 1984 is wave "A", 1985 is wave "B", and so on. To simplify the notation, the "$" sign is used when referring to all waves of one group of datasets. For example, $H refers to all household-level datasets from AH up to the most recent wave.

For each year of SOEP data, there are separate data files based on interview information for households (e.g., $H), individual respondents (e.g., $P), and children (e.g., $KIND). These observations make up the "net" population, and each file contains one record per interview conducted. Additional data files with a limited number of variables are based on the "address log". These constitute the "gross" sample of households and persons, i.e., all households and their members that were eligible for an interview in a given year.

The wave-specific datasets in the "raw" directory are the basis for generating most of the long datasets described above. In addition, the "raw" directory contains datasets in cross-sectional format that have not yet been distributed in long format ($SCHOOL, $SCHOOL2, EV, EXIT, $PKALOST, and PBR_HHCH).
| Tracking Data | Original Data | Survey Data | Generated Data | Spell Data |
|---|---|---|---|---|
| ppfad | $p | phrf | $pgen | einkalen |
| hpfad | $pausl | hhrf | $hgen | sozkalen |
| $pbrutto | $pluecke | $biorki | $pequiv |  |
| $hbrutto | $h | cirdef | $pkal |  |
| hbrutt$$ | $post | exit | $pkalost |  |
| cov_brutto | $jugend |  |  |  |
| cov_contact | $school |  |  |  |
|  | $school2 |  |  |  |
|  | ev |  |  |  |
|  | $vp |  |  |  |
|  | $kind |  |  |  |
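The following Python sketch illustrates the wave lettering and the "$" notation described above by expanding a name such as $H into one dataset name per wave. The letters A (1984) through Z (2009) follow directly from the scheme in the text. The continuation after Z with two-letter codes (BA for 2010, BB for 2011, ...) is an assumption in this sketch and should be checked against the release documentation. Only a leading "$" is expanded. The function names are hypothetical.

```python
import string

FIRST_SURVEY_YEAR = 1984
LETTERS = string.ascii_uppercase  # "A" .. "Z"

def wave_prefix(year: int) -> str:
    """Return the wave letter(s) for a survey year: 1984 -> "A", 1985 -> "B", ...

    Assumption: after "Z" (2009) the sequence continues as "BA" (2010), "BB" (2011), ...
    """
    index = year - FIRST_SURVEY_YEAR
    if index < 0:
        raise ValueError("the SOEP starts in 1984")
    if index < len(LETTERS):
        return LETTERS[index]
    if index < 2 * len(LETTERS):
        return "B" + LETTERS[index - len(LETTERS)]
    raise ValueError("year lies beyond the range covered by this sketch")

def expand_dollar(name: str, last_year: int) -> list[str]:
    """Expand a leading "$" (e.g. "$H") into one dataset name per wave up to last_year."""
    if not name.startswith("$"):
        return [name]  # names without a leading "$" (e.g. "ev") are returned unchanged
    suffix = name[1:]
    return [wave_prefix(year) + suffix
            for year in range(FIRST_SURVEY_YEAR, last_year + 1)]

print(expand_dollar("$H", 1986))  # ['AH', 'BH', 'CH']
print(wave_prefix(2009))          # Z
```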
eu-silc-like-panel
The European Union Statistics on Income and Living Conditions (EU-SILC) contains data from across Europe on individual and household income, household living conditions, individual health, aspects of child care, employment, and self-assessed financial situation. EU-SILC offers both cross-sectional and longitudinal data, but the German EU-SILC dataset currently contains only cross-sectional data. The eu-silc-like-panel dataset provided by DIW Berlin adds longitudinal information on private households in Germany from 2005 onward, based on data from the Socio-Economic Panel (SOEP) study. The eu-silc-like-panel has been included in the annual SOEP data release since 2018 and requires a data distribution contract with DIW Berlin. The SOEP data are provided free of charge for scientific research. Researchers can compare all of the information in the dataset with longitudinal data on other European countries, which can be obtained from Eurostat upon request.
The eu-silc-like-panel includes all four EU-SILC sub-datasets: the household register (D-File), the personal register (R-File), personal data (P-File), and household data (H-File). The clone datasets can be linked via the R-File, which contains both the current household identifier and the personal identifier. The identifiers in the eu-silc-like-panel are unique and identical across the four datasets. Complete documentation of the datasets is available here: Documentation EU-SILC.
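The following Python sketch shows one way to link the files through the R-File. It assumes the standard EU-SILC variable names: RB010, PB010, and HB010 for the survey year; RB030 and PB030 for the personal ID; RB040 for the current household ID; and HB030 for the household ID. Whether the eu-silc-like-panel uses exactly these names should be checked in the EU-SILC documentation. The records, `P_VALUE`, and `H_VALUE` are hypothetical placeholders.

```python
# Hypothetical miniature extracts; all values are invented for illustration.
r_file = [  # personal register: one row per person and year
    {"RB010": 2019, "RB030": 1001, "RB040": 10},
    {"RB010": 2019, "RB030": 1002, "RB040": 10},
]
p_file = [  # personal data: assumed to cover only persons with a personal interview
    {"PB010": 2019, "PB030": 1001, "P_VALUE": 30000},
]
h_file = [  # household data: one row per household and year
    {"HB010": 2019, "HB030": 10, "H_VALUE": 45000},
]

def link_via_r_file(r_rows, p_rows, h_rows):
    """Attach P-File and H-File records to every R-File row (a left join on the R-File)."""
    persons = {(p["PB010"], p["PB030"]): p for p in p_rows}
    households = {(h["HB010"], h["HB030"]): h for h in h_rows}
    linked = []
    for r in r_rows:
        # Missing matches yield an empty dict, so the R-File row is kept as-is.
        person = persons.get((r["RB010"], r["RB030"]), {})
        household = households.get((r["RB010"], r["RB040"]), {})
        linked.append({**r, **person, **household})
    return linked

for row in link_via_r_file(r_file, p_file, h_file):
    print(row)
```

In this toy example, person 1002 has no P-File record, so only the household variables are attached. Keeping every R-File row is one design choice. An inner join would instead drop such persons.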
Last change: Jun 06, 2024