Data Distribution File

In the SOEP, each survey year is assigned to a data wave, which is abbreviated using the letters of the alphabet. A data wave may be released in several versions, each labeled with a “v” for version followed by the version number. The version number counts the survey years since the survey began in 1984; the most recent release at the time of writing is the 34th version (v34). Within a data wave, updates may be released over time, such as v34.1. When an update is released, users are informed through various channels and asked to order the data again. After you order the data, they are sent to you in a zip file.
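As a rough illustration of the version numbering, the following sketch splits a label such as “v34.1” into its version and update number and derives the most recent survey year covered. It assumes that version N covers the N survey years counted from 1984. The function name and the returned fields are hypothetical and not part of any SOEP tooling.

```python
import re

FIRST_SURVEY_YEAR = 1984  # first SOEP survey year, as stated in the text


def parse_version(label):
    """Split a label like 'v34' or 'v34.1' into version, update, and last survey year.

    Assumption: version N covers N survey years counted from 1984.
    """
    match = re.fullmatch(r"v(\d+)(?:\.(\d+))?", label.strip().lower())
    if match is None:
        raise ValueError(f"not a version label: {label!r}")
    version = int(match.group(1))
    # A label without a suffix (e.g. 'v34') is treated as update 0.
    update = int(match.group(2)) if match.group(2) else 0
    last_year = FIRST_SURVEY_YEAR + version - 1
    return {"version": version, "update": update, "last_survey_year": last_year}


print(parse_version("v34.1"))
```

Under this assumption, v34.1 is the first update of the release whose most recent survey year is 2017.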

../_images/SOEP_1.PNG

Within this zip file you will find various datasets, a “raw” subdirectory and the “EU-SILC Clone” subdirectory.

../_images/SOEP.PNG

The datasets in the top-level folder are a condensed, easy-to-analyze version of the SOEP data.

Note

SOEP strongly recommends that users work with the datasets in the top-level folder.

../_images/SOEP_2.PNG

The data in SOEP-Core are no longer provided only as wave-specific individual files but are now pooled across all available years (in “long” format). In some cases, variables are harmonized to ensure that they are defined consistently over time. For example, income information for the years up to 2001 is given in euros rather than Deutsche Mark, and categories are adjusted where the questionnaire changed between waves. The longitudinal nature of the data is one of the biggest assets of the SOEP, which is why we provide longitudinal datasets such as PL or HL. With such datasets, longitudinal analyses can be carried out without great effort.

If you need more information about the “long” data structure, see chapter Data Structure in “Long” Format (long).
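To make the idea of pooling concrete, here is a minimal sketch of how wave-specific tables can be stacked into one “long” table with one row per person and survey year. The records and the variable `income` are invented toy data. The identifier names `pid` and `syear` are assumptions made for this example, and the helper functions are hypothetical.

```python
# Invented toy data: one table per survey year, one record per person.
wave_files = {
    1984: [{"pid": 101, "income": 1200}, {"pid": 102, "income": 900}],
    1985: [{"pid": 101, "income": 1250}],
}


def pool_to_long(wave_files):
    """Stack wave-specific tables into one table with a survey-year column."""
    long_rows = []
    for syear in sorted(wave_files):
        for row in wave_files[syear]:
            long_row = {"pid": row["pid"], "syear": syear}
            # Copy all remaining variables unchanged.
            long_row.update({k: v for k, v in row.items() if k != "pid"})
            long_rows.append(long_row)
    return long_rows


def trajectory(long_rows, pid, variable):
    """Return (year, value) pairs for one person, in the order of the long table."""
    return [(r["syear"], r[variable]) for r in long_rows if r["pid"] == pid]


long_rows = pool_to_long(wave_files)
print(trajectory(long_rows, 101, "income"))
```

Once the data are pooled this way, following a person over time is a simple filter on the person identifier instead of a merge across many yearly files.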

Core Datasets

The datasets in the top-level folder:

Tracking Data: ppathl, hpathl, pbrutto, hbrutto, hbrutt, pbr_exit

Original Data: pl, hl, biol, jugendl, plueckel, abroad, vpl

Survey Data: csamp, design, exit

Generated Data: pgen, hgen, bioagel, kidlong, pequiv, biobirth, bioedu, bioimmig, biojob, bioparen, biopupil, biosib, biotwin, camces, cogdj, cognit, cog_refu, gripstr, hconsum, health, hwealth, interviewer, mihinc, pflege, pkal, pwealth, timepref, trust

Spell Data: artkalen, biocouplm, biocouply, biomarsm, biomarsy, einkalen, lifespell, migspell, pbiospe, refugspell, sozkalen

Raw Datasets

In the “raw” directory, you will find all wave-specific datasets that were used to generate the long datasets in the top-level folder described above.

../_images/SOEP_4.PNG

Attention

Please note that the datasets in the top-level folder are fully sufficient for your data analysis. The raw subdirectory contains the datasets that were used to generate the SOEP-Core data. Detailed information about the raw datasets can be found here: Raw Data.

../_images/SOEP_3.PNG

Within this “raw” directory, each wave is identified by letters of the alphabet: the first wave in 1984 is wave “A”, 1985 is wave “B”, and so on. To simplify the notation, the “$” sign stands for the wave letter when referring to all waves of one group of datasets. For example, $H refers to all household-level datasets from AH up to the most recent wave. For each year of SOEP data, there are separate data files for households (e.g., $H), for individual respondents (e.g., $P), and for children (e.g., $KIND), all based on interview information. These observations make up the “net” population: each of these files contains one record per completed interview. Additional data files with a limited number of variables, based on the “address log”, constitute the “gross” population of households and persons, i.e., all households and their members that were eligible for an interview in a given year.

The datasets in the “raw” directory are stored on a wave-specific basis and are the basis for generating the majority of the long datasets described above. In addition to these wave-specific datasets, the “raw” directory also contains datasets in cross-sectional format that have not yet been distributed in long format ($SCHOOL, $SCHOOL2, EV, EXIT, $PKALOST and PBR_HHCH).
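The wave-letter notation can be sketched in a few lines of code. The example below maps a survey year to its wave code and expands a pattern such as $H into wave-specific dataset names. Two points are assumptions that go beyond the text above: that the codes continue with two letters (“BA”, “BB”, ...) after “Z”, and that dataset names are written in lowercase. Both function names are hypothetical. The sketch handles only patterns with a single leading “$”, and it does not check whether a dataset actually exists in a given wave.

```python
import string

FIRST_SURVEY_YEAR = 1984  # wave "A", as stated in the text


def wave_prefix(year):
    """Return the wave code for a survey year (1984 -> 'a', 1985 -> 'b', ...).

    Assumption: after 'z' the codes continue as 'ba', 'bb', ... up to 'bz'.
    """
    offset = year - FIRST_SURVEY_YEAR
    letters = string.ascii_lowercase
    if offset < 0:
        raise ValueError("the survey starts in 1984")
    if offset < 26:
        return letters[offset]
    if offset < 52:
        return "b" + letters[offset - 26]
    raise ValueError("year is outside the range covered by this sketch")


def expand_placeholder(pattern, first_year, last_year):
    """Expand a pattern like '$H' into one dataset name per survey year."""
    if pattern.count("$") != 1 or not pattern.startswith("$"):
        raise ValueError("only patterns with a single leading '$' are supported")
    suffix = pattern[1:].lower()
    return [wave_prefix(year) + suffix for year in range(first_year, last_year + 1)]


print(expand_placeholder("$H", 1984, 1986))  # ['ah', 'bh', 'ch']
```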

Tracking Data: ppfad, hpfad, $pbrutto, $hbrutto, hbrutt$$

Original Data: $p, $pausl, $pluecke, $h, $post, $jugend, $school, $school2, ev, $vp, biol

Survey Data: phrf, hhrf, pbr_hhch

Generated Data: $pgen, $hgen, $kind, $pequiv, $pkal, $pkalost

EU-SILC Clone

The European Union Statistics on Income and Living Conditions (EU-SILC) contains data from across Europe on individual and household income, household living conditions, individual health, aspects of child care, employment, and self-assessed financial situation. EU-SILC offers both cross-sectional and longitudinal data. The German EU-SILC dataset currently contains only cross-sectional data. The EU-SILC Clone dataset provided at DIW Berlin offers additional longitudinal information on private households in Germany, based on data from the Socio-Economic Panel (SOEP) study since 2005. The EU-SILC Clone has been included in the annual SOEP data release since 2018 and requires a data distribution contract with DIW Berlin. The SOEP data are provided free of charge for scientific research. Researchers can compare all of the information in the dataset with longitudinal data on other European countries, which can be obtained from Eurostat upon request.

../_images/SOEP_6.PNG
../_images/SOEP_7.PNG

The EU-SILC Clone includes all four EU-SILC sub-datasets: the household register (D-File), the personal register (R-File), personal data (P-File), and household data (H-File). The clone datasets can be combined using the R-File, which includes both the current household identifier and the individual identifier. The identifiers in the EU-SILC Clone are unique and do not vary among the four datasets. Complete documentation on the datasets can be found here: Documentation EU-SILC.
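The role of the R-File as a bridge between the personal and household data can be illustrated with a small sketch. All records below are invented, and the column names `person_id`, `household_id`, `employed`, and `hh_income` are hypothetical placeholders rather than the actual EU-SILC variable names. The sketch also ignores the survey year, which a real longitudinal merge would presumably need to match on as well.

```python
# Invented toy records with hypothetical column names.
r_file = [
    {"person_id": 1, "household_id": 10},
    {"person_id": 2, "household_id": 10},
    {"person_id": 3, "household_id": 20},
]
p_file = [
    {"person_id": 1, "employed": True},
    {"person_id": 3, "employed": False},
]
h_file = [
    {"household_id": 10, "hh_income": 30000},
    {"household_id": 20, "hh_income": 18000},
]


def link_via_register(r_file, p_file, h_file):
    """Attach household and personal data to each entry of the personal register."""
    persons = {row["person_id"]: row for row in p_file}
    households = {row["household_id"]: row for row in h_file}
    linked = []
    for entry in r_file:
        merged = dict(entry)
        # Register entries without a matching record keep only the fields found.
        merged.update(households.get(entry["household_id"], {}))
        merged.update(persons.get(entry["person_id"], {}))
        linked.append(merged)
    return linked


for row in link_via_register(r_file, p_file, h_file):
    print(row)
```

Because the register entry carries both identifiers, every person can be linked to a household even when no personal data record exists for that person, as for person 2 in this toy example.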

Last change: May 04, 2021