Data Distribution File

In the SOEP, each survey year is assigned to a data wave, which is identified by letters of the alphabet. The same data wave may appear in several releases (versions), which SOEP labels with a “v” followed by the version number. The version number corresponds to the number of survey years since the survey began in 1984; the SOEP has recently published its 38th version (v38). A release may be updated over time, and such updates carry a decimal suffix, as in v34.1. When an update is published, users are informed through various channels and asked to order the data again. Once ordered, the data are delivered to you as a zip file.
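
Under the reading that the version number counts survey years from 1984 onward (so v1 would cover 1984 only), the last survey year contained in a release can be derived from its label. The following Python sketch assumes that reading and treats the decimal part of an update such as v34.1 as not changing the years covered; the function name is hypothetical.

```python
import re

FIRST_SURVEY_YEAR = 1984  # the SOEP began in 1984


def last_survey_year(version: str) -> int:
    """Map a release label such as 'v38' or 'v34.1' to its last survey year.

    Assumption: the major version number counts survey years since 1984,
    and an update suffix (the '.1' in 'v34.1') does not change coverage.
    """
    match = re.fullmatch(r"v(\d+)(?:\.(\d+))?", version.strip().lower())
    if match is None:
        raise ValueError(f"not a SOEP version label: {version!r}")
    major = int(match.group(1))
    if major < 1:
        raise ValueError("version numbers start at v1")
    return FIRST_SURVEY_YEAR + major - 1


print(last_survey_year("v38"))    # 2021
print(last_survey_year("v34.1"))  # 2017
```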

Within this zip file you will find various datasets, a “raw” subdirectory and the “eu-silc-like-panel” subdirectory.
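
To check the contents of a delivered zip file programmatically, Python's standard zipfile module is sufficient. The sketch below is a hypothetical helper. It sorts every file into the “raw” group, the “eu-silc-like-panel” group, or the core datasets, based only on the subdirectory names mentioned above, so it does not depend on the exact name of the top-level folder. The demo archive and its file names are made up for illustration.

```python
import io
import zipfile
from collections import defaultdict

SUBDIRECTORIES = ("raw", "eu-silc-like-panel")


def classify_entries(zip_bytes: bytes) -> dict:
    """Group file names in a distribution zip by subdirectory (hypothetical helper)."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip pure directory entries
            *folders, filename = name.split("/")
            # Files outside the known subdirectories count as core datasets.
            group = next((d for d in SUBDIRECTORIES if d in folders), "core")
            groups[group].append(filename)
    return dict(groups)


# Build a tiny stand-in archive with made-up paths to show the grouping.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    for path in ("soepdata/pl.dta",
                 "soepdata/raw/ap.dta",
                 "soepdata/eu-silc-like-panel/example.dta"):
        demo.writestr(path, b"")

print(classify_entries(buffer.getvalue()))
```

For a real delivery, you would pass the bytes of the downloaded file, e.g. `classify_entries(open(path, "rb").read())`.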

The datasets in the soepdata folder are a condensed, easy-to-analyze version of the SOEP data.

Note

SOEP strongly recommends that users work with the datasets in the top-level soepdata folder.

The SOEP-Core data are no longer provided only as wave-specific files; they are now pooled across all available years (“long” format). In some cases, variables are harmonized so that they are defined consistently over time. For example, income information up to 2001, originally collected in Deutsche Mark, is given in euros, and response categories are adjusted where versions of the questionnaire have changed. The longitudinal nature of the data is one of the SOEP’s greatest assets. This is why we provide longitudinal datasets such as PL and HL: with them, longitudinal analyses can be carried out without great effort.
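
To make the harmonization and the long format concrete, here is a small Python sketch on made-up records. It assumes person-year records keyed by pid (person identifier) and syear (survey year); the income variable and all values are hypothetical. The conversion uses the irrevocably fixed rate of 1.95583 Deutsche Mark per euro. The released long datasets already contain harmonized values, so this only illustrates the idea.

```python
from collections import defaultdict

DM_PER_EURO = 1.95583  # fixed Deutsche Mark / euro conversion rate


def harmonize_income(amount: float, syear: int) -> float:
    """Return the amount in euros; values up to 2001 are assumed to be in DM."""
    if syear <= 2001:
        return round(amount / DM_PER_EURO, 2)
    return amount


# Hypothetical long-format records: one row per person (pid) and survey year (syear).
records = [
    {"pid": 101, "syear": 2000, "income": 3911.66},  # DM
    {"pid": 101, "syear": 2002, "income": 2100.00},  # euros
    {"pid": 202, "syear": 2001, "income": 1955.83},  # DM
    {"pid": 202, "syear": 2002, "income": 1000.00},  # euros
]

# Because all years sit in one table, a person's history is simply
# the rows sharing a pid, ordered by syear.
histories = defaultdict(list)
for rec in sorted(records, key=lambda r: (r["pid"], r["syear"])):
    histories[rec["pid"]].append(
        (rec["syear"], harmonize_income(rec["income"], rec["syear"]))
    )

for pid, series in histories.items():
    first_year, first_income = series[0]
    last_year, last_income = series[-1]
    print(f"pid {pid}: {first_income:.2f} EUR ({first_year}) -> "
          f"{last_income:.2f} EUR ({last_year})")
```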

If you need more information about the “long” data structure, see the chapter Data Structure in “Long” Format (long).

Core Datasets

The datasets in the soepdata folder fall into five categories: Tracking Data, Original Data, Survey Data, Generated Data, and Spell Data. The folder contains the following datasets:

hbrutt, abroad, design, bioagel, artkalen, hbrutto, biol, pbr_hhch, biobirth, biocouplm, hpath, hl, bioedu, biocouply, hpathl, jugendl, bioimmig, biomarsm, pbr_exit, kidlong, biojob, biomarsy, pbrutto, more_docu, bioparen, lifespell, ppath, more_local, biopupil, migspell, ppathl, pl, biosib, pbiospe, instrumentation, plueckel, biotwin, refugspell, vpl, camces, sozkalen, cog_refu, cogdj, cognit, gripstr, hconsum, health, hgen, hwealth, interviewer, mihinc, pequiv, pflege, pgen, pkal, pwealth, timepref, trust

Raw Datasets

The “raw” directory contains all wave-specific datasets that were used to generate the long datasets in the top-level soepdata folder described above.

Attention

Please note that the datasets in the soepdata folder are fully sufficient for your data analysis. The wave-specific datasets used to generate the SOEP-Core data can be found in the raw subdirectory. Detailed information about the raw datasets can be found here: Raw Data.

Within the “raw” directory, each wave is identified by letters of the alphabet: the first wave, from 1984, is wave “A”, 1985 is wave “B”, and so on. To simplify the notation, the “$” sign stands in for the wave letter when referring to all waves of one group of datasets. For example, $H refers to all household-level datasets, from AH up to the most recent wave.

For each survey year, there are separate data files for households (e.g., $H), individual respondents (e.g., $P), and children (e.g., $KIND), based on interview information. These observations make up the “net” population: each file contains one record per completed interview. Additional data files with a limited number of variables, based on the “address log”, constitute the “gross” sample of households and persons, i.e., all households and their members that were eligible for an interview in a given year.

The datasets in the “raw” directory are stored wave by wave and serve as the basis for generating most of the long datasets described above. In addition to these wave-specific datasets, the “raw” directory contains further datasets in cross-sectional format that have not yet been distributed in long format ($SCHOOL, $SCHOOL2, EV, EXIT, $PKALOST, and PBR_HHCH).
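
The wave letters and the “$” notation can be expressed as a small lookup. The Python sketch below assumes that, once the alphabet is exhausted after wave Z (2009), the naming continues with two letters (BA for 2010, BB for 2011, and so on). The helper names are hypothetical, and the generated names follow the upper-case notation used in this text rather than the exact file names on disk.

```python
import string

FIRST_SURVEY_YEAR = 1984


def wave_prefix(year: int) -> str:
    """Return the wave letter(s) for a survey year: 1984 -> 'A', 1985 -> 'B', ...

    Assumption: after Z (2009) the naming continues as BA (2010), BB (2011), ...
    """
    offset = year - FIRST_SURVEY_YEAR
    letters = string.ascii_uppercase
    if 0 <= offset < 26:
        return letters[offset]
    if 26 <= offset < 52:
        return "B" + letters[offset - 26]
    raise ValueError(f"year outside the supported range: {year}")


def expand(pattern: str, last_year: int) -> list:
    """Expand a '$' pattern such as '$H' or '$KIND' into one name per wave."""
    if not pattern.startswith("$"):
        raise ValueError("pattern must start with '$'")
    stem = pattern[1:]
    return [wave_prefix(year) + stem
            for year in range(FIRST_SURVEY_YEAR, last_year + 1)]


print(expand("$H", 1987))                    # ['AH', 'BH', 'CH', 'DH']
print(wave_prefix(2009), wave_prefix(2021))  # Z BL
```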

The datasets in the “raw” directory fall into the same five categories: Tracking Data, Original Data, Survey Data, Generated Data, and Spell Data. They include:

ppfad, $p, phrf, $pgen, einkalen, hpfad, $pausl, hhrf, $hgen, sozkalen, $pbrutto, $pluecke, $biorki, $pequiv, $hbrutto, $h, cirdef, $pkal, hbrutt$$, $post, exit, $pkalost, cov_brutto, $jugend, cov_contact, $school, $school2, ev, $vp, $kind

eu-silc-like-panel

The European Union Statistics on Income and Living Conditions (EU-SILC) contains data from across Europe on individual and household income, household living conditions, individual health, aspects of child care, employment, and self-assessed financial situation. EU-SILC offers both cross-sectional and longitudinal data, but the German EU-SILC dataset currently contains only cross-sectional data. The eu-silc-like-panel dataset provided by DIW Berlin adds longitudinal information on private households in Germany since 2005, based on data from the Socio-Economic Panel (SOEP) study. The eu-silc-like-panel has been included in the annual SOEP data release since 2018 and requires a data distribution contract with DIW Berlin. The SOEP data are provided free of charge for scientific research. Researchers can compare all of the information in the dataset with longitudinal data on other European countries, which can be obtained from Eurostat upon request.

The eu-silc-like-panel includes all four EU-SILC sub-datasets: the household register (D-File), the personal register (R-File), personal data (P-File), and household data (H-File). The datasets can be linked via the R-File, which contains both the current household identifier and the personal identifier. The identifiers in the eu-silc-like-panel are unique and identical across all four datasets. Complete documentation on the datasets can be found here: Documentation EU-SILC.
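
The following Python sketch illustrates linking the files via the R-File. The variable names follow the usual EU-SILC conventions: RB010/HB010/PB010 for the survey year, RB030/PB030 for the personal ID, RB040 for the current household ID, and HB030 for the household ID. Whether the eu-silc-like-panel uses exactly these names is an assumption, so please check them against the EU-SILC documentation. The function name and all records are made up.

```python
def link_to_register(r_rows, h_rows, p_rows):
    """Attach H-File and P-File records to each R-File row.

    Assumed EU-SILC-style keys: survey year (RB010, HB010, PB010),
    personal ID (RB030, PB030), current household ID (RB040 -> HB030).
    """
    # Index household and personal records by (year, identifier).
    households = {(h["HB010"], h["HB030"]): h for h in h_rows}
    persons = {(p["PB010"], p["PB030"]): p for p in p_rows}

    linked = []
    for r in r_rows:
        year = r["RB010"]
        linked.append({
            **r,
            "household": households.get((year, r["RB040"])),
            # None when the person has no P-File record for that year
            "person": persons.get((year, r["RB030"])),
        })
    return linked


# Made-up example: one household with two members, only one in the P-File.
r_file = [
    {"RB010": 2019, "RB030": 10101, "RB040": 101},
    {"RB010": 2019, "RB030": 10102, "RB040": 101},
]
h_file = [{"HB010": 2019, "HB030": 101}]
p_file = [{"PB010": 2019, "PB030": 10101}]

for row in link_to_register(r_file, h_file, p_file):
    print(row["RB030"], row["household"] is not None, row["person"] is not None)
```

Starting from the R-File keeps every household member in the result, even those without a P-File record.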

Last change: Feb 21, 2024