Data Distribution File

In the SOEP, each survey year is assigned to a data wave, which is abbreviated using letters of the alphabet. A data wave may be released in several versions, each labeled with a “v” for version followed by the version number. The version number corresponds to the number of survey years since the survey began in 1984; the SOEP has recently published the 34th version. Within a data wave, updates may be released over time, such as v34.1. When an update has been released, users are informed through various channels and asked to order the data again. Once you have ordered the data, they will be sent to you in a zip file.
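The version naming described above can be sketched in a few lines of Python. This is only an illustration, assuming that labels always follow the pattern `v<number>` or `v<number>.<update>`; the helper names `parse_soep_version` and `latest_survey_year` are hypothetical and not part of any SOEP tool.

```python
import re

FIRST_SURVEY_YEAR = 1984  # start of the survey, as stated in the text


def parse_soep_version(label):
    """Split a label such as 'v34' or 'v34.1' into (version, update).

    Hypothetical helper: assumes the pattern 'v<number>[.<update>]'.
    """
    match = re.fullmatch(r"v(\d+)(?:\.(\d+))?", label.strip().lower())
    if match is None:
        raise ValueError(f"not a recognised version label: {label!r}")
    version = int(match.group(1))
    update = int(match.group(2)) if match.group(2) else 0
    return version, update


def latest_survey_year(version):
    """Version n counts n survey years, the first of which is 1984."""
    return FIRST_SURVEY_YEAR + version - 1


version, update = parse_soep_version("v34.1")
print(version, update, latest_survey_year(version))
```

Under these assumptions, v34 covers the 34 survey years from 1984 through 2017, and v34.1 is the first update of that release.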

../_images/SOEP_1.PNG

Within this zip file you will find various data sets, a “raw” subdirectory and the “EU-SILC Clone” subdirectory.

../_images/SOEP.PNG

The data sets above the “raw” subdirectory are a highly compressed, easy-to-analyze version of the SOEP data.
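To see which files in a delivered archive sit above the “raw” subdirectory, the archive members can be grouped by their first path component. The sketch below uses Python's standard `zipfile` module; the member names, the `.dta` extension, and the exact directory spellings are made up for demonstration, and the real archive layout may differ.

```python
import io
import zipfile
from collections import defaultdict


def group_by_top_level(zip_file):
    """Group member names of an open ZipFile by their first path component.

    Files stored directly at the top level are collected under the key ''.
    """
    groups = defaultdict(list)
    for name in zip_file.namelist():
        if name.endswith("/"):
            continue  # skip pure directory entries
        head, sep, _rest = name.partition("/")
        groups[head if sep else ""].append(name)
    return dict(groups)


# Build a tiny in-memory archive with hypothetical member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in ("pl.dta", "hl.dta", "raw/ap.dta", "EU-SILC Clone/rfile.dta"):
        archive.writestr(member, b"")

with zipfile.ZipFile(buffer) as archive:
    groups = group_by_top_level(archive)

print(groups[""])  # the datasets above the "raw" subdirectory
```

For a real delivery, the same function could be applied to `zipfile.ZipFile(path_to_archive)`, where the path is whatever location you saved the file to.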

Note

SOEP strongly recommends that users use the data above the “raw” subdirectory.

../_images/SOEP_2.PNG

The data in SOEP-Core are no longer provided only as wave-specific individual files but are now pooled across all available years (in “long” format). In some cases, variables are harmonized to ensure that they are defined consistently over time. For example, the income information for the years up to 2001 is given in euros, and categories are adjusted where the questionnaire changed from one version to the next. The longitudinal nature of the data is one of the SOEP's biggest assets, which is why we provide longitudinal datasets such as PL or HL. With such datasets, longitudinal analyses can be carried out with little effort.
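The idea of pooling wave-specific files into one “long” dataset with harmonized variables can be illustrated with a small, self-contained sketch. It is not the procedure used to build the SOEP datasets: the field names `person`, `income`, and `currency` are hypothetical, and the only external fact used is the fixed conversion rate of 1.95583 Deutsche Mark per euro.

```python
DM_PER_EURO = 1.95583  # fixed official conversion rate


def pool_to_long(waves):
    """Stack wave-specific records into one long list with a year column.

    `waves` maps a survey year to a list of per-person records. The field
    names 'person', 'income' and 'currency' are hypothetical.
    """
    long_rows = []
    for year in sorted(waves):
        for record in waves[year]:
            income = record["income"]
            if record["currency"] == "DM":
                # Harmonise older values so the variable is in euros throughout.
                income = round(income / DM_PER_EURO, 2)
            long_rows.append(
                {"person": record["person"], "year": year, "income_eur": income}
            )
    return long_rows


waves = {
    2001: [{"person": 1, "income": 1955.83, "currency": "DM"}],
    2002: [{"person": 1, "income": 1050.0, "currency": "EUR"}],
}
rows = pool_to_long(waves)
for row in rows:
    print(row)
```

Each person now appears once per survey year in a single table, so changes over time can be analyzed without first merging separate wave files.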

If you need more information about the “long” data structure, see chapter Data Structure in long Format (long).

Core Data Sets

The data sets above the “raw” subdirectory:

Dataset types: Tracking Data, Original Data, Survey Data, Generated Data, Spell Data

Datasets: ppathl, pl, csamp, pgen, artkalen, hpathl, hl, design, hgen, biocouplm, pbrutto, jugendl, exit, bioage17, biocouply, hbrutto, plueckel, bioagel, biomarsm, pbr_exit, abroad, kidlong, biomarsy, vpl, pequiv, einkalen, biobirth, lifespell, bioedu, migspell, bioimmig, pbiospe, biojob, refugspell, bioparen, sozkalen, biopupil, bioresid, biosib, biosoc, biotwin, camces, cogdj, cognit, cog_refu, gripstr, hconsum, health, hwealth, interviewer, mihinc, pflege, pkal, pwealth, timepref, trust

Raw Data Sets

In the “raw” directory, you will find all wave-specific datasets that were used to generate the long datasets described above.

../_images/SOEP_4.PNG

Attention

Please note that the datasets above the “raw” subdirectory are fully sufficient for your data analysis. The datasets used to generate the SOEP-Core data can be found in the “raw” subdirectory. Detailed information about the raw datasets can be found in Raw Data.

../_images/SOEP_3.PNG

Within this “raw” directory, each wave is identified by letters of the alphabet: the first wave in 1984 is wave “A”, 1985 is wave “B”, and so on. To simplify the notation, the “$” sign is used when referring to all waves of one group of datasets. For example, $H refers to all household-level datasets from AH to the most recent wave. For each year of SOEP data, there are separate data files for households (e.g., $H), for individual respondents (e.g., $P), and for children (e.g., $KIND), all based on interview information. These observations make up the “net” population: each of these files contains one record per interview that could be conducted. Additional data files with a limited number of variables, based on the “address log”, constitute the “gross” number of households and persons, i.e., all households and their members that were eligible for an interview in any given year. Within the “raw” directory, the datasets are stored on a wave-specific basis and are the basis for generating the majority of the long datasets described above. In addition to these wave-specific datasets, the “raw” directory also contains datasets in cross-sectional format that have not yet been distributed in long format ($SCHOOL, $SCHOOL2, EV, EXIT, $PKALOST, and PBR_HHCH).
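The wave-letter convention and the “$” placeholder can be made concrete with a short sketch. It covers only the single-letter waves that follow directly from the rule above (“A” for 1984 through “Z” for 2009); labels for later years are deliberately left out, and the function names are hypothetical.

```python
import string

FIRST_SURVEY_YEAR = 1984  # wave "A"


def wave_letter(year):
    """Return the single-letter wave label for 1984 ('A') to 2009 ('Z').

    Labels for later years are outside the scope of this sketch.
    """
    offset = year - FIRST_SURVEY_YEAR
    if not 0 <= offset < len(string.ascii_uppercase):
        raise ValueError(f"no single-letter wave label for {year}")
    return string.ascii_uppercase[offset]


def expand_placeholder(pattern, first_year, last_year):
    """Replace '$' in a dataset pattern with each wave letter in the range."""
    return [
        pattern.replace("$", wave_letter(year))
        for year in range(first_year, last_year + 1)
    ]


print(expand_placeholder("$H", 1984, 1987))
```

Read this way, `$H` for 1984 to 1987 stands for the four household files AH, BH, CH, and DH, and `$P` expands to AP, BP, CP, and DP for the same years.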

Dataset types: Tracking Data, Original Data, Survey Data, Generated Data

Datasets: ppfad, $p, phrf, $pgen, hpfad, $pausl, hhrf, $hgen, $pbrutto, $pluecke, pbr_hhch, $kind, $hbrutto, $h, $pequiv, $post, $pkal, $jugend, $pkalost, $school, $school2, ev, $vp, biol

EU-SILC-Clone

Currently, the official German EU-SILC is provided only as a cross-sectional dataset by the German Federal Statistical Office. A panel dataset will presumably be available from 2020 onwards (Bundesrat, 2016). As a consequence, Germany is excluded from cross-country studies that exploit the longitudinal dimension of EU-SILC. The aim of the EU-SILC clone is to provide an EU-SILC-like panel dataset for Germany from 2005 onwards so that Germany can be included in cross-country studies using EU-SILC panel data. The EU-SILC clone is built on the Socio-Economic Panel (SOEP) and therefore includes all EU-SILC panel variables for which the required information is recorded in the SOEP.

../_images/SOEP_6.PNG
../_images/SOEP_7.PNG

The EU-SILC-Clone includes all four EU-SILC sub-datasets: the household register (D-File), the personal register (R-File), personal data (P-File), and household data (H-File). The clone datasets can be combined using the R-File, which includes both the current household ID and the person ID. The identifiers in the EU-SILC-Clone are unique and do not vary among the four datasets. Complete documentation of the datasets can be found here: Documentation EU-SILC.
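Because the R-File carries both identifiers, it can serve as a bridge between personal and household data. The following sketch shows the idea with plain Python dictionaries. The keys `year`, `person_id`, and `household_id` are hypothetical stand-ins for the actual identifier variables, and matching on the survey year is an assumption about how the panel records are organized; consult the EU-SILC documentation for the real variable names.

```python
def link_person_to_household(r_file, p_file, h_file):
    """Attach household data to personal data via the register file.

    All three inputs are lists of dicts. The keys 'year', 'person_id' and
    'household_id' are hypothetical stand-ins for the real identifiers.
    """
    # The R-File maps (year, person) to the person's current household.
    person_to_household = {
        (row["year"], row["person_id"]): row["household_id"] for row in r_file
    }
    households = {(row["year"], row["household_id"]): row for row in h_file}

    linked = []
    for person in p_file:
        key = (person["year"], person["person_id"])
        household_id = person_to_household.get(key)
        household = households.get((person["year"], household_id))
        if household is None:
            continue  # no register entry or no household record
        linked.append({**household, **person, "household_id": household_id})
    return linked


r_file = [
    {"year": 2005, "person_id": 101, "household_id": 1},
    {"year": 2005, "person_id": 102, "household_id": 1},
]
p_file = [{"year": 2005, "person_id": 101, "p_var": "a"}]
h_file = [{"year": 2005, "household_id": 1, "h_var": "x"}]

linked = link_person_to_household(r_file, p_file, h_file)
print(linked)
```

In practice the same join would usually be done with a statistics package's merge command; the dictionary version is meant only to show which file supplies which key.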