Data Distribution File

In the SOEP, each survey year is assigned to a data wave, which is abbreviated using letters of the alphabet. A data wave may be released in several versions, each labeled with a “v” followed by the version number. The version number corresponds to the number of survey years since the beginning of the survey. Within a data wave, updates may be released over time, such as v40.1. When an update is released, users are informed through various channels and asked to order the data again. Once ordered, the data are sent to you as a zip file.
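The relationship between a version label and the survey years it covers can be illustrated with a minimal sketch. It assumes that the count starts with the first wave in 1984 and includes that year (so that v40 would end with survey year 2023), and that the part after the dot only marks an update. The function name is hypothetical and not part of any SOEP tool.

```python
FIRST_SURVEY_YEAR = 1984  # first SOEP wave (wave "A")


def last_survey_year(version):
    """Return the most recent survey year covered by a label such as 'v40' or 'v40.1'.

    Assumption: the major version number counts survey years starting with 1984.
    """
    label = version.lower().lstrip("v")
    major = int(label.split(".")[0])  # 'v40.1' -> 40; the suffix marks an update
    return FIRST_SURVEY_YEAR + major - 1


print(last_survey_year("v40"))
print(last_survey_year("v40.1"))
```

Under these assumptions, both labels resolve to the same survey year, because an update does not add a new wave.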

Within this zip file you will find various datasets as well as the subdirectories “raw”, “linkage”, and “eu-silc-like-panel”.

The datasets in the soepdata folder are a highly compressed and easy-to-analyze version of the SOEP data.

Note

SOEP strongly recommends that users use the soepdata top-level folder.

The data in SOEP-Core are no longer provided only as wave-specific individual files but are now pooled across all available years (in “long” format). In some cases, variables are harmonized so that they are defined consistently over time. For example, income information collected up to 2001 is provided in euros, and categories are adjusted where the questionnaire changed between versions. The longitudinal nature of the data is one of the biggest assets of the SOEP, which is why we provide longitudinal datasets such as PL or HL. With such datasets, longitudinal analyses can be carried out with little effort.

If you need more information about the “long” data structure, see chapter Data Structure in “Long” Format (long).
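To illustrate what pooling into “long” format means, the following sketch stacks toy wave-specific records into a single list with a survey-year column and then reads off one person's values over time. The data, the column names (`person`, `year`, `income`), and the helper functions are invented for illustration; they are not the actual SOEP variable names or tools.

```python
# Hypothetical wave-specific records: one list of rows per survey year.
wave_files = {
    1984: [{"person": 101, "income": 1200}, {"person": 102, "income": 900}],
    1985: [{"person": 101, "income": 1250}, {"person": 102, "income": 950}],
}


def pool_to_long(waves):
    """Stack wave-specific rows into one list, tagging each row with its survey year."""
    long_rows = []
    for year in sorted(waves):
        for row in waves[year]:
            long_rows.append({"year": year, **row})
    return long_rows


def trajectory(long_rows, person, variable):
    """Return (year, value) pairs for one person, sorted by year."""
    pairs = [(r["year"], r[variable]) for r in long_rows if r["person"] == person]
    return sorted(pairs)


long_data = pool_to_long(wave_files)
print(trajectory(long_data, 101, "income"))
```

With the pooled structure, following a person over time is a single filter on one dataset instead of a merge across one file per wave.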

Core Datasets

The datasets in the soepdata folder:

Tracking Data	Original Data	Survey Data	Generated Data	Spell Data
hbrutt	abroad	design	bioagel	artkalen
hbrutto	biol	pbr_hhch	biobirth	biocouplm
hpath	childl		bioedu	biocouply
hpathl	hl		bioimmig	biomarsm
pbr_exit	youthl		biojob	biomarsy
pbrutto	jugendl		bioparen	lifespell
ppath	kidlong		biopupil	migspell
ppathl	more_docu		biosib	pbiospe
instrumentation	more_local		biotwin	refugspell
	pl		camces	sozkalen
	plueckel		cog_refu
	vpl		cogdj
			cognit
			gripstr
			hconsum
			health
			hgen
			hwealth
			interviewer
			mihinc
			pequiv
			pflege
			pgen
			pkal
			pwealth
			timepref
			trust

Raw Datasets

In the “raw” directory, you will find all wave-specific datasets that were used to generate the long datasets on the previously presented level.

Attention

Please note that the datasets in the soepdata folder are completely sufficient for your data analysis. The datasets used to generate the SOEP-Core data can be found in the raw subdirectory. Detailed information about the raw datasets can be found under Raw Data.

Within this “raw” directory, each wave is identified by letters of the alphabet: the first wave in 1984 is wave “A”, 1985 is wave “B”, and so on. To simplify the notation, the “$” sign is used when referring to all waves of one group of datasets. For example, $H refers to all household-level datasets from AH up to the most recent wave.

For each year of SOEP data, there are separate data files for households (e.g., $H), for individual respondents (e.g., $P), and for children (e.g., $KIND), all based on interview information. These observations make up the “net” population: each of these files contains one record per interview that was actually conducted. Additional data files with a limited number of variables based on the “address log” constitute the “gross” population of households and persons, i.e., all households and their members that were eligible for an interview in a given year.

The wave-specific datasets in the “raw” directory are the basis for generating the majority of the long datasets described above. In addition to these wave-specific datasets, the “raw” directory also contains datasets in cross-sectional format that have not yet been distributed in long format ($SCHOOL, $SCHOOL2, EV, EXIT, $PKALOST and PBR_HHCH).

Tracking Data	Original Data	Survey Data	Generated Data	Spell Data
ppfad	$p	phrf	$pgen	einkalen
hpfad	$pausl	hhrf	$hgen	sozkalen
$pbrutto	$pluecke	$biorki	$pequiv
$hbrutto	$h	cirdef	$pkal
hbrutt$$	$post	exit	$pkalost
cov_brutto	$jugend
cov_contact	$youth
	$school
	$school2
	ev
	$vp
	$kind
	$child
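The wave lettering and the “$” placeholder described above can be sketched as a small helper that expands a pattern such as `$h` into wave-specific file names. The mapping of 1984 to “a” and 1985 to “b” follows the text; the continuation after “z” with two-letter labels (“ba”, “bb”, and so on) is an assumption of this sketch, as are the function names. The helper is meant for patterns with a single “$”.

```python
import string

FIRST_WAVE_YEAR = 1984  # wave "A"


def wave_prefix(year):
    """Map a survey year to its wave label: 1984 -> 'a', 1985 -> 'b', ..., 2009 -> 'z'.

    Assumption: after 'z' the labels continue as 'ba', 'bb', ... (2010 -> 'ba').
    """
    offset = year - FIRST_WAVE_YEAR
    if offset < 0:
        raise ValueError("SOEP waves start in 1984")
    if offset < 26:
        return string.ascii_lowercase[offset]
    if offset < 52:
        return "b" + string.ascii_lowercase[offset - 26]
    raise ValueError("year is outside the range covered by this sketch")


def expand(pattern, first_year, last_year):
    """Replace the '$' placeholder with the wave label of each year in the range."""
    return [pattern.replace("$", wave_prefix(y)) for y in range(first_year, last_year + 1)]


print(expand("$h", 1984, 1986))
print(expand("$pgen", 2009, 2010))
```

Whether a given wave-specific file actually exists for every year depends on the dataset, so the expanded names should be checked against the contents of the “raw” directory.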

eu-silc-like-panel

The European Union Statistics on Income and Living Conditions (EU-SILC) contains data from across Europe on individual and household income, household living conditions, individual health, aspects of child care, employment, and self-assessed financial situation. EU-SILC offers both cross-sectional and longitudinal data. The German EU-SILC dataset currently contains only cross-sectional data. The eu-silc-like-panel dataset provided by DIW Berlin offers additional longitudinal information on private households in Germany since 2005, based on data from the Socio-Economic Panel (SOEP) study. The eu-silc-like-panel has been included in the annual SOEP data release since 2018 and requires a data distribution contract with DIW Berlin. The SOEP data are provided free of charge for scientific research. Researchers can compare all of the information in the dataset with longitudinal data on other European countries, which can be obtained from Eurostat upon request.

The eu-silc-like-panel includes all four EU-SILC sub-datasets: the household register (D-File), the personal register (R-File), personal data (P-File), and household data (H-File). The clone datasets can be combined using the R-File, which includes both the current household identifier and the individual identifier. The identifiers in the eu-silc-like-panel are unique and do not vary among the four datasets. Complete documentation on the datasets can be found in Documentation EU-SILC.
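How the R-File can serve as the link between the four files is sketched below with toy data. The key names (`person_id`, `household_id`), the example variables, and the `combine` function are placeholders invented for illustration, not the actual variable names in the eu-silc-like-panel; a real panel merge would also need to match on the survey year.

```python
# Hypothetical miniature versions of the four files.
r_file = [  # personal register: person identifier plus current household identifier
    {"person_id": 1, "household_id": 10},
    {"person_id": 2, "household_id": 10},
    {"person_id": 3, "household_id": 20},
]
p_file = {1: {"health": "good"}, 3: {"health": "fair"}}        # personal data, by person
h_file = {10: {"hh_income": 42000}, 20: {"hh_income": 31000}}  # household data, by household
d_file = {10: {"region": "A"}, 20: {"region": "B"}}            # household register, by household


def combine(r_rows, p_rows, h_rows, d_rows):
    """Attach person-level and household-level information to each R-File row."""
    combined = []
    for reg in r_rows:
        row = dict(reg)
        row.update(p_rows.get(reg["person_id"], {}))     # person without P-File entry stays as is
        row.update(h_rows.get(reg["household_id"], {}))
        row.update(d_rows.get(reg["household_id"], {}))
        combined.append(row)
    return combined


for row in combine(r_file, p_file, h_file, d_file):
    print(row)
```

Starting from the R-File keeps every registered person in the result, including those without a personal interview record, because the household-level files are reached through the household identifier stored there.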

Last change: Apr 07, 2026