Data Sets SOEP-Core

SOEP-Core contains a multitude of different datasets. To get an overview of the data, a somewhat simplified categorization helps:

Tracking Data and Survey Data files describe the development of the sample, so that users know which person or household was part of the interviewed sample in any given year. Original Data files contain the data from each year’s questionnaires without any changes except for very basic consistency checks. To make the data easier to work with, there are also Generated Data. These contain variables that are coded consistently across all waves and carry common names, so that users can easily combine information across waves.

The SOEP also provides various data on the respondents’ backgrounds, called biographical data. Conceptually, biographical data can be separated into data that do not change (such as information on parents’ education, or data from the Mother-Child Questionnaires) and data that may be updated through changes in a respondent’s life (such as new children in the birth biography, or a job change in the job history). Some of the changing data are stored as Spell Data. Each spell has a defined spell type, start point, end point, and censoring status, which indicates whether a given employment or income spell is censored (left and/or right) or uncensored.

One of the biggest assets of the SOEP data is their longitudinal nature, i.e., repeated observations of the same unit (person or household) over time. That is why we provide longitudinal data sets, such as PL or HL. Finally, some files cannot be easily categorized: some are one-time datasets, some provide information about the interviewers, and some cover respondents outside of Germany.

There are two datasets which should be the building blocks of any analysis, as they make it very easy to define longitudinal populations: PPATHL and HPATHL. HPATHL includes all households which have been interviewed successfully at least once. Similarly, PPATHL contains all persons who have ever lived in a household that has participated in the SOEP, i.e., that has been captured in HPATHL, including non-respondents and children. Both data files contain one record per household or person, respectively, with wave-specific variables for each year’s survey status. In addition to some time-invariant information (like gender, year of birth, migrant status), these files contain all necessary identifiers to combine other files with PPATHL and HPATHL. Although they provide essential information, PPATHL and HPATHL alone are of little use for actual analyses. The most frequently used sources of additional information in SOEP-Core are the cross-sectional data files provided in each survey year (or “wave”) and the data sets in long format.
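
As a minimal sketch of how a longitudinal population might be defined from a tracking file, the following Python (pandas) example keeps only persons who were interviewed in every year of a chosen window. The toy data and the boolean column `interviewed` are assumptions made for illustration; the real PPATHL encodes survey status in its own variables, which should be looked up in the dataset documentation.

```python
import pandas as pd

# Toy stand-in for a person tracking file in long format (one row per person and year).
# `interviewed` is a hypothetical status flag, not an actual PPATHL variable.
tracking = pd.DataFrame({
    "pid":   [1, 1, 1, 2, 2, 2, 3, 3],
    "syear": [2010, 2011, 2012, 2010, 2011, 2012, 2011, 2012],
    "interviewed": [True, True, True, True, False, True, True, True],
})

window = [2010, 2011, 2012]

# Keep rows inside the window in which an interview took place.
in_window = tracking[tracking["syear"].isin(window) & tracking["interviewed"]]

# A person belongs to the balanced population if interviewed in every year of the window.
years_per_person = in_window.groupby("pid")["syear"].nunique()
balanced_pids = years_per_person[years_per_person == len(window)].index.tolist()

print(balanced_pids)
```

The resulting list of person identifiers can then be used to restrict any other person-level file to the same population.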

The SOEP data sets can be viewed based on their content classification (Tracking Data, Original Data, Survey Data, Generated Data and Spell Data), the data structure (cross-sectional (cs), wide, long, spell) and also from the respondent’s perspective. From the respondent’s perspective, data sets can contain gross or net information. In addition, some data sets provide information only at the household level and other data sets provide information at the individual level.

[Figure: level.PNG]

Gross information at the household or individual level is provided to users in the data sets hbrutto, hbrutt and pbrutto, pbrutt. Content information collected from the household or individual questionnaires, for example, is original data and is stored in HL and PL. From these original data, which stem from the many SOEP questionnaires, the SOEP team derives generated data. Newly generated, user-friendly data sets such as pgen are created from the components of PL.

Tracking Data

Tracking data are the basis for linking your research-relevant variables. In addition to various demographic information, tracking data also provide information on how the interview was conducted. These datasets should be understood as initial data that you can use to merge your research-relevant variables via the person and household numbers.

| Dataset | Label | Format | Identifier (ID) | Additional Identifier |
|---|---|---|---|---|
| ppathl | Individual Tracking File | long | pid, syear | hid, cid, parid |
| hpathl | Household Tracking File | long | hid, syear | cid |
| pbrutto | Gross Individual Data | long | pid, syear | hid, cid, intid, hhnrold |
| hbrutto | Gross Household Data | long | hid, syear | cid, intid1, intid |
| pbr_exit | Cumulated Exit | long | pid, syear | hid, cid, hhnrold |

¹In addition to the classic identifiers (pid, hid and cid), these datasets also have the identifiers of older data distribution versions. (pid=persnr; hid=hhnrakt; cid=hhnr).

hpathl “Household Tracking File” (long): HPATHL consists of all waves of the raw datasets HPATH and HHRF. For all years since 1984, the HPATHL dataset contains information on all households that have ever participated in the SOEP survey at any point in time. HPATHL is important for delimiting the unit of investigation (household), especially in longitudinal analysis. It is particularly useful for household analysis and can be used to preselect specific households.

ppathl “Individual Tracking File” (long): PPATHL consists of all waves of the raw datasets PPATH and PHRF. For all years since 1984, the PPATHL dataset contains information on all persons who have ever lived in a SOEP household at the time of a survey (i.e., all respondents, but also children under 17 years of age and persons who have never given an interview). PPATHL is important for delimiting the units of investigation (persons), especially for longitudinal analysis. It contains one record for each individual and each year in which that person was a member of a respondent household. It is keyed on pid and syear, the survey year identifier. It contains the household ID, the time-invariant individual characteristics, individual weights, and the response status for that individual in each wave.

pbrutto “Gross Individual Data” (long): PBRUTTO consists of all waves of the raw datasets $PBRUTTO. PBRUTTO covers all respondents who were either interviewed for the first time or contacted for the purpose of being interviewed again in a given wave. The dataset provides gross information on all SOEP respondents’ interviews as well as their positions in the panel framework.

hbrutto “Gross Household Data” (long): HBRUTTO consists of all waves of the raw datasets $HBRUTTO. HBRUTTO covers all households that were successfully interviewed for the first time in a wave or were contacted for the purpose of being interviewed again. The dataset provides gross information on all SOEP households’ interviews as well as their positions in the panel framework.

pbr_exit “Cumulated Exit” (long): The dataset pbr_exit is a supplement of pbrutto for individual dropouts. Individual dropouts are removed from the original pbrutto population, so that pbrutto covers all current household members. Pbr_exit contains the corresponding register information on individual dropouts from households.
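
To illustrate how tracking data can serve as the base for linking content variables, here is a small sketch in Python (pandas) that joins a toy questionnaire file onto a toy tracking file via `pid` and `syear`. The data values and the variable `satisfaction` are invented for the example; only the identifier names are taken from the table above.

```python
import pandas as pd

# Toy tracking data: everyone who ever lived in a sample household.
ppathl = pd.DataFrame({
    "pid":   [101, 101, 102, 102],
    "syear": [2018, 2019, 2018, 2019],
    "hid":   [11, 11, 11, 12],
})

# Toy content data: only persons who completed a questionnaire.
# `satisfaction` is a hypothetical variable name.
pl = pd.DataFrame({
    "pid":   [101, 101, 102],
    "syear": [2018, 2019, 2018],
    "satisfaction": [7, 8, 6],
})

# A left join keeps every tracking row; person-years without an interview
# receive missing values for the content variables.
merged = ppathl.merge(pl, on=["pid", "syear"], how="left", validate="one_to_one")

print(merged)
```

Starting from the tracking file and joining to the left keeps non-respondents visible in the result, which makes gaps in participation explicit instead of silently dropping them.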

Original Data

These datasets contain respondents’ direct information. The contents of these variables mirror the contents of the survey instruments. By searching the questionnaires, you can determine the exact wording of a question and find any filter instructions that apply to it.

| Dataset | Label | Format | Identifier (ID) | Additional Identifier |
|---|---|---|---|---|
| pl | Personal questionnaire | long | pid, syear | hid, cid, intid |
| hl | Household questionnaire | long | hid, syear | cid, intid |
| biol | Biographical Data | long | pid, syear | hid, cid, intid |
| jugendl | Youth questionnaire for first-time respondents at age 17 | long | pid, syear | hid, cid, intid |
| plueckel | Follow-Up Questioning | long | pid, syear | hid, cid, intid |
| abroad¹ | Questionnaire for people moved abroad | long | pid, syear | hid, cid |
| vpl | Deceased Person | long | vpid, syear | hid, cid, intid |

¹In addition to the classic identifiers (pid, hid, and cid), these datasets also have the identifiers from older data release versions. (pid=persnr; hid=hhnrakt; cid=hhnr).

pl “Individual questionnaire” (long): The PL dataset contains all waves of the $P datasets from SOEP-Core. In addition, the PL file includes all variables of all waves of the datasets $POST and $PAUSL. This means that the PL dataset contains all variables from the individual questionnaire for all waves. In addition, the individual-specific data from the IAB-SOEP Migration Survey and IAB-BAMF-SOEP Refugee Survey are integrated into the PL dataset.

hl “Household questionnaire” (long): HL contains all waves of the datasets $H from SOEP-Core. This means that the HL dataset includes all questions of the household questionnaire. In addition, the household-specific data from the IAB-SOEP Migration Survey and IAB-BAMF-SOEP Refugee Survey are integrated into the original HL dataset.

biol “Biographical data” (long): BIOL contains cumulated individual-level raw data from the biographical questionnaire and from wave-specific biographical modules of the individual questionnaire. BIOL is intended to be used in addition to the generated biographical files (by advanced users) to complete (or modify) generated biographical variables.

jugendl “Youth questionnaire for first-time respondents at age 17” (long): JUGENDL contains the waves Q (2000) up to the current wave of $JUGEND in SOEP-Core. Since 2000 (wave Q), first-time respondents between the ages of 16 and 17 have received a separate biographical questionnaire with additional age-group-specific questions, for instance, about their relationship to their parents or about what they do in their free time. Up to now, only some of the data collected from this survey have been processed and provided to users in the dataset BIOAGE17. The complete data will be provided in the separate JUGENDL dataset.

plueckel “Catch-up questionnaire” (long): The PLUECKEL dataset contains all waves of the $PLUECKE datasets in SOEP-Core. Temporary drop-outs (“gaps”) can cause problems for longitudinal analyses. This has especially negative consequences for the employment and income data. That is why the SOEP tries to fill in at least some of the key missing information. PLUECKEL is a small questionnaire covering information on the year previous to which the temporary drop-out occurred. It covers questions on job-related changes, employment calendar, income, education, and qualifications.

abroad “Questionnaire for respondents who have moved abroad” (CS): With the pilot study “Life outside Germany” in 2008, the longitudinal SOEP study ventured into completely uncharted methodological territory by attempting to locate the addresses of former SOEP respondents who have since moved abroad and to survey these individuals with the help of a specially developed written questionnaire on the reasons for their move. The project was discontinued due to insufficient case numbers in 2014.

vpl “Questionnaire on the deceased individual” (long): The VPL dataset contains all waves of the $VP datasets of SOEP-Core. The VPL file contains information about respondents who lost a relative in the previous year. It provides information about the deceased individual and the respondent who reported the death.

Survey Data

These datasets contain information on survey methodologies used in SOEP-Core. The various datasets contain detailed exit information provided by respondents and the household weighting factors that users need for representative analysis.

| Dataset | Label | Format | Identifier (ID) | Special Identifier |
|---|---|---|---|---|
| csamp | Sample Definition | long | cid | |
| design | Survey Design | CS | hhnr | intid |
| exit¹ | Cumulative drop-outs | CS | pid | cid, syear |
| pbr_hhch¹ | PBR_HHCH | CS | pid | hid, syear, cid, pnralt, pnrneu, hhnrold |
| cirdef | Randomized Survey File | long | hhnr | |

¹In addition to the classic identifiers (pid, hid and cid), these datasets also contain the identifiers from older data release versions. (pid=persnr; hid=hhnrakt; cid=hhnr).

csamp “Sample definition” (long): The dataset CSAMP [SAMP] contains detailed sampling information for each of the original sample households at the case level [cid / hhnr].

design “Survey design” (CS): The dataset DESIGN provides information on the stratified sampling of the SOEP in the form of two variables. The variable STRAT identifies each of the discrete sampling groups described above. Altogether, the SOEP consists of 40 strata: one stratum in sample A, twenty-seven in sample B, one in sample C, three in sample D, one in sample E, two in sample F, four in sample G, and one in sample H. Each of these strata has a unique inclusion probability. The variable DESIGN contains the inverse of this probability, i.e., the design weight.
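
The relationship between inclusion probability and design weight can be sketched in a few lines of Python (pandas). All values below are invented, and the column names `incl_prob` and `income` are hypothetical; the example only shows that the design weight is the inverse of the inclusion probability and how such a weight would enter a weighted mean.

```python
import pandas as pd

# Toy household data; strata labels and inclusion probabilities are made up.
households = pd.DataFrame({
    "hhnr":      [1, 2, 3, 4],
    "strat":     ["A", "A", "B", "B"],
    "incl_prob": [0.5, 0.5, 0.25, 0.25],   # hypothetical inclusion probability per stratum
    "income":    [2000.0, 3000.0, 1000.0, 2000.0],
})

# Design weight = inverse of the inclusion probability.
households["design_weight"] = 1.0 / households["incl_prob"]

# Weighted mean: households from strata with a low inclusion probability count more.
weighted_mean = (
    (households["income"] * households["design_weight"]).sum()
    / households["design_weight"].sum()
)
unweighted_mean = households["income"].mean()

print(weighted_mean, unweighted_mean)
```

In this toy case the weighted mean is lower than the unweighted one because the under-sampled stratum B has lower incomes.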

exit “Follow-up study [Verbleibstudie]” (long): The dataset EXIT delivers the results from the follow-up study [Verbleibstudie] conducted by Kantar Public (formerly: TNS Infratest) in 2008/2009. This study has been used to identify reasons for (demographic) dropouts. Deceased individuals identified through the follow-up study are included in the corresponding variables in PPATH/L [todjahr, todinfo].

pbr_hhch “PBR_HHCH” (long): The dataset pbr_hhch is a subfile of pbrutto that was used from 1984 to 2009 to identify individuals from households that underwent split-offs in subsamples A-H.

cirdef “Randomized survey file” (long): This dataset includes randomized groups of original sample households [rgroup] for selection of representative shares across all subsamples with full representation of any cross-sectional and longitudinal information (variables) at all levels (case, households, individuals, spells) for the entire SOEP population across all waves.

Generated Data

The SOEP team has prepared these datasets for easy use and subjects them to additional plausibility checks and quality controls prior to data release. In most cases, they draw on several variables and different survey instruments and are described in the documentation provided. As a result, these datasets cannot be assigned 1:1 to a single survey instrument.

| Dataset | Label | Format | Identifier (ID) | Additional Identifier |
|---|---|---|---|---|
| pgen | Generated Individual Data | long | pid, syear | hid, cid, pgpartnr |
| hgen | Generated Household Data | long | hid, syear | cid |
| bioage17¹ | Generated biographical youth information | CS | pid | hid, syear, cid, bymnr, byvnr, intid |
| bioagel¹ | Generated biographical information | long | pid, syear, persnre | hid, cid |
| biopupil¹ | Generated biographical information | long | pid, syear | hid, cid |
| kidlong¹ | Data on children | long | pid, syear | hid, cid |
| pequiv | Cross-national Equivalent File | long | pid, syear | hid, cid |
| biobirth¹ | Generated biographical information | CS | pid | cid, kidpnr01-kidpnr15 |
| bioedu¹ | Generated biographical information | CS | pid | cid |
| bioimmig¹ | Generated biographical information | long | pid, syear | hid, cid |
| biojob¹ | Generated biographical information | CS | pid | cid |
| bioparen¹ | Generated biographical information | CS | pid | cid, fnr, mnr |
| bioresid¹ | Generated biographical information | CS | pid | hid, syear, cid, intid |
| biosib¹ | Generated biographical information | CS | pid | cid, sibpnr1-sibpnr11 |
| biosoc¹ | Generated biographical information | CS | pid | hid, syear, cid, intid |
| biotwin¹ | Generated biographical information | CS | pid | cid, pnrtwin, pnrtrip, pnrquad |
| camces¹ | Highest Educational Qualification, Migrants Sample M1 and M2 | CS | pid | hid, syear, cid |
| cogdj¹ | Data on cognitive tests (Youth) | CS | pid | syear, cid |
| cognit¹ | Data on cognitive potential | CS | pid | syear, cid, intid |
| cog_refu¹ | Data on cognitive tests (Refugees) | CS | pid | syear, cid, hid |
| gripstr¹ | Measures grip strength | CS | pid | syear, cid, intid |
| hconsum¹ | Household Consumption Module | CS | hid | syear, cid |
| health¹ | Data on health indicators | CS | pid | syear, cid |
| hwealth | Wealth Module | long | hid, syear | cid |
| interviewer | Data on the SOEP Interviewer | long | intid, syear | cid |
| mihinc | Multiple imputed data on monthly household income | long | hid, syear | cid |
| pflege | Persons needing care within the household | long | pid, syear | cid |
| pkal | Individual Calendar | long | pid, syear | hid, cid |
| pwealth | Wealth Module | long | pid, syear | hid |
| timepref¹ | Experiment on time preferences | CS | pid | hid, syear, cid |
| trust | Experiment on trust | long | pid | hid, syear, cid |

¹In addition to the classic identifiers (pid, hid and cid), these datasets also have the identifiers of the older data release versions. (pid=persnr; hid=hhnrakt; cid=hhnr).

pgen “Generated individual data” (long): PGEN contains all waves of the $PGEN datasets in SOEP-Core. The PGEN file contains user-friendly data on the individual level that are consolidated from different sources. The plausibility is validated longitudinally in many respects, making the data superior to those in PL in most situations. The file contains one row for each person (pid is unique) with a completed individual or youth questionnaire.

hgen “Generated household data” (long): HGEN contains all waves of the $HGEN datasets in SOEP-Core. In order to minimize computational effort for the user, the SOEP provides yearly status variables on the household level. The HGEN data provide a set of time-invariant variables generated from the SOEP household questionnaire. They only include households that participated in the respective year.

bioage17 “Generated biographical information” (CS): The design of the dataset BIOAGE17 is patterned after the 2001 Youth Questionnaire, which is the standard version used in subsequent years. Young people living in a panel household who reached the survey age of 17 are a special group of first-time respondents. This group of panel entrants provides more detailed information on youth and socialisation than we are able to obtain from other new sample members.

bioagel “Generated biographical information” (long): The BIOAGEL data files are generated using information collected in the “Mother & Child” and “Parent” questionnaires. BIOAGEL is now provided in one dataset.

biopupil “Generated biographical information” (long): The BIOPUPIL data files are generated using information collected in the “Pre-Teen” and “Early-Youth” questionnaires. BIOPUPIL is provided in one dataset.

kidlong “Data on children” (long): The variables stored in the KIDLONG file are based on the information collected annually and contained in the wave-specific $KIND files. The relevant information is not provided by children themselves but is obtained from answers to questions in the household questionnaire provided by the respondent within the household (usually the head of the household). This data is reaggregated at the individual level and stored as child-specific entries in the file $KIND.

pequiv “Cross-National Equivalent File” (long): PEQUIV contains all waves of the $PEQUIV datasets in SOEP-Core. The PEQUIV file is based on the Cross-National Equivalent File (CNEF) with extended income information for the SOEP. This file comprises not only the aggregated income figures from CNEF but also additional separate income components.

pkal “Individual calendar” (long): PKAL contains all waves of the $PKAL datasets in SOEP-Core. The PKAL datasets contain calendar variables from the individual questionnaire. The dataset includes the person’s employment or educational status on a monthly basis as well as the person’s income status.

biobirth “Generated biographical information” (CS): The file BIOBIRTH provides information on fertility histories of adult respondents in the SOEP. Up to 2014 (version 30, wave BD), the data were stored in two separate files: BIOBIRTH containing female fertility histories, and BIOBRTHM providing male fertility histories. Fertility histories in BIOBIRTH provide information on every woman (as well as every man with panel entry since 2001) who has ever completed at least one SOEP interview.

bioedu “Generated biographical information” (CS): The SOEP contains a broad range of variables on early childhood education and care, educational participation, educational degrees, and related topics. The BIOEDU dataset is designed to provide ready-made variables on educational transitions and related topics for use in longitudinal analysis.

bioimmig “Generated biographical information” (long): The variables contained in BIOIMMIG relate to foreigners in (and migrants to) Germany. Questions deal with the desire to return to the home country, the presence of relatives in the home country, reasons for coming to Germany, and conditions upon initial arrival in Germany.

biojob “Generated biographical information” (CS): The purpose of BIOJOB is to provide a file that offers the user convenient access to biographical information on past job activities. BIOJOB consists of generated variables as well as plain questionnaire information. Up to now, all but two variables in BIOJOB are time-invariant. Information on occupational changes and on the age at the most recent change of occupation refers to the date of the respondent’s biography interview.

bioparen “Generated biographical information” (CS): The dataset BIOPAREN contains biographical entries on the parents’ and respondent’s background. The information in BIOPAREN is obtained from two sources: from proxy entries by children on their parents in the biography questionnaire and youth questionnaire, and from direct entries by parents when the respondent lives in the same household as the parents. Please note that BIOPAREN focuses on the social parent. Biological parent identifiers can be found in BIOBIRTH.

bioresid “Generated biographical information” (CS): In 1994, questions with a focus on occupancy were introduced into the biographical questionnaire asking about the duration of residence in the current dwelling and any second residence. The information obtained from the biographical questionnaire is contained in the file BIORESID.

biosib “Generated biographical information” (CS): BIOSIB provides information on siblings living within SOEP households. The dataset contains the individual identifiers of all siblings in a SOEP household. It includes information on the individual sibling’s sex, year of birth, number of siblings, position in birth order, and relationship between siblings.

biosoc “Generated biographical information” (CS): Contains data on youth and socialization. Respondents of all ages describe aspects of their life at the age of 15, including their relationship with parents, grades in school, the federal state where they last attained educational qualifications, detailed information on vocational qualifications, as well as intentions to complete further education or vocational training. Questions concerning military and alternative services are also included in this dataset.

biotwin “Generated biographical information” (CS): The file BIOTWIN contains all twins that were ever identified within the SOEP. To be classified as a twin, a person is required to have exactly the same age as his or her sibling (year and month of birth), have a relationship to the head of the household that indicates that he or she and a second person are siblings, and have the same mother (as far as a pointer to the mother is available). In addition to twins, the BIOTWIN dataset also records triplets and quadruplets.
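
The classification rule described above can be approximated with a simple grouping step. The following Python (pandas) sketch flags persons who share a household, a mother, and a year and month of birth with at least one other person. It is only an illustration under simplifying assumptions: all column names are hypothetical, the check of the relationship to the head of the household is omitted, and the actual procedure used to build BIOTWIN may differ.

```python
import pandas as pd

# Toy person data; all column names are hypothetical.
persons = pd.DataFrame({
    "pid":         [1, 2, 3, 4, 5],
    "hid":         [10, 10, 10, 20, 20],
    "mother_pid":  [90, 90, 90, 91, 91],
    "birth_year":  [2001, 2001, 2004, 1999, 2002],
    "birth_month": [5, 5, 5, 3, 3],
})

# Persons count as multiples if they share household, mother, and year and month of birth.
keys = ["hid", "mother_pid", "birth_year", "birth_month"]
persons["n_same_birth"] = persons.groupby(keys)["pid"].transform("count")

multiples = persons[persons["n_same_birth"] >= 2]
twin_pids = sorted(multiples["pid"].tolist())

print(twin_pids)
```

A group size of three or four would correspond to triplets or quadruplets under the same rule.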

camces “Highest educational qualification, Migrant Samples M1 and M2” (CS): The CAMCES-File provides information about computer-assisted measurement and coding of educational qualifications in surveys.

cogdj “Data from cognitive tests (Youth)” (CS): In SOEP 2006, a separate questionnaire with cognitive tests for adolescents was used for the first time: “Lust auf DJ”. The acronym “DJ” stands for “Denksport und Jugend” (mind sports and youth), but it was named for its more common association with “disc jockey”. The questionnaire “Lust auf DJ” was created for all respondents aged 16 to 17.

cognit “Data on cognitive potential” (long): In the 2006 survey year, for the first time, short cognitive tests were carried out with a subsample of the SOEP. The goal was to employ a robust set of instruments that could be administered easily by trained interviewers within just a few minutes. COGNIT06 provides the aggregated sum scores (total values for three time packages, so-called “parcels” of 30, 60 and 90 seconds).

cog_refu “Data on cognitive tests (Refugees)” (CS): The data set contains sum scores for two competence measurements (previous school knowledge and basic cognitive skills) of youths born in 2000, 2003 and 2005 surveyed in 2017.

gripstr “Measures of grip strength (left and right hand)” (long): The data on grip strength from the survey year 2012 is now included in the GRIPSTR dataset.

hconsum “HH consumption module” (CS): We were faced with three methodological challenges in generating the final consumption data. First, due to the design of the consumption module, inconsistent answers arose between the amounts given for monthly and annual consumption. Second, there was the common problem of missing data, here in particular item nonresponse. And third, consumption data are usually blurred by heaping. For researchers who do not want their consumption variables to include changes from all steps of data preparation, the new dataset HCONSUM contains not only the prepared consumption variables but also flag variables that give researchers the opportunity to select individual solutions.

health “Data on health indicators” (long): Starting in 2002, the SOEP health module in the individual questionnaire has been revised and replicated at two-year intervals. In the HEALTH file, users find, for instance, the generated variables on height and weight with imputation flags and a user-friendly, longitudinally checked generated variable for Body Mass Index (BMI).

hwealth “Wealth module” (long): The generated SOEP wealth data is stored in two separate data files called PWEALTH for information at the individual level and HWEALTH for correspondingly aggregated data at the household level. HWEALTH contains all information on the household level; it is purely the result of aggregating the individual-level information in PWEALTH. However, for all individuals with valid household-level information who did not respond to the individual questionnaire (partial unit non-response), imputations have been carried out and the results are included in HWEALTH.
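
The aggregation from the individual to the household level can be pictured with the following Python (pandas) sketch. The variable `net_wealth` and all values are invented for illustration, and the sketch covers only the summation step, not the imputations for non-responding household members mentioned above.

```python
import pandas as pd

# Toy individual-level wealth data; `net_wealth` is a hypothetical variable name.
pwealth = pd.DataFrame({
    "pid":        [1, 2, 3],
    "hid":        [10, 10, 20],
    "syear":      [2017, 2017, 2017],
    "net_wealth": [50000.0, 30000.0, -5000.0],
})

# Sum individual wealth within each household and survey year.
hwealth = pwealth.groupby(["hid", "syear"], as_index=False)["net_wealth"].sum()

print(hwealth)
```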

interviewer “Data on the SOEP interviewer” (long): The SOEP aims not only to collect high-quality data on the living conditions and well-being of households, but also to provide a valuable empirical source for survey research. The INTERVIEWER file provides users with easy access to all available longitudinal information on the SOEP interviewers.

mihinc “Multiple imputed data on monthly household income” (long): The dataset MIHINC contains the complete imputation results and is available separately. To be compatible with methods for analyzing multiply imputed data, MIHINC is constructed in the “stacked” or MIM data format. It contains the following variables: HHNRAKT, SVYYEAR, MJ, MI, IHINC and IMPFLAG. Since 1995 for every survey household in all survey years, there are ten imputed values for current household income.
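
Stacked multiply imputed data are typically analyzed by running the same analysis once per imputation and then pooling the results with Rubin's rules. The Python (pandas) sketch below shows this for a simple mean. It uses three imputations instead of ten to stay short, and the column names `imp` (imputation number) and `ihinc` as well as all values are assumptions for illustration, not a description of the actual MIHINC layout.

```python
import pandas as pd

# Toy stacked data: each household appears once per imputation.
# `imp` and `ihinc` are illustrative column names.
stacked = pd.DataFrame({
    "hid":   [1, 2, 3] * 3,
    "imp":   [1] * 3 + [2] * 3 + [3] * 3,
    "ihinc": [2000.0, 3000.0, 4000.0,
              2100.0, 3000.0, 3900.0,
              1900.0, 3300.0, 4100.0],
})

# Step 1: analyze each completed dataset separately
# (here: the mean income and its sampling variance).
per_imp = stacked.groupby("imp")["ihinc"].agg(["mean", "var", "count"])
per_imp["se2"] = per_imp["var"] / per_imp["count"]

# Step 2: pool the results with Rubin's rules.
m = len(per_imp)
q_bar = per_imp["mean"].mean()         # pooled point estimate
w_bar = per_imp["se2"].mean()          # average within-imputation variance
b = per_imp["mean"].var(ddof=1)        # between-imputation variance
total_var = w_bar + (1 + 1 / m) * b    # total variance of the pooled estimate

print(q_bar, total_var ** 0.5)
```

The between-imputation term is what distinguishes this from simply averaging over all stacked rows: it carries the additional uncertainty that stems from the imputation itself.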

pflege “Persons needing care within the household” (long): Since wave B (1985), the SOEP household questionnaire has included questions on household members in need of care. In order to support individual-level analysis, this information has been restructured and is stored in the cumulative file PFLEGE.

pwealth “Wealth module” (long): For the first time in 2002, the individual questionnaire included a special module focusing on wealth. It included questions on seven different wealth components: owner-occupied property (including debt), other property (including debt), financial assets, private pensions (including life insurance and building savings contracts), business assets, tangible assets, and consumer credit. The generated SOEP wealth data are stored in two separate data files called PWEALTH for information at the individual level and HWEALTH for correspondingly aggregated data at the household level. Wealth-related variable names in the file PWEALTH consist of six digits. The first digit tells the user which wealth component is referred to, and the second to sixth digits provide more detailed information about possible filter information, the personal share, the gross amount, and the amount of any outstanding debt. In principle, a digit is coded “1” if a given variable does indeed contain this specific piece of information and “0” otherwise. The wealth information in the SOEP questionnaire is surveyed at the individual level and thus also imputed or edited at the individual level (although checked against household information for consistency).

timepref “Experiment on time preferences” (CS): Following the behavioral experiment on trust and trustworthiness carried out in the 2003, 2004, and 2005 SOEP surveys, the experiment “time preferences” was run in 2006. In this experiment on economic behavior, respondents were asked to decide how they would want to receive €200 in prize money: if they would want to receive it immediately by check or if they would want to wait and receive a larger amount later, that is, with interest.

trust “Experiment on trust” (long): The economic behavior experiment on trust and trustworthiness from survey years 2003, 2004, and 2005 served to measure trust based on an investment game, a one-off game for two players who interact anonymously. The first player receives a credit of ten points and can transfer any number of these points to the second player. Each transferred point is doubled. The second player also receives a credit of ten points. After receiving the (doubled) points from the first player, the second player decides how much of her own credit she will transfer to the first player (zero to ten points). As with the first transfer, the recipient’s points are doubled. After the decision of the second player, the game ends and the players are paid (one point corresponds to one euro; the total is paid by check a few days later). The trust dataset thus contains the information from all three waves in which the behavioral experiment was conducted.
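
The payoff structure of the investment game can be written down as a short worked example in Python. The function name and its parameters are hypothetical; the arithmetic follows the description above, in which both players start with ten points and every point sent to the other player is doubled on arrival.

```python
def trust_game_payoffs(sent_by_first: int, sent_by_second: int, endowment: int = 10):
    """Return (payoff_first, payoff_second) in points for one round of the game."""
    if not (0 <= sent_by_first <= endowment and 0 <= sent_by_second <= endowment):
        raise ValueError("transfers must lie between 0 and the endowment")
    # Each player keeps what was not sent and receives twice what the other sent.
    payoff_first = endowment - sent_by_first + 2 * sent_by_second
    payoff_second = endowment - sent_by_second + 2 * sent_by_first
    return payoff_first, payoff_second


# The first player sends 5 points, the second player sends 4 points back.
print(trust_game_payoffs(5, 4))
```

If neither player sends anything, both simply keep their ten points; if both send everything, each ends up with twenty.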

Spell Data

Spell, duration, and event history data are used frequently in the social sciences. In the strict sense of the word, spell data are about time periods with a defined start and end. General information about the data structure of spell data can be found in the chapter Data Structure in spell format (spell).
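
As a rough illustration of what a spell file looks like and how it might be used, the following Python (pandas) sketch computes spell durations in months and selects spells that are fully observed. The column names, the spell type labels, and the boolean censoring flags are assumptions for this example; the actual SOEP spell files use their own variable names and codes.

```python
import pandas as pd

# Toy spell file: one row per spell. Column names and codes are hypothetical.
spells = pd.DataFrame({
    "pid":            [1, 1, 2],
    "spelltype":      ["employed", "unemployed", "employed"],
    "begin":          pd.to_datetime(["2015-01-01", "2016-07-01", "2014-03-01"]),
    "end":            pd.to_datetime(["2016-06-01", "2016-12-01", "2016-12-01"]),
    "left_censored":  [False, False, True],
    "right_censored": [False, True, True],
})

# Duration in months, counting both the first and the last month of the spell.
spells["months"] = (
    (spells["end"].dt.year - spells["begin"].dt.year) * 12
    + (spells["end"].dt.month - spells["begin"].dt.month)
    + 1
)

# Only uncensored spells have a fully observed duration.
complete = spells[~spells["left_censored"] & ~spells["right_censored"]]

print(spells[["pid", "spelltype", "months"]])
print(len(complete))
```

For censored spells the computed duration is only a lower bound, which is why the censoring status has to be carried along in any duration analysis.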

Working with spell data:

Working with spell data (pdf):

Working with spell data (do-files):

How to generate spell data from data in wide format: Based on the migration biographies in the IAB-SOEP Migration Sample:

Generating spell data:

| Dataset | Label | Format | Identifier (ID) | Additional Identifier |
|---|---|---|---|---|
| artkalen | Spell data from the activity calendar | spell | pid | cid |
| biocouplm | Generated biographical information | spell | pid | cid, coupid |
| biocouply | Generated biographical information | spell | pid | cid |
| biomarsm | Generated biographical information | spell | pid | cid |
| biomarsy | Generated biographical information | spell | pid | cid |
| einkalen | [deprecated] Spell data on income | spell | pid | cid |
| lifespell | Spell Information on the Pre- and Post-Survey History of SOEP-Respondents | spell | pid | cid |
| migspell | Migration history | spell | pid | cid |
| pbiospe | Generated biographical information | spell | pid | cid |
| refugspell | Migration history | spell | pid | cid |
| sozkalen | [deprecated] Spell data on social benefits | spell | hid, cid | |

artkalen “Spell data from the activity calendar” (long): ARTKALEN contains monthly spells for events starting in January 1983. This is in contrast to PBIOSPE, where spells have yearly durations and events prior to 1983 are included. The information on activity status is collected on a monthly basis in the yearly individual questionnaire and stored in the file ARTKALEN.

biocouplm “Generated biographical information” (long): With BIOCOUPLM, the SOEP provides consistent and continuous partnership histories for nearly all adult respondents. BIOCOUPLM is built on the prospective information at the time of each interview. The relationship histories are collected on a monthly basis from all adult SOEP participants since their entry into the SOEP.

biocouply “Generated biographical information” (long): With BIOCOUPLY, the SOEP provides consistent and continuous partnership histories for nearly all adult respondents. BIOCOUPLY is built on retrospective and prospective information at the time of each interview. The relationship histories are provided on an annual basis.

biomarsm “Generated biographical information” (long): With BIOMARSM, the SOEP provides consistent and continuous marital histories for nearly all adult respondents. BIOMARSM is built on the prospective information at the time of each interview. The marital histories are collected on a monthly basis from all adult SOEP participants since their entry into the SOEP.

biomarsy “Generated biographical information” (long): With BIOMARSY the SOEP provides consistent and continuous marital histories for nearly all adult respondents. BIOMARSY is built on retrospective and prospective information at the time of each interview. The marital histories are provided on an annual basis.

einkalen “[deprecated] Spell data on income” (long): The income calendar is used to gain information about sources of income throughout the year. For each month, the respondent checks off all applicable sources of income.

lifespell “Spell information on the pre- and post-survey history of SOEP respondents”: The SOEP team regularly conducts follow-up studies to relocate attritors. These studies draw on official register data and allow us to determine whether a person is still living in Germany, is deceased, or has moved abroad since the last SOEP interview. The information is combined in the spell file LIFESPELL. This dataset reports all available information on the pre- and post-survey history of all persons who have ever been a member of a SOEP household.

migspell “Migration history” (long): MIGSPELL is derived from the migration biographies, which are collected from each new respondent of the IAB-SOEP migration samples M1 and M2. It contains data on moves by foreign-born migrants as well as on stays abroad by German-born respondents.

pbiospe “Generated biographical information” (long): The spell file PBIOSPE is based on the information on activity status over the life course, which is collected as a matrix from every respondent who completes the biographical questionnaire. The observations start at the age of 15 and end at the current age (up to age 65). To update ongoing employment information in PBIOSPE, information from the yearly individual questionnaire is also used.

refugspell “Migration history” (long): For migration biographies in the refugee samples, we created the spell dataset REFUGSPELL. The variables in MIGSPELL and REFUGSPELL are derived from different instruments and only partially overlap. The data structure allows the dataset to be linked with MIGSPELL if desired.

sozkalen “[deprecated] Spell data on social benefits” (1992-2000): The file SOZKALEN provides spell data on households’ receipt of social assistance, defining the beginning, end, and censoring status of any period in which one of three different types of assistance was received. The file is constructed from the calendar information, which refers to the previous year and was collected for the years 1992-2000. It thus contains information on a monthly basis.