The SOEP Samples in Detail
Sample A “Residents of the Federal Republic of Germany” covers individuals in private households with a household head who does not belong to one of the main groups of “guest workers” (i.e., Turkish, Greek, Yugoslavian, Spanish, or Italian households). Because only a few foreigners are in Sample A, it is often called the “West German Sample” of the SOEP. In 1984 it covered 4,528 households with a sampling probability of about 0.0002.
Sample B “Foreigners in the Federal Republic of Germany” adds individuals in private households with a Turkish, Greek, Yugoslavian, Spanish, or Italian household head, who in 1984 constituted the main groups of foreigners in the FRG. Compared to Sample A, the population of Sample B is oversampled with a sampling probability of about 0.002. In the first wave, Sample B included 1,393 households.
Sample C “German Residents of the German Democratic Republic (GDR)” consists of individuals in private households in which the household head was a citizen of the German Democratic Republic (GDR). This meant that approximately 1.7% of the residential population of the GDR in June 1990 was excluded from the sample as foreigners (most of whom were living in “institutionalized” housing). In total, the sample started with 2,179 households with a sampling probability of about 0.0005.
Sample D “Immigrants” started in 1994/95 with two different samples. In 1994, the first sample, D1, had 236 households and in 1995, the second sample, D2, had 295 households, leading to a total of 531 households (D1 and D2) in 1995. This sample consisted of households in which at least one household member had moved from abroad to West Germany after 1984. The sampling probability is about 0.0002.
Sample E “Refresher” was added in 1998, selected from the entire population of private households in Germany. The households were chosen independently of the ongoing panel and its subsamples A through D. The aim was to increase the number of observations of the general population and to preserve its representativity. The selection scheme used for sample E essentially resembles the one used in subsample A. The number of households in the first wave of subsample E was 1,060, with a sampling probability of about 0.00005. With the 2012 data release, parts of subsample E were extracted into the SOEP Innovation Sample. Sample E was also the first sample in which Computer-Assisted Personal Interviewing (CAPI) was used. At that time, interviews in Samples A-D were being conducted entirely using Paper-and-Pencil Interviews (PAPI). To study mode effects, households from sample E were randomly allocated to either CAPI or PAPI.
Sample F “Refresher” was selected independently of all other subsamples from the population of private households in 2000. The selection scheme was slightly altered compared to the previous addition in Sample E: while the “German” households (all adults aged 16 or older in the household have German nationality) were selected with a sampling probability of 0.00028, the “non-German” households (at least one adult does not have German nationality) were oversampled with a probability of 0.0005. Overall, the number of added households in subsample F’s first wave amounts to 6,043.
Sample G “High-Income” entered the SOEP in 2002 independently from all other subsamples. The original selection scheme required that the responding households had a monthly income of at least DM 7,500 (EUR 3,835), which - due to the lack of an adequate sampling frame - were identified using a screening procedure. This sample of a total of 1,224 households increased the potential for analysis in the high-income bracket, which was previously difficult to study because of the low case numbers. The derived sampling probability is about 0.0014. Starting with Wave 2 in 2003, the selection scheme for this subsample was changed such that only households with a net monthly income of at least EUR 4,500 were followed.
Sample H “Refresher” started in 2006 as a random sample, again independently of all previous subsamples, covering all residential households in Germany. The added 1,506 households were sampled with a probability of 0.0001.
Sample I “Incentive Sample” started in 2009. In its first wave, a new incentive scheme was tested to increase participation rates (see also [sec:PanelCare]). The sampling was independent of all other SOEP samples, adding a total number of 1,531 households to the SOEP. The sampling probability was 0.00013. This sample remained in the main data release for its first two waves (2010 and 2011, or waves Z and BA). With the 2012 data release, subsample I was extracted into the SOEP Innovation Sample.
Sample J “Refresher Sample” started in 2011 as a random sample, independently of all previous subsamples, covering residential households in Germany. The added 3,136 households were sampled with a probability of 0.0002.
Sample K “Refresher Sample” started in 2012 as a random sample, drawn independently of all previous subsamples, covering the residential households in Germany. The added 1,526 households were sampled with a probability of 0.0001.
Sample L1 “Cohort Sample” covers private households in Germany in which at least one household member was born between January 2007 and March 2010 and was therefore a child at that time. Again, migrants were oversampled; they were identified using an onomastic (name-based) procedure. Sample L1 (as well as L2 and L3) was part of the SOEP-related study “Families in Germany” (FiD), which was integrated into the SOEP in 2014. As part of an evaluation project by the Federal Ministry for Family Affairs, Senior Citizens, Women and Youth (BMFSFJ) and the Federal Ministry of Finance (BMF), the study focused on public benefits in Germany for married people and families. Therefore, the survey instruments used in waves BA to BD differ in some respects from those used in the other samples.
Sample L2 “Family Types I” covers private households in Germany that meet at least one of the following criteria for household composition: single parents, low-income families, and large families with three or more children. Similar to Sample G, we face the problem that the eligible sub-population is relatively small and an adequate sampling frame is lacking. So again, a preceding telephone screening procedure identifies eligible households.
Sample L3 “Family Types II” covers private households in Germany that meet at least one of the following criteria for household composition: single parents or large families with three or more children. It is conducted analogously to Sample L2 to increase the number of cases in these sub-populations.
Sample M1 “Migration Sample” is a new migration sample added in 2013 with around 2,700 households drawn using register information from the German Federal Employment Agency. It includes individuals who immigrated to Germany after 1995 or second-generation immigrants.
Sample M2 “Migration Sample” was another migration sample added in 2015 with around 1,100 households drawn using register information from the German Federal Employment Agency. It includes individuals who immigrated to Germany between 2010 and 2013.
Sample M3 “Refugee Sample” was a new refugee sample added in 2016 for the IAB-BAMF-SOEP Refugee Survey in which roughly 1,769 refugee households were interviewed repeatedly. Respondents aged 18 and older who entered Germany between January 2013 and December 2016 and who had filed an asylum application by April 2016 (regardless of their current legal status) were interviewed along with the other members of their households.
Sample M4 “Refugee Family Sample”: the 2016 “IAB-BAMF-SOEP Survey of Refugees” (Samples M3 and M4) is a joint project of the Institute for Employment Research (IAB), the Research Center of the Federal Office for Migration and Refugees (BAMF-FZ) and the Socio-Economic Panel (SOEP). The target population of the samples consists of 1,769 households with individuals who arrived in Germany between January 2013 and January 2016 and had applied for asylum by June 2016 or were hosted as part of specific programs of the federal states (irrespective of their asylum procedure and their current legal status). The first part of the sample (M3) was financed with funds allocated to the IAB from the research budget of the Federal Employment Agency (BA). Sample M4 was funded by the Federal Ministry of Education and Research (BMBF) and has a focus on refugee families.
Sample M5 “Refugee Sample”: M5 is the third boost sample of refugee households. The population of M5 covers adult refugees who applied for asylum in Germany between January 1, 2013, and December 31, 2016, and are currently living in Germany. The first wave of M5 was conducted in 2017. M5 added another 1,519 households of refugees who have migrated to Germany since 2013 to the SOEP framework.
Sample N “Refresher Sample (PIAAC-L)”: Sample N integrated 2,314 households of former participants in the Program for the International Assessment of Adult Competencies (PIAAC and PIAAC-L) in 2017. This is the most recent addition to the SOEP-Core samples. Fieldwork in sample N was conducted between mid-March and mid-August and thus slightly later than the majority of samples A–L1.
Sample O “Social City Sample”: Sample O includes 935 households located primarily in bigger cities. It was designed to enhance the potential of the data for analysis by incorporating more city-specific environments. The sample was selected in cooperation with BBSR using a new sampling design based on regional data in areas where the “Soziale Stadt” (social city) urban development project is being carried out. Based on the digital data available on the boundaries of the “Soziale Stadt” areas, it was possible to create a new variable going back to the year 2000 that shows whether or not a household’s address is within an area covered by the project.
Sample P “Top Shareholder Sample”: Sample P was conceptualized as a sample of highly affluent households in Germany. Against the backdrop of increasing income and wealth inequality in Germany, despite economic growth in recent decades, a lack of data on wealthy populations has become increasingly evident in the social sciences. Goals to be accomplished with sample P were to improve the empirical basis of the poverty and wealth report of the German government as well as laying the foundation for medium and long-term cross-sectional and longitudinal analyses. The gross sample of sample P consisted of 23,259 households.
Sample Q “LGB*”: Sample Q is a boost sample of a hard-to-survey population: lesbians, gays, bisexuals, transgender people, and those who identify as non-binary. While the actual percentage of LGBTQ+ people in the general population is unknown, this group was too sparsely represented in the SOEP to be analyzed meaningfully. A total of 835 households were recruited via a telephone screening process lasting approximately nine months. Of these households, 477 participated between April and November.
Sample M6 “Refugee Sample”: M6 is the fourth top-up sample of refugee households. The population of M6 covers two groups: first, adult refugees who arrived in Germany between January 1, 2013 and December 2016 (“Refreshment”), and second, adult refugees who came to Germany between January 1, 2017 and June 2019 (“Enlargement”), with a strongly disproportionate oversampling of refugees from East and West Africa.
Sample M7 “Migration Sample”: As with the older migration samples M1 and M2, the Integrated Employment Biographies Sample (IEBS) of the Federal Employment Agency (BA) served as the sampling frame for both boost samples. Boost sample M7’s goal was to capture migration dynamics and processes from 2016 to 2018 with a focus on EU migration. To ensure that statistically significant group comparisons can be made, sampling was restricted to the three most significant countries of origin in that time period: Romania, Bulgaria, and Poland.
Sample M8a “Migration Sample”: As with the older migration samples M1 and M2, the Integrated Employment Biographies Sample (IEBS) of the Federal Employment Agency (BA) served as the sampling frame for both boost samples. Boost sample M8a was designed to help evaluate the skilled worker immigration law (Fachkräfteeinwanderungsgesetz), which came into effect on March 1, 2020. It targeted migrants from third countries who came to Germany between 2017 and 2018, sampling them as a control group for a treatment group that will be sampled at a later date.
More information about “Sample Sizes and Panel Attrition” can be found here
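The sampling probabilities quoted for Samples A through K can be read as rough design weights: the inverse of a sampling probability is the approximate number of population households that one sampled household stands for. The following Python sketch only illustrates this arithmetic with the approximate figures given above. The names `SAMPLING_PROBABILITY`, `design_weight`, and `oversampling_factor` are hypothetical, and the sketch is not the weighting procedure used for the SOEP data, which is not described in this section.

```python
# Approximate first-wave sampling probabilities as quoted in the sample
# descriptions above. Sample F has two strata with different probabilities.
SAMPLING_PROBABILITY = {
    "A": 0.0002,
    "B": 0.002,
    "C": 0.0005,
    "D": 0.0002,
    "E": 0.00005,
    "F (German)": 0.00028,
    "F (non-German)": 0.0005,
    "G": 0.0014,
    "H": 0.0001,
    "I": 0.00013,
    "J": 0.0002,
    "K": 0.0001,
}


def design_weight(sample: str) -> float:
    """Inverse sampling probability: population households per sampled household.

    Illustrative only; it ignores nonresponse and any later adjustments.
    """
    return 1.0 / SAMPLING_PROBABILITY[sample]


def oversampling_factor(sample: str, reference: str) -> float:
    """How many times more likely a household in `sample` was to be drawn
    than a household in `reference`."""
    return SAMPLING_PROBABILITY[sample] / SAMPLING_PROBABILITY[reference]


if __name__ == "__main__":
    for name in SAMPLING_PROBABILITY:
        weight = design_weight(name)
        print(f"Sample {name}: one household stands for about {weight:,.0f} households")
    factor = oversampling_factor("B", "A")
    print(f"Sample B is oversampled relative to Sample A by a factor of about {factor:.0f}")
```

Read this way, a household in Sample A stands for roughly 5,000 households, whereas a household in the oversampled Sample B stands for roughly 500.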
In SOEP it is common for special samples to receive extended, adapted, and/or integrated questionnaires in the first few years. This ensures that sample-specific questions that do not play a role in the main SOEP can also be included. In the following tables you can see which questionnaires the respective samples received, which years they ran, which raw data set they were included in, and which “long” data set they went into.
From the start of Sample B (foreigners), respondents could complete the individual questionnaire in German or in the respective foreign language. Starting with wave 2 of the panel, there were “old” and “new” survey units (households, persons), and there were survey units with or without certain changes (e.g., households that had or had not moved; individuals who had or had not changed careers). The questionnaires took these changes into account for all sub-groups. Survey procedures and tools were designed to ensure that each subgroup received the right questionnaire for them. This technique as well as the bilingual design of the foreigner questionnaires was retained for waves 3-6. In addition, retrospective information and missing information on temporary dropouts was collected. The “financial statement”, which is now a survey module, was a separate questionnaire in the year 1988.
SOEP researchers were determined to seize the historic opportunity of German reunification to obtain a first baseline measurement of incomes in the “old” GDR currency. The questionnaire was prepared by an East-West working group including DIW Berlin, WZB, Collaborative Research Centre 3, and the ISS at the Academy of Sciences in the GDR, with the participation of Infratest and its partner organization in the GDR. The result was a questionnaire that covered many of the same themes and questions and was structured similarly to the West SOEP questionnaire, but which focused more on the specific situation in the GDR (e.g., the housing situation).
A major shift in the design of SOEP questionnaires took place with Sample J. Due to the increased panel mortality from wave 1 to wave 2 that was observed for the refresher samples F (2000-2001), H (2006-2007), and I (2009-2010), the biographical module, with an average interview length of 17 minutes, was integrated into wave 1. If this had not been done, no biographical data would have been collected for approximately 20% of all SOEP respondents who would probably not have participated in wave 2. In comparison to the longitudinal samples, data collection in the first wave was focused on the main three questionnaires: the household, the individual, and the youth questionnaire. As the fieldwork in these refresher samples was conducted exclusively by CAPI, it was feasible to include complex modules with event-triggered question loops.
The main focus of Families in Germany (FiD) was on the families and children – the parental questionnaires (filled out by parents about their children) were about twice as long as the comparable questionnaires in SOEP-Core, and questionnaires for the 1-2-year-olds and the 9-10-year-olds were added (as of 2012, SOEP-Core had added a questionnaire for 9-10-year-olds that is partly comparable to the FiD version). In large part, FiD resembled the SOEP. Each adult was asked to answer an individual questionnaire, which, in the first two years, included retrospective questions on childhood, education, and early work experience. In addition, there were several questions designed to capture the challenges families face with regard to the return of mothers into the labor market – with respect to workplace, work schedule, overtime, daycare options, etc.
Following the design shift for refresher samples since Sample J in 2011, respondents have been surveyed on their life history using the “biography questionnaire”, which was integrated into the individual questionnaire from wave 1. This ensures that biographical information will be available for all target persons who provided an individual interview in participating households. Other supplementary questionnaires were not included in the survey instruments given to first-wave respondents to avoid “overburdening” respondents with an extremely lengthy first-wave interview. Questionnaires for the migration boost samples include questions that have been part of SOEP-Core for the last three decades. In addition, the survey covers each respondent’s complete migration history, education, training, and employment history in Germany and abroad, and numerous aspects of cultural and living environments relevant to the social integration of migrants. The household questionnaire is identical to the questionnaire used in the SOEP-Core sample.
As with every other previously established subsample of migrants in the SOEP (M1 and M2), there was a clear need for several deviations from standard SOEP-Core questionnaires to reflect the special characteristics of the target group. Several additional questions concerning migration and integration were incorporated into the individual questionnaire to better field the range of research questions and research goals of the project partners. These included topics such as ethnic background, experiences en route to Germany, language skills, integration courses in Germany, job experience, current occupation, educational background, health, attitudes, and values. The household questionnaire was much more SOEP-related than the individual questionnaire in order to establish longitudinal information on the households.
In the first wave of M6, three questionnaires were fielded: the individual questionnaire for first-time respondents (including additional biographical questions) for all adult household members, which was administered in separate versions for refugees and for Germans or migrants, and the household questionnaire for the anchor respondent. As for the other refugee samples M3-M5, a special SOEP individual and life-history questionnaire was developed that includes issues specific to refugees. The version for Germans and migrants was identical to the individual and life-history questionnaire in samples A-Q and M1/2. As is the usual approach for boost samples, no youth or child questionnaires were fielded in sample M6. All questionnaires were available in CAPI mode only and provided in seven different language versions, although a small percentage of interviews was conducted via telephone in the CAPI environment.
Three different questionnaires were used to collect data in sample P. Apart from the regular household and individual questionnaires, a life-history questionnaire module was used to collect background information on all respondents. Computer-assisted personal interviewing (CAPI) was applied alongside paper questionnaires (PAPI or SELF) for all questionnaires. While the life-history questionnaire was integrated into the individual questionnaire in the CAPI, it was administered as a separate questionnaire in the PAPI and SELF modes.
Eleven different questionnaires were used to collect data in sample Q. Apart from the regular household and individual questionnaires, a life-history questionnaire module was used to collect background information on all respondents. A special module on respondents’ sexual orientation was added to the individual questionnaire. Adolescents aged 16 or 17, 13 or 14, and 11 or 12 were interviewed using specific youth questionnaires. Additionally, all mother-and-child/parent questionnaires were administered in this boost sample. Computer-assisted personal interviewing (CAPI) was applied exclusively for all questionnaires.
In the first waves of M7 and M8, three questionnaires were fielded: the individual questionnaire for first-time respondents (including additional biographical questions) for all adult household members, which had the life-history module integrated in the CAPI instrument, and the household questionnaire for the anchor respondent. In addition to these instruments, anchor respondents had to answer a short screening questionnaire to clarify their membership in the target populations of M7 and M8a, respectively. Respondents had to have been born outside of Germany, their stay should not be temporary, and they were to have moved to Germany no earlier than 2016 (M7) or 2017 (M8a). All questionnaires were available in CAPI mode only. Translation aids were provided only in paper form in four additional languages. With regard to questionnaire content, the household and individual questionnaires were almost identical to the ones used in samples M1/2.
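The three screening criteria for M7 and M8a can be expressed as a simple eligibility check. The Python sketch below is an illustration under stated assumptions, not the actual screening instrument: the class `ScreeningAnswers`, its fields, and the function `is_in_target_population` are hypothetical names, and only the three criteria named in the text are encoded (country of origin and other sampling restrictions are left out).

```python
from dataclasses import dataclass

# Earliest permitted year of arrival in Germany per sample, as stated in the text.
EARLIEST_ARRIVAL_YEAR = {"M7": 2016, "M8a": 2017}


@dataclass
class ScreeningAnswers:
    """Hypothetical container for an anchor respondent's screening answers."""
    born_outside_germany: bool
    stay_is_temporary: bool
    arrival_year: int


def is_in_target_population(sample: str, answers: ScreeningAnswers) -> bool:
    """Apply the three screening criteria named in the text to one respondent."""
    if sample not in EARLIEST_ARRIVAL_YEAR:
        raise ValueError(f"No screening rule known for sample {sample!r}")
    return (
        answers.born_outside_germany
        and not answers.stay_is_temporary
        and answers.arrival_year >= EARLIEST_ARRIVAL_YEAR[sample]
    )


if __name__ == "__main__":
    respondent = ScreeningAnswers(
        born_outside_germany=True,
        stay_is_temporary=False,
        arrival_year=2016,
    )
    # Arrival in 2016 satisfies the M7 rule but is too early for M8a.
    print("M7:", is_in_target_population("M7", respondent))
    print("M8a:", is_in_target_population("M8a", respondent))
```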
Last change: Feb 21, 2024