Missing Conventions

Survey variables might be missing, that is, lacking a valid code or value, for different reasons. In the SOEP, negative values are not valid for any variable, but are used instead to code different reasons for missing information. There are two possible origins of missing values: the respondent’s answer or the survey design. In the first case, the respondent may refuse to answer or not know an answer or may report invalid values. In the second case, the interview design may exclude respondents with certain characteristics from some questions (e.g., men will never be asked if they are pregnant). The following codes are used:




Code  Meaning
-1    No answer / don’t know
-2    Does not apply
-3    Implausible value
-4    Inadmissible multiple response
-5    Not included in this version of the questionnaire
-6    Version of questionnaire with modified filtering
-7    Only available in less restricted edition
-8    Question not part of the survey program this year¹

¹Only applicable to datasets in long format.
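
For analysis, the codes can be kept in a small lookup table. The following Python sketch is only an illustration and not part of the SOEP data distribution: the names `MISSING_CODES` and `describe` are hypothetical, and the function relies on the rule stated above that negative values are never valid.

```python
# Hypothetical lookup table of the missing codes described in this section.
MISSING_CODES = {
    -1: "No answer / don't know",
    -2: "Does not apply",
    -3: "Implausible value",
    -4: "Inadmissible multiple response",
    -5: "Not included in this version of the questionnaire",
    -6: "Version of questionnaire with modified filtering",
    -7: "Only available in less restricted edition",
    -8: "Question not part of the survey program this year",  # long format only
}


def describe(value):
    """Return the missing-code label for a negative value, or None for a valid value."""
    if value >= 0:
        return None
    # Negative values that are not in the table are still treated as missing.
    return MISSING_CODES.get(value, "Unknown missing code")
```

With this sketch, `describe(-2)` yields the label for “does not apply”, while any non-negative value yields `None`.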

A person might decline to answer a question or simply not know the answer. Refusals occur mainly with sensitive questions (e.g., income-related questions). In both cases, the missing code is “-1” for “no answer / don’t know”. Note that the SOEP does not distinguish between a refusal to answer and a true “don’t know”.

Information may also be missing when a question is not asked because it is not relevant to a specific person, e.g., owner-occupiers will not be asked about the amount of rent they pay. In such cases, the question “does not apply” to this person, and the variable receives a code of “-2”.

Sometimes invalid answers occur when respondents fill out a PAPI questionnaire themselves or the interviewer mistypes an answer (e.g., working hours over 168 per week). In such cases, multiple checks are carried out, and if the inconsistency remains, the variable is recoded to “-3 Implausible value”.

Some questions offer several answer options and ask respondents to pick one. In the SOEP PAPI questionnaires, respondents sometimes ignore this request and give more than one answer (e.g., “very good” and “good” when asked about their current health status). In such cases, if the correct answer cannot be determined from the questionnaire itself, the code “-4 Inadmissible multiple response” is assigned to this variable.

With the extensions to the SOEP in recent years, entirely new samples have been added to SOEP-Core. In these samples, questions are sometimes left out completely, e.g., to shorten the questionnaire or because the focus of the sample is different (as is the case with SOEP-related studies). In such cases, the variable is set to “-5 Not included in this version of the questionnaire” for an entire subsample.

With the use of CAPI, recent developments include an “integrated” individual questionnaire, i.e., the biography part and the “regular” part of the questionnaire are combined into one questionnaire. Some of the questions in the biography part are repeated in the regular part. Whereas the respondent answers the same question twice in PAPI mode, CAPI can route the respondent past a question that has already been asked. These cases are very rare, but if they occur, they receive the code “-6 Version of questionnaire with modified filtering”.

SOEP-Core offers a variety of different editions of the data. Due to data protection regulations, some variables may not be made accessible in some of these editions. Variables with increased restrictions include, for example, those that provide information at the federal state level. In an edition where such a variable may not be made accessible, the variable remains in the data, but its values are assigned the missing code “-7 Only available in less restricted edition”.
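
Because missing codes are stored as negative numbers, they distort statistics such as means if they are treated as ordinary values. The following Python sketch shows one way to separate valid values from missing codes before computing a statistic. The function name `split_missing` and the example data are hypothetical, and the sketch assumes only that every negative value is a missing code.

```python
from collections import Counter


def split_missing(values):
    """Replace negative codes with None and count how often each code occurs."""
    cleaned = [v if v >= 0 else None for v in values]
    reasons = Counter(v for v in values if v < 0)
    return cleaned, reasons


# Hypothetical weekly working hours for six respondents.
hours = [40, -1, 38, -2, -3, -2]
cleaned, reasons = split_missing(hours)

# Compute the mean over valid answers only.
valid = [v for v in cleaned if v is not None]
mean_hours = sum(valid) / len(valid)
```

Keeping the counts per code, instead of discarding all negative values at once, preserves the distinction between respondent-driven missingness (e.g., “-1”) and design-driven missingness (e.g., “-2”).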

Last change: Jun 06, 2024