Documentation on Generated Data

SOEP-Core contains a wide range of generated variables and datasets. To facilitate data use, we generate many variables during data preparation and release them with the SOEP-Core data. To make the generation process transparent to users, we provide comprehensive documentation on the generated datasets and variables. For an overview, see our Documentation on Generated Data.

Example: A number of frequently used variables are provided in SOEP as “generated variables” (e.g., the datasets $PGEN and $HGEN). These variables are checked for consistency across waves. The documentation can be used to answer the following questions:

a) Which variable gives the highest school-leaving certificate attained by individuals surveyed in 2007?

To search for the variable that provides this information, open Paneldata, click the search button, select the “Variables” tab, and enter “school leaving degree” in the search bar. Narrow your search by adjusting the filter settings as follows:

  • study: soep-core

  • Conceptual dataset: Generated (raw folder)

  • analysis unit: individual

  • period: 2007


Any of the variables in the search results could contain the information you are looking for. Since almost all of them come from the generated “xpgen” dataset, consult the documentation for the $pgen dataset. Visit the Documentation of SOEP-Core page and enter the search term “pgen” in the search field.


Alternatively, you can also use the filters and select “Data Documentations”:


Now select the documentation for the required version of pgen.


The table of contents on the left gives you a classification of the dataset by topics. To find the variable you are looking for, select topic area 10.


After a short search, you will find the variable you are looking for. The documentation provides useful information about the generated variable: it is based on the biography questionnaire, which was introduced in 1994 and is administered only once per respondent. The documentation also explains the two additional variables $psbila and $psbilo in more detail. The $psbil variable itself is updated regularly to take into account possible changes in the respondent’s highest school-leaving certificate. For this reason, the generated variable is useful in providing the most up-to-date information on completed secondary schooling.

The variable we are looking for is xpsbil. It describes the highest school-leaving certificate attained by individuals surveyed in 2007.
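The naming pattern used in this example, where “$” stands for a wave-specific prefix (“x” for the 2007 survey year), can be sketched in a few lines of Python. This is only an illustration: the helper names `wave_prefix` and `resolve_name` are hypothetical, and the mapping of years to letters (a = 1984 up to z = 2009, then ba = 2010 and so on) is an assumption that should be checked against the official documentation.

```python
import string

# Assumption: wave "a" corresponds to survey year 1984, "b" to 1985, and so on.
FIRST_WAVE_YEAR = 1984


def wave_prefix(year: int) -> str:
    """Return the assumed wave prefix for a survey year (hypothetical helper)."""
    offset = year - FIRST_WAVE_YEAR
    if offset < 0:
        raise ValueError(f"no wave assumed before {FIRST_WAVE_YEAR}")
    letters = string.ascii_lowercase
    if offset < 26:
        # Single-letter waves: a (1984) through z (2009).
        return letters[offset]
    second = offset - 26
    if second >= 26:
        raise ValueError("year is outside the range covered by this sketch")
    # Assumption: two-letter waves start with "b": ba (2010), bb (2011), ...
    return "b" + letters[second]


def resolve_name(template: str, year: int) -> str:
    """Replace the "$" placeholder in a dataset or variable name."""
    return template.replace("$", wave_prefix(year))


dataset_2007 = resolve_name("$pgen", 2007)
variable_2007 = resolve_name("$psbil", 2007)
print(dataset_2007, variable_2007)
```

Under these assumptions, the generic names $pgen and $psbil resolve to the 2007 names used in this example.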

b) What values do individuals with an upper secondary school-leaving certificate (Abitur) have for this variable?

Since you now know the variable you are looking for, you can use the extensive functions of Paneldata in addition to the information from the documentation. If you search for the variable “xpsbil” in Paneldata and click on it, the frequency counts are displayed.


In addition to the absolute and relative frequencies, you can also read the value codes of specific response categories. A translation of the answer categories can be found in the “Label translations” section:


You can answer the question without opening the data. In the 2007 survey year, the variable “xpsbil” with the value code “4” describes the response category “upper secondary school-leaving certificate (Abitur)”.
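If you do want to reproduce such a frequency table from the data yourself, the following minimal sketch shows the idea. It assumes the values of xpsbil have already been loaded into a Python list. The sample values are invented purely for illustration and are not SOEP data; only the code 4 for Abitur is taken from the text above, and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# From the text above: in xpsbil, value code 4 stands for the
# upper secondary school-leaving certificate (Abitur).
ABITUR_CODE = 4


def frequency_table(values):
    """Return {code: (absolute frequency, relative frequency)} sorted by code."""
    counts = Counter(values)
    total = sum(counts.values())
    return {code: (n, n / total) for code, n in sorted(counts.items())}


# Invented sample values for illustration only (not real survey data).
sample = [1, 2, 4, 4, 3, 1, 4, 2]

table = frequency_table(sample)
abitur_n, abitur_share = table[ABITUR_CODE]
print(f"Abitur (code {ABITUR_CODE}): n={abitur_n}, share={abitur_share:.1%}")
```

With real data, the list would be replaced by the xpsbil column of the 2007 file, and the meaning of each value code should be taken from the value labels shown in the frequency counts and the documentation.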

Last change: May 04, 2021