Documentation of Generated Data

SOEP-Core offers a very extensive range of generated variables and data sets. To make work easier for users, many variables are generated during data preparation and published with SOEP-Core. These generated data sets and variables are comprehensively documented so that the generation process remains transparent. Here you will find an overview of the Documentation Generated Data.

Example: A number of frequently used variables are provided in SOEP as so-called generated variables (e.g. in the data sets $PGEN and $HGEN). These variables are checked for consistency across waves and have uniform names. Please use the appropriate documentation to answer the following questions:

a) Which variable contains the highest school leaving degree of the persons surveyed in 2007?

To search for the variable with the highest school leaving degree, open the search portal and enter “school leaving degree” in the search field. Then narrow your search by adjusting the filter settings as follows:

  • type: variable
  • subtype: gen
  • study: soep-core
  • analysis unit: p
  • period: 2007

Any of the variables in the search result could contain the information you are looking for. Since almost all of them come from the generated “xpgen” data set, consult the documentation for the $pgen data set. Open the Documentation Generated Data.


Now select the documentation of $pgen.


The table of contents on the left shows you a thematic classification of the data set. To find the variable you are looking for, select topic area 10.


After a brief search you will find the variable you are looking for. The documentation also provides some useful background. Since 1994, the information in the generated variable has been taken from the CV questionnaire, where it is surveyed once. In addition, the two additional variables $psbila and $psbilo are explained in more detail. The documentation describes that the $psbil variable is updated regularly and also takes into account possible changes in the highest level of education. This is precisely why it is worth using the generated variable to represent the most recent highest school leaving degree of those surveyed.

The variable we are looking for is xpsbil; it describes the highest school leaving degree of the persons surveyed in the survey year 2007.
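The step from the placeholder name $psbil to the wave-specific name xpsbil can be sketched in a few lines of Python. This is only an illustration: the helper names `wave_prefix` and `resolve_name` are hypothetical, and the mapping assumes that wave letters run alphabetically from "a" for the 1984 survey year, which matches the prefix "x" for 2007 used here. The sketch deliberately covers single-letter prefixes only.

```python
import string

# Assumption: wave "a" corresponds to survey year 1984 and the letters
# continue alphabetically, so that 2007 maps to "x" as in "xpgen".
FIRST_WAVE_YEAR = 1984


def wave_prefix(year: int) -> str:
    """Return the single-letter wave prefix for a survey year (hypothetical helper)."""
    offset = year - FIRST_WAVE_YEAR
    if not 0 <= offset < len(string.ascii_lowercase):
        # Later waves are outside the scope of this sketch.
        raise ValueError(f"no single-letter prefix for survey year {year}")
    return string.ascii_lowercase[offset]


def resolve_name(placeholder_name: str, year: int) -> str:
    """Replace the leading '$' placeholder with the wave prefix for the given year."""
    return placeholder_name.replace("$", wave_prefix(year), 1)


print(resolve_name("$psbil", 2007))
print(resolve_name("$pgen", 2007))
```

For the 2007 survey year, the two calls yield the variable and data set names used in this example.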

b) Which value is assigned to persons with an Upper Secondary Degree (Abitur) in this variable?

Now that you know the variable you are looking for, you can use the extensive functions of the search portal in addition to the information from the documentation. If you search for the variable “xpsbil” there and click on it, the frequency counts are displayed.


In addition to the absolute and relative frequencies, you can also read the value codes of specific response categories. A translation of the answer categories can be found in the “Label translations” section:


You can answer the question without opening the data. In the 2007 survey year, the variable “xpsbil” with the value code “4” describes the answer category “Upper Secondary Degree (Abitur)”.
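If you do open the data later, the same kind of frequency table can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not the portal's own method. The list `xpsbil_values` is a hypothetical toy sample and not real SOEP data, and only the label for value code 4 is taken from the text above; all other codes are left unlabeled.

```python
from collections import Counter

# Hypothetical toy values standing in for the "xpsbil" column.
# These are NOT real SOEP data; they only illustrate the tabulation.
xpsbil_values = [1, 2, 4, 4, 2, 1, 4, 3]

# Only the label for code 4 is taken from the text; other codes stay unlabeled.
LABELS = {4: "Upper Secondary Degree (Abitur)"}


def frequency_table(values):
    """Return rows of (code, absolute frequency, relative frequency, label)."""
    counts = Counter(values)
    total = len(values)
    return [
        (code, n, n / total, LABELS.get(code, ""))
        for code, n in sorted(counts.items())
    ]


table = frequency_table(xpsbil_values)
for code, n, share, label in table:
    print(f"{code:>4} {n:>5} {share:>7.1%}  {label}")
```

With real data, the values would come from the xpsbil column of the xpgen data set instead of the toy list.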