Documentation on Generated Data

SOEP-Core contains a wide range of generated variables and datasets. To facilitate data use, we generate a large number of variables during data preparation and release them with the SOEP-Core data. To make the generation process transparent to users, we provide comprehensive documentation on these generated datasets and variables. For an overview, see our Documentation Generated Data.

Example: A number of frequently used variables are provided in SOEP as “generated variables” (e.g., in the datasets $PGEN and $HGEN). These variables are checked for consistency across waves. The documentation can be used to answer the following questions:

a) Which variable gives the highest school-leaving certificate attained by individuals surveyed in 2007?

To search for the variable that provides this information, open Paneldata.org and enter “school-leaving certificate” in the search field. Then specify your search by adjusting the filter settings as follows:

  • type: variable

  • subtype: gen

  • study: soep-core

  • analysis unit: p

  • period: 2007

../_images/paneldata_25.PNG
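The filter settings above can be read as a set of exact-match conditions on variable metadata. The following is only a minimal sketch of that idea, not how Paneldata.org works internally: the `catalog` records, their field names, and every entry except the variable name xpsbil are made up for illustration.

```python
# Hypothetical metadata records. The field names mirror the filter labels
# listed above; the entries are invented for illustration only.
catalog = [
    {"name": "xpsbil", "type": "variable", "subtype": "gen",
     "study": "soep-core", "analysis_unit": "p", "period": 2007},
    {"name": "hypothetical_household_var", "type": "variable", "subtype": "gen",
     "study": "soep-core", "analysis_unit": "h", "period": 2007},
    {"name": "hypothetical_other_year_var", "type": "variable", "subtype": "gen",
     "study": "soep-core", "analysis_unit": "p", "period": 2006},
]

# The five filter settings from the search described above.
filters = {
    "type": "variable",
    "subtype": "gen",
    "study": "soep-core",
    "analysis_unit": "p",
    "period": 2007,
}


def matches(record, conditions):
    """Return True if the record satisfies every filter condition."""
    return all(record.get(key) == value for key, value in conditions.items())


hits = [record["name"] for record in catalog if matches(record, filters)]
print(hits)  # ['xpsbil']
```

Each additional filter narrows the result further, which is why combining all five settings leaves only person-level generated variables for 2007.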

Any of the variables in the search result could contain the information you are looking for. Since almost all of them come from the generated “xpgen” dataset, consult the documentation for the $pgen dataset, where “$” stands for the wave prefix (here “x” for 2007). Open the Documentation Generated Data:

../_images/docimentation_2.PNG

Now select the documentation of $pgen:

../_images/docimentation_3.PNG

The table of contents on the left organizes the dataset by topic. To find the variable you are looking for, select topic area 10.

../_images/docimentation_4.PNG

After browsing this topic area, you will find the variable you are looking for. The documentation provides useful information about the generated variable: it comes from the biography questionnaire, which was introduced in 1994 and is administered only once per respondent. The documentation also explains the two additional variables $psbila and $psbilo in more detail. The $psbil variable itself is updated regularly to take into account possible changes in the respondent’s highest school-leaving certificate, so it provides the most up-to-date information on completed secondary schooling.

The variable we are looking for is xpsbil; it describes the highest school-leaving certificate attained by individuals surveyed in 2007.
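The step from the generic name $psbil to the wave-specific name xpsbil can be sketched in a few lines. This is a minimal illustration, assuming that wave prefixes are single letters assigned consecutively starting with “a” for the 1984 survey year (which yields “x” for 2007, as in the example); the function names are hypothetical, and years outside the single-letter range are deliberately rejected.

```python
import string

FIRST_WAVE_YEAR = 1984  # assumption: prefix "a" corresponds to survey year 1984


def wave_prefix(year: int) -> str:
    """Return the single-letter wave prefix for a survey year."""
    offset = year - FIRST_WAVE_YEAR
    if not 0 <= offset < len(string.ascii_lowercase):
        raise ValueError(f"no single-letter prefix assumed for year {year}")
    return string.ascii_lowercase[offset]


def resolve_wave_name(generic_name: str, year: int) -> str:
    """Replace the leading '$' placeholder with the wave prefix."""
    if not generic_name.startswith("$"):
        raise ValueError("expected a name starting with the '$' placeholder")
    return wave_prefix(year) + generic_name[1:]


print(resolve_wave_name("$psbil", 2007))  # xpsbil
print(resolve_wave_name("$pgen", 2007))   # xpgen
```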

b) What value do individuals with an upper secondary school-leaving certificate (Abitur) have for this variable?

Now that you know the variable you are looking for, you can use the extensive functions of Paneldata.org in addition to the information from the documentation. If you search for the variable “xpsbil” on Paneldata.org and click on it, the frequency counts are displayed.

../_images/paneldata_26.PNG

In addition to the absolute and relative frequencies, you can also read off the value codes of the individual response categories. A translation of the response categories can be found in the “Label translations” section:

../_images/paneldata_27.PNG

You can therefore answer the question without opening the data: in the 2007 survey year, value code “4” of the variable “xpsbil” denotes the response category “upper secondary school-leaving certificate (Abitur)”.
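If you later work with the data directly, the same absolute and relative frequencies can be tabulated yourself. The sketch below uses a short, made-up list of codes in place of the real xpsbil column; only the meaning of value code 4 (Abitur) is taken from the text above, and the counts carry no empirical meaning.

```python
from collections import Counter

ABITUR_CODE = 4  # value code for "upper secondary school-leaving certificate (Abitur)"

# Made-up toy values standing in for the xpsbil column; not real SOEP data.
xpsbil_values = [1, 2, 4, 4, 3, 4, 2, 1]

counts = Counter(xpsbil_values)
total = sum(counts.values())

# Absolute and relative frequency per value code, sorted by code.
for code in sorted(counts):
    print(f"code {code}: n={counts[code]}, share={counts[code] / total:.3f}")

# Share of respondents in the toy data holding the Abitur.
abitur_share = counts[ABITUR_CODE] / total
print(f"Abitur share: {abitur_share:.3f}")
```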