Working with SOEP Regional Data

SOEP offers diverse possibilities for regional and spatial analysis. Using the anonymized regional information on the place of residence of SOEP respondents (households and individuals), you can link numerous regional indicators at the level of federal states (Bundesländer), spatial planning regions, districts, and postal codes to the data on SOEP households. Because these data are sensitive under data protection law, specific security provisions apply. In particular, data users must not report anything in their analyses that could reveal, for instance, the city or district in which respondents reside. The data nevertheless provide valuable background information for regional analysis.


For more information and to access the data, see Regional Data.

Assume that for your research project, you want to measure current (2016) urban-rural differences in the population. You are particularly interested in differences in political interest and in the various satisfaction variables provided by the SOEP. You also want to take into account demographic differences by gender and age. To evaluate the potential of the data for your project, you first need an overview. The municipal size classes from the regional data, for example, are well suited to this kind of regional analysis.

Create an exercise path with four subfolders:


Example:

  • H:/material/exercises/do

  • H:/material/exercises/output

  • H:/material/exercises/temp

  • H:/material/exercises/log

These are used to store your script, log files, datasets, and temporary datasets. Open an empty do-file and define your paths with globals:

***********************************************
* Set the working directory and the paths derived from it
***********************************************
global AVZ "H:\material\exercises"
global MY_IN_PATH "\\hume\rdc-prod\complete\soep-core\soep.v33.2\stata_en\"
global region "\\hume\soep-region\DATA\soep33_de\"
global MY_DO_FILES "$AVZ\do\"
global MY_LOG_OUT "$AVZ\log\"
global MY_OUT_DATA "$AVZ\output\"
global MY_OUT_TEMP "$AVZ\temp\"

The global “AVZ” defines the main path. The globals “MY_DO_FILES”, “MY_LOG_OUT”, “MY_OUT_DATA”, and “MY_OUT_TEMP” point to its four subfolders. The global “MY_IN_PATH” contains the path to the data you ordered, and the global “region” contains the path to the regional data.
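As a minimal sketch, the four subfolders can also be created from within Stata before the globals are used. The root folder below is a hypothetical location under Stata's temporary directory, chosen only so that the example runs on any machine; replace it with your own exercise path. Forward slashes are used because Stata accepts them on all platforms.

```stata
* Hypothetical root folder for illustration only; replace with your own path
global AVZ "`c(tmpdir)'/soep_exercises"

* Create the root and the four subfolders
* (capture suppresses the error if a folder already exists)
capture mkdir "$AVZ"
foreach sub in do output temp log {
    capture mkdir "$AVZ/`sub'"
}

* Derive the remaining globals from the root
global MY_DO_FILES "$AVZ/do"
global MY_LOG_OUT  "$AVZ/log"
global MY_OUT_DATA "$AVZ/output"
global MY_OUT_TEMP "$AVZ/temp"

* Show one of the resulting paths
display "$MY_LOG_OUT"
```

Deriving every subfolder from a single root global means that only one line has to change when the exercise folder moves.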

a) Prepare a dataset for cross-sectional analysis covering the survey year 2016 (wave bg).

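To perform your analysis, you need different SOEP variables, which you can look up with the variable search options that the SOEP offers.

Your source file should contain the following variables, listed here by the dataset they are taken from in the code below:

  • ppfad: hhnr, persnr, bghhnr, sex, gebjahr, bgnetto, bgpop

  • phrf: bgphrf

  • bgp: bgp0101 to bgp0112 (satisfaction variables), bgp143 (political interest)

  • bghbrutto: bgsampreg, bgbula, bgregtyp

  • regionl: ggk (municipal size class)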

Use the key variables from the ppfad.dta dataset as your starting file.

use hhnr persnr bghhnr sex gebjahr bgnetto bgpop using ${MY_IN_PATH}\ppfad.dta, clear

Attention

Please note that since version 34 (v34), PPFAD can be found in the subdirectory “Raw” of the data distribution file. The following exercises are done with version 33.1 (v33.1), where the tracking file was named PPFAD.

Keep people who completed a questionnaire in 2016 and lived in a private household.

* Keep people who completed a questionnaire in 2016 and live in a private household
keep if bghhnr>0 & inrange(bgnetto, 10, 29) & inlist(bgpop, 1, 2)
keep hhnr persnr bghhnr sex gebjahr bgnetto bgpop
* Add the person-level weighting factor for 2016
merge 1:1 persnr using ${MY_IN_PATH}\phrf.dta, keep(match master) keepusing(bgphrf) nogenerate
tempfile ppfad
save `ppfad'

Prepare the datasets bgp, bghbrutto, and regionl.

* Prepare dataset bgp
use ${MY_IN_PATH}\bgp.dta, clear
keep persnr hhnr bghhnr bgp01* bgp143
tempfile bgp
save `bgp'

* Prepare dataset bghbrutto
use ${MY_IN_PATH}\bghbrutto.dta, clear
keep hhnr bghhnr bgsampreg bgbula bgregtyp
tempfile bghbrutto
save `bghbrutto'

* Prepare dataset regionl
use ${region}\regionl_v33.dta, clear
keep if syear==2016
keep syear hhnr hhnrakt ggk
* Rename the current household number to match the wave-specific identifier
rename hhnrakt bghhnr
tempfile regionl
save `regionl'

Merge all datasets.

* Merge all datasets
use `ppfad', clear
merge 1:1 persnr using `bgp', keep(match master) nogenerate
merge m:1 bghhnr hhnr using `regionl', keep(match master) nogenerate
merge m:1 bghhnr hhnr using `bghbrutto', keep(match master) nogenerate
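The merge logic can be tried out on toy data before running it on the SOEP files. The sketch below uses invented identifiers and values (they are not SOEP data) and keeps the _merge variable instead of dropping it with nogenerate, so that you can see which persons found a matching household record.

```stata
* Toy person-level file (invented IDs, not SOEP data)
clear
input long persnr long hhnr
101 1
102 1
201 2
301 3
end
tempfile persons
save `persons'

* Toy household-level file; household 3 is deliberately missing
clear
input long hhnr byte ggk
1 2
2 6
end
tempfile households
save `households'

* Many persons to one household; unmatched persons are kept (master)
use `persons', clear
merge m:1 hhnr using `households', keep(match master)

* _merge==3: matched, _merge==1: person without household information
tab _merge
list, noobs
```

With keep(match master), no person is lost by the merge; persons without a household record simply have missing values in the merged variables.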

Recode negative values as missings.

* Recode negative values into missings
mvdecode sex gebjahr bgp01* bgp143, mv(-5/-1)
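To see what mvdecode does, the following sketch applies the same option to a small invented dataset. The variable sat is a hypothetical stand-in for one of the satisfaction variables.

```stata
* Toy data with negative codes (invented values, not SOEP data)
clear
input byte sat int gebjahr
7 1980
-1 1975
-5 -1
10 2001
end

* Set the codes -5 to -1 to system missing (.)
mvdecode sat gebjahr, mv(-5/-1)

* Count the resulting missing values per variable
count if missing(sat)
count if missing(gebjahr)
list, noobs
```

Valid non-negative values are left untouched; only the listed codes become missing.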

Categorize the municipal size classes from the SOEP regional dataset.

* Categorize community class size
gen ggk_cat=.
replace ggk_cat=-1 if ggk==-1
replace ggk_cat=1 if ggk==1 | ggk==2
replace ggk_cat=2 if ggk==3
replace ggk_cat=3 if ggk==4 | ggk==5
replace ggk_cat=4 if ggk>5 & ggk<=7

lab var ggk_cat "Community Size categorised"
lab def ggk_cat -1 "No information" 1 "<=5000" 2 "5001 - 20000" 3 "20001 - 100000" ///
    4 ">100000"
lab val ggk_cat ggk_cat
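The same grouping can also be written with recode, which creates the new variable and its value labels in one step. This is only an alternative sketch on toy data; the values of ggk below are made up to cover the codes used above.

```stata
* Toy data covering the ggk codes used in the grouping above
clear
input byte ggk
-1
1
2
3
4
5
6
7
end

* Group the size classes and attach value labels in one command
recode ggk (-1 = -1 "No information") (1 2 = 1 "<=5000") (3 = 2 "5001 - 20000") ///
    (4 5 = 3 "20001 - 100000") (6 7 = 4 ">100000"), generate(ggk_cat)
label variable ggk_cat "Community Size categorised"

* Check the mapping from old to new codes
tab ggk ggk_cat
```

Cross-tabulating the original against the grouped variable is a quick way to confirm that every code ended up in the intended category.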

Generate an age variable.

* Generate age variable
gen alter= 2016-gebjahr if gebjahr > 0
gen alter_cat=1 if alter<=20
replace alter_cat=2 if alter>20 & alter<=30
replace alter_cat=3 if alter>30 & alter<=65
replace alter_cat=4 if alter>65 & alter<=120

lab var alter "age"
lab var alter_cat "age categorized"
lab def alter_cat 1 "<=20" 2 "21-30" 3 "31-65" 4 ">65"
lab val alter_cat alter_cat
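As a compact alternative, the age groups can be derived with the irecode() function, which returns the index of the interval a value falls into. The sketch below uses invented ages at the category boundaries. Note one difference from the code above: ages over 120 are not set to missing here but fall into the highest category.

```stata
* Toy ages at the category boundaries, plus one missing value
clear
input int alter
18
20
21
30
31
65
66
.
end

* irecode() returns 0 if alter<=20, 1 if 20<alter<=30, 2 if 30<alter<=65,
* 3 if alter>65, and missing if alter is missing
gen byte alter_cat = 1 + irecode(alter, 20, 30, 65)

label define alter_cat 1 "<=20" 2 "21-30" 3 "31-65" 4 ">65"
label values alter_cat alter_cat
list, noobs
```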

Group the federal states into categories.

* Categorize federal states
gen bgbula_cat=.
* Schleswig-Holstein + Hamburg
replace bgbula_cat=1 if bgbula==1 | bgbula==2
* Lower Saxony + Bremen
replace bgbula_cat=2 if bgbula==3 | bgbula==4
* Mecklenburg-Western Pomerania + Brandenburg
replace bgbula_cat=3 if bgbula==13 | bgbula==12
* Saarland + Rhineland-Palatinate
replace bgbula_cat=4 if bgbula==7 | bgbula==10
* North Rhine-Westphalia
replace bgbula_cat=5 if bgbula==5
* Hesse
replace bgbula_cat=6 if bgbula==6
* Baden-Württemberg
replace bgbula_cat=7 if bgbula==8
* Bavaria
replace bgbula_cat=8 if bgbula==9
* Berlin
replace bgbula_cat=9 if bgbula==11
* Saxony
replace bgbula_cat=10 if bgbula==14
* Saxony-Anhalt
replace bgbula_cat=11 if bgbula==15
* Thuringia
replace bgbula_cat=12 if bgbula==16

lab var bgbula_cat "Federal states categorized"
lab def bgbula_cat 1 "Schleswig-Holstein/Hamburg" 2 "Lower Saxony/Bremen" 3 "Mecklenburg-Western Pomerania/Brandenburg" ///
    4 "Saarland/Rhineland-Palatinate" 5 "North Rhine-Westphalia" 6 "Hesse" ///
    7 "Baden-Wuerttemberg" 8 "Bavaria" 9 "Berlin" 10 "Saxony" 11 "Saxony-Anhalt" 12 "Thuringia"
lab val bgbula_cat bgbula_cat
* Replace the original variable with the grouped version
drop bgbula
rename bgbula_cat bgbula

Put the variables in your preferred order and save your dataset.

* Order demography and identifiers first
order persnr hhnr bghhnr syear sex gebjahr alter alter_cat bgsampreg bgbula ggk ///
    ggk_cat bgregtyp

save ${MY_OUT_DATA}\zeit_online.dta, replace

b) You want to get an initial overview of regional differences in satisfaction with various aspects of life. Cross-tabulate the variable bgsampreg with all satisfaction variables to identify differences between East and West Germany, and display the absolute and relative frequencies.

Write the tables to a log file so that they are saved.

********************************************************************************
capture log close
log using "${MY_LOG_OUT}\satisfaction.log", replace

* Life satisfaction

local varlist bgp0101 bgp0102 bgp0103 bgp0104 bgp0105 bgp0106 bgp0107 bgp0108 ///
    bgp0109 bgp0110 bgp0111 bgp0112
foreach x of local varlist {
    tab bgsampreg `x' [aw= bgphrf] , row
}

To view all tables, look at your generated log file.
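The effect of the weights on the row percentages can be checked on a toy example. In the sketch below, east, high, and w are hypothetical stand-ins for bgsampreg, a dichotomized satisfaction variable, and bgphrf; all values are invented.

```stata
* Toy data: region indicator, high satisfaction (0/1), and a weight
clear
input byte east byte high double w
0 1 1
0 0 3
1 1 2
1 0 2
end

* Weighted cross-tabulation with row percentages, as in the loop above
tab east high [aw = w], row

* The weighted row share can be reproduced as a weighted mean:
* in the region east==0, the share of high==1 is 1/(1+3)
summarize high [aw = w] if east == 0
display r(mean)
```

Unweighted, half of the respondents in each region would count as highly satisfied; with the weights, the share for east==0 drops to one quarter, which is why the weighting factor matters for the comparison.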

c) Now take a closer look at satisfaction with various aspects of life with the help of SOEP regional data. Use the municipal size classes. Create a table showing satisfaction with different aspects of life and highlighting differences by sex, age, municipal size class, and federal state.

* The local varlist from b) must still be defined, so run this in the same do-file.
* The by() and contents() options belong to the table syntax of Stata 16 and earlier;
* in Stata 17 or later, prefix each table command with "version 16:".
foreach x of local varlist {
    * Tabulation of satisfaction by municipal size class and federal state
    table `x' sex alter_cat, by(bgbula ggk_cat) contents(freq) column row stubwidth(20) cellwidth(8) csepwidth(2) nomissing
    * Tabulation of satisfaction by municipal size class
    table `x' sex alter_cat, by(ggk_cat) contents(freq) column row stubwidth(20) cellwidth(8) csepwidth(2) nomissing
    * Tabulation of satisfaction by federal state
    table `x' sex alter_cat, by(bgbula) contents(freq) column row stubwidth(20) cellwidth(8) csepwidth(2) nomissing
}

To view all tables, look at your generated log file. As you can see, SOEP regional data can be used to analyze variables at the lowest regional levels.
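If you prefer to work with the cell frequencies as data rather than as printed tables, one option is contract, which collapses the dataset to one row per combination of the listed variables. This is only a sketch on invented values; sat is a hypothetical stand-in for one satisfaction variable, and n is a name chosen here for the frequency variable.

```stata
* Toy data: satisfaction, sex, and grouped municipal size class
clear
input byte sat byte sex byte ggk_cat
8 1 1
8 1 1
5 2 1
8 2 4
3 1 4
8 2 4
end

* One row per observed combination, with its frequency stored in n
contract sat sex ggk_cat, freq(n)

* Display the cell counts grouped by municipal size class
sort ggk_cat sex sat
list, sepby(ggk_cat) noobs
```

Because the result is an ordinary dataset, small cells are easy to spot before any table is reported.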

d) Create a table that shows political interest differentiated by age, sex, and municipal size class in Bavaria.

********************************************************************************
capture log close
log using "${MY_LOG_OUT}\political_interest.log", replace

* Political interest
* Tabulation of political interest by municipal size class for Bavaria
* (bgbula==8 is Bavaria in the grouped federal states variable)
table bgp143 sex alter_cat if bgbula==8, by(ggk_cat) contents(freq) column row stubwidth(20) cellwidth(8) csepwidth(2) nomissing

log close

As you have seen here, the SOEP offers a wide range of possibilities for regional analysis. A multitude of regional indicators can be linked at the level of federal states, spatial planning regions, districts, and postal codes.

Last change: Jan 13, 2025