Analyzing Alcohol Consumption in Cambodia: A Stata Tutorial for the 2021-2022 CDHS

The Cambodia Demographic and Health Survey (CDHS) 2021-2022 is one of the most comprehensive datasets available to public health researchers in Southeast Asia. Findings from the survey indicate that nearly 70% of Cambodian men aged 15–49 consumed alcohol in the past month, with young men aged 19–24 showing higher odds of risky consumption (adjusted odds ratio [AOR] 2.14).

For researchers and data analysts, the challenge lies in correctly processing these complex survey data. In this tutorial, we will walk through the essential Stata commands to clean, weight, and analyze alcohol consumption patterns among young men in Cambodia, comparing urban and rural residence.

Step 1: Loading and Cleaning the Data

When working with the Men's Recode (MR) file, you first need to identify the variables related to age and alcohol use. In the CDHS, mv012 typically represents age, and alcohol-related questions often begin with mv482 (though you should always verify with the codebook).
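
One way to do that verification is to search variable names and labels directly in Stata. The sketch below builds a tiny stand-in dataset so that it runs on its own; the values, the label text, and the codes are invented for illustration and are not CDHS data. With the real file in memory, only the lookfor, describe, and tabulate lines are needed.

```stata
* Toy stand-in for the Men's Recode file: values and labels are made up
clear
input mv012 mv482a
17 1
22 0
31 1
45 8
19 0
end
label variable mv012 "Current age"
label variable mv482a "Alcohol use (placeholder label)"

* Search variable names and labels for a keyword (case-insensitive)
lookfor alcohol

* Inspect storage type, label, and the codes actually present
describe mv012 mv482a
tabulate mv482a, missing
```

If lookfor returns nothing for a keyword, try a shorter stem (for example "alc" or "drink") before concluding that the variable is absent.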

Stata
* Load the dataset
use "KHMR81FL.DTA", clear

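* Keep young men aged 15-24
* Caution: dropping observations can distort design-based standard errors;
* svy's subpop() option is the safer way to restrict the sample
keep if inrange(mv012, 15, 24)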

* Clean the alcohol consumption variable (example: mv482a)
* Assumes 0 = no, 1 = yes, 8/9 = special codes; confirm in the codebook
recode mv482a (8 9 = .), gen(alcohol_recent)
label define alc_lbl 0 "No" 1 "Yes"
label values alcohol_recent alc_lbl
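
A quick cross-check after recoding helps catch unexpected codes. The sketch below uses a small made-up dataset in which the codes 0, 1, 8, and 9 are assumed (confirm the real coding against the codebook) and verifies that only the special codes became missing.

```stata
* Toy data: assumed coding 0 = no, 1 = yes, 8 and 9 = special codes
clear
input mv482a
0
1
1
8
9
0
end

* Recode the assumed special codes to missing
recode mv482a (8 9 = .), gen(alcohol_recent)

* Cross-tabulate original against recoded values, showing missings
tabulate mv482a alcohol_recent, missing

* Stop with an error if anything other than 0, 1, or missing remains
assert inlist(alcohol_recent, 0, 1) | missing(alcohol_recent)

* Count how many observations were set to missing
count if missing(alcohol_recent)
```

Because assert halts a do-file when the condition fails, it works as a cheap guard against a recode that silently leaves stray values behind.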

Step 2: Applying Survey Weights (svyset)

One of the most common mistakes in DHS research is failing to account for the complex survey design. The CDHS uses two-stage stratified cluster sampling. If you skip svyset, your point estimates ignore the sampling weights and your standard errors are typically underestimated, which can produce spurious statistical significance.

Stata
* Create the weight variable
* (mv005 is stored as an integer with 6 implied decimal places)
gen weight = mv005 / 1000000

* Set the survey design
* mv021 = Primary Sampling Unit (Cluster)
* mv023 = Stratification
svyset mv021 [pweight=weight], strata(mv023)
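
Before estimating anything, it is worth confirming that Stata registered the design as intended. The sketch below does this on simulated data (4 strata with 5 clusters each; all values are invented) using svydescribe. The singleunit(centered) option is an optional addition rather than part of the command above: it tells Stata how to compute variances if a stratum ends up with a single sampling unit.

```stata
* Simulated design: 4 strata x 5 clusters x 10 men (invented values)
clear
set obs 200

* Cluster (PSU) IDs 1-20 and stratum IDs 1-4
gen mv021 = ceil(_n / 10)
gen mv023 = ceil(mv021 / 5)

* Weight stored DHS-style as an integer with 6 implied decimals
gen long mv005 = 800000 + 100000 * mv023

* For example, mv005 = 900000 becomes a weight of 0.9
gen double weight = mv005 / 1000000

* Declare the design; singleunit(centered) is an optional safeguard
svyset mv021 [pweight=weight], strata(mv023) singleunit(centered)

* List each stratum with its number of clusters and observations
svydescribe
```

On real data, compare the number of strata and clusters that svydescribe reports with the sampling description in the survey's final report.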

Step 3: Descriptive Analysis

Once the survey design is set, use the svy: prefix on every estimation command. Without it, Stata ignores the svyset declaration, and your means and proportions will not accurately represent the national, urban, or rural population.

Stata
* Column percentages of recent drinking within urban and rural residence (mv025)
svy: tab alcohol_recent mv025, column percent
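
Step 1 restricts the sample with keep if. An alternative that keeps the full sample available for variance estimation is Stata's subpop() option, which limits the estimates to a subgroup without dropping anyone. The sketch below shows the idea on simulated data (all values invented); the indicator name young is a hypothetical choice, and ci adds confidence intervals to the column percentages.

```stata
* Simulated stand-in for the Men's Recode file (invented values)
clear
set seed 2022
set obs 400

* 20 clusters nested in 4 strata
gen mv021 = ceil(_n / 20)
gen mv023 = ceil(mv021 / 5)

* Stand-in for mv005 / 1000000
gen double weight = 0.5 + runiform()

* Ages 15-49; residence 1 = urban, 2 = rural; outcome 0 = no, 1 = yes
gen mv012 = 15 + mod(_n, 35)
gen mv025 = 1 + mod(mv021, 2)
gen alcohol_recent = runiform() < 0.6

svyset mv021 [pweight=weight], strata(mv023)

* Flag the subgroup instead of dropping everyone else
gen byte young = inrange(mv012, 15, 24)

* Column percentages with confidence intervals for men aged 15-24 only
svy, subpop(young): tabulate alcohol_recent mv025, column percent ci
```

In this toy design some clusters contain no men aged 15-24. With subpop() those clusters still count toward the design, whereas dropping observations would remove them from the variance calculation.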

Step 4: Identifying Predictors with Logistic Regression

To identify factors associated with drinking, researchers often use multivariable logistic regression. Keep in mind that the model estimates associations, not causes. For example, to test whether education level, wealth index, or residence is associated with recent drinking among young men:

Stata
* Multivariable logistic regression (reports odds ratios)
svy: logistic alcohol_recent i.mv106 i.mv190 i.mv025

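Note: mv106 is education level, mv190 is the wealth index, and mv025 is urban/rural residence. The i. prefix tells Stata to treat each one as a categorical variable.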
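
After fitting the model, a joint test shows whether a multi-category predictor matters as a whole rather than category by category. The sketch below runs on simulated data (all values invented, so the estimates themselves are meaningless) and uses testparm, which reports an adjusted Wald test after svy estimation. The ib1. prefix, which sets the reference category explicitly, is an illustrative choice and not part of the command above.

```stata
* Simulated stand-in data (invented values)
clear
set seed 2021
set obs 600

* 30 clusters nested in 6 strata
gen mv021 = ceil(_n / 20)
gen mv023 = ceil(mv021 / 5)
gen double weight = 0.5 + runiform()

* Education coded 0-3, wealth quintile coded 1-5, residence coded 1-2
gen mv106 = floor(4 * runiform())
gen mv190 = 1 + floor(5 * runiform())
gen mv025 = 1 + (runiform() < 0.6)
gen alcohol_recent = runiform() < 0.5

svyset mv021 [pweight=weight], strata(mv023)

* Odds ratios with wealth category 1 as the explicit reference
svy: logistic alcohol_recent i.mv106 ib1.mv190 i.mv025

* Joint adjusted Wald test of all wealth categories
testparm i.mv190
```

A small p-value from testparm would suggest that wealth as a whole is associated with the outcome, even if no single quintile differs significantly from the reference.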

Conclusion

Working confidently with CDHS data in Stata is a valuable skill for public health professionals in Cambodia. By correctly applying weights and using the svy commands, you ensure that your research contributes accurately to health promotion efforts and policy-making.

Are you working with the CDHS 2021-2022 dataset? Leave a comment below if you need help with specific variable recoding!