OOL: Moderation Analysis: Age, Employment Status, and Gender (Module 4)
A two-way ANOVA and a correlation analysis were conducted to examine the relationship between age (PPAGE) and work status (PPWORK), with gender (PPGENDER) tested as a potential moderator. The ANOVA model included age group, gender, and their interaction term. Pearson correlations were also computed for the full sample and separately for males and females to examine subgroup differences in the relationship.
Code:
In[1]:
import pandas as pd
df = pd.read_csv("ool_pds.csv.csv")
print(df.columns)
Output:
Index(['CASEID', 'W1_CASEID', 'W2_CASEID2', 'W1_TM_START', 'W1_TM_FINISH', 'W1_WEIGHT1', 'W1_WEIGHT2', 'W1_WEIGHT3', 'W2_TM_START', 'W2_TM_FINISH', ... 'PPREG9', 'PPRENT', 'PPSTATEN', 'PPT01', 'PPT1317', 'PPT18OV', 'PPT25', 'PPT612', 'PPWORK', 'PPNET'], dtype='object', length=436)
In[2]:
data = df[['PPAGE', 'PPWORK', 'PPGENDER']].copy()
In[3]:
data['PPAGE'] = pd.to_numeric(data['PPAGE'], errors='coerce')
data['PPWORK'] = pd.to_numeric(data['PPWORK'], errors='coerce')
data['PPGENDER'] = pd.to_numeric(data['PPGENDER'], errors='coerce')
data = data.dropna()
In[4]:
overall_r = data['PPAGE'].corr(data['PPWORK'])
print("Overall correlation (Age vs Work):", overall_r)
Output:
Overall correlation (Age vs Work): 0.25863565449969084
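A single correlation coefficient says nothing about its precision. As a rough sketch (not part of the original notebook), an approximate 95% confidence interval can be obtained with the Fisher z-transformation. The helper name `pearson_r_confidence_interval` is hypothetical, and the sample size of 2294 is taken from the age-group counts reported later in this analysis, assuming the same complete cases are used in both steps.

```python
import math

def pearson_r_confidence_interval(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson r (Fisher z method).

    z_crit defaults to the two-sided 95% critical value of the
    standard normal distribution.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

# Value printed above; n is an assumption (813 + 746 + 735 complete cases).
overall_r = 0.25863565449969084
n_complete = 813 + 746 + 735

ci_low, ci_high = pearson_r_confidence_interval(overall_r, n_complete)
print(f"95% CI for r: ({ci_low:.3f}, {ci_high:.3f})")
```

In the notebook itself, `len(data)` would be the safer source for the sample size than a hard-coded number.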
In[5]:
male = data[data['PPGENDER'] == 1]
female = data[data['PPGENDER'] == 2]
r_male = male['PPAGE'].corr(male['PPWORK'])
r_female = female['PPAGE'].corr(female['PPWORK'])
print("Male correlation:", r_male)
print("Female correlation:", r_female)
Output:
Male correlation: 0.341436825966627
Female correlation: 0.202204314205431
In[6]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
In[7]:
df = pd.read_csv("ool_pds.csv.csv")
data = df[['PPAGE', 'PPWORK', 'PPGENDER']].copy()
data['PPAGE'] = pd.to_numeric(data['PPAGE'], errors='coerce')
data['PPWORK'] = pd.to_numeric(data['PPWORK'], errors='coerce')
data['PPGENDER'] = pd.to_numeric(data['PPGENDER'], errors='coerce')
data = data.dropna()
In[8]:
data['age_group'] = pd.qcut(data['PPAGE'], 3, labels=['young', 'middle', 'old'])
In[9]:
print(data.columns)
print(data['age_group'].value_counts())
Output:
Index(['PPAGE', 'PPWORK', 'PPGENDER', 'age_group'], dtype='object')
young     813
old       746
middle    735
Name: age_group, dtype: int64
In[10]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
model = smf.ols('PPWORK ~ C(age_group) * C(PPGENDER)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
Output:
                               sum_sq      df           F        PR(>F)
C(age_group)              1110.717147     2.0  135.432220  2.579881e-56
C(PPGENDER)                 91.689438     1.0   22.359796  2.398096e-06
C(age_group):C(PPGENDER)    46.200673     2.0    5.633351  3.626358e-03
Residual                  9382.260831  2288.0         NaN           NaN
Interpretation:
The ANOVA showed significant main effects of age group (F(2, 2288) = 135.43, p < 0.001) and gender (F(1, 2288) = 22.36, p < 0.001) on work status. The age group by gender interaction was also statistically significant (F(2, 2288) = 5.63, p = 0.0036), indicating that the differences in work status across age groups depend on gender. This is consistent with gender acting as a moderator of the relationship between age and work status.
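Statistical significance alone does not show how large each effect is, especially with more than two thousand cases. As a hedged illustration, partial eta squared can be derived directly from the sums of squares in the ANOVA table as SS_effect / (SS_effect + SS_residual). The sketch below hard-codes the values printed above; the function name `partial_eta_squared` is a hypothetical helper, not something the original notebook defines.

```python
# Sums of squares copied from the ANOVA output above.
sum_sq = {
    "C(age_group)": 1110.717147,
    "C(PPGENDER)": 91.689438,
    "C(age_group):C(PPGENDER)": 46.200673,
}
ss_residual = 9382.260831

def partial_eta_squared(ss_effect, ss_error):
    """Share of (effect + error) variance attributable to the effect."""
    return ss_effect / (ss_effect + ss_error)

effect_sizes = {
    term: partial_eta_squared(ss, ss_residual) for term, ss in sum_sq.items()
}

for term, value in effect_sizes.items():
    print(f"{term}: partial eta^2 = {value:.4f}")
```

On these numbers the age-group effect is clearly the largest (roughly 0.11), while the interaction, although significant, accounts for well under 1% of the relevant variance. In the notebook the same quantities could be read from `anova_table['sum_sq']` instead of being typed in by hand.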
The correlation analysis points in the same direction. The overall correlation between age and work status was positive but modest (r = 0.26). When computed separately by gender, the relationship was stronger for males (r = 0.34) than for females (r = 0.20), suggesting that the strength of the association varies across gender groups.
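Comparing two subgroup correlations by eye is not a formal test. One common option for two independent samples is a z-test on the Fisher-transformed correlations, sketched below. The function name `compare_independent_correlations` is hypothetical, and the subgroup sizes are assumptions: the analysis reports 2294 complete cases in total but not the male/female split, so an even split is used purely for illustration.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent Pearson r.

    Uses the Fisher z-transformation; returns (z statistic, p-value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Correlations printed earlier in the analysis.
r_male = 0.341436825966627
r_female = 0.202204314205431

# ASSUMPTION: subgroup sizes are not reported; an even split of the
# 2294 complete cases is used here only as a placeholder.
n_male = 1147
n_female = 1147

z_stat, p_value = compare_independent_correlations(r_male, n_male, r_female, n_female)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

In the notebook, `len(male)` and `len(female)` would replace the placeholder sizes, and the result should be read only with those actual counts.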
Overall, both the significant ANOVA interaction and the subgroup correlations indicate that gender moderates the relationship between age and work status: the effect of age on work outcomes is not uniform across genders.













