Methods Section, Milestone Assignment 2
Dear fellow students,
Let me present the second part of my capstone project. I changed my plan of working with World Bank data, and for my final project I chose a dataset from one of the DrivenData competitions: prediction of H1N1 and seasonal flu vaccination.
Sample
The sample includes N=26706 respondents to a phone survey conducted in 2009-2010 in the United States (the National 2009 H1N1 Flu Survey). Respondents were asked whether they had received the H1N1 and seasonal flu vaccines, along with some questions about themselves.
Measures
In this data set there are two response variables: 1) whether the respondent received the H1N1 flu vaccine, and 2) whether the respondent received the seasonal flu vaccine. Both are binary variables. Some respondents received neither vaccine, others received only one, and some received both.
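The four possible combinations of the two binary responses can be tabulated directly. The following is only a minimal sketch: the rows are invented for illustration, and the names `h1n1_vaccine` and `seasonal_vaccine` are assumed labels for the two response variables, not taken from the survey documentation.

```python
from collections import Counter

# Toy rows, invented for illustration: (h1n1_vaccine, seasonal_vaccine), 1 = received.
responses = [(0, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]

LABELS = {
    (0, 0): "neither",
    (1, 0): "H1N1 only",
    (0, 1): "seasonal only",
    (1, 1): "both",
}

def vaccine_combinations(rows):
    """Count respondents in each of the four vaccine combinations."""
    counts = Counter(LABELS[row] for row in rows)
    # Report all four cells, including any that are empty.
    return {label: counts.get(label, 0) for label in LABELS.values()}

combo_counts = vaccine_combinations(responses)
for label, n in combo_counts.items():
    print(f"{label:14s} {n:3d}  ({n / len(responses):.1%})")
```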
There are two groups of predictors: behavior/opinions and personal situation.
Variables describing behavior and opinions: level of concern about H1N1, level of knowledge about H1N1, has taken antiviral medications (yes/no), has avoided close contact with others with flu-like symptoms (yes/no), has bought a face mask (yes/no), has frequently washed hands or used hand sanitizer (yes/no), has reduced time at large gatherings (yes/no), has reduced contact with people outside of own household (yes/no), has avoided touching eyes, nose, or mouth (yes/no), H1N1 flu vaccine was recommended by doctor (yes/no), seasonal flu vaccine was recommended by doctor (yes/no), has any chronic medical conditions (yes/no), has regular close contact with a child under the age of six months (yes/no), is a healthcare worker (yes/no), has health insurance (yes/no), respondent's opinion about H1N1 vaccine effectiveness (1 "not at all effective" to 5 "very effective"), respondent's opinion about the risk of getting sick with H1N1 flu without the vaccine (1 "very low" to 5 "very high"), respondent's worry about getting sick from taking the H1N1 vaccine (1 "not at all worried" to 5 "very worried"), respondent's opinion about seasonal flu vaccine effectiveness (1 "not at all effective" to 5 "very effective"), respondent's opinion about the risk of getting sick with seasonal flu without the vaccine (1 "very low" to 5 "very high"), respondent's worry about getting sick from taking the seasonal flu vaccine (1 "not at all worried" to 5 "very worried").
Variables describing personal situation: age, race, sex, education level, marital status, household income, number of adults in household, number of children in household, housing situation, employment status, type of industry, type of occupation, respondent's residence, and residence in metropolitan statistical areas.
Each row in the dataset represents one person who responded to the National 2009 H1N1 Flu Survey.
Analyses
For the descriptive analysis, the distributions of the predictors and of both response variables (H1N1 and seasonal flu vaccination) were evaluated by examining frequency tables for categorical variables and by calculating the mean, standard deviation, and minimum and maximum values for quantitative variables.
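As an illustration of this descriptive step, here is a small sketch using only the Python standard library. The data values are made up, the variable names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
from collections import Counter
from statistics import mean, stdev

def frequency_table(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in sorted(counts.items())}

def describe(values):
    """Mean, sample standard deviation, minimum and maximum."""
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample SD (n - 1 denominator), an assumption
        "min": min(values),
        "max": max(values),
    }

# Toy data, invented for illustration only.
h1n1_vaccine = [0, 0, 1, 0, 1, 0, 0, 1]           # binary response
opinion_effectiveness = [3, 4, 5, 2, 5, 4, 1, 4]  # 1-5 rating

vaccine_freq = frequency_table(h1n1_vaccine)
opinion_stats = describe(opinion_effectiveness)

for category, (n, pct) in vaccine_freq.items():
    print(f"h1n1_vaccine={category}: n={n} ({pct:.1f}%)")
print(opinion_stats)
```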
For the bivariate analysis, logistic regression was used to test bivariate associations between the chosen behavior/opinion predictors and each of the two response variables, H1N1 and seasonal flu vaccination.
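To make the bivariate step concrete, the sketch below fits a one-predictor logistic regression by Newton-Raphson in plain Python. It is not the software actually used for the project. The data are invented, and `doctor_recommended` and `h1n1_vaccine` are assumed names for one yes/no predictor and one response.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the (negated) Hessian, X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data, invented for illustration only.
# Not recommended: 6 unvaccinated, 2 vaccinated. Recommended: 3 and 5.
doctor_recommended = [0] * 8 + [1] * 8
h1n1_vaccine = [0] * 6 + [1] * 2 + [0] * 3 + [1] * 5

intercept, slope = fit_logistic(doctor_recommended, h1n1_vaccine)
odds_ratio = math.exp(slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} odds ratio={odds_ratio:.2f}")
```

With a single yes/no predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, here (5 x 6) / (3 x 2) = 5, which gives a quick sanity check. A statistics package would additionally report standard errors and p-values for the association test.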
Lastly, multivariate multiple linear regression analysis was performed. Each dependent variable was regressed separately on the predictors. All predictor variables were standardized to have a mean of 0 and a standard deviation of 1 before the regression analysis was conducted. As a validation check against overfitting, graphical residual analysis was performed: to assess the fit, predicted values were plotted on the x-axis and the estimated residuals on the y-axis.
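The standardization and the residual check can be sketched as follows. This is a simplified illustration: it uses a single predictor instead of the full predictor set, toy data invented for the example, hypothetical variable names, and the population standard deviation for scaling (the text does not specify which form was used).

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population SD, an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fitted_and_residuals(x, y):
    """Predicted values and residuals (observed minus predicted)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals

# Toy data, invented for illustration only.
opinion_risk = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]      # 1-5 rating
seasonal_vaccine = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # binary response

z = standardize(opinion_risk)
fitted, residuals = fitted_and_residuals(z, seasonal_vaccine)

# These pairs are the points of the residual plot:
# predicted value on the x-axis, residual on the y-axis.
for f, r in zip(fitted, residuals):
    print(f"{f:6.3f}  {r:6.3f}")
```

In a well-behaved fit the points scatter around zero with no visible pattern. With a binary response the residuals necessarily fall on two parallel bands, which is worth keeping in mind when reading the plot.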
Please review my work and let me know what you think.
Sincerely,
Edita

















