My First SAS Program - Frequency Distribution Analysis
1) MY SAS PROGRAM CODE
/* STEP 1: First program - Base for future analysis */
/* Developed by: [Your Name] */
/* Date: [Date] */
/* Clear work environment */
proc datasets library=work nolist kill;
quit;
/* Create sample dataset - STUDENTS */
/* Course :$12. gives Course a length of 12; plain list input (Course $) */
/* would truncate "Mathematics" to its first 8 characters.              */
/* One record per line; "." marks the missing Grade for ID 11.          */
data students;
    input ID Age Gender $ Course :$12. Grade Attendance;
    datalines;
1 20 M Mathematics 85 90
2 22 F Biology 78 85
3 21 F History 92 95
4 19 M Mathematics 76 80
5 23 M Biology 88 92
6 20 F Mathematics 95 98
7 22 M History 82 88
8 21 F Biology 79 83
9 20 M Mathematics 91 94
10 24 F History 84 87
11 19 M Biology . 75
12 22 F Mathematics 87 91
13 21 M History 93 96
14 20 F Biology 80 82
15 23 M Mathematics 89 93
;
run;
/* STEP 2: Frequency distributions for four variables */
/* Analysis of AGE variable */
proc freq data=students;
    tables Age / missing;
    title "Frequency Distribution - STUDENT AGE";
    title2 "Analysis of Discrete Quantitative Variable";
run;

/* Analysis of GENDER variable */
proc freq data=students;
    tables Gender / missing;
    title "Frequency Distribution - STUDENT GENDER";
    title2 "Analysis of Nominal Qualitative Variable";
run;

/* Analysis of COURSE variable */
proc freq data=students;
    tables Course / missing;
    title "Frequency Distribution - STUDENT COURSE";
    title2 "Analysis of Nominal Qualitative Variable";
run;

/* Analysis of GRADE variable (with missing data) */
proc freq data=students;
    tables Grade / missing;
    title "Frequency Distribution - STUDENT GRADE";
    title2 "Analysis of Quantitative Variable with Missing Data";
run;
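The four PROC FREQ steps in Step 2 could also be written as a single step, since one TABLES statement accepts several variables and produces a separate one-way table for each. The sketch below assumes WORK.STUDENTS was created by Step 1. The NLEVELS option on the PROC FREQ statement additionally prints how many distinct values each variable has; the shared titles are illustrative, because one TITLE2 now has to describe all four tables.

```sas
/* One step, four one-way frequency tables (assumes WORK.STUDENTS exists) */
proc freq data=students nlevels;
    /* each variable listed here gets its own table; MISSING counts "." as a level */
    tables Age Gender Course Grade / missing;
    title "Frequency Distributions - STUDENTS";
    title2 "Age, Gender, Course, and Grade";
run;
```

The trade-off is control: separate steps allow a specific TITLE2 per variable, while the combined step is shorter and puts the level counts for all four variables in one table.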
2) FREQUENCY DISTRIBUTION RESULTS
3) FREQUENCY DISTRIBUTION DESCRIPTION
Age Variable Analysis: Student ages range from 19 to 24 years, with the highest concentration at age 20 (4 students, 26.67%). Frequencies are fairly even across the six ages, ranging from 1 student (age 24) to 4 students (age 20), and no age values are missing. Most students (10 of 15, 66.67%) are between 20 and 22 years old.
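The 66.67% figure for ages 20 to 22 is the sum of three rows of the Age table. One way to let SAS do that addition is to save the table with the OUT= option and total its PERCENT column in a DATA step. This is a sketch assuming WORK.STUDENTS exists; the dataset name AGE_FREQ and the variable PCT_20_22 are hypothetical names chosen for illustration.

```sas
/* Save the Age frequency table; OUT= writes Age, COUNT and PERCENT */
proc freq data=students noprint;
    tables Age / out=age_freq;
run;

/* Add up the percentages for ages 20 through 22 */
data _null_;
    set age_freq end=last;
    /* sum statement: pct_20_22 is retained and starts at 0 */
    if 20 <= Age <= 22 then pct_20_22 + percent;
    /* write the total to the log once the last row has been read */
    if last then put "Share of students aged 20-22: " pct_20_22 6.2 "%";
run;
```

The same number can be read off the printed table as the cumulative percent at age 22 minus the cumulative percent at age 19.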
Gender Variable Analysis: The gender distribution is nearly balanced, with 8 male students (53.33%) and 7 female students (46.67%), so the difference is a single student. There are no missing values for this variable.
Course Variable Analysis: Mathematics is the most popular course (6 students, 40%), followed by Biology (5 students, 33.33%) and History (4 students, 26.67%). The two STEM courses, Mathematics and Biology, together account for 11 of 15 students (73.33%). No course values are missing.
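The STEM observation can be checked directly by grouping courses with a user-defined format, because PROC FREQ counts formatted values. The format name $FIELD and the STEM/Humanities labels are assumptions for illustration, not part of the original program. The sketch also assumes WORK.STUDENTS holds the full course names (Course read with a length of at least 11), since a truncated value such as "Mathemat" would not match 'Mathematics'. ORDER=FREQ lists the most frequent category first.

```sas
/* Map each course to a broader field (hypothetical grouping) */
proc format;
    value $field
        'Mathematics', 'Biology' = 'STEM'
        'History'                = 'Humanities';
run;

/* Counts are taken over the formatted values, so three courses become two rows */
proc freq data=students order=freq;
    tables Course / missing;
    format Course $field.;
    title "Frequency Distribution - COURSE FIELD";
    title2 "Courses Grouped into STEM and Humanities";
run;
```

With the data in Step 1, this should show STEM with 11 students (73.33%) and Humanities with 4 (26.67%).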
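Grade Variable Analysis: Grades range from 76 to 95, with one missing value (student 11, 6.67%). Each of the 14 recorded grades is distinct, so every value appears exactly once (6.67% of the 15 students when the MISSING option is used), and the ungrouped table shows no dominant value. Grouped into ten-point ranges, however, half of the recorded grades (7 of 14) fall between 80 and 89.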
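Because every recorded grade occurs only once, a grouped frequency table says more about the shape of the Grade distribution than the ungrouped one. A minimal sketch using a user-defined format, assuming WORK.STUDENTS exists; the format name GRADEBAND and the ten-point band boundaries are illustrative choices, not part of the original assignment.

```sas
/* Ten-point grade bands (illustrative boundaries) */
proc format;
    value gradeband
        70 -< 80 = '70-79'
        80 -< 90 = '80-89'
        90 - 100 = '90-100';
run;

/* Grouped distribution; the missing grade matches no band and,  */
/* with the MISSING option, is still counted as its own row       */
proc freq data=students;
    tables Grade / missing;
    format Grade gradeband.;
    title "Grouped Frequency Distribution - STUDENT GRADE";
    title2 "Ten-Point Grade Bands";
run;
```

With the data in Step 1, the bands should hold 3, 7 and 4 students, plus the one missing grade.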