DATA VISUALIZATION IN SAS
1) MY SAS VISUALIZATION PROGRAM
/* DATA VISUALIZATION - UNIVARIATE AND BIVARIATE PLOTS */
/* Developed by: [Your Name] */
/* Date: [Current Date] */

/* Clear work environment */
proc datasets library=work nolist kill;
quit;
/* Create dataset - STUDENTS */
/* Course is read with :$12. because a plain $ in list input gives a */
/* length of 8, which would truncate "Mathematics" to "Mathemat".    */
data students;
    input ID Age Gender $ Course :$12. Grade Attendance;
    datalines;
1 20 M Mathematics 85 90
2 22 F Biology 78 85
3 21 F History 92 95
4 19 M Mathematics 76 80
5 23 M Biology 88 92
6 20 F Mathematics 95 98
7 22 M History 82 88
8 21 F Biology 79 83
9 20 M Mathematics 91 94
10 24 F History 84 87
11 19 M Biology . 75
12 22 F Mathematics 87 91
13 21 M History 93 96
14 20 F Biology 80 82
15 23 M Mathematics 89 93
;
run;
/* STEP 1: UNIVARIATE PLOTS - Examine center and spread */
/* Histogram for Age (Numerical Variable) */
proc sgplot data=students;
    histogram Age / binwidth=1 fillattrs=(color=steelblue) transparency=0.5;
    density Age / lineattrs=(color=red thickness=2);
    title "Univariate Analysis: Age Distribution";
    title2 "Histogram with Density Curve";
run;

/* Bar chart for Gender (Categorical Variable) */
proc sgplot data=students;
    vbar Gender / fillattrs=(color=lightgreen) dataskin=crisp;
    title "Univariate Analysis: Gender Distribution";
    title2 "Bar Chart of Categorical Variable";
run;

/* Bar chart for Course (Categorical Variable) */
proc sgplot data=students;
    vbar Course / fillattrs=(color=coral) dataskin=crisp;
    title "Univariate Analysis: Course Distribution";
    title2 "Bar Chart of Academic Programs";
run;

/* Box plot for Grade (Numerical Variable - Center and Spread) */
proc sgplot data=students;
    vbox Grade / fillattrs=(color=lightpurple) dataskin=crisp;
    title "Univariate Analysis: Grade Distribution";
    title2 "Box Plot Showing Center and Spread";
run;
/* STEP 2: BIVARIATE PLOTS - Association between variables */
/* Scatter plot: Age vs Grade */
/* NOMARKERS stops REG from drawing a second set of default markers */
/* on top of the styled SCATTER markers.                            */
proc sgplot data=students;
    scatter x=Age y=Grade / markerattrs=(color=blue size=10px symbol=circlefilled);
    reg x=Age y=Grade / nomarkers lineattrs=(color=red thickness=2);
    title "Bivariate Analysis: Age vs Grade";
    title2 "Scatter Plot with Regression Line";
    xaxis label="Age (Years)";
    yaxis label="Grade (Points)";
run;
/* Grouped bar chart: Course vs Average Grade */
proc sgplot data=students;
    vbar Course / response=Grade group=Gender groupdisplay=cluster stat=mean dataskin=crisp;
    title "Bivariate Analysis: Average Grade by Course and Gender";
    title2 "Grouped Bar Chart - Categorical Association";
    yaxis label="Average Grade";
run;
/* Box plot: Grade distribution by Course */
/* No FILLATTRS color here: a fixed fill color would make the Gender */
/* groups indistinguishable, so the style's group colors are used.   */
proc sgplot data=students;
    vbox Grade / category=Course group=Gender groupdisplay=cluster dataskin=crisp;
    title "Bivariate Analysis: Grade Distribution by Course and Gender";
    title2 "Grouped Box Plots - Center and Spread Comparison";
run;
/* Additional bivariate plot: Attendance vs Grade */
/* ELLIPSE has no GROUP= option, so one ellipse is drawn for all students. */
/* ALPHA=0.2 with TYPE=MEAN gives an 80% confidence ellipse for the mean.  */
proc sgplot data=students;
    scatter x=Attendance y=Grade / group=Course markerattrs=(size=10px) transparency=0.3;
    ellipse x=Attendance y=Grade / alpha=0.2 type=mean;
    title "Bivariate Analysis: Attendance vs Grade by Course";
    title2 "Scatter Plot with 80% Confidence Ellipse for the Mean";
    xaxis label="Attendance Rate (%)";
    yaxis label="Grade (Points)";
run;
2) UNIVARIATE PLOTS CREATED
Histogram - Age Distribution:
Shows the frequency distribution of student ages
Overlays a fitted normal density curve for reference
Center: about 21 years (median 21, mean 21.1) | Spread: 19 to 24 years
Bar Chart - Gender Distribution:
Visual representation of gender proportions
Clear comparison between male and female counts
Bar Chart - Course Distribution:
Shows popularity of different academic programs
Mathematics is the most frequent course (6 of 15 students)
Box Plot - Grade Distribution:
Displays center (median), spread (IQR), and potential outliers
Shows the five-number summary visually
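The center and spread read off the plots above can be cross-checked numerically. The following is a minimal sketch, assuming the STUDENTS dataset from section 1 is in the WORK library; the choice of statistics and the MAXDEC=2 rounding are illustrative, not part of the original program.

```sas
/* Numeric summaries behind the histogram and box plot.            */
/* NMISS reports how many values are missing for each variable.    */
proc means data=students n nmiss mean std min q1 median q3 max maxdec=2;
    var Age Grade Attendance;
run;

/* Counts and percentages behind the two bar charts */
proc freq data=students;
    tables Gender Course / nocum;
run;
```

PROC MEANS reports the five-number summary (MIN, Q1, MEDIAN, Q3, MAX) that the box plot draws, and PROC FREQ gives the counts that set the bar heights.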
3) BIVARIATE PLOTS CREATED
Scatter Plot - Age vs Grade:
Examines relationship between age and academic performance
Includes regression line to show trend
Shows the direction and strength of any linear association between these variables
Grouped Bar Chart - Course vs Grade by Gender:
Compares average grades across different courses
Shows gender differences within each course
Shows whether the gender gap in average grade differs from course to course
Grouped Box Plots - Grade by Course and Gender:
Compares grade distributions across multiple categories
Shows center, spread, and variability for each group
Visualizes potential differences in academic performance
Scatter Plot - Attendance vs Grade by Course:
Examines relationship between attendance and grades
Includes one 80% confidence ellipse for the mean, fitted to all students combined (the SGPLOT ELLIPSE statement has no GROUP= option)
Shows clustering patterns by academic program
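If a separate ellipse per course is wanted, one possible approach is BY-group processing, which produces one plot per course. This is a sketch under assumptions: the dataset name students_sorted and the #BYVAL title are illustrative choices, and with only four to six students per course the ellipses will be very rough.

```sas
/* BY-group processing requires the data to be sorted by Course. */
/* students_sorted is a hypothetical name for the sorted copy.   */
proc sort data=students out=students_sorted;
    by Course;
run;

/* One plot per course, each with its own ellipse.               */
/* Observations with a missing Grade are left out of the fit.    */
proc sgplot data=students_sorted;
    by Course;
    scatter x=Attendance y=Grade / markerattrs=(symbol=circlefilled size=10px);
    ellipse x=Attendance y=Grade / alpha=0.2 type=mean;
    title "Attendance vs Grade: #byval(Course)";
    xaxis label="Attendance Rate (%)";
    yaxis label="Grade (Points)";
run;
```

The trade-off is that the courses no longer share one set of axes, so comparisons between courses are less direct than in the combined plot.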
4) GRAPHICAL ANALYSIS DESCRIPTION
Univariate Plot Insights: The Age histogram shows a roughly symmetric distribution centered around 21 years, with 12 of the 15 students aged 19-22. The Gender bar chart shows a nearly equal split (8 males, 7 females). The Course bar chart shows Mathematics as the most common program (6 of 15 students, 40%), followed by Biology (5) and History (4). The Grade box plot shows a median of 86 points and a range of 76 to 95. It is based on 14 students, because one Grade value (ID 11) is missing and is excluded from the plot.
Bivariate Relationship Insights: The Age vs Grade scatter plot shows no strong linear relationship, suggesting that age does not predict academic performance in this sample. The Course vs Grade comparison shows similar average grades for History and Mathematics (about 88 and 87 points) and a lower average for Biology (about 81). Mathematics has the widest grade range (76 to 95), while Biology and History span narrower ranges (78 to 88 and 82 to 93). Overall mean grades are similar for males (86.3) and females (85.0); gender gaps within individual courses rest on only one to four students per group and should not be over-interpreted. The Attendance vs Grade plot shows a strong, nearly linear positive trend: students with higher attendance tend to have higher grades.
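The visual impressions above can be backed by numbers. The following is a minimal sketch, assuming the STUDENTS dataset from section 1; the selection of statistics is an illustrative choice rather than part of the original program.

```sas
/* Pearson correlations of Grade with Age and with Attendance.   */
/* NOSIMPLE suppresses the descriptive statistics table.         */
proc corr data=students pearson nosimple;
    var Grade;
    with Age Attendance;
run;

/* Mean, spread, and range of Grade and Attendance by course */
proc means data=students n mean std min max maxdec=2;
    class Course;
    var Grade Attendance;
run;
```

The student with a missing Grade (ID 11) is left out of the Grade statistics but still counts toward the Attendance statistics, so the N column differs between the two variables for Biology.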
Key Findings:
Course is associated with larger differences in mean grade than age or gender: Biology averages about six points below Mathematics and History
Attendance shows positive association with academic performance
Mathematics and History students show both high attendance (means of about 91%) and high grades; Biology is lowest on both
No major outliers or unusual patterns detected in the relationships