Running a Lasso Regression Analysis
A Lasso regression analysis was conducted to identify a subset of variables from a small pool of socioeconomic and digital access indicators that best predicted life expectancy across countries. The response variable was life expectancy. Three predictor variables were initially included in the model: income per person, urbanization rate, and internet usage rate. These variables were selected to represent key dimensions of economic development, population structure, and technological access.
To run the analysis, the following syntax was used:
The following results were obtained:
Interpretation of the results
All predictor variables were standardized to have a mean of zero and a standard deviation of one before model fitting, so that the coefficients are comparable and the penalty treats every predictor on the same scale. The Lasso model was fit with the least angle regression (LARS) algorithm, and the regularization penalty (lambda, reported here as alpha) was chosen by 5-fold cross-validation to minimize the cross-validation error. The optimal alpha was 0.209, which produced a sparse model in which one of the three predictors was excluded.
The dataset was not split into separate training and test sets. Model performance was instead evaluated using cross-validation, yielding an R² of approximately 0.60, which indicates that the selected predictors explained about 60% of the variation in life expectancy across countries.
Of the three predictor variables, two were retained in the final model: urbanization rate and internet usage rate. Internet usage rate was the strongest predictor, with a strong positive association with life expectancy. Countries with higher internet penetration tended to have higher life expectancy, possibly reflecting improved access to information, healthcare systems, and economic opportunities. Urbanization rate was also positively associated with life expectancy, suggesting that more urbanized countries tend to have better health outcomes, possibly due to improved infrastructure, healthcare access, and service delivery.
Income per person was shrunk to zero and excluded from the final model, indicating that it did not contribute additional predictive power once the other two variables were included. Its effect on life expectancy may be captured indirectly through urbanization and internet access.
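The workflow described above (standardize, fit by LARS, choose alpha by 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the syntax used for the analysis: it assumes Python with NumPy and scikit-learn, the data are synthetic stand-ins generated on the spot, and the variable names are hypothetical. Its output will therefore not reproduce the alpha of 0.209 or the R² of 0.60 reported here, and the R² it prints is an in-sample value rather than a cross-validated one.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the data analyzed above): 150 hypothetical
# "countries" with three correlated predictors.
rng = np.random.default_rng(0)
n = 150
internet = rng.normal(size=n)
urban = 0.6 * internet + 0.8 * rng.normal(size=n)
income = 0.9 * internet + 0.3 * rng.normal(size=n)  # largely redundant with internet
life_expectancy = 70 + 5 * internet + 2 * urban + rng.normal(scale=3, size=n)

names = ["income_per_person", "urbanization_rate", "internet_use_rate"]
X = np.column_stack([income, urban, internet])

# Standardize each predictor to mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)

# Lasso fit with the LARS algorithm; the penalty (alpha) is chosen
# by 5-fold cross-validation.
model = LassoLarsCV(cv=5).fit(X_std, life_expectancy)

print("selected alpha:", model.alpha_)
for name, coef in zip(names, model.coef_):
    # A coefficient of exactly 0 means the predictor was dropped.
    print(name, round(float(coef), 3))

# R^2 on the same data used for fitting (no held-out test set).
r2 = model.score(X_std, life_expectancy)
print("in-sample R^2:", round(float(r2), 3))
```

Because the predictors are standardized first, the magnitudes of the printed coefficients can be compared directly, which is what allows one predictor to be described as stronger than another.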


















