Top Posts Tagged with #lassoregression

Running Lasso Regression Analysis

Import Libraries

from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoLarsCV

Load the dataset

# raw string so the backslashes in the Windows path are not read as escape sequences
data = pd.read_csv(r"C:\Users\guy3404\OneDrive - MDLZ\Documents\Cross Functional Learning\AI COP\Coursera\machine_learning_data_analysis\Datasets\tree_addhealth.csv")

Getting information about the dataset

data.info()

upper-case all DataFrame column names

data.columns = map(str.upper, data.columns)

Total size of data

len(data)

We observe that some columns of the dataset contain null values, so we need to drop the affected rows.

Drop Null values

data_clean = data.dropna()

Data management

recode1 = {1:1, 2:0}
data_clean['MALE'] = data_clean['BIO_SEX'].map(recode1)

Length of dataset after dropping null values

len(data_clean)

Split into training and testing sets

select predictor variables and target variable as separate data sets

predvar= data_clean[['MALE','HISPANIC','WHITE','BLACK','NAMERICAN','ASIAN', 'AGE','ALCEVR1','ALCPROBS1','MAREVER1','COCEVER1','INHEVER1','CIGAVAIL','DEP1', 'ESTEEM1','VIOL1','PASSIST','DEVIANT1','GPA1','EXPEL1','FAMCONCT','PARACTV', 'PARPRES']]

target = data_clean.SCHCONN1

standardize predictors to have mean=0 and sd=1

predictors = predvar.copy()
from sklearn import preprocessing
# scale every predictor column to mean 0 and standard deviation 1
# (the loop covers all 23 columns, including BLACK)
for column in predictors.columns:
    predictors[column] = preprocessing.scale(predictors[column].astype('float64'))

split data into train and test sets

pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, target, test_size=.3, random_state=123)

specify the lasso regression model

model=LassoLarsCV(cv=10, precompute=False).fit(pred_train,tar_train)

print variable names and regression coefficients

dict(zip(predictors.columns, model.coef_))

plot coefficient progression

m_log_alphas = -np.log10(model.alphas_)
ax = plt.gca()
plt.plot(m_log_alphas, model.coef_path_.T)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV')
plt.ylabel('Regression Coefficients')
plt.xlabel('-log(alpha)')
plt.title('Regression Coefficients Progression for Lasso Paths')

MSE from training and test data

from sklearn.metrics import mean_squared_error
train_error = mean_squared_error(tar_train, model.predict(pred_train))
test_error = mean_squared_error(tar_test, model.predict(pred_test))
print('training data MSE')
print(train_error)
print('test data MSE')
print(test_error)

R-square from training and test data

rsquared_train = model.score(pred_train, tar_train)
rsquared_test = model.score(pred_test, tar_test)
print('training data R-square')
print(rsquared_train)
print('test data R-square')
print(rsquared_test)

Summary

This analysis used lasso regression to find out which factors affect how connected adolescents feel to school. There were 23 candidate predictors, including age, substance use, and family-related factors. The data was split into a training set (70%) and a test set (30%). The model retained 18 variables that together explained 33.4% of the variation in school connectedness. Self-esteem, depression, violent behavior, and GPA were the strongest predictors. Positive factors included older age, Hispanic/Asian ethnicity, family connectedness, and parental involvement. Negative factors included being male, Black/Native American ethnicity, substance use, deviant behavior, and expulsion history. The R-square and MSE values of the training and test datasets are very close, which indicates that the model has low variance and is not overfitting the training data.

#machine learning #lassoregression #datascience


Lasso Regression Analysis

LASSO regression is used to reduce model overfitting. It increases the bias and reduces the variance of the model.

LASSO stands for Least Absolute Shrinkage and Selection Operator, so the model itself is capable of feature selection. It shrinks the coefficients of the less important features and removes unimportant features by setting their coefficients to exactly zero.

LASSO regression is also known as L1 regularization. It penalizes the sum of the absolute values of the coefficients, which removes variables that contribute little to the model.
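To illustrate how the L1 penalty removes variables, here is a minimal sketch. It assumes standardized, uncorrelated (orthonormal) predictors, a special case in which the lasso estimate is simply the least-squares coefficient moved toward zero by the penalty and set to exactly zero once it falls inside the penalty band. The coefficient values, the penalty of 0.5, and the function name soft_threshold are made up for illustration.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; return 0.0 if it lies within the penalty band."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0


# Hypothetical least-squares coefficients (illustrative values, not from any dataset)
ols_coefficients = {"x1": 2.5, "x2": -0.3, "x3": 0.1, "x4": -1.2}
penalty = 0.5  # assumed strength of the L1 penalty

lasso_coefficients = {
    name: soft_threshold(value, penalty) for name, value in ols_coefficients.items()
}
# Variables whose coefficient survived the shrinkage
selected = [name for name, value in lasso_coefficients.items() if value != 0.0]

print(lasso_coefficients)
print("kept:", selected)
```

The two small coefficients are dropped entirely, while the two large ones are kept but shrunk, which is the bias-for-variance trade described above.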

#Conclusion

We can clearly see that the prediction accuracy is stable across both the training and test datasets. When we add more data, the prediction error decreases. The R-square values of .74 and .70 indicate that the model explains 74% of the variance in the training data and 70% in the test data.
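To make the R-square and prediction-error figures concrete, here is a small sketch of how both are computed from actual and predicted values. The four data points and the function names are invented for illustration and are unrelated to the dataset behind the numbers above.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical observations and model predictions
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MSE = {mse:.2f}, R-square = {r2:.2f}")  # prints "MSE = 0.25, R-square = 0.95"
```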

#LassoRegression #machinelearningdataanalysis

Day 61 - Ridge Regression and Lasso Regression https://www.gopichandrakesan.com/day-61-ridge-regression-and-lasso-regression/?feed_id=288&_unique_id=60cc9f0e352cf #100dayschallenge #LassoRegression #machinelearning #RidgeRegression

Lasso Regression #1

Lasso regression analysis is a shrinkage and variable selection method for linear regression models. The goal of lasso regression is to obtain the subset of predictors that minimizes prediction error for a quantitative response variable.

The following code will demonstrate the working of Lasso Regression to determine whether a person has diabetes or not using some feature attributes.

1. Import all the useful classes and packages and also specify the correct path of the dataset file that is to be used.

2. Use the dropna() function to remove all the null valued rows from the dataset. Then, display the dataset that you are using.

3. Take two variables, X and y, and store the feature attributes and the target attribute in them respectively. All attributes other than the target attribute (Outcome in this case) are treated as feature attributes.

4. Apply the train_test_split function that we imported earlier to divide the dataset into training data and test data. X_train and X_test are the training and test features, and y_train and y_test are the corresponding training and test outcomes. We set test_size to 0.3, i.e. 70% of the data for training and 30% for testing. The random state is set to 123 so that the split is the same every time the code runs.

5. Now, we use the Lasso Regression analysis. We will use LassoLarsCV with 10-fold cross validation. We keep precompute as False so that the system doesn’t use any pre-computed matrices. Then, fit the model with training data.

6. Now, find the mean squared error of the training data and test data. The values calculated in this case are 0.16 and 0.15 for the training and test data respectively. The error values are low and nearly equal, which suggests that the model fits reasonably well and does not overfit.

7. Calculate the R2 Score of the model. The R2 score calculated is 0.27 and 0.35 of the training and test data respectively.

8. We also print the regression variables that are used to compute our result.

9. We use matplotlib to plot the progression of the regression coefficients along the lasso path.

This type of regression shrinks some coefficients to exactly zero. It also assumes that all the attributes are of the same kind of data type (numeric).
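The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the diabetes file from the post is not available here, so the sketch generates a synthetic numeric dataset with NumPy in its place, and the variable names, sample size, and true coefficients are all invented. The plotting step is left out so the snippet runs without a display.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: replaces the diabetes dataset used in the post)
rng = np.random.default_rng(123)
X = rng.normal(size=(300, 6))
# Only the first two features actually influence the response
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Step 4: 70% training data, 30% test data, fixed random state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Step 5: lasso via least angle regression with 10-fold cross validation
model = LassoLarsCV(cv=10, precompute=False).fit(X_train, y_train)

# Step 6: mean squared error on both sets
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

# Step 7: R2 score on both sets
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Step 8: regression coefficients, one per feature
feature_names = [f"x{i}" for i in range(X.shape[1])]
print(dict(zip(feature_names, model.coef_.round(3))))
print("MSE train/test:", round(train_mse, 3), round(test_mse, 3))
print("R2 train/test:", round(train_r2, 3), round(test_r2, 3))
```

With a real dataset, X and y would instead come from the cleaned DataFrame, for example all columns except Outcome and the Outcome column itself.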

#machinelearning #lassoregression #easywaytolearn #coefficients


Lasso Regression

libname mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA new; set mydata.gapminder;

keep country co2emissions femaleemployrate hivrate internetuserate lifeexpectancy oilperperson polityscore relectricperperson suicideper100th employrate urbanrate;

*delete observations with missing data;

if cmiss(of _all_) then delete; run;

ods graphics on;

* Split data randomly into test and training data;

proc surveyselect data=new out=traintest seed = 123 samprate=0.7 method=srs outall; run;

* lasso multiple regression with lars algorithm k=10 fold validation;

proc glmselect data=traintest plots=all seed=123;

partition ROLE=selected(train='1' test='0');

class country;

model co2emissions = country femaleemployrate hivrate internetuserate lifeexpectancy oilperperson polityscore relectricperperson suicideper100th employrate urbanrate/selection=lar(choose=cv stop=none) cvmethod=random(10);

run;

A lasso regression analysis was conducted to identify the subset of variables, from a pool of 11 categorical and quantitative predictor variables, that best predicted a quantitative response variable measuring CO2 emissions. Only one categorical predictor was included, country (56 countries altogether), to improve interpretability of the selected model with fewer predictors.

Quantitative predictor variables include femaleemployrate, hivrate, internetuserate, lifeexpectancy, oil consumption per person, polityscore (democracy), residential electricity consumption, suicides per 100,000 people, employrate, and urbanrate. All predictor variables were standardized to have a mean of zero and a standard deviation of one. Data were randomly split into a training set that included 70% of the observations (N=40) and a test set that included 30% of the observations (N=16). The least angle regression algorithm with k=10 fold cross validation was used to estimate the lasso regression model in the training set, and the model was validated using the test set. The change in the cross validation average (mean) squared error at each step was used to identify the best subset of predictor variables.
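As a rough illustration of what random k=10 fold cross validation does with the 40 training observations, the sketch below shuffles the observation indices and deals them into 10 folds, each of which serves once as the validation set. This is not how SAS implements cvmethod=random(10) internally. The function name random_folds is hypothetical, and the seed of 123 only mirrors the one in the post (Python's random stream differs from SAS's).

```python
import random


def random_folds(n_observations, k, seed):
    """Shuffle the observation indices and deal them into k roughly equal folds."""
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)
    return [indices[start::k] for start in range(k)]


folds = random_folds(n_observations=40, k=10, seed=123)

for fold_number, validation_indices in enumerate(folds, start=1):
    # Every fold except the current one forms the training part of this round
    training_indices = [
        index for fold in folds if fold is not validation_indices for index in fold
    ]
    # A model would be fitted on training_indices and scored on validation_indices here
    print(fold_number, sorted(validation_indices), len(training_indices))
```

Averaging the squared validation error over the 10 rounds gives the cross validation error that is compared across steps of the selection path.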

The table for LAR selection information shows the steps in the analysis and the variable that is entered at each step. Of the 11 predictor variables, 10 were retained in the selected model. During the estimation process, femaleemployrate_56, 53, 46 and internetuserate were most strongly associated with CO2 emissions, followed by femaleemployrate_45, 32 and possibly hivrate (0.2). The best predictor of all is femaleemployrate_56.

Nearly all of the predictors were positively associated with CO2 emissions according to the Average Square Error tables (ASE and Test ASE); the main positive associations were with femaleemployrate, internetuserate, and hivrate. These variables accounted for 39.5% of the variance in the CO2 emissions response variable.

#SAS #lassoregression #machinelearning

Machine Learning for Data Analysis (Week 3 : Running a Lasso Regression Analysis)

In statistics and machine learning, lasso (least absolute shrinkage and selection operator) (also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces.
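One common way to write the lasso is as a penalized least-squares problem. The notation below is an assumption rather than something taken from the course material: n observations, p predictors x_ij, response y_i, intercept beta_0, and a tuning parameter lambda that controls how strongly coefficients are shrunk. The 1/(2n) scaling is one convention among several.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \;
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term measures prediction error and the second is the L1 penalty; a larger lambda forces more coefficients to exactly zero, which is what performs the variable selection.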

For my study, I have chosen suicide rate as the response variable. Since it is continuous, I have recoded it into two categories: countries with a low suicide rate (8 or below) are coded as 0 and those with higher rates as 1.

A lasso regression analysis was conducted to identify the subset of variables, from a pool of 14 quantitative predictor variables, that best predicted the level of suicide among countries. The predictor variables included internet use rate, income, urbanisation rate, alcohol consumption, CO2 emissions, employment, oil consumption, polity score, life expectancy, armed forces rate, breast cancer prevalence, female employment, HIV prevalence and electricity consumption.

Data were randomly split into a training set that included 70% of the observations and a test set that included 30% of the observations. The least angle regression algorithm with k=10 fold cross validation was used to estimate the lasso regression model in the training set, and the model was validated using the test set. The change in the cross validation average (mean) squared error at each step was used to identify the best subset of predictor variables.

The SAS code for running the lasso regression analysis is as follows:

libname mydata "/courses/d1406ae5ba27fe300" access=readonly;
data new; set mydata.gapminder;

******************** DATA MANAGEMENT ********************;
LABEL suicidegroup = "Level of suicide";
IF suicideper100th EQ . THEN suicidegroup = .;
ELSE IF suicideper100th LE 8 THEN suicidegroup = 0;
ELSE suicidegroup = 1;

* delete observations with missing data;
IF cmiss(of _all_) THEN delete;
RUN;

ods graphics on;

* Split data randomly into test and training data;
PROC SURVEYSELECT data=new out=traintest seed=123 samprate=0.7 method=srs outall;
run;

* lasso multiple regression with lars algorithm k=10 fold validation;
PROC GLMSELECT data=traintest plots=all seed=123;
partition ROLE=selected(train='1' test='0');
model suicidegroup = internetuserate incomeperperson urbanrate alcconsumption co2emissions employrate oilperperson polityscore lifeexpectancy armedforcesrate breastcancerper100th femaleemployrate hivrate relectricperperson / selection=lar(choose=cv stop=none) cvmethod=random(10);
RUN;

OUTPUT

The sample for the training data has been drawn using simple random sampling with a selection probability of 0.714286 for each observation.

On applying lasso regression, it was found that all of the variables were retained along the selection path (stop=none lets every predictor enter). The training ASE (average squared error) decreases as the number of predictors increases, which is expected as the model complexity increases. For the test data, a rise in ASE is seen from the 10th step onward.

The optimal model has been obtained at the 4th step. The values in the CV PRESS column first decrease and then rise sharply after the 4th step. The best subset of predictor variables thus includes 4 predictors, namely internetuserate, alcconsumption, urbanrate and oilperperson.

From the graph it is seen that, among the 4 selected predictor variables, oil consumption, internet use and alcohol consumption are positively associated with suicide level, while urban rate has a negative association.

From the fit criteria based on other methods, it is seen that the best criterion values for AIC, AICC and Adj R-Sq are at the 8th step while for SBC method, it is at the intercept step. These are in contrast with the predicted best subset based on the CV PRESS values.

The ASE for the training data decreases consistently as the number of predictors in the model increases. The ASE for test data shows a comparatively moderate decrease and eventually starts increasing as model complexity increases.

The best predicted model is:

Y = 0.6496 + 0.0036*X1 - 0.0025*X2 + 0.0028*X3 + 0.0043*X4

where Y = suicide level, X1 = internetuserate, X2 = urbanrate, X3 = alcconsumption, X4 = oilperperson
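As a worked example of how the fitted equation is applied, the sketch below plugs one set of predictor values into it. The intercept and coefficients are the ones reported above; the country values are hypothetical and chosen only to keep the arithmetic easy to follow, and the function name is invented.

```python
# Intercept and coefficients as reported for the selected model
INTERCEPT = 0.6496
COEFFICIENTS = {
    "internetuserate": 0.0036,
    "urbanrate": -0.0025,
    "alcconsumption": 0.0028,
    "oilperperson": 0.0043,
}


def predict_suicide_level(country_values):
    """Evaluate Y = intercept + sum of coefficient * predictor value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * country_values[name] for name in COEFFICIENTS
    )


# Hypothetical country (illustrative values, not taken from the gapminder data)
example_country = {
    "internetuserate": 50.0,
    "urbanrate": 60.0,
    "alcconsumption": 5.0,
    "oilperperson": 1.0,
}

prediction = predict_suicide_level(example_country)
print(round(prediction, 4))
```

For these values the terms are 0.6496 + 0.18 - 0.15 + 0.014 + 0.0043, so the negative urban rate term offsets most of the internet use term.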

The selected model has 4 degrees of freedom and an F-value of 1.45. Adjusted R square is 0.0444, i.e., the model explains 4.44% of the variability in the response variable.

#lassoregression
