Top Posts Tagged with #chi-square

How to Use SPSS: A Beginner’s Guide

If you’re diving into the world of data analysis, SPSS (Statistical Package for the Social Sciences) is an essential tool to have in your arsenal. This guide walks you through the basics of SPSS to help you get started with confidence.


Alcoholism and Major Lifetime Depression : W2 Data Analysis Tools

For the second week’s assignment of Data Analysis Tools on Coursera, we continue to work with the NESARC dataset, which contains information on alcohol and drug use and disorders, related risk factors, and associated physical and mental disabilities.

We will study the association between an individual's alcohol-drinking status and the presence of major lifetime depression. We'll perform a Chi-Square Test of Independence between a categorical explanatory variable (alcohol drinking status) and a categorical response variable (presence of major lifetime depression). We'll also restrict the test to adults aged 18 to 40.

The explanatory variable has 3 groups.

1. Current Drinker

2. Ex Drinker

3. Lifetime Abstainer

The response variable has 2 groups.

0. No Lifetime Depression

1. Has Lifetime Depression

The null hypothesis is that there is no association between the drinking status of an individual and the presence of Major Lifetime Depression.

Running a Chi-Square Test of Independence between the data for two variables, we get :

In the first table, the table of counts of the response variable by the explanatory variable, we see the number of individuals in each consumer group (1, 2, or 3) who do and do not have major lifetime depression. That is, among current drinkers, 10472 individuals do not have lifetime depression, while 2768 individuals do.

The next table presents the same data in percentages of individuals with or without lifetime depression under each alcohol consumer group. So 79% of current drinkers do not have major lifetime depression, while 21% do.
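The percentages quoted above follow directly from the counts for current drinkers. A minimal sketch of that arithmetic, using only the two counts reported in the text (the variable names are illustrative and not part of the original code):

```python
# Counts for current drinkers, taken from the text above.
no_depression = 10472
has_depression = 2768

# Column total for the current-drinker group.
total = no_depression + has_depression

# Column percentages: each cell divided by its column total.
pct_no = 100 * no_depression / total
pct_yes = 100 * has_depression / total

print(f"n = {total}")
print(f"no lifetime depression: {pct_no:.1f}%")
print(f"lifetime depression:    {pct_yes:.1f}%")
```

Rounded to whole numbers, these are the 79% and 21% shown in the percentage table.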

The graph below conveys the same information, showing only the proportion of individuals in each alcohol consumer group who have Major Lifetime Depression. So, 21% of current drinkers and 20% of ex-drinkers have Major Lifetime Depression, while only 11% of lifetime abstainers do.

The Chi-Square Value from the test is large, about 168, while the p-value is very small (<< 0.0001), which tells us that the presence of Major Lifetime Depression and the Alcohol-Consuming Status of an individual are significantly associated.

The explanatory variable has 3 categories, and by observing the plot we can infer that the lifetime abstainers had a lower rate of lifetime depression diagnosis than the current drinkers and ex-drinkers. To verify this quantitatively while guarding against a type 1 error, we'll run post hoc pairwise tests with the Bonferroni Adjustment.

Since there are three pairwise comparisons, we evaluate significance at the adjusted level of 0.017 (0.05/3).

Running a chi-square test between just groups 1 and 2 of Alcohol-Consumer Status, we get a low Chi-Square value of 0.211 and a large p-value of 0.64 >> 0.017. We hence fail to reject the null hypothesis that the rates of Major Lifetime Depression are the same among current drinkers and ex-drinkers.

Running a chi-square test between just groups 1 and 3 of Alcohol-Consumer Status, we get a high Chi-Square value of 165 and a low p-value << 0.017. We hence reject the null hypothesis that the rates of Major Lifetime Depression are the same among current drinkers and lifetime abstainers.

Finally, running a chi-square test between just groups 2 and 3 of Alcohol-Consumer Status, we get a high Chi-Square value of 89 and a low p-value << 0.017. We hence once again reject the null hypothesis that the rates of Major Lifetime Depression are the same among ex-drinkers and lifetime abstainers.

Thus, using the Bonferroni Adjustment, we can conclude that the occurrence of major lifetime depression differs significantly between lifetime alcohol abstainers and both current drinkers and ex-drinkers. However, the rate of depression is not significantly different between current drinkers and ex-drinkers.

Python Code

"""
@author: DKalaikadal159607
"""

import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt

data = pandas.read_csv('nesarc.csv', low_memory=False)

#new code setting variables you will be working with to numeric

data['MAJORDEPLIFE'] = pandas.to_numeric(data['MAJORDEPLIFE'], errors='coerce')
data['CONSUMER'] = pandas.to_numeric(data['CONSUMER'], errors='coerce')
data['AGE'] = pandas.to_numeric(data['AGE'], errors='coerce')

#subset data to young adults age 18 to 40

sub1=data[(data['AGE']>=18) & (data['AGE']<=40)]

#make a copy of my new subsetted data

sub2 = sub1.copy()

#contingency table of observed counts

ct1 = pandas.crosstab(sub2['MAJORDEPLIFE'], sub2['CONSUMER'])
print(ct1)

#column percentages
colsum = ct1.sum(axis=0)
colpct = ct1/colsum
print(colpct)

#chi-square test of independence
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)

#bar chart of the proportion with major depression in each consumer group
seaborn.catplot(x="CONSUMER", y="MAJORDEPLIFE", data=sub2, kind="bar", ci=None)
plt.xlabel('Alcohol Consumer Status')
plt.ylabel('Proportion with Major Depression')

#post hoc comparison of groups 1 and 2; unmapped groups become NaN and are dropped
recode2 = {1: 1, 2: 2}
sub2['COMP1v2'] = sub2['CONSUMER'].map(recode2)

#contingency table of observed counts
ct2 = pandas.crosstab(sub2['MAJORDEPLIFE'], sub2['COMP1v2'])
print(ct2)

#column percentages
colsum = ct2.sum(axis=0)
colpct = ct2/colsum
print(colpct)

print('chi-square value, p value, expected counts')
cs2 = scipy.stats.chi2_contingency(ct2)
print(cs2)

#post hoc comparison of groups 1 and 3
recode3 = {1: 1, 3: 3}
sub2['COMP1v3'] = sub2['CONSUMER'].map(recode3)

#contingency table of observed counts
ct3 = pandas.crosstab(sub2['MAJORDEPLIFE'], sub2['COMP1v3'])
print(ct3)

#column percentages
colsum = ct3.sum(axis=0)
colpct = ct3/colsum
print(colpct)

print('chi-square value, p value, expected counts')
cs3 = scipy.stats.chi2_contingency(ct3)
print(cs3)

#post hoc comparison of groups 2 and 3
recode4 = {2: 2, 3: 3}
sub2['COMP2v3'] = sub2['CONSUMER'].map(recode4)

#contingency table of observed counts
ct4 = pandas.crosstab(sub2['MAJORDEPLIFE'], sub2['COMP2v3'])
print(ct4)

#column percentages
colsum = ct4.sum(axis=0)
colpct = ct4/colsum
print(colpct)

print('chi-square value, p value, expected counts')
cs4 = scipy.stats.chi2_contingency(ct4)
print(cs4)
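The pairwise comparisons in the code above depend on how the recode dictionaries interact with map(): any CONSUMER value that is not a key in the dictionary becomes NaN, and NaN rows are left out of the contingency table by default. A small sketch of that behaviour, using a made-up series of codes (not NESARC data):

```python
import pandas

# Hypothetical CONSUMER codes, only to show the recoding behaviour.
consumer = pandas.Series([1, 2, 3, 1, 3, 2, 1])

# Keys missing from the dictionary (here, 3) become NaN after map().
comp1v2 = consumer.map({1: 1, 2: 2})
print(comp1v2.tolist())

# value_counts() skips NaN by default, just as crosstab() does,
# so group 3 no longer appears in the comparison.
print(comp1v2.value_counts().sort_index())
```

This is why each recoded column yields a table with only the two groups being compared.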

#coursera #chi-square #data analysis tools

Week 2: Performing a Chi-Square test of independence

The Chi-Square (X²) test of independence can be used when comparing a categorical explanatory variable (e.g. ethnic group) to a categorical response variable (e.g. the absence or presence of major depression).

C->C

For this course we compare a categorical response variable with an explanatory variable that was originally quantitative and has been categorized.

For my test I chose to examine, among regular chewing tobacco users, whether the frequency of chewing tobacco use is associated with the presence of symptoms of lifetime nicotine dependence.

I categorized the usage frequency into 6 groups, sorted by the usual number of days per month on which chewing tobacco is used. This results in the groups “1”, “2.5”, “5”, “14”, “22” and “30” days of monthly use.

The null hypothesis here is:

Ho: There is no significant difference between those categories.

While the alternative hypothesis then is:

Ha: There is a significant difference between those categories.

To perform the Chi-Square test, I include only the data of those who used chewing tobacco in the last 12 months, sort the data by ID number and run PROC FREQ with the /CHISQ option, pairing the categories:

“used chewing tobacco when experiencing symptoms for lifetime nicotine dependence” + “Usual usage frequency of chewing tobacco”

As we have 6 different frequency categories that all need to be compared pairwise with one another, we have a total of 15 paired comparisons involved in the test.

To guard against a type 1 error, we need to adjust our alpha with the Bonferroni procedure. To do this we divide the original alpha of 0.05 by the number of comparisons:

p (adjusted): 0.05/15 = 0.0033
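The count of 15 comparisons and the adjusted threshold can be checked with a few lines of Python. This is only a sketch of the arithmetic; the post itself uses SAS, and the variable names here are illustrative:

```python
from itertools import combinations

# The six usage-frequency groups named in the text (days per month).
groups = ["1", "2.5", "5", "14", "22", "30"]

# Every unordered pair of groups is one post hoc comparison.
pairs = list(combinations(groups, 2))

# Bonferroni: divide the original alpha by the number of comparisons.
alpha = 0.05
adjusted_alpha = alpha / len(pairs)

print(len(pairs))
print(round(adjusted_alpha, 4))
```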

The Chi-Square test gave a value of 54.38 with p < 0.0001. So it looks like we have a significant finding, as the p-value is far smaller than even the adjusted p-value.

But to ensure that we do not have a type 1 error, we perform a post hoc test by comparing all of the possible pairs one by one, categorizing the results and checking for significant differences.

We do this by using PROC FREQ with the CHISQ option, but only include the data for two of the frequency categories at a time. This is repeated 15 times for all possible pairs, see for example:

The results show different outcomes and tell us that there are several comparisons without significant findings. For example, comparing “14” to “30” gives a p-value of 0.065: not significant.

To compare the results in a short and easy overview, I put them in a table:

When looking at this, we only see significant differences between the pairs 1+30, 2.5+30 and 5+30. All other comparisons show no significant difference, as their p-values are greater than our adjusted p-value of 0.0033.

That means that the post hoc test gives enough evidence to reject the null hypothesis only for the three pairs that involve the “30” group.

For all other pairs we still need to think that Ho might be true.

The post hoc test successfully protected us against a possible type 1 error in those remaining comparisons.

#chi-square #coursera #data analysis #sas

Chi-square Test of Independence for the GapMinder Dataset

Note: Chi-square Test of Independence is used for categorical explanatory and response variables. It measures how far the data are from the null hypothesis Ho.

The variables I am working with are quantitative, so Chi-square analysis is not directly applicable. For the purpose of the project, however, I will use Chi-square to examine the relationship between the categorized variables ‘income_cat’ (explanatory variable) and ‘urbanrate_cat’ (response variable).

Hypothesis

Ho: There is no relationship between the two categorical variables

Ha: There is a relationship between the two categorical variables

Summary

Chi-square value: 129.539, p-value: 7.638e-22 (<<0.05), ‘urbanrate_cat’ and ‘income_cat’ are significantly associated.

For the Bonferroni Adjustment, with six pairwise comparisons the adjusted significance level is 0.05/6 ≈ 0.008. If the p-value for a pair of groups is <= 0.008, we can reject the null hypothesis Ho for that pair.

‘low_income’ and ‘low-mid_income’ groups, Chi-square value: 27.852, p-value: 1.336e-05(<<0.008).

‘low_income’ and ‘high-mid_income’ groups, Chi-square value: 46.834, p-value: 1.651e-09(<<0.008).

‘low_income’ and ‘high_income’ groups, Chi-square value: 69.486, p-value: 2.914e-14(<<0.008).

‘low-mid_income’ and ‘high-mid_income’ groups, Chi-square value: 13.917, p-value: 0.008(=0.008).

‘low-mid_income’ and ‘high_income’ groups, Chi-square value: 39.644, p-value: 5.128e-08(<<0.008).

‘high-mid_income’ and ‘high_income’ groups, Chi-square value: 19.896, p-value: 0.001(<0.008).

Since all the p-values from the post hoc comparisons are <= 0.008, we reject the null hypothesis Ho for every pair of groups. It is concluded that ‘urbanrate_cat’ and ‘income_cat’ are related.
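The pairwise decisions above can be reproduced from the reported p-values. The sketch below uses the values exactly as listed in the summary; they are rounded in the post, so the borderline 0.008 result should be read with some caution. The dictionary layout is an illustrative choice, not the original code:

```python
# p-values as reported in the summary above (rounded in the source).
p_values = {
    ("low_income", "low-mid_income"): 1.336e-05,
    ("low_income", "high-mid_income"): 1.651e-09,
    ("low_income", "high_income"): 2.914e-14,
    ("low-mid_income", "high-mid_income"): 0.008,
    ("low-mid_income", "high_income"): 5.128e-08,
    ("high-mid_income", "high_income"): 0.001,
}

# Bonferroni-adjusted threshold for six pairwise comparisons.
alpha = 0.05
adjusted_alpha = alpha / len(p_values)

# Reject Ho for a pair when its p-value is at or below the threshold.
decisions = {pair: p <= adjusted_alpha for pair, p in p_values.items()}

print(f"adjusted alpha = {adjusted_alpha:.4f}")
for (group_a, group_b), reject in decisions.items():
    print(f"{group_a} vs {group_b}: reject Ho = {reject}")
```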

#Chi-square #Data analysis #Coursera

Chi-Square Test

Inference: the p-value here is 0.0021, which is <= 0.05. So we can reject the null hypothesis and say that there is some association between Property_Area and Loan_Status.

Post Hoc Testing

We have three combinations here: 'Rural - Semiurban', 'Rural - Urban' and 'Semiurban - Urban'. So let us create three dataframes and perform a chi-square test for each pair.

Applying the Bonferroni correction, the adjusted significance level is 0.05/3 = 0.0167.
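The pairwise procedure described above can be sketched as follows. The loan dataset itself is not shown in the post, so the data frame below is made-up toy data, and only the column names Property_Area and Loan_Status are taken from the text:

```python
from itertools import combinations

import pandas
import scipy.stats

# Hypothetical toy data; the real loan dataset is not shown in the post.
df = pandas.DataFrame({
    "Property_Area": ["Rural"] * 40 + ["Semiurban"] * 40 + ["Urban"] * 40,
    "Loan_Status": (["Y"] * 20 + ["N"] * 20      # Rural
                    + ["Y"] * 32 + ["N"] * 8     # Semiurban
                    + ["Y"] * 24 + ["N"] * 16),  # Urban
})

# All unordered pairs of property areas, and the Bonferroni threshold.
areas = sorted(df["Property_Area"].unique())
pairs = list(combinations(areas, 2))
adjusted_alpha = 0.05 / len(pairs)

results = {}
for area_a, area_b in pairs:
    # Keep only the rows for the two areas being compared.
    subset = df[df["Property_Area"].isin([area_a, area_b])]
    table = pandas.crosstab(subset["Loan_Status"], subset["Property_Area"])
    chi2, p, dof, expected = scipy.stats.chi2_contingency(table)
    results[(area_a, area_b)] = p
    print(f"{area_a} - {area_b}: chi2={chi2:.3f}, p={p:.4f}, "
          f"significant={p <= adjusted_alpha}")
```

Filtering with isin() plays the role of the three separate dataframes: each iteration sees only the two areas in the current pair.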

Here the p-value is 0.00107, which is <= 0.016 (new significance level). So Loan_Status does have an association with Property_Area - Rural and Semiurban.

Here the p-value is 0.43373, which is > 0.016 (new significance level). So Loan_Status does not have an association with Property_Area - Rural and Urban.

Here the p-value is 0.001071, which is <= 0.016 (new significance level). So Loan_Status does have an association with Property_Area - Semiurban and Urban.

#statistics #analytics #chi-square


Data analysis: Assignment 2: Chi-square testing

In order to determine whether the parental background of individuals who started drinking before age 21 is associated with the age category at which they started, a chi-square analysis has been done.

Here is the code performing the testing:

Below is the output of the code on IPython:

Model Interpretation for Chi-Square Tests:

When examining the association between the age at which the individuals started drinking (grouped age - categorical response) and the parental background (alcoholic or not - categorical explanatory), a chi-square test of independence revealed that individuals who have alcoholic parents were more likely to have started drinking before age 15 (35.9%) than those who do not have alcoholic parents (24.4%), X2 = 67.99, 1 df, p = 1.64e-16.

The df, or degrees of freedom, we record is the number of levels of the explanatory variable minus 1. Here the explanatory variable (alcoholic parents) has 2 levels, so df = 2 - 1 = 1.
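As a cross-check, the degrees of freedom and the p-value can be recomputed from the reported statistic. The sketch below assumes a 2 x 2 table (two age categories by two parental-background levels), which is consistent with the reported 1 df but is not stated explicitly in the post. It uses the general formula df = (rows - 1) x (columns - 1), which reduces to "levels of the explanatory variable minus 1" when the response has two levels:

```python
import scipy.stats

# Assumed table shape: 2 age categories x 2 parental-background levels.
n_rows, n_cols = 2, 2

# General degrees of freedom for a chi-square test of independence.
dof = (n_rows - 1) * (n_cols - 1)

# Chi-square statistic reported in the text.
chi2_value = 67.99

# Upper-tail probability of the chi-square distribution.
p_value = scipy.stats.chi2.sf(chi2_value, dof)

print(dof)
print(f"{p_value:.2e}")
```

The resulting p-value should land very close to the 1.64e-16 reported above.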

The Chi-Square test of independence revealed that, among individuals who started drinking alcohol before 21 years old, the parental background is significantly associated with the age category at which they started.

#chi-square

Learn to install a chi-square test in R and interpret the different results by using techniques and examples; predictive modeling, hypothetical example, chi-squared test and R code.

#chi-square tests in r #install #chi-square #R #techniques #tutorial

CHI-SQUARE


#chi-square #test for independence