Hierarchical clustering of variables using the R package ClustOfVar.
Morris Goes Against the Grain
In the third class of Metis’s D3 Course, Kai Chang tasked our class with analyzing the canonical dot plot data set, barley_data (full data_set). To get a better feel for the data, before diving into D3 development, I’m going to throw it into Tableau and toy around.
To understand where I am coming from with this analysis, I recommend first reading the Revolutions post about the Barley dot plot.
Share any comments, questions or feedback you have in the comment section at the bottom of this post.
Let’s start by looking at the yield at each site over time.
To me, what stands out here is Morris’s consistency compared with the other sites. Morris is the site that’s third from the bottom and nearly flat. Now let’s look at it relative to all other sites and zoom into the years the Revolutions post mentions.
By giving all sites a shared y-axis, we can better compare Morris to the others. While the original piece in Revolutions covers only two years, the set we’re analyzing has nine. If we focus on just 1931 and 1932, we can see that every site takes a steep dip except Morris, whose yield continues to rise, but only gradually.
To further drive home this comparison, we can isolate the relevant years. Doing so gives a good sense of Morris’s yield compared to the others, as well as the change in annual yield from 1931 to 1932. Morris is, in fact, the only site with an increase from 1931 to 1932, but the growth was tiny. It’s the year-to-year change that is drastically different between Morris and the others.
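For readers who want to reproduce the 1931-to-1932 comparison in code rather than Tableau, here is a minimal base-R sketch. It assumes a long-format data frame with `site`, `year` and `yield` columns, which is how `lattice::barley` (the two-year version of the data) is laid out; the nine-year set used in this post may use different column names. Yields are summed over varieties to mirror Tableau's default SUM aggregation, which is an assumption about how the charts were built, and `site_yoy()` is a hypothetical helper.

```r
# Hypothetical helper: total yield per site in two years and the change between them.
# Assumes columns `site`, `year` and `yield`; `year` may be a factor, as in lattice::barley.
site_yoy <- function(df, from = "1931", to = "1932") {
  df$year <- as.character(df$year)
  df <- df[df$year %in% c(from, to), ]
  # Sum yield over varieties for every site-year combination (site x year matrix)
  totals <- tapply(df$yield, list(as.character(df$site), df$year), sum)
  change <- totals[, to] - totals[, from]
  # Largest gain first, steepest drop last
  sort(change, decreasing = TRUE)
}

# lattice ships with R as a recommended package; guard in case it is missing
if (requireNamespace("lattice", quietly = TRUE)) {
  print(site_yoy(lattice::barley))
}
```

Sorting the changes puts the site with a gain (or the smallest drop) first, so the runner-up to Morris can be read off the second entry.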
When we plot the year-over-year (YoY) differences, it looks like every site’s 1932 yield took a hit, but only Morris was able to keep growing.
To see which site is closest to Morris in YoY change, we can go back to the slope graph presented earlier and call out the second most gradual slope between 1931 and 1932.
Okay, now we see you, Duluth.
To visualize that, we color both Duluth and Morris in the YoY change area chart. Presented this way, the data show that Duluth and Morris had about the same YoY change between 1931 and 1932. It’s just that Duluth’s 1931 numbers weren’t quite as good as Morris’s.
To continue down this pattern of expanding and then focusing our scope, let’s look more closely at the YoY change between 1931 and 1932.
Here we see that every site took a hit in 1932, and it’s not only Duluth that has a negative slope similar to Morris’s, but also Grand Rapids. As one last modification, let’s compare percent change instead of yield amounts, so that each site’s relative change is measured from an equal baseline.
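The percent-change view can be sketched the same way. Under the same assumptions as before (long-format `site`/`year`/`yield` columns, yields summed over varieties, and a hypothetical helper name, here `site_pct_change()`), each site's change is divided by its own 1931 total, which is what puts large and small sites on an equal baseline.

```r
# Hypothetical helper: percent change in total yield per site between two years,
#   pct = 100 * (yield_to - yield_from) / yield_from
# so that every site is measured against its own baseline year.
site_pct_change <- function(df, from = "1931", to = "1932") {
  df$year <- as.character(df$year)
  df <- df[df$year %in% c(from, to), ]
  # Sum yield over varieties for every site-year combination (site x year matrix)
  totals <- tapply(df$yield, list(as.character(df$site), df$year), sum)
  pct <- 100 * (totals[, to] - totals[, from]) / totals[, from]
  # Rounded to one decimal place, largest relative gain first
  sort(round(pct, 1), decreasing = TRUE)
}

if (requireNamespace("lattice", quietly = TRUE)) {
  print(site_pct_change(lattice::barley))
}
```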
If there were a race to be run in 1932, Morris would’ve been the winner by a long shot. Bravo, Morris!
To close, I want to create a view that shows how the sites’ yields compare with one another while intuitively communicating the direction of change from 1931 to 1932.
With this view, we can see how Morris stacks up to the others regarding yield.
Exploratory Analysis of smoking any tobacco product among adults
# Downloading data for -- smoking any tobacco product among adults
download.file("http://apps.who.int/gho/athena/data/GHO/WHOSIS_000012.csv","WHOSIS_000012.csv");
# Reading file into the R dataset
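# stringsAsFactors = TRUE keeps SEX, REGION and COUNTRY as factors (the default became FALSE in R 4.0.0)
tob_15_and_above = read.csv("WHOSIS_000012.csv", stringsAsFactors = TRUE)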
# Previewing dataset
head(tob_15_and_above)
# Taking only the following columns for analysis: COUNTRY, SEX, YEAR, REGION, Numeric
tob_15_and_above_red = tob_15_and_above[,c(2:5,8)]
# Removing the entries that do not have a numeric percentage value
tob_15_and_above_red = tob_15_and_above_red[!is.na(tob_15_and_above_red$Numeric), ]
# Analysis of males for Year 2009 only and having a % > 0
tob_15_and_above_red_male = tob_15_and_above_red[tob_15_and_above_red$SEX=="MLE" & tob_15_and_above_red$YEAR==2009 & tob_15_and_above_red$Numeric>0,]
# Ordering by Region
tob_15_and_above_red_male = tob_15_and_above_red_male[order(tob_15_and_above_red_male$REGION), ]
# Plotting of Smoking % data by Region for Males
par(mfrow = c(1, 2))
plot(tob_15_and_above_red_male$Numeric, col = as.numeric(tob_15_and_above_red_male$REGION), pch = 19, ylab = "%")
# Empty second panel that only holds the legend
plot(1:20, type = "n", xaxt = "n", yaxt = "n", xlab = "", ylab = "")
legend(1, 20, col = unique(as.numeric(tob_15_and_above_red_male$REGION)), legend = unique(as.character(tob_15_and_above_red_male$REGION)), pch = 19)
# Box plot of Smoking % by Region for Males (droplevels() hides regions with no male entries)
par(mfrow = c(1, 1))
plot(droplevels(tob_15_and_above_red_male$REGION), tob_15_and_above_red_male$Numeric, col = "blue", pch = 19, ylab = "%", xlab = "Region")
# Create a Training and Test Data Set by randomly selecting the data entries
trainSamples <- sample(seq_len(nrow(tob_15_and_above_red)), size = floor(nrow(tob_15_and_above_red) / 2), replace = FALSE)
tob_15_and_above_red_trn = tob_15_and_above_red[trainSamples, ]
tob_15_and_above_red_tst = tob_15_and_above_red[-trainSamples, ]
#Checking to see the distribution of SEX in the training dataset so that there is a proportional amount of males and females
table(tob_15_and_above_red_trn$SEX)
#Checking to see the distribution of SEX in the test dataset so that there is a proportional amount of males and females
table(tob_15_and_above_red_tst$SEX)
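The two `table()` calls check the sex balance only after the split. If you want the halves to be proportional by construction, one option (not what this analysis does) is a stratified split that samples half of the rows within each SEX level. The sketch below assumes nothing beyond base R; `stratified_split()` is a hypothetical helper.

```r
# Hypothetical helper (not from the original analysis): split rows 50/50 *within*
# each level of a grouping column, so every group is represented proportionally.
stratified_split <- function(df, group, frac = 0.5) {
  rows_by_group <- split(seq_len(nrow(df)), df[[group]])
  train_rows <- unlist(lapply(rows_by_group, function(rows) {
    # sample.int() avoids sample()'s surprise when a group has a single row
    rows[sample.int(length(rows), size = floor(length(rows) * frac))]
  }), use.names = FALSE)
  # Logical mask keeps the test set correct even if train_rows is empty
  in_train <- seq_len(nrow(df)) %in% train_rows
  list(train = df[in_train, , drop = FALSE], test = df[!in_train, , drop = FALSE])
}
```

Calling `stratified_split(tob_15_and_above_red, "SEX")` would return the halves as `$train` and `$test`, and the `table()` checks above should then show the same male/female split in both (up to rounding).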
# Checking analysis of variance to determine the contributing covariates
summary(aov(Numeric ~ as.factor(SEX) + as.factor(REGION) + as.factor(COUNTRY) + YEAR, data = tob_15_and_above_red_trn))
# Creating a linear model based on the outcome of Smoking % and covariates of SEX, REGION, COUNTRY and YEAR
# Passing the training set via data= (instead of df$col inside the formula) lets predict() use newdata later
lmSmoking = lm(Numeric ~ as.factor(SEX) + as.factor(REGION) + as.factor(COUNTRY) + YEAR, data = tob_15_and_above_red_trn)
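# Plotting residuals against fitted values to check for any prominent deviations
plot(fitted(lmSmoking), resid(lmSmoking), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normal probability plot of residuals
qqnorm(resid(lmSmoking))
qqline(resid(lmSmoking))
# Predicting Smoking % for the test set
pred = predict(lmSmoking, newdata = tob_15_and_above_red_tst, interval = "prediction")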
It is observed that when the predict function is used, R throws a warning:
Warning message:
In predict.lm(lmSmoking, newdata = tob_15_and_above_red_tst, interval = "prediction") :
  prediction from a rank-deficient fit may be misleading
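As per Wikipedia (http://en.wikipedia.org/wiki/Collinearity), collinearity in statistics refers to a linear relationship between two variables. On closer examination, it turns out that both REGION and COUNTRY are used in the linear model, and these two variables are related: every country belongs to exactly one region, so each REGION indicator column is the sum of the COUNTRY indicator columns for that region. The design matrix is therefore rank-deficient, which is what the warning reports. Hence we drop the REGION variable and use only COUNTRY in the linear regression model.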
lmSmoking = lm(Numeric ~ as.factor(SEX) + as.factor(COUNTRY) + YEAR, data = tob_15_and_above_red_trn)
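To see why using REGION and COUNTRY together makes the fit rank-deficient, here is a small sketch on made-up data: four hypothetical countries nested in two regions, not the WHO data. With both factors in the model, `lm()` reports an `NA` coefficient for the aliased column and the fit's rank is lower than the number of design-matrix columns, which is the situation behind the warning above. With COUNTRY alone, the design matrix is full rank.

```r
# Made-up toy data: four hypothetical countries, each nested in one of two regions.
set.seed(42)
toy <- data.frame(
  country = rep(c("C1", "C2", "C3", "C4"), each = 5),
  region  = rep(c("R1", "R2"), each = 10),
  year    = rep(2001:2005, times = 4)
)
# Percentage-like outcome that depends on country plus noise
toy$pct <- 20 + 2 * as.integer(factor(toy$country)) + rnorm(nrow(toy))

# REGION + COUNTRY: the R2 dummy equals the C3 dummy plus the C4 dummy
both <- lm(pct ~ factor(region) + factor(country) + year, data = toy)
# COUNTRY only: region information is already implied by country
country_only <- lm(pct ~ factor(country) + year, data = toy)

print(coef(both))  # one coefficient is NA (aliased)
print(c(rank = both$rank, columns = ncol(model.matrix(both))))                  # rank < columns
print(c(rank = country_only$rank, columns = ncol(model.matrix(country_only))))  # full rank
```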
