Awkward correlations: holidays are statistically rainier
My colleague always complains that it rains all the time on weekends or when she is on vacation. It turns out this is statistically significant, at least over the last 60 days.
What I hear every Monday
Every Monday morning, same song: "It rained all weekend long, I couldn't do anything outside." My colleague thinks she is cursed: every time she is on vacation or off for the weekend, the weather turns bad.
We decided to check whether this curse is statistically significant, by asking: "Are my colleague's days off significantly associated with bad weather?"
Collecting the data
We filled in the following table day after day, for 60 days:
Another way to formulate our question: are the "No"s in the Working day column often associated with "Bad"s in the Weather column?
Working day and Weather are nominal variables, also referred to as qualitative variables. The values they take come from sets of categories (Yes/No; Good/Bad). They are not numbers.
A compact and informative way to summarize the association between two nominal variables is a contingency table, which places the categories of one variable in rows and those of the other in columns:
The core of the table contains counts of days corresponding to the combination of categories. For example, over the 60 assessed days, there are 10 non-working days with bad weather.
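To make the counting step concrete, here is a minimal sketch in Python of how daily records could be turned into a contingency table. It assumes each day is stored as a (working_day, weather) pair; the records below are hypothetical and are not the 60 days collected for this post, and the function name `contingency_table` is our own.

```python
from collections import Counter

# Hypothetical daily records: (working_day, weather) pairs.
# Illustrative only, NOT the 60 days collected in this post.
records = [
    ("Yes", "Good"), ("No", "Bad"), ("Yes", "Bad"),
    ("No", "Bad"), ("Yes", "Good"), ("No", "Good"),
]

def contingency_table(pairs, row_levels=("Yes", "No"), col_levels=("Good", "Bad")):
    """Count how many records fall into each (row, column) category pair."""
    counts = Counter(pairs)
    # A Counter returns 0 for combinations that never occur,
    # so every cell of the table is filled in.
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

table = contingency_table(records)
for row, cells in table.items():
    print(row, cells)
```

Each cell is simply the number of days showing that combination of categories, which is exactly what the "10 non-working days with bad weather" figure above counts.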
It may be more informative to display the contingency table as row percentages:
Interesting: 71.4% of non-working days had bad weather. Let's go further and test whether there is an association between working (yes/no) and weather (good/bad).
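A row percentage divides each cell by its row total. The sketch below (a hypothetical helper, `row_percentages`) applies this to the non-working-days row only. The 10 bad-weather days come from the post; the 4 good-weather days are our inference from the reported 71.4%, since 10 / 14 ≈ 71.4%. The working-days row is not reproduced here because its counts are not given in the text.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total.

    Assumes every row has a non-zero total.
    """
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100.0 * n / total for col, n in cells.items()}
    return result

# Non-working days: 10 bad-weather days (from the post) and 4 good-weather
# days (inferred from the reported 71.4%, i.e. 10 / 14).
non_working = {"No": {"Bad": 10, "Good": 4}}
pct = row_percentages(non_working)
print(round(pct["No"]["Bad"], 1))  # 71.4
```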
Investigating the significance of the curse
Let's phrase this properly in statistical terms: we want to evaluate the association (i.e. correlation) between two qualitative variables. According to this grid, one way to assess the correlation between qualitative variables is to run a chi-square (χ²) test of independence on the contingency table.
We ran the test and obtained a p-value of 0.03. Comparing this p-value to a significance level of 0.05 (the risk we accept of wrongly concluding that there is an association), we may say that there is a statistically significant correlation between weather and working days. Strictly speaking, the p-value means that if weather and working days were truly independent, an association at least this strong would be observed only about 3% of the time.
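For readers curious about what the chi-square test computes, here is a minimal sketch in plain Python of a Pearson chi-square test of independence on a 2x2 table, followed by the comparison with the 0.05 threshold. It is not how XLSTAT computes its result, and it applies no Yates continuity correction, so it may not reproduce the reported p-value exactly. The counts used are hypothetical, since the full table from the post is not reproduced in the text; `chi_square_2x2` is our own name.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table is [[a, b], [c, d]]. No continuity (Yates) correction is applied.
    Returns (statistic, p_value) with 1 degree of freedom.
    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts for illustration only (NOT the data from this post).
stat, p = chi_square_2x2([[20, 10], [10, 20]])
alpha = 0.05  # significance level used in the post
print(f"chi2 = {stat:.3f}, p = {p:.4f}, significant: {p < alpha}")
```

SciPy's `scipy.stats.chi2_contingency` performs the same test on larger tables too; note that on 2x2 tables it applies Yates' correction by default, so pass `correction=False` to match this sketch.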
Conclusion: days off, bad weather
This conclusion holds at least for the last 60 days. It is not necessarily true for all days (unless these 60 days are representative of days in general).
Do days off encourage the weather to be bad?
Significant correlation does not necessarily imply causality. This result does not allow us to state that days off cause bad weather or that bad weather causes days off. It simply says that there is a significant association of some kind between working days and weather (with a quantified risk of being wrong).
Now let's get back to work and count the hours until next weekend. Nice weather outside!
Statistical analyses were done using the XLSTAT software