Who knows the difference between Type 1 and Type 2 statistical errors???? (Message me if yes) Like I have to write a paper on this shit And I honestly haven't taken stats in 2 years. This is unbelievable. Torture
Power failure
There's a new article out in Nature Reviews Neuroscience about the failure of scientific studies in general (and neuroscience and fMRI studies in particular) to adequately power their studies. The NRN paper isn't open access, but you can email the authors for a pre-print. There's a good write-up at National Geographic.
The paper discusses the effect of low-powered studies, both in an ideal world and in the world we actually live in. Even in a best-case scenario, underpowered studies harm research: "low power, by definition, means that the chance of discovering effects that are genuinely true is low." By decreasing the number of true positives in the literature, low-powered studies increase the share of false positives among all positive results. Again, from the article:
For example, suppose that we work in a scientific field where one in five of the effects we test are expected to be truly non-null (i.e., R = 1 / (5-1) = 0.25) and that we claim to have discovered an effect when we reach p < 0.05; if our studies have 20% power, then PPV = 0.20 × 0.25 / (0.20 × 0.25 + 0.05) = 0.05 / 0.10 = 0.50; that is, only half of our claims for discoveries will be correct. If our studies have 80% power, then PPV = 0.80 × 0.25 / (0.80 × 0.25 + 0.05) = 0.20 / 0.25 = 0.80; that is, 80% of our claims for discoveries will be correct.
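The arithmetic in that passage is easy to check. Here is a minimal sketch in Python, using the formula exactly as quoted (PPV = power × R / (power × R + alpha)); the function name `ppv` is my own label, not something from the paper.

```python
def ppv(power, prior_odds, alpha=0.05):
    """Positive predictive value: the share of 'significant' findings that are true.

    prior_odds is R in the quoted passage: the odds that a tested effect is real.
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# One in five tested effects is real, so the odds are 1 : 4.
R = 1 / (5 - 1)

for power in (0.20, 0.80):
    print(f"power = {power:.2f}  ->  PPV = {ppv(power, R):.2f}")
```

With 20% power, half of the claimed discoveries are wrong, even before any publication bias enters the picture.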
They also discuss the "Winner's Curse". An underpowered study will reach statistical significance only when it happens to observe an abnormally strong effect - and since significant results are the ones most likely to get published, the published effect sizes end up inflated.
To illustrate the Winner’s Curse, suppose that an association truly exists with an effect size that is equivalent to an odds ratio of 1.20, and we are trying to discover it by performing a small (i.e., underpowered) study. Suppose also that our study only has the power to detect an odds ratio of 1.20 on average 20% of the time. The results of any study are subject to sampling variation and random error in the measurements of the variables and outcomes of interest. Therefore, on average our small study will find an odds ratio of 1.20 but, because of random errors, our study may in fact find an odds ratio smaller than 1.20 (e.g., 1.00) or an odds ratio larger than 1.20 (e.g., 1.60). Odds ratios of 1.00 or 1.20 will not reach statistical significance because of the small sample size. We can only claim the association as nominally significant in the third case, where random error creates an odds ratio of 1.60. The Winner’s Curse means, therefore, that the ‘lucky’ scientist who makes the discovery in a small study is cursed by finding an inflated effect.
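The odds-ratio example above can be turned into a small simulation. This is a rough sketch under my own assumptions, not the paper's method: each study's log odds ratio estimate is drawn from a normal distribution centred on log(1.20), with a standard error chosen so that a two-sided test at the 5% level has roughly 20% power.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

TRUE_LOG_OR = math.log(1.20)   # the true effect from the quoted example
Z_CRIT = 1.96                  # two-sided 5% critical value
# Assumption: pick the standard error so that power is about 20%.
# 0.8416 is the 80th percentile of the standard normal distribution.
SE = TRUE_LOG_OR / (Z_CRIT - 0.8416)


def simulate(n_studies=100_000):
    """Run many small studies; return (power, typical OR among significant ones)."""
    significant = []
    for _ in range(n_studies):
        estimate = random.gauss(TRUE_LOG_OR, SE)
        if abs(estimate) / SE > Z_CRIT:
            significant.append(estimate)
    power = len(significant) / n_studies
    # Geometric mean of the odds ratios that reached significance.
    mean_sig_or = math.exp(sum(significant) / len(significant))
    return power, mean_sig_or


power, mean_sig_or = simulate()
print(f"share of studies reaching significance: {power:.3f}")
print(f"true odds ratio: 1.20, typical 'significant' odds ratio: {mean_sig_or:.2f}")
```

In this setup, a positive estimate has to exceed an odds ratio of roughly 1.38 to reach significance at all, so every 'discovery' in that direction overstates the true 1.20.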
These are major problems - and of course, we don't live in an ideal world. There is publication bias:
Smaller studies more readily disappear into a file drawer than very large studies that are widely known and visible and the results of which are eagerly anticipated (although this correlation is far from perfect). A ‘negative’ result in a high-powered study cannot be explained away as being due to low power, and thus reviewers and editors may be more willing to publish it, whereas they more easily reject a small ’negative’ study as being inconclusive or uninformative. The protocols of large studies are also more likely to have been registered or otherwise made publicly available, so that deviations in the analysis plans and choice of outcomes may become obvious more easily. Small studies, conversely, are often subject to a higher level of exploration of their results and selective reporting thereof.
In addition to making a compelling case about the danger of low-powered studies, the article also provides a meta-analysis of neuroscience studies showing that, yup, they tend to be pretty underpowered.
The authors identified 730 studies by searching 49 meta-analyses which included them. They then calculated the power of each study by assuming a significance level of .05 and a true effect size equal to the summary effect found in the meta-analysis that contained the study. They found that the average statistical power was 21%. (For contrast, the 'standard' taught in intro stats classes is 80%.) Interestingly, the meta-analyses fell into two groups - 42 low-powered meta-analyses with an average of 18% power, and 7 high-powered meta-analyses with an average of >90% power.
(The authors admit that these calculations rely on the summary effect sizes reported in the meta-analyses being correct, and agree that it is not an unassailable assumption.)
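To get a feel for what such a power calculation looks like, here is a hedged sketch. It is not the authors' procedure: it assumes a two-group comparison with equal group sizes, a standardized effect size (Cohen's d), and a normal approximation to the test statistic. The function name and the example numbers are mine.

```python
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the assumed true standardized effect size; n_per_group is the size
    of each of the two equal groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the effect is real.
    shift = d * (n_per_group / 2) ** 0.5
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# A medium effect (d = 0.5) with 64 vs. 10 subjects per group.
print(f"n = 64 per group: power = {power_two_sample(0.5, 64):.2f}")
print(f"n = 10 per group: power = {power_two_sample(0.5, 10):.2f}")
```

The same assumed effect gives roughly the textbook 80% with 64 subjects per group and something close to the 21% figure above with 10 per group.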
The authors also looked at specific subfields. In neuroimaging, the median statistical power was 8%, across 461 individual studies contributing to 41 separate meta-analyses. A look at rat studies found that "the median statistical power for the water maze studies and the radial arm maze studies to detect these medium to large effects was 18% and 31%, respectively".
The article finishes up with a discussion of the ethical consequences of underpowered studies, particularly for animal studies and for clinical trials. It also discusses potential solutions, including increasing standards both at the IRB/grant approval stage as well as at publication, pre-registration of studies, incentivizing replication, and open access to data and materials.
Fallacies in Statistics: Economic
For years people have regarded statistics as infallible. When a statistic appears in a newspaper or a magazine, it is often believed without question. Phrases like “experts say so and so is this probable” or “statistics compiled by such and such university or bureau” are heeded without question. True, statistics don’t lie, and though some people fabricate data to suit their agenda, like the infamous Greens, most of the time the statistics are consistent with the parameters of the research being done.
And exactly there lies the problem: rarely does anyone look up the parameters of a statistic, especially for research that can change the entire course of a policy that will continue to affect our futures. The parameters are the very things that determine whether a statistic should be taken seriously or not. If a statistic on child molestation counted children who cried or felt uncomfortable when their uncle or grandfather touched them on the arm or leg, then that statistic would exaggerate what is actually going on.
The biggest and most famous example would be the top 1% statistic that protesters on Wall Street bring up over and over again. This statistic states that the top 1% of Americans make millions of dollars, and to get to the nitty gritty of it, only 3% make more than $250,000. The fallacy with this statistic is that it treats this class of top earners as a stable group. But this group is anything but static; it is one of the most mobile groups in the United States. Many of the people who are in the top 1% this year will not be in the top 1% next year, nor will they even be in the top 3% some years later. Further still, few of them stay in the top bracket at all; most fall back to the upper middle class, or sometimes even the lower class.
The same goes for those in the middle and poor classes. Most who are there do not stay there for the rest of their lives, let alone their children generations later. Economic mobility in the United States is much higher than in any other nation in the world, past or present. Comparing eras, economic mobility in the U.S. was much higher in the early 20th century than it is now. The only difference? The markets were not as regulated then as they are now.
Another famous study says the income gap between the middle class and the rich is wider in the United States than it ever was anywhere else in history. Never, though, is it mentioned that at very few points in human history, and in very few places, has a middle class made up more than 10% of the population. In fact, for most of the world the middle class averaged 3-4% of the population, while the multitudes lived in abject poverty and those on top lived lavish lifestyles as a result of exploitation. Nor is it ever considered that in the United States the poor and the middle class are better off than they ever were anywhere in the thousands of years of human history. Using these statistics, bombasts in Congress, mostly on the Left, constantly choose policies that would bring the poor to a lower standard of living just as long as the rich do not get richer.
But that is not the way to make wealth in a nation. That is not a way to provide for the economic and social ascendancy of a people from out of the pits of poverty to the streets paved by wealth and innovation.
Lastly, the third most famous statistic is the inequality of women’s jobs. Over and over again this study is thrown around carelessly in classrooms and in university halls. That study, too, does not account for the number of years worked or for the other activities that some women hold they can do better than any man alive. In truth, given a completely equal number of years, hours, and job capabilities, there are many instances where women get paid more than men. The only difference is that some women hold it their responsibility to take care of the home and the children because leaving it in the hands of incapable men would be, to say the least, disastrous.
The harm done by tests of significance
An interesting article from Accident Analysis and Prevention from 2004 goes over three case studies where Null Hypothesis Significance Testing may have cost lives.
Case 1: Right Turns on Red
Looking at the data in Table 1, persons without training in statistics would think that after RTOR was allowed, these intersections were somewhat less safe. However, the consultant concluded, quite correctly, that the change was not statistically significant. The Commissioner of the Virginia Department of Highways and Transportation sent the consultant’s report to the Governor and in the letter of transmittal wrote: “we can discern no significant hazard to motorists or pedestrians from implementation of the general permissive rule (i.e. of RTOR). No significant increase in traffic crashes has been noted following adoption of right-turn-on-red in any state including Virginia”. Obviously, there was miscommunication. In English ‘significant’ means ‘having or likely to have considerable influence or effect’; the synonym of ‘significant’ is ‘important’. In statistics ‘not significant’ means that the data is insufficient to reject the (null) hypothesis of ‘no effect’. Thus, the consultant said one thing and the Commissioner transmitted something entirely different.
... And so the sequence of small studies all pointing in the same direction but with statistically not significant results continued to accumulate, till that last study which I followed was published in 1983. While 287 crashes to right turning vehicles were expected, 313 were counted. The authors concluded, once again, that there was no significant difference in vehicular crashes.
...The problem is clear. Researchers obtain real data which, while noisy, time and again point in a certain direction. However, instead of saying: “here is my estimate of the safety effect, here is its precision, and this is how what I found relates to previous findings”, the data is processed by NHST, and the researcher says, correctly but pointlessly: “I cannot be sure that the safety effect is not zero”. Occasionally, the researcher adds, this time incorrectly and unjustifiably, a statement to the effect that: “since the result is not statistically significant, it is best to assume the safety effect to be zero”. In this manner, good data are drained of real content, the direction of empirical conclusions reversed, and ordinary human and scientific reasoning is turned on its head for the sake of a venerable ritual.
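Using the one pair of numbers quoted above (287 crashes expected, 313 counted), here is a sketch of the two ways of reporting the same data. The simplifications are mine, not the article's: the expected count is treated as known without error, the observed count as Poisson, and everything uses a normal approximation.

```python
from statistics import NormalDist

expected, observed = 287, 313   # figures quoted from the 1983 study

# "Here is my estimate of the safety effect, here is its precision."
ratio = observed / expected                 # estimated change in crashes
se_ratio = observed ** 0.5 / expected       # Poisson standard error (assumption)
lo = ratio - 1.96 * se_ratio
hi = ratio + 1.96 * se_ratio
print(f"estimated crash ratio: {ratio:.2f}  (approx. 95% CI {lo:.2f} to {hi:.2f})")

# "I cannot be sure that the safety effect is not zero."
z = (observed - expected) / expected ** 0.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"NHST: z = {z:.2f}, two-sided p = {p_value:.2f} -> 'not significant'")
```

Under these assumptions the best estimate is about a 9% increase, with an interval that reaches from a small decrease to an increase of roughly 20%. 'Not significant' describes the width of that interval, not evidence that the effect is zero.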
Case 2: Paved shoulders on rural roads
Once again common sense and statistical ritual point in opposite directions. The figures show that, e.g. after a two-foot paved shoulder has been added, the crash rate has declined for all crash types and all severities. Therefore, ordinary reasoning would lead to the conclusions that paving shoulders has reduced crashes. And yet, because of the paucity of the data, none of these reductions proved statistically significant. But quasi-science wins again; and so, in their Conclusion section the authors write:
The study could not discern any statistically significant differences in either crash rate or severity rate between two- and four-foot shoulder installations. Unless (other) benefits … are considered important to practitioners, this study does not show the increased construction cost of four-foot shoulders on state routes to be justified by an increase in traffic safety (p. 37).
Case 3: Speed Limit Increases
The two above cases could be seen as researchers failing to appropriately communicate their findings to lawmakers. In Case 3, we see researchers themselves misusing NHST to deadly effect:
Table 3. Predicted percentage increase in the number of fatal crashes attributed to the speed-limit increases on rural interstates (from Balkin and Ord, p. 10, Table 3)
State            First % (1987)    Second % (1995)
Alabama          0.0               24.8
Arizona          41.0              0.0
………
Missouri         13.0              42.2
Nebraska         35.5              0.0
………
West Virginia    46.2              0.0
Wisconsin        24.3              0.0
It is obvious that 0.0 is not the best estimate of the change in fatal crashes in all these instances. Why the authors decided to enter 0.0 can perhaps be understood from the numerical example by which they explain their method. In their paper there is a graph of the monthly time series of fatal crashes from 1975 to 1998 for rural interstates in Arizona and, referring to this graph, the authors say (p. 6) that:
“We see a significant increase in the level around 1987 but none around 1995. … Statistically it is estimated that the 1987 speed-limit increase resulted in a 41% increase in rural interstate crashes in Arizona. There is no statistical evidence that the 1995 speed-limit increase has any additional effect on the number of crashes.”
That is, failure to reject the null hypothesis of zero effect at the 10% level of significance was equated with the absence of statistical evidence for an increase in the expected number of crashes. In all these cases, 0.0 was entered in the table. Thus, the table contains two kinds of entries: either estimates of percentage change when the increase was statistically significant, or 0.0 by NHST convention but unsupported by either data or prior-knowledge when the increase was not statistically significant.
The article is behind a paywall. Feel free to message me for a copy of it.
Correcting the Lies that Data Tell
Found this interesting paper from a couple years ago: Detecting and Correcting the Lies that Data Tell by F Schmidt.
I won't pretend to understand the meat of the paper, where Schmidt argues that correcting for sampling error and measurement error will decrease variability (got that) which strengthens correlations (???). However the author makes some other interesting points further along in the paper. There's a useful section on sources of error:
My initial example illustrates the two artifacts that are always present in any literature. But there are others that are often, but not always, present, such as data errors, range restriction, dichotomization of measures, and imperfect construct validity. Data errors—typos, coding errors, transcription errors, etc.—have been shown to be very prevalent (Hunter & Schmidt, 2004, pp. 53–54). This is a nonsystematic source of variability, like sampling errors. Unless they result in extreme or impossible outliers, data errors are hard to identify and therefore difficult or impossible to correct.
Unlike data errors and sampling errors, range restriction is a systematic artifact. Range restriction reduces the mean correlation. Also, variation in range restriction across studies increases the between-study variability of study correlations.
... Another artifact is caused by dichotomization. Researchers often dichotomize continuous measures into “high vs. low” groups. This practice not only loses information but also lowers correlations and creates more variability in findings across studies (Cohen, 1983; Hunter & Schmidt, 1990a, 2004; MacCallum, Zhang, Preacher, & Rucker, 2002). The distorting effects of dichotomization are correctable in a meta-analysis.
The final additional artifact I want to mention is imperfect construct validity in measures. Even after correction for measurement error, the measure may correlate less than perfectly with the desired construct (Schmidt, Le, & Oh, 2009). This is especially true when proxy measures are used (for example, use of education as a proxy for general mental ability). Degree of construct validity may vary across studies, causing between-study variability and typically lowering the mean. Correction for this requires special information, is complicated, and is often not possible (Hunter & Schmidt, 2004).
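The dichotomization claim in the passage above is easy to see in a simulation. This sketch is my own illustration, not taken from the paper: it draws two normally distributed measures with a true correlation of 0.5, median-splits one of them into "high vs. low", and compares the correlations. The helper `pearson` and the constants are hypothetical names chosen for this example.

```python
import random
from statistics import median

random.seed(2)  # fixed seed so the run is reproducible

RHO = 0.5       # true correlation between the two continuous measures
N = 20_000      # simulated subjects


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


x = [random.gauss(0, 1) for _ in range(N)]
y = [RHO * xi + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1) for xi in x]

# Median split: replace the continuous x with a "high vs. low" indicator.
cut = median(x)
x_split = [1.0 if xi > cut else 0.0 for xi in x]

r_cont = pearson(x, y)
r_split = pearson(x_split, y)
print(f"continuous x:   r = {r_cont:.3f}")
print(f"median-split x: r = {r_split:.3f}  (ratio {r_split / r_cont:.2f})")
```

For normally distributed data a median split shrinks the correlation to about 80% of its original value, which is the kind of systematic loss the quoted passage says a meta-analysis can correct for.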
The author also reports his own survey of how fixed-effects vs random-effects models are used in published meta-analyses:
Fixed effects (FE) meta-analysis models assume a priori that there is only a single population parameter underlying all studies. That is, FE models assume that all variation across studies is due solely to sampling error and that therefore none of the variation is due to real differences between studies in underlying parameters. This a priori assumption is highly questionable in most cases. RE models, by contrast, treat this assumption as an hypothesis and test it—allowing the researcher to see whether or not all variance is accounted for by sampling error and other artifacts.
... My colleagues and I recently examined the meta-analyses in this journal (Schmidt, Oh, & Hayes, 2009) and found that a total of 199 meta-analyses were published in Psychological Bulletin between 1978 and 2006. Of the 169 that could be classified as either FE or RE models, 79% (129) used only FE models. Figure 7 shows these findings.
A reanalysis of data from five of these FE meta-analysis studies (containing a total of 68 different meta-analyses) showed they seriously underestimated the width of the CIs they reported by an average of 52%. That is, the CIs were only half as wide as their real width, a gross overestimation of the precision of the findings. On average, the CIs reported as 95% CIs were actually 55% CIs (Schmidt, Oh, & Hayes, 2009).
What about corrections for measurement error? We found that 180 of the 199 published meta-analyses (90%) did not correct for measurement error—which, as noted earlier, is always present! Nor did they correct for the other artifacts I discussed. Figure 8 shows these findings.
I'm going to do a bit of digging in the references and see if I can find a decent explanation for how the authors' adjustment for measurement error worked. If I find anything useful, I'll update this post.
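In the meantime, for reference: the classical correction for measurement error is Spearman's correction for attenuation, which divides the observed correlation by the square root of the product of the two measures' reliabilities. I can't confirm that this is exactly the procedure used in the paper, so treat the sketch below as the textbook formula only; the function name and the example numbers are made up for illustration.

```python
def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation (textbook formula).

    Reliabilities are between 0 and 1; a perfectly reliable measure has 1.0.
    """
    return r_observed / (reliability_x * reliability_y) ** 0.5


# Hypothetical numbers: observed r = 0.30, reliabilities 0.80 and 0.70.
corrected = disattenuate(0.30, 0.80, 0.70)
print(f"observed r = 0.30, corrected r = {corrected:.2f}")
```

Because reliabilities are at most 1, the corrected correlation is never smaller than the observed one, which is one way to read the claim that correcting for measurement error strengthens correlations.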

Misuse of models overstates significance
A brief comment points out a major flaw in a published study, in part by using simulations:
Our reanalysis illustrates two important points. First, studies of complex trait genetics require very large sample sizes to be well powered, and a study reporting a highly significant association from such a small sample size should elicit skepticism. Second, unnecessarily complex statistical methods can obscure rather than reveal the truth. The simplest analysis is sometimes the best.
Overfitting
I found this 2004 article through the footnotes of Nate Silver's book. Here are the highlights (but I recommend reading it all the way through).
There has been something of a revolution in data analysis in the past 10 or so years. Modern computational power has not only made it easier to solve complex and large analytic problems but also allowed us to study, through a technique called simulation, the very act of collecting data and performing analyses... These simulation studies have taught us a great deal about the scientific merit of some of our conventions in data analysis and also have pointed toward new directions that may improve our practice as researchers.
The author, Babyak, draws attention to the problem of overfitting, detailing a simulation which shows that one needs 13 samples per predictor, if not more. Babyak also cites a 1991 paper which suggests a base of 50 samples plus 8+ more per predictor, but warns that even large sample sizes may be compromised if effect size is small or if predictors are correlated.
The author calls out four problematic methods/issues:
Automated Stepwise Regression: An automated way to sift through many predictors and select only the best combination. Babyak warns this practice should never be acceptable.
Univariate Pretesting or Screening: Similar to the above, only in this case one manually looks for correlations between predictors and responses. This explicit p-value testing is not the only place where researchers unwittingly expend degrees of freedom. "Faraway demonstrated that these phantom degrees of freedom actually arise in all sorts of unexpected places, such as examining residuals for homogeneity of variance, testing for outliers, or making transformations to improve power, to name a few, underscoring the principle that virtually any data-driven decision about modeling will lead to an overly optimistic model."
Dichotomizing Continuous Variables: "The common practice of dichotomizing 2 continuous variables and using them as factors in an ANOVA will yield an unacceptable Type I error rate when those 2 original variables are even moderately correlated. Because ANOVA is just a special case of the general linear model, this problem also will haunt us in the multivariable regression situations."
Multiple Testing of Confounders: Just as with other forms of multiple comparisons, this is using degrees of freedom. "The real problem here is that unless you have been very careful to account for expended degrees of freedom, you will not have any way of knowing the extent to which the apparent confounder is a real confounder or just caused by the play of chance sampling."
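The screening problem in the list above can be sketched with a quick simulation. This is my own toy setup, not one from the article: 30 subjects, 20 candidate predictors, and a response that is pure noise, so every 'significant' predictor is a false positive. The cutoff 0.361 is the |r| needed for two-sided p < 0.05 with 28 degrees of freedom.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

N = 30          # subjects per simulated data set
K = 20          # candidate predictors, all pure noise
SIMS = 1000     # simulated data sets
R_CRIT = 0.361  # |r| giving two-sided p < 0.05 when n = 30 (df = 28)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def best_abs_r():
    """Screen K noise predictors against a noise response; keep the best |r|."""
    y = [random.gauss(0, 1) for _ in range(N)]
    return max(
        abs(pearson([random.gauss(0, 1) for _ in range(N)], y))
        for _ in range(K)
    )


hits = sum(best_abs_r() > R_CRIT for _ in range(SIMS))
false_positive_rate = hits / SIMS
print(f"data sets where the 'best' predictor looks significant: {false_positive_rate:.0%}")
```

With 20 independent chances at the 5% level, about 1 - 0.95^20, or roughly 64%, of pure-noise data sets produce at least one predictor that passes the screen.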
The fixes offered are mostly straightforward: Collect more data. Combine predictors into an index, if the predictors are related. If a predictor is well known, model it as a constant. And finally, Babyak advises adjusting your result using shrinkage techniques.
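As one small, familiar example of shrinkage, here is the standard adjusted R² formula, which discounts the apparent fit according to how many predictors were used relative to the sample size. It may well be simpler than the techniques the article recommends, and the example numbers are made up.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R-squared: shrink R^2 for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical model: R^2 = 0.50 from 10 predictors.
print(f"n = 30:  adjusted R^2 = {adjusted_r2(0.50, 30, 10):.2f}")
print(f"n = 300: adjusted R^2 = {adjusted_r2(0.50, 300, 10):.2f}")
```

The same apparent fit shrinks a lot with 3 samples per predictor and hardly at all with 30 per predictor, which lines up with the advice to collect more data.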