Just pointing out two things that many people don’t know about statistics and that I think are helpful when judging a larger body of scientific literature and listening to non-experts like RFK Jr.
First, the term “statistically significant” does not mean “big” or “meaningful”. It means “unlikely to be due to chance alone”, where “unlikely” is defined by the researcher, typically as a low-ish threshold like “with a probability of at most 5%”. This is also the threshold that researchers use when they compute a 95% confidence interval, like in the paragraph quoted above.
Second, with a 5% threshold, studies investigating the same phenomenon (like the effect of radio waves on rabbits) have a 5% chance of finding a statistically significant effect even if that effect does not exist. As a consequence, scientists don’t regard one study (or even several) out of a large number of studies finding something statistically significant as proof of a phenomenon. Instead, they require that the finding be replicated in independent replication studies (ideally ones conducted with a pre-registered protocol and a much larger sample). There’s a quick simulation of this false-positive rate sketched below the comic.
Relevant xkcd, because of course there is one:
Image source: xkcd (no. 882)
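To make that 5% base rate concrete, here’s a minimal simulation sketch in Python (assuming numpy and scipy are installed; all numbers are made up for illustration). It runs many “studies” of an effect that genuinely does not exist and counts how many of them clear the p < 0.05 bar anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

n_studies, n_per_group, alpha = 10_000, 50, 0.05
significant = 0

for _ in range(n_studies):
    # Both groups are drawn from the SAME distribution: the true effect is zero.
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(control, treatment)
    if p < alpha:
        significant += 1

# Roughly 5% of the null studies come out "significant" by chance alone.
print(f"{significant / n_studies:.1%} of studies found a non-existent effect")
```

Run it and you should see a number close to 5.0%, which is exactly the jelly-bean problem the comic is poking at.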
Exactly. From the posts that were here when I made mine, I got the impression that people had never even heard that this is something that has been “debated” in the scientific community, and thought it was just some random idea RFK Jr had himself.
IIRC there was real worry in the late 90s about the thermal effect of the mobiles of that era, which in the 00s morphed into concerns about electromagnetism and the blood-brain barrier instead. The thing that really gives away the lack of scientific backing among those still pushing this is that they keep blaming each newer generation of mobile network (“4G bad vs 3G!”, “5G bad vs 4G!”), when in reality every newer network standard uses less penetrating energy and, because towers are spaced much closer together, the phone also transmits less energy overall.
NMT back in the 80s however? I’d probably be somewhat cautious today tbh.
Yep. Also, famously, a statistics/psychology professor was once quoted as saying that the only reason you don’t find a statistically significant difference is that you’re “too damn lazy to drag enough people in.” The larger the sample size, the smaller the difference needed to hit that 5% mark. So if you aren’t “lazy”, you can just add more folks to your study and be more likely to find a “significant difference” that you can then publish.
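That “just drag in more people” effect is easy to see on the back of an envelope. Here’s a small Python sketch (my own illustrative numbers, assuming both groups have a standard deviation of 1) of the smallest group-mean difference that just reaches p < 0.05 in a two-sided two-sample z-test as the per-group sample size n grows:

```python
import math

Z_CRIT = 1.96  # two-sided 5% threshold for a normal test statistic

for n in (25, 100, 400, 1_600, 10_000):
    # Standard error of the difference between two group means (SD = 1 each)
    se = math.sqrt(2 / n)
    print(f"n = {n:>6} per group -> ~{Z_CRIT * se:.3f} SDs is already 'significant'")
```

Quadruple the sample and the detectable difference halves, so with enough participants even a microscopic difference crosses the line.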
My statistics professor would rerun experiments that hit the 5% (p < 0.05) mark until the p-value came in below 0.001 or 0.005, just to waggle his dick at others by saying his findings were a lot more reliable than theirs.
I must confess to not understanding your anecdote here. Pure chance might give you a p<0.05 when your sample size is low - but that disappears as the sample size grows larger.
I don’t want to dig out the actual math, because god knows the formulas are hard enough to scribble freehand, but as you add more samples, the difference between your null hypothesis and your sample average that is needed to establish p < 0.05 shrinks. Let’s just use made-up numbers: if a sample of 100 people with a difference of 5 units from the null hypothesis has a p-value of 0.1, then a sample of 10,000 with a difference of only 0.7 units might already have a p-value of 0.02. In the quote (which I can’t seem to find now), the essential wisdom is that if you dragged in enough samples, you could always find a statistically significant difference, because your null hypothesis would never be exactly true, so even the smallest of differences would generate a low p-value. That’s why, whenever you see a p-value, you should see an effect size estimate nearby, such as Cohen’s d.
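Here’s a minimal demonstration sketch of that last point in Python (numpy/scipy assumed; the `cohens_d` helper and the 0.05 effect size are mine, purely for illustration). The true effect is held fixed and tiny while n grows, so you can watch the p-value collapse while Cohen’s d stays put:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # reproducible; all numbers are illustrative

def cohens_d(a, b):
    """Cohen's d: mean difference scaled by the pooled standard deviation."""
    pooled_var = (a.var(ddof=1) + b.var(ddof=1)) / 2
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# A genuinely tiny effect: the group means differ by 0.05 standard deviations.
for n in (100, 10_000, 1_000_000):
    control = rng.normal(0.00, 1.0, n)
    treatment = rng.normal(0.05, 1.0, n)
    _, p = stats.ttest_ind(treatment, control)
    print(f"n = {n:>9}: p = {p:.2e}, Cohen's d = {cohens_d(treatment, control):.3f}")
```

The “significant” label flips on purely because of n; the effect size is what tells you the difference is still trivially small.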
Here’s a paper outlining some of this in much better words than mine.
Thank you for the link - that’s a very interesting paper. I’ve taken Statistics twice (two different engineering degrees) and still need to reread that a few times to “get it”!