Hot take: statistical significance is very often misused in (marketing) experiments. A hundred years ago someone wrote that we should all follow the magical P-value < 0.05, and we are all just still rolling with it. Yes, really. Ronald Fisher wrote in his book “Statistical Methods for Research Workers” in 1925:
“Personally, the writer prefers to set a low standard of significance at the 5 percent point, and ignore entirely all results which fail to reach this level.”
And that’s it! Some gentleman from 1925 liked <0.05 (<5%) as the ideal value and told us to ignore any data that does not meet that threshold.
I’ve seen tons of marketing hypotheses rejected because the data was “insignificant”. In other words: not meeting that golden standard of P < 0.05. It’s even built in as a standard for experiments in ad platforms like Google Ads.
The problem is that it has become binary, and binary on a quite arbitrary cut-off point. Instead, it should be a spectrum of how certain you are about your results. Borrowing from the American Statistical Association’s point that decisions should not hinge on whether a p-value crosses one specific threshold: would you really refuse to deploy a winning treatment because the chance of seeing a lift this large by pure luck is 6% (not stat sig) rather than 4% (stat sig)? The reality is that there’s very little difference between P = 0.04 and P = 0.06. Yet good experiment outcomes get killed in marketing because they don’t meet P < 0.05.
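To make the arbitrariness concrete, here is a minimal sketch of a standard two-proportion z-test on a hypothetical A/B test (the conversion counts are made-up numbers, not real campaign data). A difference of just five conversions out of 10,000 visitors per arm is enough to push the result from one side of the 0.05 line to the other:

```python
# Minimal sketch: two nearly identical A/B outcomes, one just under and one
# just over P < 0.05. The conversion counts below are hypothetical.
from scipy.stats import norm

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * norm.sf(abs(z))                        # two-sided p-value

# Control: 500 conversions out of 10,000 visitors in both comparisons.
print(two_proportion_p_value(500, 10_000, 565, 10_000))  # ~0.04 -> "significant"
print(two_proportion_p_value(500, 10_000, 560, 10_000))  # ~0.06 -> "insignificant"
```

Five conversions separate a “deploy it” verdict from a “reject the hypothesis” verdict, while the evidence for the treatment is essentially the same in both cases.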
Sure, you want to be really, really certain about your experiment when you’re developing a new medicine, to make sure it’s not actually going to kill people. But Senator, we run ads.