Because there are multiple levels of definitions of "science," people with very diverse levels of education and intellectual development can legitimately call themselves "scientists," and many do. Unfortunately, many of these "scientists" feel the need to conduct scientific research, even though they may lack the education and/or the intellect to do so. They are a "scientist" and therefore it seems that any research they do is, by definition, "scientific" research. In many cases, I'm not sure an understanding of statistics and experimental design is even expected.
Like everything else, there are good scientists and bad scientists, good research and bad research, even "good science" and "bad science". People (even scientists, "scientists", researchers and "researchers") don't operate in a vacuum - it all boils down to scrutiny. Normally corporate management, the academic food-chain, market forces or peer-review would/should weed out most of the folly.
As I said, that's what it is to me. I've never encountered anything different in practice, but that's just within my own narrow scope. I'm sure there are many other things in practice that I've never been involved in. If people have been using t-tests and p-values for QA or QC or similar activities, I know nothing of this. I was responding strictly to the statement that scientific journals were "moving away from" p-values. If they are moving away from them for QA or QC, that's another matter entirely, and outside of my scope.
In that context "moving away from" doesn't mean simply ceasing to use something without suggesting an alternative. It means that instead of hypothesis tests and p-values there are now other concepts and methods.
Again, the context in which "moving away from" p-values was cited was in scientific journals. If an alternative was suggested, it would seem to have been "preference toward Statistical Process Control," but that doesn't make sense to me as an alternative one is likely to find in scientific journals. And really, journals can't move away from p-values, because p-values don't come from journals. They come from the people who design the studies and analyze the data in the articles submitted to the journals. Peer reviewers just review them.
"We have little evidence on the effectiveness of peer review, but we have considerable evidence on its defects. In addition to being poor at detecting gross defects and almost useless for detecting fraud, it is slow, expensive, profligate of academic time, highly subjective, something of a lottery, prone to bias, and easily abused." -- Richard Smith, MD, former editor of the British Medical Journal Peer review: a flawed process at the heart of science and journals
I don't think science makes anything work. I think it provides information that people can use to make things work. If things don't work, it could be because the information that science provided was wrong, but there are lots of other reasons for things not to work, including how people use (or misuse) the information provided by science.
I also don't think science defines "works." That is the domain of quality. Depending on how you define it, the same stuff working in the same way could be said to work 0% of the time, 100% of the time, and everything in between.
The American Statistical Association - the premier professional organization for statisticians - has come out against the p value. YES, there is an alternative, in fact there are several: start by reading Deming's "On Probability as a Basis for Action". It's free. My resource which started this whole thread also describes a powerful alternative: graphs and probability.
I and my 'students' have solved hundreds if not thousands of very complex problems and never calculated a p value or performed a null hypothesis test. We have applied the same approaches to new product development - quite successfully. (caveat: we do have to report p values to our regulatory agency, but you know, it's the government...)
First, understand that the whole null hypothesis / p value thing is a ritual: actions one takes without thought, just because it's always been done that way.
The whole approach came about by mashing together the disjointed thoughts of diametrically opposed statisticians: Fisher on one side, and Neyman and Pearson on the other.
What the p value is NOT:
•The probability that there is no difference
•The probability that there is no effect due to the suspected cause
•The probability that the observed difference was produced by simple chance
•The probability of getting the observed difference if there really is no difference
The p value is the probability of results that conflict with the assumption of no difference by as much as or more than the observed results, IF all of the assumptions were true.
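That definition can be made concrete with a small simulation. The sketch below runs a permutation test on two hypothetical groups of measurements (the numbers are illustrative, not from this thread): under the assumption of no difference, the group labels are exchangeable, so we reshuffle them many times and count how often the shuffled difference is as large as or larger than the observed one. That fraction is the p value.

```python
# Minimal sketch of what a p value measures, via a permutation test.
# The data are made up for illustration.
import random

random.seed(1)

group_a = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]  # hypothetical measurements
group_b = [5.4, 5.6, 5.2, 5.5, 5.3, 5.7]

n = len(group_a)
observed = abs(sum(group_a) / n - sum(group_b) / n)

pooled = group_a + group_b
trials = 10_000
extreme = 0
for _ in range(trials):
    # "No difference" assumption: labels carry no information, so shuffle them
    random.shuffle(pooled)
    diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / n)
    if diff >= observed:  # "as much as or more than the observed results"
        extreme += 1

p_value = extreme / trials
print(f"p = {p_value:.4f}")
```

Note that the simulation only answers "how surprising is this difference IF all the assumptions hold" - it says nothing about whether the assumptions themselves hold.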
A low p value indicates only that at least one of the following assumptions is probably not true:
•No real difference exists
•The data are homogeneous
•The selected distributional model is correct for the data
•The test statistic was correct for the data
•The data were random: the trials were not confounded or biased
You see, the assumptions (requirements) matter. Simply saying the p value is less than .05 (a limit that Fisher pulled out of his back pocket with little to no thought at the dawn of statistics as a profession) without detailing the study design, including the sample sizes, and the underlying science is tantamount to scientific malpractice.
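To see why the assumptions matter, here is a sketch (again with made-up numbers) of one untreated process whose output slowly drifts over time. Comparing the first half against the second half produces a small p value even though nothing was done to the process: the "data were random / homogeneous" assumptions are violated, so the p value flags the drift, not any real effect.

```python
# Sketch: a violated assumption produces a small p value with no real effect.
# Twenty measurements from ONE untreated process with a slow drift.
import random

random.seed(2)

# Drift of 0.05 per step plus measurement noise - no treatment anywhere
data = [0.05 * i + random.gauss(0, 0.1) for i in range(20)]
first, second = data[:10], data[10:]

observed = abs(sum(first) / 10 - sum(second) / 10)

# Permutation test assumes observations are exchangeable - here they are not
pooled = data[:]
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:10]) / 10 - sum(pooled[10:]) / 10)
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(f"p = {p_value:.4f}")  # small - yet no "cause" ever acted on the process
```

Without knowledge of the study design and the underlying process, a reader of this p value would wrongly conclude a real difference exists.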
As a friend of mine once said: "Statistics without physics is gambling. Physics without statistics is psychics." The lack of appropriate study designs and reliance on the mythological p value is what results in coffee being bad for you today and good for you tomorrow.
I come back to...for me, it is all about generalizing from a sample to a population.
Whatever sample was used as the basis for this generalization, it was not representative of the whole population of users (or uses) of "the whole null hypothesis a p value thing." I'm inclined to think it is representative of a large population, probably one that is defined by the sample (rather than the sample being selected from a defined population).
As Ronen E has noted, deep thinking has never been too popular, which to me means it is something else that never "caught on," i.e., remained practiced in isolated pockets, rather than becoming widespread. There are certainly those practitioners who give deep thought to the experimental designs and statistical techniques they choose to address their objectives, rather than doing it the way it's always been done. I fear they are rather few and far between, though.
I will add that I think Quality requires exceptionally deep thinking, where cookbook QA and QC, like cookbook everything else, do not. That is pretty much the whole point of a cookbook.
Ah but what you are ignoring is that it isn't that at all. There is no generalization going on. With very few exceptions in scientific endeavors or in industrial quality we rarely just try to describe a population from a sample (what Deming called an enumerative study). What most of us attempt to do most of the time is to predict (what Deming called an analytic study). In science opinion doesn't matter at all; perhaps in politics, but not in science. Science will always win, so I am not really sure of the point you are trying to make. Certainly there are many people who choose to not learn or think, but the masses are no excuse for not doing the right thing.
In most cases the "QA professional" is only required to apply (or put into use) techniques that rely on complex mathematical theory. True, some understanding is required for proper selection and implementation of these techniques, but not necessarily the complex mathematical foundations. Just like engineers many times successfully (and correctly) apply techniques whose mathematical derivation they are unable to fully understand - these are sometimes simply too complex to practically master, and it's also unnecessary from an outcome perspective. The most important point is to not lose sight of the techniques' limitations and underlying assumptions. Letting it slip is of course too easy - one has to actively and stubbornly fight for the maintenance of the latter, and this is where we usually fail.
Another issue is failing to recognise that "QA professional" doesn't equal "Jack of all trades". "People of very diverse levels of education and intellectual development" should not be drawing up experimental designs based on higher-than-undergrad-level statistical theory. Professional statisticians are there for that. Just like the average "QA professional" consults the plastics expert when they have issues with a plastic raw material, rather than diving into datasheets and chemical formulations. So maybe the problem is in the formal scoping (and internal classification) of the QA profession.
In the first paragraph you say that the "QA professional" must only apply techniques without necessarily understanding the foundations of them, then in the second paragraph you say, "'People of very diverse levels of education and intellectual development' should not be drawing up experimental designs based on higher-than-undergrad-level statistical theory. Professional statisticians are there for that." Leaving aside the apparent contradiction, you seem to think that the widespread practice of plugging numbers into Minitab and accepting the results uncritically is an acceptable state of affairs. If you're pretty sure that something works without a clue as to why it works, sooner or later something bad will happen.
In any event, the fact is that people in QA are expected to use statistical analysis without any indication that the expectations are based in reality. Just do it, as they say.