Deming vs. Statistical Hypothesis Testing

Marc · Jul 3, 2000

Newsgroups: misc.industry.quality
Subject: Re: Statistical Hypothesis Testing
Date: Mon, 17 Apr 2000 01:33:56 GMT

Greetings John,

You said:
> If you have the time could you
> indicate roughly what
> Deming's main objections were?

Deming said: "The student should avoid passages in books that treat confidence intervals and tests of significance, as such calculations have no application in analytic problems in science and industry." (W. Edwards Deming, Out of the Crisis, page 639.)

Deming was advocating a doctrine, still current among statistical/management gurus, that distinguishes between analytic methods and enumerative methods. I can't make sense of it myself, even though I have tried. Advocates of that doctrine classify statistical hypothesis testing as an enumerative method - the kiss of death. I consider the analytic versus enumerative thing to be a false dichotomy.

Also in "Out of the Crisis", Deming said: "Analysis of variance, t-test, confidence intervals, and other statistical techniques taught in the books, however interesting, are inappropriate because they bury the information contained in the order of production." (W. Edwards Deming, Out of the Crisis, page 132.)

Lastly, in "Out of the Crisis", Deming said: "... But a confidence interval has no operational meaning for prediction, hence provides no degree of belief in planning." (W. Edwards Deming, Out of the Crisis, page 132.)

I can't refrain from writing one last quote, this one from Ernest Hemingway: "In order to be a great writer a person must have a built-in, shockproof crap detector."

Sincerely, Stan Hilliard

=============
In article xxxx,
"John Duffus" wrote:
> This is very interesting. If you have the time could you indicate roughly
> what Deming's main objections were? What did he propose in its place?
> I can see that it could be misused if the user does not have a clear idea of
> the difference between statistical significance and practical
> significance, or how to gather the data, but that is not the fault of the
> technique.
> Regards
> John Duffus
> shilliard wrote
> > Greetings Kelly,
> >
> > You said:
> >
> > > Not many other engineers use
> > > this tool and I wonder why not.
> > > Is it that it is not a well-known
> > > technique or is it that it is
> > > just not well understood.
> >
> > Hypothesis testing was widely used up to about 20 years ago. Then along
> > came W Edwards Deming to tell the world to stop. So they stopped.
> >
> > I am an engineer, retired after a long career using and teaching
> > applied statistical methods. I believe that hypothesis test methods are
> > too important to engineering work to be left to the statisticians. I am
> > so inspired by the power of hypothesis testing and its derivative tools
> > for engineering, manufacturing, process improvement, and quality
> > assurance that I have continued working on my own niche, which you can
> > see at
> >
> > www.samplingplans.com
> >
> > I have seen a dumbing down of hypothesis testing education and
> > practices over the last 20 years. It started with the attack on
> > hypothesis testing and its derivative methods by W Edwards Deming and
> > his followers. I think that the situation can be turned around if
> > engineers learn to view hypothesis testing as an engineering method -
> > rather than an import from "statistics". A person well grounded in the
> > engineering heuristics paradigm can make much more successful use of
> > hypothesis testing in his/her work than a statistician can.
> >
> > Sincerely, Stan Hilliard
> > CQE,CQA,CRE,PE
> >
> > ===========
> > In article xxx,
> > "Kelly Speiser" wrote:
> > > I am an avid user of statistical hypothesis testing for helping me to make
> > > decisions in quality. I use it for deciding if a corrective action worked,
> > > should I recalculate control limits on a process (or not), examining the
> > > differences between processes, vendors and much much more. I think it's
> > > great and have written a book about it.
> > >
> > > Not many other engineers use this tool and I wonder why not. Is it that it
> > > is not a well-known technique or is it that it is just not well understood.
> > > I'd like to hear from those that use it and those that don't use it and
> > > their reasons.
> > >
> > > Thanks for the discussion.
> > > --
> > > K. S. Speiser

Marc · Jul 3, 2000

From: shilliard
Newsgroups: misc.industry.quality
Subject: Re: Statistical Hypothesis Testing
Date: Mon, 24 Apr 2000 04:01:31 GMT
Organization: Deja.com - Before you buy.

Greetings John,

Here is a snippet from page 132 of "Out of the Crisis" that might clarify what Deming meant by analytic techniques.

"...analytic problems - planning for improvement of tomorrow's run, next year's crop"

Later, on the same page, he continues his attack on hypothesis tests by criticizing the concept of statistical significance:

"Degree of belief cannot be quantified as 0.8, 0.9, 0.95, 0.99. So-called probability levels of significance between method 1 and method 2 do not provide any measure of degree of belief for planning -- i.e., for prediction."

MY ANALYSIS -- The level of significance (alpha) is the complement of the numbers Deming lists: (1-0.8=0.2), (1-0.9=0.1), (1-0.95=0.05), (1-0.99=0.01).

More importantly, I believe that Deming's comment about the concept of significance is a red herring. His "degree of belief" is vague, whereas "statistical significance" is a precise scientific concept. The significance level (alpha) is the probability of a type 1 error: rejecting the null hypothesis (H0) when it is true.

I think that the most precise way to describe statistical significance is with an "if-then" statement. That is, IF the null hypothesis is true, and you were to perform the hypothesis test repeatedly (using a new sampling of data from the same population each time), THEN H0 would be rejected alpha proportion of the time.
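The "if-then" reading of significance can be checked with a small simulation. The sketch below is only an illustration: the z-test with known standard deviation, the sample size, the trial count and the seed are all assumptions chosen for convenience, not anything specified in the post.

```python
import random
from statistics import NormalDist, mean

def rejection_rate(alpha=0.05, n=10, trials=4000, seed=1):
    """Fraction of repeated tests that reject H0 when H0 is in fact true."""
    rng = random.Random(seed)
    # Two-sided critical value for a z-test at the chosen alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # New sample from the same population each time; H0 (mean = 0) is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / n ** 0.5)  # known sigma = 1 (assumption)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = rejection_rate()
print(f"H0 rejected in {rate:.1%} of repeated tests (alpha = 5%)")
```

The printed proportion should land close to alpha, which is all the definition promises; it says nothing about how likely H0 is to be true.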

MY CONCLUSION -- This is not a trivial issue because Deming and his disciples have influenced the management of many corporations in this matter, who in turn influence what training is available to their engineers.

Deming apparently invented his own theory of enumerative versus analytic studies and used it to explain what was wrong with hypothesis testing. I don't think he provided any data to support his claim. I don't think he demonstrated his point.

Sincerely, Stan Hilliard

=========
In article ,
jduffus wrote:
> Thanks Stan.
> Stranger and stranger. I am going to try to borrow a copy of "Out of
> the Crisis" to see what the context of Deming's statements was. Do you
> have a reference to a work that expounds this analytic/enumerative
> doctrine?
> Regards, John Duffus
>

> In article ,
> shilliard wrote:
> > Greetings John,
> >
> > You said:
> > > If you have the time could you
> > > indicate roughly what
> > > Deming's main objections were?
> >
> > Deming said: "The student should avoid passages in books that treat
> > confidence intervals and tests of significance, as such calculations
> > have no application in analytic problems in science and industry."
> > (W. Edwards Deming, Out of the Crisis, page 639.)
> >
> > Deming was advocating a doctrine, still current among
> > statistical/management gurus, that distinguishes between analytic
> > methods and enumerative methods. I can't make sense of it myself, even
> > though I have tried. Advocates of that doctrine classify statistical
> > hypothesis testing as an enumerative method - the kiss of death. I
> > consider the analytic versus enumerative thing to be a false dichotomy.
> >
> > Also in "Out of the Crisis", Deming said: "Analysis of variance,
> > t-test, confidence intervals, and other statistical techniques taught in
> > the books, however interesting, are inappropriate because they bury the
> > information contained in the order of production." (W. Edwards Deming,
> > Out of the Crisis, page 132.)
> >
> > Lastly, in "Out of the Crisis", Deming said: "... But a confidence
> > interval has no operational meaning for prediction, hence provides no
> > degree of belief in planning." (W. Edwards Deming, Out of the Crisis,
> > page 132.)
> >
> > I can't refrain from writing one last quote, this one from Ernest
> > Hemingway: "In order to be a great writer a person must have a
> > built-in, shockproof crap detector."
> >
> > Sincerely, Stan Hilliard

Don Winton · Jul 5, 2000

Personally, I find it hard to believe Dr. Deming actually objected to significance tests. He was, after all, a trained statistician. He also advocated having at least one trained statistician on the staff of organizations (I read that somewhere; do not remember where). But just as any tool used incorrectly can ruin the effort (a screwdriver as a chisel, for example), significance tests can ruin information (t-testing on a 10-second-cycle mold machine in a production environment).

During the days when the modern quality movement was born, virtually everyone saw statistical quality control as a savior. Everyone and their brother (sister) wanted everyone else trained in SPC. The problem was, while everyone knew it worked, few had any idea WHY or HOW it worked. Thus, charts and graphs were everywhere. People began applying these concepts to everything from production runs to 1st article manufacturing. The right tool for the right job?

I guess what I am saying is this:

Significance tests could be used just as effectively as the <insert SPC flavor of the month> chart. But, in which would you rather have your employees trained? IMHO, a lot of employees in the SPC branch and a few (or one) in the statistical branch.

An example may be this: While qualifying a machine for production, a first-run sample of 20 parts was produced and determined to have an acceptable process capability, say 1.0 for example. This particular machine was an injection molding machine using a four-cavity mold. All was declared well and good with the world. But, shortly after entering production, the charts were all over the place. Cycle time was adjusted, material changed, temperatures adjusted, but to no avail. The original 20 parts were measured again. Sure enough, the data were the same.

The staff quality engineer happened to be in the lab and asked, "Is there a difference between the cavities and/or the cycle data?" This question was treated with all the enthusiasm of Oliver Twist asking for more soup. "A little, but not much," was the reply. The data were presented to the quality engineer and examined. While the differences between cycles were not significant, the differences between cavities were!

The above story has flaws, but it presents a point. Without significance testing, used correctly, SPC may not be adequate.
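As a rough sketch of the kind of check the quality engineer might have run, the code below computes a one-way ANOVA F statistic twice on the same 20 parts: once grouped by cavity and once grouped by cycle. The measurements are invented for illustration, and running two separate one-way analyses (rather than a single two-way analysis) is a simplification. The critical values are the usual 5% table entries for F(3, 16) and F(4, 15).

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: each row is one cycle, each column one of the 4 cavities.
parts = [
    [10.01, 10.03, 9.99, 10.14],
    [10.00, 10.02, 10.00, 10.16],
    [9.99, 10.01, 9.98, 10.15],
    [10.02, 10.02, 9.99, 10.13],
    [10.00, 10.03, 9.98, 10.15],
]

by_cycle = parts
by_cavity = [list(col) for col in zip(*parts)]  # transpose rows into columns

f_cavity = one_way_f(by_cavity)
f_cycle = one_way_f(by_cycle)

F_CRIT_CAVITY = 3.24  # table value, alpha = 0.05, df = (3, 16)
F_CRIT_CYCLE = 3.06   # table value, alpha = 0.05, df = (4, 15)

print(f"cavities: F = {f_cavity:.1f}, significant: {f_cavity > F_CRIT_CAVITY}")
print(f"cycles:   F = {f_cycle:.3f}, significant: {f_cycle > F_CRIT_CYCLE}")
```

With data like these, the pooled 20-part sample can look acceptable while one cavity runs consistently high, which is the pattern the story describes.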

Deming was trying to deliver his message to the masses, not just a few specialists. Significance testing was for the specialists, thus not the audience of his broad message.

One other thing I would like to comment on. There are those that take Deming's words as the absolute truth. I am a Deming disciple, and even I know this is not so.

Regards,

Don

Kevin Mader · Jul 6, 2000

Don,

I believe that Deming made his comment about having a statistician on board a number of times; I can recall that it was in both Out of the Crisis and The New Economics. As you are well aware, he always promoted learning from a Master and not a hack.

I have also wondered why Deming, as a Doctor of Statistics, made his comments rejecting Tests of Hypothesis. I must admit that when I studied advanced statistics, the concepts of using one-tail/two-tail tests seemed pretty reasonable. I still think that there is plenty of value in these types of tests. However, after reading a paper by John Kitteridge on the differences between Analytical Problems and Enumerative Studies, I realized what was missing from the examples provided by Juran in his books. They did not speak about the system, the method, or predictability. In general, I believe that the masses are exposed to the superficial 5% of anything and begin to believe it, regardless of what the other 95% of the information says. We are eager to jump to conclusions and Deming was aware of this folly.

With this in mind, when I read the passages of Out of the Crisis quoted in Marc's post, I keep in mind that Deming was probably making these statements realizing that the masses do not have the knowledge or understanding of the essence behind the curves and tails. I know I have a limited understanding. I also imagine that a PhD in Statistics is probably in a better position than I am to make the assessment. I am a Hack by Deming's standards (but happily a work-in-progress)!

Here is something that always hits a nerve: baseball statistician. Could there be a more severe bastardization of the term statistician? Baseball statistics are enumerative. The furthest I would go is to say that they marginally satisfy the Law of Averages. Yet folks trust these numbers beyond their worth. Baseball managers make decisions based on the numbers. What they are doing is tampering. Some folks make desktop calendars such as the one I have, filled with tidbits of information, none of which has predictive power. Fun: yes. Useful: not very. But errors such as this type of interpretation are rampant in most walks of life. People do not really know much about a claim in the news stating that crime is down 10%. What changed? What test would one use to validate the results? Not enough is known in most cases, and we hastily jump to conclusions. It may be safe now to leave your house unlocked. Do you suppose? Who will win the election? What does the data support? How about the latest Census? I believe Deming may have been disappointed if he were still living.

Your advice about balance and caution is as usual, sound. Balance in most everything I suppose is a good rule of thumb. Well, enough of my ramble, plus it is time to go to bed.

Regards,

Kevin

bevdaniels · Jul 6, 2000

Well, Deming was not particularly well known for his ability to articulate his thoughts very clearly in written form. Only Taguchi was worse - and he had the whole English-Japanese thing going against him : )

I'm not much better, but I'll take a shot at it. Disclaimer/Qualifier: While I have not studied Deming in great depth, I have a lot of experience in hypothesis testing in various manufacturing environments.

I think there are several points to be made about "significance" and "confidence" levels as used in t-tests, f-tests, etc.
First: the usefulness of these levels depends on the correct assumption of the underlying distribution AND the assumption of independence of the tested samples. Deming (and many others) did not feel that you could rely on a distribution. I believe he followed Fisher's (Father of DOE) lead in recommending and using basic probability concepts: combinations and permutations, which are distribution free - also known as non-parametric statistics. In my experience the use of non-parametric statistics (particularly the Tukey-Duckworth test) has much more practical usefulness in determining root cause or large changes. It's not so great at small changes - for this I would use a run chart/multi-vari to show the shift or trend over time. Then I'd apply basic run probabilities to determine the probability of the change being real or just chance. Of course, if it's not obvious from the data it's not much of an improvement!
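For readers who have not met the Tukey-Duckworth test, here is a minimal sketch of its end count. The data are made up, ties are simply not counted (a conservative shortcut), and the thresholds of 7, 10 and 13 are the commonly quoted critical values for roughly equal sample sizes; treat all of this as an illustration rather than a full implementation.

```python
def tukey_duckworth_end_count(a, b):
    """End count: non-overlapping values at the two ends of the samples."""
    # Identify which sample holds the overall maximum.
    hi, lo = (a, b) if max(a) > max(b) else (b, a)
    # If the same sample also holds the overall minimum, the count is zero.
    if min(lo) >= min(hi):
        return 0
    high_count = sum(1 for x in hi if x > max(lo))  # values above the other sample
    low_count = sum(1 for x in lo if x < min(hi))   # values below the other sample
    return high_count + low_count

# Commonly quoted thresholds: 7 (5% level), 10 (1%), 13 (0.1%).
before = [1, 2, 3, 4, 5, 6]        # hypothetical "before" measurements
after = [5.5, 7, 8, 9, 10, 11]     # hypothetical "after" measurements

count = tukey_duckworth_end_count(before, after)
print(f"end count = {count}, significant at 5%: {count >= 7}")
```

No distribution is assumed anywhere: the test only uses the ordering of the values, which is why it works well for large, obvious shifts and poorly for small ones.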

Also, too many people don't understand the true statistical meaning of significance and confidence: they apply a layman's (dictionary) interpretation to them...this is not correct. These terms have very specific, narrow meanings. The user should understand them and use them correctly.

As for the use of independent samples - I've seen that most people have little clue what this means. Many people assume that piece to piece variation is the largest component of variation and therefore assume that sequential pieces are independent. However, if lot to lot or set-up to set-up is the largest component, then sequential pieces are definitely NOT independent and the test has given a wrong answer regardless of the statistical levels chosen (garbage in, garbage out)...
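One simple way to see the dependence described above is the lag-1 autocorrelation of the pieces in production order. The sketch below uses invented numbers in which lot-to-lot shifts dominate piece-to-piece noise; the lot means and noise pattern are assumptions for illustration only.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Correlation between each value and the one produced right after it."""
    m = mean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((x - m) ** 2 for x in values)
    return num / den

# Hypothetical process: 4 lots of 5 sequential pieces each.
lot_means = [10.0, 10.3, 9.8, 10.2]            # large lot-to-lot shifts
piece_noise = [-0.02, 0.01, 0.00, 0.02, -0.01]  # small piece-to-piece variation

sequence = [lot + e for lot in lot_means for e in piece_noise]
r1 = lag1_autocorrelation(sequence)
print(f"lag-1 autocorrelation = {r1:.2f}")
```

A value well above zero means that neighbouring pieces carry much the same information, so treating the 20 pieces as 20 independent observations overstates what the sample can support.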

Deming has some good advice hidden in his statement about run order: this is the best way to determine what constitutes independent samples.

Looking at the run order of any hypothesis test also protects against spurious associations (the real root cause "lines up" with your test samples...). Any hypothesis test is particularly exposed to this if the tests are not randomized and/or the test levels are confounded with other factors. Way too common in my experience.

The last issue is that the classical t-test et al. is based on summary data and can be influenced by a single extreme data point that skews the average of one of the test levels, providing another false answer... One must still look at the individual points to validate that reality matches the "summary" data conclusion of the statistical test.
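A small illustration of how one extreme point moves the summary statistics a t-test relies on, using made-up measurements and Python's standard `statistics` module:

```python
from statistics import mean, median, stdev

def summarize(values):
    """Summary statistics that a t-test (mean, stdev) would be built on."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}

# Hypothetical test level, with and without one extreme reading.
clean = [10.1, 10.0, 9.9, 10.2, 10.0, 10.1]
with_outlier = [10.1, 10.0, 9.9, 10.2, 10.0, 14.0]

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    s = summarize(data)
    print(f"{name:>12}: mean={s['mean']:.2f} median={s['median']:.2f} stdev={s['stdev']:.2f}")
```

The median barely moves while the mean and standard deviation change noticeably, which is why looking at the individual points is still needed before trusting the test result.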

Bottom line: Always plot your data in time sequence AND look at the individual data points for nonrandom or strange patterns! (Unfortunately, most stats packages don't automatically do this - you have to deliberately do it yourself. I use Excel.)
Many very good statisticians and analysts will always recommend this - Deming wasn't the only one. Even the t-test/f-test guys do this...

Unfortunately too many "trainers" and authors spend far too little time on the above topics (there's no math in there and it's tough to get articles published that don't have lots of formulas!). They spend most of their time on the mechanics of the statistics and not on how to design the test properly and collect the appropriate data...

t-tests and f-tests will do the same job IF you plot the data properly and use your logic to interpret the results - don't just rely on the p value!

John C · Jul 6, 2000

The last three submissions seem to be on the right lines despite quotations listed earlier. Those ones seem to contradict others that I have seen. Below are quotes from notes I made, from a book about Deming (I have lost the source so I can’t stand over them absolutely). It seems to me that Deming is not against hypothesis but is concerned about statisticians having insufficient knowledge of the process and consequently, applying hypothesis to unstable processes or to data which has no statistical relevance. Here’s a sample;

‘Formulating a hypothesis and comparing against practise, is fundamental’
This doesn’t seem like a guy who wants to banish hypothesis from use.

‘understand differences between enumerative and analytical problems. Statistical Theory is vital for tests and experiments ‘which is an analytical problem’.’
‘any finite amount of data can have a model designed to fit it exactly, but it won’t necessarily repeat.’

I’m guessing that these relate to the fact that enumerative data can not predict. I don’t claim to understand it.

‘It is only in the state of statistical control that statistical theory aids prediction. An experiment in an unstable system yields data that can only be interpreted by knowledge of the subject matter.’
‘prototypes are not typical’
‘While a process is out of control, no-one can predict its capability’
‘There is no knowledge of an unstable system’

So, first you need to establish the process is stable. Can you apply hypothesis to an unstable system? Can you make a system stable by applying a hypothesis?

‘Statisticians must understand the system and how statistical theory can help optimise it.’
‘Statistical Theory, used cautiously, can...(be effective)’
‘A totally stable system is impossible’
Which is another call for caution.

All this implies that Deming is concerned about misuse of the statistics, as he is in many other areas, to the point where he 'abolishes' things. He abolishes: mass inspection, lowest tender contracts, fear, barriers, exhortations, numerical targets, appraisal, short term profits, etc. It seems to me that we should think of his ‘abolition’ of hypothesis in the same way: it’s not the literal interpretation, but the idea behind it that he is getting at. My best shot at what he means is: ‘Too many of you are running around with a solution, trying to find a problem to fit it. Stop doing that. Consider the problem in detail, understand it and then apply the right solution’ which, in many cases, is not hypothesis but the search for special causes.

rgds, John C

[This message has been edited by John C (edited 06 July 2000).]

dnorthcutt · Jul 6, 2000

Deming gives a very detailed explanation of the difference between Analytic and Enumerative in his book, Some Theory of Sampling, (c 1950 Wiley, currently available through Dover). The entire 7th chapter deals with this topic. The following quote from the opening paragraph summarizes the distinction:

"In the enumerative problem something is to be done to some portion of the contents of the bowl regardless of the reasons why that portion is so large or so small. In the analytic problem, on the other hand, something is to be done to regulate and predict the cause system that has produced the universe (city, market, lot of industrial product, crop of wheat) in the past and will continue to produce it in the future."

Deming goes on to show that the two types of studies have different sampling errors. In an enumerative study, the sampling error can in fact be reduced to zero by conducting a complete census. In an analytic study, even a census still leaves us with sampling error, because the point of the study is to make a statement about the cause system, not the particular lot in question. To make statements about analytic studies, one needs multiple observations over time. Additionally, as has been mentioned already, the time order is critical, and failure to account for ordering can result in faulty analyses. I think these issues are what led Deming to be so down on simple hypothesis testing.
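The census point can be illustrated with the textbook finite population correction. This is a sketch using the standard sampling formula, not a reproduction of Deming's own derivation; the lot size and standard deviation are invented, and the "analytic" formula is only a stand-in that assumes a stable cause system.

```python
from math import sqrt

def enumerative_se(n, population_size, s):
    """Standard error of the mean when sampling n items from a finite lot."""
    fpc = sqrt((population_size - n) / (population_size - 1))  # finite population correction
    return s / sqrt(n) * fpc

def analytic_se(n, s):
    """Standard error when the target is the process, not this lot (no correction)."""
    return s / sqrt(n)

LOT_SIZE = 500  # hypothetical lot
S = 2.0         # hypothetical standard deviation

for n in (50, 250, 500):
    print(f"n={n:>3}: enumerative SE={enumerative_se(n, LOT_SIZE, S):.4f}, "
          f"analytic SE={analytic_se(n, S):.4f}")
```

At n equal to the lot size the enumerative error is exactly zero, while the uncertainty about the process that made the lot remains. Even that remaining figure is only meaningful if the process is stable, which is what time-ordered data are needed to establish.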

johno · Jul 25, 2000

A lot of hypothesis testing that I run into seems to have conclusions of 'accept' or 'don't accept', which seems strange if one is willing to say 'don't accept' at 94% and 'accept' at 95%. A p value seems to make more sense, where one can say 'it would be significant at 94%' and let the customer decide if that is ok.
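To make the contrast concrete, the sketch below reports a two-sided p value alongside the accept/reject verdict for a z statistic that falls just short of the usual cutoff. The z value of 1.9 is an arbitrary example, and a normal-theory test is assumed.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = 1.9  # hypothetical test statistic
p = two_sided_p_value(z)
verdict = "reject H0" if p < 0.05 else "do not reject H0"

print(f"z = {z}, p = {p:.3f}, verdict at alpha 0.05: {verdict}")
```

The bare verdict hides how close the result was to the line; reporting p lets the customer apply their own threshold.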

artichoke · May 28, 2007

Re: Deming vs Statistical Hypothesis Testing

This is an old thread but an important one. For those interested in more detailed support for Deming's comments that hypothesis testing is an example of “poor teaching of statistical methods” and they have "no application in analytical problems in science and industry", read "Statistical Tests of Hypothesis" pages 402 to 410 of "Advanced Topics in SPC" by Wheeler.

fireonce · May 29, 2007

Re: Deming vs Statistical Hypothesis Testing

Thanks for the clarification; it has made this much clearer to me.
