Date: Thu, 7 Sep 2000 11:04:02 -0400
From: Philip Stein
To: Greg Gogates
Subject: Re: Proficiency testing RE13
We assessors often recommend that labs severely round off any
'precision' they calculate in uncertainty estimates, since the
underlying science for most Type B's is usually only an educated
guess. As long as they round off towards larger uncertainties, this
makes me comfortable.
While it's true that, as Howard says, the overall process should
contain as much actual information as possible, there are many cases
where a budget is required when there isn't much detail in the actual
information, therefore...
Also, I have to point out that in most cases, an approximation is
plenty good enough for all practical commercial purposes. The only
time it's important is when your calibration is high up in a pyramid
and many users are depending on your u value to be as small as
possible (that is, not smaller than it really is, but not greatly
inflated either, since that value will have wide ramifications on all
the daughters far down the tree).
Given that, I need to point out that NIST's calibrations of gage
blocks smaller than one inch carry an expanded uncertainty of 1
microinch. Is it possible that this is rounded off?
Phil
>
>Larry,
>
>I know of no NIST conspiracy. To what are you referring? As for the
>advisability of the use of k=2 for expanded uncertainties (confidence
>limits?), why n=30? Are you implying that the degrees of freedom for test
>or calibration uncertainties is at least 30? If so, you may be right for
>many cases, but there are also many cases where this is not so.
>
>The degrees of freedom for a total uncertainty estimate is determined from
>the degrees of freedom for each component uncertainty according to the
>Welch-Satterthwaite relation (see the GUM). This relation "weights" each
>component degrees of freedom contribution according to the magnitude of its
>associated uncertainty estimate. In the past, Type B estimates were
>assigned a degrees of freedom of infinity simply because we didn't know any
>better. As was quickly realized, however, this was an ill-advised practice.
>
>This is because the degrees of freedom of an uncertainty estimate represents
>the amount of information that went into the estimate. For a Type A
>estimate computed from a data sample, it is simply the sample size minus
>one. Using Eq. G.3 of the GUM, a methodology has been developed that
>allows us to assemble this experience in such a way that we can estimate the
>degrees of freedom for Type B estimates (attached). It turns out that the
>degrees of freedom for a Type B estimate will often be less than thirty. In
>many cases, the degrees of freedom is not much larger than five or ten.
>Moreover, in many cases, these estimates are the dominating contributors to
>total uncertainty. Consequently, the degrees of freedom of the total
>combined estimate will sometimes be around the same number as the degrees of
>freedom for the dominating Type B estimate.
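Howard's point about the Welch-Satterthwaite relation weighting each component's degrees of freedom can be sketched in a few lines (the component uncertainties and degrees of freedom below are invented for illustration, not taken from any real budget):

```python
# Welch-Satterthwaite effective degrees of freedom:
#   nu_eff = u_c**4 / sum(u_i**4 / nu_i),  where u_c**2 = sum(u_i**2)

def effective_dof(components):
    """components: list of (standard_uncertainty, degrees_of_freedom) pairs."""
    u_c2 = sum(u**2 for u, _ in components)
    return u_c2**2 / sum(u**4 / nu for u, nu in components)

# A dominant Type B component with low dof drags nu_eff well below 30,
# even when a 30-point Type A sample is also present:
budget = [(0.8, 8),    # hypothetical Type B estimate, dof ~ 8
          (0.3, 29)]   # Type A estimate from a 30-point sample
print(round(effective_dof(budget), 1))  # → 10.4
```

As the thread argues, the effective degrees of freedom lands near that of the dominating Type B component, not near 30.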
>
>In my earlier message, I included a table of 95% coverage factors for
>different degrees of freedom. As the table shows, using k=2 is sometimes
>not justified in such cases.
>
>So, what is the point of all this? Simply to make people aware that they
>are being short-changed when someone hands them an expanded uncertainty,
>based on k=2, with no other supporting information. It's just not good
>enough.
>
>In this regard, I've been reviewing some lab capability statements that
>suggest that, in spite of the requirements of 17025 and the increased
>awareness of accrediting bodies, some calibration organizations have found a
>way around doing the actual work by latching onto "accepted" bonehead
>practices, such as blindly applying k=2. For example, one statement listed
>an uncertainty of 2 parts in 10^7 for a parameter, citing k=2. Well, if we
>divide by 2, we get 1 part in 10^7. Does this look like the result of a
>rigorous uncertainty analysis or does it look more like shooting from the
>hip? Another example listed an expanded uncertainty of +/- 0.005, which
>means that the standard uncertainty is about 0.0025. Since the listing came
>from the same organization that produced the 10^7 figure, I suspect that
>0.0025 is 0.0029 rounded off, i.e. 0.005 divided by root 3. This suggests that
>the listed uncertainty is the result of fiddling with a set of convenient
>limits rather than the result of an actual analysis.
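Howard's arithmetic here is easy to verify (the 0.005 figure is his, from the capability statement under discussion):

```python
import math

U = 0.005  # listed expanded uncertainty from the capability statement

# If k=2 was used, the implied standard uncertainty is U/2:
print(round(U / 2, 4))             # → 0.0025
# Dividing the same limits by root 3 (uniform distribution) gives:
print(round(U / math.sqrt(3), 4))  # → 0.0029
```

The near-coincidence of 0.0025 and 0.0029 is what suggests the number came from fiddling with convenient limits rather than an analysis.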
>
>Of course, the listed numbers may actually have been arrived at through
>detailed and painstaking analyses. We don't know. It would help immensely,
>however, if the authors of the capability statement did two things: (1) not
>insult our intelligence by multiplying the standard uncertainty estimate by
>two, and (2) publish the degrees of freedom accompanying each estimate.
>
>I can just hear the screaming now... "What?!! You mean after we've got
>this beautiful dodge all worked out, you want actual information?"
>
>Best Regards,
>Howard Castrup
>President, Integrated Sciences Group
>
>-----Original Message-----
>From: owner-iso25@quality.org [mailto:owner-iso25@quality.org]On Behalf
>Of Greg Gogates
>Sent: Thursday, August 31, 2000 9:05 AM
>To: iso25@quality.org
>Subject: Proficiency testing RE12
>
>
>Date: Thu, 31 Aug 2000 07:50:06 -0700
>From: "Nielsen, Larry E"
>To: 'Greg Gogates'
>Subject: RE: Proficiency testing RE7
>
>Howard,
>I must be missing something here in all this discussion on coverage factors
>of k = 2 and the great conspiracy theory by NIST et al. to perpetuate this
>practice. But it seems to me that the t-distribution and the normal
>distribution converge somewhere above about n = 30. Therefore, the purpose
>of applying the t-distribution is to state or estimate the resulting
>expanded uncertainty with the same level of confidence as had 30 or more
>data points been taken when the data at hand provides less than 30 degrees
>of freedom.
>
>So as long as we're talking about normally distributed data (which is the
>prevalent case in most physical calibrations), the coverage factor you
>actually apply should be based on effective degrees of freedom (per
>Welch-Satterthwaite), and may very well be something other than 2. However,
>the one you report is k = 2. This allows one to quote expanded uncertainty
>on a standardized (95.45% coverage probability, infinite degrees of freedom)
>basis without having to go into all the gory details about what was
>actually done to arrive at the estimate for each and every test. In what
>way does this constitute incorrect practice or a conspiracy?
>
>Sincerely,
>****************************************************
>Larry E. Nielsen
>So. Cal. Edison - Metrology
>7300 Fenwick Lane
>Westminster, CA 92683
>(714) 895-0489; fax (714) 895-0686
>e-mail: nielsele@sce.com
>****************************************************
>
>
>
> > ----------
> > From: Greg Gogates[SMTP:iso25@fasor.com]
> > Sent: Tuesday, August 29, 2000 11:08 AM
> > To: iso25@quality.org
> > Subject: Proficiency testing RE7
> >
> > Date: Tue, 29 Aug 2000 17:59:18 +0100
> > From: Steve Ellison
> > To: iso25@fasor.com
> > Subject: Re: Proficiency testing RE5
> >
> > Interesting discussion on En numbers and k=2.
> >
> > My potted summary of the usual criteria:
> > En = (measured difference)/(combined expanded uncertainty). Values inside
> > 1.0 are usually acceptable.
> >
> > z=(measured difference)/(target value for standard deviation). Values
> > inside +-2.0 are usually acceptable; outside +-3.0 seriously deficient,
> > and values in between are doubtful.
> >
> > A variant of En, based on z, but with the divisor equivalent to a combined
> > standard uncertainty (ie 1 standard deviation) has also been proposed;
> > interpretation is then similar to z.
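The two criteria above can be written out as a short sketch. The combined expanded uncertainty is taken here as the usual root-sum-square of the lab's and reference's expanded uncertainties; the numeric values are invented for illustration:

```python
import math

def En(lab_value, ref_value, U_lab, U_ref):
    """En number: measured difference over combined expanded uncertainty."""
    return (lab_value - ref_value) / math.sqrt(U_lab**2 + U_ref**2)

def z_score(lab_value, assigned_value, sigma_target):
    """z score: measured difference over target standard deviation."""
    return (lab_value - assigned_value) / sigma_target

# |En| <= 1.0 usually acceptable; |z| <= 2 acceptable, |z| >= 3 deficient.
print(round(En(10.03, 10.00, 0.04, 0.03), 2))   # → 0.6
print(round(z_score(10.03, 10.00, 0.02), 2))    # → 1.5
```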
> >
> > For info: if your national standards body has copies of the current draft
> > ISO 13528 ("Draft ISO 13528: Statistical methods for use in proficiency
> > testing by interlaboratory comparisons", currently still at the working
> > group stage), it's worth a look, as it has all these formulae in it.
> >
> > Some personal views:
> >
> > On En:
> > Why anyone would use En=0.5 as a cutoff is beyond me. An En inside 1.0
> > says (allowing for some simplifications on degrees of freedom) "my
> > measured error, wrt the certified value, is inside the expanded
> > uncertainty of the comparison". That is a simple and useful criterion for
> > a cal. lab, given that it is an expanded uncertainty that is quoted on
> > certificates. On the other hand, if uncertainties behaved like normal
> > distributions (which they sometimes do, roughly) we'd _expect_ to be
> > outside En=0.5 about a third of the time if the calibration process is
> > behaving itself! Not a useful criterion, unless you really are looking
> > for calibrations with high levels of confidence.
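Steve's "about a third of the time" figure follows directly from the normal model: since the En divisor is an expanded (k=2) uncertainty, |En| > 0.5 is the same as the error exceeding one standard uncertainty, i.e. probability 2(1 − Φ(1)). A quick check with the Python standard library:

```python
from statistics import NormalDist

# If errors are normal and En uses an expanded (k=2) uncertainty, then
# |En| > 0.5  <=>  |error| > 1 standard uncertainty.
p_outside = 2 * (1 - NormalDist().cdf(1.0))
print(round(p_outside, 3))  # → 0.317
```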
> >
> > On k=2:
> > I don't feel much inclined toward the NIST conspiracy theory on this.
> > Broadly, k=2 is easy, and it's good enough for many practical purposes.
> > Also, if you state k (as GUM recommends) then an interval with increased
> > confidence can be calculated from the associated standard uncertainty. And
> > if you're worried about quoted uncertainty or compliance being affected by
> > the exact choice of k, you are probably better off looking at doing more
> > or better measurements, not fiddling with the value of k.
> >
> > More generally, though, I confess I've very mixed feelings about k=2. It's
> > simple and standard, which is good. And where the dominant contributions
> > are to do with calibration uncertainties, they are given with typically
> > high degrees of freedom on certificates, so k=2 is reasonable for a final
> > result. And if dominant uncertainties are obtained largely from repeated
> > measurements, it takes extraordinarily high numbers of experiments to get
> > an sd 'accurate' to two significant figures, so putting extra digits on k
> > is a bit pointless.
> > On the other hand, in my own field, the dominant contributions are often
> > from variability from one measurement to the next, and frequently are not
> > so well characterised (eg n<15). 2.5 or even 3 might be a better default
> > value for a lot of chemical testing, so a general k=2 recommendation makes
> > me uneasy.
> >
> > And a last note on using k for 'increased confidence': For many
> > measurement processes, the departures from normality are frequent and
> > large. We do get roughly 90-95% confidence for observed distributions at
> > k=2 (ish) in decent testing work. We do NOT get 99.7% confidence from k=3!
> > There are loads of results out past 3 sd's in most routine testing, due to
> > assorted mess-ups - I've seen results at over 1000 standard deviations
> > from the 'truth' in PT rounds (someone quoted the wrong units...). With
> > that sort of thing going on, there really is little point in using k
> > factors aimed at more than about 95% confidence.
> >
> > With all the misgivings, though, I'll plump for a 'standard' k=2 by
> > default for routine testing, unless the degrees of freedom really do drop
> > below about 6 or there's some other reason to look more carefully. And if
> > there is good reason to look harder, it won't generally be k that I'll be
> > looking at - it'll be the whole measurement process!
> >
> > Steve Ellison.
> >
> > >>> Greg Gogates 24/08/2000 23:02:47 >>>
> > Date: Thu, 24 Aug 2000 12:40:44 -0700
> > From: "Dr. Howard Castrup"
> > To: Greg Gogates
> > Subject: RE: Proficiency testing RE4
> >
> > Karl,
> >
> > Thanks for the reply to my discussion on the care and feeding of En. The
> > strong support for k=2 primarily comes from the fact that NIST uses it.
> > So, if it's often an inappropriate coverage factor, why do they use it? The
> > answer comes in two parts:
> >
> > 1. A total uncertainty is usually a combination of Type A and Type B
> > uncertainty estimates. Some influential people at NIST and other
> > organizations can't make Type B estimates behave statistically, so they
> > usually side-step the issue, assert that measurement uncertainties are
> > guesses in the first place, and state that k=2 corresponds to 95%
> > confidence.
> >
> > Why can't they make Type B estimates behave statistically? The principal
> > reason is that, to use an uncertainty estimate to develop confidence
> > limits,
> > you need to know the degrees of freedom associated with the estimate. If
> > you look in the GUM, you'll find in Appendix G an expression for computing
> > the degrees of freedom for Type B estimates. Unfortunately, the relevant
> > expression (Equation G.3) contains a term that stops people from using it.
> > This term is the variance of the Type B uncertainty estimate. The GUM
> > does
> > not provide any guidance for calculating it.
> >
> > Since the GUM was published, I developed a rigorous methodology for
> > computing the uncertainty variance based on Equation G.3. The methodology
> > is built into UncertaintyAnalyzer, was reported at the 2000 MSC, and is
> > available in freeware from our Web site at www.isgmax.com.
> >
> > So, we can now treat Type B estimates statistically, can develop
> > confidence
> > levels and can consign k=2 to the scrap heap of bad ideas. Why aren't we
> > doing this? Well, part of the answer is that not enough people are aware
> > of
> > the existence of the uncertainty variance methodology. Part of the answer
> > is also that k=2 is easy to apply, although it frequently has little
> > meaning. But, a big part of the answer is that most people at NIST and at
> > NIST counterparts in other countries haven't been able to make the
> > connection between uncertainty and risk, which brings us to the second
> > part
> > of my answer...
> >
> > 2. The coverage factor to use should depend on the level of confidence
> > that
> > is appropriate for a given context. 95% is often a useful confidence
> > level
> > but sometimes is inappropriate. This is the case for critical parts
> > specifications, control of many manufacturing processes, rigorous testing of
> > experimental results, etc. (This is something we should address in our
> > NCSL
> > Decision Risk RP.) You mentioned that some organizations want the cutoff
> > for En to be 0.5. This is the same thing as using k=4 and keeping the
> > cutoff at 1. If you have normally distributed measurement errors and an
> > infinite degrees of freedom, this corresponds to a two-sided confidence
> > level of about 99.9937%.
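The 99.9937% figure is just the two-sided normal coverage at k=4, and the standard 95.45% coverage at k=2 falls out of the same calculation. A quick check with the Python standard library:

```python
from statistics import NormalDist

def two_sided_coverage(k):
    """Two-sided confidence level for +/- k standard deviations, normal errors."""
    return 2 * NormalDist().cdf(k) - 1

print(round(100 * two_sided_coverage(2), 2))  # → 95.45
print(round(100 * two_sided_coverage(4), 4))  # → 99.9937
```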
> >
> > Now, I ask you, why restrict ourselves to one or two specific confidence
> > levels when we can set the confidence level at whatever value is required?
> > This can be easily done by using the uncertainty variance methodology and
> > referring to the attachment of my earlier message. I know that many in
> > the
> > metrology community will resist this approach on the grounds that
> > technicians can't easily implement it. However, to be compliant with
> > 17025,
> > we need to produce results that are relevant to the context of usage --
> > which introduces the need for being in control of measurement decision
> > risk
> > in a meaningful way. Moreover, the appropriate k-factor can be computed
> > with the freeware package mentioned above, so technicians can determine it
> > anyway.
> >
> > Howard Castrup
> > President, Integrated Sciences Group
> >
> > -----Original Message-----
> > From: owner-iso25@quality.org [mailto:owner-iso25@quality.org]On Behalf
> > Of Greg Gogates
> > Sent: Wednesday, August 23, 2000 9:57 AM
> > To: iso25@quality.org
> > Subject: Proficiency testing RE4
> >
> >
> > Date: Tue, 22 Aug 2000 14:16:30 -0400
> > From: khaynes
> > To: Greg Gogates
> > Subject: Re: Proficiency testing RE3
> >
> > Hi Howard,
> > With regard to the formula used in the En numbers, what I have seen (from
> > recollection) is that ISO Guide 43-1 and the A2LA and NVLAP accreditation
> > program requirements do not explicitly define it in terms of expanded
> > measurement uncertainty, but since it follows the discussion of measurement
> > uncertainty, and measurement uncertainty is reported at k=2, the inference
> > is pretty strong. That is significantly different from the treatment in
> > your paper. Also, comments by a presenter at the 1999 NCSL conference were
> > that some accreditation bodies wanted En less than or equal to 0.5.
> >
> > As you know, I'm not a statistician, but as a Mikel Harry trained (directly
> > and indirectly) Six Sigma black belt and a CQE, I have opened a statistical
> > text before and attempted to decipher the message within. To my eyes, given
> > that strong inference of the use of k=2 uncertainty in a pooled expanded
> > uncertainty, there seemed to be material for a couple of papers: exploring
> > the use of the En numbers with varying cutoffs of 0.5 and 1.0, estimating
> > reference and test labs' measurement uncertainty using short-term
> > capability analysis, estimating the long-term defects, and examining the
> > consultant-championed SPC concept of just remeasuring an artifact over and
> > over.
> >
> > Since the ability to actually make competent measurements is one of the
> > big tests of accreditation, I would think that there would be more
> > treatment of this subject. Perhaps you and others can clarify this in the
> > thread for me and others.
> > Thanks, Karl Haynes
> >
> > ----- Original Message -----
> > From: Greg Gogates
> > To:
> > Sent: Monday, August 21, 2000 4:09 PM
> > Subject: Proficiency testing RE3
> >
> >
> > > Moderator note,
> > > The file is available via
> > > ftp://ftp.fasor.com/accts/i/iso25/Two_Mean_Difference_Test.pdf
> > > Greg
> > >
> > > Date: Mon, 21 Aug 2000 12:51:16 -0700
> > > From: "Dr. Howard Castrup"
> > > To: Greg Gogates
> > > Subject: RE: Proficiency testing
> > >
> > > -----Original Message-----
> > > From: owner-iso25@quality.org [mailto:owner-iso25@quality.org]On Behalf
> > > Of Greg Gogates
> > > Sent: Friday, August 18, 2000 1:11 PM
> > > To: iso25@quality.org
> > > Subject: Proficiency testing
> > >
> > >
> > > From: "David Collins"
> > > To: iso25@quality.org
> > > Subject: Proficiency testing
> > > Date: Fri, 18 Aug 2000 12:06:15 EDT
Philip Stein O-
Fellow, ASQ and past member of its board of directors
A2LA Lead Assessor
Past Chair, ASQ Measurement Quality Division
Check out http://www.measurement.com