Thank you very much for this summary of your thoughts. As I have written my own summary on tolerance intervals, I compared yours with mine. Here are some comments:
1. The "Odeh & Owen table" for the k-factor is only applicable, if we possess a
two-sided specification. For a one-sided specification limit we need to use different k-factors.
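For concreteness, here is a minimal sketch (assuming scipy is available) of how the two kinds of k-factors can be computed for the (confidence = 90%, coverage = 95%) case: the one-sided factor exactly via the non-central t distribution, and the two-sided factor via Howe's approximation, which is close to, but not identical with, the exact Odeh & Owen table values. The function names are mine, not from your summary.

```python
# Sketch: one-sided vs. two-sided tolerance-interval k-factors (assumes scipy).
import numpy as np
from scipy import stats

def k_one_sided(n, coverage=0.95, confidence=0.90):
    """Exact one-sided k-factor via the non-central t distribution."""
    zp = stats.norm.ppf(coverage)          # normal quantile of the coverage
    nc = zp * np.sqrt(n)                   # non-centrality parameter
    return stats.nct.ppf(confidence, df=n - 1, nc=nc) / np.sqrt(n)

def k_two_sided_howe(n, coverage=0.95, confidence=0.90):
    """Howe's approximation to the two-sided k-factor."""
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, df=n - 1)   # lower chi-square quantile
    return z * np.sqrt((n - 1) * (1 + 1 / n) / chi2)

for n in (5, 10, 30):
    print(n, round(k_one_sided(n), 3), round(k_two_sided_howe(n), 3))
```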
2. Tolerance intervals use a "frequentist" interpretation of "probability" (= non-Bayesian). This is why your statements such as
- “There’s a 90% probability that at least 95% of the population falls within [Sample Average ± 3.1 * (Sample StdDev)], and thus within the Specification Limits (the Design Input requirement)"
are difficult to understand. It is much simpler to state
- "We are 90% confident that at least 95% of the population falls within [...] specification.”
This statement is (a) mathematically exact, and (b) uses the wording from your table.
3. Wheeler's statements about the use of the k-factors for non-normal distributions are simply wrong. Although it is true that the normal distribution has the largest entropy (under the commonly accepted assumptions), the claim no longer holds once we impose further conditions (such as considering only 90% of the population). To convince ourselves, we only need to compare the tolerance intervals for a given {confidence, coverage} pair for (i) the normal distribution and (ii) the distribution-free interval based on the binomial distribution, as derived by Wilks (see the sketch below). Although I doubt that many auditors would catch this mistake, I would not place an incorrect statistical statement in a verification document.
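As a sketch of that comparison (assuming scipy; the function name is mine): Wilks' distribution-free two-sided interval uses the sample minimum and maximum as limits, and the coverage of (min, max) follows a Beta(n-1, 2) distribution, so the confidence of covering at least a proportion P is 1 - BetaCDF(P; n-1, 2). The smallest n that reaches 90% confidence for 95% coverage is far larger than the N = 10 that suffices under the normality assumption.

```python
# Sketch: sample size for Wilks' distribution-free (90%, 95%) interval.
from scipy import stats

def nonparametric_confidence(n, coverage=0.95):
    """Confidence that [x_min, x_max] of n i.i.d. samples covers `coverage`."""
    return 1 - stats.beta.cdf(coverage, n - 1, 2)

n = 2
while nonparametric_confidence(n) < 0.90:
    n += 1
print(n)   # minimum n for a distribution-free (90%, 95%) two-sided interval
```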
4. You included the following sample size note:
- If, for example, one operator measured 5 parts, twice each, 10 datapoints are available for analysis as above (N=10).
To me this statement seems to be out of place. If we try to demonstrate (in OQ or PQ) that we are 90% confident that at least 95% of our produced parts are within specification, it is not enough to take n = 5 parts, measure each part r = 2 times, calculate the k-factor for the N = n*r = 10 measurements, and check whether it exceeds k = 3.026. The measurements are not independent, yet independence is an assumption of the tolerance interval. To use the critical k-factor k = 3.026 for the (gamma = 90%, P = 95%, N = 10) tolerance interval, we have to measure ten independent parts (see the sketch below).
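A minimal sketch of this check with ten independent parts, using hypothetical specification limits and data (the numbers are illustrative only, not from your summary):

```python
# Sketch with hypothetical numbers: ten independent parts, two-sided spec.
import numpy as np

LSL, USL = 9.0, 11.0                          # hypothetical specification limits
x = np.array([10.1, 9.9, 10.2, 10.0, 9.8,
              10.3, 10.1, 9.9, 10.0, 10.2])   # ten independent parts
k_crit = 3.026                                # two-sided k for (90%, 95%, N=10)

xbar, s = x.mean(), x.std(ddof=1)
k_data = min(USL - xbar, xbar - LSL) / s      # distance to nearest limit, in s units
print(k_data >= k_crit)                       # accept the verification if True
```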
However, if our measurement uncertainty is large ("bad gauge"), it is mathematically acceptable to do the following:
i) take n=5 parts,
ii) measure each part r=2 times,
iii) calculate the average value for each part, {ybar_1, ..., ybar_5},
iv) calculate the k-factor of these five averages, and
v) accept the verification if k >= 4.142.
Although this is mathematically correct, auditors won't like the extra averaging step, so I would use this procedure only if it is difficult or expensive to produce additional parts. A short sketch of steps (i)-(v) follows below.
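For illustration, a minimal sketch of steps (i)-(v) with hypothetical specification limits and measurements (all numbers are made up):

```python
# Sketch of steps (i)-(v): five parts, two repeated readings each,
# per-part averages, and the stricter critical k-factor for N = 5.
import numpy as np

LSL, USL = 9.0, 11.0                              # hypothetical spec limits
y = np.array([[10.12, 10.08],                     # part 1, two readings
              [ 9.95,  9.93],                     # part 2
              [10.21, 10.25],                     # part 3
              [10.02,  9.98],                     # part 4
              [ 9.87,  9.91]])                    # part 5
k_crit = 4.142                                    # two-sided k for (90%, 95%, N=5)

ybar = y.mean(axis=1)                             # step (iii): per-part averages
m, s = ybar.mean(), ybar.std(ddof=1)
k_data = min(USL - m, m - LSL) / s                # step (iv)
print(k_data >= k_crit)                           # step (v): accept if True
```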