# Use of Non-Parametric Statistics and Non-Normal Data

Staff member
I have seen this topic pop up several times regarding normality of data. I thought I would bounce it off of everybody to see what your thoughts are.

Parametric statistics (say the t-test) are making inferences regarding the mean. If the population is normal, then the mean and median are the same. If a set of data is non-normal, then the mean and median are not the same. A non-parametric test will make inferences based on the median.

If a set of data is ‘reasonably’ normal (through graphing; although no data is truly normal), then yes, parametric statistics are significantly more efficient and the proper choice. But if you have a small sample size, then normality cannot be assumed.

Yes, the Central Limit Theorem (C.L.T.) does come into play. But the theorem deals with averages. If you have 17 samples, it is difficult in my mind (unless the data shows normality) to assume normality, simply based on the C.L.T. Sure, I can average 7000 times from those 17 numbers (Bootstrapping), and guess what, normality! But I still, in reality, only have my 17 numbers.
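To make the bootstrap remark concrete, here is a minimal sketch using only the Python standard library. The 17 values are hypothetical (skewed exponential draws, not data from this thread), and the helper names `bootstrap_means` and `skewness` are made up for illustration. It resamples the 17 numbers 7000 times and compares the skewness of the raw values with the skewness of the resampled means.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: 17 skewed values (exponential draws), not from the thread.
sample = [random.expovariate(1.0) for _ in range(17)]

def bootstrap_means(data, n_resamples=7000):
    """Mean of each resample drawn with replacement from the data."""
    n = len(data)
    return [statistics.fmean(random.choices(data, k=n)) for _ in range(n_resamples)]

def skewness(values):
    """Moment-based skewness (0 for a perfectly symmetric set of values)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

means = bootstrap_means(sample)
print("skewness of the 17 raw values:   ", round(skewness(sample), 2))
print("skewness of 7000 bootstrap means:", round(skewness(means), 2))
print("distinct values available to resample:", len(set(sample)))
```

The averages will usually look far more symmetric than the raw values, but every resample is still built from the same 17 numbers, which is the point being made above.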

Parametric statistics are easier and way more efficient. But unless your data shows normality, non-parametric statistics should be used. NCSS reports both parametric and non-parametric tests. If the two tests give you the same results, then nothing is lost. If you get different results, a little more digging may be in order to determine which set of assumptions holds truer.
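As a rough illustration of running both kinds of test side by side, the sketch below compares a one-sample t statistic with an exact sign test. This is not how NCSS does it; the 17 data values, the target of 10.0, and the function names are all assumptions made for the example. The critical value 2.120 is the standard two-sided 5% point of the t distribution with 16 degrees of freedom.

```python
import math
import statistics

def t_statistic(data, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(data)
    return (statistics.fmean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))

def sign_test_p(data, m0):
    """Exact two-sided sign-test p-value for H0: median == m0 (ties dropped)."""
    above = sum(1 for x in data if x > m0)
    below = sum(1 for x in data if x < m0)
    n = above + below
    k = min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical sample: 16 values a little above 10 and one low outlier.
data = [10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11.0, 11.1,
        11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 2.0]

T_CRIT_16DF = 2.120  # two-sided 5% critical value, 16 degrees of freedom

t = t_statistic(data, 10.0)
p_sign = sign_test_p(data, 10.0)
print(f"t = {t:.2f}, reject at 5%: {abs(t) > T_CRIT_16DF}")
print(f"sign test p = {p_sign:.5f}, reject at 5%: {p_sign < 0.05}")
```

With this made-up sample the two tests disagree: the single outlier inflates the standard deviation enough that the t-test does not reject, while the sign test does. That is the kind of result that calls for more digging into which assumptions hold.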

How does everyone deal with non-normality? Or do you?

#### Steve Prevette

##### Deming Disciple
Staff member
Super Moderator
Re: Use of Non-Parametric statistics

> How does everyone deal with non-normality? Or do you?

I use Statistical Process Control as my primary statistical tool, and since it is non-parametric (re: Chebyshev's inequality), I am pretty well covered.
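For readers who have not met Chebyshev's inequality: for any distribution with finite variance, the fraction of values at least k standard deviations from the mean is at most 1/k². The sketch below checks that against hypothetical skewed data (exponential draws, chosen only for illustration); `chebyshev_bound` is a made-up helper name.

```python
import random
import statistics

def chebyshev_bound(k):
    """Upper bound on P(|X - mean| >= k*sd) for any finite-variance distribution."""
    return min(1.0, 1.0 / k ** 2)

random.seed(7)
data = [random.expovariate(1.0) for _ in range(10_000)]  # hypothetical skewed data
mean = statistics.fmean(data)
sd = statistics.pstdev(data)  # population SD of the data set itself

for k in (2, 3):
    outside = sum(1 for x in data if abs(x - mean) >= k * sd) / len(data)
    print(f"k={k}: observed {outside:.3f} <= bound {chebyshev_bound(k):.3f}")
```

At k = 3 the bound is 1/9, so no more than about 11% of values can fall outside three-sigma limits whatever the shape of the distribution. The bound is loose; most real distributions do much better.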

I have seen folks using sampling here for many analyses, and most just run ahead with the t-test, with some rather small data sets. I did help out one group that was having trouble with using the t-test and determined that their data fit an exponential distribution, and used that to analyze their results. That got them to an answer that was within their limits, and more defensible (and actually more conservative) than the t-test.

#### Tim Folkerts

Super Moderator

You make several interesting points. My overall reaction is that, yes, you need to be careful in case the data is far from normal, but often the data is "close enough" to normal that you can get away with using tests that are based on the normal distribution.

As Steve mentioned, if it is clear that the distribution isn't normal, then it can be valuable to explore the actual distribution. As always, it is important to look at the actual plots, not just the summary numbers like mean & standard deviation.

> If the population is normal, then the mean and median are the same. If a set of data is non-normal, then the mean and median are not the same.

The first sentence is fine, but the second does not follow: any symmetric distribution (and even some non-symmetric distributions) will have mean = median.
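Two tiny made-up data sets show the point about mean = median. Neither is bell-shaped, and the second is not even symmetric:

```python
import statistics

# Hypothetical examples, chosen only to illustrate the claim.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # symmetric but flat, not bell-shaped
lopsided = [1, 2, 2, 3, 3, 3, 7]     # not symmetric (long right tail)

for name, values in (("flat", flat), ("lopsided", lopsided)):
    print(name, "mean =", statistics.mean(values), "median =", statistics.median(values))
```

So equal mean and median is not evidence of normality by itself.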

> But if you have a small sample size, then normality cannot be assumed.

Well, normality can be assumed - I've certainly seen enough people do it to know it is possible.

Tim F

#### Statistical Steven

##### Statistician
Staff member
Super Moderator
The question for me is not IF the data is normal, but rather whether I expect the data to come from a normal distribution. There are many factors, including outliers, that can make a sample look non-normal (either graphically or using a test such as Anderson-Darling). Tests such as the t-test and ANOVA are robust to slight departures from normality, so the mean and median do not have to be equal. A bigger problem than the normality assumption arises when comparing two samples (t-test) and the variances are not equal between the groups. That is where I employ nonparametric statistics if I cannot find a variance-stabilizing transformation.
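A deliberately contrived sketch of a variance-stabilizing transformation: when the spread of each group grows in proportion to its level, taking logs can bring the variances back in line. The two groups below are hypothetical (the second is simply ten times the first), and `variance_ratio` is a made-up helper, so treat this as an idealized case rather than what real data will do.

```python
import math
import statistics

# Hypothetical groups whose spread grows with their level (not from the thread).
group_a = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7]
group_b = [10.0, 12.0, 8.0, 11.0, 9.0, 13.0, 7.0]

def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one (1.0 means equal)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)

raw_ratio = variance_ratio(group_a, group_b)
log_ratio = variance_ratio([math.log(v) for v in group_a],
                           [math.log(v) for v in group_b])

print(f"variance ratio, raw data: {raw_ratio:.1f}")
print(f"variance ratio, log data: {log_ratio:.1f}")
```

If no transformation gets the ratio reasonably close to 1, that is the situation where a nonparametric comparison becomes the fallback described above.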

#### bobdoering

Trusted Information Resource
My preference is to use the statistics of the native distribution, rather than force feed it into another distribution (e.g. normal).

Staff member
Thanks. What methods do you utilize to determine the native distribution?

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Hmmm. Well, I am rarely at the point of performing a hypothesis test where I don't know something about the underlying process performance. In fact, I teach that performing a hypothesis test without that knowledge is like buying a lottery ticket. You might get lucky: you somehow stumbled onto the right answer and your test demonstrated it. You might get unlucky: you don't have the right answer, but your test says you do. That is either a result of alpha risk or of a 'foolishly' designed experiment (biased, non-random, outside normal operating conditions, etc.).

The things I want to know first are:

What is the average?

What is the rough range of the Y, min to max? Or the occurrence rate, if it's categorical data? How else do I know what the sample size should be? Low occurrence rates take different stats than moderate to high event rates. Also, if my hypothesis test results don't span the full range of the normal variation in the Y (or at least most of it), then I'll know I did something wrong in the test and the results can't be fully trusted without further analysis.

How long does it take to go from min to max? How else will I know how I need to spread my samples out to avoid a spurious association?
Of course, all of these can be answered with a trend chart of the data.
I also want to know the normal operating range of the suspect causal factor, so I know where to set the low and the high values for the test.

Once I have the trend (from which I can get the distribution shape) and I know the basic science behind the process, I can get a usable understanding of the distribution. (An example of the science: coating thickness is bounded at zero, so it will be skewed; tool wear tends toward a uniform-looking distribution...)

#### bobdoering

Trusted Information Resource
In brief: Typically run 50 to 100 pcs consecutively, plot them on a trend chart, generate a histogram, and see what it looks like. At that point it will be normal, or not. If it is a uniform distribution, it will look like a rectangle (the trend chart should look like a sawtooth curve). Otherwise, a Pearson analysis would help zero in on another appropriate distribution. Once you have identified the distribution, use its statistics.
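To show what the "sawtooth trend, rectangular histogram" pattern looks like, here is a small simulation. Everything in it is an assumption for illustration: a dimension that drifts steadily over 20 parts and is then reset, a little measurement noise, and hypothetical helper names `sawtooth_run` and `histogram`. It is not a model of any particular machining process.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

def sawtooth_run(n_parts=100, low=0.0, high=1.0, parts_per_cycle=20, noise=0.01):
    """Simulate a dimension that drifts from low toward high, then is reset."""
    step = (high - low) / parts_per_cycle
    values = []
    for i in range(n_parts):
        drift = low + step * (i % parts_per_cycle)  # position within the cycle
        values.append(drift + random.gauss(0.0, noise))
    return values

def histogram(values, n_bins=5):
    """Counts per equal-width bin between the min and max of the values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put the max in the last bin
        counts[idx] += 1
    return counts

run = sawtooth_run()
counts = histogram(run)
for count in counts:
    print("#" * count)
```

The bars come out roughly the same length (a rectangle) rather than piling up in the middle, which is the visual cue for a uniform rather than a normal distribution.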

Staff member
Ahhh, now we're getting somewhere. My question is: how many people do this? No distribution is perfectly normal. At best, it will be somewhat normal. So how much data comes even close to being reasonably normal enough to assume normality?

#### bobdoering

Trusted Information Resource
Well, the best answer to any question is "it depends". I think it is a good idea to ponder whether a distribution can be expected to be normal. Normal distributions are a result of normal processes - and there is an emphasis on natural variation. One of Shewhart's examples was tensile strength: random natural variation caused by a myriad of influences - chemistry, crystalline structure, surface flaws, etc. Sure, I'd buy that. The example I like to use is a processing line of loaves of bread. The height of the loaves is controlled by so many variables - proofing, yeast quality, humidity, accuracy of ingredient ratios, etc. The net result is a natural variation - most at a particular height, some less, some more. If a process can be expected to stay at a particular "level", with some variation above and below that level - with NO operator intervention - until a special cause appears, it's normal. That is the "voice of the process". But if you have to have someone adjust it to keep it there, then it is not normal - and you might as well start investigating what the distribution truly is. For precision machining I have found very little evidence of normal distributions - and plenty of evidence to the contrary! It is typically ruled by the uniform distribution.
