# Use of Non-Parametric Statistics and Non-Normal Data

Staff member
I have seen this topic pop up several times regarding normality of data. I thought I would bounce it off of everybody to see what your thoughts are.

Parametric statistics (say the t-test) are making inferences regarding the mean. If the population is normal, then the mean and median are the same. If a set of data is non-normal, then the mean and median are not the same. A non-parametric test will make inferences based on the median.

If a set of data is ‘reasonably’ normal (through graphing; although no data is truly normal), then yes, parametric statistics are significantly more efficient and the proper choice. But if you have a small sample size, then normality cannot be assumed.

Yes, the Central Limit Theorem (C.L.T.) does come into play. But the theorem deals with averages. If you have 17 samples, it is difficult in my mind (unless the data shows normality) to assume normality, simply based on the C.L.T. Sure, I can average 7000 times from those 17 numbers (Bootstrapping), and guess what, normality! But I still, in reality, only have my 17 numbers.
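A minimal sketch of the bootstrap point above, using only the Python standard library. The 17 readings are hypothetical and deliberately right-skewed, and the helper names are illustrative. It resamples the 17 values 7000 times and compares the skewness of the raw values with the skewness of the resampled means.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness (0 for a perfectly symmetric sample)."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bootstrap_means(values, n_resamples=7000, seed=1):
    """Means of resamples drawn with replacement from the observed values."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]

# Hypothetical, strongly right-skewed sample of 17 readings.
sample = [0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 1.9,
          2.3, 2.8, 3.5, 4.4, 5.9, 8.1, 12.7]

means = bootstrap_means(sample)
print(f"skewness of the 17 raw values:    {skewness(sample):.2f}")
print(f"skewness of 7000 bootstrap means: {skewness(means):.2f}")
```

The resampled means come out far more symmetric than the raw values, but every one of them is built from the same 17 numbers, so the resampling adds no new information about the population.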

Parametric statistics are easier and far more efficient. But unless your data shows normality, non-parametric statistics should be used. NCSS reports both parametric and non-parametric tests. If the two tests give you the same result, then nothing is lost. If you get different results, a little more digging may be in order to determine which set of assumptions holds better.
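As a rough illustration of running both kinds of test on the same data, the sketch below applies a one-sample t-test (about the mean) and an exact sign test (about the median) to a hypothetical skewed sample of 17 readings. The data, the reference value of 1.0, and the function names are all assumptions. The t p-value is obtained by numerically integrating the t density so that only the standard library is needed. This is not how NCSS computes its output.

```python
import math
import statistics

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of its density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_half = total * h / 3           # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_half)

def one_sample_t_test(values, mu0):
    """Parametric: tests whether the population mean equals mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.fmean(values) - mu0) / se
    return t_stat, t_two_sided_p(t_stat, n - 1)

def sign_test(values, median0):
    """Non-parametric: exact two-sided test that the population median equals median0."""
    above = sum(v > median0 for v in values)
    below = sum(v < median0 for v in values)
    n = above + below                      # observations tied with median0 are dropped
    k = min(above, below)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical right-skewed sample and reference value.
sample = [0.2, 0.3, 0.5, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 1.9,
          2.3, 2.8, 3.5, 4.4, 5.9, 8.1, 12.7]
reference = 1.0

t_stat, p_t = one_sample_t_test(sample, reference)
p_sign = sign_test(sample, reference)
print(f"t-test:    t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"sign test: p = {p_sign:.3f}")
```

With this sample the two tests land on opposite sides of a 0.05 cutoff. That is the "more digging" case: the t-test is answering a question about the mean, which the long right tail pulls upward, while the sign test is answering a question about the median.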

How does everyone deal with non-normality? Or do you?

#### Steve Prevette

##### Deming Disciple
Staff member
Super Moderator
Re: Use of Non-Parametric statistics

> How does everyone deal with non-normality? Or do you?

I use Statistical Process Control as my primary statistical tool, and since it is non-parametric (re: Chebyshev's inequality), I am pretty well covered.
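For reference, Chebyshev's inequality states that for any distribution with finite variance, the probability of falling k or more standard deviations from the mean is at most 1/k². The short sketch below (function names are illustrative) compares that distribution-free bound with the corresponding tail area of a normal distribution.

```python
from statistics import NormalDist

def chebyshev_bound(k):
    """Upper bound on P(|X - mean| >= k*sigma) for any finite-variance distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

def normal_tail(k):
    """P(|X - mean| >= k*sigma) when X is exactly normal."""
    return 2 * (1 - NormalDist().cdf(k))

for k in (2, 3):
    print(f"k={k}: Chebyshev bound {chebyshev_bound(k):.4f}, normal tail {normal_tail(k):.4f}")
```

At k = 3 the bound is 1/9, much looser than the normal tail area, but it holds whatever the shape of the distribution.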

I have seen folks using sampling here for many analyses, and most just run ahead with the t-test, with some rather small data sets. I did help out one group that was having trouble with using the t-test and determined that their data fit an exponential distribution, and used that to analyze their results. That got them to an answer that was within their limits, and more defensible (and actually more conservative) than the t-test.
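The details of that analysis are not given, so the following is only a generic sketch of working with a fitted exponential distribution instead of a normal one. The data are hypothetical and the function names are illustrative. It estimates the exponential mean from the sample, then compares the 95th percentile implied by the exponential fit with the one implied by a normal fit to the same data.

```python
import math
import statistics
from statistics import NormalDist

def fit_exponential_mean(values):
    """Maximum-likelihood estimate of the exponential mean (the sample mean)."""
    if any(v < 0 for v in values):
        raise ValueError("exponential data cannot be negative")
    return statistics.fmean(values)

def exponential_quantile(mean, p):
    """Value below which a fraction p of an exponential population falls."""
    return -mean * math.log(1 - p)

def exponential_tail(mean, limit):
    """Fraction of an exponential population expected to exceed the limit."""
    return math.exp(-limit / mean)

# Hypothetical non-negative, right-skewed measurements.
data = [0.1, 0.3, 0.4, 0.7, 0.9, 1.2, 1.6, 2.1, 2.9, 4.8]

mean = fit_exponential_mean(data)
exp_p95 = exponential_quantile(mean, 0.95)
normal_p95 = mean + NormalDist().inv_cdf(0.95) * statistics.stdev(data)
print(f"fitted exponential mean:   {mean:.2f}")
print(f"95th percentile (exp fit): {exp_p95:.2f}")
print(f"95th percentile (normal):  {normal_p95:.2f}")
```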

#### Tim Folkerts

Super Moderator

You make several interesting points. My overall reaction is that, yes, you need to be careful in case the data is far from normal, but often the data is "close enough" to normal that you can get away with using tests that are based on the normal distribution.

As Steve mentioned, if it is clear that the distribution isn't normal, then it can be valuable to explore the actual distribution. As always, it is important to look at the actual plots, not just the summary numbers like mean & standard deviation.

> If the population is normal, then the mean and median are the same. If a set of data is non-normal, then the mean and median are not the same.

The 1st sentence sounds good, but the 2nd does not hold in general: any symmetric distribution (and even some non-symmetric distributions) will have mean = median.
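Two tiny made-up data sets are enough to check this. The first is flat and symmetric (nothing like a bell curve), the second is lopsided, and in both the mean equals the median.

```python
import statistics

def third_central_moment(values):
    """Zero for a symmetric sample, non-zero for a lopsided one."""
    m = statistics.fmean(values)
    return sum((v - m) ** 3 for v in values) / len(values)

symmetric = [1, 2, 3, 4, 5, 6]   # flat and symmetric, clearly not bell-shaped
lopsided = [0, 3, 3, 4, 5]       # asymmetric, long tail to the left

for name, data in (("symmetric", symmetric), ("lopsided", lopsided)):
    print(name,
          "mean =", statistics.fmean(data),
          "median =", statistics.median(data),
          "third moment =", third_central_moment(data))
```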

> But if you have a small sample size, then normality cannot be assumed.

Well, normality can be assumed - I've certainly seen enough people do it to know it is possible.

Tim F

#### Statistical Steven

##### Statistician
Staff member
Super Moderator
> How does everyone deal with non-normality? Or do you?

The question for me is not whether the data is normal, but whether I expect the data to come from a normal distribution. Many factors, including outliers, can make a sample look non-normal (either graphically or using a test such as Anderson-Darling). Tests such as the t-test and ANOVA are robust to slight departures from normality, so the mean and median do not have to be equal. A bigger problem than the normality assumption arises when comparing two samples (t-test) and the variances are not equal between the groups. This is where I employ nonparametric statistics if I cannot find a variance-stabilizing transformation.
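One common variance-stabilizing transformation is the logarithm, which helps when the spread of a group grows in proportion to its level. The sketch below uses two hypothetical groups, the second being ten times the first, and compares the variance ratio before and after a log transform. Real data will rarely stabilize this cleanly, and other transformations (square root, Box-Cox) may fit better.

```python
import math
import statistics

def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one (1.0 means equal spread)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return max(vx, vy) / min(vx, vy)

# Hypothetical groups: group_b has ten times the level and ten times the spread.
group_a = [8.0, 9.5, 10.0, 11.0, 12.5]
group_b = [10 * v for v in group_a]

raw_ratio = variance_ratio(group_a, group_b)
log_ratio = variance_ratio([math.log(v) for v in group_a],
                           [math.log(v) for v in group_b])
print(f"variance ratio on raw data: {raw_ratio:.1f}")
print(f"variance ratio after log:   {log_ratio:.3f}")
```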

#### bobdoering

Trusted Information Resource
My preference is to use the statistics of the native distribution, rather than force feed it into another distribution (e.g. normal).

Staff member
> My preference is to use the statistics of the native distribution, rather than force feed it into another distribution (e.g. normal).

Thanks. What methods do you utilize to determine the native distribution?

#### Bev D

##### Heretical Statistician
Staff member
Super Moderator
Hmmm. Well, I am rarely at the point of performing a hypothesis test where I don't know something about the underlying process performance. In fact I teach that to perform a hypothesis test without that knowledge is like buying a lottery ticket. You might get lucky if you somehow stumbled onto the right answer and your test demonstrated it. You might get unlucky and not have the right answer, but your test says you do. This is either a result of alpha risk or a 'foolishly' designed experiment (biased, non-random, outside normal operating conditions, etc.).

The things I want to know first are:

what is the average?

what is the rough range of the Y, min to max? or the occurrence rate if it's categorical data? how else do I know what the sample size should be? Low occurrence rates take different stats than moderate to high event rates. Also, if my hypothesis test results don't span the full range of the normal variation in the Y (or at least most of it) then I'll know I did something wrong in the test and the results can't be fully trusted without further analysis.

How long does it take to go from min to max? how else will I know how I need to spread my samples out to avoid a spurious association?
Of course, all of these can be answered with a trend chart of the data.
I also want to know what the normal operating range of the suspect causal factor is, so I know where to set the low and the high values for the test.

Once I have the trend (from which I can get the distribution shape) and I know the basic science behind the process, I can get a usable understanding of the distribution. (An example of the science is that coating thickness will be bounded at zero and it will be skewed; tool wear tends toward a uniform-looking distribution...)

#### bobdoering

Trusted Information Resource
In brief: Typically run 50 to 100 pcs consecutively, plot them on a trend chart, generate a histogram, and see what it looks like. At that point it will be normal, or not. If it is a uniform distribution, it will look like a rectangle (the trend chart should look like a sawtooth curve). Otherwise, a Pearson analysis would help zero in on another appropriate distribution. Once you have identified the distribution, use its statistics.
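A rough sketch of that screening step, using only the standard library. The run of 100 pieces below is hypothetical (two clean sawtooth cycles) and the function names are illustrative. It bins the run into a text histogram and computes the moment-based skewness and kurtosis, which are the quantities a Pearson-type analysis works from. For reference, a normal distribution has skewness 0 and kurtosis 3, a uniform distribution has skewness 0 and kurtosis 1.8, and an exponential distribution has skewness 2 and kurtosis 9.

```python
import statistics

def shape_summary(values):
    """Return (skewness, kurtosis) computed from central moments."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def histogram_counts(values, bins=10):
    """Counts per equal-width bin between the smallest and largest value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

# Hypothetical run of 100 consecutive pieces: two sawtooth cycles of 50 parts.
run = [i % 50 for i in range(100)]

counts = histogram_counts(run)
skew, kurt = shape_summary(run)
for c in counts:
    print("#" * c)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

A flat histogram together with a kurtosis near 1.8 points toward the uniform distribution. Values near 0 and 3 would point toward the normal.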

Staff member
> In brief: Typically run 50 to 100 pcs consecutively, plot them on a trend chart, generate a histogram, and see what it looks like. At that point it will be normal, or not. If it is a uniform distribution, it will look like a rectangle (the trend chart should look like a sawtooth curve). Otherwise, a Pearson analysis would help zero in on another appropriate distribution. Once you have identified the distribution, use its statistics.

Ahhh, now we're getting somewhere. My question is: how many people do this? No distribution is perfectly normal. At best, it will be somewhat normal. So how much data comes even close to being reasonably normal enough to assume normality?

#### bobdoering

Trusted Information Resource
> Ahhh, now we're getting somewhere. My question is: how many people do this? No distribution is perfectly normal. At best, it will be somewhat normal. So how much data comes even close to being reasonably normal enough to assume normality?

Well, the best answer to any question is "it depends". I think it is a good idea to ponder whether a distribution can be expected to be normal. Normal distributions are a result of normal processes - and there is an emphasis on natural variation. One of Shewhart's examples was tensile strength. Random natural variation caused by a myriad of influences - chemistry, crystalline structure, surface flaws, etc. Sure, I'd buy that. The example I like to use is a processing line of loaves of bread. The height of the loaves of bread is controlled by so many variables - proofing, yeast quality, humidity, accuracy of ingredient ratios, etc. The net result is a natural variation - most at a particular height, some less, some more. If a process can be expected to stay at a particular "level", with some variation above and below that level - with NO operator intervention - until a special cause appears, it's normal. That is the "voice of the process". But if you have to have someone adjust it to keep it there, then it is not normal - and you might as well start investigating what the distribution truly is. For precision machining I have found very little evidence of normal distributions - and plenty of evidence to the contrary! It is typically ruled by the uniform distribution.
