# P-value is 0.05 - Normal or non-normal data?

#### RMedrano

Someone brought a capability study to me this morning after running it through Minitab. The distribution looked slightly bimodal. I ran a normality test on the data and it kicked out a p-value of 0.050. I was always told that p > 0.05 means normal.

Does that mean 0.05 is non-normal?

#### Miner

##### Forum Moderator
Staff member
You are correct, with one caveat. A p-value > 0.05 means we fail to reject the null hypothesis (that the distribution is normal); it does not prove that the data are normal. A p-value < 0.05 means that the null hypothesis is rejected and the data are unlikely to have come from a normal distribution.

From your question, do you have a p-value of exactly 0.05?

At this point, it is time to be practical. Trust your eyes. If the distribution looks bimodal, look for possible reasons why it could be bimodal. Do the products come from more than one process stream? Plot the data on a run chart in time sequence. Is there a process shift? Does your gage have inadequate resolution? Did more than one person take the measurements? Were the measurements taken on different gages, or at different times?

If the answers to all of these questions are no, try repeating the study using a larger sample size. This should force the p-value higher or lower than 0.05 and make the histogram more obvious.
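To make the decision rule concrete, here is a minimal Python sketch of an Anderson-Darling normality test using only the standard library. The sample values, the function names, and the 0.05 cutoff are illustrative assumptions; the p-value uses the commonly quoted D'Agostino and Stephens approximation, so treat it as a rough check rather than a reproduction of Minitab's output.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p-value) for a normality test
    where the mean and standard deviation are estimated from the data."""
    x = sorted(data)
    n = len(x)
    mu, sigma = mean(x), stdev(x)
    std_normal = NormalDist()
    eps = 1e-12  # clamp to keep log() away from 0
    cdf = [min(max(std_normal.cdf((v - mu) / sigma), eps), 1 - eps) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment for estimated parameters.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # Piecewise p-value approximation (D'Agostino & Stephens).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2_adj, min(max(p, 0.0), 1.0)


def normal_scores(center, sd, m):
    """Hypothetical helper: m evenly spaced normal quantiles around center."""
    q = NormalDist()
    return [center + sd * q.inv_cdf((k + 0.5) / m) for k in range(m)]


# Made-up data: one process stream versus two streams with shifted means.
single_stream = normal_scores(10.3, 0.1, 30)
two_streams = normal_scores(10.0, 0.05, 15) + normal_scores(10.6, 0.05, 15)

for label, sample in (("single stream", single_stream), ("two streams", two_streams)):
    a2_adj, p = anderson_darling_normal(sample)
    verdict = "reject normality" if p < 0.05 else "fail to reject normality"
    print(f"{label}: adjusted A^2 = {a2_adj:.3f}, p = {p:.4f} -> {verdict}")
```

Note that a p-value displayed as 0.050 has been rounded, so the unrounded value could sit on either side of the cutoff. That is one more reason to look at the histogram and run chart instead of leaning on the third decimal place.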

#### Statistical Steven

##### Statistician
Staff member
Super Moderator
Which test of normality did you run? I prefer the Shapiro-Wilk method for larger sample sizes (greater than 50), and the Kolmogorov-Smirnov test for smaller sample sizes.

#### RMedrano

This came from the Minitab Graphical Summary function, which uses the Anderson-Darling method, I believe.

I concluded after close examination of the data that it appeared to be bimodal.

After speaking with the technician who gathered the data, I found that he took a total of 30 parts, but took one part off each fixture of the machine in question. In that process we track each fixture with a separate SPC chart, because each can be adjusted separately. With 14 fixtures it was much more than bimodal. Is there a word for having 14 different modes? LOL. Thanks for the help, guys. I'm just starting to learn this stuff; you have been a great help.
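A small simulation shows why pooling parts from separately adjusted fixtures can look multimodal. This is only a sketch: the nominal value, fixture offsets, and within-fixture spread are made-up numbers, not data from the study above.

```python
import math
import random
from statistics import mean, stdev

random.seed(14)  # fixed seed so the sketch is repeatable

N_FIXTURES = 14
PARTS_PER_FIXTURE = 30
NOMINAL = 10.0      # hypothetical nominal dimension
WITHIN_SD = 0.01    # hypothetical within-fixture standard deviation

# Hypothetical fixture offsets, spread evenly over +/-0.05 around nominal.
offsets = [-0.05 + 0.10 * f / (N_FIXTURES - 1) for f in range(N_FIXTURES)]

# Simulate each fixture as its own process stream.
by_fixture = {
    f: [random.gauss(NOMINAL + offsets[f], WITHIN_SD) for _ in range(PARTS_PER_FIXTURE)]
    for f in range(N_FIXTURES)
}

# Pooling ignores the fixture label, so fixture-to-fixture shifts inflate the spread.
pooled = [value for parts in by_fixture.values() for value in parts]
pooled_sd = stdev(pooled)

# Root-mean-square of the per-fixture standard deviations.
within_sd = math.sqrt(mean(stdev(parts) ** 2 for parts in by_fixture.values()))

print(f"pooled sd:         {pooled_sd:.4f}")
print(f"within-fixture sd: {within_sd:.4f}")
```

With these assumed numbers, the pooled spread is dominated by the differences between fixtures, which is the reason to chart and assess each fixture on its own.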

#### Tim Folkerts

Super Moderator
RMedrano said:
is there a word for having 14 different modes? LOL
Quadradecimodal

#### Tim Folkerts

Super Moderator
And a distribution that only sort of has a hump in the middle is named after a person who sort of had a hump... The Quasimodal distribution. Sorry! I just couldn't help myself....

#### Ehsan Heidari

I am confused!
I have 125 data points in 25 groups (subgroup size = 5). I ran normality tests in Minitab but received very different answers for the same data:

With Anderson-Darling:
P-value < 0.005, so it's non-normal

With Ryan-Joiner:
RJ = 1
P-value > 0.1, so it's normal

With Kolmogorov-Smirnov:
KS = 0.084
P-value = 0.037, so it's non-normal

So, which test should I use?


#### bobdoering

Trusted Information Resource
I really don't like doing the analysis in this manner. My preference is to run curve fitting and check to see which curve fits best. If it is the normal curve, then fine.

But, here is the thing: Minitab is a tool to aid understanding, not to provide understanding. What is the process, and what would one expect the distribution to be? Often the time-ordered run chart has as much to say about the distribution as jumping straight to normality checking. Also, can you prepare the total variance equation? What are the expected distributions of each of the contributors of variation? Assuming a process has one net variation is simultaneously assuming that all of the contributors in the total variance equation are insignificant except one big factor. Sometimes that is true - sometimes the biggest factor is measurement error, which tends to be normal, and does a great job of masking the true process variation distribution.

So, the first question is: why do you need to know if the distribution is normal? Is it to determine capability? Then you probably jumped ahead a few steps.
- The first step is to figure out your total variance equation.
- Then, ensure all of the factors that should be held statistically insignificant are (measurement error, gage error, material variation, etc.) so that the resulting variation is the one you are trying to study. If more than one factor is significant, you can get more than one "mode". Is that "not capable" or "not in control"? Of course not - but simpler statistical thinkers may have you thinking that is the case. Fact is, what are the odds that you will only have one significant factor looking at the total variance equation? Pretty slim.
- The next step is to prepare a run chart to see if there were any significant trends, or if the data were truly random and discrete (what you want for a normal distribution).
- Finally, run the data through a distribution fit analysis, and find the best fit. If, as an example, the best fit is p=.6 and the normal curve fit is p=.4, you can likely use the normal statistics as an estimate of the curve behavior. But if the best fit is p=.6 and the normal curve fit is p=.05, then you need to look at the correct distribution's statistics or, as a last choice, perform a transformation.
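For readers who have not seen one, a total variance equation of the kind mentioned in the first step is usually written as a sum of variance components. The component names below are illustrative, taken from the factors listed above, and the additive form assumes the sources of variation are independent of one another:

```latex
\sigma^2_{\text{total}}
  = \sigma^2_{\text{process}}
  + \sigma^2_{\text{measurement}}
  + \sigma^2_{\text{gage}}
  + \sigma^2_{\text{material}}
  + \cdots
```

Each term can have its own distribution, so the shape of the total depends on which terms dominate.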


#### Miner

##### Forum Moderator
Staff member
Anderson-Darling is a modification of the Kolmogorov-Smirnov test that makes it more sensitive to deviations in the tails of the distribution.

I have not used the Ryan-Joiner test, but know that it is correlation-based and is supposed to be of similar sensitivity to the A-D test.

I would have to see the data to be certain, but it would have to be something in the tail of the distribution that does not strongly impact correlation.
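As a rough illustration of how a distance-based statistic and a correlation-based statistic can disagree, the sketch below computes a Kolmogorov-Smirnov distance (with parameters estimated from the data) and a Ryan-Joiner style correlation using Blom plotting positions. The data are made up, and coarse rounding is just one hypothetical cause of such a disagreement, not a diagnosis of the data in question. No p-values are computed because they need tables or simulation.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    d = 0.0
    for i, v in enumerate(x):
        f = fitted.cdf(v)
        # Check the gap just after and just before each step of the empirical CDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ryan_joiner_correlation(data):
    """Correlation between the ordered data and normal scores
    computed from Blom plotting positions (i - 3/8) / (n + 1/4)."""
    x = sorted(data)
    n = len(x)
    q = NormalDist()
    scores = [q.inv_cdf((i + 1 - 0.375) / (n + 0.25)) for i in range(n)]
    mx, ms = mean(x), mean(scores)
    sxy = sum((a - mx) * (b - ms) for a, b in zip(x, scores))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - ms) ** 2 for b in scores)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: 125 evenly spaced normal quantiles, then the same values
# rounded to a coarse 0.5-unit resolution (hypothetical gage resolution).
base = [NormalDist().inv_cdf((i + 0.5) / 125) for i in range(125)]
coarse = [round(v / 0.5) * 0.5 for v in base]

for label, sample in (("full resolution", base), ("coarse resolution", coarse)):
    ks = ks_statistic_normal(sample)
    rj = ryan_joiner_correlation(sample)
    print(f"{label}: KS = {ks:.3f}, RJ = {rj:.4f}")
```

In this sketch the ties created by rounding open up large steps in the empirical CDF, which the KS distance picks up, while the correlation with the normal scores stays close to 1.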

#### Ehsan Heidari

Normality Test with Minitab

Hello Miner

I attached my data.
If you have any experience with the Runs Test (in Minitab), please help me: I ran the Runs Test on the same data and got a P-value of 0.97.
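For context, a runs test checks whether the order of the observations looks random; it says nothing about the shape of the distribution. Below is a minimal sketch of the usual runs test above and below a center value, using the normal approximation. Centering on the mean is an assumption about how the test was configured, and the two example series are made up.

```python
import math
from statistics import NormalDist, mean


def runs_test(data, center=None):
    """Runs test above/below a center value (defaults to the mean).
    Returns (observed runs, expected runs, two-sided p-value)."""
    k = mean(data) if center is None else center
    above = [v > k for v in data]
    n1 = sum(above)        # observations above the center
    n2 = len(above) - n1   # observations at or below the center
    if n1 == 0 or n2 == 0:
        raise ValueError("all observations fall on one side of the center")
    # A new run starts every time the series crosses the center.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, p


# Made-up series: a sustained shift (too few runs) and strict
# alternation (too many runs). Both are non-random orderings.
shifted = [0.0] * 10 + [1.0] * 10
alternating = [0.0, 1.0] * 10

for label, series in (("shifted", shifted), ("alternating", alternating)):
    runs, expected, p = runs_test(series)
    print(f"{label}: runs = {runs}, expected = {expected:.1f}, p = {p:.5f}")
```

A large p-value such as 0.97 means the observed number of runs is close to what a random ordering would produce, so there is no evidence of trends or shifts over time. The normality question still has to be answered separately.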
