How to identify whether my data is non-normal?


bkarthikeyan

When using SPC, how do I identify whether my data is normal or non-normal (without using Minitab or other software)? Can we use a histogram to find this? If so, is there a minimum number of samples to be taken?
 

BradM

Leader
Admin
Well, hello there! The easiest thing I would do is put the data into Excel and graph it. At least you can visually make a reasonable assertion as to the normality of the data.

Always take as much data as possible. The more the better.

If you don't mind me asking, why are you asking this? What situation are you attempting to remedy?

What kind of process are you trying to establish?

I'm just interested in your question about how much data is needed to determine normality.
 

bkarthikeyan

Dear Mr. BradM,

I was explaining SPC concepts to one of the freshers in my organization, and this question came up during the discussion. I know that we can check whether data is normal or not through Minitab. If we don't have Minitab, how do we find out? In our factory, the manufacturing process includes Stamping, Painting, and Assembly, so I want to know how much data is needed, because the output rate of each process varies.
 

Stijloor

Leader
Super Moderator
Dear Mr. BradM,

I was explaining SPC concepts to one of the freshers in my organization, and this question came up during the discussion. I know that we can check whether data is normal or not through Minitab. If we don't have Minitab, how do we find out? In our factory, the manufacturing process includes Stamping, Painting, and Assembly, so I want to know how much data is needed, because the output rate of each process varies.

Hello bkarthikeyan,

The simplest way to check for normality is to develop a histogram from a random sample of 100 (or more) pieces and visually assess whether the data are normally distributed. I do not know how familiar you are with developing a histogram... here is some information on how to do it. Hope this helps.

http://www.buhs.k12.vt.us/science/physicalscience/histograms/histogram_tutorial_page_4.html

http://www.google.com/search?q=how+to+make+a+histogram&hl=en&start=20&sa=N
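If you have no graphing software at all but do have Python available, the binning behind a histogram can be sketched in a few lines of plain Python (no third-party libraries). The bin count of 10 and the random sample below are illustrative assumptions, not part of the original advice:

```python
import random

def histogram_counts(data, n_bins=10):
    """Bin data into n_bins equal-width bins.

    Returns (lowest bin edge, bin width, list of counts per bin).
    """
    lo, hi = min(data), max(data)
    bin_width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((x - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    return lo, bin_width, counts

# Illustrative sample: 200 simulated measurements.
random.seed(42)
sample = [random.gauss(5.0, 1.0) for _ in range(200)]

lo, w, counts = histogram_counts(sample)
peak = max(counts)
for i, c in enumerate(counts):
    # Text "bar chart": one row of '#' per bin, scaled to the tallest bin.
    print(f"{lo + i * w:8.3f} | {'#' * (40 * c // peak)} ({c})")
```

A roughly bell-shaped, symmetric pattern of bars is consistent with normality; obvious skew or multiple peaks is the kind of departure a visual check is meant to catch.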
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
There are a number of statistical tests for normality. Some of the easiest are based on whether the skewness and kurtosis match the values expected for the normal distribution.

The most common test (and one that can be implemented with a little Excel programming) is a chi-square test comparing the distribution of the data across histogram bins with what the normal distribution would predict for those bins.
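The same chi-square calculation Steve describes for Excel can be sketched in plain Python. This is a minimal illustration, not a full implementation: it fits the mean and standard deviation from the data, ignores the small normal tail probability outside the observed range, and uses made-up sample data. The degrees of freedom follow the usual rule of bins minus one, minus the two estimated parameters:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def chi_square_normality(data, n_bins=10):
    """Chi-square goodness-of-fit statistic against a fitted normal.

    Returns (statistic, degrees of freedom). Compare the statistic
    to a chi-square table at df = n_bins - 3 (bins - 1, minus the
    two estimated parameters mu and sigma).
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    stat = 0.0
    for i in range(n_bins):
        left = lo + i * width
        right = left + width
        if i == n_bins - 1:
            observed = sum(1 for x in data if left <= x <= hi)
        else:
            observed = sum(1 for x in data if left <= x < right)
        # Expected count from the fitted normal; the tail probability
        # outside [min, max] is ignored in this simple sketch.
        expected = n * (normal_cdf(right, mu, sigma) - normal_cdf(left, mu, sigma))
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat, n_bins - 3

# Illustrative sample: 500 simulated measurements.
random.seed(1)
sample = [random.gauss(10.0, 0.5) for _ in range(500)]
stat, df = chi_square_normality(sample)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

A statistic well above the critical value from a chi-square table at the stated degrees of freedom suggests the data depart from normality; a small statistic means the histogram is consistent with a normal fit.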
 

BradM

Leader
Admin
Dear Mr. BradM,

I was explaining SPC concepts to one of the freshers in my organization, and this question came up during the discussion. I know that we can check whether data is normal or not through Minitab. If we don't have Minitab, how do we find out? In our factory, the manufacturing process includes Stamping, Painting, and Assembly, so I want to know how much data is needed, because the output rate of each process varies.

Superb posts from Steve and Stijloor (as usual). Forgive me if I keep coming back to the same issue, but why limit how much data you will obtain? Is it cost prohibitive?

If you only collect 15 data points, you will need to be concerned with normality. If you collect 1500 data points, normality will virtually be a given. This is why it's important to try to figure out how much data you're looking at, and whether normality is an issue. Also, depending on how many variables per sample you are trying to assess, the power of your inferences may become very low if you don't have enough data.

As the others mentioned, even though your post specifically said you wanted to avoid using a software package for this decision, most decent software packages do include statistical tools to assess normality.

Here is a link on a thread where we had a good discussion on non-parametric statistics, if you're interested:

Non-Parametric statistics
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
If you collect 1500 data points, normality will virtually be a given.

One comment - this statement is only true if you are dealing with the average of the 1500 values. If you are dealing with the "tails" of the distribution (such as trying to predict failure) normality is definitely not a "given". I think this is what fooled the developers of six sigma into thinking there was a 1.5 sigma shift.

There is a good book out there by Nassim Taleb called The Black Swan. It's an interesting book, written in plain language. Tom Peters has been highly supportive of his works. The Black Swan does show how we are often fooled by assuming normality, and by our reactions to rare events.
 

Stijloor

Leader
Super Moderator
There are a number of statistical tests for normality. Some of the easiest are based on whether the skewness and kurtosis match the values expected for the normal distribution.

The most common test (and one that can be implemented with a little Excel programming) is a chi-square test comparing the distribution of the data across histogram bins with what the normal distribution would predict for those bins.

Hello Steve,

Great suggestion, but the poster, bkarthikeyan, indicated that it needed to be determined without Minitab or other software. I assume that they may not have access to these resources. So it's back to basics....

You know? Now I'm thinking of it, that's what the (SPC) Masters did...

Stijloor.
 

BradM

Leader
Admin
One comment - this statement is only true if you are dealing with the average of the 1500 values. If you are dealing with the "tails" of the distribution (such as trying to predict failure) normality is definitely not a "given". I think this is what fooled the developers of six sigma into thinking there was a 1.5 sigma shift.

There is a good book out there by Nassim Taleb called The Black Swan. It's an interesting book, written in plain language. Tom Peters has been highly supportive of his works. The Black Swan does show how we are often fooled by assuming normality, and by our reactions to rare events.

Good point. I purposely did not invoke the Central Limit Theorem here; as you correctly pointed out, it is the average of the values that approaches normality.

However, if I am measuring the paint thickness and I have 1500 data points, that data will most assuredly represent a normal distribution. Over time (as you stated) it might become apparent that the distribution changes to more accurately represent the entire population.

Since the OP mentioned SPC, your statement is highly valid. 1500 data points in July may represent the upper end of the year's distribution.

P.S. If I tell you I still believe in the 1.5 shift, will you still respect me in the morning?:tg:
 

BradM

Leader
Admin
Hello Steve,

Great suggestion, but the poster, bkarthikeyan, indicated that it needed to be determined without Minitab or other software. I assume that they may not have access to these resources. So it's back to basics....

You know? Now I'm thinking of it, that's what the (SPC) Masters did...

Stijloor.

Yes, they did get back to the basics. Nice point. So I suggest that if you have only a small amount of data, just graph it (with whatever tools you have... does anybody still have graph paper?:lol:) and see whether it is reasonably close to normal. If it is highly skewed, that is valuable to know, and you can then do something different.
 