How to identify whether my data is non-normal?

bkarthikeyan
1st July 2007, 10:46 PM
When using SPC, how can I identify whether my data follows a normal or non-normal distribution (without using Minitab or other software)? Can we use a histogram to find this out? If so, is there a minimum number of samples to be taken?

BradM
1st July 2007, 11:40 PM
Well, hello there! The easiest thing I would do is put the data into Excel and graph it. At least you can visually make a reasonable assessment of the normality of the data. Always take as much data as possible. The more the better. If you don't mind me asking, why are you asking this? What situation are you attempting to remedy? What kind of process are you trying to establish? I'm just interested in your question about how much data is needed to determine normality.

bkarthikeyan
2nd July 2007, 04:53 AM
Dear Mr BradM, I was explaining SPC concepts to one of the freshers in my organisation. During the discussion this question came up. I know that with Minitab we can check whether data is normal or not. If we don't have Minitab, then how do we find out? In our factory, our manufacturing process includes stamping, painting, and assembly, so I want to know how much data is needed, because each process's output rate varies.

Stijloor
2nd July 2007, 05:26 AM
Hello bkarthikeyan, The simplest way to check for normality is to develop a histogram based on a 100 (or more) piece random sample and visually assess whether the data are normally distributed. I do not know how familiar you are with developing a histogram... here is some information on how to do this. Hope this helps.
http://www.buhs.k12.vt.us/science/physicalscience/histograms/histogram_tutorial_page_4.html
http://www.google.com/search?q=how+to+make+a+histogram&hl=en&start=20&sa=N

Steve Prevette
2nd July 2007, 11:08 AM
There are a number of statistical tests for normality. Some of the easiest are based on whether the skewness and kurtosis have the values expected for a normal distribution. The most common test (which can be implemented with a little Excel programming) is a chi-square test comparing the counts of data in histogram bins with what the normal distribution would predict for those bins.

BradM
2nd July 2007, 11:18 AM
Superb posts from Steve and Stijloor (as usual). Forgive me if I keep coming back to the same issue, but why limit how much data you will obtain? Is it cost prohibitive? If you only collect 15 data points, you will need to be concerned with normality. If you collect 1500 data points, normality will virtually be a given. This is why it's important to try to figure out how much data you're looking at, and whether normality is an issue. Also, depending on how many variables per sample you are trying to assess, the power of your inferences may become very low if you don't have enough data. As mentioned by the others, even though your post specifically said you wanted to keep away from a software package for a decision, most decent software packages have statistical tools to estimate normality.
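
For readers without a statistics package, the chi-square check Steve Prevette describes above can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions, not a definitive implementation: the function names (`chi_square_normality`, `looks_normal`), the choice of 8 bins, the use of equal-probability bins instead of equal-width histogram bins, and the 5% significance level are all choices made here, not anything specified in the thread.

```python
import bisect
import math
from statistics import NormalDist, mean, stdev

# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592, 7: 14.067}


def chi_square_normality(data, n_bins=8):
    """Return (chi-square statistic, degrees of freedom) against a fitted normal.

    Assumption: bins are chosen so the fitted normal puts equal probability in
    each one, which gives every bin the same expected count.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Interior bin edges are quantiles of the fitted normal distribution.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed = [0] * n_bins
    for x in data:
        # bisect_left counts how many edges lie strictly below x: the bin index.
        observed[bisect.bisect_left(edges, x)] += 1
    expected = n / n_bins
    stat = sum((o - expected) ** 2 / expected for o in observed)
    # One degree of freedom is lost for the total and two for the estimated
    # mean and standard deviation.
    dof = n_bins - 3
    return stat, dof


def looks_normal(data, n_bins=8):
    """True if the test does not reject normality at the 5% level.

    Only n_bins from 4 to 10 are covered by the small critical-value table.
    """
    stat, dof = chi_square_normality(data, n_bins)
    return stat <= CHI2_CRIT_05[dof]


# Two artificial 200-point samples built from evenly spaced quantiles:
# one bell-shaped, one strongly skewed (exponential).
normal_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [-math.log(1 - (i + 0.5) / 200) for i in range(200)]

print("bell-shaped sample passes:", looks_normal(normal_like))
print("skewed sample passes:", looks_normal(skewed))
```

With 100 or more points and 8 bins the expected count per bin stays well above the usual rule-of-thumb minimum of 5. As later posts in the thread point out, passing a test like this says little about behaviour far out in the tails.
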
Here is a link on a thread where we had a good discussion on non-parametric statistics, if you're interested: Non-Parametric statistics (http://elsmar.com/Forums/showthread.php?t=18932&highlight=non-parametric)

Steve Prevette
2nd July 2007, 11:23 AM
"If you collect 1500 data points, normality will virtually be a given."
One comment - this statement is only true if you are dealing with the average of the 1500 values. If you are dealing with the "tails" of the distribution (such as trying to predict failure), normality is definitely not a "given". I think this is what fooled the developers of six sigma into thinking there was a 1.5 sigma shift. There is a good book out there by Nassim Taleb called The Black Swan. It's an interesting book, written in plain language. Tom Peters has been highly supportive of his works. The Black Swan does show how we are often fooled by assuming normality, and by our reactions to rare events.

Stijloor
2nd July 2007, 11:25 AM
Hello Steve, Great suggestion, but the poster, bkarthikeyan, indicated that it needed to be determined without Minitab or other software. I assume that they may not have access to these resources. So it's back to basics.... You know? Now I'm thinking of it, that's what the (SPC) Masters did... Stijloor.

BradM
2nd July 2007, 11:36 AM
"One comment - this statement is only true if you are dealing with the average of the 1500 values."
Good point. I purposely did not state the Central Limit Theorem here, and it is the average of them that approaches normality, as you correctly pointed out. However, if I am measuring the paint thickness and I have 1500 data points, that data will most assuredly represent a normal distribution. Over time (as you stated) it might become apparent that the distribution changes to more accurately represent the entire population. Since the OP mentioned SPC, your statement is highly valid. 1500 data points in July may represent the upper end of the year's distribution. P.S. If I tell you I still believe in the 1.5 shift, will you still respect me in the morning?

BradM
2nd July 2007, 11:38 AM
"So it's back to basics.... You know? Now I'm thinking of it, that's what the (SPC) Masters did..."
Yes, they did get back to the basics. Nice point. So I'm suggesting that if there is a small amount of data, just graph it (with whatever tools you have... does anybody still have graph paper?) and you can see if it's reasonably close enough to assume normality. If it's highly skewed, that is valuable to know, and you then do something different.

Steve Prevette
2nd July 2007, 11:45 AM
"P.S. If I tell you I still believe in the 1.5 shift, will you still respect me in the morning?"
But of course. Even if you are using SPC you may still have to watch for nonnormality.
If you are doing the Cp and Cpk calculations, you are assuming normality in the tails. SPC is based upon Tchebychev, who showed that up to 1/9th of the data may be outside three standard deviations from the average.

Darius
2nd July 2007, 11:52 AM
Agree with Steve Prevette: "The most common test (which can be implemented with a little Excel programming) is a chi-square test", no visual interpretations included. But the question that I have is: what are you going to do if the data turns out to be non-normal? Don Wheeler showed that SPC works even though the data is not normally distributed. The problem may arise that some patterns, like runs, can happen without any special cause, but in such cases just deactivate those rules.

Steve Prevette
2nd July 2007, 12:07 PM
"Don Wheeler showed that SPC works even though the data is not normally distributed."
Yes, even Dr. Shewhart showed that SPC works on non-normal distributions. The question is - what does SPC do? It can detect changes (trends) in the data, even if non-normal. It can give you predictions of future performance, if nothing changes. The difficulty is - if you are going to make predictions about performance in the "tails", such as the probability of exceeding six standard deviations from the average, you may be making an incorrect assumption if you use the normal distribution to answer that question.

Xymark
9th July 2007, 08:39 AM
Does anyone know how best to deal with data which does not have an upper and lower limit? We have some reflectivity measurements of optics we use in our lasers.
We need a reflectivity of 99%, but you cannot have a figure greater than 100% reflective, so we do not get a normal distribution. Should I just use the lower limit and ignore the upper end? We are being asked to calculate Cpk values, but is that valid for data of this sort?

Darius
9th July 2007, 11:32 AM
Although there may be different opinions, IMHO Cpk is wrong with a single specification limit, especially in cases where time is involved (you can have a target time, but the delays make a non-normal distribution very likely) or where a constraint makes your data non-normal. Search for Cpm or Cpmk; those indices could do the job. Or transform your data (but remember to transform the specs too).

Jim Shelor
9th July 2007, 01:00 PM
"One comment - this statement is only true if you are dealing with the average of the 1500 values."
Forgive me if I am jumping to a concussion, but did I just read that somebody agrees control charts assume normality and non-normal data produces results that are not necessarily reliable? Falling out of my chair. Jim Shelor PMP, CSSBB

Jim Shelor
9th July 2007, 01:25 PM
"P.S. If I tell you I still believe in the 1.5 shift, will you still respect me in the morning?"
Brad, I have also been converted. I used to think the 1.5 sigma drift was smoke and mirrors; I no longer believe that. I am now convinced that the 1.5 sigma shift is true and applies to virtually all processes. Best regards, Jim Shelor, PMP, CSSBB

Steve Prevette
11th July 2007, 11:54 AM
"Forgive me if I am jumping to a concussion, but did I just read that somebody agrees control charts assume normality and non-normal data produces results that are not necessarily reliable? Falling out of my chair."
Not to worry, you can right your chair. The issue here is what does SPC represent? For those that treat it as an exercise in Normal probabilities, and claim that 99.7% of the data are within three standard deviations of the average, I would (and Dr. Deming did) say that they are in error. If we acknowledge that non-normal data can allow up to 11% of the data to be outside three sigma, then that shows an understanding that non-normal data exists. Yes, SPC still works even in those extreme cases. But if I hop to the normal distribution to predict a probability that so many parts will fall out of specification (even with the "1.5 sigma shift"), then I may be in error. Odd things can happen way out in the tails.

artichoke
22nd July 2007, 02:29 AM
Why do you want to determine whether your data for a process is normal?
One of the greatest steps forward in quality in the past 100 years was made by Shewhart, in his development of charts that do not depend on the form of the data distribution (nor do they depend on the Central Limit Theorem). Wheeler has published an excellent little book on this topic, "Normality and the Process Behaviour Chart". He shows on page 36 that you need 3200 data points to fit your data to a distribution out to only +/- 2.95 sigma ... and by that time your data distribution will probably have changed anyway. Plotting normal distributions over histograms is not only a complete waste of time, it distracts people from the real purpose of histograms ... to gain a better understanding of what the process is doing ... it is the irregularities in histograms that will allow you to make inferences about your process.

pagnonig
4th December 2008, 07:04 AM
So, can I say that normality assumptions are only "useful" when trying to calculate process capability indexes and the probability of being outside +/- 3 sigma limits? Thank you.
Giuseppe

bobdoering
4th December 2008, 09:23 AM
I like using Distribution Analyzer from variation.com. It reviews a comprehensive number of distributions to find the best fit, and gives you fit data. I use it as a starting point. But I also look at the process empirically, and try to consider the contributions of variation, and what the expected distribution should be. It is like hypothesis testing. For example, a normal process is one where, when it is set at a particular value, the process - without human intervention - will tend to stay at that point, with a few points above and a few points below. If that is a reasonable expectation for your process - and most 'natural' processes are like this - then your data should support it. If not, why not? There is the opportunity for a non-normal process to be masked by other simultaneous normal variations. Your process data, especially when derived from product data, is really a total variation. That is the sum of all variations - and they may be a mix of distributions. You need to ponder which causes of variation are the most significant and can be controlled. You need to minimize measurement error, as it is normal, and when large will make your process appear to have a normal distribution. However, it may be masking the true distribution. That is why measurement error is a key factor in process control. Bottom line, plug and chug is not always going to give you what you need. You may have to think through what makes sense.

bobdoering
4th December 2008, 09:28 AM
"Plotting normal distributions over histograms is not only a complete waste of time, it distracts people from the real purpose of histograms ... to gain a better understanding of what the process is doing ... it is the irregularities in histograms that will allow you to make inferences about your process."
Dr. Wheeler and I may not agree on everything, but I agree with this! It drove me to some of the conclusions I have developed for precision machining!
I would add that the histogram without a time-ordered run chart is also of minimal value. The absolutely worst thing that you can do for a capability study is have someone roll up a cart of 100 parts in a pile, measure them, and make a histogram. Unmitigated disaster.

Bev D
4th December 2008, 09:56 AM
"If you collect 1500 data points, normality will virtually be a given."
Not always. The Normal distribution is not typical. Many distributions are naturally skewed or uniform or exponential or geometric, etc. Many other processes yield distributions that are 'bell shaped' but have significant deviations in the tails or the peak. The Normal distribution is commonly known because it's the easiest one to deal with in training and research papers. AVERAGES will tend to have a roughly Normal distribution - close enough to be useful for certain tests like ANOVA and tools like SPC Xbar, S charts. (Ranges and standard deviations do NOT have a Normal distribution.) But it is not a given, and there is nothing wrong with your process if it doesn't yield a Normal distribution.

bobdoering
4th December 2008, 10:00 AM
"But it is not a given, and there is nothing wrong with your process if it doesn't yield a Normal distribution."
Amen!

artichoke
4th December 2008, 03:40 PM
"So, can I say that normality assumptions are only "useful" when trying to calculate process capability indexes and the probability of being outside +/- 3 sigma limits?"
Sorry, wrong again. Capability indexes can be very misleading. Use a control chart to help understand your process. 3 sigma control limits are NOT probability limits. Understanding this is fundamental to understanding control charts (unfortunately most people don't). Forget about "six sigma tables" based on normal distributions ... such tables are so ridiculous as to be laughable, even for anyone with half an ounce of common sense. Forget about normal distributions in process management.
They serve no useful purpose other than to help support poor teaching.

artichoke
4th December 2008, 03:46 PM
You might also enjoy this exercise in control charting of data with a highly skewed distribution: Non Normality is Normal (http://www.q-skills.com/nm/nm.htm). What conclusions can you derive from the histogram for the data? It should be apparent that no benefit is served (other than assisting in software sales) by the common practice of drawing a normal distribution over the histogram.

bobdoering
15th March 2009, 10:25 PM
One other point to make is that it is good to understand what your process looks like over time (as in a run chart), for starters. One problem with histograms is that they can easily fool you into thinking you have bi-modal distributions, normal distributions and tight distributions, when really you have been victimized by sampling error. Precision machining is clearly a uniform distribution once you have controlled it correctly and have developed the sawtooth curve from the run chart. To get the true distribution in a histogram, your sample must be multiples of complete cycles. If your sample contains any partial cycles, it will not represent the true distribution. This error would affect any statistical software's analysis, and - no matter how much you paid for it - render it useless. For more information on precision machining, see: Statistical process control for precision machining (http://elsmar.com/Forums/blog.php?b=79)
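
The point about complete cycles can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: an idealised tool-wear process in which the dimension drifts linearly from a lower to an upper limit over 50 parts and is then reset. The function names, the cycle length, and the limits of 10.00 and 10.10 are hypothetical and only serve the illustration.

```python
def sawtooth(n_parts, cycle_len=50, low=10.00, high=10.10):
    """Idealised run-chart data: size drifts linearly from low to high, then resets.

    Assumption: perfectly linear wear with no measurement error.
    """
    step = (high - low) / cycle_len
    return [low + (i % cycle_len + 0.5) * step for i in range(n_parts)]


def bin_counts(data, low=10.00, high=10.10, n_bins=5):
    """Count how many values fall into each of n_bins equal-width histogram bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Clamp the index so a value exactly at the upper limit lands in the last bin.
        idx = min(int((x - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Four complete cycles: every bin receives the same count (uniform).
full = bin_counts(sawtooth(200))

# Three complete cycles plus the first 20 parts of a fourth:
# the lower bins are over-represented and the histogram looks skewed.
partial = bin_counts(sawtooth(170))

print("complete cycles:", full)
print("partial cycle:  ", partial)
```

Under these assumptions the sample made of complete cycles gives a flat histogram, while the sample that stops part-way through a cycle piles extra counts into the low end, even though the underlying process is identical.
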