Capability Analysis of Non-Normal Data


Seyed

Hey,

I have some problems with the capability analysis of my data. The data are from a temperature uniformity survey of a heat-treatment furnace in the automotive industry. I used 20 different thermocouples positioned throughout the cross-section of the furnace (imagine two parallel boxes with 10 thermocouples mounted on each).

The total process time is 7 hours, and every three seconds the temperatures from all the thermocouples are stored. Since the data set is huge, I averaged the data over each 90-second interval.

How do I know that my data are non-normal? I performed the Anderson-Darling test; the p-values are less than 0.05, so I concluded that the data are non-normal.

I also tried the Box-Cox transformation, but I really don't understand the output: the input is temperature (in the range of 890–920 °C), yet the output is values on the order of 650E+15, which I don't understand.

How can I perform a correct capability analysis and calculate Pp, Ppk, Cp, and Cpk from my non-normal data? And how can I transform the data and understand the output?

Is there anybody who can guide me through this problem? I can give more information upon request.
 

Miner

Forum Moderator
Re: Capability Analysis of Non-Normal Data

Hey,

I have some problems with the capability analysis of my data. The data are from a temperature uniformity survey of a heat-treatment furnace in the automotive industry. I used 20 different thermocouples positioned throughout the cross-section of the furnace (imagine two parallel boxes with 10 thermocouples mounted on each).
Before analyzing the aggregate data, I recommend that you analyze the individual process streams (i.e., each thermocouple) separately. Individual streams may be normal, yet the aggregate be non-normal because each stream average is different.
The total process time is 7 hours, and every three seconds the temperatures from all the thermocouples are stored. Since the data set is huge, I averaged the data over each 90-second interval.
Temperature data are almost guaranteed to be autocorrelated, meaning that each temperature reading depends on the reading taken 3 seconds earlier. Perform an autocorrelation analysis. Once you have identified the period of autocorrelation (i.e., the period over which the dependency persists), select individual temperature measurements at intervals longer than that period. Do not average.
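If you want to check this outside Minitab, here is a minimal sketch in Python (statsmodels), assuming the raw 3-second readings for one thermocouple sit in a file named channel_01.csv (a hypothetical name):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Raw 3-second readings for ONE thermocouple (hypothetical file name)
temps = np.loadtxt("channel_01.csv")

r = acf(temps, nlags=300)            # autocorrelation at lags 0..300
bound = 1.96 / np.sqrt(len(temps))   # approximate 5% significance limit

# First lag at which the autocorrelation is no longer significant
lag = next((k for k in range(1, len(r)) if abs(r[k]) < bound), None)
if lag is not None:
    print(f"Period of autocorrelation ~ {lag * 3} seconds")  # 3 s sampling interval
else:
    print("Still autocorrelated at 300 lags; increase nlags")
```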

How do I know that my data are non-normal? I performed the Anderson-Darling test; the p-values are less than 0.05, so I concluded that the data are non-normal.
You are correct in your approach and decision. The trick is figuring out WHY the data are non-normal. I suspect they are multi-modal from the mixing of 20 process streams.
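To see why mixing streams matters, here is a small illustration (hypothetical numbers, not the furnace data): two streams that each pass the Anderson-Darling test can fail it when pooled, because the pooled distribution is bimodal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two hypothetical thermocouple streams, each normal but with different means
stream_a = rng.normal(895.0, 1.0, 500)
stream_b = rng.normal(915.0, 1.0, 500)
pooled = np.concatenate([stream_a, stream_b])

for name, x in (("stream A", stream_a), ("stream B", stream_b), ("pooled", pooled)):
    ad = stats.anderson(x, dist="norm")
    # critical_values[2] is the 5% critical value
    verdict = "non-normal" if ad.statistic > ad.critical_values[2] else "normal"
    print(f"{name}: A-D = {ad.statistic:.2f} -> {verdict}")
```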

Once you have determined whether autocorrelation exists and sampled at intervals greater than the autocorrelation period, plot the data on I-MR charts by thermocouple. This will tell you whether each thermocouple zone is stable and what mean temperature and variance exist in that zone. This will be extremely important if improvements are necessary.
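If you are not doing this in Minitab, the I-MR limits are easy to compute by hand; a sketch using the standard control-chart constants, with `x` holding one thermocouple's properly spaced readings:

```python
import numpy as np

def imr_limits(x):
    """Individuals (I) and moving-range (MR) chart limits.
    Uses the standard constants for span-2 moving ranges: E2 = 2.66, D4 = 3.267."""
    x = np.asarray(x, dtype=float)
    mr = np.abs(np.diff(x))            # moving ranges between consecutive points
    xbar, mrbar = x.mean(), mr.mean()
    return {
        "I":  (xbar - 2.66 * mrbar, xbar, xbar + 2.66 * mrbar),  # LCL, CL, UCL
        "MR": (0.0, mrbar, 3.267 * mrbar),
    }
```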

You may also consider tests such as a one-way ANOVA followed by multiple comparisons of means to determine whether the differences between thermocouples are significant. You can safely pool the thermocouples that are not statistically different; do not pool those that differ from each other.
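A sketch of that test sequence in Python (scipy/statsmodels), assuming long-format data with columns named "channel" and "temp" (both names hypothetical):

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("furnace_long.csv")   # hypothetical file: one row per reading

# One-way ANOVA across all thermocouples
groups = [g["temp"].to_numpy() for _, g in df.groupby("channel")]
f, p = stats.f_oneway(*groups)
print(f"one-way ANOVA: F = {f:.2f}, p = {p:.4f}")

# Tukey HSD shows WHICH thermocouples differ; pool only those that do not
print(pairwise_tukeyhsd(df["temp"], df["channel"], alpha=0.05))
```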
I also tried the Box-Cox transformation, but I really don't understand the output: the input is temperature (in the range of 890–920 °C), yet the output is values on the order of 650E+15, which I don't understand.
I am not a big fan of transforming data for SPC or capability studies, and this is one reason among many: once you transform the data, the capability indices are the only numbers that retain any meaning. The transformed values themselves (your 650E+15) mean nothing in the original units.
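As to the huge numbers: that is the Box-Cox mechanics, not an error. The transform is y = (x^λ − 1)/λ, and for temperatures near 900 the estimated λ can easily be large. A quick illustration of the scale involved:

```python
import numpy as np
from scipy import stats

temps = np.linspace(890, 920, 5)         # temperatures in the reported range
for lam in (1.0, 3.0, 6.0):
    y = stats.boxcox(temps, lmbda=lam)   # y = (x**lam - 1) / lam
    print(f"lambda = {lam}: max transformed value = {y.max():.3e}")
# At lambda = 6, 920**6 / 6 is about 1e17 -- comparable in magnitude to the
# 650E+15 values reported. The transformed numbers carry no physical meaning.
```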

How can I perform a correct capability analysis and calculate Pp, Ppk, Cp, and Cpk from my non-normal data? And how can I transform the data and understand the output?

Is there anybody who can guide me through this problem? I can give more information upon request.
I use Minitab's non-normal capability analysis, which lets you analyze the data in the untransformed state, so you never face the issue of meaningless numbers.
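For the idea behind non-normal capability: Minitab fits a non-normal distribution and replaces the ±3σ points with percentiles. A rough sketch of that percentile method, here using raw-data percentiles instead of a fitted distribution, with hypothetical spec limits:

```python
import numpy as np

def percentile_capability(x, lsl, usl):
    """Non-normal Pp/Ppk via the percentile method: the 0.135th, 50th, and
    99.865th percentiles stand in for mean - 3s, median, and mean + 3s.
    Minitab takes these percentiles from a fitted distribution (e.g., Weibull);
    raw-data percentiles are used here only to keep the sketch short."""
    p_lo, p_med, p_hi = np.percentile(x, [0.135, 50.0, 99.865])
    pp  = (usl - lsl) / (p_hi - p_lo)
    ppk = min((usl - p_med) / (p_hi - p_med), (p_med - lsl) / (p_med - p_lo))
    return pp, ppk

# Hypothetical spec limits, for illustration only:
# pp, ppk = percentile_capability(temps, lsl=885.0, usl=925.0)
```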
 

Seyed

Thanks for your support,

More information (in reply to Miner):

As the temperature increases from 600 °C (when the charge enters the furnace) up to 910 °C and back down to room temperature (when the charge leaves the furnace), it is difficult to analyze the entire process time at once. One has to divide the process time into several sections; however, as NADCAP advises, one can select the section where the temperature should be stable. Within that stable period, which lasts approximately 2 hours, the temperature is in the range of 890–920 °C.
Within that period I averaged the data, as mentioned before, to reduce the number of data points, and analyzed each thermocouple separately. Only three channels are normally distributed; 17 are non-normal.

The question is: should the data be non-normal? The process should be stable in the region of interest; however, we know that when a door (entrance or exit) is opened there is a drop in temperature, and it influences the readings.

So I think the idea of autocorrelation is important, and the analysis should be performed to identify the period of autocorrelation. But I still think the data should be averaged, or should they not?

Thanks in advance for your expertise and comments.
Seyed
 

Miner

Forum Moderator
As the temperature increases from 600 °C (when the charge enters the furnace) up to 910 °C and back down to room temperature (when the charge leaves the furnace), it is difficult to analyze the entire process time at once. One has to divide the process time into several sections; however, as NADCAP advises, one can select the section where the temperature should be stable. Within that stable period, which lasts approximately 2 hours, the temperature is in the range of 890–920 °C.
Within that period I averaged the data, as mentioned before, to reduce the number of data points, and analyzed each thermocouple separately. Only three channels are normally distributed; 17 are non-normal.
Are the non-normal thermocouples those closest to the doors and most likely to be influenced by the temperature loss when the doors are opened?

The question is: should the data be non-normal? The process should be stable in the region of interest; however, we know that when a door (entrance or exit) is opened there is a drop in temperature, and it influences the readings.

I would stratify the data into three (or more) stages: 1) ramp-up after the doors are closed; 2) steady state; and 3) ramp-down when the doors are opened. Analyze autocorrelation, normality, and capability for stage 2 (steady state). I would analyze stages 1 and 3 separately, using the same approach: analyze autocorrelation, then use time series analysis to understand the ramp-up/ramp-down effect. If you have more stages than this, use a similar approach for each.
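A sketch of that stratification in pandas, assuming each thermocouple's readings carry a "seconds" column and that the stage boundaries are known from the cycle (all numbers below are placeholders):

```python
import pandas as pd

df = pd.read_csv("channel_01.csv")        # hypothetical file with a "seconds" column

# Placeholder stage boundaries in seconds -- substitute the real cycle times
bins   = [0, 3600, 10800, 25200]          # doors closed, steady start/end, cycle end
labels = ["ramp_up", "steady_state", "ramp_down"]
df["stage"] = pd.cut(df["seconds"], bins=bins, labels=labels)

# Analyze each stage on its own; capability applies to the steady state only
steady = df[df["stage"] == "steady_state"]
```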

So I think the idea of autocorrelation is important, and the analysis should be performed to identify the period of autocorrelation. But I still think the data should be averaged, or should they not?

Do not average the data before performing the autocorrelation study. When you average data, you lose information about your process. As I said before, once you know the period of autocorrelation, select one measurement at intervals slightly greater than that period. This will reduce your data set.
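In code, the thinning step is one line; a sketch (the 120-second period is a placeholder for whatever the ACF study gives you):

```python
import math
import numpy as np

temps = np.loadtxt("channel_01.csv")       # raw 3-second readings (hypothetical file)
period, interval = 120, 3                  # placeholder autocorrelation period (s), sampling interval (s)
step = math.ceil(period / interval) + 1    # one reading just beyond the period

thinned = temps[::step]                    # subsample -- do NOT average
print(f"{len(temps)} readings -> {len(thinned)} effectively independent readings")
```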
 

bobdoering

Stop X-bar/R Madness!!
When you average data, you lose information about your process.

That is the bottom line!! Any chance of sharing the raw data in spreadsheet form, by thermocouple, with conditions identified (ramp start, door open, etc.)?
 

Seyed

Hello again,

I have tried to perform the autocorrelation analysis on the unaveraged data in the steady state. I have also tried to read through different guidelines on how to perform an autocorrelation analysis and how to interpret the results.

However, I have a couple of questions:
1) Should I perform the autocorrelation analysis for each channel by itself, or should I put all the data in one column in Minitab and run the autocorrelation on all the data at once?
2) After performing the autocorrelation on channel 1, the autocorrelation values at all lags exceed the 5% significance limits (α = 0.05). What does this mean?

Check the PDF file attached.

Thanks for your support.
Seyed
 

Attachments

  • Autocorrelation.pdf
    53.1 KB · Views: 175

Miner

Forum Moderator
Analyze each thermocouple separately. The attached graphs do show a high degree of autocorrelation. When the blue bars dip between the two red lines, autocorrelation no longer exists. The corresponding number of lags multiplied by the sampling interval is the period of autocorrelation.
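If it helps to reproduce that picture outside Minitab, statsmodels draws the same chart (blue bars with a significance band, the equivalent of Minitab's red lines); the file name below is a placeholder:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

temps = np.loadtxt("channel_01.csv")   # one thermocouple's raw readings

# Blue bars = ACF per lag; shaded band = 5% significance limits
plot_acf(temps, lags=300, alpha=0.05)
plt.show()

# period of autocorrelation = (first lag inside the band) x 3-second sampling interval
```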

Use data points selected at an interval greater than the period of autocorrelation and analyze them as discussed previously.
 

Seyed

Yes, of course,

I will attach an Excel file with 4 different thermocouples. The time period selected is within the stable zone. Thanks for your support!

Seyed
 

Seyed

Hey,

Here is the Excel file for those who would like a challenge. And of course I am really thankful for all your support and expertise.

Seyed

P.S. Looking forward to your help.
 

Attachments

  • Test 1.xls
    131.5 KB · Views: 159

bobdoering

Stop X-bar/R Madness!!
Hey,

Here is the Excel file for those who would like a challenge. And of course I am really thankful for all your support and expertise.

Seyed

P.S. Looking forward to your help.

Since you are looking for capability, what is the tolerance for the region of the data you supplied?
 