Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs DOE

VijayMaldini

Hi all
I work in new product quality. The product is not yet on the market; it is still in the pre-production and testing (prototyping) stage.

We are trying to find out why we get a strength (durability) of 90 units in the lab but only 60 units when the product goes to manufacturing.

It involves only a few process steps. How can we go about identifying the significant factors? With a cause and effect diagram and 5 Why analysis?
Once we have identified them, should we go for regression, correlation, ANOVA or DOE?

I would also like to know the general differences between regression, the t-test, DOE and ANOVA. How do these approaches differ, and are there special cases in which to use each of them?

Barbara B

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

It involves only a few process steps. How can we go about identifying the significant factors? With a cause and effect diagram and 5 Why analysis?
A cause and effect diagram will show possible vital influences on the process outcome (good and bad). With 5 Why you can try to find the root cause of bad process outcomes. Both methods help to select possibly significant factors, but neither will give you a list of THE significant factors, simply because C&E and 5 Why are based on your knowledge, not on the data gathered.

Once we have identified them, should we go for regression, correlation, ANOVA or DOE?

I would also like to know the general differences between regression, the t-test, DOE and ANOVA. How do these approaches differ, and are there special cases in which to use each of them?
The appropriate method depends on the type of data collected and on the data structure. For multiple factors a model (regression, ANOVA, GLM) is better than a simple test (like a t-test), because a model can evaluate more complex data structures.

For example: your process consists of three process steps (ps1, ps2, ps3) and the strength is measured after each step, so you have three mean strengths (m1, m2, m3). With a t-test you can evaluate pairwise differences (such as the difference between m1 and m3). A model tests the hypothesis "Is there at least one process step whose values differ from those of the other steps?" A model can also take into account further process settings such as temperature, materials used, and so on.
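As a rough illustration of the contrast between a pairwise test and a model, here is a minimal Python sketch. The readings and the helper names `pooled_t` and `one_way_f` are made up for this example. It only computes the test statistics by hand; for p-values a statistics package would normally be used (for instance `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`).

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical strength readings after each of three process steps.
groups = {
    "ps1": [88.0, 91.0, 90.0, 92.0, 89.0],
    "ps2": [85.0, 87.0, 86.0, 88.0, 84.0],
    "ps3": [61.0, 59.0, 62.0, 60.0, 58.0],
}


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


def one_way_f(samples):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for s in samples for v in s]
    grand = mean(all_values)
    k, n = len(samples), len(all_values)
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples)
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# The t-test answers one pairwise question (m1 vs m3) ...
t_13 = pooled_t(groups["ps1"], groups["ps3"])
# ... while the model asks whether any step differs from the others.
f_all = one_way_f(list(groups.values()))

print(f"t (ps1 vs ps3)      = {t_13:.2f}")
print(f"F (all three steps) = {f_all:.2f}")
```

The t statistic covers a single pair of means, whereas the F statistic is one overall test across all three steps.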

More details on the differences between models are given here.

Regards,

Barbara

VijayMaldini

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

Thanks a lot Barbara.

Yes. My understanding was that regression is to be used when the factors (Xs) are all numeric,
and ANOVA only when the response is numeric. Are there any other special cases for using these methods?
- Regression does not show the interaction between the factors (Xs), right?
- The t-test just shows that there is a significant difference between pairs. To identify which pairs are different, can we use the Tukey pairwise test?

I would also like to clarify that we are not measuring the strength at each and every step. There is at least one input at each step, and the strength is measured only at the end.
So in that case, assuming we have 3 steps, how do I go about it? Should I take the factors involved in all the steps and use any of the above methods?

Barbara B

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

My understanding was that regression is to be used when the factors (Xs) are all numeric,
and ANOVA only when the response is numeric. Are there any other special cases for using these methods?
What do you mean by "special case"?

- Regression does not show the interaction between the factors (Xs), right?
No. Regression methods are quite flexible and can evaluate direct influences (main effects) as well as interactions between 2 or more factors and polynomial influences (quadratic effects, cubic effects, and so on).

With Minitab 16 you can analyze your data using
Stat > Regression > General Regression
where you can model any user-defined structure for numeric variables, and additionally the influence of text variables (like material) and the interactions between both. Polynomial effects can only be estimated for numeric variables (a mathematical limitation, not a Minitab one).
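To make the point about interactions concrete, here is a small Python sketch of a regression with an interaction term. It assumes a hypothetical, balanced 2x2 experiment with coded levels (-1 = low, +1 = high); with that coding the model columns are orthogonal, so the least-squares coefficients reduce to simple averages. The data and the function name `fit_coded_factorial` are invented for illustration and are not Minitab output.

```python
# Hypothetical replicated 2x2 experiment with coded levels (-1 = low, +1 = high).
# x1 could be a numeric setting such as temperature; x2 could be a text
# variable such as material A/B coded as -1/+1.
runs = [
    # (x1, x2, strength)
    (-1, -1, 60.0), (-1, -1, 62.0),
    (+1, -1, 70.0), (+1, -1, 72.0),
    (-1, +1, 64.0), (-1, +1, 66.0),
    (+1, +1, 90.0), (+1, +1, 92.0),
]


def fit_coded_factorial(data):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b12*x1*x2.

    Valid as written only for a balanced two-level design with -1/+1 coding,
    where each coefficient is sum(column * y) / n.
    """
    n = len(data)
    y = [strength for _, _, strength in data]
    columns = {
        "b0": [1 for _ in data],
        "b1": [x1 for x1, _, _ in data],
        "b2": [x2 for _, x2, _ in data],
        "b12": [x1 * x2 for x1, x2, _ in data],  # interaction column
    }
    return {
        name: sum(c * v for c, v in zip(col, y)) / n
        for name, col in columns.items()
    }


coef = fit_coded_factorial(runs)
for name, value in coef.items():
    print(f"{name:>3} = {value:.1f}")
```

A nonzero `b12` means the effect of x1 depends on the level of x2, which is exactly what an interaction term in a regression model captures.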

- The t-test just shows that there is a significant difference between pairs. To identify which pairs are different, can we use the Tukey pairwise test?
Tukey's pairwise test for differences is one method of comparing group means at an overall confidence level; Dunnett, Bonferroni and Sidak are others that can be chosen in Minitab (Stat > ANOVA > GLM).

Even though the menus are named differently and the options for general regression and GLM also differ, the results for regression and GLM are identical because the underlying formulas are identical (just try it for yourself and you'll get the same values, e.g. in the ANOVA tables).
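The equivalence can also be checked by hand in the simplest case: one text factor with two levels. The sketch below uses made-up data. It computes the F statistic once from the ANOVA sums of squares and once from a regression on a 0/1 dummy variable. This is plain Python arithmetic, not a reproduction of Minitab's menus.

```python
from statistics import mean

# Hypothetical strengths for one text factor (site) with two levels.
lab = [90.0, 92.0, 88.0, 91.0, 89.0]
plant = [60.0, 63.0, 58.0, 61.0, 58.0]

y = lab + plant
x = [0] * len(lab) + [1] * len(plant)  # dummy coding: 0 = lab, 1 = plant
n = len(y)
grand = mean(y)

# ANOVA view: between-group vs within-group mean squares.
ss_between = (len(lab) * (mean(lab) - grand) ** 2
              + len(plant) * (mean(plant) - grand) ** 2)
ss_within = (sum((v - mean(lab)) ** 2 for v in lab)
             + sum((v - mean(plant)) ** 2 for v in plant))
f_anova = (ss_between / 1) / (ss_within / (n - 2))

# Regression view: least-squares line y = b0 + b1 * x on the dummy variable.
x_bar = mean(x)
b1 = (sum((xi - x_bar) * (yi - grand) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = grand - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]
ss_model = sum((f - grand) ** 2 for f in fitted)
ss_error = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_model / 1) / (ss_error / (n - 2))

print(f"F from ANOVA      = {f_anova:.3f}")
print(f"F from regression = {f_regression:.3f}")
print(f"slope b1 = {b1:.1f} (difference between the two group means)")
```

Both routes give the same F value because the regression on a dummy variable fits exactly the two group means.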

I would also like to clarify that we are not measuring the strength at each and every step. There is at least one input at each step, and the strength is measured only at the end.
So in that case, assuming we have 3 steps, how do I go about it? Should I take the factors involved in all the steps and use any of the above methods?
You can take a close look at the cause and effect diagram to select the likely vital factors/variables for each process step. Each combination of settings (e.g. temperature=60°F, method=A1, etc.) can then be assigned its corresponding process outcome "strength". The data can be evaluated with general regression or GLM (depending on the options provided in the menus), or both.

Hope this helps,

Barbara

Bev D

Heretical Statistician
Staff member
Super Moderator
Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

One thought on the root cause analysis (not the statistical method): the "5-why" approach is usually more effective than the fishbone diagram/brainstorming approach when the appropriate diagnostic tools are applied. Put simply, this is because fishbone diagrams focus more on how things are supposed to work, while 5-why focuses on how they can fail.

Since you state that the difference in results is between the R&D process and the manufacturing process, consider swapping process steps. Using the same raw materials, build one set of product entirely in R&D and one entirely in Manufacturing (these are your 'controls'). Then, using the same raw materials, build set 3 halfway through the manufacturing process and finish it in R&D, and build set 4 halfway through R&D and finish it in Manufacturing. You can then repeat this split within the half of the process that made the difference. (With a difference of 90 vs 60 you won't really need any statistical math to 'see' the difference.)
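The four builds can be read as a small two-by-two comparison. Here is a minimal Python sketch, assuming invented mean strengths for each build and a hypothetical helper `half_effects`; it simply averages the change seen when each half of the process moves from Manufacturing to R&D.

```python
# Hypothetical mean strengths for the four builds (same raw materials).
# Key: (where the first half was run, where the second half was run)
builds = {
    ("R&D", "R&D"): 90.0,  # set 1: control, built entirely in R&D
    ("Mfg", "Mfg"): 60.0,  # set 2: control, built entirely in Manufacturing
    ("Mfg", "R&D"): 88.0,  # set 3: started in Manufacturing, finished in R&D
    ("R&D", "Mfg"): 62.0,  # set 4: started in R&D, finished in Manufacturing
}


def half_effects(results):
    """Average change in strength when one half moves from Mfg to R&D."""
    first = (results[("R&D", "R&D")] + results[("R&D", "Mfg")]
             - results[("Mfg", "R&D")] - results[("Mfg", "Mfg")]) / 2
    second = (results[("R&D", "R&D")] + results[("Mfg", "R&D")]
              - results[("R&D", "Mfg")] - results[("Mfg", "Mfg")]) / 2
    return {"first half": first, "second half": second}


effects = half_effects(builds)
suspect = max(effects, key=lambda name: abs(effects[name]))

for name, value in effects.items():
    print(f"{name}: {value:+.1f} units")
print(f"Split the {suspect} next.")
```

With these made-up numbers nearly all of the 30-unit gap follows the second half of the process, so that half would be the one to split again.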

If you are using different measurement systems in R&D and in Manufacturing, a sanity check is to calibrate both and perhaps perform a method comparison on the two systems, to ensure that the difference you are seeing is not due to the measurement system.
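One simple form of method comparison looks at the paired differences between the two systems (a Bland-Altman style summary). The sketch below assumes that the same reference samples can be measured on both systems, which may not hold for a destructive strength test. All readings are invented.

```python
from statistics import mean, stdev

# Hypothetical readings of the same six reference samples on both systems.
rd_system = [90.1, 85.3, 92.4, 88.0, 79.6, 95.2]
mfg_system = [89.8, 85.9, 92.0, 88.5, 79.1, 95.6]

# Paired differences: Manufacturing reading minus R&D reading, per sample.
diffs = [m - r for r, m in zip(rd_system, mfg_system)]

bias = mean(diffs)       # average offset between the two systems
spread = stdev(diffs)    # sample standard deviation of the differences
lower = bias - 1.96 * spread
upper = bias + 1.96 * spread

print(f"bias = {bias:+.2f} units")
print(f"approximate 95% limits of agreement: {lower:+.2f} to {upper:+.2f}")
```

If the bias and the limits of agreement are small compared with the 30-unit gap, the measurement systems are unlikely to explain the difference.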

kaikai

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

When it comes to analysis, if the response variable is numeric, independent and normally distributed, the general linear model (GLM) is a useful choice.
This method includes ANOVA, ANCOVA, regression analysis, etc.
It is a very useful and flexible method that can handle categorical and continuous independent variables (factors) in the same model. Of course, interactions can be modeled freely.
Most recent statistical software (Minitab, JMP, R, etc.) supports GLM.
So I recommend this method for the data analysis.

Barbara B

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

When it comes to analysis, if the response variable is numeric, independent and normally distributed, the general linear model (GLM) is a useful choice.
The response variable doesn't even have to be numeric for a GLM; see e.g. the R help on glm for details of how to model data that follow a binomial distribution with the option glm(..., family=binomial()). In Minitab a GLM can only be fitted to a numeric response, but this is a Minitab thing, not a mathematical restriction. To model a text variable as a response in Minitab, you could use Stat > Regression > binary/ordinal/nominal logistic regression.
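For readers without R at hand, here is a rough Python sketch of what a binomial model fits for a pass/fail response and one numeric setting. It is a hand-rolled Newton-Raphson fit of a logistic regression, not R's glm implementation; the data and the function name `fit_logistic` are invented for illustration.

```python
from math import exp

# Hypothetical data: a numeric setting and a pass/fail (1/0) response.
setting = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 1, 0, 1, 1, 0, 1]


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian, with weights w = p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by Cramer's rule.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1


b0, b1 = fit_logistic(setting, passed)
fitted = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in setting]

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("fitted pass probabilities:", [round(p, 2) for p in fitted])
```

The response here is a category (pass/fail), yet the model is still linear on the log-odds scale, which is the sense in which a generalized linear model extends ordinary regression.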

What do you mean by "independent response variable"? Independent of what? If none of the variation in the outcome can be assigned to process factors, a model cannot give you information about the process (e.g. if you take the strength as the response, it should be independent of the number of cars in the company parking lot, but what is the point of proving that?).

And the response variable doesn't have to follow a normal distribution (or any other distribution). The requirements for a good model (like a GLM) concern the distribution / mathematical properties of the error term (see the Gauss-Markov theorem and the Gauss-Markov assumptions for details).
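A small Python sketch, with invented numbers, of why the raw response need not look normal: pooled over two sites the strengths fall into two clusters, but once the model accounts for the site, the residuals (the estimates of the error term) are centred on zero with a much smaller spread.

```python
from statistics import mean, pstdev

# Hypothetical strengths from two sites. Pooled together the response has
# two clusters (around 90 and around 60), so it is clearly not normal.
data = {
    "lab": [90.0, 92.0, 88.0, 91.0, 89.0],
    "plant": [60.0, 63.0, 58.0, 61.0, 58.0],
}

response = [v for values in data.values() for v in values]

# Model: strength = site mean + error. The residuals estimate the error term.
residuals = [v - mean(values) for values in data.values() for v in values]

print(f"spread of raw response: {pstdev(response):.2f}")
print(f"spread of residuals:    {pstdev(residuals):.2f}")
print(f"mean of residuals:      {mean(residuals):.2f}")
```

Checks of the model assumptions are therefore made on the residuals rather than on the raw response.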

Regards,

Barbara

kaikai

Re: Identifying Significant Factors - Regression Analysis vs Correlation vs ANOVA vs

When I wrote my post, I meant GLM as the general linear model.
In this model the response variable is restricted to being numeric.

Your GLM must mean the generalized linear model.
Both models share the same abbreviation, GLM.