(ctd.)

To describe the impact of the groups (process conditions), the samples and the displacement TOGETHER, a model is needed. In Minitab this can be done via

Stat > Regression > General Regression [available since Minitab R16]

or

Stat > ANOVA > General Linear Model

The models in both menus are (mathematically) the same; only the options differ slightly.

Build a model (you'll find the Minitab project attached in the zip archive):

Stat > Regression > General Regression

Response: 'value [N]'

Model: displacement group

> Graphs: Residual Plots: Choose "Four in One"

> OK > OK

The terms in the Model field (displacement group) are the possible impacts on the response (value [N]).
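What "displacement group" in the Model field amounts to can be sketched outside Minitab. The following is a minimal Python illustration, not Minitab's implementation. It assumes 0/1 indicator coding for the categorical predictor group (Minitab's internal coding may differ, but the fitted values would be the same) and uses made-up, noise-free data. The helper names solve, design_row and fit are hypothetical.

```python
# Sketch: least-squares fit of value ~ displacement + group on made-up data.
# Data, group labels and coefficients below are hypothetical.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def design_row(displacement, group, levels):
    """Intercept, displacement, and one 0/1 indicator per non-reference group."""
    return [1.0, displacement] + [1.0 if group == g else 0.0 for g in levels[1:]]

def fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

levels = ["Group1", "Group2"]
data = [(0.1, "Group1"), (0.5, "Group1"), (0.9, "Group1"),
        (0.1, "Group2"), (0.5, "Group2"), (0.9, "Group2")]
# Noise-free response: 1 + 2*displacement, shifted by -0.5 for Group2
y = [1 + 2 * d + (-0.5 if g == "Group2" else 0.0) for d, g in data]

rows = [design_row(d, g, levels) for d, g in data]
coef = fit(rows, y)   # [intercept, displacement slope, Group2 shift]
print([round(c, 6) for c in coef])
```

The group term only shifts the intercept, while the displacement slope is shared by all groups.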

The residual plots show a parabolic pattern on the upper right side (residuals vs. fits). Therefore a quadratic impact of displacement should be added to the model:

Stat > Regression > General Regression

Response: 'value [N]'

Model: displacement group displacement*displacement

> Graphs: Residual Plots: Choose "Four in One"

> OK > OK
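Why a parabolic pattern in the residuals vs. fits plot points to a missing quadratic term can be illustrated with a small Python sketch. The data are made up (a curved response without noise) and the helper fit_line is hypothetical; this only mimics the effect and is not the data from the project.

```python
# Sketch: a straight-line fit to curved (made-up) data leaves curved residuals.

def fit_line(x, y):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = [2 * xi - xi ** 2 for xi in x]   # hypothetical curved response

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print([round(r, 4) for r in residuals])
# -> [-0.125, 0.0625, 0.125, 0.0625, -0.125]
```

The residuals are negative at both ends and positive in the middle, which is the arch shape that disappears once displacement*displacement is in the model.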

Now the residual plots look better, except for a non-constant variation in the residuals vs. fits plot (upper right). This is called heteroscedasticity (see Heteroscedasticity) and shouldn't be present in a good model.

To stabilize the variation pattern, a Box-Cox transformation can be used (see Box-Cox_transformation for details). The Box-Cox transformation can only be applied to positive values (>0), so we add a constant (e.g. +0.1) to "value [N]" and check whether such a transformation yields better results:

Store "value +0.1" in the worksheet:

Calc > Calculator

Store result in variable: 'value +0.1'

[single quotation marks are necessary!]
Expression: 'value [N]'+0.1

> OK
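The reason for the +0.1 shift can be sketched in Python. The function below follows the form named in the Minitab dialog (W=Y**Lambda); using ln(Y) for lambda = 0 is the usual convention and is an assumption here. The readings are hypothetical, and box_cox is an illustrative helper, not a Minitab function.

```python
import math

def box_cox(y, lam):
    """Power transformation W = Y**lambda, with ln(Y) used for lambda = 0."""
    if y <= 0:
        raise ValueError("Box-Cox transformation needs y > 0")
    return math.log(y) if lam == 0 else y ** lam

offset = 0.1                      # the constant added in the Calculator step
raw = [0.0, 0.4, 2.5]             # hypothetical 'value [N]' readings
shifted = [v + offset for v in raw]

try:
    box_cox(raw[0], 0.5)          # a zero reading is rejected
except ValueError as err:
    print("unshifted:", err)

# After the shift every value is positive and can be transformed
print([round(box_cox(v, 0.5), 4) for v in shifted])
```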

Build a model and check if a Box-Cox transformation gives better results:

Stat > Regression > General Regression

Response: 'value +0.1'

Model: displacement group displacement*displacement

> Graphs: Residual Plots: Choose "Four in One" > OK

> Box-Cox: Check "Box-Cox power transformation (W=Y**Lambda)"

> OK > OK

The residuals vs. fits plot looks better now (it should show no pattern in a good model). The session window provides information about the optimal Box-Cox transformation:

Code:

Box-Cox transformation of the response with estimated lambda = 0.541669
The 95% CI for lambda is (0.439169, 0.651169)
Rounded lambda = 0.5 used in the regression analysis

A lambda of 0.5 is the same as using the square root of the response. So in the next model we'll use the square root of the original response "value [N]":

Store "sqrt(value)" in the worksheet:

Calc > Calculator

Store result in variable: 'sqrt(value)'

[single quotation marks are necessary!]
Expression: SQRT('value [N]')

> OK
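The step from the estimated lambda to a plain square root can be checked with a few lines of Python. The lambda values are copied from the session output above; the response value y is hypothetical.

```python
import math

lam_hat = 0.541669                      # estimated lambda from the session output
ci_low, ci_high = 0.439169, 0.651169    # 95% CI for lambda
lam_rounded = 0.5

# The rounded lambda lies inside the confidence interval
print(ci_low <= lam_rounded <= ci_high)   # True

y = 2.25                                # hypothetical response value
print(y ** lam_rounded, math.sqrt(y))   # both are the square root of y

# The square root is defined at zero, so no +0.1 offset is needed
print(math.sqrt(0.0))
```

Because the square root accepts zero, the original 'value [N]' column can be used directly instead of 'value +0.1'.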

Build a model for the square root of 'value [N]':

Stat > Regression > General Regression

Response: 'sqrt(value)'

Model: displacement group displacement*displacement

> Graphs: Residual Plots: Choose "Four in One" > OK

> Box-Cox: UNcheck "Box-Cox power transformation (W=Y**Lambda)"

[we don't need this any more]
> OK > OK

The four residual plots look okay overall, but the residuals vs. fits plot shows a non-linear pattern. In addition, the Lack-of-Fit test is significant (p=0.0003<0.05):

Code:

Analysis of Variance
Source                        DF   Seq SS   Adj SS   Adj MS        F          P
Regression                     5  20.6407  20.6407  4.12814   505.28  0.0000000
  displacement                 1  16.2452   9.8576  9.85763  1206.57  0.0000000
  group                        3   0.0499   0.4161  0.13870    16.98  0.0000000
  displacement*displacement    1   4.3456   4.3456  4.34561   531.90  0.0000000
Error                        208   1.6994   1.6994  0.00817
  Lack-of-Fit                 69   0.8427   0.8427  0.01221     1.98  0.0003400
  Pure Error                 139   0.8567   0.8567  0.00616
Total                        213  22.3401
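How the Lack-of-Fit F value in this table comes about can be reproduced from its own numbers. The short Python sketch below uses only values copied from the Analysis of Variance output. It does not compute the p-value, which would need an F distribution function.

```python
# Values copied from the Analysis of Variance table above
ss_lof, df_lof = 0.8427, 69         # Lack-of-Fit
ss_pe, df_pe = 0.8567, 139          # Pure Error
ss_error, df_error = 1.6994, 208    # Error

# Error splits into Lack-of-Fit plus Pure Error (sums of squares and DF)
assert abs(ss_lof + ss_pe - ss_error) < 1e-9
assert df_lof + df_pe == df_error

ms_lof = ss_lof / df_lof            # mean square for lack of fit
ms_pe = ss_pe / df_pe               # mean square for pure error
f_lof = ms_lof / ms_pe              # F statistic of the Lack-of-Fit test

print(round(ms_lof, 5), round(ms_pe, 5), round(f_lof, 2))
# -> 0.01221 0.00616 1.98
```

An F value well above 1 means the spread around the fitted curve is larger than the spread among repeated measurements at the same settings, which is what the significant p-value reports.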

Without any technical knowledge about the process and measurement it is hard to tell whether a more complex model is appropriate. But if a cubic term for displacement is added, the model quality is acceptable:

Stat > Regression > General Regression

Response: 'sqrt(value)'

Model: displacement group displacement*displacement displacement*displacement*displacement

> Graphs: Residual Plots: Choose "Four in One"

> OK > OK

The residuals follow the blue line in the normal probability plot (upper left). (I wouldn't rely on the histogram, because its appearance depends strongly on the number of bars and the location of the bar limits.) The residuals vs. fits plot now shows only a slight pattern. The pattern in the residuals vs. observation order plot (bottom right) is still present and could be caused by heating/cooling of the measurement system or other systematic effects in the process.

Code:

Lack-of-Fit test: p=0.9112 > 0.05

(no lack of fit detectable)

Code:

R²=94.77%, R²(adj)=94.62%, R²(pred)=94.37%

(model explains the response quite well with a high prediction quality)
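The relation between the plain and the adjusted R-squared can be checked with the standard formula. In this Python sketch the number of observations is an assumption: n = 214 is taken from the Total DF of 213 in the earlier ANOVA table, assuming the cubic model was fitted to the same rows. The model uses 6 degrees of freedom (displacement, 3 for group, quadratic and cubic term).

```python
r2 = 0.9477     # R-squared from the session output
n = 214         # assumed: Total DF 213 in the earlier table, plus 1
p = 6           # model DF: displacement, 3 for group, quadratic, cubic

# Adjusted R-squared penalizes the number of model terms
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(100 * r2_adj, 2))
# -> 94.62
```

The result agrees with the adjusted value in the session output. The predicted R-squared cannot be checked this way because it needs the individual residuals.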

A significant impact on the response sqrt(value) is present for all model terms (group, displacement, quadratic displacement, cubic displacement) and the differences between the groups can be seen in the regression equations (session window):

Code:

Regression Equation
group
Group1 sqrt(value) = 0,121673 + 2,33448 displacement - 1,81231
displacement*displacement + 0,467946
displacement*displacement*displacement
Group2 sqrt(value) = 0,0388895 + 2,33448 displacement - 1,81231
displacement*displacement + 0,467946
displacement*displacement*displacement
Group3 sqrt(value) = 0,045901 + 2,33448 displacement - 1,81231
displacement*displacement + 0,467946
displacement*displacement*displacement
Group4 sqrt(value) = 0,0190389 + 2,33448 displacement - 1,81231
displacement*displacement + 0,467946
displacement*displacement*displacement
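To get a feeling for these equations, they can be evaluated directly. The Python sketch below copies the coefficients from the output above (decimal commas written as decimal points). The displacement value is hypothetical, and squaring the fitted sqrt(value) is only a simple back-transformation to the original scale, not something taken from the Minitab output.

```python
# Coefficients copied from the regression equations above
intercepts = {"Group1": 0.121673, "Group2": 0.0388895,
              "Group3": 0.045901, "Group4": 0.0190389}
b1, b2, b3 = 2.33448, -1.81231, 0.467946

def predict_sqrt(group, d):
    """Fitted sqrt(value) for a group at displacement d."""
    return intercepts[group] + b1 * d + b2 * d ** 2 + b3 * d ** 3

def predict_value(group, d):
    """Simple back-transformation to the original scale by squaring."""
    return predict_sqrt(group, d) ** 2

d = 0.5   # hypothetical displacement, units as in the worksheet
for g in intercepts:
    print(g, round(predict_sqrt(g, d), 4), round(predict_value(g, d), 4))
```

On the sqrt scale the groups differ only by their intercepts, so the distance between two group curves is the same at every displacement. After squaring, the distance depends on the displacement.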

If you want to illustrate these differences, the General Linear Model function can be used:

Stat > ANOVA > General Linear Model

Response: 'sqrt(value)'

Model: displacement group displacement*displacement displacement*displacement*displacement

> Covariates: displacement

[the numeric terms must be filled in here] > OK

> Graphs: Residual Plots: Choose "Four in One" > OK

> Factor Plots: Main Effect Plots: Select "group"

> OK > OK

The main effects plot shows that the mean of the first group (Group1) is much higher than the means of the other three groups. This plot is built from the model (see the subtitle "Fitted Means"), meaning it shows the differences between the groups with the displacement impact removed.

What the statistical analysis cannot answer is whether this model makes sense from a technical point of view, why the process conditions in Group2, 3 and 4 were only tested with smaller displacement values than in Group1, and what causes the pattern in the residuals vs. observation order plots.

Hope this helps nevertheless