Barbara B
Re: Minitab and stats newbies..please help
Why not plot the percentages with boxplots, just like the means? You could calculate the percentage per row by subtracting 1 from each rating to shift the rating range from 1-4 down to 0-3. Then calculate the mean of the adjusted ratings (Calc > Row Statistics) and divide it by 3, so the results will lie between 0/3 = 0% fulfillment and 3/3 = 100% fulfillment.
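The percentage calculation above can also be sketched outside Minitab, e.g. in Python (the function name and the rating values are made up for illustration; a 1-4 scale is assumed):

```python
# Turn ratings on a 1-4 scale into a percentage of fulfillment:
# subtract 1 to shift the range to 0-3, average, then divide by 3.
def fulfillment_percentage(ratings):
    adjusted = [r - 1 for r in ratings]              # range now 0..3
    return (sum(adjusted) / len(adjusted)) / 3 * 100

# All ratings 1 -> 0 % fulfillment; all ratings 4 -> 100 %.
print(fulfillment_percentage([1, 1, 1]))  # 0.0
print(fulfillment_percentage([4, 4, 4]))  # 100.0
print(fulfillment_percentage([1, 4]))     # 50.0
```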
- However, how do I use Minitab to represent the percentages of data for main items in each group graphically, for better understanding?
- For the comparison between G1 and G2, I have tried to use ANOVA > GLM based on the data given. You are right! There are lots of complexities going on for me to interpret the data right now. Take a look at the result and my assumptions as attached. Please correct me if I am wrong in the data interpretation.
Your interpretation of the significance tests in the ANOVA table and of the Tukey tests is correct. The difference between the two lies in the hypotheses tested:
ANOVA:
H0: All group means are equal.
H1: At least one mean is different from the others.
Pairwise comparison (e.g. Tukey, Bonferroni, Dunnett):
H0: The two means compared (e.g. type3 and type6) are equal.
H1: The two means compared (e.g. type3 and type6) are different.
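As a minimal sketch of this difference (made-up rating data, not yours; SciPy's f_oneway for the ANOVA, and a Bonferroni-corrected pairwise t-test standing in for Tukey, which SciPy does not always provide):

```python
from itertools import combinations
from scipy import stats

# Illustrative rating samples for three types (made-up data).
groups = {
    "type1": [2.0, 2.5, 3.0, 2.2, 2.8],
    "type2": [2.1, 2.6, 2.9, 2.4, 2.7],
    "type3": [3.5, 3.8, 3.6, 3.9, 3.7],
}

# ANOVA: H0 = all group means are equal.
f_stat, p_anova = stats.f_oneway(*groups.values())
print(f"ANOVA p-value: {p_anova:.4f}")

# Pairwise comparisons: H0 = the two compared means are equal.
# Bonferroni: multiply each p-value by the number of comparisons.
pairs = list(combinations(groups, 2))
for a, b in pairs:
    _, p = stats.ttest_ind(groups[a], groups[b])
    p_adj = min(1.0, p * len(pairs))
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
```

The ANOVA can only tell you that *some* mean differs; the pairwise comparisons tell you *which* pairs differ (here, type3 against the other two).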
S = 0.643302 R-Sq = 26.79% R-Sq(adj) = 25.42%
In this line the model quality is characterized:
- S: standard deviation of the residuals, i.e. the variation in the ratings which cannot be assigned to a factor (group, type) or the interaction (group*type) in the model
- R-Sq: coefficient of determination, the percentage of variation which is explained by the terms in the model (see a statistics reference on the coefficient of determination for further details)
- R-Sq(adj): like R-Sq, but adjusted for the number of terms in the model
The better the model, the smaller S and the higher R-Sq and R-Sq(adj). Usually, values for R-Sq and R-Sq(adj) above 80% are one criterion for a good model (i.e. a model which explains the results well). In your model, less than 30% of the variation in the ratings is explained by the terms (factors and interactions) in the model, so this model explains only a small part of the rating values and is therefore insufficient.
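To make the relationship between these three numbers concrete, here is a small sketch of how S, R-Sq and R-Sq(adj) are computed from observed and fitted values (the numbers are invented for illustration, not taken from this thread):

```python
# Illustrative observed ratings and model fits (made-up values).
observed = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0]
fitted   = [2.8, 2.7, 3.6, 3.4, 2.3, 3.2]
n_terms  = 2                       # model terms besides the intercept

n = len(observed)
mean_y = sum(observed) / n
sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))   # residual SS
sst = sum((y - mean_y) ** 2 for y in observed)              # total SS

s = (sse / (n - n_terms - 1)) ** 0.5       # residual standard deviation
r_sq = 1 - sse / sst                        # coefficient of determination
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - n_terms - 1)

print(f"S = {s:.4f}, R-Sq = {r_sq:.2%}, R-Sq(adj) = {r_sq_adj:.2%}")
```

The smaller the residuals, the smaller sse, hence the smaller S and the closer R-Sq gets to 100%; R-Sq(adj) additionally penalizes extra model terms.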
Unusual Observations for mean ratings
Obs mean rating Fit SE Fit Residual St Resid
76 0.37500 3.46500 0.09098 -3.09000 -4.85 R
176 1.80000 3.52000 0.09098 -1.72000 -2.70 R
248 1.33333 3.03333 0.09098 -1.70000 -2.67 R
251 1.33333 2.70667 0.09098 -1.37333 -2.16 R
[...]
R denotes an observation with a large standardized residual.
"Large" is defined as more than 2 standard deviations away from the expected mean. The interval mean +/- 2 standard deviations covers about 95% of the data (under the normality assumption). Your sample size is N = 599, so 5% of 599 corresponds to about 30 ratings which could be expected to lie beyond +/- 2 standard deviations. The table of unusual observations lists 28 ratings - pretty much the amount that could be expected to be "unusual" or "large" for a data set of N = 599.
But there is one rating which differs extremely from the others: Obs 76 (first row), with a standardized residual of -4.85. This value really is different and shows up as a point far away from the rest in the residual plots. (Btw: I recommend using the residual plots for analysis instead of the table of unusual observations. It's imho much easier to spot the pitfalls in the plots.)
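The rule-of-thumb count above (5% of 599, roughly 30) can be checked against the exact normal figure with the standard library alone:

```python
import math

# Under normality, the share of standardized residuals beyond +/-2
# standard deviations is known exactly; 5% is the rule-of-thumb value.
n = 599
coverage = math.erf(2 / math.sqrt(2))        # P(|Z| <= 2), about 0.9545
expected_large = n * (1 - coverage)

print(f"P(|Z| > 2) = {1 - coverage:.4f}")
print(f"expected 'unusual' observations out of {n}: {expected_large:.1f}")
```

The exact expectation is about 27 observations, which is close to both the rule-of-thumb 30 and the 28 actually listed, so the bulk of the "unusual" table is unremarkable.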
If you take a look at the data you can see that something went wrong for observation 76: mean ratings on a scale from 1 to 4 have to lie between 1 (all ratings equal 1) and 4 (all ratings equal 4), but the mean in observation 76 (Group 2, Type 1) is 0.375. This occurred because the means were calculated from the sums in your third posting, but the numbers of ratings were not given there.
The deviations from the blue line and the other deviations from the normal distribution in the residual plots are probably due to the small resolution of the rating scale. Even if the mean (or, alternatively, the percentage) is calculated, the number of different values is limited, so one cannot expect the residuals to follow the normal distribution exactly or to show a bell-shaped histogram. I would therefore recommend looking for clear deviations from the normality assumption, or obvious violations of the variance homogeneity assumption, after the analysis has been redone with correct means or percentages (see below).
- Please advise me further on the steps to get to this result as well. I need to understand right from the start how you extracted the data into the table that you attached in your earlier reply.
To avoid further wrong means (and to clarify how I prepared the data earlier), recalculate the means using the row statistics function:
Calc > Row Statistics
Statistic: choose mean
Input variables: c23 c36 c37 c41 c42 c43 c44 c45 (e.g. for type1)
Store result in: 'type1 (mean)'
OK
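The same row means could be reproduced outside Minitab, e.g. with pandas (the column names come from the dialog above; the rating values here are placeholders, not your data):

```python
import pandas as pd

# Placeholder worksheet with the columns named in the dialog above;
# real values would come from the Minitab worksheet.
df = pd.DataFrame({
    "c23": [3, 4], "c36": [2, 3], "c37": [4, 4], "c41": [3, 2],
    "c42": [2, 4], "c43": [3, 3], "c44": [4, 2], "c45": [3, 4],
})

# Equivalent of Calc > Row Statistics with Statistic = Mean:
df["type1 (mean)"] = df[["c23", "c36", "c37", "c41",
                         "c42", "c43", "c44", "c45"]].mean(axis=1)
print(df["type1 (mean)"])
```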
Afterwards stack the columns 'type1 (mean)' to 'type6 (mean)' in a new worksheet:
Data > Stack > Stack Columns
and assign the appropriate Group in a separate column, e.g. by
Calc > Make Patterned Data > Text values
Store patterned data in: Groups
Text values: G1 G2
Number of times to list each value: 50 (n=50 ratings per group and type)
Number of times to list each sequence: 6 (6 types)
OK
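The patterned data above amounts to listing each label 50 times and repeating the whole sequence 6 times; as a quick check of what that produces:

```python
# Reproduce Calc > Make Patterned Data > Text Values:
# each value listed 50 times, the whole sequence listed 6 times.
values = ["G1", "G2"]
times_per_value = 50      # n = 50 ratings per group and type
times_per_sequence = 6    # 6 types

one_sequence = [v for v in values for _ in range(times_per_value)]
groups = one_sequence * times_per_sequence

print(len(groups))               # 600 labels for the stacked means
print(groups[0], groups[50])     # G1 G2
```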
and redo the analysis with ANOVA > GLM. (I couldn't do this myself because I don't have the information about how many ratings went into the sum in a specific row.)
If you have more questions, could you please attach the data used for the GLM analysis in a spreadsheet format? Thanks!
Best regards,
Barbara