Re: Minitab and stats newbies..please help
- Then, I do the sum of rating as suggested.
P1 (C23+C36+C41+C42+C43+C44+C45) = ‘Type1 (sum)’
P2 (C24+C27+C34+C39+C40) = ‘Type2 (sum)’
P3 (C25+C32+C33) = ‘Type3 (sum)’
P4 (C26+C28+C31+C38+C46) = ‘Type 4(sum)’
P5 (C30+C35) = ‘Type 5(sum)’
P6 (C29+C47+C48) = ‘Type 6 (sum)’
- Once I got the values in the column, I did a box plot as attached.
- However, I am struggling to write a description for this box plot. I know the middle line is the median, but I still can’t figure out how to describe this data. My assumption when I look at this boxplot is that Type1 (P1) has scored the highest of all and Type 5 (P5) the lowest.
- I did the same steps to a set of data in G2 and attached is the boxplot as well. Again, please help me to describe the data.
If you take the sum of the ratings, the sums for P1 will usually be higher because you add up 7 ratings, and the sums for P5 will be lower because you add up only 2 ratings. So even if the P5 ratings are high (e.g. 4 and 4), the sum will only be 8, whereas for P1 with mid-range ratings (e.g. 2, 3, 3, 2, 1, 2 and 3) the sum is 16. (Note: Your formula for Type 1 (sum) contains only 7 ratings, C23+C36+C41+C42+C43+C44+C45, for P1, but the formula for the percentages assumes 8 ratings or subitems.)
To get comparable numbers you have to divide the sum by a figure that reflects the number of items in the sum. One approach is to use the mean (sum / no. of subitems); another is the percentage value advised by your supervisor (sum / (highest mark * no. of subitems * sample size)).
To calculate this or other formulas in Minitab you can use the calculator, which has several built-in functions similar to those in Excel:
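As a small illustration of why the raw sums are not comparable, here is a sketch in Python (not Minitab) using the example ratings from above. The variable and function names are made up for this illustration.

```python
# Example ratings from the text above (illustrative values, not the real data).
p5_ratings = [4, 4]                  # 2 subitems, both rated high
p1_ratings = [2, 3, 3, 2, 1, 2, 3]   # 7 subitems, mid-range ratings

def item_mean(ratings):
    """Sum of the ratings divided by the number of subitems."""
    return sum(ratings) / len(ratings)

p5_sum, p1_sum = sum(p5_ratings), sum(p1_ratings)
p5_mean, p1_mean = item_mean(p5_ratings), item_mean(p1_ratings)

# The sum favours P1 only because it has more subitems;
# the mean puts both types on the same 1-4 scale.
print(f"P5: sum = {p5_sum}, mean = {p5_mean:.2f}")
print(f"P1: sum = {p1_sum}, mean = {p1_mean:.2f}")
```

The ordering flips once the sums are divided by the number of subitems: P1 has the larger sum, but P5 has the larger mean.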
Calc > Calculator
Store result in variables: Type 1 (%)
Expression: 'Type1'/1600*100
OK
I calculated the means of your G1 and G2 data as follows:
'Type1 mean' = 'Type1'/8
'Type2 mean' = 'Type2'/5
'Type3 mean' = 'Type3'/3
'Type4 mean' = 'Type4'/5
'Type5 mean' = 'Type5'/2
'Type6 mean' = 'Type6'/3
(sum / no.of.subitems)
The boxplots seem to be similar for all 6 types (see data and graph attached).
- I have done a normality test for each item to compare and attached the results in the normality test file.
- As all items form a straight line, may I assume that the data in each item are normally distributed?
- ...
The variation in your ratings is too small to see a normal distribution in the data. Due to the small number of distinct values (whether in the sum, mean or percentage) you get stacked points in the probability plot and therefore a violation of the normality assumption.
The Anderson-Darling test for normality (AD test), shown in the legend on the right side of each graph, almost always detects a significant deviation from normality (p-value < 0.05 = data not normally distributed), so a t-test comparing G1 and G2 would not give reliable results. A nonparametric test is recommended here.
- Based on my assumption of the normality test, I have done 2-t-test as suggested and here are the results.
Two-Sample T-Test and CI: Type1 (sum), Type 1-G2 (sum)
Two-sample T for Type1 (sum) vs Type 1-G2 (sum)
N Mean StDev SE Mean
Type1 (sum) 50 26.96 3.32 0.47
Type 1-G2 (sum) 50 27.72 4.62 0.65
Difference = mu (Type1 (sum)) - mu (Type 1-G2 (sum))
Estimate for difference: -0.760
95% CI for difference: (-2.359, 0.839)
T-Test of difference = 0 (vs not =): T-Value = -0.94 P-Value = 0.347 DF = 88
[...]
Again, I am struggling here to interpret the result. Please help.
The 2t-test looks for differences between two means, here: mean of Type1-G1(sum) and mean of Type1-G2(sum). This difference is calculated as
26.96-27.72=-0.760
The p-value gives the probability of finding this difference (or a larger one) in the data if the means of G1 and G2 are in fact equal (H0 assumption). The p-value for the difference between the means of G1 and G2 is p = 0.347 > 0.05, so it could be interpreted as "no significant difference between G1 and G2 in the Type 1 ratings". Keep in mind, however, that the normality assumption isn't met for the data.
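To see where the T-Value and DF in the session output come from, here is a rough Python sketch that recomputes them from the summary statistics (N, Mean, StDev) printed above. It assumes the unpooled (Welch) form of the two-sample t-test, which is consistent with the reported DF = 88 rather than 98. The function name is made up for this illustration, and because the inputs are rounded the result is only approximate.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """T-value and Welch-Satterthwaite degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1              # squared standard error of mean 1
    v2 = sd2 ** 2 / n2              # squared standard error of mean 2
    se_diff = math.sqrt(v1 + v2)    # standard error of the difference
    t_value = (mean1 - mean2) / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, df

# Summary statistics copied from the session output above.
t_value, df = welch_t_from_summary(26.96, 3.32, 50, 27.72, 4.62, 50)

# The degrees of freedom are truncated to a whole number for reporting.
print(f"T-Value = {t_value:.2f}, DF = {math.floor(df)}")
```

The p-value and the confidence interval additionally need the t distribution, which is left out here to keep the sketch free of external libraries.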
If your values for G1 and G2 are in different columns, you could use the nonparametric Mann-Whitney test to evaluate if the medians of G1 and G2 are different. (If your data is in one column and the grouping factor in another like in the spreadsheet attached, you could use the Kruskal-Wallis test instead. The Mann-Whitney test is the same as the Kruskal-Wallis test for 2 groups.)
Mann-Whitney Test and CI: Type1 mean_G1; Type1 mean_G2
N Median
Type1 mean_G1 50 3.5000
Type1 mean_G2 50 3.6250
Point estimate for ETA1-ETA2 is -0.1250
95,0 Percent CI for ETA1-ETA2 is (-0.2500;-0.0001)
W = 2273.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0830
The test is significant at 0.0813 (adjusted for ties)
"ETA1" is the median of Type1 for G1 and "ETA2" the median for G2. The p-value in the Mann-Whitney test is the probability of getting this difference (or a larger one) in the data if the medians of both samples are in fact the same. The p-value obtained for Type1 in G1 and G2 is p=0.0830, or rather p=0.0813 (ties occur due to equal values in the sample, so the latter is the more appropriate p-value). p is greater than 0.05, so there is not enough evidence in the data to conclude that a significant difference exists between Type1 in G1 and G2.
My supervisor has also suggested that I use ANOVA for the comparison. Do you think the t-test would be sufficient, or should I analyse further using ANOVA as advised?
You'll get a lot more information out of an ANOVA than the test procedures can provide, but the complexity of the analysis increases as well. Here are some of the options ANOVA provides:
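If it helps to see what "adjusted for ties" means, here is a minimal Python sketch of the rank-sum statistic W with midranks for tied values and a two-sided p-value from the normal approximation. It is an illustration only: the function name and the sample values are hypothetical, and the continuity correction and tie adjustment follow the usual textbook form, which I assume (but have not verified) matches what Minitab does internally.

```python
import math

def mann_whitney_w(x, y):
    """Rank sum W of sample x and a two-sided, tie-adjusted normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = 0.0          # sum of the ranks belonging to sample x
    tie_term = 0.0   # sum of t^3 - t over all groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                       # size of this group of equal values
        midrank = (i + 1 + j) / 2       # average of the ranks i+1 .. j
        w += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:                      # all values identical: no evidence of a difference
        return w, 1.0
    z = max(abs(w - mean_w) - 0.5, 0.0) / math.sqrt(var_w)   # continuity correction
    p_value = math.erfc(z / math.sqrt(2))                     # two-sided p-value
    return w, p_value

# Hypothetical mean ratings for two small groups (not the real G1/G2 data).
g1 = [3.25, 3.5, 3.5, 3.375, 3.625, 3.5]
g2 = [3.625, 3.75, 3.5, 3.625, 3.875, 3.625]
w, p_value = mann_whitney_w(g1, g2)
print(f"W = {w}, p (adjusted for ties) = {p_value:.4f}")
```

With many tied values, as in rating data, the tie term shrinks the variance of W, which is why the adjusted p-value differs slightly from the unadjusted one.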
*Evaluation of interactions (e.g. Type & Group)
*Evaluation of model quality: How much of the variability in the data can be attributed to the factors in the model? (R-Sq, R-Sq(adj))
*Comparisons and multiple comparisons of levels (e.g. Tukey, Bonferroni)
*Detection of extreme values and outliers not explained by the factor settings (residual plot)
Because of missing values you have to use the GLM from the ANOVA menu (Two-way ANOVA only works with equal numbers of items in every subgroup, that is, in a balanced design):
Stat > ANOVA > GLM
Responses: 'mean ratings'
Model: Group Type Group*Type
Group (G1/G2), Type (Type1-Type6) and interaction Group*Type are included in the ANOVA model
> Graphs:
Residual Plots: Choose "Four in one"
OK
> Factor Plots:
Main Effect Plot: Factor: Group Type
Interaction Plot: Factor: Group Type
[without asterisk!]
OK
> Comparisons:
Terms: Group Type
Method: Tukey, choose "Grouping information", "Confidence Interval", "Test"
OK > OK
Take a look at the results and decide for yourself whether it's worth struggling with it or whether you want to stick to the tests instead.
Best regards,
Barbara