I want to do a 2-sample t-test to compare the burst pressure performance of two different products that are used in the same application. The data is not very normal: the Anderson-Darling p-value is 0.016 for one group and 0.096 for the other group. There are 30 data points in each group. The data is skewed to the right in both groups (most of the data is on the left side of the histogram, with a pretty long right tail). I would like to transform the data using the Box-Cox transformation and the Johnson Transformation in Minitab and then see which one has the highest resulting normality p-value and the highest correlation coefficient when plotted. I believe this will result in the most accurate t-test results. Is that right? To do the transformation correctly, it's necessary to use the same transformation equation for both groups in order for the t-test to be valid. If I use different equations I will end up with either diverging or converging data sets and means, and the result will be meaningless. Is there a way in Minitab (or any other tool) to find the optimum equation for two different data sets? You can't just stack the data in one column, because they are not from the same population: they come from two different products.
I ran a Box-Cox on Group 1 and got an optimum lambda of 0.171, and on Group 2 I got an optimum lambda of 0.231. I need to use only one lambda. So do I use the average of 0.171 and 0.231, which is 0.201? The resulting A-D p-values are 0.470 and 0.262, which is a lot better than the raw data, but I'm not sure whether it's optimal.
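One possible alternative to averaging the two lambdas, as a minimal sketch only: pick the single lambda that maximizes the sum of the two groups' Box-Cox profile log-likelihoods, letting each group keep its own mean and variance so the data are never stacked into one population. The function names, the search range of -2 to 2, and the sample numbers below are assumptions made for illustration; they are not Minitab's routine, and the result may differ from what Minitab reports.

```python
import math

def boxcox(x, lam):
    """Box-Cox transform of one positive value."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def group_loglik(data, lam):
    """Profile log-likelihood of one group at lam (group keeps its own mean and variance)."""
    n = len(data)
    y = [boxcox(v, lam) for v in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in data)
    return -0.5 * n * math.log(var) + jacobian

def common_lambda(groups, lo=-2.0, hi=2.0, steps=4001):
    """Grid search for the one lambda that maximizes the summed log-likelihood."""
    best_lam, best_ll = None, -math.inf
    for i in range(steps):
        lam = lo + (hi - lo) * i / (steps - 1)
        ll = sum(group_loglik(g, lam) for g in groups)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam, best_ll

# Made-up, right-skewed illustration data (NOT the data from this post).
group1 = [212.0, 230.5, 241.0, 255.3, 262.8, 280.1, 301.7, 340.2, 395.6, 520.4]
group2 = [198.4, 215.9, 226.3, 239.0, 251.6, 270.8, 295.5, 330.9, 410.2, 610.7]

lam, _ = common_lambda([group1, group2])
print("common lambda:", round(lam, 3))
```

Because the likelihoods are summed rather than the lambdas averaged, a group whose likelihood is sharply peaked pulls the common value harder than a group whose likelihood is flat.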
With the Johnson Transformation, it's much more complicated to find one optimum equation because there are so many factors in the equation. For Group 1, I got 2.79632 + 1.50848 * Log( ( X - 17.2401 ) / ( 2265.63 - X ) ), and for Group 2 I got -0.0969158 + 1.32128 * Asinh( ( X - 190.918 ) / 121.167 ). The A-D p-values are 0.534 and 0.431 respectively, which is better than the Box-Cox transformation. But I can't figure out how to find the optimum Johnson Transformation for both groups, since the equations are so different.
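To make the difference between the two fitted equations concrete, here is a small sketch that codes them exactly as quoted above and evaluates both at the same raw values. It assumes that "Log" in the Minitab output means the natural logarithm; the function names and the probe values are made up for illustration.

```python
import math

def johnson_group1(x):
    """Bounded form quoted for Group 1; only defined for 17.2401 < x < 2265.63."""
    if not 17.2401 < x < 2265.63:
        raise ValueError("x is outside the bounds of the Group 1 equation")
    # Assumption: Minitab's "Log" is the natural log.
    return 2.79632 + 1.50848 * math.log((x - 17.2401) / (2265.63 - x))

def johnson_group2(x):
    """Unbounded (Asinh) form quoted for Group 2; defined for any x."""
    return -0.0969158 + 1.32128 * math.asinh((x - 190.918) / 121.167)

# Hypothetical probe values, not measurements from either product.
for x in (150.0, 250.0, 400.0, 800.0):
    z1, z2 = johnson_group1(x), johnson_group2(x)
    print(f"x={x:7.1f}  group1 eq: {z1:8.3f}  group2 eq: {z2:8.3f}  gap: {z1 - z2:8.3f}")
```

If the gap column changes as x changes, the same raw burst pressure lands in different places on the two transformed scales, which is the converging or diverging effect described earlier.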
Does anyone have any suggestions?
I'm also aware that transformations can be tricky, and that it's best to use a single type of transformation which fits the type of data in question and then stick with it, rather than finding the "optimum" equation for every data set. For example, break strength of round cables varies with the square of the diameter of the cable, so SQRT(X) is probably the best transformation equation. Bacteria multiplying in a Petri dish would have exponential growth, so would that be log(X)? Survival time to failure might be Weibull, etc. My test is burst pressure of a vessel, so I'm not sure which transformation is best. I generally get good results with natural log, but it's not that clear. Does there need to be a further justification for doing a transformation besides simply to improve normality for purposes of doing a t-test? Any comments on transformations in general and in choosing the most appropriate transformation for an ongoing series of tests would be appreciated.
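For the "pick one transformation and stick with it" option, a rough sketch of what a fixed natural-log transform followed by a 2-sample t-test could look like is below. It assumes the unequal-variance (Welch) form of the test and returns only the t statistic, the approximate degrees of freedom, and the ratio of geometric means; the function name is hypothetical, and the p-value would still come from a t table or a statistics package.

```python
import math
from statistics import mean, variance

def welch_t_on_logs(a, b):
    """Welch t statistic and degrees of freedom computed on natural-log data."""
    la = [math.log(v) for v in a]
    lb = [math.log(v) for v in b]
    # Squared standard errors of the two log-scale means.
    va = variance(la) / len(la)
    vb = variance(lb) / len(lb)
    t = (mean(la) - mean(lb)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(la) - 1) + vb ** 2 / (len(lb) - 1))
    # A difference of log means corresponds to a ratio of geometric means.
    ratio = math.exp(mean(la) - mean(lb))
    return t, df, ratio
```

On the log scale the comparison is between geometric means, so the result reads as "product A bursts at some percentage higher or lower pressure than product B" rather than as a difference in pressure units.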
Thanks.
Ari Goldberg