5th March 2008, 04:00 PM
Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

During a manufacturing process, data are collected (approximately 700 measurements an hour). At the end of each hour a mean and standard deviation are calculated. The process then begins again and data are collected for the next hour, etc.

If I assume equal numbers of measurements every hour, and want to pool this information for an 8-hour shift, I think I can pool the means (sum them and divide by 8), but I am uncertain of the standard deviation. Can I do the same, or do I need to square them individually to obtain variance, then sum them, divide by 8, and then take square root to obtain standard deviation?

Any help would be appreciated.
6th March 2008, 10:30 AM
Steve Prevette
Re: Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

Quote:
 In Reply to Parent Post by REVANS: During a manufacturing process, data are collected (approximately 700 measurements an hour). At the end of each hour a mean and standard deviation are calculated. The process then begins again and data are collected for the next hour, etc. If I assume equal numbers of measurements every hour, and want to pool this information for an 8-hour shift, I think I can pool the means (sum them and divide by 8), but I am uncertain of the standard deviation. Can I do the same, or do I need to square them individually to obtain variance, then sum them, divide by 8, and then take square root to obtain standard deviation? Any help would be appreciated.
The statistical answer is that the standard deviation of an average of 8 numbers is the standard deviation of the individual numbers, divided by the square root of 8. So, if you have calculated a standard deviation for the hourly numbers, and now want to express the standard deviation of an 8 hour shift, you would divide by the square root of 8.

Be careful of pooling too much data together, because you may end up missing some signals. You need to strike a balance between generating too many data points to deal with (and more false alarms) and pooling so much into one point that you miss a signal.
Steve Prevette
"A Passionate Statistician", ASQ CQE, ASQ Fellow
6th March 2008, 01:33 PM
Bev D
Re: Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

The real question is: what are you trying to accomplish with the data? Knowing this will allow us to provide a truly useful answer, as opposed to a mathematically correct one that may not help you accomplish your goal.

Unless this is a homework assignment... : )
6th March 2008, 01:50 PM
Re: Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

I did not quite follow Steve's formula {sum individual std dev/ square root 8}?

No this is not homework...real world application.
The 700 measurements are weights. Every hour they are tallied, an average and standard deviation computed. By law (Dept Commerce) there is allowable variation defined by standard deviation and average of declared weights. So, this question has to do with sample or lot size. I may wish to increase my lot size to 8 hours (5600 units), but I need the correct standard deviation to do this. {The simple solution is run statistics for 8 hours, rather than 1 hour, but I do not have that luxury}

Still looking for help. Thanks
6th March 2008, 01:51 PM
Tim Folkerts
Re: Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

There are two different possible issues here, I think.

Suppose you measure 100 items and get a mean of 200 and a st dev of 20 (and that the data follows a normal distribution). This implies that most of the data (roughly 68%) falls in the range 200 +/- 20.

The "square root" method that Steve was talking about relates to the certainty of the mean value. The "standard error" of the mean is (StDev)/N^0.5. The mean will most likely be in the range of 200 +/- 20/ (100^0.5) = 200 +/- 2
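As a minimal sketch of the standard error arithmetic above (the function name `standard_error` is just an illustrative choice, and the measurements are assumed to be independent):

```python
import math

def standard_error(st_dev, n):
    """Standard error of the mean of n independent measurements."""
    return st_dev / math.sqrt(n)

# 100 items with a st dev of 20, as in the example above
print(standard_error(20.0, 100))  # 2.0
```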

But this is the "standard error", which gets better as you add more data. The actual standard deviation does not improve as you collect more data. If you repeated the measurement above with new items, you might get 198 +/- 18 and 202 +/- 20 and 200 +/- 22. The actual standard deviation does not get better just by repeating the measurements.

Ideally, you should find the overall st dev as you suggest, by squaring, averaging, and taking the sq root. However, for large sample sizes, the st dev won't vary much from batch to batch.

In the case above, the ideal estimate of the overall stdev would be [ (20^2 + 18^2 + 20^2 + 22^2) / 4 ] ^ 0.5 = 20.05. This is practically the same as the simple average of 20.00. For small samples (perhaps just 4 instead of 100), the "correct" method of squaring would be more important.
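Here is a rough sketch of that calculation, assuming equal batch sizes. The function names are hypothetical. The second function is an addition not discussed above: if the batch means themselves differ, the st dev of all the data lumped together also picks up the spread of the batch means around the grand mean (treating each batch st dev as a population value, which is a reasonable approximation for large batches).

```python
import math

def pooled_st_dev(st_devs):
    """Square each batch st dev, average, take the square root.
    Assumes every batch has the same number of measurements."""
    return math.sqrt(sum(s * s for s in st_devs) / len(st_devs))

def combined_st_dev(means, st_devs):
    """St dev of all batches lumped together (equal batch sizes assumed).
    Adds the variance of the batch means to the average within-batch variance."""
    grand_mean = sum(means) / len(means)
    within = sum(s * s for s in st_devs) / len(st_devs)
    between = sum((m - grand_mean) ** 2 for m in means) / len(means)
    return math.sqrt(within + between)

means = [200.0, 198.0, 202.0, 200.0]
st_devs = [20.0, 18.0, 20.0, 22.0]

print(round(pooled_st_dev(st_devs), 2))           # 20.05
print(round(combined_st_dev(means, st_devs), 2))  # 20.1
```

With batch means this close together the two results barely differ. The gap grows if the process mean drifts from hour to hour.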

(The standard error would decrease to 20/(400)^0.5 = 1, so this improves the knowledge of the mean.)

Tim F
To wonder is to begin to understand.
6th March 2008, 01:59 PM
Steve Prevette
Re: Statistical Analysis Problem - Pooling Information for 8 Hour Shifts

To add to Tim's response (and having had to teach this concept to non-statisticians):

We need to differentiate between the standard deviation of the individuals, and the standard deviation of an average of several individuals (or standard error).

If I have collected 8 hours of hourly data, I can calculate the average and standard deviation of the hourly rate. If I've collected 800 hours of hourly data, I can still calculate the standard deviation of the hourly rate.

But suppose you then asked me what the standard deviation would be for the average hourly rate taken over an 8-hour shift. The prediction for the average would still be the same as when I collected the individual hourly data. But the standard deviation of an 8-hour average would be the standard deviation from above, divided by the square root of 8. That would be the standard deviation of the hourly rate averaged over each shift.
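One way to check this is a small simulation. The sketch below is only illustrative: it assumes the hourly values are independent and normally distributed, and the mean of 100, st dev of 20, and number of shifts are made-up numbers, not data from the process in question.

```python
import math
import random
import statistics

def shift_average_st_dev(sigma, hours_per_shift, n_shifts, seed=0):
    """Simulate hourly values, average them per shift,
    and return the st dev of those shift averages."""
    rng = random.Random(seed)
    shift_means = []
    for _ in range(n_shifts):
        hourly = [rng.gauss(100.0, sigma) for _ in range(hours_per_shift)]
        shift_means.append(statistics.mean(hourly))
    return statistics.stdev(shift_means)

sigma = 20.0  # hypothetical st dev of the individual hourly values
simulated = shift_average_st_dev(sigma, hours_per_shift=8, n_shifts=4000)
predicted = sigma / math.sqrt(8)  # about 7.07
print(f"simulated: {simulated:.2f}, predicted: {predicted:.2f}")
```

The simulated value should land close to the predicted one, while the st dev of the individual hourly values stays near 20.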
Steve Prevette
"A Passionate Statistician", ASQ CQE, ASQ Fellow

