Tim,
poor Indul
He asked a simple Excel question and we started discussing curve approximations... Sorry for hijacking your thread, Indul; perhaps you'll still find some useful hints in our postings.
For me the decision between two or more curves is
not a chicken-and-egg thing. From Indul's spreadsheet you can see that his intention is to draw an Xbar-R chart. For these charts the assumption of normally distributed data has to hold to get valid results.
So if you're trying to track a process with Xbar-R charts (or any other tool that assumes normality), you have to check this assumption first. Indul's values are not normally distributed, so what could he do?
1. He could use a transformation of the data or of the distribution (lognormal or something like that), but there are no hard rules for finding the "right" solution from the data alone, without additional information. Maybe he'll end up with something normally distributed, but he'll throw away a lot of valuable information.
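To make option 1 concrete, here is a minimal sketch (with made-up, lognormal-ish data, not Indul's values) showing how a log transform can reduce skewness. A crude sample-skewness statistic stands in for a formal normality test, just to keep it standard-library Python:

```python
import math
import random
import statistics

random.seed(1)
# hypothetical right-skewed measurements (roughly lognormal)
values = [math.exp(random.gauss(0.5, 0.4)) for _ in range(100)]
# candidate transformation: take logs
logged = [math.log(v) for v in values]

def skewness(xs):
    """Crude sample skewness; 0 for a perfectly symmetric sample."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

print(skewness(values))  # noticeably positive for skewed raw data
print(skewness(logged))  # much closer to zero after the transform
```

In a real analysis you would follow the transform with a proper normality test (e.g. Shapiro-Wilk) rather than eyeballing skewness, but the transformed values still carry no hint about *why* the raw data were skewed.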
2. He could fit two curves to get a good description of the data. But what does that mean for his process? How could he decide whether a new value like 2.2 belongs to the first or the second distribution without information about the cause of the two distributions?
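The classification problem in option 2 can be sketched like this, with completely invented component parameters: IF we knew the two underlying distributions, a new value could be assigned to whichever one gives it the higher likelihood. Without that causal information, any assignment of 2.2 is guesswork:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 2.2
p1 = normal_pdf(x, 2.0, 0.1)  # hypothetical first component
p2 = normal_pdf(x, 2.5, 0.1)  # hypothetical second component
print("first" if p1 > p2 else "second")  # prints "first"
```

The point is that the parameters 2.0 and 2.5 had to come from somewhere; fitting two curves to pooled data does not tell you which physical cause produced which curve.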
3. He could use sample subgroups. If the mean and variance of the process are stable, the subgroup means will be approximately normally distributed (Central Limit Theorem). But he still has no information about the causes of the peaks in the original data.
4. He could go back to the process and search for the cause(s) of the peaks (different materials, machines, suppliers, ...). Then he could split the data into meaningful subgroups with (approximately) normally distributed data and draw Xbar-R charts (e.g. one for each supplier).
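Once the data are split by cause as in option 4, the per-stratum chart limits follow the standard Xbar-R formulas. A sketch with invented measurements for one hypothetical supplier (A2, D3, D4 are the usual control-chart constants for subgroup size 5):

```python
import statistics

A2, D3, D4 = 0.577, 0.0, 2.114  # standard Xbar-R constants for n = 5

def xbar_r_limits(subgroups):
    """Return (LCL, centre, UCL) for the Xbar chart and the R chart."""
    xbars = [statistics.mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    xbarbar = statistics.mean(xbars)
    rbar = statistics.mean(ranges)
    return {
        "xbar": (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar),
        "r": (D3 * rbar, rbar, D4 * rbar),
    }

# hypothetical measurements, already split out for one supplier
supplier_a = [[2.0, 2.1, 1.9, 2.0, 2.05], [1.95, 2.0, 2.1, 2.0, 1.9]]
limits = xbar_r_limits(supplier_a)
print(limits["xbar"])  # (LCL, grand mean, UCL) for this supplier
```

With one such chart per supplier (or machine, or material), an out-of-control signal points at a specific cause instead of being blurred by a mixture.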
Going that way, he uses the full information in the data and is able to make stable predictions about the future behaviour of the process. IMO that is the appropriate use of statistical tools.
Regards,
Barbara