How to Statistically Prove 2 Dimensions are Related

susanyap123

Hi, I have a set of data (attached) which shows that a machining process on 'Dimension-A' affects 'Dimension-B'. I need to prove this to my customer statistically. Is there any way of proving it using Minitab? If yes, which tools should I use? Thanks

Attachments

• Book1.xlsx
13.9 KB · Views: 140

Miner

Forum Moderator
Start by plotting the data using a scatter plot. Doing so, you will see that there does appear to be a relationship, but there also appears to be an outlier (sample 18). Verify that these measurements are correct to determine how to handle this outlier.

Next, you can do a simple correlation followed by a regression analysis if you want to proceed that far. Note that the relationship only explains approximately a third of the variation. The remaining variation may be due to measurement variation, process variation, or other factors.
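
For readers without Minitab, the correlation and regression step can be sketched in plain Python. This is only an illustration: the measurements below are made-up placeholders, not the values in Book1.xlsx, and the helper name `fit_line` is hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Placeholder measurements (hypothetical, NOT the attached data)
dim_a = [0.010, 0.012, 0.015, 0.011, 0.018, 0.014, 0.016, 0.013]
dim_b = [5.02, 5.01, 5.05, 5.04, 5.06, 5.02, 5.07, 5.03]

slope, intercept, r = fit_line(dim_a, dim_b)
r_squared = r ** 2  # share of the variation in B that the line on A accounts for
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, slope = {slope:.3f}")
```

Rerunning the same fit with a suspect sample left out is a quick way to see how much a single point is driving the result.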

susanyap123

Thanks! Will try out.

Bev D

Heretical Statistician
Super Moderator
An r squared value of .294 is very low. Dimension A is NOT the primary driver of Dimension B. If you look at the scatter diagram you can clearly see this. I know you want to 'prove' that A causes B, but it doesn't...

Perhaps if we understood the process and what the dimensions were we could provide more insight. It is possible that A does cause B and your study design is flawed.

Miner

Forum Moderator
Part of the issue is probably that the data were collected from the running process, which was presumably stable, so there is too little spread in Dimension A to see the relationship clearly. If a larger spread were used, the R^2 would probably increase.
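
The effect of spread on R^2 can be illustrated with a small simulation. This is a sketch under assumed values, not a model of the actual process: the slope, noise level, ranges, and the helper names `r_squared` and `simulate` are all hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def simulate(half_width, n=500, slope=0.5, noise_sd=1.0, seed=1):
    """Same true slope and noise every time; only the spread of x changes."""
    rng = random.Random(seed)
    x = [rng.uniform(-half_width, half_width) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, noise_sd) for xi in x]
    return r_squared(x, y)

r2_narrow = simulate(half_width=1.0)  # x held in a tight band, like a stable process
r2_wide = simulate(half_width=5.0)    # x deliberately varied over a wide range
print(f"narrow spread R^2 = {r2_narrow:.3f}, wide spread R^2 = {r2_wide:.3f}")
```

The underlying relationship is identical in both runs, so the gap in R^2 comes only from how far x was allowed to vary.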

Statistical Steven

Statistician
Super Moderator
Just a point of clarification, and a real bone of contention for me. You cannot PROVE that A causes B. You can only show there is a relationship between A and B. Only through experience and subject matter expertise can you use the relationship to assign the cause. Another point to consider is that R-Squared is not proof of a good fit! See Anscombe's Quartet (https://en.wikipedia.org/wiki/Anscombe's_quartet)
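
As a quick check of the Anscombe point, the sketch below computes R^2 for the four data sets of the quartet (values as published by Anscombe and reproduced on the linked page). The helper name `r_squared` is just an illustrative choice.

```python
def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Sets I-III share the same x values; set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: r_squared(x, y) for name, (x, y) in quartet.items()}
for name, r2 in results.items():
    print(f"Set {name}: R^2 = {r2:.3f}")
```

All four sets give nearly the same R^2 (about 0.67), yet their scatter plots look nothing alike, which is why the number alone cannot certify a good fit.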

Miner

Forum Moderator
Good point about R^2. As Ellis Ott said: "Plot the data!"

I disagree about the comment on proving a relationship. What you stated is correct for observational data. However, experimental data that has been replicated can essentially prove a relationship. For example, I only have to flip the light switch on and off a few times to essentially prove that it causes the light to turn on and off.

Statistical Steven

Statistician
Super Moderator
You are correct that when the light goes on every time you flip the switch, that demonstrates a cause-and-effect relationship. But when you have observational data with error (MSE), as is the case with this regression, you cannot prove a cause-and-effect relationship.

susanyap123

Hi, I do agree that with a scatter plot we can only see that DIM A and DIM B have a relationship, which may not be causal. My company's BB also considered that the operator may be one of the KPIVs, so he used some hypothesis testing. But I am not sure whether this method is correct. Would a 2-sample t test be better?

The samples were collected from 2 operators (20 samples each).

Attachments

• Book3.xlsx
196 KB · Views: 170
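
For reference, a 2-sample t comparison of two operators can be sketched as below. This is an illustration only: the readings are simulated placeholders (not the values in Book3.xlsx), the helper name `welch_t` is hypothetical, and the unequal-variance (Welch) form is an assumption about which variant would be used.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(sample_1, sample_2):
    """Return (t statistic, degrees of freedom) for Welch's 2-sample t test."""
    n1, n2 = len(sample_1), len(sample_2)
    v1, v2 = variance(sample_1), variance(sample_2)  # sample variances
    se_sq = v1 / n1 + v2 / n2                        # squared std. error of the difference
    t = (mean(sample_1) - mean(sample_2)) / sqrt(se_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Simulated placeholder readings: 20 parts per operator (hypothetical values)
rng = random.Random(42)
operator_1 = [rng.gauss(5.00, 0.02) for _ in range(20)]
operator_2 = [rng.gauss(5.00, 0.02) for _ in range(20)]

t_stat, df = welch_t(operator_1, operator_2)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The t statistic is then compared against the critical value for the computed degrees of freedom (roughly 2.0 at the 5% level when df is near 38), or the p-value is read from Minitab's 2-Sample t output.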

Bev D

Heretical Statistician
Super Moderator
First, let's be clear: if there is any relationship between the performance of A and B, it is tenuous at best. LOOK at the scatter diagram: the 'best fit' line is almost flat. The r squared value is also clear on this. B may be built on A, but the variation in B is not controlled by A, at least in the way you have measured things.

If we had a better idea of the physical relationship between A and B and the process by which B is created, we might be able to suggest a more definitive analysis. Parallelism of A related to 4 individual dimensions of B could be a very poor way of assessing the relationship.

Secondly, on the 'hypothesis' test of the operators: it is clear from a LOOK at the data, even with box plots (again given the obvious limitations of trying to correlate parallelism of A to 4 values of B), that there is NO relationship (either statistical or physical) between the operators and the results in A or B. No amount of statistical test hunting will overcome this...

All of this doesn't mean that A isn't causing B. ALL statistical analysis must match the physical reality of the system. Statistics doesn't obviate physics.
If you really want to understand the relationship between A and B we need to understand what A and B are and how they are created...without it you are just yanking on the statistical slot machine arm...