# How to Statistically Prove 2 Dimensions are Related

#### susanyap123

Hi, I have a set of data (attached) which suggests that a machining process on 'Dimension-A' affects 'Dimension-B'. I need to prove this to my customer statistically. Is there a way to do this in Minitab? If so, which tools should I use? Thanks

#### Miner

Start by plotting the data using a scatter plot. Doing so, you will see that there does appear to be a relationship, but there also appears to be an outlier (sample 18). Verify that these measurements are correct to determine how to handle this outlier.

Next, you can do a simple correlation followed by a regression analysis if you want to proceed that far. Note that the relationship only explains approximately a third of the variation. The remaining variation may be due to measurement variation, process variation, or other factors.
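For anyone following along without Minitab, the correlation and simple regression steps can be sketched in plain Python. This is only an illustration: the attached data is not reproduced in the thread, so `dim_a` and `dim_b` below are made-up placeholder values, and the helper names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Placeholder measurements (NOT the poster's data).
dim_a = [0.010, 0.012, 0.015, 0.011, 0.018, 0.014, 0.016, 0.013]
dim_b = [5.02, 5.01, 5.05, 5.04, 5.06, 5.02, 5.03, 5.05]

r = pearson_r(dim_a, dim_b)
slope, intercept = fit_line(dim_a, dim_b)
# For a simple (one-predictor) regression, R-squared equals r squared.
r_squared = r ** 2
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, B = {intercept:.4f} + {slope:.4f} * A")
```

R-squared is the fraction of the variation in B that the fitted line accounts for, which is where the "approximately a third" figure comes from.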

#### Bev D

An R-squared value of 0.294 is very low. Dimension A is NOT the primary driver of Dimension B. If you look at the scatter diagram you can clearly see this. I know you want to 'prove' that A causes B, but it doesn't...

Perhaps if we understood the process and what the dimensions were, we could provide more insight. It is possible that A does cause B and your study design is flawed.

#### Miner

Part of this issue is "probably" because the data were collected from the process, which was presumably stable. There is too little spread to clearly see the relationship. If a larger spread were used the R^2 would probably increase.
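A quick simulation illustrates the point about spread. This is a sketch with invented numbers, not the poster's process: the same underlying relationship (`y = x + noise`) and the same noise level are used twice, once with x spread widely and once with x confined to a narrow band, as you would get from a stable process.

```python
import random
from math import sqrt

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def simulate(low, high, n, rng):
    """Draw x uniformly on [low, high]; y follows the same model in both cases."""
    x = [rng.uniform(low, high) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]  # identical slope and noise
    return x, y

rng = random.Random(0)  # fixed seed so the run is repeatable

x_wide, y_wide = simulate(0.0, 10.0, 500, rng)     # deliberately varied input
x_narrow, y_narrow = simulate(4.5, 5.5, 500, rng)  # stable process, little spread

r2_wide = r_squared(x_wide, y_wide)
r2_narrow = r_squared(x_narrow, y_narrow)
print(f"wide spread:   R^2 = {r2_wide:.3f}")
print(f"narrow spread: R^2 = {r2_narrow:.3f}")
```

The relationship is identical in both runs; only the range of x differs. A low R-squared from stable production data therefore does not by itself rule out a real effect.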

#### Statistical Steven

Just a point of clarification, and a real bone of contention for me. You cannot PROVE that A causes B. You can only show there is a relationship between A and B. Through experience and subject matter expertise, you can use the relationship to assign the cause. Another point to consider is that R-squared is not proof of a good fit! See Anscombe's Quartet (https://en.wikipedia.org/wiki/Anscombe's_quartet)
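To make the Anscombe point concrete, here is a small sketch that computes R-squared for the four published Anscombe data sets. The helper name `r_squared` is just for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
results = {name: r_squared(x, y) for name, (x, y) in quartet.items()}

for name, r2 in results.items():
    print(f"set {name}: R^2 = {r2:.3f}")
```

All four sets give essentially the same R-squared (about two thirds), yet only set I looks like a noisy straight line when plotted. Set II is curved, set III is a clean line with one outlier, and set IV gets its entire fit from a single point.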

#### Miner

Good point about R^2. As Ellis Ott said: "Plot the data!"

I disagree about the comment on proving a relationship. What you stated is correct for observational data. However, experimental data that has been replicated can essentially prove a relationship. For example, I only have to flip the light switch on and off a few times to essentially prove that it causes the light to turn on and off.

#### Statistical Steven

You are correct that when the light goes on every time you flip the switch, that proves there is a cause and effect relationship. But when you have observational data (as is the case with regression) with MSE, you cannot prove a cause and effect relationship.

#### susanyap123

Hi, I agree that with a scatter plot we can only see that DIM A & DIM B have a relationship, which may not be causal. My company's BB also considered that the operators may be one of the KPIVs, so he used some hypothesis testing. But I am not sure whether this method is correct. Would a 2-sample t-test be better?

The samples were collected from 2 operators (20 samples each).
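For reference, a two-sample comparison of the operators could be sketched as below. This uses Welch's version of the t-test (no equal-variance assumption), which is a choice made for this example and not necessarily what the BB ran. The operator readings are invented placeholders, shortened to 5 values each for brevity.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder Dimension-B readings per operator (NOT the poster's data).
operator_1 = [5.02, 5.04, 5.01, 5.05, 5.03]
operator_2 = [5.03, 5.02, 5.05, 5.04, 5.02]

t_stat, dof = welch_t(operator_1, operator_2)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Turning `t` and `df` into a p-value needs the t distribution. In Python, `scipy.stats.ttest_ind(a, b, equal_var=False)` does the whole calculation. A t statistic close to zero means the operator averages are not distinguishable from the scatter within each operator.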

#### Bev D

First - let's be clear. If there is any relationship between the performance of A and B, it is tenuous at best. LOOK at the scatter diagram: the 'best fit' line is almost flat. The R-squared value is also clear on this. B may be built on A, but the variation in B is not controlled by A, at least in the way you have measured things.

IF we had a better idea of the physical relationship between A and B and the process by which B is created we might be able to suggest a more definitive analysis. Parallelism of A related to 4 individual dimensions of B could be a very poor way of assessing the relationship.

Secondly, the 'hypothesis' test of the operators: it is clear from a LOOK at the data, even with box plots, that there is NO relationship (either statistical or physical) between the operators and the results in A or B (again, given the obvious limitations of trying to correlate parallelism of A to 4 values of B). No amount of statistical test hunting will overcome this...

All of this doesn't mean that A isn't causing B. ALL statistical analysis must match the physical reality of the system. Statistics doesn't obviate physics.

If you really want to understand the relationship between A and B, we need to understand what A and B are and how they are created... Without it you are just yanking on the statistical slot machine arm...

