# My Bivariate Ppk Error Ellipse Graphing Tool

#### JaxonH

##### Starting to get Involved
Ok. Let me start off by saying this is not something you'll read about in a textbook, nor is it standard practice.

However.

I believe the way we think about 2D coordinate data and positional capability is inadequate. Here is a good example of why that is:

So how can we assess positional capability while accounting for its 2-dimensional nature? Many methods have been proposed, but I have found the error ellipse to be the most informative approach, though it does have its drawbacks.

Drawback #1: "Bivariate Ppk" is a term I came up with to describe Ppk of a 2-dimensional nature. Customers won't know what this is. So this tool is really for internal analysis only (though I have found customers love seeing the ellipse graphed against the positional tolerance).

Drawback #2: If the centroid of the distribution lies outside the tolerance, the bivariate Ppk cannot be calculated. In 1 dimension you can only go left or right, so a mean that lies beyond the spec limit has only one direction to go. But in 2 dimensions there are infinite directions, and since Ppk technically reports the "worst case", the worst case cannot be calculated. The best case can be calculated, in the direction toward nominal, but that is not how Ppk works. However, if your centroid lies outside the tolerance, I don't think you need a bivariate Ppk to tell you that you have problems.

So allow me to explain where this all came from, and how I calculated the error ellipse mathematically:

Because the data is not time-ordered, Cpk and Cp go out the window. It is what it is. I experimented with a Cpk Ellipse using sigmas calculated from the moving range, but ultimately it didn't add much value. If your process is in control, the Cpk Error Ellipse is virtually identical to the Ppk Error Ellipse anyway. And as we all know, capability analysis is useless with an out-of-control process. So there ya go.

Here is what the Error Ellipse looks like. Click the graph to auto-resize. You must enter the nominal (x,y) and the tolerance for the TP, then paste the data underneath. If the callout uses a material condition modifier, you must also enter the USL and LSL for the diameter and paste that data underneath. Because the expanded tolerance would be different for every part, and that would make the graph too messy, I use the smaller of "mean minus 3 standard deviations" and the "minimum data point" for the expanded tolerance calculation, as a safety net.
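As a rough illustration of that safety net, here is a minimal Python sketch. It assumes a hole toleranced at MMC, where MMC is the diameter LSL and the bonus is how far the diameter sits above it; the function name and example numbers are hypothetical, and the spreadsheet itself may handle this differently (for example, for a pin at MMC).

```python
import statistics


def expanded_tp_tolerance(tp_tol, diameters, lsl):
    """Hypothetical helper: a single expanded TP tolerance for the whole graph.

    Assumes a hole at MMC (MMC = diameter LSL), so the bonus is how far the
    diameter sits above the LSL. The diameter used is the smaller of
    (mean - 3 * sample std dev) and the smallest measured diameter.
    """
    mean = statistics.mean(diameters)
    sd = statistics.stdev(diameters)  # sample standard deviation (n - 1)
    safety_dia = min(mean - 3 * sd, min(diameters))
    bonus = max(0.0, safety_dia - lsl)  # no negative bonus below MMC
    return tp_tol + bonus


print(expanded_tp_tolerance(0.25, [10.05, 10.06, 10.04, 10.05], lsl=10.0))
```

Using the lower of the two values means a few unusually small holes can only shrink the bonus, never inflate it.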

The 3 sigma ellipse is based on the square of the Mahalanobis distance, which follows a chi-square distribution (2 degrees of freedom for bivariate normal data), and is set to cover exactly 99% of the process.
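A small sketch of that chi-square relationship, assuming bivariate normal data: the squared Mahalanobis distance then has a chi-square distribution with 2 degrees of freedom, whose CDF has the closed form 1 - exp(-x/2), so no statistics library is needed. It shows that an ellipse at Mahalanobis distance exactly 3 contains about 98.9% of the process, while covering exactly 99% takes a radius of about 3.035. The function names are hypothetical.

```python
import math


def coverage_2d(k):
    """Fraction of a bivariate normal inside the Mahalanobis-distance-k ellipse.

    The squared Mahalanobis distance is chi-square with 2 degrees of freedom,
    whose CDF is 1 - exp(-x / 2).
    """
    return 1.0 - math.exp(-k * k / 2.0)


def k_for_coverage(p):
    """Mahalanobis radius whose ellipse contains fraction p of the process."""
    return math.sqrt(-2.0 * math.log(1.0 - p))


print(round(coverage_2d(3.0), 4))      # 0.9889
print(round(k_for_coverage(0.99), 4))  # 3.0349
```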

I've used this for many years now, and was always reluctant to share it because it's a novel approach, and I feared criticism from statistical gurus who would pick it apart. But, it's been so useful I just can't keep it to myself any longer.

#### Attachments

• Error Ellipse.xlsm
• Error Ellipse Example.png

#### Miner

##### Forum Moderator
Actually, you have raised many excellent points, and there are many problems with applying capability indices to GD&T callouts, particularly for position. This has come up before, and another Cover has done a lot of work on this topic. I will try to locate it and link it to this thread.

#### Miner

##### Forum Moderator
@JaxonH I could not locate the threads that I mentioned, but I reached out to the poster on LinkedIn and asked him to reply. I have not talked with him since pre-pandemic, so I hope that he will respond. He did some excellent work along similar lines to your efforts.

#### JaxonH

##### Starting to get Involved

I think I know the thread you're referring to. I remember seeing the dual histograms plotted for the position and diameters, if it's the same one I'm thinking of. It was an interesting approach, and actually inspired me to create the error ellipse in the first place (well, credit should also be given to a certain engineer from GKN who was first to bring this issue to my attention, though he couldn't figure out a way to work out the math).

#### Miner

##### Forum Moderator
> (well, credit should also be given to a certain engineer from GKN who was first to bring this issue to my attention, though he couldn't figure out a way to work out the math).
That is probably the same person (definitely the same company), and he had figured out the math when I last discussed it with him.

#### MOester

##### Starting to get Involved

Hi Jaxon. I'm the guy from GKN and I HAVE sorted out how to do this. What do you need to know?

#### JaxonH

##### Starting to get Involved

Hey Mark, how's it going? I figured you would by now, as that was many years ago when we spoke about this. At the time, I remember you had suggested regression analysis could work to find the major axis of the ellipse, but hadn't yet developed a spreadsheet to do the calculations.

I tried using that approach, and did use it for a few years, but ultimately ran into some problems with it. It worked sometimes, but other times not as well. So I tried something different: using a covariance matrix to derive the eigenvalues and eigenvectors that define the major and minor axes of the ellipse. This has actually worked better for me than the regression. You can get an ellipse from it based on the Mahalanobis distance, which is basically a distance measured in standard deviations in every direction, scaled to the variation in that direction.
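For readers who want the covariance-matrix approach spelled out, here is a minimal sketch (not the spreadsheet's actual code). It builds the 2x2 sample covariance matrix and uses the closed-form eigenvalues of a symmetric 2x2 matrix to get the ellipse's semi-axes and the orientation of its major axis. The Mahalanobis radius `k` defaults to 3 as an assumption, and the function name is hypothetical.

```python
import math
import statistics


def error_ellipse(xs, ys, k=3.0):
    """Centroid, semi-axes and major-axis angle of the Mahalanobis-k ellipse.

    Uses the closed-form eigen-decomposition of the 2x2 sample covariance
    matrix [[var_x, cov_xy], [cov_xy, var_y]]. Semi-axis = k * sqrt(eigenvalue).
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    var_x = statistics.variance(xs)  # sample variance (n - 1)
    var_y = statistics.variance(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    half_trace = (var_x + var_y) / 2.0
    spread = math.hypot((var_x - var_y) / 2.0, cov_xy)
    lam_major, lam_minor = half_trace + spread, half_trace - spread
    # Angle of the major-axis eigenvector, in radians from the +x axis.
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    # max() guards against a tiny negative eigenvalue from rounding.
    return (mx, my), k * math.sqrt(lam_major), k * math.sqrt(max(lam_minor, 0.0)), angle
```

For perfectly correlated data such as xs = ys = [1, -1, 2, -2], the major axis comes out at 45 degrees and the minor semi-axis collapses to zero, which is the case where a 1D view along the diagonal would be enough.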

I am curious though what your approach is when the centroid lies outside the tolerance. I can find no solution to this aside from calculating in the direction toward nominal, which doesn't seem very useful.

#### MOester

##### Starting to get Involved
So first, using the major and minor axes of the ellipse and THEN finding the "closest" edge to the tolerance is challenging. Since the goal was something simple that others could replicate, that is a problem: THAT solution would require software or a macro-heavy spreadsheet.

Ages ago, I tried something I called the "principle standard deviation" method. In this, you basically take the square root of the sum of the squares of the standard deviations of the coordinates, then just use THAT as the standard deviation. Again, after running lots of simulations, I found that didn't work in every case, especially when the shape of the distribution was close to round/uniform. It tended to understate capability in those cases.

So I looked for a trigger point and actually found one. With this approach I found satisfactory results in all cases without all the crazy number crunching. It agreed to 2 decimal places with the strict approach in every case I tried. To me, that's good enough. (Honestly, how many decimal places do we believe when we calculate capability anyway?)

What I do, which is easy to code, is:

1) Get the x,y data for the true position (the TP data is not needed at all)
2) Calculate the mean and sigma of x and y
3) Examine the two standard deviations:
3a) If max_sigma/min_sigma > 3.5, calculate principle_sigma = SQRT(sigma_x^2+sigma_y^2) and use principle_sigma for sigma
3b) Otherwise, just use max_sigma for sigma

Cp = (true position tolerance) / (6 × sigma)

Cpk = (half the TP tolerance - distance from nominal to (x-bar, y-bar)) / (3 × sigma)

What I found was you only needed to really start worrying about how "squashed" the distribution was if it had a ratio of 3.5 or more.

(This is from memory; I will try to find the write-up I did. I don't know if I can post something with company branding on it, so it will take me some time to move it to a neutral presentation.)
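Here is a minimal Python sketch of the steps listed above, as I read them. The function name is hypothetical, `ratio_trigger` is the 3.5 threshold, nominal is taken as the tolerance center, and sample standard deviations are assumed since the post does not say which kind is used.

```python
import math
import statistics


def rule_of_thumb_cp_cpk(xs, ys, nominal, tp_tol, ratio_trigger=3.5):
    """Hypothetical helper for the ratio-trigger rule.

    sigma = sqrt(sx^2 + sy^2) if max(sx, sy) / min(sx, sy) > ratio_trigger,
    otherwise sigma = max(sx, sy). tp_tol is the diametral TP tolerance.
    """
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    big, small = max(sx, sy), min(sx, sy)
    # A zero minimum sigma counts as an infinitely "squashed" distribution.
    if small == 0 or big / small > ratio_trigger:
        sigma = math.sqrt(sx**2 + sy**2)  # "principle sigma"
    else:
        sigma = big
    offset = math.dist(nominal, (statistics.mean(xs), statistics.mean(ys)))
    cp = tp_tol / (6 * sigma)
    cpk = (tp_tol / 2 - offset) / (3 * sigma)
    return cp, cpk
```

Note that this needs no eigenvalues or covariance terms at all, which is the point of the approach: it fits in a couple of ordinary spreadsheet cells.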

#### JaxonH

##### Starting to get Involved
Simple is good. I do recall you mentioning that 3.5 ratio actually, and using principle standard deviation for cases where the eccentricity exceeded that ratio.

I did thoroughly investigate using regression analysis for the major axis of the ellipse but... it didn't work well in all cases, and was way off in others. That's when I started curiously poking around alternate approaches, however complex they may be. I found using the variance-covariance matrix to establish the eigenvectors and eigenvalues would always give a reliable major and minor axis. It just required a ton of math in the background.

The actual challenge I found was, as you mentioned previously, establishing the worst-case Ppk. Unlike normal 1D data, 2D data must be checked in all directions, 360 degrees around the centroid (x-bar, y-bar), and what you calculate is not the gap between the tolerance and the ellipse (a subtraction) but their ratio (a division). I'm sure one could find a nifty calculus trick, but my practical solution was to simply check 360 times in 1-degree increments, dividing the distance to the tolerance circle by the distance to the error ellipse boundary. The ellipse uses the Mahalanobis distance with a scalar to ensure the entire ellipse is exactly 3 standard deviations away. It took some algebra, making lines with slopes calculated from the 1-degree increments as it works its way around. (Technically I only had to do 180 calculations, then use the other intersection of each line to get the remaining 180.)
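A minimal sketch of that 1-degree sweep, assuming bivariate normal data, an already-estimated covariance matrix, and the "3 sigma" ellipse taken as Mahalanobis distance k = 3 (the post's exact scalar isn't shown). Instead of line slopes and the 180-line shortcut, it parameterizes each ray from the centroid by its angle and solves the ray-circle and ray-ellipse intersections directly. The function name is hypothetical.

```python
import math


def bivariate_ppk(centroid, cov, nominal, tp_tol, k=3.0, steps=360):
    """Worst-case ratio over `steps` directions around the centroid.

    For each unit direction u: distance from the centroid to the tolerance
    circle (radius tp_tol / 2, centred on nominal) divided by the distance
    from the centroid to the Mahalanobis-k ellipse boundary.
    Returns None if the centroid is not inside the tolerance zone.
    """
    (cx, cy), (nx, ny) = centroid, nominal
    (a, b), (_, c) = cov  # [[var_x, cov_xy], [cov_xy, var_y]]
    det = a * c - b * b
    inv_xx, inv_xy, inv_yy = c / det, -b / det, a / det  # 2x2 inverse
    r = tp_tol / 2.0
    dx, dy = cx - nx, cy - ny
    if math.hypot(dx, dy) >= r:
        return None  # centroid outside tolerance: worst case undefined
    worst = math.inf
    for i in range(steps):
        th = 2.0 * math.pi * i / steps
        ux, uy = math.cos(th), math.sin(th)
        # Solve |d + t*u| = r for t > 0 (ray from centroid to the circle).
        bdot = dx * ux + dy * uy
        t_tol = -bdot + math.sqrt(bdot * bdot - (dx * dx + dy * dy - r * r))
        # Solve t^2 * (u^T inv u) = k^2 for t > 0 (ray to the ellipse).
        q = inv_xx * ux * ux + 2.0 * inv_xy * ux * uy + inv_yy * uy * uy
        t_ell = k / math.sqrt(q)
        worst = min(worst, t_tol / t_ell)
    return worst
```

With an isotropic covariance (sigma = 0.01 in both axes), a centered centroid and a 0.2 TP tolerance, every direction gives 0.1 / 0.03, so the result is about 3.33, the same as the 1D formula (USL - mean) / 3 sigma would give.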

The downside to my approach is that if the centroid lies outside the tolerance, you can't calculate anything, unless you agree to find the "best case" in the direction pointing directly at nominal (the tolerance center).

I found the TP calculation (USL + AVERAGE(BT) - AVERAGE(TP)) / (3*SQRT(VAR.S(BT) + VAR.S(TP))) to be a remarkably reliable formula: it doesn't do anything fancy, and yet it always seems to come reasonably close to my bivariate Ppk.
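For comparison, here is that spreadsheet formula in Python, with the denominator parenthesized so that 3 multiplies the square root (in Excel, "/ 3*SQRT(...)" without parentheses would divide by 3 and then multiply by the square root). BT and TP are assumed to be paired lists of bonus-tolerance and true-position values, and `statistics.variance` (sample variance) stands in for VAR.S. Adding the two variances treats BT and TP as uncorrelated, which is an assumption built into the formula rather than something checked here. The function name is hypothetical.

```python
import math
import statistics


def tp_ppk(tp, bt, usl):
    """Python equivalent of
    (USL + AVERAGE(BT) - AVERAGE(TP)) / (3 * SQRT(VAR.S(BT) + VAR.S(TP))).
    """
    numerator = usl + statistics.mean(bt) - statistics.mean(tp)
    # statistics.variance is the sample (n - 1) variance, like Excel's VAR.S.
    spread = math.sqrt(statistics.variance(bt) + statistics.variance(tp))
    return numerator / (3 * spread)
```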

#### Semoi

##### Involved In Discussions
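Have you looked into the so-called Hotelling's T^2 statistic? It is a multivariate method which works for correlated response variables, if they are normally distributed. Here μ is the vector of the mean values and Σ is the covariance matrix. The idea is to calculate

$$T^2 = (\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}),$$

for which we need to invert the covariance matrix:

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}, \qquad \Sigma^{-1} = \frac{1}{\sigma_x^2\sigma_y^2 - \sigma_{xy}^2}\begin{pmatrix} \sigma_y^2 & -\sigma_{xy} \\ -\sigma_{xy} & \sigma_x^2 \end{pmatrix}$$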
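A minimal sketch of computing T^2 for every (X, Y) point, plugging in the sample mean vector and sample covariance matrix for μ and Σ and using the closed-form 2x2 inverse. The function name is hypothetical, and the control limits discussed below are not computed here.

```python
import statistics


def hotelling_t2(xs, ys):
    """T^2 = (x - mean)^T S^-1 (x - mean) for each (x, y) point, using the
    sample mean vector and the sample covariance matrix S of the same data.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(xs)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Closed-form inverse of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    return [
        ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
        for x, y in zip(xs, ys)
    ]
```

A handy sanity check: when the same data are used to estimate the mean and covariance, the T^2 values always sum to p(s - 1), i.e. 2(s - 1) for {X, Y}.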

If each measurement consists of p = 2 variables {X, Y} and we have a total of s such measurements (s = sample size), we can use the following two relationships:
1. The average value is given by

2. The 3-sigma (= 99.7%) limit is given by

I use these formulas in an SPC chart; that's why I called the average "CL" (for center line) and the 99.7th percentile "UCL" (for upper control limit).

Using Hotelling's T^2 statistic has one effect which might be a deal breaker: an "outlier" does NOT indicate a significant deviation from the target value, but a significant deviation from the model. This is shown in the following graphs:

The second point (=outlier) in the left graph corresponds to the red circle in the right graph -- the target values are (X,Y)=(0,0). If the two variables (X,Y) were independent, there is an alternative method using t-scores. However, I feel like the independence assumption is not valid in your case, which is why I skip the description.

When we have 2D coordinates (X,Y) and our specification limits are given by two radii (rMin, rMax), I don't see why we should not just calculate the radial distance to the target value for each data point and use it to calculate the Cpk value. Using your dataset "Test D10-5 RFS", I get the following:

Average Cpk = 1.05
Standard deviation of the Cpk = 0.284
The axes show the actual deviation from the target value, not the position.

I am fairly sure that you have tried this. Why does it not work?
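The radial-distance calculation suggested above can be sketched as follows (hypothetical function name). It applies the ordinary two-sided Cpk formula with the sample standard deviation to the radii; radial distances are non-negative and generally skewed, so treating them as normally distributed is an assumption of this approach.

```python
import math
import statistics


def radial_cpk(points, target, r_min, r_max):
    """Cpk of the radial distances |P - target| against limits [r_min, r_max].

    Treats the radii as one univariate sample and applies the usual
    two-sided formula: min(USL - mean, mean - LSL) / (3 * sample std dev).
    """
    radii = [math.dist(p, target) for p in points]
    mean, sd = statistics.mean(radii), statistics.stdev(radii)
    return min(r_max - mean, mean - r_min) / (3 * sd)
```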

#### Attachments

• Bildschirmfoto 2024-09-04 um 22.49.11.png