
Analysis of half-normal distribution in Minitab

01mercy

Involved In Discussions
#1
Hi all,

I have a very simple question. Is there a way to analyze half-normally distributed data in Minitab?
For example, a process that most of the time has no deviation (0) and only deviates from the target in the positive direction.

I would also like to get the distribution characteristics from the analysis, such as the variance and median given on the Wikipedia page below:
Half-normal distribution - Wikipedia
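Outside Minitab, the characteristics asked about can be computed directly once the scale is estimated. A minimal sketch, assuming the location is fixed at 0 (the target) so that the maximum-likelihood estimate of the scale σ is the root mean square of the deviations; the function name and the sample values are hypothetical. The formulas are the standard half-normal ones: mean σ√(2/π), variance σ²(1 − 2/π), median σ·Φ⁻¹(0.75) ≈ 0.6745σ, mode 0.

```python
import math
from statistics import NormalDist

def half_normal_summary(deviations):
    """Fit a half-normal with location fixed at 0 and return its characteristics."""
    if any(x < 0 for x in deviations):
        raise ValueError("half-normal data must be non-negative")
    n = len(deviations)
    # MLE of the scale: root mean square of the observations (location assumed 0)
    sigma = math.sqrt(sum(x * x for x in deviations) / n)
    return {
        "sigma": sigma,
        "mean": sigma * math.sqrt(2 / math.pi),
        "variance": sigma ** 2 * (1 - 2 / math.pi),
        # median m solves 2*Phi(m/sigma) - 1 = 0.5, i.e. Phi(m/sigma) = 0.75
        "median": sigma * NormalDist().inv_cdf(0.75),
        "mode": 0.0,
    }

# hypothetical deviations from target
summary = half_normal_summary([0.0, 0.0, 0.1, 0.3, 0.2, 0.0, 0.7, 0.4])
print(summary)
```

Note that the mode of a half-normal is always 0, which matters if the mode is later used as a "typical" value.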
 

Miner

Forum Moderator
Staff member
Admin
#2
There is no direct way to analyze a true half-normal distribution in Minitab. There may be indirect ways of which I am unaware. However, Minitab offers several distributions that handle positive, right-skewed data and may be a close match. What type of data do you have? Is this GD&T data such as position or flatness?
 

01mercy

Involved In Discussions
#3
Hello miner,

Thank you for your reply. I did learn today that there are other distributions I can apply, such as Weibull and lognormal.

My question is related to another post of mine on PPM calculations. Although I got an alternative there, I still need to challenge the model that is currently in use. In my view it doesn't take this distribution into account. I will explain and try to keep it simple.

We have x complaints per month.
We have x sales per month.

If there were no delay between when a product is sold and when the complaint comes in, we could simply calculate the fraction
(complaints / sales) per month.
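Under that zero-delay assumption the calculation is just a per-month ratio. A minimal sketch, expressed in PPM to match the other thread; the month keys and counts are hypothetical:

```python
def monthly_ppm(complaints, sales):
    """Complaints per million sold, per month, assuming complaints arrive with zero delay."""
    return {
        month: complaints.get(month, 0) / sold * 1_000_000
        for month, sold in sales.items()
        if sold > 0
    }

# hypothetical monthly figures
sales = {"2023-01": 1000, "2023-02": 1200}
complaints = {"2023-01": 3, "2023-02": 6}
ppm = monthly_ppm(complaints, sales)
print(ppm)
```

Once there is a delay, the complaints counted in a month no longer belong to the units sold in that month, which is the problem described next.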

However, there is a delay between the time of sale and the registration of the complaint.
For a subset of the complaints, the serial numbers are known.
From those, we know the delay between sale and complaint.

When I look at the distribution of the delay, it looks (I thought) like a half-normal distribution, but the other distributions fit better because a complaint is never raised with zero delay.

The idea is to use the right distribution and take the mode of the fitted model as a delay correction, applied to all complaints to put them in the "right" month of sales.
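The correction described above can be sketched as: compute the mode of the fitted distribution, then shift every complaint date back by that fixed delay. This assumes the fit gives a two-parameter Weibull (shape, scale) or a lognormal (log-location μ, log-scale σ) in days; the mode formulas are the standard ones, and the function names and parameter values are hypothetical, not Minitab output.

```python
import math
from datetime import date, timedelta

def weibull_mode(shape, scale):
    """Mode of a two-parameter Weibull; it sits at 0 when shape <= 1."""
    if shape <= 1:
        return 0.0
    return scale * ((shape - 1) / shape) ** (1 / shape)

def lognormal_mode(mu, sigma):
    """Mode of a lognormal with log-location mu and log-scale sigma."""
    return math.exp(mu - sigma ** 2)

def assign_sales_month(complaint_dates, delay_days):
    """Shift every complaint back by one fixed delay and count per (year, month) of sale."""
    counts = {}
    for complaint_date in complaint_dates:
        estimated_sale = complaint_date - timedelta(days=round(delay_days))
        key = (estimated_sale.year, estimated_sale.month)
        counts[key] = counts.get(key, 0) + 1
    return counts

# hypothetical fitted Weibull parameters (days)
delay = weibull_mode(shape=1.8, scale=120.0)
print(round(delay, 1), assign_sales_month([date(2023, 3, 15), date(2023, 5, 2)], delay))
```

Every complaint receives the same shift, so the spread of the delay distribution is ignored entirely, which is the concern raised below.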

However, this introduces the spread of the distribution. I learned today that the month of sale is a condition in this fraction calculation, so if I correct with the mode of the distribution, the calculated fraction will have spread introduced into it.

The final goal is to take that into account, draw the fitted line of the fraction model with the introduced spread, and prove that this spread is so large that the model makes no sense (my assumption is that it is).
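One way to show the effect mathematically is with expected values only, no random noise. If the true failure fraction p is constant and the delay in whole months follows a probability mass function q, the expected complaints in month j are C_j = p · Σ_k S_(j−k) · q_k (a convolution of sales with the delay). A mode-shift estimate for sales month i is C_(i+m) / S_i. The sketch below uses hypothetical sales, a hypothetical delay pmf with its mode at a lag of one month, and p = 1%:

```python
def expected_complaints(sales, delay_pmf, p):
    """Expected complaints per complaint month: p * sum_k sales[j - k] * delay_pmf[k]."""
    out = [0.0] * (len(sales) + len(delay_pmf) - 1)
    for i, sold in enumerate(sales):
        for k, q in enumerate(delay_pmf):
            out[i + k] += p * sold * q
    return out

def mode_shift_fraction(sales, complaints, mode_lag):
    """Fraction for sales month i using only complaints from month i + mode_lag."""
    return [
        complaints[i + mode_lag] / sold
        for i, sold in enumerate(sales)
        if i + mode_lag < len(complaints)
    ]

sales = [1000, 1000, 3000, 1000, 1000]   # hypothetical units sold per month
delay_pmf = [0.2, 0.5, 0.3]              # hypothetical delay of 0, 1, 2 months
complaints = expected_complaints(sales, delay_pmf, p=0.01)
fractions = mode_shift_fraction(sales, complaints, mode_lag=1)
print([round(f, 4) for f in fractions])
```

With the true fraction fixed at 0.01, the mode-shifted estimates range from roughly 0.0067 to 0.016, purely because sales volume changes from month to month. With perfectly constant sales the interior months would come out at exactly p, so the distortion is driven by the delay spread combined with varying volume.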
 

Miner

Forum Moderator
Staff member
Admin
#4
I understand the issue and have worked with it many times. The best way to handle this is with a reliability (events in time) approach. The behavior that you described is called arbitrary censoring, and Minitab can handle it easily. Read the Minitab Help link provided, then come back and ask your questions.
 

01mercy

Involved In Discussions
#5
Hi Miner, thank you for your input.

As I understand it, with this technique I can determine the distribution that fits this data.
I have read the part that explains data censoring.

I do have some questions on how to choose between right-, interval- or left-censored data analysis.
Data I have: a subset of products in the market -> failures in the market with the exactly known time from production to failure, in days.

According to the Minitab Help:
1) Choose right censoring when I have exact failure times
2) Choose interval censoring when I have interval times
3) Choose left censoring when the failure occurred between 0 and the inspection (the registration time of the failure)

For the product failures in the market I could say:
1) I know the exact times, in days, when the products failed
2) I know the exact interval, in days (0-1 d, 1-2 d, etc.), when the products failed
3) My data has a chance of failure starting at zero days

So I'm struggling with which method to use.
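The difference between the options is in what each record contributes to the likelihood. A minimal sketch with a Weibull model, using records of (start, end, frequency); the layout is my understanding of the start/end convention Minitab uses for arbitrary censoring (start = end for an exact failure, missing start for left censoring, missing end for right censoring), but treat it as an assumption and check the Help. The function names are hypothetical.

```python
import math

def weibull_cdf(t, shape, scale):
    """Probability of failure by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def weibull_pdf(t, shape, scale):
    """Failure density at time t (0 for t <= 0 in this sketch)."""
    if t <= 0:
        return 0.0
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def log_likelihood(records, shape, scale):
    """Sum of frequency-weighted log contributions for (start, end, freq) records."""
    ll = 0.0
    for start, end, freq in records:
        if start is not None and end is not None and start == end:
            contrib = weibull_pdf(start, shape, scale)           # exact failure time
        elif start is None:
            contrib = weibull_cdf(end, shape, scale)             # left censored: failed before end
        elif end is None:
            contrib = 1.0 - weibull_cdf(start, shape, scale)     # right censored: still working at start
        else:
            contrib = weibull_cdf(end, shape, scale) - weibull_cdf(start, shape, scale)  # interval
        if contrib <= 0:
            return float("-inf")
        ll += freq * math.log(contrib)
    return ll
```

Exact failure days fit the first case, daily buckets fit the interval case, and left censoring only applies when a failure is known to have happened before some inspection without knowing when. Units that have not failed yet enter as right-censored records.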
 

Miner

Forum Moderator
Staff member
Admin
#6
You have two options:
  • Enter your data in three columns (times, frequency and censoring), then analyze using right censoring
  • Enter your data in Nevada format, run Pre-Process Warranty Data, then analyze using arbitrary censoring
The first option will allow you to use the exact times, but can be very tedious since you may have many rows of data to enter. The second option tends to be easier because you can summarize your data into intervals such as one month.
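Both layouts can be sketched in a few lines. For the three-column option, exact times are grouped into (time, frequency, censoring flag); the "F"/"C" flag values are hypothetical, since Minitab lets you specify which value marks a censored row. For the Nevada option, the conversion below is my own approximation of what pre-processing does, not Minitab's exact algorithm: it assumes a return k periods after shipment failed in the age interval (k, k+1], and that units not yet returned are right censored at the age reached in the last observed period.

```python
from collections import Counter

def exact_times_table(failure_days, censored_days):
    """Three-column layout: (time, frequency, flag) with hypothetical flags 'F' and 'C'."""
    rows = [(t, f, "F") for t, f in sorted(Counter(failure_days).items())]
    rows += [(t, f, "C") for t, f in sorted(Counter(censored_days).items())]
    return rows

def nevada_to_records(shipped, returns):
    """Convert a Nevada table into (start, end, freq) records.

    shipped[i]    -- units shipped in period i
    returns[i][k] -- units shipped in period i returned k periods later
    ASSUMPTION: return at lag k means failure in age interval (k, k + 1].
    """
    last = len(shipped) - 1
    records = []
    for i, n in enumerate(shipped):
        row = returns[i]
        if len(row) > last - i + 1:
            raise ValueError("returns recorded beyond the last observed period")
        for k, r in enumerate(row):
            if r:
                records.append((k, k + 1, r))
        survivors = n - sum(row)
        if survivors > 0:
            # still working at the end of observation: right censored
            records.append((last - i + 1, None, survivors))
    return records

# hypothetical Nevada table: three shipment months
print(nevada_to_records([100, 120, 90], [[1, 2, 0], [0, 3], [2]]))
```

The survivor rows are what require the shipped totals: without them, every unit in the data set looks like a failure.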
 

01mercy

Involved In Discussions
#7
Thanks.
I will indeed take option 2, because I can pivot the data easily in Excel.

In my case, the total shipped (as in the example) would then be replaced by the total number of complaints with a known production-to-failure time. The horizontal/vertical month-by-month layout will stay the same.

I will give it a try.
 

Miner

Forum Moderator
Staff member
Admin
#8
In order to analyze this correctly, you need to include how many were shipped in total, not just the number of complaints.
 

01mercy

Involved In Discussions
#9
Well, that is actually the problem we have, and why I talked about a subset. It is actually a subset of a subset:
Sold devices -> devices that give a complaint -> devices that give a complaint and whose production date is known.

This is because we don't get the serial numbers back from the customer with every complaint.

This leaves me with three sets:

-> 1) Total produced devices, whose production date is known.
-> 2) Devices that give a complaint, for which only the complaint date is known, not the production date.
-> 3) Devices that give a complaint, for which both the complaint date and the production date are known.

What I wanted to do is look at the distribution of set 3 and apply it to set 2.
Then calculate the fraction of set 2 (corrected) over set 1, given the deviation that the distribution of set 3 brings into the calculation.
I understood from someone here that the fraction is calculated given a condition, namely the time period (year/month/week), and that this time-period condition introduces the deviation of set 3.

As already mentioned in my PPM post, with our data as described above we can't reliably calculate the fraction per time period, because we don't have the production date for all devices that give a complaint, so we don't know which production volume to relate them to.
But I still need to prove here, mathematically, that this is the case.

Would it make sense to scale set 1 down by the ratio 3 / (2 + 3), to get a volume that corresponds to set 3 and that I can use as the total number of devices produced?
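The proposed scaling is simple arithmetic: multiplying the produced volume by n3 / (n2 + n3) is the same as multiplying the set-3 complaint counts by (n2 + n3) / n3. A minimal sketch with hypothetical numbers; it only gives a sensible answer under the assumption that the complaints with a known production date are a representative sample of all complaints, independent of production month and delay.

```python
def scaled_fraction(produced, known_complaints, n_unknown):
    """Complaint fraction per production month after scaling set 1 by n3 / (n2 + n3).

    produced         -- month -> units produced (set 1)
    known_complaints -- month -> complaints with a known production month (set 3)
    n_unknown        -- number of complaints without a production date (set 2)
    ASSUMPTION: set 3 is a representative sample of all complaints.
    """
    n_known = sum(known_complaints.values())
    ratio = n_known / (n_known + n_unknown)
    return {
        month: known_complaints.get(month, 0) / (units * ratio)
        for month, units in produced.items()
        if units > 0
    }

# hypothetical: 10 complaints with a production date, 30 without
print(scaled_fraction({"Jan": 1000, "Feb": 2000}, {"Jan": 4, "Feb": 6}, n_unknown=30))
```

If the serial number is more likely to come back for some failure modes, customers or ages than for others, the ratio is biased and the per-month fractions inherit that bias.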
 

Bev D

Heretical Statistician
Staff member
Super Moderator
#10
Here’s a thought: instead of torturing the data - and yourself - with what can only amount to meaningless numbers, why don’t you investigate the parts that failed and fix the causes? I know that sounds somewhat radical, but really, isn’t it better to just go fix things? A simple count (and perhaps an understanding of how severe the failure is for the Customer) is sufficient in these cases to prioritize what to fix first.
 