FMEA Occurrence Ranking in Microelectronics
brutas 23rd August 2006, 05:17 AM In our company the following question is unclear:
PFMEA occurrence rating:
Probability of Failure | Failure Rate (ppm or %) | Ranking (O)
Very High: Persistent Failures | > 10% | 10
Very High: Persistent Failures | > 5% to 10% (1 per 10) | 9
High: Frequent Failures | > 2% to 5% (5 per 100) | 8
High: Frequent Failures | > 0.5% to 2% (2 per 100) | 7
Moderate: Occasional Failures | > 2,000 to 5,000 ppm (5 per 1,000) | 6
Moderate: Occasional Failures | > 1,000 to 2,000 ppm (2 per 1,000) | 5
Moderate: Occasional Failures | > 500 to 1,000 ppm (1 per 1,000) | 4
Low: Relatively Few Failures | > 100 to 500 ppm (5 per 10,000) | 3
Low: Relatively Few Failures | > 10 to 100 ppm (1 per 10,000) | 2
Remote: Failure is Unlikely | up to 10 ppm (1 per 100,000) | 1
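The table above can be read as a simple threshold lookup from a failure rate to a ranking. The following is a minimal Python sketch. It assumes each band includes its upper bound (for example, exactly 5,000 ppm maps to O=6) and that rates are given in ppm (1% = 10,000 ppm). The function name `occurrence_rank` is hypothetical and is not taken from any FMEA standard.

```python
# Upper bound (inclusive) of each band in ppm, paired with its occurrence ranking,
# transcribed from the company PFMEA table above. 1% = 10,000 ppm.
BANDS = [
    (10, 1),         # up to 10 ppm
    (100, 2),        # >10 to 100 ppm
    (500, 3),        # >100 to 500 ppm
    (1_000, 4),      # >500 to 1,000 ppm
    (2_000, 5),      # >1,000 to 2,000 ppm
    (5_000, 6),      # >2,000 to 5,000 ppm
    (20_000, 7),     # >0.5% to 2%
    (50_000, 8),     # >2% to 5%
    (100_000, 9),    # >5% to 10%
]


def occurrence_rank(ppm: float) -> int:
    """Return the occurrence ranking O for a failure rate given in ppm (hypothetical helper)."""
    if ppm < 0:
        raise ValueError("failure rate cannot be negative")
    for upper, rank in BANDS:
        if ppm <= upper:
            return rank
    return 10  # anything above 10%


print(occurrence_rank(166.7))   # 3
print(occurrence_rank(20_000))  # 7 (exactly 2%)
```

The lookup only answers the mechanical part. What the ppm figure should count (pieces or events) is the question the rest of the thread debates.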
The question is: what do these figures (5 per 1,000, 1 per 10,000...) describe: a single failed chip, or one event affecting a lot?
Our lots are typically around 300,000 pcs.
If in one lot we have, for example, 50 pcs that failed because of illegible marking, how should we count this: as 50 occurrences of the failure mode or as 1 single event?
The cause is, for example, a bad adjustment of the camera system by the operator (a single event), but the effect is 50 pcs with illegible marking.
How would you interpret this?
Thanks
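The two readings of the question can be made concrete with the arithmetic for the example lot. The lot size and defect count come from the post. The number of lots between camera misadjustments (`lots_per_event`) is a hypothetical figure used only for illustration.

```python
LOT_SIZE = 300_000        # pieces per lot (from the post)
DEFECTIVE_IN_LOT = 50     # pieces with illegible marking in one lot (from the post)

# Reading 1: every defective piece counts as an occurrence.
piece_ppm = DEFECTIVE_IN_LOT / LOT_SIZE * 1_000_000

# Reading 2: the cause (one camera misadjustment) counts as a single event.
# Hypothetical assumption: the misadjustment happens once in every 40 lots.
lots_per_event = 40
event_rate_per_lot = 1 / lots_per_event

print(f"piece-based rate: {piece_ppm:.1f} ppm")               # 166.7 ppm
print(f"cause-based rate: {event_rate_per_lot:.3f} per lot")  # 0.025 per lot
```

Under the table in this post, 166.7 ppm falls in the > 100 to 500 ppm band (O=3). The cause-based figure measures something different (events per lot, not defectives per piece), and that difference is exactly what the replies below argue over.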
Michael Walmsley 23rd August 2006, 08:59 AM Leave the "lot"s out of it!
You are looking for the cumulative number of failures over the product life!
Not per lot!
brutas 24th August 2006, 05:09 AM Any other opinions?
ChuckHughes 24th August 2006, 08:28 AM Interesting question!!!!
I take this approach: the frequency of occurrence is based on root causes, not on the frequency of the symptoms. For example, a faulty thermocouple in the wave solder machine may cause several boards not to solder properly. The first fault may cause the solder to be too cold, resulting in poor wetting. The second fault may cause the solder to be too hot, resulting in burned components. You may look at this as two separate problems, cold solder and burned boards, but it is one cause with two frequencies.
Another example: 1) cold solder joints caused by an operator adding more solder to the fountain and not allowing the temperature to reach the correct heat; 2) cold solder joints caused by poor cleaning/contamination in the wash section. This may be looked at as a frequency of "two" if the symptom is counted instead of the root causes.
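Chuck's distinction shows up when the same defect log is tallied two ways. The records below are invented for illustration and mirror his wave-solder examples. In them, the thermocouple fault appears twice under different symptoms, and cold solder has three records from three different causes.

```python
from collections import Counter

# Hypothetical defect log: (symptom, root cause) for each recorded problem.
defect_log = [
    ("cold solder", "faulty thermocouple"),          # thermocouple reads low
    ("burned components", "faulty thermocouple"),    # thermocouple reads high
    ("cold solder", "solder added before temperature recovered"),
    ("cold solder", "contamination in wash section"),
]

by_symptom = Counter(symptom for symptom, _ in defect_log)
by_cause = Counter(cause for _, cause in defect_log)

print(by_symptom.most_common(1))  # [('cold solder', 3)]
print(by_cause.most_common(1))    # [('faulty thermocouple', 2)]
```

Counted by symptom, cold solder looks like the one frequent problem. Counted by cause, it splits into three causes, and the only recurring cause is the thermocouple, whose two occurrences are hidden under different symptoms.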
This may not match any publication, but it saves money by not wasting time chasing one-time occurrences that are disguised as a frequent symptom.
tymer5 24th August 2006, 09:19 AM
It seems to me you are mixing apples and oranges. You are talking about people being causes, but you are counting defective pieces. I'm not sure how you are describing your function or failure mode, but based on your cause description I would expect it to go something like this:
Function: Camera can focus on piece
Failure Mode: Camera not focused properly
Effect: Illegible marking on piece
Manufacturing Severity: 4 (The product may have to be sorted with no scrap, and a portion (less than 100%) reworked.)
Cause: Operator did not follow work instructions
Occurrence: ?
In this case I would base the occurrence on how often operators at your facility do not follow the WI. Then I would try to place a control on the Cause, not on the Effect (catching illegible markings) as you have done above. Why? Because 1) you want to list controls that prevent or detect the causes before controls that detect the failure modes; this is what helps you prevent the failure modes from happening. 2) Assuming scrap is the worst severity here, you have already taken into account the number of pieces you have to sort and rework.
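The line item tymer5 describes can be sketched as a record where severity sits with the effect and occurrence is estimated from the cause. This Python data structure is hypothetical, not a format from the AIAG manual. The deviation and setup counts used to estimate the cause rate are invented.

```python
from dataclasses import dataclass


@dataclass
class Cause:
    description: str
    deviations: int      # times the cause was observed (hypothetical data)
    opportunities: int   # times it could have occurred, e.g. camera setups

    @property
    def rate(self) -> float:
        """Observed frequency of the cause itself, not of defective pieces."""
        return self.deviations / self.opportunities


@dataclass
class FmeaLine:
    function: str
    failure_mode: str
    effect: str
    severity: int        # rated on the effect
    cause: Cause         # occurrence is rated on this


line = FmeaLine(
    function="Camera can focus on piece",
    failure_mode="Camera not focused properly",
    effect="Illegible marking on piece",
    severity=4,
    cause=Cause("Operator did not follow work instructions",
                deviations=3, opportunities=600),
)
print(f"{line.cause.rate:.4f}")  # 0.0050
```

In this layout the 50 illegible pieces never enter the occurrence estimate. They appear in the effect and, through sorting and rework, in the severity, which is tymer5's second point.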
Regards,
Michael Walmsley 24th August 2006, 10:44 AM Page 21 of the AIAG FMEA manual states:
"Occurrence is the likelihood that a specific cause/mechanism will occur during the design life."
tac123 29th August 2006, 12:04 AM I have heard this question asked many times, with differing opinions. My answer would be: if the cause happened once and led to the failure mode, it is 1 occurrence, even though 50 parts were damaged.
Michael Walmsley 29th August 2006, 08:13 AM For 1 in 50 over the product life, it would be approximately O=8 per the AIAG manual, which is heavily used across many industries (ref. page 23).
Jim Wynne 29th August 2006, 09:36 AM Maybe I can clear up some of the confusion here. Michael is quoting from the AIAG manual with regard to design FMEA; the OP is asking about process FMEA. But while we're at it, the passage he quoted says,
"Occurrence is the likelihood that a specific cause/mechanism will occur during the design life."
But "design life" isn't defined; is it synonymous with product life? Another instance where the AIAG manual leaves us to guess about what they want.
But to get back to the original question, there should be no doubt that "occurrence" refers to causes and not to the number of defectives caused. This is yet another reason that I maintain that process failure modes should be identified as failures of the process and not manifestations of process failures in the product. The reasoning is simple: if the process doesn't fail, there will be no defective product (assuming that the process has been efficaciously designed). So it just makes sense, in doing the PFMEA, to consider the likelihood of process failure when determining the "occurrence" factor. Defective parts do not constitute failure modes (regardless of what the manual says); defective parts are the effects of process failures.
Michael Walmsley 29th August 2006, 09:56 AM Good point Jim!!!
brutas 4th October 2006, 04:06 AM Jim, thank you for your opinion. It is very reasonable.
I fully agree with you regarding the PFMEA, but I still want to discuss this topic for the DFMEA.
This is a quotation from SAE-J-1739:
Page 13, DFMEA Occurrence evaluation criteria:
"Likely failure rates over the design life:
0.5 per thousand vehicles/ items
0.1 per thousand vehicles/ items..."
So, this is what confuses me:
"Occurrence is the likelihood that a specific cause (I understand "root cause")/ mechanism will occur during the design life"
But the table gives the failure rate per number of vehicles!
How should we count/assess the Occurrence: as a root cause during the design (a single event), or as the number of failed vehicles in the field?
Or maybe "design life" means that I should count all failures that occurred during the life of the product (vehicle)?
Michael Walmsley 4th October 2006, 08:14 AM You are assigning occurrence to a specific cause.
This specific cause may happen at a given rate over the product life.
e.g. for every 1,000 vehicles produced that have run, or will run, through the life cycle, this cumulative rate may be 0.5 for that population (0.5/1,000).
Product life is left up to an agreement between the OEM and supplier.
It could be 1 year/12,000 miles or as high as 10 years/150,000 miles. This is generally agreed to in the RFQ for the project.
This would certainly affect the occurrence rate you place into your FMEA.
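Michael's point that the agreed product life changes the occurrence figure can be illustrated with a two-parameter Weibull life model. Its cumulative failure fraction at time t is F(t) = 1 - exp(-(t/η)^β). The shape β and scale η below are hypothetical values chosen for illustration, not data from any real product. The two lives are the endpoints Michael mentions.

```python
import math


def cumulative_per_thousand(t: float, beta: float, eta: float) -> float:
    """Expected failures per 1,000 items by time t under a two-parameter Weibull model."""
    return 1_000 * (1 - math.exp(-((t / eta) ** beta)))


BETA = 2.0          # hypothetical wear-out shape parameter
ETA = 1_000_000.0   # hypothetical characteristic life in miles

for design_life in (12_000, 150_000):
    rate = cumulative_per_thousand(design_life, BETA, ETA)
    print(f"{design_life:>7} miles: {rate:.2f} per thousand")
# 12000 miles: ~0.14 per thousand; 150000 miles: ~22.25 per thousand
```

Under these assumed parameters, the same cause goes from about 0.14 to about 22 failures per thousand depending only on the agreed life. That is a difference of two orders of magnitude, which puts it in very different occurrence bands.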
Now for the hard part.
If you do not have a good warranty reporting system, parts return program, or reliability test (Weibull database) process for your projects that has generated the information you need, then
Use your engineering judgement.
ChuckHughes 4th October 2006, 10:15 AM I agree with Mike: FMEAs are based on causes, not symptoms. The total focus is prioritizing the likelihood that a root cause will occur, not a symptom. Unfortunately, almost all of the reporting schemes I see during audits collect data on symptoms, not causes.
Poor solder joints may have many root causes, but unless your FMEA is able to determine the likelihood of low heat at the fountain, dirty components, dross in the melt, etc., your prioritizing efforts are wasted.
brutas 5th October 2006, 10:59 AM Sorry, but it is still unclear to me.
1) If, for example, an inappropriate material was chosen during the design of a certain device, this is a one-time occurrence of a particular root cause. Then 100,000 pcs are produced with this inappropriate material. It happens that in the field (during the design life) 100 vehicles fail because of this defect (root cause). How should this be interpreted? To me this is a one-time occurrence that leads to 100 failed devices. Please explain.
2) How should the occurrence rating be assessed for new devices (not yet in the field)?
3) We monitor ppm customer returns.
Don't forget we are talking about DFMEA
Thanks for your help!
ChuckHughes 5th October 2006, 04:58 PM If we stay with your example of "inappropriate material" at the design phase, then the root cause will be somewhere between the failure of the design review team to realize some material may be inappropriate such as lead solder in lead-free applications, the skill of the design engineers in picking the wrong material or in the design verification/validation effort for not discovering the material. Wrong material should result in 100% nonconforming parts.
As I said in earlier messages, the FMEA is focused at root causes. If we take the example you provide, the frequency of selecting inappropriate material in the design of parts is what should be counted, not the number of parts made from inappropriate material.
If 5 of the last 100 design projects resulted in a material change during prototyping or launch, the frequency is .05. The number of parts that were produced may be more a reflection of problems in detection of the inappropriate material before prototyping begins.
brutas 6th October 2006, 04:51 AM Why does the manual count the number of failed vehicles? How can this be linked to customer complaints/returns?
Quotation from SAE-J-1739:
Page 13, DFMEA Occurrence evaluation criteria:
"Likely failure rates over the design life:
0.5 per thousand vehicles/ items
0.1 per thousand vehicles/ items..."
I understand that the Occurrence should be focused on the root cause, but why do they count failed vehicles (which are in fact the effect of the failure)?!
brutas 9th October 2006, 05:53 AM Can somebody explain the meaning of:
"Likely failure rates over the design life:
... per thousand vehicles/ items
... per thousand vehicles/ items..."
Please help!
tymer5 12th October 2006, 02:27 PM
Brutas,
I think I have found the issue. Go with the AIAG FMEA Third Edition. It is supposed to be identical to SAE-J-1739, but apparently it is not. Forget the "Likely Failure Rates Over Design Life" and go with "Possible Failure Rates" (pg. 23) as described in the AIAG manual.
Pg 21 of the AIAG manual states " Occurrence is the likelihood that a specific cause/mechanism will occur during the design life. The likelihood of occurrence ranking number has a relative meaning rather than an absolute value. Preventing or controlling the causes/mechanisms of the failure mode through a design change or design process change (eg. design checklist, design review, design guide) is the only way a reduction in the occurrence ranking can be effected."
Note the wording in AIAG is "Occurrence is the likelihood that a specific cause/mechanism will occur during the design life."
This is actually the same wording as in the SAE-J-1739 pg 12 section 3.2.15.
So, taking your wrong-material example: you could actually change the design process, not just the design, to reduce how often the wrong material is selected. Your current occurrence ranking for the wrong material being selected is basically the percentage of times you select an incorrect material (number of material changes / number of original materials selected); now convert this into an occurrence ranking. This may be related to the failures in the field, but not necessarily. For instance, the material selected may be better than what is needed, which is probably more related to product cost than to failures. Nevertheless, it is still the same cause (the wrong material was selected). This is one more example of why you want to relate your Effects to your Failure Modes and your Causes to your Failure Modes, NOT your Causes to your Effects.
I hope this helps. Best of luck to you.
brutas 13th October 2006, 04:29 AM This is interesting but unclear to me:
You are assigning occurrence to a specific cause.
This specific cause may happen at a given rate over the product life.
Why do you assume that the cause may happen at a given rate over the product life?
As I understand it, when designing a certain product, the failure mode occurs during the design phase; it is built into the design from the start. What fails later is only an effect of this initial root cause. Isn't that so?
Michael Walmsley 13th October 2006, 08:19 AM In your DV/PV P&R testing, life tests are generally performed (over the product life, or accelerated to the product life). Each design phase seeks to eliminate the "weak" points of its predecessor.
The goal of any dynamic testing is to understand and maximize the mean time to failure of the product, thus increasing reliability during the testing process and, consequently, product life.
The minimum mean time to failure for a given mode and cause is the weakest link in the chain.
It has a pattern or distribution (e.g. a probability distribution such as lognormal, Weibull, normal, extreme value, ...).
When we reach the point where we have optimized customer satisfaction by maximizing the mean time to failure for all the modes and potential causes we have tested against, then we are ready to release product.
Given the pattern or distribution associated with this "optimized" mean time to failure for a specific mode / cause, it will display a rate of failure over the product life. We take that rate and correlate it to the occurrence tables.
If little or no dynamic testing is done, then the rates of failure are anyone's best guess!
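Michael describes taking the failure rate from life testing and correlating it to the occurrence tables. One common way to get that rate is to fit a Weibull distribution to the test failures by median-rank regression, using Bernard's approximation F_i ≈ (i - 0.3)/(n + 0.4). The fitted distribution is then evaluated at the agreed life. This sketch assumes a complete test in which every unit ran to failure. The failure times and the 300-hour test equivalent of product life are invented. Tests with suspended units or an acceleration model need more than this.

```python
import math


def fit_weibull_mrr(failure_times):
    """Fit Weibull shape (beta) and scale (eta) by median-rank regression of y on x.

    Assumes a complete sample: every unit on test ran to failure.
    """
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        median_rank = (i - 0.3) / (n + 0.4)           # Bernard's approximation
        xs.append(math.log(t))                        # x = ln t
        ys.append(math.log(-math.log(1 - median_rank)))  # y = ln(-ln(1 - F))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                                  # slope of the Weibull plot
    intercept = y_mean - beta * x_mean                # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical failure times (hours on an accelerated test) for 8 units.
hours = [410, 620, 750, 880, 990, 1120, 1300, 1550]
beta, eta = fit_weibull_mrr(hours)

design_life_hours = 300  # hypothetical test equivalent of the agreed product life
fraction = 1 - math.exp(-((design_life_hours / eta) ** beta))
print(f"beta={beta:.2f}, eta={eta:.0f} h, {fraction * 1_000:.2f} failures per thousand")
```

The last figure, failures per thousand over the agreed life, is the kind of number that can then be compared with the rates in an occurrence table.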
brutas 13th October 2006, 09:03 AM
Thank you Michael.
Can you give me an example to illustrate how you obtain the occurrence?
A real example from your practice.
Jim Wynne 13th October 2006, 10:29 AM "The goal of any dynamic testing is to understand and maximize the mean time to failure of the product, thus increasing reliability during the testing process and, consequently, product life."
Not always, and in some cases, not ever. Testing is used to characterize the reliability of a product--that is to say, to provide empirically-derived knowledge of it. What one does with that knowledge is another matter.
The knowledge is often used in determining when to stop making improvements. The question often becomes, "At what point will the market accept failure?"
"When we reach the point where we have optimized customer satisfaction by maximizing the mean time to failure for all the modes and potential causes we have tested against, then we are ready to release product."
Well, sometimes customer satisfaction is optimized not by making a thing last as long as conceivably possible, but by determining the point at which failure no longer results in customer dissatisfaction. It's not an easy nut to crack in most cases. For example, I have a TV that's been operating flawlessly for more than 10 years, so I'm very happy with it. If it fizzles out tomorrow, I'll have gotten my money's worth, but where was the point of demarcation when it might have failed that would not have resulted in me being disgruntled? Three years? Seven? Once that point is understood, testing may be used to exploit it.
Michael Walmsley 13th October 2006, 12:09 PM Brutas,
Attached is the example you requested.
Jim,
Point 1
I should have stated that the "acceptance" criteria are generally provided to us as a goal at life. This defines the maximum rate of failure allowed in warranty over the product life.
Rarely does one meet the goal out of the gate.
Yes, we do have to characterize the reliability, but as years of experience have taught me and others, we end up cutting to the chase and aiming to optimize, or attempting to exceed, their goal. With competition fierce and costs similar, this is what often tips the scale in our favor. Back in the old days, people were happy with vehicle drivetrain performance through 10 years/100K miles.
As driving habits changed, this goal was increased to 150K miles. The latest Ford commercials are touting 250K miles as what defines a vehicle as Ford tough.
I will not debate with you what one does with one's data. I have no control over this.
The basic theory and practice that I have followed is to use the output of the DVP&R on reliability and performance as an input to the FMEA (sev, occ, ...), and to use the output of the FMEA as an input to the DVP&R (current controls, ...).
In point 2, I agree.
Again you must consider the environment.
Whether we are dealing with safety or convenience systems is a crucial factor.
Jim Wynne 13th October 2006, 12:36 PM
I understand, and wasn't really disagreeing, just trying to add a little context. "Failure" in the context of product life is not a negative concept. Just about everything will fail sooner or later. Control of failure is the key, and when you think about it that way, "failure" is equal to "lack of success." Determining what success should look like (not failure, per se) is the first step in a good design.
Michael Walmsley 13th October 2006, 12:41 PM I agree bro!