Minitab error: Zero or negative degrees of freedom


Pergula

Hello all!


When doing a balanced ANOVA I get the error:
* ERROR * Zero or negative degrees of freedom.


Here is what I am doing:


I am new to the Minitab software but familiar with the basics of statistics. I know what a balanced ANOVA is, what a model is, and how it should be written.
I have worked through some tutorials on using Minitab.
OK, so much for my background.


I am using sample data from SEMI E89 RI 4 and am trying to reproduce in Minitab the analysis described there. Having failed with the prebuilt analysis options in Minitab (the part-to-part variance is killing the results!?), I went back to the old-fashioned way of running a balanced ANOVA and entering all the factors and terms myself.


The proposed model is
Thickness(hkji) = mu + day(h) + wafer(k) + wd(hk) + cycle(hkj) + repeat(hkji) + e(hkji)
mu: grand mean
day(h): effect of day h
wafer(k): effect of wafer k (i.e., its thickness level)
wd(hk): wafer-by-day interaction effect
cycle(hkj): effect of load cycle j, nested in day and wafer
repeat(hkji): effect of repeat i, nested in cycle, wafer, and day
e(hkji): residual effect


So this is a crossed and nested model. I use the dialog:
Stat > ANOVA > Balanced ANOVA


These are the settings I used:
Response: Thickness
Model: Wafer Day Wafer*Day Cycle(Day Wafer) Repeat(Wafer Day Cycle)
Random factors: Wafer Cycle Repeat
Options: use restricted form of model - no


The data is balanced and arranged as required (it worked for other tests with the prebuilt MSA analyses).


OK, having done all this, I get the following error message:
* ERROR * Zero or negative degrees of freedom.


The only way it seems to work is if I drop the Repeat factor from the model completely. :( However, I do need to estimate this component somehow.


Unfortunately, I have found nothing on the web, in the help files, or on this forum so far. If anyone can point me in the right direction, I would be more than grateful!


Thanks a lot!
-Jean-
 

Miner

Forum Moderator
Leader
Admin
The usual cause of zero degrees of freedom is a saturated model. That is, you are trying to extract more information than there are degrees of freedom available. Without seeing the example and your project file, it would be difficult to answer your question.
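Miner's point can be checked with a little arithmetic. The following is a minimal Python sketch (a hypothetical helper, not a Minitab feature) that tallies degrees of freedom for the model as entered, assuming the layout Jean describes later in the thread: 5 wafers, 8 days, 2 load cycles per wafer and day, and 2 repeats per cycle, i.e. 160 observations.

```python
# Degrees-of-freedom bookkeeping for the balanced crossed/nested design.
# Level counts are assumptions taken from the thread's description:
# 5 wafers, 8 days, 2 load cycles per wafer-day, 2 repeats per cycle.
N_WAFER, N_DAY, N_CYCLE, N_REPEAT = 5, 8, 2, 2
N_OBS = N_WAFER * N_DAY * N_CYCLE * N_REPEAT  # 160 observations


def df_table_nested_repeat():
    """DF for: Wafer Day Wafer*Day Cycle(Wafer Day) Repeat(Wafer Day Cycle)."""
    terms = {
        "Wafer": N_WAFER - 1,
        "Day": N_DAY - 1,
        "Wafer*Day": (N_WAFER - 1) * (N_DAY - 1),
        # (N_CYCLE - 1) contrasts inside every wafer-day cell
        "Cycle(Wafer Day)": N_WAFER * N_DAY * (N_CYCLE - 1),
        # (N_REPEAT - 1) contrasts inside every wafer-day-cycle cell
        "Repeat(Wafer Day Cycle)": N_WAFER * N_DAY * N_CYCLE * (N_REPEAT - 1),
    }
    total = N_OBS - 1
    terms["Error"] = total - sum(terms.values())  # whatever is left over
    terms["Total"] = total
    return terms


for name, df in df_table_nested_repeat().items():
    print(f"{name:<25}{df:>4}")
```

Under these assumptions the five model terms use all 159 degrees of freedom (4 + 7 + 28 + 40 + 80), leaving 0 for Error. With one observation per repeat within each cycle, Repeat(Wafer Day Cycle) and the residual describe the same variation, so they cannot both be estimated.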
 

ernestphoon

Dear Jean, I suspect that the problem lies in the "balance" of the data.

Please refer to the Minitab help which gives the following:

You can use balanced ANOVA to analyze data from balanced designs. See Balanced designs. You can use GLM to analyze data from any balanced design, though you cannot choose to fit the restricted case of the mixed model, which only balanced ANOVA can fit.

Your design must be balanced to use balanced ANOVA, with the exception of a one-way design. A balanced design is one with equal numbers of observations at each combination of your treatment levels. A quick test to see whether or not you have a balanced design is to use Stat > Tables > Cross Tabulation and Chi-Square. Enter your classification variables and see if you have equal numbers of observations in each cell, indicating balanced data.
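The Cross Tabulation check described in the help text amounts to counting observations per cell. A minimal Python equivalent is sketched below; the column names and the 5 x 8 x 2 x 2 layout are assumptions for illustration. Passing this check does not by itself rule out a degrees-of-freedom error, since balance and model saturation are separate issues.

```python
from collections import Counter
from itertools import product


def is_balanced(rows, factors):
    """Mimic a cross-tabulation balance check.

    rows    -- list of dicts, one per observation (column names are assumed)
    factors -- list of column names to cross-tabulate
    Returns (balanced, counts), where counts maps each cell to its size.
    """
    counts = Counter(tuple(row[f] for f in factors) for row in rows)
    levels = [sorted({row[f] for row in rows}) for f in factors]
    # Include empty cells: a combination that never occurs breaks balance.
    cell_sizes = [counts.get(cell, 0) for cell in product(*levels)]
    balanced = bool(cell_sizes) and min(cell_sizes) == max(cell_sizes) > 0
    return balanced, counts


# Hypothetical layout: 5 wafers x 8 days x 2 cycles x 2 repeats = 160 rows.
rows = [
    {"Wafer": w, "Day": d, "Cycle": c, "Repeat": r}
    for w, d, c, r in product(range(1, 6), range(1, 9), range(1, 3), range(1, 3))
]
print(is_balanced(rows, ["Wafer", "Day"])[0])      # True: 4 rows per cell
print(is_balanced(rows[1:], ["Wafer", "Day"])[0])  # False: one cell has only 3
```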

Hope that helps,
Ernest
 

Pergula

Thanks for the reply. I checked the balance and all the cell counts are equal, so I guess that is fine.

I attached the project file for reference. I cannot attach the original example, as it is from the official SEMI E89 standard.

Any ideas?
-Jean-


PS: The forum interface didn't let me attach the Minitab project file, so I zipped it. Sorry about that; I personally don't like zips much.
 

Attachments

  • Example_E89.zip
    5.1 KB

ernestphoon

Dear Jean, try dropping Cycle from the nesting of Repeat, i.e. Wafer Day Wafer*Day Cycle(Wafer Day) Repeat(Wafer Day)

:) therein lies the balance
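Why this model runs can be read off the degrees of freedom. The sketch below assumes the same 5 x 8 x 2 x 2 layout as the rest of the thread; the function names are hypothetical. In this model, Repeat(Wafer Day) gives repeat 1 and repeat 2 the same two labels in both cycles of a wafer-day cell, so Repeat is effectively treated as crossed with Cycle, and the Cycle x Repeat interaction within each cell is what ends up as "Error".

```python
# Level counts assumed from the thread: 5 wafers, 8 days,
# 2 load cycles per wafer-day, 2 repeats per cycle.
N_WAFER, N_DAY, N_CYCLE, N_REPEAT = 5, 8, 2, 2
TOTAL_DF = N_WAFER * N_DAY * N_CYCLE * N_REPEAT - 1  # 159


def common_terms():
    """Terms shared by both models."""
    return {
        "Wafer": N_WAFER - 1,
        "Day": N_DAY - 1,
        "Wafer*Day": (N_WAFER - 1) * (N_DAY - 1),
        "Cycle(Wafer Day)": N_WAFER * N_DAY * (N_CYCLE - 1),
    }


def crossed_repeat_model():
    """Repeat(Wafer Day): repeat labels shared across cycles, so the
    Cycle x Repeat contrasts within each wafer-day are left as Error."""
    terms = common_terms()
    terms["Repeat(Wafer Day)"] = N_WAFER * N_DAY * (N_REPEAT - 1)
    terms["Error"] = TOTAL_DF - sum(terms.values())
    return terms


def nested_repeat_model():
    """Repeat nested in Cycle: uses up every remaining degree of freedom."""
    terms = common_terms()
    terms["Repeat(Wafer Day Cycle)"] = N_WAFER * N_DAY * N_CYCLE * (N_REPEAT - 1)
    terms["Error"] = TOTAL_DF - sum(terms.values())
    return terms


crossed = crossed_repeat_model()
nested = nested_repeat_model()
print(crossed["Repeat(Wafer Day)"], crossed["Error"])      # 40 40
print(nested["Repeat(Wafer Day Cycle)"], nested["Error"])  # 80 0
```

The 40 + 40 degrees of freedom of Repeat(Wafer Day) and Error in the crossed model add up to the 80 that Repeat(Wafer Day Cycle) would use alone, which matches the Repeat and Error rows of the Minitab output posted in the thread. In other words, the crossed model only runs because it splits the within-cycle variation into two pieces.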
 

Pergula

Thanks ernestphoon!
This seems to solve the DoF problem... but why? :)

Repeat is nested in Cycle (in the experiment, 2 repeats are performed for each loading cycle). So why shouldn't it be described as such in the model?

Here are the expected variance components for this data set, next to those calculated by Minitab:
Code:
  Component       Expected      Minitab
  Day             12.1348       - (nothing?)
  Cycle           0.1256        1
  Repeat          0             0
  Wafer*Day       462.9869      461
  Residual        0.8446        45

The Minitab output is:
Code:
Analysis of Variance for Thickness

Source              DF          SS         MS          F      P
Wafer                4  3800399981  950099995  502544.64  0.000
Day                  7       15135       2162       1.14  0.365
Wafer*Day           28       52936       1891      40.33  0.000 x
Cycle(Wafer Day)    40        1844         46       1.02  0.471
Repeat(Wafer Day)   40        1833         46       1.02  0.479
Error               40        1802         45
Total              159  3800473530

x Not an exact F-test.


S = 6.71103   R-Sq = 100.00%   R-Sq(adj) = 100.00%

                       Variance  Error  Expected Mean Square for Each Term
   Source             component   term  (using unrestricted model)
1  Wafer               29690566      3  (6) + 2 (5) + 2 (4) + 4 (3) + 32 (1)
2  Day                               3  (6) + 2 (5) + 2 (4) + 4 (3) + Q[2]
3  Wafer*Day                461      *  (6) + 2 (5) + 2 (4) + 4 (3)
4  Cycle(Wafer Day)           1      6  (6) + 2 (4)
5  Repeat(Wafer Day)          0      6  (6) + 2 (5)
6  Error                     45         (6)

* Synthesized Test.
So the agreement for the Wafer*Day interaction is quite good. The Cycle component, allowing for a lot of rounding error, has a similar magnitude. However, the Day component is missing and the residual is quite high!?
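The variance-component column in the output above can be reproduced by hand from the SS and DF columns and the expected-mean-square (EMS) table. The sketch below is an illustration, not Minitab's internal code: it copies the SS/DF values from the output and solves the EMS equations in the order listed there. Because the SS values are themselves rounded in the display, agreement is only expected to the displayed precision.

```python
# SS and DF copied from the Minitab output above (unrestricted model, Day fixed).
SS = {
    "Wafer": 3800399981, "Day": 15135, "Wafer*Day": 52936,
    "Cycle": 1844, "Repeat": 1833, "Error": 1802,
}
DF = {"Wafer": 4, "Day": 7, "Wafer*Day": 28, "Cycle": 40, "Repeat": 40, "Error": 40}
MS = {k: SS[k] / DF[k] for k in SS}


def variance_components(ms):
    """Solve the EMS equations from the Minitab table for the random terms.

      Error     : s_e
      Repeat    : s_e + 2 s_rep
      Cycle     : s_e + 2 s_cyc
      Wafer*Day : s_e + 2 s_rep + 2 s_cyc + 4 s_wd
      Wafer     : s_e + 2 s_rep + 2 s_cyc + 4 s_wd + 32 s_w
    """
    s_e = ms["Error"]
    s_rep = (ms["Repeat"] - s_e) / 2
    s_cyc = (ms["Cycle"] - s_e) / 2
    s_wd = (ms["Wafer*Day"] - ms["Repeat"] - ms["Cycle"] + s_e) / 4
    s_w = (ms["Wafer"] - ms["Wafer*Day"]) / 32
    return {"Wafer": s_w, "Wafer*Day": s_wd, "Cycle": s_cyc,
            "Repeat": s_rep, "Error": s_e}


for name, value in variance_components(MS).items():
    print(f"{name:<10}{value:>16.4f}")
```

Worked through this way, Cycle comes out near 0.5 and Repeat near 0.4, which display as 1 and 0 when rounded to whole numbers; Wafer*Day comes out near 461 and Error near 45, as in the table.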

One problem I see is the huge wafer-to-wafer variance, which seems to throw Minitab off a bit. It is several orders of magnitude higher than everything else, as is to be expected in an MSA when the system is tested across its expected process range.

So the remaining questions for me are:
I. Why exclude Cycle from the model as a factor for Repeat to be nested in?
II. Where did the variance for the Day component go? (Well, I know it's vacation time, yet... ;))
III. What about the large residual in Minitab's calculation? I guess this relates to the second question.
IV. Does Minitab have a limit on display precision depending on the order of magnitude of the terms?


Thanks a lot!!
-Jean-
 

ernestphoon

Dear Jean, risking the wrath of the statistical community, my layman's explanation is that if you run a 2 x 2 DOE with only 2 trials, there isn't the "right amount of data" to run the stats, even if the design is correct.

That may explain why the Day variance ended up in "ounca punca land".

At the risk of being castrated again by the stats community for not asking the contextual questions before answering: are you doing the equivalent of a Measurement Systems Analysis, given the large differences in values between wafers? If so, you should consider using Minitab's expanded gage study under Stat -> Quality Tools -> Gage Study -> Gage R&R Study (Expanded).

All the best in the discovery,
Ernest
 

Miner

Forum Moderator
Leader
Admin
Pergula said:
I. Why exclude Cycle from the model as a factor for Repeat to be nested in?
II. Where did the variance for the Day component go? (Well, I know it's vacation time, yet... ;))
III. What about the large residual in Minitab's calculation? I guess this relates to the second question.
IV. Does Minitab have a limit on display precision depending on the order of magnitude of the terms?
I am still working on this in my spare time, but I can answer a few of these questions.

I. Enter it as Repeat(Cycle). Since Cycle is already entered as Cycle(Wafer Day), Minitab understands the hierarchy.
II. You set Day as a fixed factor. Variance components are only calculated for random factors. Set Day as random and you will see the variance component.
III. Yes. The variance that would have been attributable to Day was pooled in the residual.
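Once Day is random, its variance component follows the same EMS logic as the other terms. The sketch below assumes the unrestricted EMS for a random Day mirrors the Wafer line of the table, with a coefficient of 20 observations per Day level (5 wafers x 2 cycles x 2 repeats), by analogy with the 32 shown for Wafer (8 days x 2 cycles x 2 repeats). The SS and DF values are copied from the output posted earlier in the thread.

```python
# Layout assumed from the thread: 5 wafers, 2 cycles, 2 repeats per day.
N_WAFER, N_CYCLE, N_REPEAT = 5, 2, 2

# Mean squares from the SS/DF columns of the Minitab output in the thread.
MS_DAY = 15135 / 7
MS_WAFER_DAY = 52936 / 28


def day_variance_component(ms_day, ms_wafer_day):
    """Method-of-moments estimate for a random Day effect.

    Assumes EMS(Day) = s_e + 2 s_rep + 2 s_cyc + 4 s_wd + 20 s_day, so
    subtracting the Wafer*Day mean square isolates the Day term.
    """
    coeff = N_WAFER * N_CYCLE * N_REPEAT  # observations per Day level = 20
    return (ms_day - ms_wafer_day) / coeff


print(round(day_variance_component(MS_DAY, MS_WAFER_DAY), 2))  # 13.58
```

The result, about 13.6, rounds to the 14 that Minitab reports once Day is random. The Error mean square (1802/40, about 45) enters neither expression, and labeling Day as fixed or random does not change any sum of squares, which is consistent with the residual staying at 45 in both fits.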
 

Pergula

This is indeed an MSA. The data is taken from an example, supposedly from real life but adjusted a bit to make a point.


It is a thickness measurement series on 5 different wafers (with very different thickness values), performed over 8 days. Each day the wafers were loaded twice in random order and measured twice per load cycle.


Therefore the assumed statistical model is:
Thickness(hkji) = mu + day(h) + wafer(k) + wd(hk) + cycle(hkj) + repeat(hkji) + e(hkji)


(See my previous post for the parameter definitions.)


---------


OK, I tried using Repeat(Cycle), but it still yields the DoF error. I guess there is something wrong with the model itself.


I also tried the Gage R&R study in Minitab. However, I find the output fairly useless because:
- most graphs are swamped by the very high wafer-to-wafer variance, which I need in order to test the system across its measurement range
- the output is less detailed, partly because of the first point
- the input makes quite a few assumptions I feel uncomfortable with, e.g. the everlasting focus on operators (does most of the world really still measure manually?)


Maybe it's just me on these points and I haven't got the hang of Minitab yet.




What worked well was using Day as a random factor (can time ever be random? That bends my mind a bit). Now I get close to the expected variance of 12.13 (Minitab: 14). :)
Thanks, Miner!


What I find interesting is that the residual stayed the same: it is 45 with Day as a fixed factor and 45 with Day as a random factor.
Is that expected? I would have guessed that attributing 14 to Day would have reduced the residual!?




Ernest: as a non-native English speaker, I have to ask: where is "ounca punca land"? :lol:


Thanks a lot folks!
-Jean-
 

ernestphoon

Dear Jean, that's just my gibberish for never-never land, or stuff that seems to disappear into nothingness... :) One has to amuse oneself when the stats don't seem to work.

It's great, though, that you found your solution.

To share a dirty little secret that the stats guys frown on: just duplicate the 160 data points to make 320, and your old model will run.
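Why the stats guys frown on this can be seen in a small sketch. Duplicating every row does create 160 residual degrees of freedom, but each copy is identical to its original, so the within-cell (pure error) sum of squares is exactly zero. The thickness values below are randomly generated placeholders, not the E89 data, and the helper `pure_error_ss` is hypothetical.

```python
import random
from itertools import product


def pure_error_ss(rows):
    """Within-cell sum of squares and its df, where a cell is one
    (wafer, day, cycle, repeat) combination. With Repeat nested in Cycle,
    this is the only variation left for the Error term."""
    cells = {}
    for key, y in rows:
        cells.setdefault(key, []).append(y)
    ss, df = 0.0, 0
    for ys in cells.values():
        mean = sum(ys) / len(ys)
        ss += sum((y - mean) ** 2 for y in ys)
        df += len(ys) - 1
    return ss, df


# Placeholder data: 5 wafers x 8 days x 2 cycles x 2 repeats, one value each.
rng = random.Random(0)
original = [((w, d, c, r), rng.gauss(700.0, 5.0))
            for w, d, c, r in product(range(5), range(8), range(2), range(2))]
duplicated = original + original  # the 160 -> 320 "trick"

print(pure_error_ss(original))    # (0.0, 0): no residual df at all
print(pure_error_ss(duplicated))  # (0.0, 160): df appear, but SS is exactly zero
```

An Error term with zero sum of squares carries no information about measurement variation, and any F-test against it would divide by zero, so the model "running" does not mean it says anything about repeatability.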

Balancing is an art for me, while the stats guys know the science; trying to figure it out drives me insane.

Enjoy,
Ernest

I am a bit concerned about the data, though. There are two outliers (graphs attached), and there may have been some translation errors or data trimming (which might explain why the model could not run): for wafer 4, data point 20, a 1 may have been substituted for a 7; for wafer 2, data point 13, a 2 for an 8 (refer to the outlier graphs). There also appears to be a temporal (time-based) effect influencing the data as it was collected (refer to the temporal graphs). Hopefully your real experiment was planned with randomization, in which case the interpretation of the ANOVA would be correct.
 

Attachments

  • Outlier.pdf
    12.4 KB
  • Outlier1.pdf
    12.3 KB
  • Temporal.pdf
    12.3 KB
  • Temporal1.pdf
    12.5 KB