Validation of Predictive Models of Yield


quick_silver

Hi,

The company hired an outside consultant to develop predictive models for yield at the manufacturing plant where I am assigned. We are now using these models, but I have noticed that the actual values are sometimes very far from the predicted values. My question is: how can I measure the effectiveness of the model using the new data set?
Some of the articles I have read on this topic talk about computing the R-squared for the new data set. Is calculating R-squared enough to determine model validity?
I am asking for guidance on how to properly measure model validity on the new data set.

Thanks in advance!
 

Bev D

Heretical Statistician
Leader
Super Moderator
Re: validation of predictive models

quick_silver - this is a great question. As George Box once said: all models are wrong; some are useful.

It would be helpful if you could add a few more specifics. For example, is the model addressing the total yield of the factory? Or is there a specific model for each product or manufacturing line? Or is it the yield of a single processing step, or of an individual characteristic?

Is the model based only on historical values of the yield? Or is it based on a combination of input factors and their influence on the yield?

It is also important for us to understand how the model is intended to be used. Is the model being used to determine what input settings will guarantee a certain yield? Or is it being used to detect when something in the process has changed and to direct you to take some type of corrective action?
 

quick_silver

Re: validation of predictive models

Thanks, Ma'am.

1) We have six models, one for each product type, each of which undergoes unique processing.
2) The models were built from 2 years of historical data. We monitor about 50 factors that we believe have an impact on the different yields. The 3rd-party provider ran a correlation analysis and from that derived the regression models. We actually did a lot of iterations, because in the 1st model certain factors that the model said would have a positive impact violated engineering logic (the effect should be negative, not positive). The most acceptable model is the one we are currently using.
3) The importance of the model for us: from among the 50 factors, we need the model to tell us which ones really have the biggest impact on yield, so we can focus our action plans on controlling those factors.
 

Darius

Re: validation of predictive models

The 3rd-party provider ran a correlation analysis and from that derived the regression models. We actually did a lot of iterations, because in the 1st model certain factors that the model said would have a positive impact violated engineering logic (the effect should be negative, not positive). The most acceptable model is the one we are currently using.

There are some assumptions that have to be made when using regression; if the assumptions are not met, you can be led to weird conclusions. One of the most common mistakes is to treat the model as linear when the data isn't. If you think a factor defies logic, it could be noise (non-significant for the model), and its coefficient is almost surely a value near 0.

If you are worried that the effect of a factor is not accounted for the way it should be, you can plot the factor against the difference between the actual and the forecasted value (the residual). If some effect was not taken into account, you will see a trend (a curve), showing that the effect was not modeled the way it should be. IMHO graphical methods show things that the numbers can't; you may have a nonlinear trend but a low R-squared because of the lack of linearity.

From among the 50 factors, we need the model to tell us which ones really have the biggest impact on yield, so we can focus our action plans on controlling those factors.

As you may know, the bigger the absolute value of a factor's coefficient, the more impact it will have on yield. But the relationship will change if the variables driving that variation are controlled: you can have a strong relationship between one variable and another, but if the variable is held under control its impact will be negligible. It's like looking at a line under a microscope; you will only see the noise (in the case of a line at the center, only black).

You ask about using R-squared to determine which model is right. If you are comparing one model's results against another's, IMHO it is better to use the accuracy measures used in forecasting to determine which gives better results (RMSE, MAPE, MdAPE, GMRAE, MdRAE). IMHO the Median Relative Absolute Error (MdRAE) is well protected against outliers.
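A minimal sketch of several of the accuracy measures named above, assuming `actual` and `predicted` are plain arrays of yield values. MdRAE here is computed against a naive last-value benchmark, which is one common choice:

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """RMSE, MAPE, MdAPE, and MdRAE (relative to a naive last-value forecast)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rmse = np.sqrt(np.mean(err ** 2))
    ape = 100.0 * np.abs(err) / np.abs(actual)     # assumes yields are nonzero
    mape = np.mean(ape)
    mdape = np.median(ape)
    # Benchmark each point (after the first) with the previous actual value
    naive_err = actual[1:] - actual[:-1]
    rae = np.abs(err[1:]) / np.abs(naive_err)      # assumes consecutive actuals differ
    mdrae = np.median(rae)
    return rmse, mape, mdape, mdrae

# Illustrative numbers only
rmse, mape, mdape, mdrae = forecast_metrics([100, 90, 95, 105], [98, 92, 90, 100])
print(round(rmse, 2), round(mape, 2), round(mdrae, 2))  # 3.81 3.56 0.5
```

An MdRAE below 1 means the model typically beats the naive "yesterday's yield" forecast; above 1 means it does not.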
 

Bev D

Heretical Statistician
Leader
Super Moderator
Well, 50 input factors is a LOT of factors. You said they used 2 years of historical data? How many independent runs or lots does that represent? It is possible that the model is overfitted. Overfitting can give a decent R-squared value on the fitting data but will result in a model with very little actual predictive power in practice.

The only true way to validate the model is over time: plot the predicted value vs. the actual value for some time, say 20-30 runs (or lots)...
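The overfitting warning above can be demonstrated with a toy example: with 50 factors and only 60 runs of pure noise, ordinary least squares still produces a deceptively high "training" R-squared. The data here are synthetic, purely to illustrate the point:

```python
import numpy as np

rng = np.random.default_rng(1)
n_runs, n_factors = 60, 50           # few runs relative to 50 factors

X = rng.normal(size=(n_runs, n_factors))
y = rng.normal(size=n_runs)          # "yield" here is pure noise, unrelated to X

# Ordinary least squares with an intercept
A = np.column_stack([np.ones(n_runs), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta

r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 2))  # high R-squared despite zero real relationship
```

On genuinely new runs, this same model would predict no better than chance, which is exactly the gap between fitting and validating.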
 

quick_silver

Re: validation of predictive models

Thanks, Sir Darius. I did calculate the tracking signal for the six models. According to my readings the ideal should be -1 to 5, but unfortunately I got many outliers. That's why I was looking for some other ways of checking the validity. How can I attach an Excel file here?
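For reference, a tracking signal is usually computed as the cumulative forecast error divided by the mean absolute deviation (MAD); published alarm limits vary by source (a band of roughly ±4 is a common rule of thumb). A minimal sketch, assuming plain `actual` and `predicted` arrays:

```python
import numpy as np

def tracking_signal(actual, predicted):
    """Running tracking signal: cumulative error / running mean absolute deviation."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    cum_err = np.cumsum(err)
    running_mad = np.cumsum(np.abs(err)) / np.arange(1, err.size + 1)
    return cum_err / running_mad

# Toy example: a forecast that is consistently 1 unit low (biased)
ts = tracking_signal([91, 92, 93], [90, 91, 92])
print(ts)  # [1. 2. 3.] -- the signal grows steadily, flagging the bias
```

A tracking signal that keeps drifting in one direction, as here, indicates a biased model rather than random forecast error.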
 

quick_silver

Re: validation of predictive models

You may see the attached file re: tracking signal results.
 

Attachments

  • Actual yield vs predicted yield.xlsx
    2.3 MB

Miner

Forum Moderator
Leader
Admin
There is a definite flaw in the predictive model.

If you follow Bev D's suggestion to plot predicted value (Y) versus actual value (X), as well as Actual - Forecast (Y) versus actual value (X), you will see the anomaly clearly.

Ideally, predicted value plotted against actual value should be a line running at 45 degrees with scatter on either side, while Actual - Forecast versus actual value should be a horizontal line at zero with scatter above and below. These data show the reverse, which means there is a very fundamental problem with the equation. That is, it's not predicting at all.
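The 45-degree check can also be summarized with one number: the slope of predicted regressed on actual, which should be near 1 for a useful model and near 0 (or negative) when the model is not predicting at all. The yield numbers below are made up purely for illustration:

```python
import numpy as np

# Made-up yields, for illustration only
actual    = np.array([92.0, 95.5, 90.1, 97.2, 94.8, 91.3, 96.0, 93.7])
predicted = np.array([91.5, 94.9, 91.0, 96.5, 95.2, 90.8, 95.1, 94.0])

# Slope of predicted vs. actual: near 1 means the model tracks reality;
# near 0 or negative means it is not predicting at all.
slope, intercept = np.polyfit(actual, predicted, 1)
print(round(slope, 2))
```

With real data, run the same two lines on each of the six models; any model whose slope is far from 1 fails the 45-degree criterion described above.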
 

quick_silver

Thanks, Miner.

So moving forward I should recommend that the predictive model be revisited and a new one created.
My question then is: how can I measure whether the newly created model is a useful one without waiting for new data? Is there a way? Because when we accepted the current model, the acceptance criteria were the iteration with the highest adjusted R-squared and an equation that agrees with conventional wisdom.
We have already paid the company that did the modeling for us. I just realized that payment should have been contingent on new data becoming available and proving that the predictive equation is really useful. It turns out it was a waste of money after all. :(

Thanks in advance!
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
For monitoring whatever predictive model is used, I'd suggest running a control chart of the delta between the prediction and the actual. Then see whether that delta is stable and predictable, and if so, evaluate whether the error is acceptable. If there are trends, then something is probably affecting actual production that the model is not taking into account.
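The control chart suggested above corresponds to an individuals (XmR) chart on the prediction error. A minimal sketch using the standard 2.66 x average-moving-range limits; the deltas here are hypothetical:

```python
import numpy as np

def xmr_limits(deltas):
    """Individuals (XmR) chart limits for the forecast error (actual - predicted)."""
    deltas = np.asarray(deltas, dtype=float)
    center = deltas.mean()
    mr_bar = np.mean(np.abs(np.diff(deltas)))   # average moving range
    spread = 2.66 * mr_bar                      # standard XmR constant
    return center - spread, center, center + spread

# Toy deltas (actual - predicted) for five runs
lcl, center, ucl = xmr_limits([1, 2, 1, 2, 1])
print(round(lcl, 2), round(center, 2), round(ucl, 2))  # -1.26 1.4 4.06
```

Points outside the limits, or runs trending to one side of the center line, signal that the model's error is no longer stable and predictable.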
 