Evaluating a Regression Model Using the Constant Variance Assumption

Jim Shelor



Covers,

I am reviewing a regression model that is structured like this:

Y = β0 * X1^β1 * X2^β2 * X3^β3 * X4^β4

The analyst performed a transformation to obtain:

ln(Y) = ln(β0) + β1*ln(X1) + β2*ln(X2) + β3*ln(X3) + β4*ln(X4)

The analyst then fit the ln equation by least squares to estimate the coefficients and reported the model in both the ln form and the power-law form.
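For concreteness, here is a minimal sketch of that procedure in Python with numpy/statsmodels. The data are synthetic stand-ins with a multiplicative (lognormal) error term, since the real data set is not posted; all names, values, and the error structure are assumptions for illustration only.

# Sketch of the analyst's approach on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1.0, 10.0, size=(n, 4))            # four positive predictors
beta = np.array([0.8, -0.5, 1.2, 0.3])             # hypothetical exponents
b0 = 2.0
eps = rng.lognormal(mean=0.0, sigma=0.2, size=n)   # multiplicative error
Y = b0 * np.prod(X ** beta, axis=1) * eps

# Regress ln(Y) on ln(X1)..ln(X4); the intercept estimates ln(β0).
lnX = sm.add_constant(np.log(X))
fit = sm.OLS(np.log(Y), lnX).fit()
print(fit.params)   # approx [ln(2.0), 0.8, -0.5, 1.2, 0.3]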

When I regress the power-law form against the experimental results, there is a severe violation of the constant variance assumption: the residuals vs. fits plot shows a severe funnel (a triangle widening to the right).

When I regress the ln form against the ln of the experimental data, the constant variance assumption is met, with only a slight increase in variation (naturally, since it is an ln function).
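Continuing the synthetic sketch above, the two residual patterns Jim describes can be reproduced like this; the spread comparison is a crude numeric stand-in for eyeballing the residuals-vs-fits plots:

# Residuals on the log scale vs. the raw (antilog) scale.
lnY_hat = fit.predict(lnX)           # fitted ln(Y)
resid_log = np.log(Y) - lnY_hat      # roughly constant spread across fits
resid_raw = Y - np.exp(lnY_hat)      # "antilog" residuals: spread widens

order = np.argsort(np.exp(lnY_hat))
half = n // 2
print("log scale SD, low/high fits:",
      resid_log[order[:half]].std(), resid_log[order[half:]].std())
print("raw scale SD, low/high fits:",
      resid_raw[order[:half]].std(), resid_raw[order[half:]].std())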

The analyst says that the equation for Y is valid because the equation for ln(Y) satisfies the constant variance assumption.

Since I am not interested in the value of ln(Y), only in Y itself, I am telling the analyst that the violation of the constant variance assumption makes the regression model of interest (the one for Y) invalid, regardless of the residuals vs. fits result for the ln form of the equation.

Is there something I am missing here? Doesn’t the model for Y have to meet the constant variance assumption? Am I supposed to care whether or not the ln(Y) model meets the constant variance assumption when ln(Y) is not the number I care about?

By the way, if I take the antilog of the result of the ln form of the equation, the severe violation of the constant variance assumption returns (as would be expected).

Thanks in advance for the help.

Sincere regards,

Jim Shelor
 
Jim Shelor

Thanks Miner,

I guess I must just be thick, because when I compute

Y = exp(ln(Y))

from the fitted ln(Y) values and regress the result against the observed Y, I get exactly the same residuals vs. fits plot as I do when I use the non-log form of the equation.

It is difficult for me to picture why working the problem to get ln(Y) and then converting back to Y via exp(ln(Y)) makes the end result any less flawed.

Jim Shelor
Not the brightest bulb in the lamp.
 
Jim Shelor

I am thinking perhaps I have not expressed my question correctly. I have attached a more thorough explanation of the question for your review.

Thank you in advance for the help.

Jim Shelor
 

Attachments

  • My analyst has produced two equations as models to represent the process I am evaluating.doc (271 KB)

Bev D

Heretical Statistician
Leader
Super Moderator
Jim: you asked what you are missing. You are missing physics. You need an equation (model) that works for your situation.

Are you only interested in the mathematics of this situation, or are you looking for a model that will help resolve some real-world situation?

If the latter, a bit more info will help us help you: how the original model(s) were developed, the data (if you could post it), what you hope to use the model for, etc.
 

Statistical Steven

Statistician
Leader
Super Moderator
Jim -

I think you are confusing some concepts (then again, I might be too). Typically, when you have a data set with X's and Y's, you try to fit a linear model of Y on X. If the residual pattern shows that the assumptions of regression are violated, you transform the data. Sometimes you transform both X and Y, sometimes just X, and sometimes just Y (see Daniel and Wood for residual pattern interpretation). In your case, you are doing a log-log transformation, so the coefficients (betas) are not in the natural units but in log units. You then transform the betas back to get the appropriate coefficients for the model. If you regress ln(Y) on ln(X) and then take exp(beta), you will NOT get the same coefficient as regressing Y on X.
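If it helps, here is a small, self-contained illustration of that last point (single predictor, made-up data, so every number here is just for demonstration): exponentiating the coefficients of a log-log fit does not reproduce the coefficients of an untransformed fit.

# Compare a log-log fit with an untransformed linear fit on the same data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 300)
y = 2.0 * x ** 0.7 * rng.lognormal(0.0, 0.1, 300)   # power law, multiplicative error

loglog = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
linear = sm.OLS(y, sm.add_constant(x)).fit()

print("log-log intercept, slope:", loglog.params)        # approx [ln(2), 0.7]
print("exp(log-log intercept)  :", np.exp(loglog.params[0]))  # approx 2.0 = b0
print("linear  intercept, slope:", linear.params)        # something else entirely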

Does that make any sense?
 
Jim Shelor

Bev and Steven,

Right now I am only interested in the mathematics of the situation.

I cannot provide more detail on the experiment because the data files are huge and the experiment is a one-of-a-kind research project.

I can say that the only number I am concerned with at this point is Y, and my question is whether the model presented by my analyst is adequate for predicting Y.

It seems to me that if every model that actually outputs Y has these serious constant variance issues, and every model that calculates ln(Y) and then recovers Y using exp(ln(Y)) has the same issues, then the models do not effectively predict Y for future X combinations.

I have studied transformations a bit, Steven, and there are two reasons why my analyst wanted to do the ln-ln transformation:

1. To linearize the right side of the equation.
2. To stabilize the variance by transforming the left side of the equation.

I think transforming both sides of the equation was a mistake; I would have transformed only the right side and then examined the variance. But before I have my analyst redo this regression analysis, I wanted to find out whether anyone else thinks the current analysis is correct from a purely statistical perspective and I just don't understand constant variance as well as I thought I did.
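To make that concrete, here is a sketch of what I mean by transforming only the right side, again on synthetic stand-in data (the real data cannot be posted; all the numbers are made up):

# Keep Y untransformed; log only the predictors, then check the residual spread.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.uniform(1.0, 10.0, size=(300, 4))
Y = 2.0 * np.prod(X ** np.array([0.8, -0.5, 1.2, 0.3]), axis=1) \
        * rng.lognormal(0.0, 0.2, 300)

fit_rhs_only = sm.OLS(Y, sm.add_constant(np.log(X))).fit()
fits = fit_rhs_only.fittedvalues
resid = fit_rhs_only.resid

# If the error really is multiplicative, the funnel would be expected to
# remain; the residuals-vs-fits check is exactly how to find out.
order = np.argsort(fits)
print("residual SD, lowest third :", resid[order[:100]].std())
print("residual SD, highest third:", resid[order[-100:]].std())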

The bottom line for me is that Y must have constant variance; ln(Y) only needs constant variance if ln(Y) is my response value of interest.

Am I misinterpreting the statistics of this situation?

Thanks,

Jim
 

Statistical Steven

Statistician
Leader
Super Moderator
Jim -

The problem is that Y is NOT normally distributed. I agree that it is probably more accurate to transform only the X side of the equation and then fit the regression. If that stabilizes the variance, you have your model. Since I am not the statistician doing the analysis, it is hard for me to know whether transforming just the X's is sufficient.

Since Y is most likely lognormally distributed, the variance increases with the mean and must be stabilized if you want to use ordinary "linear" regression. Alternatively, you can use a generalized linear model with a log link function to get the parameters directly on the original scale of Y.
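A minimal sketch of that GLM route (the Gamma family is an assumption here; it is one common choice for a positive, right-skewed response, and the data are the same kind of synthetic stand-in as above):

# Model Y directly with a log link instead of transforming Y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.uniform(1.0, 10.0, size=(300, 4))
Y = 2.0 * np.prod(X ** np.array([0.8, -0.5, 1.2, 0.3]), axis=1) \
        * rng.lognormal(0.0, 0.2, 300)

glm = sm.GLM(Y, sm.add_constant(np.log(X)),
             family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(glm.params)   # [ln(β0), β1..β4], estimated without logging Y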

Hope that helps.
 
Jim Shelor

Steven,

I believe the plots clearly demonstrate that Y is normally distributed.
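For what it's worth, here is the kind of check I ran, shown on synthetic stand-in data since I cannot post the real Y (scipy and matplotlib assumed available):

# Normal probability plots of Y and ln(Y); the straighter panel indicates
# which scale is closer to normal.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
Y = rng.lognormal(mean=1.0, sigma=0.5, size=300)   # stand-in for the real Y

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
stats.probplot(Y, dist="norm", plot=axes[0])
axes[0].set_title("Y")
stats.probplot(np.log(Y), dist="norm", plot=axes[1])
axes[1].set_title("ln(Y)")
plt.tight_layout()
plt.show()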

Respects,

Jim Shelor
 