# Forcing the intercept to zero when plotting a best-fit linear regression line

Can you please advise whether we need to set intercept = 0 when plotting a best-fit linear regression line on a scatter (correlation) graph?

Normally, you should only force the y-intercept to zero when there is a sound theoretical reason for doing so.

For example, take the first-principles equation F = ma. If a = 0, then F must also be 0. The equation contains no constant (intercept) term, so you would force the intercept to zero.

If you do not have a theoretical basis for it, do not force a zero intercept.

Note: If you do force a zero intercept, you CANNOT use R^2 to evaluate the goodness of fit. The R^2 will always appear to be good. Some software packages will not provide an R^2 for this reason, but some still do so.
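
To see why, note that when the intercept is forced to zero, R^2 is usually computed against the "model" y = 0 instead of against a flat line at the mean of y (R's `summary()` of an `lm` fit without an intercept does this, for example). Whenever the y values sit far from zero, almost any line through the origin beats y = 0, so R^2 looks high. The minimal Python sketch below uses made-up data in which y barely depends on x; the helper names are illustrative only.

```python
# Hypothetical data: y hovers around 10 and has almost no relationship to x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10.3, 9.8, 10.1, 9.7, 10.2, 9.9, 10.0, 10.4]


def fit_through_origin(xs, ys):
    """Least-squares slope with the intercept forced to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def sse(ys, preds):
    """Sum of squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def r2_centered(ys, preds):
    """Usual R^2: compares the fit with a flat line at mean(y)."""
    my = sum(ys) / len(ys)
    return 1 - sse(ys, preds) / sum((y - my) ** 2 for y in ys)


def r2_uncentered(ys, preds):
    """R^2 often reported for no-intercept fits: compares the fit with y = 0."""
    return 1 - sse(ys, preds) / sum(y * y for y in ys)


a, b = fit_with_intercept(xs, ys)
b0 = fit_through_origin(xs, ys)
preds_free = [a + b * x for x in xs]
preds_origin = [b0 * x for x in xs]

r2_free = r2_centered(ys, preds_free)                 # about 0.04: x explains almost nothing
r2_origin_reported = r2_uncentered(ys, preds_origin)  # about 0.80: looks "good"
r2_origin_honest = r2_centered(ys, preds_origin)      # far below 0: worse than a flat line

print(f"free intercept, usual R^2:       {r2_free:.2f}")
print(f"zero intercept, uncentered R^2:  {r2_origin_reported:.2f}")
print(f"zero intercept, usual R^2:       {r2_origin_honest:.1f}")
```

The same zero-intercept line scores about 0.80 on the uncentered measure yet fits worse than simply predicting the mean, which is exactly why that R^2 should not be read as goodness of fit.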

Piece of cake: you can use Excel (Lotus, Works, or any other spreadsheet). Just remember what least squares means: you choose the parameters that minimize the sum, over all data points, of

(Y_estimate - Y)^2

where Y_estimate = Slope * X + intercept,
or, in your case, Y_estimate = Slope * X.
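
For the zero-intercept case the minimization can even be done by hand. Writing b for Slope and summing over the data points i, set the derivative of the sum of squares to zero:

$$
S(b) = \sum_i \left(b X_i - Y_i\right)^2, \qquad
\frac{dS}{db} = 2 \sum_i X_i \left(b X_i - Y_i\right) = 0
\;\Longrightarrow\;
b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}
$$

Because S(b) is an upward-opening parabola in b (assuming not all X_i are zero), this stationary point is the minimum, so Solver or any iterative search should converge to the same value.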

Excel has an add-in called Solver (IMHO the greatest tool in Excel). You can use Solver to find the Slope that makes the sum of squares as small as possible, or you can do it iteratively yourself.
If you can share your data, we can show you.
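
As a rough stand-in for the Solver approach, the Python sketch below minimizes the sum of squares numerically and checks the result against the closed-form slope sum(X*Y) / sum(X^2). The data are invented, and golden-section search is just one simple iterative method picked for illustration; it is not what Excel's Solver does internally.

```python
import math

# Hypothetical example data (replace with your own X and Y columns).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_of_squares(slope, xs, ys):
    """Objective for a zero-intercept line: sum of (Y_estimate - Y)^2."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys))


def slope_closed_form(xs, ys):
    """Exact minimizer for a line through the origin: sum(X*Y) / sum(X^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_iterative(xs, ys, lo=-1000.0, hi=1000.0, tol=1e-9):
    """Golden-section search for the slope minimizing the sum of squares.

    The objective is a parabola in the slope, so it has a single minimum and
    the search closes in on it, assuming that minimum lies inside [lo, hi].
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sum_of_squares(c, xs, ys) < sum_of_squares(d, xs, ys):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


print(f"closed form: {slope_closed_form(xs, ys):.6f}")
print(f"iterative:   {slope_iterative(xs, ys):.6f}")
```

Both lines should print the same slope (about 2.0036 for these made-up numbers), which is a handy sanity check on whatever iterative tool you use.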

Can you please explain further why "If you do force a zero intercept, you CANNOT use R^2 to evaluate the goodness of fit. The R^2 will always appear to be good."?
Many thanks again.

While I have not found any papers to back this up, my observations of the examples used in papers covering this topic, together with my personal experience, suggest that for a legitimate regression through the origin, your independent variable must be a ratio variable, not an interval variable. A ratio variable is one where zero has real meaning (e.g., height, weight). An interval variable is one where zero is an arbitrary value (e.g., temperature in degrees C or F).
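
One way to see the problem with interval variables: a line forced through the origin depends on where the scale puts its zero. In the hypothetical Python sketch below, the same made-up measurements are fitted once with temperature in degrees C and once in degrees F. The zero-intercept fits disagree, while the ordinary fits agree exactly. This illustrates the observation above rather than proving it.

```python
# Hypothetical data: some response y measured at several temperatures in degrees C.
temps_c = [10.0, 15.0, 20.0, 25.0, 30.0]
ys = [52.0, 61.0, 70.0, 79.0, 90.0]


def c_to_f(c):
    return c * 9 / 5 + 32


def fit_through_origin(xs, ys):
    """Least-squares slope with the intercept forced to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


temps_f = [c_to_f(c) for c in temps_c]
query_c = 0.0                  # freezing point: 0 degrees C, but 32 degrees F
query_f = c_to_f(query_c)

# Through the origin: the prediction depends on the temperature scale used.
origin_pred_c = fit_through_origin(temps_c, ys) * query_c
origin_pred_f = fit_through_origin(temps_f, ys) * query_f

# With a free intercept: both scales give the same prediction.
a_c, b_c = fit_with_intercept(temps_c, ys)
a_f, b_f = fit_with_intercept(temps_f, ys)
free_pred_c = a_c + b_c * query_c
free_pred_f = a_f + b_f * query_f

print(f"zero intercept: {origin_pred_c:.1f} (C fit) vs {origin_pred_f:.1f} (F fit)")
print(f"free intercept: {free_pred_c:.1f} (C fit) vs {free_pred_f:.1f} (F fit)")
```

For these invented numbers the zero-intercept fit predicts 0 at the freezing point when x is in degrees C but roughly 33 when x is in degrees F, whereas the ordinary fit gives about 32.8 either way.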

However, this is not a guarantee. In the fish length example in the paper provided by Darius, length is a ratio variable, yet the regression should still not be forced through the origin. Why? Because fish cannot start producing eggs until they reach a certain level of maturity, and they do not necessarily produce just 1 or 2 eggs when they do begin laying. Forcing a regression through the origin would imply that they could lay 1 egg as soon as they hatch.
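
The fish argument can be mimicked with made-up numbers in which egg counts only start once a fish is around 30 cm long. In the Python sketch below (data and the 10 cm "small fish" are hypothetical), the line forced through the origin attributes hundreds of eggs to a tiny fish, while the free-intercept line reaches zero eggs near the invented maturity length.

```python
# Hypothetical data: egg counts for mature fish only (lengths in cm).
lengths = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
eggs = [480.0, 1050.0, 1490.0, 2020.0, 2460.0, 3010.0]


def fit_through_origin(xs, ys):
    """Least-squares slope with the intercept forced to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


b0 = fit_through_origin(lengths, eggs)
a, b = fit_with_intercept(lengths, eggs)

small_fish = 10.0  # far below the size at which these fish start laying
print(f"through origin: {b0 * small_fish:.0f} eggs predicted at {small_fish:.0f} cm")
print(f"free intercept: zero eggs reached at about {-a / b:.1f} cm")
```

Neither straight line should be trusted below maturity (the free-intercept line would predict negative counts there); the point is only that forcing zero eggs at zero length distorts the fit for the fish that actually lay.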

