Understanding of Regression and ANOVA in Minitab

Mayank Trivedi

#1
Hello Everyone,

I have a few queries related to interpretation of certain terms in Minitab related to Regression (GLM) and ANOVA. There are a few statistical concepts which I encountered in my research and I am taking the liberty of asking about them as well.

1) What is the difference between ANOVA in Regression and ANOVA in general as displayed in Minitab? In general how should one interpret ANOVA in regression?

2) When we look at ANOVA output in Minitab we also see an R-Sq value. Where does this come from?

3) I understand what residuals are, but why should one do a normality test on residuals? Also, I have read that after a regression the residuals should be plotted against predicted values. Why?

Best Regards,

Mayank
 

Darius

Quite Involved in Discussions
#3
I don't use Minitab but...
2) When we look at ANOVA output in Minitab we also see an R-Sq value. Where does this come from?

3) I understand what residuals are, but why should one do a normality test on residuals? Also, I have read that after a regression the residuals should be plotted against predicted values. Why?
R-Sq is the coefficient of determination; in simple linear regression it is the square of Pearson's correlation coefficient. It is the proportion of the variance of the dependent variable (y) that is explained by the independent variable(s) (x).

A site with an example

The residuals MUST be homoscedastic (have constant variance) according to the theory.

Wikipedia term

When heteroscedasticity is found, one can assume that the model is not right (you need transformations, i.e. functions applied to the data, to get rid of it). A way to find such behaviour is to chart the residuals; I almost always chart them against sequence and against the calculated value.
 
Mayank Trivedi

#4
Thank you for your response Darius.

I know about R-Sq in regression and I know ANOVA. My query though still remains unanswered.

When you look at the output of Regression in Minitab there is also an ANOVA table, and it lists sums of squares for Regression and for Error. My question is: what is this ANOVA? As I understand ANOVA, it compares the means of three or more data sets. So how do the two differ?

You mentioned that if there is heteroscedasticity you need to transform data. Why is this required? Is it ever a case where the R-Sq value is high but the residuals display non-normality? If yes why transform the residuals?

Best Regards,

Mayank
 

Steve Prevette

Deming Disciple
Staff member
Super Moderator
#5
ANOVA is literally an analysis of variance. Yes, at first thought you think about comparing two or more data sets and trying to conclude whether they are the same or different, and ANOVA does this by comparing variances. There is a residual term which expresses the variation within each data set, and this is compared with the variation between the data sets.

When I do a regression (be it linear or non-linear, single or multiple) I am trying to "explain" some component of variation in the Y data based upon X1, X2, . . . Xn. The hope is that I have explained the variation in the Y data through the regression, and only have left a set of residual data. ANOVA helps me to show how much of the variation is within the fit versus left over in the residuals.

The theory of regression (using the least-squares fit) assumes that you are able to explain all of the variation in the Y data, with the exception of a set of residual data, which are independent from each other, and randomly distributed as a Normal random variable with mean zero, and some variance (which is calculated by the ANOVA).
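The split described above can be written out by hand. This is only a minimal sketch with made-up numbers (the x and y values are illustrative, not from any real study): it fits a least-squares line, then splits the total sum of squares into a Regression part and an Error part, which is the same split a regression ANOVA table reports.

```python
# Sketch: ANOVA-style partition for a simple least-squares regression.
# The data below are invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Partition the total variation in y
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)       # explained by the fit
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left in the residuals

# Degrees of freedom for one predictor plus a constant
df_regression = 1
df_error = n - 2

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error      # estimate of the residual variance
f_ratio = ms_regression / ms_error  # large F: the fit explains more than noise
r_sq = ss_regression / ss_total

print("Source      DF   SS        MS")
print(f"Regression  {df_regression:<4d} {ss_regression:<9.4f} {ms_regression:.4f}")
print(f"Error       {df_error:<4d} {ss_error:<9.4f} {ms_error:.4f}")
print(f"Total       {n - 1:<4d} {ss_total:<9.4f}")
print(f"F = {f_ratio:.2f}, R-Sq = {r_sq:.4f}")
```

The Error mean square is the estimate of the residual variance mentioned above, and R-Sq is simply the Regression sum of squares divided by the Total.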
 

Statistical Steven

Statistician
Staff member
Super Moderator
#6
Hello Everyone,

I have a few queries related to interpretation of certain terms in Minitab related to Regression (GLM) and ANOVA. There are a few statistical concepts which I encountered in my research and I am taking the liberty of asking about them as well.

1) What is the difference between ANOVA in Regression and ANOVA in general as displayed in Minitab? In general how should one interpret ANOVA in regression?
ANOVA in regression tells you whether the model is significant, that is, whether the model explains more than just the random error in the data.


2) When we look at ANOVA output in Minitab we also see an R-Sq value. Where does this come from?

R-Sq is the coefficient of determination. It tells you how much of the variation is explained by the model. It is not a very good statistic as it is highly influenced by outliers and influential points.
3) I understand what residuals are, but why should one do a normality test on residuals? Also, I have read that after a regression the residuals should be plotted against predicted values. Why?
As has been stated, least-squares regression assumes that the residuals are normally distributed. Plotting the residuals against the predicted values allows you to see whether there are patterns that would indicate a transformation. See Daniel and Wood for a good explanation.
Best Regards,

Mayank
See answers above in Red. Why would you use software and do analysis if you do not understand what the output is telling you?
 

Darius

Quite Involved in Discussions
#7
You mentioned that if there is heteroscedasticity you need to transform data. Why is this required? Is it ever a case where the R-Sq value is high but the residuals display non-normality? If yes why transform the residuals?
Leaving aside the technical aspects of regression (they are important, of course, but...), the practical reason is to get a better estimator.

An example of this can be seen if you fit a linear regression to data that follow another model (e.g. X vs. 1/X). If the range of X is small enough (let's say 0.25 to 1), a linear regression can fit with a "nice" number (r2 > 0.8), but if you chart the residuals you can see a pattern in them, a pattern that will not exist if you transform X before the regression analysis is performed (with r2 = 1).

Sometimes there is theory about the process that documents the relationship between the variables; it can give you a starting point.
Other times, even though you KNOW that the relationship exists, you cannot see it because the variation of the dependent variable was too small and the random variation of other variables came into play. Remember: as far as you can, reduce the environmental noise from other variables and obtain the dependent variable over at least the full length of the forecasting (determination) range; a little more can help.
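The X vs. 1/X example above can be reproduced in a few lines. This is a rough sketch under my own assumptions (16 evenly spaced X values from 0.25 to 1 and no noise at all); `fit_line` is a helper written for this illustration, not a library function.

```python
# Sketch of the 1/x example: fit a straight line to y = 1/x on 0.25..1,
# then fit again after transforming the predictor to u = 1/x.

def fit_line(x, y):
    """Least-squares line through (x, y); returns (r_squared, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_error = sum(e ** 2 for e in residuals)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_error / ss_total, residuals

# Assumed grid: 16 evenly spaced points from 0.25 to 1.00, noise-free
x = [0.25 + 0.05 * i for i in range(16)]
y = [1.0 / xi for xi in x]

# Straight line on the raw predictor
r2_raw, res_raw = fit_line(x, y)

# Straight line after transforming the predictor
u = [1.0 / xi for xi in x]
r2_transformed, res_transformed = fit_line(u, y)

print(f"r2 (raw x)       = {r2_raw:.3f}")
print(f"r2 (transformed) = {r2_transformed:.3f}")
# Sign of each raw residual in x order; a run of like signs is the "pattern"
print("".join("+" if e > 0 else "-" for e in res_raw))
```

On the raw X the residuals are positive at both ends of the range and negative in the middle, which is the curved pattern a residual chart would show even though r2 looks respectable.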
 

Miner

Forum Moderator
Staff member
Admin
#8
I have a few queries related to interpretation of certain terms in Minitab related to Regression (GLM) and ANOVA. There are a few statistical concepts which I encountered in my research and I am taking the liberty of asking about them as well.

1) What is the difference between ANOVA in Regression and ANOVA in general as displayed in Minitab? In general how should one interpret ANOVA in regression?

2) When we look at ANOVA output in Minitab we also see an R-Sq value. Where does this come from?

3) I understand what residuals are but why should one do a normality test on residuals? Also I have read that post a regression test residulas should be plotted against predicted values. Why?
1) The ANOVA in Regression tests for the significance of the constant term and of the coefficients of each term in the regression model. For example, you have a simple linear regression of Y = B0 + B1X. If the ANOVA table shows B0 has a p-value of 0.35, and B1 has a p-value of 0.023, you should remove the constant term from your model leaving Y = B1X. Multiple regression works the same way. Just remove terms with non significant p-values.

2) R-Sq shows how good the model is at explaining the variation. Even if all terms in your model have significant p-values, it may not be good at making predictions. R-Sq helps answer that question. It is a positive number that varies from 0 to 1. In simple terms, an R-Sq of .60 means the model will explain 60% of the variation.

There are three types of R-Sq. R-Sq has one weakness. If you keep adding terms to the model, it will get bigger. This is where R-Sq (adj) comes in. This measure penalizes you for adding terms. If you add a significant term it will increase. If you add a nonsignificant term, it will decrease.

Finally, there is R-Sq (pred). This measure evaluates how well the model will predict. If you have overfitted your model, you may have an R-Sq (adj) of .94, but an R-Sq (pred) of 0.4. This means the model explains the variation very well, but is worthless to predict new results.
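For anyone who wants to see where the three numbers come from, here is a small sketch. It assumes the usual textbook definitions (R-Sq = 1 - SSE/SST, R-Sq (adj) with a degrees-of-freedom correction, and R-Sq (pred) = 1 - PRESS/SST, where PRESS comes from leaving one point out at a time); the data and the `fit_line` helper are invented for illustration.

```python
# Sketch: R-Sq, R-Sq (adj) and R-Sq (pred) for a simple linear regression.
# Data are invented; fit_line is a helper written for this example.

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 4.1, 5.8, 8.4, 9.7, 12.5, 13.6, 16.2, 18.1, 19.6]

n = len(x)
p = 2  # coefficients in the model: constant + slope

b0, b1 = fit_line(x, y)
y_bar = sum(y) / n
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_sq = 1.0 - ss_error / ss_total
r_sq_adj = 1.0 - (ss_error / (n - p)) / (ss_total / (n - 1))

# PRESS: refit without point i, then predict point i
press = 0.0
for i in range(n):
    x_train = x[:i] + x[i + 1:]
    y_train = y[:i] + y[i + 1:]
    c0, c1 = fit_line(x_train, y_train)
    press += (y[i] - (c0 + c1 * x[i])) ** 2

r_sq_pred = 1.0 - press / ss_total

print(f"R-Sq        = {r_sq:.4f}")
print(f"R-Sq (adj)  = {r_sq_adj:.4f}")
print(f"R-Sq (pred) = {r_sq_pred:.4f}")
```

Because each left-out point is predicted by a model that never saw it, PRESS cannot be smaller than the ordinary error sum of squares, so R-Sq (pred) never exceeds R-Sq. A large gap between the two is the overfitting warning described above.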

3) One of the assumptions for regression is that the residuals (unexplained variation) have i) independence, ii) constant variance, and iii) normality. I'll explain this in reverse.

iii) The residuals should be normally distributed with a mean of zero. If they are not, there are a number of causes, but the most likely is that there is not a linear relationship. There may also be outliers in the data.

ii) There should be constant variance across the range of measurement. Sometimes the variance will increase as the measurement gets bigger. This usually occurs when the theoretical relationship (regression line) MUST pass through the zero point. If you are far from zero, this won't make much difference, but if you are close to zero it will.

i) The residuals should be independent. Test this by plotting the residuals in time sequence. This protects you from lurking variables and autocorrelated data.
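The three checks can also be roughed out numerically rather than only by eye. The sketch below is an assumption-laden illustration, not what Minitab does internally: it simulates a straight line with independent Normal noise, then computes a Durbin-Watson statistic for independence, a high/low variance ratio for constant variance, and skewness and excess kurtosis as crude normality indicators. The names `variance_ratio`, `skewness` and `excess_kurtosis` are my own.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Made-up data in run order: a straight line plus independent Normal noise
n = 60
x = [0.5 * i for i in range(n)]
y = [3.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Least-squares fit and residuals
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

mean_resid = sum(residuals) / n  # zero (up to rounding) when a constant is fitted

# i) Independence: Durbin-Watson statistic on residuals in time order.
#    It lies between 0 and 4; values near 2 suggest little lag-1 autocorrelation.
ss_resid = sum(e ** 2 for e in residuals)
dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_resid

# ii) Constant variance: residual variance for the upper half of the fitted
#     values divided by that for the lower half; a ratio far from 1 is a warning.
def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

order = sorted(range(n), key=lambda i: fitted[i])
low = [residuals[i] for i in order[: n // 2]]
high = [residuals[i] for i in order[n // 2:]]
variance_ratio = variance(high) / variance(low)

# iii) Normality: skewness and excess kurtosis are both near 0 for Normal data
m2 = sum((e - mean_resid) ** 2 for e in residuals) / n
m3 = sum((e - mean_resid) ** 3 for e in residuals) / n
m4 = sum((e - mean_resid) ** 4 for e in residuals) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3.0

print(f"mean residual   = {mean_resid:.2e}")
print(f"Durbin-Watson   = {dw:.2f}")
print(f"variance ratio  = {variance_ratio:.2f}")
print(f"skewness        = {skewness:.2f}")
print(f"excess kurtosis = {excess_kurtosis:.2f}")
```

These numbers are a supplement to the plots, not a replacement: a residual-versus-fit chart and a time-order chart will usually show a problem more clearly than any single statistic.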
 
Allattar

#9
It is worth pointing out that there is no difference in the way ANOVA is calculated in Regression or ANOVA.

We are finding sum of squared distances.

How it calculates the sum of squares can be different depending on what you have asked it to do.

In a simple regression you are fitting a linear model to the results. The sum of squares of the error is found from the squared distances of the observed values to the least-squares fitted line.

In an ANOVA the predictor is usually identified as a categorical value. The Sum of squares of the error is found from the squared distances of the observed values to the mean of each level.

If you have a predictor of just two levels, -1 and +1, then the difference between the two methods will be small. In regression a fitted line is generated based on the smallest sum of squares of the error to the line. In ANOVA the model will use the mean of the two levels and find the sum of squares of the error as the squared distance from the means.
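The two-level case is easy to check directly. In this sketch (invented data, four observations at each of the levels -1 and +1) the error sum of squares is computed both ways: from a least-squares line, and from the mean of each level as in a one-way ANOVA.

```python
# Sketch: error sum of squares for a two-level predictor, computed two ways.
# The response values are invented for illustration.
x = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
y = [9.8, 10.4, 10.1, 9.5, 12.2, 11.7, 12.6, 12.1]

n = len(x)

# Regression: distances from the least-squares fitted line
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse_regression = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# ANOVA: distances from the mean of each level
levels = sorted(set(x))
level_means = {}
for level in levels:
    values = [yi for xi, yi in zip(x, y) if xi == level]
    level_means[level] = sum(values) / len(values)
sse_anova = sum((yi - level_means[xi]) ** 2 for xi, yi in zip(x, y))

print(f"SS Error from the fitted line = {sse_regression:.6f}")
print(f"SS Error from the level means = {sse_anova:.6f}")
```

With only two levels, a least-squares line passes through both level means, so in this sketch the two error sums of squares agree to rounding. With three or more levels the line generally cannot hit every mean, and the two values separate.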

I'll stop there before I end up going into degrees of freedom, mean square errors, random or nested factors.
 