# Interpreting Linear Regression Results from Minitab


#### stats_beginner

Hi,

I am trying to apply linear regression to a limited data set relating the number of injuries to the number of fatalities in building collapse. There are further variables (e.g. building type, extent of collapse etc) that will influence this relationship, but due to a lack of information I am not in a position to account for these yet.

I have performed a linear regression using Minitab on the full data set and got the following relationship (the Minitab results and related plots are in Regression I.pdf attached):

Injuries = 14.6 + 1.20 Fatalities

However, the residual plots don't appear to meet the normality assumptions (I think), so I have tried repeating the regression with observations 4 and 11 omitted (which were identified as having large standardized residuals). This resulted in the following adjusted equation (with Minitab results and related plots in Regression II.pdf attached):

Now the residual plots appear to be more 'normal' but two observations are still highlighted as having large standard residuals. This also increased the R-sq value from 67.1% to 71.6%.

Then, as a separate attempt to improve the results, I tried deleting observations 2 and 4 (both had a much larger number of fatalities than the rest of the data). This resulted in the following adjusted equation (with Minitab results and related plots in Regression III.pdf attached):

In this case the residual plots are better than for case I but not as good as for case II, while the R-sq value is reduced to 50.2% (the lowest of the three values).

From this I would consider regression II to be the best but I am wondering if I am missing something/interpreting the results incorrectly?

I know the data isn't great but I want to have an approximate relationship which I can improve as more data becomes available.

I'd appreciate any advice on this as my stats experience is extremely limited!


#### Attachments

• Regression I.pdf
147.6 KB · Views: 136
• Regression II.pdf
149.1 KB · Views: 111
• Regression III.pdf
148.4 KB · Views: 114

#### Jen Kirley

##### Quality and Auditing Expert
Re: Interpreting linear regression results

Welcome to the Cove!

Are you a student? If so, is this an assignment from a textbook? If so, which one and which edition?


#### NumberCruncher

Re: Interpreting linear regression results

Hi Stats_Beginner

You are right, the data are not Normally distributed. You need to transform the data. There isn't "THE" correct transformation. A bit of trial and error suggests that square root of both sets of data works well.

I don't have Minitab, but I can guarantee that there will be a data transformation or data calculation menu that will allow you to create two new columns containing the square roots of the data. You can then calculate a new regression line on the transformed columns, which will fit far better.
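For anyone following along outside Minitab, the square-root approach can be sketched in plain Python. This is only an illustration under stated assumptions: the counts below are invented placeholders (not the data in the attached PDFs), and `fit_line` and `predict_injuries` are hypothetical helpers written for this example.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Invented placeholder counts (NOT the data from the attached PDFs).
fatalities = [1, 2, 4, 5, 8, 12, 20, 35, 60]
injuries = [5, 9, 14, 12, 30, 25, 48, 70, 95]

# Fit on the raw scale, then on the square-root scale of both variables.
a_raw, b_raw, r2_raw = fit_line(fatalities, injuries)
sqrt_f = [math.sqrt(v) for v in fatalities]
sqrt_i = [math.sqrt(v) for v in injuries]
a_sq, b_sq, r2_sq = fit_line(sqrt_f, sqrt_i)

def predict_injuries(f):
    """The model predicts sqrt(injuries), so square the result to back-transform."""
    return (a_sq + b_sq * math.sqrt(f)) ** 2

print(f"raw:  injuries = {a_raw:.2f} + {b_raw:.2f} * fatalities, R-sq = {r2_raw:.3f}")
print(f"sqrt: sqrt(injuries) = {a_sq:.2f} + {b_sq:.2f} * sqrt(fatalities), R-sq = {r2_sq:.3f}")
print(f"predicted injuries for 10 fatalities: {predict_injuries(10):.1f}")
```

Note that the two R-sq values are measured on different scales, so the residual plots are a better guide than a direct comparison of the percentages.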

NC


#### stats_beginner

Re: Interpreting linear regression results

Jennifer, I am not a student. I'm an engineer and this is a set of data I've collected. However, I've forgotten all of the stats I did in college.

NumberCruncher, ok so first problem seems to be that the data is not normally distributed. I have taken your advice and taken the square root of the data and reran the regression. This improves the residual plots considerably but the histogram still doesn't have the bell-curve shape I would expect to need (I've attached the results again). Can I say the data is normal with this histogram?

I'm going to try a few different transformations in the meantime but assuming the square root of the data gives the 'most normal' plots how should I proceed? Is it correct to delete different values until the histogram improves?

Thanks


#### Attachments

• Regression IV.pdf
146.6 KB · Views: 140

#### Miner

##### Forum Moderator
Re: Interpreting linear regression results

The normal probability plot looks very normal. It is a better test than the histogram. If you go into Options, you can set the Residuals plot to display the Anderson-Darling test results.
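To make the Anderson-Darling idea concrete, here is a rough sketch of the statistic computed by hand with the Python standard library. It assumes the mean and standard deviation are estimated from the sample, uses invented residual values rather than the ones in the attachment, and `anderson_darling_normal` is a hypothetical helper name. It does not reproduce Minitab's p-value.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """Return (A2, adjusted A2) for a normality check with estimated mean and SD."""
    n = len(values)
    m, s = mean(values), stdev(values)
    z = sorted((v - m) / s for v in values)
    cdf = [NormalDist().cdf(zi) for zi in z]
    # Weighted sum over the ordered sample, pairing each point with its mirror.
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment commonly used when both parameters are estimated.
    a2_adj = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_adj

# Invented placeholder residuals: one roughly symmetric set, one heavily skewed set.
symmetric = [-1.8, -1.2, -0.9, -0.6, -0.4, -0.2, -0.1, 0.0, 0.1,
             0.3, 0.4, 0.6, 0.8, 1.0, 1.3, 1.7, 2.1]
skewed = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6,
          0.8, 1.0, 1.5, 2.5, 4.0, 7.0, 12.0, 30.0]

a2_sym, a2_sym_adj = anderson_darling_normal(symmetric)
a2_skew, a2_skew_adj = anderson_darling_normal(skewed)
print(f"symmetric: A2 = {a2_sym:.3f}, adjusted = {a2_sym_adj:.3f}")
print(f"skewed:    A2 = {a2_skew:.3f}, adjusted = {a2_skew_adj:.3f}")
```

A smaller statistic means the sample sits closer to a normal distribution. The commonly tabulated 5% critical value for the adjusted statistic is about 0.752, so values well below that give no strong evidence against normality.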


#### NumberCruncher

Re: Interpreting linear regression results

Hi stats_beginner

Actually, I don't think the results look at all bad. The normal probability plot is almost perfect and the residuals vs fitted value shows no obvious structure.

The histogram doesn't look too bad either. Remember, you only have 17 data points. That's not a lot to make a histogram out of.

Personally, I would go with the double square root transformation for the moment, but don't stick too dogmatically to it. Collect more data and see if the model still fits.

No, the fit isn't perfect, but you have a very messy data set (sorry about the rather sick pun, it wasn't intentional, honest!). The assumption behind a model is that one variable can be used to generate another. In the hard sciences, this is a very good assumption. The experiments can be controlled and the physical model assumes cause and effect.

With happenstance data such as this, there is no physical reason why 4 people being killed should cause 14 people to be injured. It's actually the incident that causes both, but the mathematical model ignores that.

As I am ignorant of this kind of study, I am slightly surprised that you get such a good fit with so few data.

No, you shouldn't keep deleting points to improve the fit.
1) Without a valid statistical reason, this is bad practice.
2) The outliers may suggest something important. If a particular incident produced an unusually low or high number of injuries or fatalities, perhaps there is something in the detail of the incident report that tells you why this happened.
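As a rough illustration of flagging points for investigation rather than deleting them, the sketch below computes standardized residuals for a simple linear fit and lists the observations whose absolute value exceeds 2 (a common rule of thumb). The data are invented, with one deliberately unusual point, and `standardized_residuals` is a hypothetical helper name.

```python
import math

def standardized_residuals(x, y):
    """Standardized residuals for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    result = []
    for xi, e in zip(x, resid):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        result.append(e / (s * math.sqrt(1.0 - leverage)))
    return result

# Invented placeholder data: a straight line with one unusual incident (5th point).
fatalities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
injuries = [3, 5, 7, 9, 30, 13, 15, 17, 19, 21]

std_res = standardized_residuals(fatalities, injuries)
# Report 1-based observation numbers, as a statistics package would list them.
flagged = [i + 1 for i, r in enumerate(std_res) if abs(r) > 2.0]
print("observations to investigate:", flagged)
```

The flagged list is a prompt to go back to the incident reports for those observations, not a list of points to delete.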

NC


#### stats_beginner

Re: Interpreting linear regression results

Super, thank you so much for your help.

NumberCruncher, I am delighted to hear you're surprised at the goodness-of-fit. This is a first attempt at developing some sort of model to predict the number of injuries that might be expected following a building collapse, something it's quite difficult to get data for. I'm hoping to collect more information and eventually account for differences in the types of building, the severity of injuries observed, etc. As a general first guess the regression equation makes sense; my concern was whether the data were appropriate to allow me to use this equation.


#### bbarbee

Re: Interpreting linear regression results

My 2c is this: I'm not sure the X and Y are correctly stated. Don't injuries cause fatalities? I'm not a doctor, but...

And deleting data in search of normality is generally A Bad Thing to do unless you have a really good reason to believe that the particular points you don't like really don't belong in that model. (Maybe there was a convention of hardy yet injury prone people in the building, or perhaps they were all very old, etc)

More data will certainly help, but you can't knock down buildings to get it. The sqrt transform certainly makes the R^2 better, but you don't want to over-fit your data, either.


#### NumberCruncher

Re: Interpreting linear regression results

bbarbee wrote: "My 2c is this: I'm not sure the X and Y are correctly stated. Don't injuries cause fatalities? I'm not a doctor, but..."

Hi bbarbee.

Sorry to contradict you but from a reporting point of view, there is no reason why injuries should cause fatalities. If you are injured and live you are recorded as injured. If you are injured and subsequently die, you are removed from the injured category and placed in the fatalities category.

It's safe to assume that just about everyone who dies in an incident, dies as a result of injuries, excepting the one person who died of an entirely unrelated heart attack at the exact moment of building collapse.

Let's take a simple numerical example.

There are 10 people involved in an incident. 10 are injured and 3 die. So that's 13 people then.

???

I stated in my previous post:

"No, the fit isn't perfect, but you have a very messy data set (sorry about the rather sick pun, it wasn't intentional, honest!). The assumption behind a model is that one variable can be used to generate another. In the hard sciences, this is a very good assumption. The experiments can be controlled and the physical model assumes cause and effect.

With happenstance data such as this, there is no physical reason why 4 people being killed should cause 14 people to be injured. It's actually the incident that causes both, but the mathematical model ignores that."

In physical reality, the building collapse causes both the injuries and the fatalities. The plot is simply the two dimensional projection of a three dimensional data set. The third dimension is something that I will call "Seriousness of building collapse". I have absolutely no idea what that is or how you measure it.
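A small simulation can illustrate the point about a hidden third variable. Everything here is synthetic: `severity` is a made-up stand-in for "Seriousness of building collapse", the coefficients and noise levels are arbitrary assumptions, and the values are continuous stand-ins rather than real counts.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)

# Hidden driver: both outcomes depend on severity, not on each other.
severity = [random.uniform(0, 10) for _ in range(500)]
fatalities = [s + random.gauss(0, 1) for s in severity]
injuries = [3 * s + random.gauss(0, 2) for s in severity]

# The two outcomes are strongly correlated even though neither causes the other.
r_observed = correlation(fatalities, injuries)

# Remove the part explained by severity; what remains is unrelated noise.
noise_f = [f - s for f, s in zip(fatalities, severity)]
noise_i = [i - 3 * s for i, s in zip(injuries, severity)]
r_after = correlation(noise_f, noise_i)

print(f"correlation of fatalities and injuries: {r_observed:.2f}")
print(f"correlation once severity is accounted for: {r_after:.2f}")
```

The regression on fatalities works as a predictor because fatalities carry information about severity, which is the same reason the fit can still be useful without any causal link between the two counts.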

Yes, you can have overlapping categories. The classic example is infant mortality and child mortality. Child mortality (under 5 years) includes infant mortality (under 1 year).

This is a different point to the one above which is about correlation and cause.

NC

#### Jen Kirley

##### Quality and Auditing Expert
My mind was going, I think, to the same place as bbarbee.

I am wondering what is the relationship between x and y. In regression analysis we are looking for a trended effect of something, or trying to predict something given data we know. Are you using regression analysis for predicting fatalities based on injuries?

I have found a site from Princeton called Interpreting Regression Output.

I hope this helps!
