Discrete vs. Continuous Variables and Linear Regression.

Wicked

Re: Discrete vs Continuous variables and linear regression.

To clarify, the 1-30 scale uses whole numbers only, so 2.5, for example, is not an option.
 

Miner

Forum Moderator
Leader
Admin
Ordinal logistic regression would probably be more appropriate, though you really need to provide more information for us to be certain.

In certain circumstances, integer-valued data can be treated as continuous, but in your situation linear regression could produce nonsensical predictions such as 5.36 (non-integer) or 52.2 (beyond the scale).
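To illustrate the point, here is a minimal sketch of a one-predictor least-squares fit on integer scores. The data and the helper names `fit_line` and `predict` are made up for illustration and are not taken from the poster's survey.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: whole-number scores on a 1-30 scale.
x = [1, 2, 3, 4, 5, 6]
y = [4, 9, 12, 18, 21, 27]

intercept, slope = fit_line(x, y)

def predict(value):
    """Fitted value of the regression line at the given predictor value."""
    return intercept + slope * value

inside = predict(3.5)  # within the observed x range: not a whole number
outside = predict(9)   # extrapolated: lands above the top of the scale
print(inside, outside)
```

Nothing in the fitted line knows that the response is bounded or integer-valued, which is why both kinds of prediction can appear.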
 
Wicked

I'm not quite sure what kind of information you need, so feel free to specify. I'll add some extra information below either way.

Linear regression gives me non-integer results, although they are within the scale. The data appear to be normally distributed, and the explanatory variables are binary. The response variable is an index created from 6 other variables, all of which are survey results on a scale from 1-5. The reason the scale runs from 1-30 and not 6-30 is that some people have not answered all questions.

While I realize that ordinal logistic regression might make more sense, will the linear method be way off when you consider that the data is normally distributed, the residuals seem to fit the line and the condition of homoscedasticity is fulfilled?

If I am to use ordinal logistic regression, how do I interpret the Minitab results? (Only the most important parts of the interpretation are needed.)

Thanks in advance!

-Ted
 

Statistical Steven

Statistician
Leader
Super Moderator
A few general comments:

1. If some responders did not answer all questions, then combining them via a sum is incorrect, since a responder with 6 responses all equal to 1 gets the same score as a responder with only 2 responses of 3. This makes for poor inference.

2. Why combine the variables? Use multiple regression to predict the impact of each variable independently.

3. As has been stated in your other posts, is the difference between 1 and 2 the same as the difference between 4 and 5 on the scale?
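The collision described in point 1 can be checked with a few lines. This is only a sketch: the two respondents are invented, and `None` is an assumed way of marking an unanswered question.

```python
# None marks an unanswered question; each answered item is on the 1-5 scale.
respondent_a = [1, 1, 1, 1, 1, 1]              # answered all six, lowest agreement
respondent_b = [3, 3, None, None, None, None]  # answered only two, middling agreement

def sum_index(answers):
    """Sum of the answered items, silently skipping unanswered ones."""
    return sum(a for a in answers if a is not None)

score_a = sum_index(respondent_a)
score_b = sum_index(respondent_b)

# Both respondents end up with the same index value despite very
# different answering patterns.
same_score = score_a == score_b
print(score_a, score_b, same_score)
```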

Just trying to give you some food for thought.
 
Wicked

Food for thought is always welcome.
Comments on your points:
1) I realize that including the respondents who have not answered one or more questions might make for poor inference, but they account for only about 0.1% of the total respondents, so I thought it wouldn't matter much.

2) Combining the variables into an index was a prerequisite in the assignment. I used factor analysis and Cronbach's alpha to determine which variables to include in the index.

3) The scale goes from 1-5, but represents to what degree the respondents agree with a number of statements, hence 1 equals "does not agree at all" and 5 is "totally agrees". This means we cannot state that the difference between 1 and 2 is the same as between 3 and 4, or that 4 is twice as good as 2.

I'm leaning towards binary logistic regression, as I'm looking to find IF the explanatory variables have an impact on my response variable (which I would then split in 2 to make it binary), and not to what degree the explanatory variables have an impact.
 
Wicked

An additional question: how do I remove rows of data that contain respondents who have not answered one or more questions?
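The thread does not give the Minitab steps for this. Outside Minitab, the general idea is to keep only the rows where every item has a value. A minimal Python sketch follows, with a hypothetical row layout and `None` assumed to mark a missing answer.

```python
# Hypothetical survey rows; None marks an unanswered question.
rows = [
    {"id": 1, "answers": [4, 5, 3, 4, 4, 5]},
    {"id": 2, "answers": [2, None, 3, 1, 2, 2]},
    {"id": 3, "answers": [5, 5, 4, 5, 5, 4]},
]

# Keep a row only if none of its six answers is missing.
complete_rows = [r for r in rows if all(a is not None for a in r["answers"])]

kept_ids = [r["id"] for r in complete_rows]
print(kept_ids)
```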
 

Statistical Steven

Statistician
Leader
Super Moderator
Food for thought is always welcome.
Comments on your points:
1) I realize that including the respondents who have not answered one or more questions might make for poor inference, but they account for only about 0.1% of the total respondents, so I thought it wouldn't matter much.

It does not matter much. Just realize that you have that issue in the data.

2) Combining the variables into an index was a prerequisite in the assignment. I used factor analysis and Cronbach's alpha to determine which variables to include in the index.

Not questioning the index requirement. Try using the average response as the response. This will account for missing data.
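A rough sketch of the average-response idea, assuming `None` marks an unanswered question. The function name `mean_index` and the two respondents are invented for illustration.

```python
def mean_index(answers):
    """Average of the answered items; None if nothing was answered."""
    answered = [a for a in answers if a is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)

# Hypothetical respondents whose summed scores would both be 6.
respondent_a = [1, 1, 1, 1, 1, 1]
respondent_b = [3, 3, None, None, None, None]

index_a = mean_index(respondent_a)
index_b = mean_index(respondent_b)
print(index_a, index_b)
```

The average stays on the original 1-5 scale regardless of how many items were answered, so these two respondents no longer share a score.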

3) The scale goes from 1-5, but represents to what degree the respondents agree with a number of statements, hence 1 equals "does not agree at all" and 5 is "totally agrees". This means we cannot state that the difference between 1 and 2 is the same as between 3 and 4, or that 4 is twice as good as 2.

I'm leaning towards binary logistic regression, as I'm looking to find IF the explanatory variables have an impact on my response variable (which I would then split in 2 to make it binary), and not to what degree the explanatory variables have an impact.


You can use ANOVA, treating the independent variables as categorical. This will tell you whether any of the levels have an impact on the response.
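For a single binary explanatory variable, the ANOVA suggestion comes down to comparing between-group and within-group variation. The sketch below computes only the one-way F statistic, on invented index scores; `one_way_anova_f` is a hypothetical helper, not a Minitab or library function.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical index scores split by one binary explanatory variable.
group_no = [12, 15, 14, 13, 16]
group_yes = [18, 20, 19, 22, 21]

f_stat = one_way_anova_f([group_no, group_yes])
print(f_stat)
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom suggests the group means differ. The p-value lookup is left out here.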

See my thoughts inline above (shown in blue in the original post).
 