Choosing a Statistical Test for dissertation results!

Luke Wilkinson

Hi all

You’ll have to forgive my rubbish stats knowledge and the length of this post, but I would be very grateful for any help.

I’m conducting a study on the urban heat island effect of my hometown for my undergrad dissertation. To give a bit of background, city centres and urbanised places are typically warmer than suburbs/rural zones due to many factors such as greater population densities (heat through appliance use and metabolism), a greater proportion of heat-retentive materials (asphalt etc.), and decreased vegetation (less heat dissipated through evapotranspiration).

I’m at the analysis stage now. I’ve already established a relationship between temperature and distance from the town centre, which was not a problem because both are continuous variables. The second part of my analysis examines how the causal variables for which distance acts as a surrogate (i.e. vegetation, population density, land use, building height, building density etc.) are related to temperature.

The problem is that I recorded these variables in the field via qualitative methods. An example question from my data collection booklet is “How dense are the buildings in the area?” and the pre-determined responses were “dense”, “intermediate”, “relatively sparse”, “sparse” and “no buildings”. So I captured data in this way for many sites whilst simultaneously recording temperature.

When I finished data collection I wanted to establish statistical relationships between the causal variables and temperature in Excel. I decided that I’d have to give the qualitative responses numerical values. Sticking with the example of building density above, a typical attribution might have been this: no buildings (1), sparse (2), relatively sparse (3), intermediate (4), and dense (5). I’d then list these numbers next to the temperatures recorded at the same sites. Day 3 looked like this:

Density    Temperature
5          19.46
4          18.66
4          17.33
2          15.34
1          16.03

When I handed in my draft analysis I had done loads of correlations, scatter plots and regressions on data like that shown above. My advisor wasn’t sure this was right, though, and questioned whether it could be done when one of the variables was categorical. This brings me, finally, to my questions. Are my causal variables (e.g. building density) definitely categorical when presented in this way, or is there a continuum between the categories? If they are categorical, what would be the best tests to use to establish the strength and significance of their relationship with temperature? I’ve read a bit about dummy variables, but that seems very complicated when there are so many categories within the one variable. Could I use a t-test instead, or would I have to change to a binary code even with that?
 

Miner

Forum Moderator
Leader
Admin
The manner in which you have set these up has created ordinal variables. I recommend trying ordinal logistic regression.
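
For readers who want a quick check that respects the ordering of the categories, here is a minimal sketch of Spearman's rank correlation. This is a swapped-in, lighter-weight technique, not the ordinal logistic regression recommended above. It uses only the Day 3 figures from the original post, so five points are far too few to conclude anything. The function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Day 3 data from the original post (density code, temperature).
density = [5, 4, 4, 2, 1]
temperature = [19.46, 18.66, 17.33, 15.34, 16.03]

rho = spearman_rho(density, temperature)
print(f"Spearman rho = {rho:.3f}")
```

Because it works on ranks, the result does not depend on whether "dense" is coded as 5 or 50, only on the order of the categories. A significance test for rho is left to a statistics package.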

Note: One problem that I see with studies of this type is the use of excessively large sample sizes. When sample sizes are extremely large, even a trivially small difference will test as statistically significant. The correct approach is to select the test that will be used prior to data collection. The delta, or the size of a difference of practical significance, should also be determined prior to data collection. This is then used to determine the appropriate sample size. That is why you see so many studies that say eating this or that will increase your chance of cancer by 1%, to which you yawn and turn the page.
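
The step from a chosen delta to a sample size can be sketched as follows. This is a rough illustration only: it assumes a simple comparison of two group means, a normal approximation, and a known standard deviation, none of which come from the thread. The function name and the example values of delta and sigma are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison.

    delta: smallest difference in means that matters in practice
    sigma: assumed standard deviation within each group (an assumption)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: a difference of 1.0 temperature unit matters,
# and temperatures within a category vary with a standard deviation of 1.0.
print(n_per_group(delta=1.0, sigma=1.0))

# Halving the delta of interest roughly quadruples the required sample size.
print(n_per_group(delta=0.5, sigma=1.0))
```

The point of the sketch is the direction of the reasoning: the practically meaningful delta is fixed first, and the sample size follows from it, not the other way round.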
 

NumberCruncher

Hi Luke

There is a problem with carrying out simple pairwise comparisons with data like this.

Multicollinearity.

Scary word, simple(ish) meaning.

You have plotted the relationship between temperature and distance from urban centre. Good.

Next you plot a relationship between temperature and building density. Good, except...

Doesn't building density, almost by definition, depend on the distance from the urban centre?

So what is your relationship between temperature and building density telling you? Is it that temperature goes down with building density? Or is the relationship telling you that the further from the centre you are, the lower the temperature AND the lower the building density?

I'll take another example. For a moment, suppose that you plotted a correlation between the density of television aerials and temperature. I strongly suspect that you would find a positive correlation. Why? Because the number of TV aerials is about 1 per building, and the building density goes down with distance from the urban centre. However, the temperature also goes down with distance from the urban centre. If you ignore that fact, you would wrongly conclude that TV aerial density affects the temperature.

You need to check that you are not just plotting the same thing twice, but with different names on the 'independent' axis (Building density or distance from urban centre).

NC
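
One way to check whether building density adds anything beyond distance is a partial correlation: the correlation between temperature and density after the linear effect of distance has been removed from both. The sketch below uses the Day 3 density codes and temperatures from the original post, but the distance values are invented purely for illustration, since the thread does not give them. It also treats the ordinal codes as if they were evenly spaced numbers, which is itself an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Day 3 data from the original post.
density = [5, 4, 4, 2, 1]
temperature = [19.46, 18.66, 17.33, 15.34, 16.03]

# HYPOTHETICAL distances from the town centre (km); not from the thread.
distance_km = [0.2, 0.8, 1.5, 3.0, 4.2]

print("raw correlation:    ", round(pearson(density, temperature), 3))
print("controlling for distance:", round(partial_corr(density, temperature, distance_km), 3))
```

If the raw correlation is strong but the partial correlation shrinks towards zero, density and distance are largely telling the same story. If density and distance are almost perfectly correlated, the denominator approaches zero and the result becomes unstable, which is the multicollinearity problem in numerical form.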
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
ANOVA, using the 5 categories as 5 "treatments", would probably work statistically, though you would not be able to take advantage of paired testing.
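
As a rough illustration of what a one-way ANOVA computes, the sketch below groups the Day 3 temperatures from the original post by density code and calculates the F statistic by hand. With only five readings, one category missing, and a single degree of freedom within groups, this is a toy calculation and not an analysis. The function name is illustrative, and the p-value is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of readings around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Day 3 temperatures grouped by density code (no site was coded 3 that day).
day3_groups = {
    5: [19.46],
    4: [18.66, 17.33],
    2: [15.34],
    1: [16.03],
}

f_stat, df_b, df_w = one_way_anova_f(day3_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Unlike a correlation on the numeric codes, this treats the categories as unordered labels, so it makes no assumption about the spacing between "sparse" and "dense", but it also ignores their ordering.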

Another reasonable option is to take, within each day's set of data, the difference (delta) between the actual temperature for each treatment and the average temperature across the 5 treatments, and to plot those deltas. Then analyze the deltas across the days using ANOVA.
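
One possible reading of this suggestion is sketched below: within each day, subtract that day's mean temperature from every reading, so that warm days and cool days become comparable before the deltas are pooled. Only Day 3 appears in the thread, so it is the only day in the example. The data layout and the function name are assumptions made for illustration.

```python
def deviations_from_daily_mean(days):
    """Return (day, density_code, delta) tuples, where delta is the reading
    minus the mean temperature of all readings taken on the same day."""
    result = []
    for day, readings in days.items():
        daily_mean = sum(temp for _, temp in readings) / len(readings)
        for density_code, temp in readings:
            result.append((day, density_code, temp - daily_mean))
    return result


# Day 3 data from the original post as (density code, temperature) pairs.
# Further days would be added as extra dictionary entries.
days = {
    "Day 3": [(5, 19.46), (4, 18.66), (4, 17.33), (2, 15.34), (1, 16.03)],
}

deviations = deviations_from_daily_mean(days)
for day, density_code, delta in deviations:
    print(f"{day}  density={density_code}  delta={delta:+.3f}")
```

The deltas, grouped by density code across all days, would then be the input to the ANOVA in place of the raw temperatures.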

Still, be aware of the warnings in the previous two postings.
 

Luke Wilkinson

Thanks a million guys, you've saved me from statistical purgatory. I've taken all these comments on board and will get back to the drawing board (SPSS) shortly.

Luke
 