Luke Wilkinson
Hi all
You’ll have to forgive my rubbish stats knowledge and the length of this post, but I would be very grateful for any help.
I’m conducting a study on the urban heat island effect of my hometown for my undergrad dissertation. To give a bit of background, city centres and other urbanised areas are typically warmer than suburbs and rural zones because of several factors, such as greater population density (heat from appliance use and human metabolism), a greater proportion of heat-retentive materials (asphalt, etc.) and less vegetation (so less heat is dissipated through evapotranspiration).
I’m at the analysis stage now. I’ve already established a relationship between temperature and distance from the town centre, which was not a problem because both are continuous variables. The second part of my analysis examines how the causal variables for which distance acts as a surrogate (e.g. vegetation, population density, land use, building height, building density) relate to temperature.
The problem is that I recorded these variables in the field using qualitative categories. An example question from my data collection booklet is “How dense are the buildings in the area?”, with the pre-determined responses “dense”, “intermediate”, “relatively sparse”, “sparse” and “no buildings”. I recorded answers like this at many sites while simultaneously measuring temperature.
When I finished data collection, I wanted to establish statistical relationships between the causal variables and temperature in Excel, so I decided to give the qualitative responses numerical values. Sticking with the building density example, a typical coding was: no buildings (1), sparse (2), relatively sparse (3), intermediate (4) and dense (5). I then listed these codes next to the temperatures recorded at the same sites. Day 3 looked like this:
Building density (code)   Temperature (°C)
5                         19.46
4                         18.66
4                         17.33
2                         15.34
1                         16.03
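To make the coding step concrete, here is a minimal sketch in Python (an assumption; the analysis itself was done in Excel) that applies the 1 to 5 coding described above to the Day 3 rows and computes two correlations. Pearson's r is what Excel's CORREL function returns, and it treats the codes as equally spaced numbers. Spearman's rank correlation, shown here as one commonly used option for ordinal data rather than as the required test, uses only the order of the codes. The helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical, not a library API, and the site labels are inferred from the codes in the table.

```python
# Hypothetical sketch: ordinal coding of the building-density answers and two
# correlations with temperature, using the Day 3 values from the table above.

# Ordinal codes as described in the post (1 = no buildings ... 5 = dense).
DENSITY_CODES = {
    "no buildings": 1,
    "sparse": 2,
    "relatively sparse": 3,
    "intermediate": 4,
    "dense": 5,
}


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's r: treats the inputs as interval-scale numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Day 3 sites, labelled from their codes (5, 4, 4, 2, 1).
labels = ["dense", "intermediate", "intermediate", "sparse", "no buildings"]
density = [DENSITY_CODES[label] for label in labels]
temperature = [19.46, 18.66, 17.33, 15.34, 16.03]

print(density)                                   # → [5, 4, 4, 2, 1]
print(round(pearson(density, temperature), 3))   # codes treated as numbers
print(round(spearman(density, temperature), 3))  # → 0.872
```

Because Spearman's rho depends only on the ordering, it does not assume that the step from "sparse" to "relatively sparse" is the same size as the step from "intermediate" to "dense"; the two intermediate sites share the average rank 3.5.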
When I handed in my draft analysis, I had done loads of correlations, scatter plots and regressions on data like that shown above. My advisor wasn’t sure this was valid, though, and questioned whether it could be done when one of the variables was categorical.
This brings me, finally, to my questions. Are my causal variables (e.g. building density) definitely categorical when coded this way, or is there a continuum between the categories? If they are categorical, what would be the best tests to establish the strength and significance of their relationship with temperature? I’ve read a bit about dummy variables, but that seems very complicated when a single variable has so many categories. Could I use a t-test instead, or would I still have to recode the data as binary variables for that?
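For reference, the dummy-variable idea mentioned above can be sketched in a few lines. This is a hypothetical illustration in Python, assuming "no buildings" as the baseline category (an arbitrary choice): each of the other four answers becomes its own 0/1 column, so a five-level variable needs only four indicator columns, not one per category.

```python
# Hypothetical sketch of dummy (indicator) coding for one categorical variable.
# A variable with k categories becomes k - 1 columns of 0/1 values; the omitted
# category (assumed here to be "no buildings") is the baseline that the other
# categories are compared against in a regression.

CATEGORIES = ["no buildings", "sparse", "relatively sparse", "intermediate", "dense"]
BASELINE = "no buildings"  # arbitrary choice; any category could be the baseline


def dummy_code(label, categories=CATEGORIES, baseline=BASELINE):
    """Return {column_name: 0 or 1} for every non-baseline category."""
    if label not in categories:
        raise ValueError(f"unknown category: {label!r}")
    return {
        "is_" + c.replace(" ", "_"): int(label == c)
        for c in categories
        if c != baseline
    }


# Day 3 site answers (inferred from the codes 5, 4, 4, 2, 1) as indicator rows.
day3_labels = ["dense", "intermediate", "intermediate", "sparse", "no buildings"]
for label in day3_labels:
    print(f"{label:>14}: {dummy_code(label)}")
```

Each row has at most one 1, and the baseline row is all zeros. In a regression with an intercept, each dummy's coefficient is then that category's difference in mean temperature from the baseline category, holding any other predictors fixed.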