Correlation and Causation - Causation seems to be Ambiguous and Slippery


A Sea of Statistics
Good Day To All,

I've searched with little success for a post which lays out the method or methods for establishing causation.

Correlation is not the problem; establishing causation is.

I know that inferring causation is one of the possible abuses of correlation, as is extrapolating the correlation between inputs and outputs beyond the observed data range.

Causation seems to be a more ambiguous and slippery effort.

Any insight would be appreciated.



My blue coffee cup is an excellent bear repellent. I have been using it for a year and have never seen a bear; therefore, it must be keeping them out of my office. It also shows promise at preventing asteroid strikes, tornadoes, earthquakes and alien invasions.

You need carefully designed experiments to establish causation. Search for "Design of Experiments". Sadly, most of the world does not get it, as you can see by picking up any newspaper.
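To see why a designed experiment matters, here is a minimal simulation sketch (Python, standard library only). Everything in it is an assumption made up for illustration: a hidden factor `z` drives both the input `x` and the output `y`, and `x` itself has no effect on `y` at all. In the "observational" data the two still correlate strongly; once `x` is assigned at random, as a designed experiment would do, the correlation should largely disappear.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (r_observational, r_randomized) for a made-up process."""
    rng = random.Random(seed)

    # Hidden factor z (hypothetical): it drives the output y.
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational data: x merely follows z; y depends on z only, never on x.
    x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
    y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

    # Designed experiment: x is set at random, independent of z.
    x_rand = [rng.gauss(0, 1) for _ in range(n)]
    y_rand = [zi + rng.gauss(0, 0.5) for zi in z]

    return pearson_r(x_obs, y_obs), pearson_r(x_rand, y_rand)


if __name__ == "__main__":
    r_obs, r_rand = simulate()
    print(f"observational r = {r_obs:.2f}")
    print(f"randomized r    = {r_rand:.2f}")
```

The coffee cup is the `x` here: it tracks something else (an office with no bears nearby) and does nothing on its own. Randomizing who gets the cup is what breaks that link.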


Lol, tomvehoski!

My coffee cup is equally effective against bears but fails at repelling 'bares' at the beach, which are frequently repellent all on their own.

Seriously, aside from DOE to catch interactions, one of the good contributions over the last few years has been the GM drill deep / drill wide discipline for root cause analysis and deployment of corrective action to similar product/process lines. This is an iterative 5-Why questioning discipline (old school) broken out by the categories Predict / Prevent / Protect, with incorporation into Lessons-Learned.

The value of the contribution is that we need to understand why our systems failed in any of the categories, though for some organizations the last one should be re-titled Lessons-ReLearned, and ReLearned....

Bev D

Heretical Statistician
Staff member
Super Moderator
John Tukey laid out 3 requirements for claiming causation:
  1. consistency: more than 1 pair of independent samples (a minimum of 3 pairs)
  2. responsiveness: the cause must explain both the good and the bad result (or the high and the low), and you must be able to turn it on and off at will
  3. mechanism: a logical reason why the factor causes changes in the Y

I actually find that there are 3 large constraints to successfully determining causes and not just correlation:
  1. very few people understand the concept of independent samples, and therefore too many experiments are run by holding too many factors 'constant'. They achieve a great p value and even a great r^2 value, but the variation they see in their experimental results is small compared to the total variation, so they completely miss that the factor they 'chose' isn't the primary contributor.
  2. experimenters forget that you can never prove anything: you can only disprove things. Experiments should be structured with the intent of disproving things. From a practical standpoint, once you have disproven all but one of the suspect factors, the factor that remains must be the causal factor.
  3. the function of the process and/or product and/or system is not well understood, and therefore the causal mechanism can't be understood.
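The first constraint can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the hypothetical process is `y = 1*x1 + 5*x2 + noise`, so `x2` is the dominant contributor. If the experiment varies `x1` while holding `x2` 'constant', `x1` looks like an excellent predictor; when both factors vary, as they would in production, `x1` explains only a small share of the total variation.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def process(x1, x2, rng):
    """Hypothetical process: x2 matters five times as much as x1."""
    return 1.0 * x1 + 5.0 * x2 + rng.gauss(0, 0.1)


def compare(n=2000, seed=1):
    """Return (r2 with x2 held constant, r2 with x2 free to vary)."""
    rng = random.Random(seed)

    # Experiment: vary the 'chosen' factor x1, hold x2 at a fixed value.
    x1_exp = [rng.uniform(-1, 1) for _ in range(n)]
    y_exp = [process(a, 0.0, rng) for a in x1_exp]

    # Production: both factors vary over their full range.
    x1_full = [rng.uniform(-1, 1) for _ in range(n)]
    x2_full = [rng.uniform(-1, 1) for _ in range(n)]
    y_full = [process(a, b, rng) for a, b in zip(x1_full, x2_full)]

    return r_squared(x1_exp, y_exp), r_squared(x1_full, y_full)


if __name__ == "__main__":
    r2_exp, r2_full = compare()
    print(f"r^2 for x1, x2 held constant: {r2_exp:.2f}")
    print(f"r^2 for x1, x2 varying:       {r2_full:.2f}")
```

The experiment is not wrong about `x1` having an effect; it simply says nothing about how much of the real-world variation `x1` accounts for.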


A Sea of Statistics
To All The Excellent Respondents:

Where might I find this bear or bare repellent...thanks for the well-crafted and timely levity on a dismal February day in MI.

So, it is experience, intellect, an intimate understanding of one's process inputs/outputs and, above all, independent samples in a properly constructed experiment...voila, success!

I get the message...the keen insight and wry humor on these forums is welcome.

Finally, Tomvehoski, will this "ACME" Coffee Cup/Bear Repellent end the "Lions Legacy of Losing"? Will it take them to the big dance?



Staff member

I might also suggest that one can support causation through theory. Given some level of face validity, does it make sense that the two measured items are related?

Is there a theory you are using to support your case? I suggest starting from theory, as it will help you identify whether another (confounding) variable is at work. You might also have some moderating/mediating variables.

Also, how many data points are you working with? Just two data points will give you a perfect correlation.
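On the sample-size point: with very few data points, a high correlation is close to guaranteed even when nothing is going on. Any two distinct points lie on a straight line, so r is exactly +1 or -1. The sketch below (Python, standard library only; the function names, the 0.8 threshold and the trial count are arbitrary choices for illustration) draws pairs of completely unrelated random samples and counts how often |r| exceeds 0.8 for different sample sizes.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def share_of_high_r(n, trials=2000, threshold=0.8, seed=2):
    """Fraction of trials where unrelated samples of size n give |r| > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # x and y are drawn independently: there is no real relationship.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(xs, ys)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (2, 3, 5, 10, 30):
        print(f"n = {n:2d}: share with |r| > 0.8 = {share_of_high_r(n):.3f}")
```

The share should fall quickly as n grows, which is one reason to ask about the number of data points before reading anything into a large r.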


  1. Build your theoretical case. What are the items of interest?
  2. Gather as much relevant data as possible.
  3. Lay it out and see what you get.
  4. Repeat 1-3 as much as possible, improving your measures (and theory) as you go along.
Given the data, see if you can develop a useful, predictive tool for your purpose. And... away you go!


optomist1 said:
Finally, Tomvehoski, will this "ACME" Coffee Cup/Bear Repellent end the "Lions Legacy of Losing"? Will it take them to the big dance? Regards,
Actually, I think that one comes close to being able to prove causation. Players, coaches, stadiums, cities, uniforms, turf, anthem singers... have all been varied over the years. The one constant has been ownership, which I consider the cause of the problem. Can't prove it though.


A Sea of Statistics
Well put, Tom...we live in a "Land of Stark Contrasts": the Red Wings are arguably the best professional sports franchise, and the Lions are, with little doubt, the worst professional sports franchise of the past forty years.

The common denominator...ownership. Sell the Lions now.
