A Tale of Two Papers

I’m on my way back from the World Epi Congress in Anchorage, where causation and causal inference have been central topics of discussion. I wrote previously about a paper (Hernan and Taubman 2008) suggesting that obesity is not a cause of mortality. There is another, more recent paper, published in July of this year, suggesting, more or less, that race is not a cause of health outcomes – or at least that it’s not a cause that can feature in causal models (VanderWeele and Robinson 2014). I can’t do justice to the paper here, of course, but I think this is a fair, if crude, summary of its strategy.

This paper is an interesting comparator for the 2008 obesity paper (Hernan and Taubman 2008). It shares the idea that there is a close link between (a) what can be humanly intervened on, (b) what counterfactuals we can entertain, and (c) what causes we can meaningfully talk about. This is a radical view about causation, much stronger than any position held by any contemporary philosopher of whom I’m aware. Philosophers who do think that agency or intervention is central to the concept of causation treat the interventions as in-principle ones, not things humans could actually do.

Yet the feasibility of manipulating a variable really does seem to be a driver in this literature. In the paper on race, the authors consider which variables could form the subject of humanly possible interventions, and suggest that, rather than asking about the effect of race, we should ask what effect is left over after these factors, grouped under the umbrella of socioeconomic status, are modelled and controlled for. That sounds to me a bit like saying that we should identify the effect of being female on job candidates’ success by seeing what’s left after controlling for skirt-wearing, longer average hair length, shorter stature, higher-pitched voices, female names, and so on. In other words, it’s very strange indeed. Perhaps it could be useful in some circumstances, but it doesn’t really get us any further with the question of interest – how to quantify the health effects of race, sex, and so forth.
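For what it’s worth, here is a toy simulation of the worry (the variables and effect sizes are invented purely for illustration; nothing here is drawn from either paper). If being female affects candidates’ success only via observable markers of being female, then “controlling for” those markers drives the estimated effect of sex towards zero, even though sex is doing all the causal work upstream:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: sex influences success entirely via observable markers.
female = rng.binomial(1, 0.5, n)
skirt = female * rng.binomial(1, 0.8, n)         # a marker of being female
female_name = female * rng.binomial(1, 0.95, n)  # another marker
# Discrimination operates on the markers; sex has no separate pathway.
success = 0.5 - 0.2 * skirt - 0.3 * female_name + rng.normal(0, 0.1, n)

# Regress success on sex alone, then on sex plus the markers.
X_crude = np.column_stack([np.ones(n), female])
X_adj = np.column_stack([np.ones(n), female, skirt, female_name])
beta_crude = np.linalg.lstsq(X_crude, success, rcond=None)[0]
beta_adj = np.linalg.lstsq(X_adj, success, rcond=None)[0]

print(f"effect of 'female', unadjusted: {beta_crude[1]:+.3f}")  # large, negative
print(f"effect of 'female', adjusted:   {beta_adj[1]:+.3f}")    # roughly zero
```

The adjusted coefficient tells us only that there is no pathway that bypasses the markers; it does not tell us that sex makes no difference.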

Clearly, there are many conceptual difficulties with this line of reasoning. A good commentary published alongside the paper (Glymour and Glymour 2014) thoroughly dismantles its logic. But I think there are a number of deeper and more pervasive misunderstandings to be cleared up, misunderstandings which help explain why papers like this are being written at all. One is confusion between causation and causal inference; another is confusion between causal inference and particular methods of causal inference; and a third is a mix-up between fitting your methodological tool to your problem, and your problem to your tool.

The last point is particularly striking. What’s so interesting about these two papers (2008 & 2014) is that they seem to be trying to fit research problems to methods, rather than developing methods to solve problems – even though the latter is ostensibly what they (at least VanderWeele and Robinson 2014) are trying to do. To me, this is strongly reminiscent of Thomas Kuhn’s picture of science, according to which an “exemplary” bit of science occurs and initiates a “paradigm”: a shared set of tools for solving “puzzles”. Kuhn was primarily influenced by physics, but this way of seeing things seems quite apt to explain what is otherwise, from the outside, really quite a remarkable, even bizarre, about-turn. Age, sex, race – these are staple objects of epidemiological study as determinants of health, and they don’t fit easily into the potential outcomes paradigm. It’s fascinating to watch the subsequent negotiation. But I’m quite glad that it doesn’t look like epidemiologists are going to stop talking about these things any time soon.

References

Glymour C and Glymour MR. 2014. ‘Race and Sex Are Causes.’ Epidemiology 25(4): 488–490.

Hernan MA and Taubman SL. 2008. ‘Does obesity shorten life? The importance of well-defined interventions to answer causal questions.’ International Journal of Obesity 32: S8–S14.

VanderWeele TJ and Robinson WR. 2014. ‘On the Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.’ Epidemiology 25(4): 473–484.

Snakes, statistics, and goals for the goal-setters

Cesar Victora gave a very interesting talk earlier today concerning the International Epidemiology Association’s position paper on the UN’s Sustainable Development Goals, which are currently being drafted (to replace the Millennium Development Goals post-2015). Victora is President of the IEA, for a few more hours at least (the new President takes office this evening). Many of his points were reiterated by the next speaker, Theodor Abelin, and in questions from the floor. There were no audible voices of dissent. (The talk reflects a fuller position paper, available here.)

The point that stayed with me most from Victora’s rich talk was the importance of relating goals to appropriate measurement techniques. My own interest in epidemiology has tended to focus on efforts to identify causes (“analytic” epidemiology), since causation is a natural magnet for philosophical interest. But measurement is also a focus of philosophical interest, and Victora nicely pointed out that “descriptive” epidemiology – the business of measuring things like the maternal mortality rate – is extremely important if these Sustainable Development Goals are to be effective. A country cannot be held to a goal that cannot be measured, and it cannot fairly be held to a goal when progress towards that goal is estimated rather than measured.

For example, I was not surprised to learn that in many countries where maternal mortality is high, data on maternal mortality rates (MMRs) are scarce. What did surprise me was hearing about the calculations that some august international organisations perform in the absence of data. MMR is estimated as a function of GDP per capita, the general fertility rate, skilled birth attendance, and perhaps some other similar variables. This means that if a country goes through a recession, its estimated MMR will automatically go up. Perhaps it really will go up; but it seems strange to think of that calculation as a measurement, at least in the absence of extremely good evidence for the reliability of the estimating equation – evidence which, of course, we don’t have.
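To see how odd this is, here is a minimal sketch of how such an estimate behaves (the functional form and every coefficient are invented for illustration; I do not know the actual model these organisations use):

```python
import numpy as np

# Hypothetical log-linear estimating equation for the maternal mortality
# ratio (deaths per 100,000 live births). All coefficients are invented.
def estimated_mmr(gdp_per_capita, fertility_rate, skilled_attendance):
    log_mmr = (9.0
               - 0.5 * np.log(gdp_per_capita)   # richer => lower predicted MMR
               + 0.3 * np.log(fertility_rate)   # higher fertility => higher
               - 1.2 * skilled_attendance)      # proportion of attended births
    return np.exp(log_mmr)

before = estimated_mmr(gdp_per_capita=1500, fertility_rate=4.5, skilled_attendance=0.4)
# A recession cuts GDP per capita by 20%; nothing else changes.
after = estimated_mmr(gdp_per_capita=1200, fertility_rate=4.5, skilled_attendance=0.4)
print(f"estimated MMR before: {before:.0f}, after recession: {after:.0f}")
```

The “measured” MMR rises purely because GDP fell; no new data about mothers need arrive for the statistic to move.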

MMR is measurable, of course. The problem with MMR is simply a lack of data, and this problem afflicts a large class of conditions. As Victora put it in relation to snakebite: “Where we have snakes, we don’t have statistics, and where we have statistics, we don’t have snakes.”

However, Victora’s most penetrating critique of the SDGs concerned the setting of goals in the absence of clear ideas about how progress towards the goals will be measured. The health-related goal is as follows:

“Goal 3. Ensure healthy lives and promote well-being for all at all ages” (from the Outcome Document)

This overarching goal is broken down into 13 subgoals, some of which are very loosely specified. For instance, how are we to tell whether a country has managed to “strengthen prevention and treatment of substance abuse, including narcotic drug abuse and harmful use of alcohol”? Ironically, those goals that are most clearly specified are wildly unattainable, such as halving global deaths and injuries from road traffic accidents by 2020. Those that are not well specified present measurement challenges for epidemiologists.

This made me wonder whether a body like the IEA could itself set some “goals for the goal-setters” – that is, criteria which any health-related goal must meet if, in the professional opinion of the IEA, it is to be useful. The simplest such criterion would be that outcomes must be specified in terms of a recognised epidemiological measure (mortality, for instance). Another might be to accompany each goal with information (perhaps in a corresponding entry in an appendix) concerning the trend over the preceding period of similar length: so if the goal is to halve road traffic deaths in 15 years, or 25, information on the growth of road traffic deaths over the past 15 or 25 years might be included. Goals of this kind will always be political, but there might be agreement on a set of simple rules for setting them, and if such rules existed, they might pull epidemiologists closer in to the goal-setting process – a kind of politicking which, as one of the questioners pointed out, is not part of standard epidemiological training.

 

Potential Outcomes: Separating Insight from Ideology

I’m in Anchorage, preparing for the World Congress of Epidemiology. One of the sessions I’m speaking at is a consultation for the next edition of the Dictionary of Epidemiology. It’s a strange and delightful document, this Dictionary, since it sets out to define not only individual words but also the discipline of epidemiology as a whole. Thus it contains both mundane and metaphysical entries, from “death certificate” to “causality”. I’m billed to talk about “Defining Measures of Causal Strength”. There’s a lot to say: the current entries under causation-related terms could use some disciplining. But I’m particularly interested in orienting myself with regard to the “potential outcomes” view of causation, which seems to be the current big thing among epidemiologists.

The potential outcomes view is associated in particular with Miguel Hernan, a very smart epidemiologist at Harvard, who has a number of nice papers on it. (I hope I don’t need to say that what follows is not a personal attack: I have great respect for Hernan, and am stimulated by his work. I’m just taking his view as exemplary of the potential-outcomes approach, in the way that philosophers typically do.)

In particular I’ve been engaged in a close reading of a paper on obesity by Hernan and Taubman (2008). Their view, as expressed in that paper, is an interesting mix of pragmatism and idealism. On the one (pragmatic) hand, they argue that causal questions are often ill-formed, and thus unanswerable. There is no answer to the question “What is the effect of body-mass index (BMI) on all-cause mortality?” because the different ways to intervene on BMI may result in different effects on mortality. Diet, exercise, a combination of diet and exercise, smoking, chopping off a limb – these are all ways to reduce BMI. Until we have specified which intervention we have in mind, we cannot meaningfully quantify the contribution of BMI to mortality.

This much is highly reminiscent of contrastivist theories of causation in philosophy. Contrastivist theories take causation to consist in counterfactual dependence, but differ from counterfactual theories in taking the form of causal statements to be implicitly contrastive: not “c causes e” but “c rather than C* causes e rather than E*”, where C* and E* are classes of events that could occur in the absence of c and e respectively. Against this background, Hernan and Taubman’s point is simply that, for an epidemiological investigator, it matters what contrast class we have in mind when we seek to estimate the size of an effect. This is a good point, especially in a context where one hopes to act on a causal finding. One had better be sure that one knows, not only that there is a causal connection between a given exposure and outcome, but also what will happen if a given intervention replaces the factor under investigation. I have called the failure to appreciate this point The Causal Fallacy and linked it to easy errors in prediction (see this previous post and Broadbent 2013, 82).

But there is another, more troubling side to the view as expressed in this paper: the claim that randomized controlled trials offer protection against this error, and somehow force us to specify our interventions precisely. The argument for this claim is striking, but on reflection I fear it is specious.

Hernan and Taubman observe that an observational study might appear able to answer the question “What is the effect of BMI on all-cause mortality?” via a statistical analysis of data on BMI and mortality, while randomized controlled trials could not answer this question directly: they could only answer questions like “What is the effect of reducing BMI via dietary interventions? / via exercise? / via both?” This apparent shortcoming of RCTs is, they argue, a strength in disguise: the observational study is in fact not so informative, since it does not distinguish the effects of different ways of reducing BMI, while the RCTs do give us this information.

This argument is fallacious, however, for the following reasons.

  1. An observational study that includes the same information as the RCTs on the methods of reducing BMI would also be able to distinguish between the effects of these interventions.
  2. It is true that one could conduct an observational study which ignored the possibility that different methods of reducing BMI might themselves affect mortality. But that would be a bad study, since it would ignore the effects of known confounders. A good study would take these things into account.
  3. Conversely, it is a mistake to suppose that RCTs offer protection against this sort of error. The BMI case is a special one, precisely because there are so many ways to intervene to reduce BMI and we know that these could affect mortality. In truth, there are many ways to make any intervention. One may take a pill or a capsule or a suppository, on the equator or in the tropics, before or after a meal, and so on. Even in an RCT, the intervention is not fully specified. Rather, we simply assume that the differences don’t matter, or that if they do, they are “cancelled out” by the randomisation process.
  4. Randomized controlled trials are not controlled in the manner of true controlled experiments; rather, randomization is a surrogate for controlling. We hope that the many differences between the circumstances of each intervention in the treatment group will either have no effect or, if they do, will have effects that are randomly distributed so as not to obscure the effect of the treatment. But in principle, it is still possible that this hope is not fulfilled. At a significance threshold of 0.05, chance alone will produce a “significant” result in about one in twenty trials of an ineffective treatment (as the sketch after this list illustrates) – and perhaps more often among published RCTs, given publication bias (i.e. the fact that null results are harder to publish).
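Here is a minimal simulation of that last point, assuming a conventional two-sample t-test and a treatment with no effect whatsoever:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate many two-arm RCTs of a completely ineffective treatment and
# count how often randomisation alone yields p < 0.05.
n_trials, n_per_arm = 10_000, 100
false_positives = 0
for _ in range(n_trials):
    treated = rng.normal(0.0, 1.0, n_per_arm)  # same outcome distribution
    control = rng.normal(0.0, 1.0, n_per_arm)  # in both arms
    _, p = stats.ttest_ind(treated, control)
    false_positives += p < 0.05

print(f"proportion of 'significant' null trials: {false_positives / n_trials:.3f}")
# Prints roughly 0.05: about one null trial in twenty looks like a discovery.
```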

These four points are familiar in the philosophical literature on randomised controlled trials (see esp. Worrall 2002). The point I wish to pull out is this. On the one hand, Hernan’s emphasis on getting a well-defined contrastive question is insightful and important. But on the other hand, it is wrong to think that RCTs solve the problem. True, in an RCT you must make an intervention. But it does not follow that your intervention is well-specified: there might be all sorts of features of the particular way that you intervene that could skew the results. And conversely, record the corresponding “how it happened” information in a cohort study, and you can obtain the same sorts of discrimination between methods.
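To sketch that converse point (a hypothetical cohort with invented effect sizes, not real data): an observational dataset that records how each person reduced their BMI supports exactly the per-method comparisons for which the RCTs are praised.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 60_000

# Hypothetical cohort: everyone reduces BMI, but by different methods,
# and the methods have different effects on 10-year mortality risk.
method = rng.choice(["diet", "exercise", "smoking"], size=n)
base_risk = 0.10
effect = {"diet": -0.03, "exercise": -0.04, "smoking": +0.05}  # invented
risk = base_risk + np.vectorize(effect.get)(method)
died = rng.binomial(1, risk)

df = pd.DataFrame({"method": method, "died": died})

# A pooled analysis of 'the effect of lowering BMI' hides the differences...
print("pooled 10-year mortality:", round(df["died"].mean(), 3))
# ...but stratifying by the recorded method recovers them.
print(df.groupby("method")["died"].mean().round(3))
```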

On top of all this, the focus on the methods of individual studies obscures the most important point of all: that convincing evidence comes from a multitude of studies. Just as an RCT allows us to assume that differences between individuals are evenly distributed and thus ignorable, so a multitude of methodologically inferior studies can provide very strong evidence if their methodological shortcomings are different. This is the kind of situation Hill responded to with his guidelines (NOT criteria!) for inferring causality (Hill 1965). Similarly, ad hoc arguments against each possible alternative explanation can add up to a compelling case, as in the classic paper by Cornfield and colleagues on smoking and lung cancer (Cornfield et al 1959). The recent insights of the potential outcomes approach are valuable and important, but they augment rather than replace these familiar, older insights.

References

Broadbent A. 2013. Philosophy of Epidemiology. Basingstoke and New York: Palgrave Macmillan.

Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB and Wynder EL. 1959. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22: 173–203.

Hernan MA and Taubman SL. 2008. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International Journal of Obesity 32: S8–S14.

Hill AB. 1965. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine 58: 295–300.

Worrall J. 2002. What Evidence in Evidence-Based Medicine? Philosophy of Science 69: S316–S330.

 

Stability: an epidemiological ingredient in the realism debate?

I’m preparing a talk on stability for the New Thinking in Scientific Realism Conference that opens in Cape Town tomorrow. I introduced the notion of stability in my book, defined like this:

“A result, claim, theory, inference, or other scientific output is stable if and only if

(a) in fact, it is not soon contradicted by good scientific evidence; and

(b) given best current scientific knowledge, it would probably not be soon contradicted by good scientific evidence, if good research were done on the topic.” (Broadbent 2013, 63)

The introduction of this notion was a response to the perceived difficulties around “translating” epidemiological (or more generally biomedical) findings into good health policy. At Euroepi in Porto, 2012, I argued that translation was not the main or only difficulty for using epidemiological results, and that stability – or rather, the lack of it – was important. After all, one cannot comfortably rely on a result if one cannot be confident that the next study won’t completely contradict it, and that seems to happen pretty often in at least some areas of epidemiological investigation.

Thus the reasons for introducing the notion were thoroughly practical. More recently, though, I have been trying to tighten up the philosophical credentials of the notion, and that’s what I’m going to be talking about in Cape Town. Is stability epistemically significant? Can it be shown to be epistemically significant without collapsing into approximate truth? Can it be distinguished from approximate truth without collapsing into empirical adequacy? These are the questions I will seek to answer.

What’s interesting for me is that, as far as I can see, it’s pretty easy to answer these questions affirmatively. If I’m right about that, then this will be a nice case where studying actual science gives rise to new philosophical insights. The desire to make public health policy that will not have to be revised six months down the line is eminently practical; yet the proposal of a status that scientific hypotheses might have, distinct from truth and empirical adequacy and all the rest, is eminently abstract. If stability really is both defensible and novel, then it will illustrate the oft-repeated mantra that philosophers of science would benefit from looking more closely at science. I am personally put on guard when I hear that said, not because I disagree in principle, but because experience has taught me to suspect either lip service, or an excuse for poor philosophy. Perhaps I’m also guilty of one or both of these; I will be interested to see what Cape Town says.