ML in epidemiology: thoughts on Keil and Edwards 2018

Found a great paper (that I should already have known about, of course): Keil, A.P., Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur J Epidemiol 33, 437–440 (2018). https://doi.org/10.1007/s10654-018-0405-9

This really enjoyable article is one I would recommend to philosophers of science, medicine, and epidemiology looking for interesting leads on the interaction between epidemiology and ML – as well as to the target audience, epidemiologists.

Here are some very brief, unfiltered thoughts.

  1. Keil and Edwards discuss an approach, “super learning”, that assembles the results of a bundle of different methods and returns the best (as defined by a user-specified, but objective, measure); see the sketch after this list. In an example, they show how adding a method to that bundle can make the result worse. Philosophically, this resonates with a familiar fact about non-deductive reasoning, namely that adding information can “break” an inference, whereas adding information to the premise set of a deductive argument does not invalidate the inference, provided the additional information is consistent with what’s already there. I am not sure what to make of the resonance yet, but it reminds me of counterexamples to deductive-nomological explanation – which is like ML in being formal.
  2. They point out that errors like this are reasonably easy for humans to spot, and conclude: “We should be cautious, however, that the billions of years of evolution and experience leading up to current levels of human intelligence is not ignored in the context of advances to computing in the last 30 years.” I suppose my question would be whether all such errors are easy for humans to spot, or whether only the ones we spot are easy to spot. Here, there is a connection with the general intellectual milieu around Kahneman and Tversky’s work on biases. We are indeed honed by evolution, but this leads us to error outside of our specific domain, and statistical reasoning is one well-documented error zone for intuitive reasoning. I’m definitely not disagreeing with their scepticism about formal approaches, but I’m urging some even-handed scepticism about our intuitions. Where the machine and the human disagree, it seems to me a toss-up who, if either, is right.
  3. The assimilation of causal inference to a prediction problem is very useful, and one I’ve also explored. It deserves wider appreciation among just about everyone. What would be nice is to see more discussion of prediction under intervention, which, according to some, is categorically different from other kinds of prediction. Will machine learning prove capable of making predictions about what will happen under interventions? If so, will this yield causal knowledge as a matter of definition, or could the resulting predictions be generated in a way that is epistemically opaque? Interventionism in philosophy, causal inference in epidemiology, and the “new science of cause and effect” might just see their ideas put to empirical test, if epidemiology picks up ML approaches in coming years. An intervention-supporting predictive algorithm that does not admit of a ready causal interpretation would force a number of books to be rewritten. Of course, according to those books, it should be impossible; but the potency of a priori reasoning about causation is, to say the least, disputed.
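For readers unfamiliar with super learning, here is a minimal sketch (in Python, assuming scikit-learn and NumPy are available) of the “discrete” version of the idea mentioned in point 1: each candidate method in a user-chosen library is scored by a user-specified measure, here cross-validated mean squared error, and the best-scoring candidate is returned. The data and the candidate library are invented for illustration, and a full super learner would instead form a weighted combination of the candidates; this is a sketch of the selection mechanism, not of Keil and Edwards’s analysis.

```python
# Discrete super learner sketch: score each candidate learner by a
# user-specified measure (cross-validated MSE) and return the winner.
# Data and candidate library are illustrative, not from the paper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

library = {
    "ols": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

# Cross-validated risk for each candidate; lower is better.
risks = {
    name: -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    for name, model in library.items()
}
best = min(risks, key=risks.get)
print(risks, "-> selected:", best)

# Adding another candidate to `library` can change which learner wins,
# and the winner by this criterion need not be best on new data -- the
# kind of degradation Keil and Edwards illustrate.
```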

From Judea Pearl’s blog: report of a webinar: “Artificial Intelligence and COVID-19: A wake-up call” #epitwitter @TheBJPS

Check the entry on Pearl’s blog, which includes a write-up provided by the organisers.

Video of the event is also available.

“…regardless of government interventions [, after] around a two week exponential growth of cases (and, subsequently, deaths) some kind of break kicks in, and growth starts slowing down. The curve quickly becomes ‘sub-exponential’.”

https://unherd.com/thepost/nobel-prize-winning-scientist-the-covid-19-epidemic-was-never-exponential/

Freddie Sayers of Unherd interviews Michael Levitt (a Nobel-prize-winning non-epidemiologist) on a purely statistical observation about the pattern of the epidemic. Given that the only way we have of measuring the effectiveness of government interventions is statistical, that’s interesting. The fun stuff (epidemiological and statistical) comes in deciding whether the correlation is causal. But there has been no progress with that, in my opinion; in fact it is here that the epidemiological profession has disappointed me – it is as if epidemiology has forgotten everything it ever taught itself about causal inference. Against that background, this ought to give pause for thought.
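To make the contrast Levitt is drawing vivid, here is a purely illustrative sketch (Python; the parameters are invented by me, not fitted to any data) comparing unchecked exponential growth with a Gompertz curve, a standard example of “sub-exponential” growth whose growth rate declines over time:

```python
# Illustrative comparison of exponential vs. Gompertz ("sub-exponential")
# cumulative case curves. Parameters are made up for illustration only.
import numpy as np

days = np.arange(0, 43)
exponential = 10 * np.exp(0.25 * days)           # constant growth rate
K, b, c = 50_000, 8.0, 0.12                      # hypothetical values
gompertz = K * np.exp(-b * np.exp(-c * days))    # growth rate decays over time

for d in (7, 14, 21, 28, 35, 42):
    print(f"day {d:2d}: exponential {int(exponential[d]):>7d}  "
          f"gompertz {int(gompertz[d]):>6d}")
```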

Causal Inference: IJE Special Issue

Papers from the December 2016 special issue of IJE are now all available online. Several are open access, and I attach these.

Philosophers who want to engage with real-life science, on topics relating to causation, epidemiology, and medicine, will find these papers a great resource. So will epidemiologists and other scientists who want or need to reflect on causal inference. Most of the papers are not written by philosophers, and most do not start from standard philosophical starting points. Yet the topics are clearly philosophical. This collection would also form a great starting point for a doctoral research project in various science-studies disciplines.

Papers 1 and 2 were first available in January. Two letters were written in response (made available online around April), along with a reply, and I have included these in the list for completeness. The remaining papers were written during the course of 2016 and are now available. Many of the authors met at a Radcliffe Workshop at Harvard in December 2016. An account of that workshop may be forthcoming at some stage, but equally it may not, since not all of the participants felt it necessary to prolong the discussion or to share the outcomes of the workshop more widely. At some point I might simply write up my own account, by way of a part-philosophical, part-sociological story.

  1. ‘Causality and causal inference in epidemiology: the need for a pluralistic approach.’ Jan P Vandenbroucke, Alex Broadbent and Neil Pearce. doi: 10.1093/ije/dyv341
  2. ‘The tale wagged by the DAG: broadening the scope of causal inference and explanation for epidemiology.’ Nancy Krieger and George Davey-Smith. doi: 10.1093/ije/dyw114
    1. Letter: Tyler J. VanderWeele, Miguel A. Hernán, Eric J. Tchetgen Tchetgen, and James M. Robins. Letter to the Editor. Re: Causality and causal inference in epidemiology: the need for a pluralistic approach.
    2. Letter: Arnaud Chiolero. Letter to the Editor. Counterfactual and interventionist approach to cure risk factor epidemiology.
    3. Letter: Broadbent, A., Pearce, N., and Vandenbroucke, J. Authors’ Reply to: VanderWeele et al., Chiolero, and Schooling et al.
  3. ‘Causal inference in epidemiology: potential outcomes, pluralism and peer review.’ Douglas L Weed. doi: 10.1093/ije/dyw229
  4. ‘On Causes, Causal Inference, and Potential Outcomes.’ Tyler VanderWeele. doi: 10.1093/ije/dyw230
  5. ‘Counterfactual causation and streetlamps: what is to be done?’ James M Robins and Michael B Weissman. doi: 10.1093/ije/dyw231
  6. ‘DAGs and the restricted potential outcomes approach are tools, not theories of causation.’ Tony Blakely, John Lynch and Rebecca Bentley. doi: 10.1093/ije/dyw228
  7. ‘The formal approach to quantitative causal inference in epidemiology: misguided or misrepresented?’ Rhian M Daniel, Bianca L De Stavola and Stijn Vansteelandt. doi: 10.1093/ije/dyw227
  8. ‘Formalism or pluralism? A reply to commentaries on “Causality and causal inference in epidemiology”.’ Alex Broadbent, Jan P Vandenbroucke and Neil Pearce. doi: 10.1093/ije/dyw298
  9. ‘FACEing reality: productive tensions between our epidemiological questions, methods and mission.’ Nancy Krieger and George Davey-Smith. doi: 10.1093/ije/dyw330

Epidemiology and Law: two publications

Recently published:

Forensic Epidemiology: Principles and Practice. 2016. Freeman M and Zeegers M (eds). Elsevier. http://store.elsevier.com/Forensic-Epidemiology/isbn-9780124046443/

(I have a paper in it on causation and epidemiology.)

Also, previously online but now in print:

‘Tobacco and Epidemiology in Korea: old tricks, new answers?’ Broadbent A and Hwang Ss. Journal of Epidemiology and Community Health 2016;70:527-528. http://jech.bmj.com/content/70/6/527.full doi:10.1136/jech-2015-206567 [open access]

Paper: Causality and Causal Inference in Epidemiology: the Need for a Pluralistic Approach

Delighted to announce the online publication of this paper in the International Journal of Epidemiology, with Jan Vandenbroucke and Neil Pearce: ‘Causality and Causal Inference in Epidemiology: the Need for a Pluralistic Approach’.

This paper has already generated some controversy and I’m really looking forward to talking about it with my co-authors at the London School of Hygiene and Tropical Medicine on 7 March. (I’ll also be giving some solo talks while in the UK, at Cambridge, UCL, and Oxford, as well as one in Bergen, Norway.)

The paper is on the same topic as a single-authored paper of mine published in late 2015, ‘Causation and Prediction in Epidemiology: a Guide to the Methodological Revolution.’ But it is much shorter, and nonetheless manages to add a lot that was not present in my sole-authored paper – notably a methodological dimension of which, as a philosopher by training, I was ignorant. The co-authoring process was thus really rich and interesting for me.

It also makes me think that philosophy papers should be shorter… Do we really need the first 2500 words summarising the current debate, etc.? I wonder if a more compressed style might actually stimulate more thinking, even if the resulting papers are less argumentatively airtight. One might wonder how often the airtight ideal is achieved even in papers of traditional length… Who was it who said that in philosophy, it’s all over by the end of the first page?

Paper – Tobacco in Korea

Alex Broadbent and Seung-sik Hwang, 2016. ‘Tobacco and epidemiology in Korea: old tricks, new answers?’ Journal of Epidemiology and Community Health doi:10.1136/jech-2015-206567.

Now available online first, open access.

http://jech.bmj.com/content/early/2016/01/14/jech-2015-206567.full

For those who were at the recent CauseHealth workshop on N=1, this relates to the same key topic (viz. the application of population evidence to an individual), but in the legal rather than the clinical context.


America Tour: Attribution, prediction, and the causal interpretation problem in epidemiology

Next week I’ll be visiting America to talk in Pittsburgh, Richmond, and twice at Tufts. I do not expect audience overlap, so I’ll give the same talk at all venues, adjusted depending on whether I’m talking primarily to philosophers or to epidemiologists. The abstract is below. I don’t yet have a written version of the paper that I can share, but would of course welcome comments at this stage.

ABSTRACT

Attribution, prediction, and the causal interpretation problem in epidemiology

In contemporary epidemiology, there is a movement, part theoretical and part pedagogical, attempting to discipline and clarify causal thinking. I refer to this movement as the Potential Outcomes Approach (POA). It draws inspiration from the work of Donald Rubin and, more recently, Judea Pearl, among others. It is most easily recognized by its use of Directed Acyclic Graphs (DAGs) to describe causal situations, but DAGs are not the conceptual basis of the POA in epidemiology. The conceptual basis (as I have argued elsewhere) is a commitment to the view that the hallmark of a meaningful causal claim is that it can be used to make predictions about hypothetical scenarios. Elsewhere I have argued that this commitment is problematic (notwithstanding the clear connections with counterfactual, contrastive and interventionist views in philosophy). In this paper I take a more constructive approach, seeking to address the problem that troubles advocates of the POA. This is the causal interpretation problem (CIP). We can calculate various quantities that are supposed to be measures of causal strength, but it is not always clear how to interpret these quantities. Measures of attributability are most troublesome here, and these are the measures on which POA advocates focus. What does it mean, they ask, to say that a certain fraction of population risk of mortality is attributable to obesity? The pre-POA textbook answer is that, if obesity were reduced, mortality would be correspondingly lower. But this is not obviously true, because there are methods for reducing obesity (smoking, cholera infection) which will not reduce mortality. In general, say the POA advocates, a measure of attributability tells us next to nothing about the likely effect of any proposed public health intervention, rendering these measures useless, and so, for epidemiological purposes, meaningless. In this paper I ask whether there is a way to address and resolve the causal interpretation problem without resorting to the extreme view that a meaningful causal claim must always support predictions in hypothetical scenarios. I also seek connections with the notorious debates about heritability.
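For readers who have not met the attribution measures the abstract refers to, here is a toy worked example of a population attributable fraction; the numbers are hypothetical and my own, not taken from any of the literature discussed.

```python
# Toy population attributable fraction (PAF) calculation with invented
# numbers: PAF = (overall risk - risk in the unexposed) / overall risk.
p_exposed = 0.3          # hypothetical prevalence of obesity
risk_exposed = 0.012     # hypothetical mortality risk among the obese
risk_unexposed = 0.008   # hypothetical mortality risk among the non-obese

overall_risk = p_exposed * risk_exposed + (1 - p_exposed) * risk_unexposed
paf = (overall_risk - risk_unexposed) / overall_risk
print(f"PAF = {paf:.1%}")  # about 13% of mortality "attributable" to obesity

# The causal interpretation problem: this figure does not by itself tell
# us what would happen under any particular obesity-reducing intervention.
```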

Workshop, Helsinki: What do diseases and financial crises have in common?

AID Forum: “Epidemiology: an approach with multidisciplinary applicability”

(Unfamiliar with AID forum? For the very idea and the programme of Agora for Interdisciplinary Debate, see www.helsinki.fi/tint/aid.htm)

DISCUSSED BY:

Mervi Toivanen (economics, Bank of Finland)

Jaakko Kaprio (genetic epidemiology, U of Helsinki)

Alex Broadbent (philosophy of science, U of Johannesburg)

Moderated by Academy professor Uskali Mäki

Session jointly organised by TINT (www.helsinki.fi/tint) and the Finnish Epidemiological Society (www.finepi.org)

TIME AND PLACE:

Monday 9 February, 16:15-18

University Main Building, 3rd Floor, Room 5

http://www.helsinki.fi/teknos/opetustilat/keskusta/f33/ls5.htm

TOPIC: What do diseases and financial crises have in common?

Epidemiology has traditionally been used to model the spread of diseases in populations at risk. By applying parameters related to agents’ responses to infection and their networks of contacts, it helps to study how diseases occur, why they spread, and how one could prevent epidemic outbreaks. For decades, epidemiology has also studied non-communicable diseases, such as cancer, cardiovascular disease, addictions and accidents. Descriptive epidemiology focuses on providing accurate information about the occurrence (incidence, prevalence and survival) of the condition. Etiological epidemiology seeks to identify the determinants, be they infectious agents, environmental or social exposures, or genetic variants. A central goal is to identify determinants amenable to intervention, and hence the prevention of disease.

Because such determinants are usually studied observationally, there is a need to consider both reverse causation and confounding as possible alternatives to a causal explanation. Novel designs are providing new tools to address these issues. But epidemiology also provides an approach with broad applicability to domains covered by multiple disciplines. For example, it is widely and successfully used to explain the propagation of computer viruses, macroeconomic expectations and rumours in a population over time.

As a consequence, epidemiological concepts such as “super-spreader” have also found their way into the economic literature on financial stability. There is an obvious analogy between the prevention of diseases and the design of economic policies against the threat of financial crises. The purpose of this session is to discuss the applicability of epidemiology across various domains and the possibilities for mutual benefit from common concepts and methods.
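By way of a concrete illustration of the transmission-model framing sketched above, here is a minimal SIR (susceptible-infected-recovered) model in Python; the parameters are illustrative assumptions of mine, not drawn from the readings, and the same machinery is what gets reused for computer viruses or macroeconomic expectations.

```python
# Minimal discrete-time SIR sketch: transmission depends on contacts
# between susceptible and infected agents (beta) and on recovery (gamma).
# Parameters are illustrative only.
import numpy as np

def sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=160):
    s, i, r = s0, i0, 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i   # contact process
        new_recoveries = gamma * i      # response to infection
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return np.array(history)

trajectory = sir()
print("peak infected fraction:", round(float(trajectory[:, 1].max()), 3))
```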

QUESTIONS:

1. Why is epidemiology so broadly applicable?

2. What similarities and differences prevail between these various disciplinary applications?

3. What can they learn from one another, and how could cooperation between disciplines be enhanced?

4. How could the endorsement of concepts and ideas across disciplines be improved?

5. Can epidemiology help to resolve causality?

READINGS:

Alex Broadbent, Philosophy of Epidemiology (Palgrave Macmillan 2013)

http://www.palgrave.com/page/detail/?sf1=id_product&st1=535877

Alex Broadbent’s blog on the philosophy of epidemiology:

https://philosepi.wordpress.com/

Rothman KJ, Greenland S, Lash TL. Modern Epidemiology, 3rd edition. Lippincott, Philadelphia, 2008.

D’Onofrio BM, Lahey BB, Turkheimer E, Lichtenstein P. Critical need for family-based, quasi-experimental designs in integrating genetic and social science research. Am J Public Health. 2013 Oct;103 Suppl 1:S46-55. doi:10.2105/AJPH.2013.301252.

Taylor, AE, Davies, NM, Ware, JJ, VanderWeele, T, Smith, GD & Munafò, MR 2014, ‘Mendelian randomization in health research: Using appropriate genetic variants and avoiding biased estimates’. Economics and Human Biology, vol. 13, pp. 99-106.

Engholm G, Ferlay J, Christensen N, Kejs AMT, Johannesen TB, Khan S, Milter MC, Ólafsdóttir E, Petersen T, Pukkala E, Stenz F, Storm HH. NORDCAN: Cancer Incidence, Mortality, Prevalence and Survival in the Nordic Countries, Version 7.0 (17.12.2014). Association of the Nordic Cancer Registries. Danish Cancer Society. Available from http://www.ancr.nu.

Andrew G. Haldane, ‘Rethinking the financial network’; speech by Mr Haldane, Executive Director, Financial Stability, Bank of England, at the Financial Student Association, Amsterdam, 28 April 2009: http://www.bis.org/review/r090505e.pdf

Antonios Garas et al., Worldwide spreading of economic crisis: http://iopscience.iop.org/1367-2630/12/11/113043/pdf/1367-2630_12_11_113043.pdf

Christopher D. Carroll, The epidemiology of macroeconomic expectations: http://www.econ2.jhu.edu/people/ccarroll/epidemiologySFI.pdf

Is the Methodological Axiom of the Potential Outcomes Approach Circular?

Hernán, VanderWeele, and others argue that causation (or a causal question) is well-defined when interventions are well-specified. I take this to be a sort of methodological axiom of the approach.

But what is a well-specified intervention?

Consider an example from Hernán and Taubman’s influential 2008 paper on obesity. In that paper, BMI is shown to fail to correspond to a well-specified intervention; better-specified interventions include one hour of strenuous physical exercise per day (among others).

But what kind of exercise? One hour of running? Powerlifting? Yoga? Boxing?

It might matter – it might turn out that, say, boxing and running for an hour a day reduce BMI by similar amounts but that one of them is associated with longer life. Or it might turn out not to matter. Either way, it would be a matter of empirical inquiry.

This has two consequences for the mantra that well-defined causal questions require well-specified interventions.

First, as I’ve pointed out before on this blog, it means that experimental studies don’t necessarily guarantee well-specified interventions. Just because you can do it doesn’t mean you know what you are doing. The differences you might think don’t matter might matter: different strains of broccoli might have totally different effects on mortality, etc.

Second, more fundamentally, it means that the whole approach is circular. You need a well-specified intervention for a good empirical inquiry into causes and you need good empirical inquiry into causes to know whether your intervention is well-specified.

To me this seems potentially fatal for the claim that well-defined causal questions require well-specified interventions. For if that claim were true, we would be trapped in the circle, and could never have any well-specified interventions, and thus no well-defined causal questions either. So either we really are trapped in that circle, or we can have well-defined causal questions without well-specified interventions, in which case it is false that the former always require the latter.

This is a line of argument I’m developing at present, inspired in part by Vandenbroucke and Pearce’s critique of the “methodological revolution” at the recent WCE 2014 in Anchorage. I would welcome comments.