What assumptions and methods allow us to turn observations into causal knowledge, and how can even incomplete causal knowledge be used in planning and prediction to influence and control our environment? In this book Peter Spirtes, Clark Glymour, and Richard Scheines address these questions using the formalism of Bayes networks, with results that have been applied in diverse areas of research in the social, behavioral, and physical sciences. The authors show that although experimental and observational study designs may not always permit the same inferences, they are subject to uniform principles. They axiomatize the connection between causal structure and probabilistic independence, explore several varieties of causal indistinguishability, formulate a theory of manipulation, and develop asymptotically reliable procedures for searching over equivalence classes of causal models, including models of categorical data and structural equation models with and without latent variables. The authors show that the relationship between causality and probability can also help to clarify such diverse topics in statistics as the comparative power of experimentation versus observation, Simpson's paradox, errors in regression models, retrospective versus prospective sampling, and variable selection. The second edition contains a new introduction and an extensive survey of advances and applications that have appeared since the first edition was published in 1993.
{Provisional pitch} ~ slow-read, re-read, to-be-read-again | very important. All three authors are recognized as leading names in theoretical and inferential statistics (and are also widely cited for their contributions to machine learning).
It is a great, valuable approach to the estimation of causal effects, a topic whose core problem consists, in perhaps the humblest graphical model, of something going from X to Y; that is to say, a probability function like P(Y | cause(X = x)). A bit of manipulation of this toy system can be trivial in so far as it is reasonable to set X to some value x, measure Y, conceive some 'cause()' operator, and hence obtain a distribution.
If you cannot set up this lab or observational experiment, then all you get is a joint distribution over some covariates, meaning P(X, Y, Z1, ..., Zn), which by itself, of course, is not even slightly enough to get at P(Y | cause(X = x)). A joint distribution, y'know, doesn't identify a causal effect.
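A tiny illustration of that point (coefficients invented for the example): two linear-Gaussian models with opposite causal arrows can imply exactly the same joint distribution, so no amount of purely observational data can tell them apart.

```python
import numpy as np

# Model A: X -> Y, with X ~ N(0,1), Y = b*X + eps, eps ~ N(0, s2)
b, s2 = 0.8, 0.5
cov_A = np.array([[1.0, b],
                  [b, b * b + s2]])  # implied covariance of (X, Y)

# Model B: Y -> X. Choose its coefficients so the joint matches exactly:
# Y ~ N(0, vy), X = c*Y + eps', eps' ~ N(0, vx)
vy = b * b + s2          # Var(Y) must match model A
c = b / vy               # Cov(X, Y) / Var(Y)
vx = 1.0 - c * c * vy    # residual variance so that Var(X) = 1
cov_B = np.array([[c * c * vy + vx, c * vy],
                  [c * vy, vy]])

# The two causal models imply the SAME observational distribution.
assert np.allclose(cov_A, cov_B)
```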
The treatment on which this work on causality and inference relies is such that a causal model can be identified, whenever possible, for some variable X that is said to be a cause of a variable Y if and only if Y depends on X for its value in some mathematically _explicit_ sense. This can be expanded in many ways (there are a few efforts at catering to non-professionals or non-mathematicians), but the definition enacts exactly this: X is a cause of Y if Y sets its value in response to X. Causation is then said to be transitive, irreflexive, and antisymmetric. All of this, as a broad concept, may seem understandable or even obvious, but in fact it is essential for formally establishing a realm where there is no need to resort to counterfactuals.
(Accordingly, say you want to know the effect of X on Y and you have gotten around to finding a set V of control variables: if (1) V blocks every path from X to Y that has an arrow pointing into X, and (2) no node in V is a descendant of X, then you are sure all the items in your model are observational conditional probabilities. Meaning that V satisfies the back-door criterion; no counterfactuals required, indeed.)
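A minimal sketch of that adjustment, on a hypothetical binary model with every probability invented for illustration: when Z closes the back door, P(Y | cause(X=x)) reduces to the observational sum over z of P(Y|x,z)P(z), which generally differs from the naive conditional P(Y|x).

```python
# Hypothetical binary model Z -> X, Z -> Y, X -> Y (numbers invented):
# Z is a confounder, and V = {Z} satisfies the back-door criterion.
P_z = {0: 0.6, 1: 0.4}                                    # P(Z=z)
P_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P(X=x | Z=z)
P_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.5,
                 (0, 1): 0.4, (1, 1): 0.9}                # P(Y=1 | X=x, Z=z)

def p_y1_do(x):
    """Back-door adjustment: P(Y=1 | cause(X=x)) = sum_z P(Y=1|x,z) P(z)."""
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in P_z)

def p_y1_given(x):
    """Naive observational P(Y=1 | X=x), confounded by Z."""
    num = sum(P_y1_given_xz[(x, z)] * P_x_given_z[z][x] * P_z[z] for z in P_z)
    den = sum(P_x_given_z[z][x] * P_z[z] for z in P_z)
    return num / den

causal_effect = p_y1_do(1) - p_y1_do(0)       # 0.44
naive_effect = p_y1_given(1) - p_y1_given(0)  # 0.62: biased upward by Z
```

The gap between the two numbers is exactly the confounding that conditioning on V removes.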
*
Past the first two or three chapters on axioms, statistical indistinguishability, and causally sufficient structures, the authors attempt to scrutinize - and, more importantly, test - the notion that even correlations, while not implying causation, should nevertheless have some causal explanatory power that is as unambiguous as possible (an idea, to be sure, as old as Simon's papers on the computational limits of decision making, i.e., 'bounded rationality'), which amounts to addressing the following non-trivial task: define some classes of correlations, including multi-variable correlations, in order to place some restraints on them within the domain where a pattern of said correlations is legal.
This logic pretty much boils down to the twin notions of Markov equivalence and distribution equivalence. For instance, over the variables {X, Y, Z}, a model may generate structures such as X → Y → Z, (its opposite) X ← Y ← Z, X ← Y → Z, and X → Y ← Z. The first three represent only the statement that X and Z are conditionally independent given Y: in each of them Y separates X from the third variable, so the two are independent of each other conditional on Y. Drawing on Verma and Pearl, 1990, two model structures are known to be Markov equivalent (meaning fairly indiscernible) if they exhibit the same set of conditional-independence assertions, so those first three structures must be taken as equivalent; the fourth, the collider X → Y ← Z, implies a different set of independencies, and the absence of a conditional independency definitely suggests which way the causal links lie. This very process of 'orienting' correlation chains can be mathematically difficult, tough to digest, but indeed the greatest value of this book is showing what causal discovery can be, in all its complexity, as a computational and an inferential problem.
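Those equivalence facts can be checked numerically. A small simulation (illustrative only; coefficients and sample size invented) shows the chain X → Y → Z making X and Z independent given Y, while the collider X → Y ← Z does the reverse: marginally independent, but dependent once you condition on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing `given` out of both."""
    ra = a - np.polyfit(given, a, 1)[0] * given
    rb = b - np.polyfit(given, b, 1)[0] * given
    return np.corrcoef(ra, rb)[0, 1]

# Chain X -> Y -> Z: X and Z are correlated, but independent given Y.
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)
z = y + rng.standard_normal(n)
corr_chain = np.corrcoef(x, z)[0, 1]  # ~ 0.58: clearly dependent
pc_chain = partial_corr(x, z, y)      # ~ 0: Y screens X off from Z

# Collider X -> Y <- Z: X and Z independent, but DEPENDENT given Y.
xc = rng.standard_normal(n)
zc = rng.standard_normal(n)
yc = xc + zc + rng.standard_normal(n)
corr_coll = np.corrcoef(xc, zc)[0, 1]  # ~ 0: marginally independent
pc_coll = partial_corr(xc, zc, yc)     # ~ -0.5: conditioning opens the path
```

The identical conditional-independence signature of the three non-collider structures is precisely why observational data alone only identifies an equivalence class.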
Other significant sections of the textbook are devoted to the chief notion of partial identification - widely elaborated by Manski, even though not openly mentioned - and to partial correlation, a breath of fresh air that always comes into play because, no matter how pristine and informative your data, very often it still isn't enough to pin some parameter down to a point estimate; what is feasible, on the other hand, is to impose some boundary conditions in order to see whether that parameter is at least theoretically identifiable (depending on the models one is willing to buy into and the strength of the assumptions one is willing to make).
This is, needless to say, tremendously important in the non-natural sciences and in policy-making problems. The greater the number of parameters, the less they can be identified under credible assumptions; however, the authors show that some partial identification bounds, based on not-so-hard assumptions, can exist and be made formally explicit (there is a wide lot of examples drawn from a corpus of traditional assumptions - for instance, linear homogeneous curves in economics, or instrumental variables and the like, which rely on relatively hard assumptions and leave virtually no room for uncertainty).
Since I read it mostly for consultation, this is a rather scattered and incoherent appraisal. I confess.
This book makes clear which parts of statistical causality are contributions of the CMU professors Peter Spirtes, Clark Glymour, and Richard Scheines; which are Judea Pearl's; and which belong to other researchers. It is suitable for readers with a statistical background, and it also contains lots of details about causal discovery algorithms: the SGS and PC algorithms, for instance, are both named after the authors.
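As a flavor of those algorithms, here is a deliberately simplified sketch of the PC skeleton phase; it is not the authors' full procedure (the real algorithm, among other refinements, restricts conditioning sets to current adjacencies), and the data-generating chain, test, and thresholds are invented for illustration.

```python
import itertools
import numpy as np

def fisher_z_independent(data, i, j, cond):
    """Test X_i independent of X_j given X_cond via Fisher-z on the
    partial correlation, read off the precision matrix of the subset."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return abs(z) < 3.29  # ~ two-sided alpha = 0.001

def pc_skeleton(data):
    """Skeleton phase (simplified): start complete, delete edge i-j as soon
    as some conditioning set renders the pair independent."""
    p = data.shape[1]
    edges = {frozenset(e) for e in itertools.combinations(range(p), 2)}
    for size in range(p - 1):  # grow conditioning-set size
        for i, j in itertools.combinations(range(p), 2):
            if frozenset((i, j)) not in edges:
                continue
            others = [k for k in range(p) if k not in (i, j)]
            for cond in itertools.combinations(others, size):
                if fisher_z_independent(data, i, j, cond):
                    edges.discard(frozenset((i, j)))
                    break
    return edges

# Simulated chain X0 -> X1 -> X2: only the X0-X2 edge should vanish,
# since X0 and X2 are independent given X1.
rng = np.random.default_rng(1)
n = 50_000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
skeleton = pc_skeleton(np.column_stack([x0, x1, x2]))
```

Orienting the remaining edges (via colliders and the Meek-style rules) is the second phase, which the book covers in full.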
Despite the amazing capability to discover causation from correlation under some conditions, the vision ahead is to loosen the assumptions, or to align them more with real-world scenarios.
There is a video lecture by the first author introducing causal discovery at MIT: