
Athan Tolis's Reviews > The Book of Why: The New Science of Cause and Effect

The Book of Why by Judea Pearl

it was amazing
bookshelves: math-science, favorites

My son George's first language is Japanese.

His first annoying habit, which raised its head very soon after he was granted the gift of speech, was to answer every request / question / casual comment with "doshte?"

"Doshte," you guessed it, is Japanese for "why?"

This, Judea Pearl argues very persuasively in this book, is (for the time being) the biggest difference between thinking men and thinking machines.

I LOVED this book. Loved it, loved it, loved it.

You can read "The Book of Why" as a popular science book. So I started reading it that way and, some thirty pages in, I thought to myself "very weird, I have not lost this guy yet!" You could not really say that about "A Brief History of Time," could you? The funny thing is, I eventually made it all the way to page 370 (the last page) and I was still with the author! For me, that's a first: a popular mathematics book that carries on introducing new (and I mean NEW) material, concepts that were not understood when I went to college, but nonetheless explains it clearly enough that I could carry on learning the whole way through.

To briefly summarize, the author explains that some hundred years ago meaning in statistics was sacrificed at the altar of rigor: because the giants who defined the space were not personally comfortable putting a definition on the word "why," they not only repurposed the entire field to answer "when" (thereby throwing the baby out with the bathwater), but also rendered it heretical to examine causation. In particular, the practice of identifying probability-altering interventions was proscribed by the mathematical mainstream.

To get the ball rolling, the author stakes his claim early on in the book and defines the "do" operator (p.48). For example, if we know for a fact that getting rid of the weeds (which we call action do(X)) results in a better crop Y, we can go ahead and write:

P(Y/do(X)) > P(Y)

This was deemed to be heresy because there was no clean mathematical meaning for "do" and no set of operations / conclusions that could derive from it. Cauchy was no longer around, I suppose. The orthodoxy was established that we only need care about association. From the full list of associations logical people were free to draw their own conclusions regarding causation. If cutting the weeds and a better crop are correlated, nobody is going to accuse you of making stuff up if you conclude one caused the other.

The benefit of the canonical approach, and it is an enduring benefit we should not disparage, is that, with minimal knowledge of mathematics, you can use a statistical package that slices and dices the "whens" and gives you a slew of pre-packaged answers: "sons of 72-inch-tall fathers will on average be 71 inches tall, but sons of 68.5-inch-tall fathers will on average be 68.5 inches tall." My George is therefore predicted to be 68.5 inches tall, and I'm the average height of a male British subject in 1877, more disturbingly! Ah, but on the plus side, that probably also means George will be taller than me, because the mean has no doubt shifted up...

Cool, but we can do better. A lot better!

Judea Pearl goes to enormous pains to give maximum credit to all his students / disciples, but it was he who singlehandedly forced mankind up a construct that he calls "the ladder of causation." Here he invites you along!

First, he takes you one step up from "association" to "intervention." To do so, you need to start drawing pictures. Graphs (causal diagrams, they're called) that allow you to point from causes to effects. These charts need not be handed down by a higher being. You can sketch your own, you can test the conclusions versus the data and you can change your mind and draw them again.

These charts, once you've drawn them, naturally force you to observe three important types of nodes / factors: "mediators" (example: tar in your lungs mediates between your smoking and you getting lung cancer), "confounders" (example: a now-identified "smoking gene," rs16969968, both makes people likelier to smoke and makes them more susceptible to lung cancer, but clearly does not deposit tar in their lungs) and "colliders" (example: smoking and birth defects can both affect birth weight). The author goes on to explain what the "front-door path" and the "back-door path" are from potential cause X to potential result Y. (In a later chapter he expands this repertoire to "instrumental variables.")

Next (some 200+ pages deep into the book) comes the math, which is the first time you're asked to actually believe the author, rather than find yourself invited to discover alongside him. And here's what the math says:


suppose you've drawn your causal diagrams;
suppose you've expressed them in mathematical expressions, using the "do" operator;
then there are exactly three "legitimate transformations" you can apply to these equations that correspond to the diagrams in order to convert them into testable (or otherwise!) run-of-the-mill probabilistic statements of the kind a conventional statistician can abide:

1. If W is irrelevant to Y, then
P(Y / do(X), Z, W) = P(Y / do(X),Z)

2. If a set of variables Z blocks all back-door paths from X to Y, then
P(Y / do(X), Z) = P(Y / X, Z)

3. If there are no causal paths from X to Y, then
P(Y / do(X)) = P(Y)

Not only that, but there are no other necessary rules. If there's a way to convert your causal diagrams into classical probability statements, then there's a way to do it with these three tricks.

These are, in short, the three rules of "do calculus" and they allow you to test your intuition regarding causation. You can put them to two separate uses:

1. You can now design better experiments
2. You can look at already existing data better, resolving a large number of "paradoxes"
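
If you want to see rule 2 earn its keep, here is a minimal sketch of my own (not from the book). It assumes a hypothetical three-variable world in which Z confounds X and Y, with every probability invented purely for illustration, and compares the naive conditional P(Y / X) with the back-door adjusted estimate, which averages P(Y / X, Z) over P(Z).

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical mechanism: Z -> X, Z -> Y, X -> Y. All numbers are made up.
P_Z1 = F(1, 2)                                   # P(Z = 1)
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}          # P(X = 1 | Z)
P_Y1_GIVEN_XZ = {(0, 0): F(1, 10), (1, 0): F(3, 10),
                 (0, 1): F(1, 2), (1, 1): F(7, 10)}  # P(Y = 1 | X, Z)


def bern(p_one, value):
    """Probability that a binary variable takes `value`, given P(value = 1)."""
    return p_one if value == 1 else 1 - p_one


# Observational joint distribution over (z, x, y).
joint = {
    (z, x, y): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)
    for z, x, y in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of the named values, read off the joint only."""
    total = F(0)
    for (z, x, y), p in joint.items():
        row = {"z": z, "x": x, "y": y}
        if all(row[name] == value for name, value in fixed.items()):
            total += p
    return total


def naive(x):
    """P(Y = 1 / X = x): plain association, confounded by Z."""
    return prob(y=1, x=x) / prob(x=x)


def backdoor_adjusted(x):
    """Sum over z of P(Y = 1 / X = x, z) P(z): Z blocks the back-door path."""
    return sum(prob(y=1, x=x, z=z) / prob(x=x, z=z) * prob(z=z) for z in (0, 1))


def interventional_truth(x):
    """P(Y = 1 / do(X = x)) from the mechanism: X is set by fiat, Z is untouched."""
    return sum(bern(P_Z1, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
```

In this made-up world the adjusted estimate matches the interventional answer exactly, while the naive conditional overstates the effect of X because Z pushes X and Y in the same direction.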

A "worked example" is provided on page 236, that takes you in six simple steps from

P(c / do(s)) = Sum over t [P(c / do(s),t) P(t / do(s))]

to the testable:
P(c / do(s)) = Sum over s' [Sum over t [P(c / t, s') P(s') P(t / s)]]
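
To check that the right-hand side really needs nothing beyond observable quantities, here is a rough sketch of my own. It assumes a hypothetical world with an unobserved gene u that drives both smoking s and cancer c, while smoking acts on cancer only through tar t. The probabilities are invented for illustration and the helper names are mine, not the book's.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical mechanism: u -> s, u -> c, s -> t -> c. All numbers are made up.
P_U = {0: F(1, 2), 1: F(1, 2)}                        # hidden gene
P_S1_GIVEN_U = {0: F(1, 5), 1: F(4, 5)}               # P(s = 1 | u)
P_T1_GIVEN_S = {0: F(1, 10), 1: F(9, 10)}             # P(t = 1 | s)
P_C1_GIVEN_TU = {(0, 0): F(1, 10), (1, 0): F(1, 2),
                 (0, 1): F(3, 10), (1, 1): F(9, 10)}  # P(c = 1 | t, u)


def bern(p_one, value):
    """Probability that a binary variable takes `value`, given P(value = 1)."""
    return p_one if value == 1 else 1 - p_one


# Observational joint over (s, t, c); the gene u is summed out, i.e. never seen.
joint = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (P_U[u] * bern(P_S1_GIVEN_U[u], s)
         * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c))
    joint[(s, t, c)] = joint.get((s, t, c), F(0)) + p


def prob(**fixed):
    """Marginal probability of the named values, read off the observed joint."""
    total = F(0)
    for (s, t, c), p in joint.items():
        row = {"s": s, "t": t, "c": c}
        if all(row[name] == value for name, value in fixed.items()):
            total += p
    return total


def front_door(s):
    """Sum over s' and t of P(c / t, s') P(s') P(t / s), observational data only."""
    total = F(0)
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(prob(c=1, t=t, s=s2) / prob(t=t, s=s2) * prob(s=s2)
                    for s2 in (0, 1))
        total += p_t_given_s * inner
    return total


def interventional_truth(s):
    """P(c = 1 / do(s)) computed from the mechanism, peeking at the hidden gene."""
    return sum(P_U[u] * bern(P_T1_GIVEN_S[s], t) * P_C1_GIVEN_TU[(t, u)]
               for u in (0, 1) for t in (0, 1))
```

In this toy world front_door(s) agrees exactly with interventional_truth(s) even though the gene never appears in the data, whereas the plain conditional P(c / s) does not.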


The author next works his way through a couple of these paradoxes that the new method cuts to shreds: Berkson's Paradox (smokers in a 1995 thyroid disease study have a higher survival rate than non-smokers) and Simpson's Paradox (most departments at Berkeley favor women in admissions, but women overall have a lower chance of getting into Berkeley than men) both fall under the weight of his new weapon.

(With that said, I still prefer my own explanation of why you should change doors in the famous TV game: (i) the chance you picked well to start with is 1/3, (ii) if you didn't pick well you're guaranteed to win when you change doors! The author goes over the blah blah regarding how the game show host imparted one part of the decision tree with extra info...)
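
For the skeptics, my two-line argument can be checked by brute force. This is a small sketch of my own, assuming the standard rules: three doors, one car, and a host who always opens a goat door other than your pick. It enumerates all nine equally likely (car, first pick) combinations.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probabilities():
    """Return (P(win if you stay), P(win if you switch)) by exact enumeration."""
    cases = list(product(DOORS, DOORS))  # (car, first_pick), all equally likely
    stay_wins = 0
    switch_wins = 0
    for car, pick in cases:
        # The host opens a door that is neither your pick nor the car.
        # If he has two options (you picked the car), either one leads to
        # the same outcome, so taking the first is enough.
        opened = next(d for d in DOORS if d != pick and d != car)
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in DOORS if d != pick and d != opened)
        stay_wins += pick == car
        switch_wins += switched == car
    n = len(cases)
    return Fraction(stay_wins, n), Fraction(switch_wins, n)
```

Staying wins only in the three cases where the first pick was right, and switching wins in the other six, which is exactly points (i) and (ii).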

Now I've established I'm in awe of the author, and while I'm being a smart-alec, I'll point out the one issue I have a problem with:

Judea Pearl protests too much about his predecessors' notion that "it's all in the data." Yes, his tools help you design better experiments. Away from that fact, however, (and yeah, that's pretty major and would in itself be enough of a contribution to mankind) this new calculus of causation in the end amounts to a set of new "goggles" we can wear to look at data better. To my taste, then, he complains a bit too much. To a great extent, it IS all in the data, it's just that thanks to him we now know better where to look.

(note to the reader: this may be the correct place to tell me I've understood nothing)

The astounding thing is that this is only the first step we're invited to climb alongside the author on the "ladder of causation." And so it is that you climb one more step, from "intervention" to "counterfactuals."

This is, finally, the "why" step that lends its name to the book.

Example: when an angry coach tells a player he should have passed the ball to a teammate rather than try to dribble the goalie, the player knows why: his teammate would have scored! That is the counterfactual! It is the state of the world that did not come to be, but against which his actions have been judged.

Believe it or not, a second calculus has been invented by the author and his associates in the space of the past couple decades, with the explicit purpose of putting some mathematical meat on the bones of this syllogism.

The main problem solved is the one where a fire and a blocked fire escape combine to cause somebody's death. The combination of the factors guarantees the outcome, but how bad you feel about the blocked fire escape depends on your estimate of what the chances of death would have been had the fire escape not been blocked.

Needless to say, this is a simplified example and the calculus helps you deal with continuous outcomes, not only binary outcomes.

The author defines three quantities: Total Effects, Natural Direct Effects and Natural Indirect Effects.

Let us say an extra year of education leads to a higher salary through two paths, one because people pay up for better-educated people and one because the stuff you've learnt may help you perform better.

The latter two quantities are defined as follows:

The Natural Direct Effect of a year of education is how much more you will be paid if you skip the studying and go to the Bahamas, a friend sits the test in your lieu, you come back exactly as skilled and motivated as you left, but nobody finds out and as a result your employer pays up just because you got the degree.

The Natural Indirect Effect is how much more than your pre-degree self you would be paid if you never got the degree but somehow had the same skills you will have post-degree.

Theorem:

"The total effect of an extra year of education is equal to the Natural Direct Effect of an extra year of education MINUS the Natural Indirect Effect of SKIPPING a year of education"

(not plus the Natural Indirect Effect of having it)
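
The minus sign looks odd until you try it with numbers. Below is a toy sketch of my own, assuming a deterministic, entirely hypothetical salary model in which the degree and the skills interact, so the two paths do not simply add up. Every coefficient is invented, and the function names are mine.

```python
# Hypothetical structural model; every coefficient is invented for illustration.

def skill(degree):
    """Mediator: skill level without (0) or with (1) the extra year."""
    return 1 + 2 * degree


def salary(degree, skill_level):
    """Outcome: pay depends on the credential, the skill, and their interaction."""
    return 2 * degree + 3 * skill_level + 1 * degree * skill_level


def effects():
    """Total, direct and indirect effects of the extra year in the toy model."""
    m0, m1 = skill(0), skill(1)
    total = salary(1, m1) - salary(0, m0)
    # Direct effect: you get the degree, but your skills stay where they were.
    direct = salary(1, m0) - salary(0, m0)
    # Indirect effect of having the year: no degree, but post-degree skills.
    indirect = salary(0, m1) - salary(0, m0)
    # Indirect effect of SKIPPING the year: the degree is held fixed while
    # the skills revert to their pre-degree level.
    indirect_of_skipping = salary(1, m0) - salary(1, m1)
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "indirect_of_skipping": indirect_of_skipping,
    }
```

In this toy model the total effect is 11, the direct effect is 3 and the indirect effect is 6, so adding them falls short at 9. The indirect effect of skipping is -8, and 3 minus (-8) recovers the 11. The gap comes from the interaction term; drop it and the plain sum works too.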

This equation is applied to the problem of the smoking gene and cancer and demolishes the excuses of anybody who has the infamous gene: they'd better quit smoking, bottom line, and the rest is talk!

Which lands you safely on chapter 10, the last chapter of the book, the one regarding artificial intelligence: can we teach a computer morals and should we do so?

It is, comfortably, the best chapter in the book!

Armed with the tools you've just mastered, you have no problem following the author's argument: if we can teach a machine to think like a child and consider the consequences of taking or not taking actions and if we additionally give it license to test (again, like a child) the consequences of its actions, then we can answer our questions with a "yes."

For a machine that is equipped to ask "why" is a machine that we can count on to do the right thing and act as our moral compass.

Wow!

Reading Progress

Started Reading
October 8, 2018 – Shelved
October 8, 2018 – Shelved as: math-science
October 8, 2018 – Shelved as: favorites
October 8, 2018 – Finished Reading
