First, my old review from reading it back in 2007 before my MS in Statistics. Below that, my newer review from re-reading it in 2013 before my PhD.
~~~2007~~~ I loved the overviews of fascinating philosophical problems surrounding the use of statistics. (For example, hypothesis testing is used pretty much everywhere but even the mathematicians who came up with it had doubts about its validity and usefulness in most situations... What does a 95% confidence interval REALLY mean, in terms of real life, when you come right down to it?)
The author also paints a great picture of how rich and varied the field of statistics is, and what interesting people have contributed to it. As someone who wishes I were a real mathematician, it's a little depressing for me to read about a genius like Kolmogorov who came up with interesting results in every field he touched ever since childhood... But it's also heartening to hear about the many non-"genius" people who, unable to find an existing technique to evaluate their data, ended up creating original, useful statistical tools that became widely used and spawned whole new fields of mathematical research.
I wish the author had used at least a few equations or graphs - it's hard to explain complicated math in natural English - but it's still a great and informative read.
~~~2013~~~ I'm getting a lot more out of it on the 2nd reading, now that I know a lot of the theory he describes in handwavy terms for lay readers.
This book is definitely a history of statistics through anecdotes: the author's stories about the time he heard Savage lecture or was a discussant for Neyman, his colleagues' memories of Fisher and Pearson, etc. It's especially fascinating if you already recognize these names from their papers and are curious about their personalities, what inspired their work and how it fit into their historical context, etc. But you won't learn the technical details here. If you want something from a more formal historian of statistics, Stigler seems to be the guy. One book I wish someone would write is the history of statistics education and stats textbooks. If Bayes hid his famous theorem, Fisher discouraged unthinking use of alpha=0.05, Neyman disavowed the hypothesis tests he created, etc. ... then how did these things come to be standard methods applied mechanically by scientists everywhere?
Specific points:
* p.4-5: Before Fisher, "experiments were idiosyncratic to each scientist." Although Fisher obviously didn't invent experimentation, he put it on a consistent and rigorous footing. Researchers also used to publish their conclusions with only a small supportive subset of the data, not the whole experiment. (Gregor Mendel dropped inconvenient data from his famous pea experiments!) Even today we often see researchers hesitant to share not only their data but their code, despite the reproducible research movement.
* p.7: "any useful experiment has to be one that allows for estimation of those outcomes," i.e. Fisher realized that your parameters will always be estimated imperfectly, but if you design your experiment and choose your estimator carefully, the estimates can be precise enough to be useful.
* p.10: Pearson's famous book is apparently still worth reading today. (I also recall that it inspired young Neyman.)
* A brief academic family tree of early statisticians working in the UK: Francis Galton came up with correlation and regression to the mean. He then (taught? managed? influenced?) Karl Pearson, who promoted the idea that distributions are the thing of interest to science, and developed his family of skew distributions. Karl Pearson influenced many in the next generation, including Fisher, Gosset, Neyman, and Pearson's own son Egon. From there on, the field of statistics exploded.
* p.17: Pearson proposed that what's "real" (or of interest to science) is not each measurement we take, but the abstract distribution they come from. But it took Fisher to make clear the distinction between a parameter's true value, your estimator for that parameter (the formula you feed data into), and your estimate of the parameter (the actual value you got using a particular dataset). He also clarified that even if you collect a ton of data, your estimates of that distribution will still be just estimates---and that Pearson's estimators do not have good properties (the resulting estimates won't tend to be as close to the truth as they could be), but better estimators can be derived.
* p.19: Galton and K. Pearson's Biometrika was originally founded with the goal of measuring distributions of biological measurements of species around the world, showing that those distributions change over time, and thus providing proof of Darwin's theories. It was very expensive to print, containing complicated typesetting for math formulas and being the first journal to include full-color photos. Its correspondents sound like Indiana Jones, traversing jungles and deserts and rain forests to measure native tribes and little-known animal species. It sounds like the primarily mathematical articles (for which the journal is now known) were used mostly as filler at first.
* p.28-29: I love the story of Gosset, or "Student," working at Guinness and keeping his research a trade secret. Hotelling tried to meet Gosset in the 1930s and "arrangements were made to meet him secretly, with all the aspects of a spy mystery." We also often forget that Gosset did not derive the mathematical t-distribution (that was Fisher); rather, he ran an early and very laborious Monte Carlo experiment, shuffling stacks of cards with numbers on them and recording the averages etc. (a simulation sketch of this appears just after this list). Gosset was driven by the need to work with small samples. As he told K. Pearson: "If I am the only person that you've come across that works with too small samples, you are very singular." (Paul Velleman gave a talk debunking some of the myths around the Gosset story; I hope to find his slides somewhere.)
* p.39: Cramer's book seems to be a key link in the history of stats textbooks: Fisher wrote his own book as a practical manual, without proofs. Then Cramer wrote his text to fill in the gaps and write entire proofs as needed. "Cramer's book was used to teach a generation of new mathematicians and statisticians, and his redaction of Fisher became the standard paradigm." This is not unlike the quote from economist Paul Samuelson: "Let those who will write the nation's laws if I can write its textbooks."
* p.49: Maybe I find the "degrees of freedom" concept confusing because it "was Fisher's discovery and was directly related to his geometric insights and his ability to cast the mathematical problems in terms of multidimensional geometry"---sadly not one of my strongest areas.
* p.51: Fisher gave a 1947 series of talks about science on the BBC. I would love to find recordings but googling does not help, although some transcripts might be in The Listener magazine if I can find a library with access to this in its database.
* p.59: Gumbel's book "is a magnificently lucid presentation of a difficult subject, filled with references to the development of the subject. The first chapter ... alone is an excellent introduction to the mathematics of statistical distribution theory. ... Although I first read the book after I had received my Ph.D. in mathematical statistics, I learned a great deal from that first chapter."
* p.66: "Since the statistic is random, it makes no sense to talk about how accurate a single value of it is. ... What is needed is a criterion that depends upon the probability distribution of the statistic," and Fisher was the one who first proposed a few such criteria (consistency, unbiasedness, efficiency).
* p.70-71: "In the late 1960s, I had a programmable desk calculator. ... One afternoon, I programmed the machine, checked the first few steps to make sure I had not made an error in my program, turned off the light in my office, and left for home. Meanwhile, the programmable calculator was adding and subtracting, multiplying and dividing, silently, mumbling away in its electronic innards. Every once in a while it was programmed to print out a result. The printer on the machine was a noisy impact device that made a loud sound like "BRRRAAAK." The nighttime cleaning crew came into the building and one of the men took his broom and wastepaper collector into my office. There in the darkness, he could hear a humming. He could see the blue light of the calculator's one eye waxing and waning as it added and subtracted over and over again. Suddenly, the machine awoke. "BRRAAK," it said, and then, "BRRAAK, BRRAAK, BRRAAK, BRRRRAAAAK!" He told me later that it was a terrifying experience and asked that I leave some sort of sign up the next time, warning that the computer was at work." This delightful story reminds me of one I've heard about a former Director of the US Census Bureau.
* p.75: "The reader may recall those terrible moments in high school algebra when the book shifted into word problems. Mr. A and Mr. B were set rowing in still water or against a steady current, or maybe they were mixing water with oil, or bouncing a ball back and forth. Whatever it was, the word problem would propose some numbers and then ask a question, and the poor student had to put those words into a formula and solve for x. The reader may recall going back through the pages of the textbook, desperately seeking a similar problem that was worked out as an example and trying to stuff the new numbers into the formulas that were used in that example. In high school algebra, someone had already worked out the formulas. The teacher knew them or could find them in the teacher's manual for the textbook. Imagine a word problem where nobody knows how to turn it into a formula, where some of the information is redundant and should not be used, where crucial information is often missing, and where there is no similar example worked out earlier in the textbook. This is what happens when one tries to apply statistical models to real-life problems."
* p.84: "The central limit theorem states that this distribution can be approximated by the normal probability distribution regardless of where the initial data came from." Well, not quite! There are very important constraints on the original data that must be met before you can apply a CLT. For example, the mean of iid Cauchy random variables is another Cauchy, not approximately Normal (see the quick simulation after this list); there are plenty of other CLT counterexamples worth looking up.
* p.95-96: Nice example of how statistics differs from another mathematical approach, chaos theory, which can also be used to describe the world and make predictions(?), but (unlike statistics) has no measure of how well the model fits reality.
* p.98: The word "significant" used to mean "that the computation signified or showed something"---not necessarily something very important! Sadly, a shift in the English language changed the general usage of this word, making it a confusing term for students and users of statistics.
* p.99: Fisher's succinct explanation of significance, from 1929: "An observation is judged significant, if it would rarely have been produced, in the absence of a real cause of the kind we are seeking." And from the same paper: "The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained." In other words, a nonsignificant result doesn't mean there is no effect, just that the effect wasn't measured well enough in this experiment. And a single significant result doesn't mean you've proven the effect exists---you must be able to "design an experiment so that it will rarely fail to give a significant result."
* p.100: Salsburg's summary of Fisher's guidelines: "If the p-value is very small (usually less than .01), he declares that an effect has been shown. If the p-value is large (usually greater than .20), he declares that, if there is an effect, it is so small that no experiment of this size will be able to detect it. If the p-value lies in between, he discusses how the next experiment should be designed to get a better idea of the effect." I love this advice: if it's not significant, then you design a better experiment rather than assume the effect doesn't exist! Sadly this is not the way p-values are used in much of science nowadays.
* p.102: I'd love to read the letters between Neyman and Egon Pearson, but they don't seem to be collected and published as far as I can tell.
* p.108: Again from Fisher: "tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but ... they are never capable of establishing them as true"
* p.112: I know of Keynes as an economist, but didn't realize he also studied probability and wrote a treatise on it which "demolishes [the frequentist definition of probability] as a useful or even meaningful interpretation, showing that it has fundamental inconsistencies that make it impossible to apply the frequentist definition in most cases where probability is invoked."
* p.113-115: Unfortunately, Neyman found frequentism the easiest way to build a mathematically tractable & consistent theory of hypothesis testing, and that's the version that became entrenched in textbooks everywhere, even though "as early as 1935 ... he raised serious doubts" and "Neyman seldom made use of hypothesis tests directly." It seems that hypothesis tests became popularized through Wald's work on decision theory and through Lehmann's classic textbook.
* p.115-116: I greatly admire Neyman and am proud to share his first name, but all this about how nice he was is a bit of a hagiography. He was a pretty nice guy but could be a jerk too and had some serious troubles at home, estranged from his wife and distant from his son. His biography is very good.
* p.118: Nice explanation of how interval estimates help us decide whether the estimate is precise enough (i.e. the resulting policy decision would be the same whether the truth is near the lower or the upper bound) or whether we need more data and better precision (i.e. the right decision would differ depending on which bound is true).
* p.123: "Fisher never got far with his fiducial distributions" and I thought this was an abandoned dead end after Fisher died, but it turns out people still study fiducial inference, including researchers at CMU.
* p.142: "Godel once said that the gist of human genius is the longevity of one's youth."
* p.143: I used to wonder why we bother studying measure theory and foundations of probability---it's just proving things that seem obvious, right?---but before Kolmogorov put this all in order, all these "obvious" results were very much ad hoc, instead of being rigorously tied together. Likewise with Lebesgue's work on the foundations of calculus. Although the links seem obvious to us now, it was very different before Lebesgue and Kolmogorov, and I can't imagine what a change it must have been to read their work for the first time (without having already been exposed to the fruits of their labor like we have today).
* p.146-147: Kolmogorov tried tackling the real-life interpretation of probability, in a very different way from his work on measure-theoretic foundations, but apparently did not complete this project before his death, and sadly nobody has been able to figure out where he was going with it.
* p.148-150: Statistics can be highly political! It seems laughable today to think that Soviet planners would dismiss statistical work because "random variable" translates as "accidental magnitude" and the central planners felt insulted that their work could be considered accidental... But this lack of proper experimentation and evidence-based decisions led to massive starvation and economic weakness. Even apart from such extremes, governments have always tightly controlled the release of national statistics. Soviet statisticians were threatened during the Cold War, and even today there are reports of government statisticians being punished for publishing damning inflation estimates.
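To make the Gosset item (p.28-29) concrete: here is a minimal Python sketch of the kind of Monte Carlo experiment he performed by hand with shuffled cards of measurements, here simulated with normal draws instead of his biometric data; the scipy comparison at the end is my own addition, not part of his procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 4, 100_000  # tiny samples, many repetitions (Gosset's "shuffles")

# Draw many small samples and compute t = xbar / (s / sqrt(n)),
# the statistic whose distribution Gosset tabulated empirically.
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)
t_stats = xbar / (s / np.sqrt(n))

# The empirical tail quantiles are much heavier than the normal's,
# matching Student's t with n - 1 degrees of freedom.
print(np.quantile(t_stats, [0.025, 0.975]))   # roughly [-3.2, 3.2]
print(stats.t.ppf([0.025, 0.975], df=n - 1))  # theory: about [-3.18, 3.18]
```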
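And to back up the p.84 caveat, a quick simulation (my own illustration, not from the book) of why the CLT's conditions matter: the mean of n iid standard Cauchy draws is itself standard Cauchy, so its spread never shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# If a CLT applied, the spread of the sample mean would shrink like
# 1/sqrt(n). For Cauchy data it does not shrink at all.
for n in (10, 100, 10_000):
    means = rng.standard_cauchy(size=(1_000, n)).mean(axis=1)
    print(n, np.percentile(np.abs(means), 90))  # stays near 6.3 for every n
```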
I think this should be required reading for every young statistician. All the other majors seem to have some sort of History of [insert program name here] course, but I don't remember one from when I was working on my major (in statistics). I felt this book was exactly what it claimed to be--a description of how statistics revolutionized science in the 20th century. Some people seem to think that this book is supposed to describe statistical methods like an introductory textbook. If you want that, go read an introductory statistics textbook; there are plenty of those out there. This is a historical/philosophical look at how statistics has influenced science and vice versa.
The book is organized according to topics in statistics, with biographical sketches of the people important to the development and application of each topic. This can make it a little difficult to keep track of how everything fits into the overall timeline. But I don't think there would have been a better way to organize it anyway.
I loved reading this book and found it entertaining, witty, and enlightening.
I really wanted to like this book. I love science history books, and while I am not a technical person, I appreciate the "Physics for Poets" level of description that is a feature of many science history books. My problem with this book, and ultimately why I gave up, is precisely the author's inability to handle the technical details. He says that his wife reminded him not to be too detailed, and ultimately he wasn't detailed enough. He described major changes in statistics, but it was hard to tell what those changes were. And frankly, many of the people described were not that interesting. They were statisticians, after all.
[Translated from Spanish:] A journey through the statistical revolution that took place in the twentieth century. Salsburg picks around thirty statisticians (including the occasional woman) and tells us something about their lives and their contributions to the field. His intention is for this to be a book anyone can read, regardless of their knowledge of mathematics, but to really enjoy it I think you need to know something about the subject, even if only at the level of a high-school social-sciences statistics course (better if a bit more). And even being above that level (though not by much), there were things I never quite managed to grasp.
But that didn't stop it from being a very enjoyable read, one that gave me context I'm grateful for about things that, as a teacher, I have explained in class at some point (even if that's limited to the first few chapters). It was also a source of curious anecdotes, such as the reason William S. Gosset signed his articles with the pseudonym Student, and his relationship with Guinness beer. In short, a read I recommend if you have any interest in learning a bit more about the history of Statistics.
[Translated from Portuguese:] The book is as interesting a read as "O Andar do Bêbado" (The Drunkard's Walk), though The Lady Tasting Tea is a bit more academic. You learn how, and in what circumstances, Student's t distribution, Fisher's F, and the nonparametric distributions arose, and you come to appreciate the importance of many mathematical statisticians (men and women) to the development of science over the years, all in a light and didactic read.
As a statistician, I found much of this book fascinating. Many of the anecdotes shared about some of the field's most famous characters gave me an even greater appreciation for their achievements. I also especially appreciated the forays into the philosophical underpinnings of statistics and the focus on the big picture as Salsburg traced and connected many of the breakthroughs in statistics throughout the 20th century.
My critiques are all about the writing, which came across as very stilted. Transitions, both within and across chapters, were frequently abrupt or nonexistent. Roughly the first third of the book seemed to have very little organizational structure, and while I enjoyed some of these individual chapters, it was very hard to follow. Later in the book, the structure became clearer: each chapter focused broadly on one or a few related statisticians and their contributions to the field. This was fine but had the disadvantage of not being chronological, so that later chapters frequently had to remind the reader that the statistician being discussed was unaware of the breakthroughs covered in previous chapters. There were also too many personal anecdotes about Salsburg meeting this or that statistician at a conference.
Finally, the book seemed ill suited for the target audience. Salsburg intended the book for a non-mathematical audience, deliberately using plain English as opposed to mathematical notation to describe the statistical methods discussed. Often, however, the descriptions of these methods were vague and I have no confidence that someone not already familiar with statistics would be able to understand these descriptions. I do, however, think it is a great read for statisticians and perhaps other scientists who use and at least generally understand statistics.
This was so interesting! It was so cool to hear about the actual people behind all of the names my stats training taught me - Pearson, Fisher, Tukey, Box, Cox, and more. It also served to show how young this field of statistics is in some ways, but how classic it is in others.
This book does suffer from the law of misonomy that Salsburg mentions often - "the lady tasting tea" herself appears in only about three lines. I'm not sure what the reasoning was there, and it threw me for a bit early on.
The author claims this is for non-technical, non-mathematical people, and he doesn't include any formulas or proofs for that reason.
That being said, I think statisticians, particularly of my own generation, are the best audience here, as I found the development of my chosen field super interesting.
I love what David Salsburg attempts to do here: explain the basic concepts of statistics by guiding the reader through the history of its development as a discipline. Too often we learn concepts and methods that are popular today without understanding why we use them or how they developed. But however much I appreciate Salsburg's approach, I cannot recommend his book. It is inconsistently paced, lacking in any real explanations of the statistics, and peppered with "when I met [so-and-so famous person]" and "when I invented this statistical term with [so-and-so famous person]" name-dropping.
The first chapters offer a mangled, difficult-to-follow history of the genesis of statistics. Salsburg introduces some basics of statistics, such as regression to the mean and skew distributions, but he wedges them into the narrative as afterthoughts, spending a mere one or two sentences explaining each concept. I understand that this is not a statistics textbook, but a breakthrough new idea has no meaning to me unless I at least halfway understand what the idea is. Needless to say, there was a dissonance between Salsburg's excitement and my dull incomprehension of what was so exciting.
In later chapters, the book becomes more of a biography per chapter, which was easier for me to take in but not how I would have chosen to organize a book on the development of statistics. My overall impression is that Salsburg made an outline of thoughts he jotted down, rearranged a few of the points, then fleshed out his half-baked outline into a book. The result is a book that isn't explanatory enough for a beginner and isn't detailed enough for an expert.
It's a book about statistics... but it doesn't actually talk about how to do stats. Instead, it's about the evolution of the practice of statistics, told by someone who was on the front lines of that evolution. Each chapter is dedicated to a person or development, so that we see the field evolve over time. It's really a fantastic meditation on what we can do -- and should do -- with stats and what we can't.
My favorite part relates to the lowly p-value. Where on earth did this thing come from? Salsburg gives us a hint (p.99) -- "The closest [Fisher] came to defining a specific p-value that would be significant in all circumstances occurred in an article printed in the Proceedings of the Society for Psychical Research in 1929." He states in this article: "It is a common practice to judge a result significant, if it is of such a magnitude that it would have been produced by chance not more frequently than once in twenty trials. This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result."
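As a gloss on "once in twenty trials": a small simulation (my illustration, not Salsburg's, with a hypothetical sample size) confirming that when no effect exists at all, a test at the 0.05 level still flags roughly one experiment in twenty.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# 10,000 experiments in which the null is exactly true (mean = 0):
# a test at the 1-in-20 level comes out "significant" ~5% of the time anyway.
reps, n = 10_000, 30
data = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
result = stats.ttest_1samp(data, popmean=0.0, axis=1)
print((result.pvalue < 0.05).mean())  # approximately 0.05
```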
The first three chapters were the best. He started out with some really good stuff that was both biographically interesting and statistically informative. But it seemed like he lost steam. That said, there were still some good chapters and interesting anecdotes, and I generally enjoyed the book. I had to read about two-thirds of it for a class, and I finished the rest after the class was over, so that says something.
5-6 different partially written books combined into a single manuscript that turns out to be a mostly shallow biographical survey of early statisticians. There are some gems in here about the milieu of early statistics, but it doesn't really deliver anything more substantive than an interesting footnote or two.
It was very interesting to read about the people behind the known statistical methods! Also, the author has a nice writing style, it does not feel dry. Sometimes, he even builds up the expectation for the next chapter so I just wanted to know what happened...
An excursion through the history of statistics in the 20th century. It's interesting to learn how the ideas related to each other, as well as a little about the statisticians' personal lives.
The most popular image of Statistics we have is from Mark Twain's re-tweet of the quote attributed to Benjamin Disraeli: "There are three kinds of lies: lies, damned lies, and statistics." With the advent of computers and vast amounts of storage, ever more data is available for crunching by scientists. Consequently, we have ever more conclusions based on data, not all of them unbiased. Politicians, environmentalists, businesses and scientists have all been guilty of selectively choosing data to push their agendas under the garb of 'scientific conclusions based on real statistical data'. However, if we reflect carefully, we see that our well-being itself nowadays is understood only in terms of numbers and indices given to us by the science of Statistics. Without numbers like GDP growth rates, the Consumer Price Index, inflation rates, unemployment rates, currency exchange rates etc., life as we know it today would be a stumble in darkness. So, it is important to understand the role of Statistics in the modern world, what it means to us and how we can productively use it to improve our lives. This book by David Salsburg takes us through the important ideas and developments in Statistics during the past hundred years and more. It shows us the towering figures of this discipline, their contributions, their collaborations with one another as well as their profound disagreements, and how they fundamentally changed the way science itself looked at understanding Nature.
Statistics was one of my subjects at university. I used to understand it as a branch of mathematics/science where one collects and analyzes numerical data in large quantities. I understood its purpose to be the extraction of values, called parameters, out of this mass of data so that we can make sense of the reality the data represents. The author's own preface to this book showed me how primitive this understanding is. He shows how Statistics moved the philosophical vision of Science away from a deterministic model of the Universe to a probabilistic model. In the nineteenth century, Science viewed the Universe as working with clockwork precision. A small number of mathematical formulas, like Darwin's, Newton's and Boyle's laws, could be used to describe reality and predict future events. All that was needed was a set of such formulas and measurements of sufficient precision. But in practice, measurements lacked precision. The more the instruments were refined, the more scientists became aware of greater variations. The differences between what the models predicted and what was observed and measured grew bigger. The picture of the 'clockwork universe' lay in shambles. Science started moving towards a new paradigm - the statistical model of reality. Because statistical models of reality are mathematical, we can understand reality through the ideas of randomness, probability and statistics. In the twentieth century, the rise of Quantum Mechanics reinforced this substantially. I found this view of the evolution of Science in the twentieth century fresh and insightful.
The book is mainly a selective history of statistics. Giants like Ronald Fisher, Karl Pearson, William Gosset, Francis Galton, Jerzy Neyman and W.E. Deming are all extensively covered, both for their seminal work and for the struggles they had to wage to get their ideas accepted and, at times, rejected. We see extensive biographical information and, at times, some gossip. The work of many scientists is set in the social context of their times, because their work was carried out in totalitarian and post-colonial societies. For example, in the USSR during the 1930s, communist orthodoxy was hostile to applied statistics. It affected the work of eminent scientists like Andrei Kolmogorov, who founded the axioms of probability. Indian statistical giants like P.C. Mahalanobis and C.R. Rao found themselves in more exciting times in a newly independent India in the 1950s, collecting and sorting important demographic data on the Indian population for the benefit of planning by the Nehru administration, which believed in using Science for development. W.E. Deming's work on quality control was given short shrift in his native USA, but the Japanese embraced it to emerge as the premier makers of high-quality automobiles in the 1980s. There is a special chapter in the book covering the contributions of women scientists like Stella Cunliffe, Judith Goldberg and others.
The book details their advancements in various fields, which include more reliable pharmaceuticals, higher-quality beer, econometrics, superior quality control in manufacturing, social policy and medical diagnostic tests. There are interesting discussions on whether there is a direct link between recidivism and the length of a prisoner's sentence. The accepted wisdom is that 'the longer the sentence, the less the recidivism'. The author discusses Stella Cunliffe's analysis of this question, which exploded the myth of this association. The chapter 'The Man Who Remade Industry' has compelling details on the great contributions of W.E. Deming to quality control and how they revolutionized the Japanese automobile industry. However, I shall just touch upon one fundamental insight the author outlines in the chapter 'Smoking and Cancer', which captured my imagination.
The chapter on 'Smoking and Cancer' is centered on a philosophical and analytical discussion of what 'cause and effect' means. Salsburg says that Prof. Bertrand Russell effectively showed in the early 1930s that there is no such valid scientific concept as 'cause and effect'! It is a vague, common notion that does not stand up to pure reason; it contains an inconsistent set of contradictory ideas and is of little or no value in scientific discourse. If so, what does that mean for us in society? Did Agent Orange cause those health problems in Vietnam and after? Does smoking cause cancer? The statistics giant Ronald Fisher, a pipe smoker himself, did not believe smoking caused cancer. He pointed out that studies showed that people who did not inhale the smoke had a higher incidence of lung cancer than those who inhaled, which is inconsistent with the conclusion. Additionally, he mused, suppose that there was something genetic that induced some people, rather than others, to smoke. Suppose this same genetic disposition was involved in the occurrence of lung cancer. It was well known that many cancers have a familial component. Suppose the link between smoking and cancer was due to this same genetic disposition. To make his case, Fisher assembled data on identical twins and showed that there was a strong familial tendency for both twins to be either smokers or non-smokers. He challenged others to show that lung cancer was not similarly genetically influenced. Fisher's objections were motivated by science. Studies of smoking use data from what are called opportunity samples: people who were smoking already. Ideally, one would run a randomized controlled experiment, assigning half the participants to start smoking two packs or more a day and then making observations, to prevent bias. But this is ethically untenable. Though a lot of evidence exists that smoking is bad, each piece of it is in some way flawed as well.
I found this analysis fascinating because we routinely accept so many 'cause and effect' claims by environmentalists and other social scientists without much scrutiny. Is the thinning of Arctic sea ice really the cause of polar bears dying of starvation? Did DDT really cause cancer? Did the CFCs from refrigerators really cause the ozone layer to vanish over the Antarctic?
The final chapter, titled 'The Idol with Feet of Clay', is a philosophical look at the future. Salsburg says that the progress of Science implies that eventually the statistical revolution will also be overthrown in favor of a better one. Science produces a model that fits available data and uses it to predict the results of new experiments. But no model is fully accurate. So more and more data results in more and more complicated models and exceptions. At some point, the model no longer serves its purpose, and new thinkers emerge to create a revolution. One can see the Einsteinian revolution as one such event. In this sense, Salsburg says, the science of statistics also stands on feet of clay, and the revolution that may overthrow it is perhaps already germinating among us.
I found it an enjoyable book to read. I learnt a lot as well.
The full title here is The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. This book by David Salsburg is pretty much what the title suggests: part history of the rise of statistical methods in scientific research and part biography about the people responsible for it. This probably isn't a book for anyone not already versed in inferential statistics and related subjects. It won't, for example, teach you much about statistics, so you'll be pretty lost or at best unimpressed by most of the stories and adulations the book contains. I would have appreciated a bit more exposition and explanation, but for those of us with a background in stats, it keeps things at a sufficiently high level so that we're not forced to pull out our old textbooks just to know what's going on.
And it's pretty interesting stuff. While Salsburg lacks (or at least holds in reserve) the panache and wit necessary to make this a really entertaining read, he does give glimpses into both the absurdity and mundanity of the scientific process in this area. I was amused to learn, for example, that many august statistical techniques like analysis of variance were created so that someone could figure out how much artificial cow poop to spread over an acre of farmland. The book also tracks some of the more interesting personalities in the field, relating tales about how William Gosset created a now common and relatively simple procedure known as "Student's t-test" while working for a beer brewery (Guinness, no less) whose strict policies about sharing research forced him to publish under the (perhaps unimaginative) pseudonym "Student." And then there were the cat fights and irrational, career-long grudges that these men and women slung at each other. Though not quite on the level of, say, Bill Bryson's A Short History of Nearly Everything, this book does a decent job of layering those pedestrian and altogether human eccentricities over the magnitude of the scientific accomplishments they created.
So while not exactly light reading and not for the uninitiated, it's a pretty interesting read.
I saw the book as divided into the early chapters, where he covers the formative history of modern statistics, focusing on Karl Pearson and Fisher; the middle chapters, in which he gives a series of biographical sketches of important contributors to statistics; and the last chapter, in which he discusses the philosophical implications and problems of statistics. I enjoyed the first and last parts of the book, but I really wonder whether the short biographical sketches would interest anyone not already familiar with the statisticians involved. The chapter dedicated to Deming was of great interest to me, but that's because I already knew something of him--from a reader's perspective, he deserved it. But did the Lady in Black deserve a whole chapter? Overall, the book does help the layperson understand that statistics is in fact a controversial field which skirts some philosophical topics as well. The last chapter in particular will hopefully spur readers into finding out more about the deeper problems and interpretations of probability and statistics, as it has for me. I realize the book is intended for laypersons, but I feel the author could have at least tried to make some of the ideas more concrete. For example, he mentions the forbidding topic of measure theory without the slightest effort at giving some verbal explanation of what it involves. What I had in mind is more explanations like the one he does in fact give of a probability space, using the probability of rain as an example and breaking down the different interpretations of this seemingly simple statement. I also wish he had divided the material more clearly by the impact of statistics on science, business, politics and the military.