During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book).

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
Well, it was one of the most challenging books I've read in my career. It is a rigorous and mathematically dense book on machine learning techniques. Be sure to refine your understanding of linear algebra and convex optimization before reading this book. Nonetheless, the investment will be totally worth it.
Excellent book. Has repaid multiple rereadings and is a wonderful springboard for developing your own ideas in the area. Currently I'm going through Additive Models again which I breezed by the first few times. The short section on the interplay between Bias, Variance and Model Complexity is one of the best explanations I've seen.
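(To make that interplay concrete, here is a minimal simulation sketch of my own; the target function, noise level, and the use of polynomial degree as the complexity knob are all illustrative assumptions, not anything taken from the book.)

```python
# Minimal bias-variance / model-complexity simulation: fit polynomials of
# increasing degree to many noisy samples of the same target, then measure
# squared bias and variance of the fitted curves on a fixed grid.
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50)
f_true = np.sin(2 * np.pi * x_grid)          # assumed "true" regression function
n_train, n_reps, sigma = 30, 200, 0.3

for degree in (1, 3, 9):
    preds = np.empty((n_reps, x_grid.size))
    for r in range(n_reps):
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, n_train)
        coefs = np.polyfit(x, y, degree)       # fit this training sample
        preds[r] = np.polyval(coefs, x_grid)   # predict on the common grid
    bias2 = np.mean((preds.mean(axis=0) - f_true) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```

The degree-1 fits should show high squared bias and low variance, the degree-9 fits the reverse, with degree 3 in between, which is the interplay that section describes.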
After retiring, I developed a method of learning a variation of regression trees that uses linear separations at the decision points and linear models at the leaf nodes (and subsequently used them to forecast the behavior of hurricanes). In them I used a heuristic measure for growing and shrinking the trees, but thanks to this book I can see there is a theoretically sound basis for the measure. Which is nice.
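(For the curious, here is a minimal sketch of the leaf-level linear-model idea under simplifying assumptions: a single axis-aligned split rather than the linear separations and grown-and-shrunk trees described above, scored by the summed error of per-leaf linear fits. It is my illustration, not the reviewer's method or anything from the book.)

```python
# Minimal "model tree" sketch: choose one axis-aligned split that minimizes
# the summed squared error of linear models fit separately on each side.
# Illustrative only; not the linear-separation trees described in the review.
import numpy as np
from sklearn.linear_model import LinearRegression

def leaf_sse(X, y):
    """Sum of squared errors of a linear model fit on one region."""
    if len(y) < 2:
        return 0.0
    resid = y - LinearRegression().fit(X, y).predict(X)
    return float(resid @ resid)

def best_split(X, y):
    """Search axis-aligned thresholds, scoring each by its two leaf models."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[1:]:
            left = X[:, j] < t
            score = leaf_sse(X[left], y[left]) + leaf_sse(X[~left], y[~left])
            if score < best[0]:
                best = (score, j, t)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] < 0, 2 * X[:, 1], -3 * X[:, 1]) + rng.normal(0, 0.1, 200)
sse, feature, threshold = best_split(X, y)
print(f"best split: x{feature} < {threshold:.3f} (SSE {sse:.3f})")
```

On this toy data the chosen split should land near x0 = 0, where the leaf-level linear relationship changes.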
If you have the mathematical background (standard calculus, linear algebra and some familiarity with statistical notation) this is a wonderful introduction to Machine Learning and covers most, but not all, of the major models in use today. The Second Edition does not cover recent topics such as many of the deep learning schemes, but you wouldn't expect it to. More generally, some of the exposition of ideas is very compact. I would have really welcomed more explanation on the relationship between Projection Pursuit Regression and neural networks.
I could have also wished for a bit more linkage between some topics. For example, k-nearest neighbors and standard tree-based classification and regression schemes both depend on the strategy of dividing the universe into smaller areas that are presumed to be sufficiently homogeneous that the simplest possible model for each area is completely sufficient. As such, many of the same techniques for improving one can be used for improving the other; Random Forests, for example, would have as an analogy Random Compact Neighborhoods or Random Prototype Collections, with some of the same advantages. As another example, the coverage of PRIM, the bump hunting algorithm, is excellent and the only real coverage of the topic I've seen. But it then gives the rather vague step of using cross-validation to select a particular box from the sequence, and never once mentions the obvious relationships to either dimensional considerations or the measures considered in the chapter on unsupervised learning; this last oversight verges on criminal.
Because of the radically different assumptions underlying global models such as linear regression and additive models (which also inherently assume independence between parameters) and local models such as regression trees and k-nearest neighbors, with PRIM solidly in the middle and excellent at picking up local parameter interactions, I'm thinking that my next set of experiments will take a multipronged approach: first doing the best job one can with a global model, following that with a thorough job of bump hunting (but with both high and low boxes, unlike PRIM) to pick up the local parameter interactions, and then seeing if there are any pieces left for nearest-neighbor or regression/classification-tree methods to pick up. Following those experiments, I'm thinking of a generalization of an artificial neuron which, instead of being a hard-limiting non-linearity applied to a linear model, would be a hard-limiting (or soft-limiting, as the case may be) non-linearity applied to an additive model. In all of these investigations I expect Elements of Statistical Learning to be a constant companion.
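(A hedged sketch of what that additive-model neuron might look like, with an assumed per-feature cubic basis expansion and a soft-limiting sigmoid; the class name, basis choice, and weights are mine, purely for illustration, and no training loop is included.)

```python
# Sketch of an "additive-model neuron": a squashing non-linearity applied to a
# sum of per-feature functions f_j(x_j), rather than to a plain weighted sum.
# The basis expansion and parameterization are illustrative assumptions.
import numpy as np

def basis(x):
    """Per-feature basis expansion: [x, x^2, x^3]."""
    return np.stack([x, x ** 2, x ** 3], axis=-1)

class AdditiveNeuron:
    def __init__(self, n_features, rng):
        # one weight vector per feature's basis expansion, plus a bias
        self.W = rng.normal(scale=0.1, size=(n_features, 3))
        self.b = 0.0

    def forward(self, X):
        # f_j(x_j) = basis(x_j) @ w_j, summed over features, then squashed
        additive = np.einsum("nfk,fk->n", basis(X), self.W) + self.b
        return 1.0 / (1.0 + np.exp(-additive))  # soft-limiting non-linearity

rng = np.random.default_rng(0)
unit = AdditiveNeuron(n_features=4, rng=rng)
print(unit.forward(rng.normal(size=(5, 4))))
```

Replacing the fixed cubic basis with smoothing splines fit by backfitting would bring this closer to the additive models discussed in the book, but that is beyond a sketch.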
This book usually seems to be relegated second in the Machine Learning area after Bishop's Pattern Recognition and Machine Learning, but I would put it first.
This book surveys many modern machine learning tools ranging from generalized linear models to SVM, boosting, different types of trees, etc. The presentation is more or less mathematical, but the book does not provide a deep analysis of why a specific method works. Instead, it gives you some intuition about what a method is trying to do. And this is the reason I like this book so much. Without going into mathematical details, it summarizes all necessary (and really important) things you need to know. Sometimes you understand this after doing a lot of research in that subject and coming back to the book. Nevertheless, the authors are great statisticians and know what they are talking about!
A word of caution: I am not sure if this is a good book for self-study if you don't have any background in machine learning or statistics.
For the mathematician, this book is too terse and hard to learn from, to the point of pretentiousness. For the software engineer, the presentation of algorithms in this book is poor: a bunch of phrases with no clear state changes, step computations, etc. In general, a lot of pompous presentation and hand-waving material. Something positive: the paper is top quality.
I read this book for work, during work, but I'm falling behind my yearly goal so I'm including it on goodreads :P
This book has a lot in it, and is incredibly dense. However, it's well worth it. It contains not quite everything about statistics and machine learning that someone needs to know to do data science, but it comes close.
The drawback is that this book is hard to understand. You need to know a lot, or be willing to learn a lot from other resources, to actually get a lot from this book. Even as someone with a good stats and ML background, I hit some parts where I had to find online sources for explanations of how to even start thinking about what's in the book.
Now that I've gone through it once, I know I'll be going back to this time and time again since it is such a good resource. I also plan on going back and re-reading at least some of the chapters as necessary.
An extremely well-written introduction to machine learning. I now understand why this is the universal textbook for machine learning classes.
The math is described at a reasonably high level, but the authors do a fantastic job emphasizing the conceptual differences between different learning algorithms. A major focus of this text is on conditions which favor some algorithms over others in minimizing variability for different learning exercises. While this book is not a very pragmatic text (it does not hold your hand through implementation), it does a fantastic job laying conceptual foundations. I highly recommend this to any student serious about statistical thinking.
A classic text in machine learning from a statistical perspective. Whether you're a novice machine learning practitioner, an undergrad, or a hardcore PhD, you can't miss out on this one. Overall, a good nontrivial broad intro to machine learning without loss of technical depth.
This book has been the authoritative reference for current users of supervised and unsupervised ML. Already having an econometrics and probability background, I found this book quite accessible and enjoyable to read. I appreciate the methodical and careful style, though at times it feels terse. I guess the reason is that the book is already quite long and is not meant to be a deep dive into methodology or theory. That said, the book is very good as an introduction and a reference to ML methods. I think a semester course using this book should be part of the standard graduate curriculum in economics.
A more detailed companion piece to the introductory ISLR, this is an excellent introduction. The only critique would be that it is too even-handed to influence the mindset of the reader much.
This is an excellent second or third book on statistical modeling, after you have read something with code examples and done a few real projects. It is mathematically deeper and more comprehensive than such books, and does more to tie together how and why algorithms work. It provides no code examples, and it is also correspondingly more demanding in the mathematical background of the reader. Even if you never read all of it, it's worthwhile owning as a reference, and a PDF is even available for free from the authors' website.
It's a classic, but it's not my favorite text at this level for either teaching or self-study. Coverage of core methods is relatively good, but the content sometimes veers between highly mathematical and formulaic, missing important conceptual areas. I wouldn't consider a statistics/ML/bioinformatics/... library complete without ESL, but I think there are better overall resources and aids for teaching this content.
Although it covers a wide range of topics, the book, especially towards the end, reads like a thick overview article rather than a textbook. Yes, there are many problems to work on at the end of each chapter, but most concepts, ideas, and algorithms presented would require the reader to refer to the original papers if they attempt to implement them in computer code. So, while theoretically informative, the book is seriously lacking on a practical level. More of a review than a reference.
Nice as a reference or an overview, but not necessarily as a source for learning. So many approaches and techniques are described in this book that, out of necessity, their descriptions are very general, very condensed, and very mathematical.
It's a book I started in 2016. I had it printed out, its 600 pages of photocopies sitting in an IKEA fabric bin along with my basketball shoes, a bleak future but a life without upheavals. And every time I came on here it reminded me of it, and I finished it this week.
I liked the book when I read the first few chapters in 2017, so I decided to reread and finish it this year. But the second time is not the charm.
First, the book is too theoretical for non-academic readers. Sometimes, the theory can be useful. For example, I learned about the relationships between different models: LASSO can be thought of as Bayesian regression with a Laplacian prior, and trees can be thought of as special cases of regressions with basis functions, etc. But most of the time, I can't figure out how the theory would help me build a better machine learning model.
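(The lasso connection the reviewer mentions can be written in one line; this is my summary of the standard result, not a quotation from the book. With a Gaussian likelihood $y \mid X, \beta \sim \mathcal{N}(X\beta, \sigma^2 I)$ and independent Laplace priors $p(\beta_j) \propto e^{-\tau |\beta_j|}$, minimizing the penalized least-squares criterion is the same as maximizing the log posterior:)

$$
\hat{\beta}^{\text{lasso}}
= \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
= \arg\max_{\beta} \; \Big[ \log p(y \mid X, \beta) + \sum_j \log p(\beta_j) \Big],
\qquad \lambda = 2\sigma^2 \tau,
$$

so the lasso estimate is the posterior mode (not the posterior mean) under that prior.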
The book occasionally provides practical advice about model training (e.g., the best strategy for training a GBM is to set a small learning rate and use early stopping to select the number of trees). But I doubt whether reading this book is the most efficient way to learn such suggestions.
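(A hedged sketch of that recipe using scikit-learn's GradientBoostingRegressor; the synthetic dataset and the specific parameter values are illustrative assumptions, not recommendations from the book.)

```python
# Small learning rate plus early stopping on a validation split to select the
# number of trees. Parameter values are illustrative, not prescriptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingRegressor(
    learning_rate=0.05,       # small shrinkage
    n_estimators=2000,        # generous cap; early stopping picks the rest
    n_iter_no_change=20,      # stop when the validation score stalls
    validation_fraction=0.1,
    random_state=0,
)
gbm.fit(X_train, y_train)
print("trees actually used:", gbm.n_estimators_)
print("test R^2:", round(gbm.score(X_test, y_test), 3))
```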
I also find the book poorly organized. I cannot understand the logic behind the ordering of chapters.
My recommendation for ML-application-focused readers is: read only chapter 2 (which provides a great explanation of the bias-variance tradeoff) and treat the rest of the book as a reference.
I loved this book. Its presentation is very nice, and the topics are very well reviewed, with beautiful examples and figures, always trying to unify the view around the same basic linear models. It is a little biased towards the Lasso because of the authors, but that is actually a good thing, as they present the intuition behind many methods. It also makes a clear distinction when hard mathematical details are about to be presented, and has good exercises for understanding those details better. The only thing that could be improved is the context of some specific topics: it is not always clear why a topic sits in one place of the book and not another, because many topics are related and it is difficult to make the connections. I would really recommend this book as an introduction to machine learning, especially because of the intuitive explanations and the examples.
It's the classic for good reason, well written and well organized, but this field is not as magical as people believe. And decorating machine-learning books with informative, colorful, frequent pictures is absolutely what mathematical educators everywhere should be doing, but unfortunately it's only the intellectually vacuous computer fields that ever seem to stick enough pretty pictures in their books.
I would like to say machine learning won't make you the money you think it will, but sadly it does make people money---just for the wrong reasons.
Plenty of pictures. But the field is bullshit. Picture-heavy books like this are wonderful _except_ that hundreds of pages are then spent making it look like a thing which shouldn't actually be considered a thing is actually a thing.
It’s far better laid out than stuffy academic journal articles, yet as irrelevant as a stuffy academic journal.
Buy and read this if you’re a math student and want some pictures and examples of "how polynomials might apply to the real world". Buy and read it if you’re a programmer who wants to learn some statistics.
A clear introduction to Data Science and Statistical Learning that is not so heavy on the math side.
I did not finish the book in its entirety since I was already versed in some of the topics. Notwithstanding, even in those cases, a quick glance gave me more intuition and nuance regarding what I already knew.
I also learned a lot of new concepts. Every Data Scientist should read this book.
I love this book. It's been my constant fallback over the last couple of years. Whenever a question sprang up in my head about the fundamentals of an algorithm, ESL was there with just the precise, succinct information I needed. I normally don't write reviews for textbooks, but this one had to be done. I owe one to ESL.
This remains a great reference book for machine learning, although the chapter on neural networks has become very dated. I wouldn't recommend it as a textbook or for self-study though. A lot of things are in the wrong order, and you'll frequently find yourself having to refer to later chapters in the first half of the book.
Concise but not obscure. Comfortable but not facile. Ideal for quick-witted generalists. Authors are a little sloppy with notation, but that seems to be de rigueur with Stanford statistical learning professors.
What CS229 should be. Important contents: the curse of dimensionality, linear regression vs. k-NN, frameworks for bias vs. variance. The chapters on cross-validation and methods for estimating true error are the backbone for backtesting.