An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
I took a Machine Learning class during my last semester. This is the book that was used for the course (we also used Elements of Statistical Learning as the secondary text). I loved it. I thought the explanations were great as well as the exercises. I took the online course offered through Stanford at the same time and got to watch Trevor Hastie & Rob Tibshirani themselves. The videos were hilarious and informative. I'd highly recommend reading the book as well as taking the online course.
The book explains the concepts of statistical learning from the very beginning. Core ideas such as the bias-variance tradeoff are discussed in depth and revisited across many problems. The included R examples are particularly helpful for beginners learning R. The book also provides brief but clear descriptions of function parameters for many related R packages.
My professor thinks this book is a "superficial" version of The Elements of Statistical Learning, but I disagree. Yes, it may be easy for the reader to understand, but isn't it true that a great educator is someone who can explain complicated concepts in simple terms? There should be no shame in reading such a book, one that does a wonderful job of breaking things down.
If one wishes to learn more about a particular topic, I'd recommend The Elements of Statistical Learning. The two pair nicely together.
Somehow both dry and heavy on intuition. Stuff which you'll actually use. I've brushed up against most of it before (read: I've called most of it from the safe distance of a nice Python library before), but it took a second pass and doing all the exercises for it to click.
To actually learn (grok) something, you need
1. To do it, not just read about it
2. To read it several times
3. To feel challenged but not overwhelmed by it
And 2&3 conflict.
(Most books don't have a natural do-operator. How do you do a novel? I make do with these reviews; others do fanfiction and probably get the same benefit.)
Kind of annoying that the figures are never next to their discussion. And I was hoping this would make me like R but I can't and I don't. But good.
Clear, intuitive exposition of a subset of methods in statistical learning. Great illustrations and plenty of R code. My only complaint is that the R code is quite ugly looking, which is no surprise since it was written by statisticians, but the authors should be forgiven for this minor infraction. Overall I highly recommend this book.
Probably the most accessible machine/statistical learning textbook out there. Even understandable for people without rigorous training in statistics or mathematics. Very much based on intuition.
Pay attention to the videos by the authors that follow the chapters of the book (made for a Stanford MOOC but freely available on YouTube).
If you're going to read one book on statistical modeling, make it this one. When I teach data science to software engineers, this book is one of the cornerstones. I find that it has just the right amount of theory for a beginner, coupled with very useful R examples.
If you can read, understand, and practice this book, you will be employable as an entry-level data scientist. [caveat: this statement was true when I wrote it in 2017, but it may no longer be the case in 2023]
One caveat is that I do not recommend emulating the authors' software practices, such as using attach() on data frames or using base-R graphics.
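To show what I mean, here is a minimal sketch of the alternative (my own example, not from the book; it assumes the ISLR2 package, which ships the Boston data used in the labs, is installed):

```r
# Avoiding attach(): pass the data frame explicitly instead.
# Assumes the ISLR2 package (which provides the Boston data set) is installed.
library(ISLR2)

# Discouraged pattern from the labs:
#   attach(Boston); lm(medv ~ lstat)

# Explicit alternatives that leave the search path untouched:
fit <- lm(medv ~ lstat, data = Boston)   # name the data frame in the call
with(Boston, cor(medv, lstat))           # or scope a one-off expression
```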
Now reading the 2nd edition as I prepare to teach a course based on it. Skimming through it, I really like the discussion of double descent (in the deep learning / neural nets chapter).
~~~~~~
[Old notes from when I skimmed the 1st edition a few years ago:]
Skimmed just through Ch 3 (linear regression) so far. Hoped it'd be something I could recommend to a total novice, but it isn't. That's fine---it's just for a higher-level audience than I was hoping.
Based on my experience TA'ing statistical novices, I suspect the linear regression stuff is already too dense and rushed to help them really understand what's going on & why. They'll need a little more time on each aspect, a few more examples, a little deeper sense of why we do these things. On the other hand, if you've already taken a regression course, this chapter is a perfectly good review.
Also, a pet peeve: while this book is aimed mostly at prediction (and that's great), the authors also chose to mention hypothesis testing briefly and didn't handle it well. Yes, statistical hypotheses are *phrased* as questions about parameters (say, is the regression slope beta1 = 0?). But they're really questions about the *dataset*, its design and size. The actual question isn't "Is the slope really exactly 0?" but rather "Did we *measure* the slope precisely enough?" If we don't reject the null, that doesn't mean we conclude something is zero. We conclude we don't have enough data to estimate it precisely, and that opens a distinct set of options:
* collect more data to estimate it better; or
* drop it from the model because the estimate is noisy even though it could have an important effect, to avoid overfitting; or
* include it in the model even though the estimate is noisy, to avoid underfitting.
We should be making it clear to students that we have a real decision between these options. If we sweep it under the rug of "always drop insignificant terms," we weaken their understanding and we lose the important connections with the bias-variance tradeoff, over- vs. under-fitting, etc.
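A quick simulation makes the point (my own illustration, not from the book): the same true slope can look "insignificant" at n = 20 and clearly nonzero at n = 2000, so failing to reject says more about the data than about the parameter.

```r
# Hypothetical illustration: a small-but-real slope looks "insignificant"
# with little data and clearly nonzero with lots of it.
set.seed(42)
slope_p <- function(n, beta1 = 0.1) {
  x <- rnorm(n)
  y <- 1 + beta1 * x + rnorm(n)
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}
slope_p(20)     # typically p > 0.05: imprecise estimate, not "beta1 = 0"
slope_p(2000)   # typically p << 0.05: same slope, enough data to see it
```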
Amazing book! A great intro to ML and statistical learning with some solid, clear and practical examples. Some of the concepts introduced appear so simple to the human mind, but getting the machine to learn these concepts is a whole different science. This book made me appreciate the wonders of ML. It also reinforced the notion that vast industries will be revolutionized; it is just a matter of time. In this book alone, I learned about the different techniques in supervised and unsupervised learning (e.g., the bootstrap, bagging, random forests, boosting):
- Bootstrap: treat the sample as a population and repeatedly draw new samples from it with replacement
- Bagging: average the predictions from many decision trees, each grown on a different bootstrap sample
- Random forests: at each split, only a random subset of the predictors is considered, which decorrelates the trees and makes their average less variable
- Boosting: each tree is grown sequentially, using information from the previously grown trees, so the model improves by repeatedly fitting to what the earlier trees missed
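As a toy illustration of the first of these (my own sketch, not from the book), the whole bootstrap idea fits in a few lines of R:

```r
# Toy bootstrap sketch: treat the sample as the population and resample it
# with replacement to estimate a standard error (made-up stand-in data).
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)                        # stand-in "sample"
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot_means)   # bootstrap estimate of the standard error of the mean
```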
It's my first step into data science, but it won't be my last. Looking forward to deepening my knowledge of this. Reading the book + watching the ISL YouTube vids helped significantly in understanding the concepts.
A very good book of statistics that you can read after your Statistics 101 course, centered on machine learning. Very clear prose, very consistent notation, and in general everything that one asks from a good statistics book. I've read 95% of it and it's very good if you don't know much. I found the exercises quite difficult, though. I have no knowledge of algebra or calculus, so I just couldn't do some of them. And many things I had to take on faith. I'm ok with faith, but occasionally the authors dug deeper and I became lost. But only 5% of the time. The good thing is that after you read it, you can do pretty cool things with R, thanks to the labs at the end of each chapter. I also liked the great importance the authors give to resampling methods (basically the way they test their models) and to the bias-variance tradeoff. Excellent second book for beginners. The non-simplified version (The Elements of Statistical Learning) I found too advanced and unreadable from the first page.
A good introduction to the methods of statistical learning, presenting techniques in a clear way and showing some of the practical issues involved in real-world use of regression and classification models. While some math is unavoidable when defining the tools presented in this book, the formulas are kept at a level that might be suitable for those with less mathematical baggage than willingness to understand the concepts, and the R exercises can be very useful to the more practically-minded readers.
The book starts with a good introduction to basic classifiers, their differences, and why we need each one of them (or why we don't). It also introduces evaluators for each kind of classifier in the opening chapters and explains why they matter. This is extremely helpful, since it provides a holistic view of the flow that will be explained in further chapters. A much better intro to machine learning compared to other books. Loads of problems to work on, which makes sure the understanding has seeped in.
An excellent introduction to statistical learning, presenting the main algorithms for both regression and classification (linear regression, logistic regression, the lasso, LDA, KNN, tree bagging and boosting, SVMs, etc.), as well as the important evaluation tools (R^2, p-values, ROC curves, cross-validation, the bias-variance tradeoff, etc.). Things are kept very simple, with lightweight mathematics. The accompanying R labs help readers consolidate their knowledge and get their hands dirty on real datasets. The exercises form an important complement, but it is unfortunate that the answers are not given (one will manage to find most online, though). It's important to always have exercises with answers: if one is able to answer all of them, one didn't need the exercises in the first place, and if one is stuck or just gets it wrong, only access to the solution will bring progress. Note that the book focuses on classical methods and deep learning is not covered. Makes a nice complement to more technical ML manuals.
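To give a flavor of one of those evaluation tools, here is a minimal cross-validation sketch (my own example on built-in data; the book's resampling lab uses boot::cv.glm() in a similar way):

```r
# Minimal k-fold cross-validation sketch using boot::cv.glm
# (assumes the boot package is installed; mtcars is built into R).
library(boot)
fit <- glm(mpg ~ wt + hp, data = mtcars)   # a plain linear model via glm()
cv  <- cv.glm(mtcars, fit, K = 5)
cv$delta[1]   # 5-fold CV estimate of the test MSE
```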
Overall, I loved it, especially the conceptual part: it straightforwardly explains a lot of ML/stats concepts. It's very useful for learning the theoretical background of ML/stats techniques, developing intuition of when to use each one, and what to expect from each of them. It progressively develops understanding starting with regression and draws parallels+contrasts between the methods.
However, I think that the "Applications with R" part could improve a lot by using a framework like caret or tidymodels that provides a common interface to the different techniques taught in the book. In ISLR, each model is taught using different packages that often have different APIs and code styles, which can be distracting (and readers would also benefit from immediately learning a more powerful, general tool like caret or tidymodels). On top of that, the code sometimes doesn't follow best practices (e.g., calling attach() on datasets). And since the book tries to introduce readers to R, it could be argued that this would be best accomplished using the tidyverse.
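For instance, under tidymodels every model is specified and fit the same way; a minimal sketch of what I have in mind (my own example, assuming the tidymodels, ranger, and ISLR2 packages are installed):

```r
# Sketch of the common tidymodels interface the review asks for
# (assumes tidymodels, ranger, and ISLR2 are installed).
library(tidymodels)
library(ISLR2)   # for the Boston data used throughout the book's labs

spec <- rand_forest(trees = 500, mtry = 3) |>
  set_engine("ranger") |>
  set_mode("regression")

fit <- spec |> fit(medv ~ ., data = Boston)
predict(fit, new_data = head(Boston))

# Swapping in another model changes only the spec, not the workflow:
# spec <- linear_reg() |> set_engine("lm") |> set_mode("regression")
```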
Clear and gentle introduction to non-neural-net machine learning. Suitable for undergrads, it covers a useful collection of topics that aren't always given emphasis in introductory texts, like resampling methods and model selection. Very clear, very non-pretentious discussion of support vector machines. This text is the smooth chaser to its bristly, discourteous cousin "The Elements of Statistical Learning," a text that should only be consulted much later in life, and even then only under duress. The reason I can't give this one 5 stars is the tragic choice of R for all labs and examples; sadly, the authors bet on the wrong horse. A new edition in Python using scikit-learn would totally kick this one's ass.
gotta be honest, I did not read this whole textbook, but I've read enough over the year for it to count towards my reading goal, thank you very much.
I've never reviewed a textbook before... um... very informative, helpful with assignments (thank you Gareth, Trevor, Robert, and Daniela), will actually be one of the few textbooks I hold onto so that's got to count for something, right?
This book was a supplementary resource to DS Bootcamp. I read Chapters 1-5, 8-10.
Comprehending the concepts and mathematics behind supervised and unsupervised learning for regression, classification, and clustering is the goal of this book, and it does a really great job. I wouldn't suggest reading it unless you also practice the ideas and algorithms on appropriate data sets, either in Python or R.
This is a masterfully written book. There is, of course, no better way to start with statistical learning than the brilliant tour de force of ISLR and ESL. I personally find myself enjoying ESL more in some cases. This is easy to recommend, and a good introduction to statistics, especially in that it provides an even-handed, "try things out first" approach. However, it would be a disservice to the community and the authors to never delve into the details of the methods and techniques described.
It goes into great detail on the statistics and only glances superficially at the R language itself. An interesting read cover to cover, since statistics has always been an intriguing science to me. But don't go into it expecting to become a great programmer afterward.
A good book on statistical/machine learning. The focus is on intuition and the practical side. The book is also really comprehensive in its coverage of algorithms. It would just be nice to have Python implementations of the labs on top of the R ones.