Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists

These days it seems like everyone is collecting data. But all of that data is just raw information -- to make that information meaningful, it has to be organized, filtered, and analyzed. Anyone can apply data analysis tools and get results, but without the right approach those results may be useless. Author Philipp Janert teaches you how to think about how to effectively approach data analysis problems, and how to extract all of the available information from your data. Janert covers univariate data, data in multiple dimensions, time series data, graphical techniques, data mining, machine learning, and many other topics. He also reveals how seat-of-the-pants knowledge can lead you to the best approach right from the start, and how to assess results to determine if they're meaningful.

530 pages, Paperback

First published January 1, 2010

75 people are currently reading
1,723 people want to read

About the author

Philipp K. Janert

8 books, 5 followers

Ratings & Reviews


Community Reviews

5 stars: 120 (38%)
4 stars: 113 (36%)
3 stars: 63 (20%)
2 stars: 12 (3%)
1 star: 4 (1%)
Displaying 1 - 18 of 18 reviews
Louis
226 reviews, 29 followers
December 28, 2010
This is a book about how to think about data analysis, not only how to perform it. Like a good data analysis, Janert's book is about insight and comprehension, not computation. Because of this, it should be part of any analyst's bookshelf, set apart from all the books that merely teach tools and techniques.

The practice of data analysis can get a bad rap, especially from those who think that data analysis is only statistics. Most books on data analysis don’t help because they focus on using the features of a particular tool, leading to the view that data analysis is following a recipe from a cookbook. This book subverts this by being principally about how to think about data analysis, and by providing examples using different tools (primarily R and Python, but he uses other examples as well).

Among other topics, Janert covers graphing, single- and multi-variable analysis, probability, data modeling, statistics, simulation, component analysis, reporting, financial modeling and predictive analytics. In each section he starts by explaining the concepts, what each topic is for, and (just as important) what each topic is not. Working through it you get a sense of not just the what and how of the various tools and methods discussed, but why they are used, as well as some ways these techniques are misapplied.

Janert also illustrates the methods using some data analysis environments, principally R and Python (with NumPy, SciPy and Matplotlib), but also other tools such as Gnuplot and the GNU Scientific Library. What is helpful here is that the focus is on what techniques and capabilities are needed in the tool, not the tool itself. Instead of being a cheerleader for a particular tool, Janert discusses in his appendix the qualities that make environments such as Matlab, R and Python good data analysis environments. However, this focus means that he does not teach any particular tool. If you want to learn how to use a particular tool for data analysis, you are better off getting a book on R or Python (or Matlab, Excel, etc.).

I received an electronic copy of this book as part of the O’Reilly Blogger program. But this book was on my to-buy list even if I had not gotten it from them. The book page at O'Reilly.com is here:
Igor
109 reviews, 22 followers
April 6, 2016
Book is focused not on tools (mostly outdated by now), but on methods, skills and underlying math, with many examples of applications in real-life context. I really liked this approach, even though some equations were a bit too hard and some examples are too far from my area of interest.
John Orman
685 reviews, 32 followers
February 18, 2013
I used this book in my online Data Analysis class, in which we used the open source language R. The book also mentions Octave, a clone of Matlab. I used Octave for my online Machine Learning class.
Python and Java are also given a brief description.
The reader is also asked to investigate Perl, Ruby, regular expressions, databases, and Unix.

This book has some very good sections on graphical analysis of data. It also includes probability modeling and an interesting chapter on using statistics in mythbusting, and it shows how to mine data, perform simulations, and create predictive models.
Useful appendices cover scientific software tools, and the handling of data. An excellent review of using open source software to analyze and graph data.
Romain
876 reviews, 53 followers
May 15, 2020
The criticisms leveled at this book are of two kinds. The first concerns its structure, and even its content, which is unconventional for a book titled Data Analysis. It is true that one expects to follow a methodology and to be guided, and it must be admitted that this is not the case. If you are looking for that kind of book, I recommend diving into Practical Data Science with R, which is an excellent work squarely in that vein. This unconventional approach is not a problem; on the contrary, it helps open up one's thinking, see things differently and, above all, simply think. The book is also more theoretical and gets to the bottom of things. Put another way, there is math, everything it asserts is demonstrated, and the author strives to get two messages across:

- Keep it simple: back of the envelope
- Understand what you are doing

And it is true that today (I have seen it with my own eyes) it is easy to forget these two fundamentals and to stuff complicated models with heaps of data, producing something that no one can explain and that therefore adds nothing: value = 0.

In fact, it seems he has put into this book a large part of his knowledge, know-how, and experience gained as a consultant for large companies. This wealth of experience addresses nearly every subject a consultant may need to use, and even goes beyond; I am thinking of the chapters devoted to simulation, modeling, and probability. It should also be said that for each subject he provides a selective bibliography for going further. The same is true for the tools, and this is where the second criticism comes in, namely that the tooling presented is somewhat dated. If better tools exist now, so much the better, and they still do not, it seems to me, exempt anyone from understanding what they are doing.

Finally, I would like to highlight one last thing that tends to be neglected in a technical book: the tone, the way things are presented. After a few pages, and then on reading the book from cover to cover, one comes to resonate with the author and his way of explaining. As a result, one understands things much better. His goal is not to dazzle with complex algorithms and techniques; on the contrary, he constantly strives to demystify and return to fundamentals. To illustrate the tone, here is an excerpt from the chapter titled "What You Really Need to Know About Classical Statistics".

Basic classical statistics has always been somewhat of a mystery to me: a topic full of obscure notions such as t-tests and p-values, and confusing statements like "we fail to reject the null hypothesis" -- which I can read several times and still not know if it is saying yes, no, or maybe.


Originally published on my
K. Permyakova
4 reviews
February 27, 2025
If you’re getting into data analysis and want to use open-source tools, this book is nice. It’s practical, hands-on, and explains concepts clearly without unnecessary fluff. Perfect for beginners who want to learn real-world data skills!
Earo
23 reviews
January 1, 2013
The author keeps placing emphasis on insights instead of numbers while working with data. The ultimate goal of data analysis is to understand how the system works, not to show off how proficient you are at math. That's the true spirit of professionalism. Some annoying jargon is explained well in plain terms. Little sections on R.
157 reviews, 3 followers
August 7, 2012
This book focuses on methods and experience, using tools only to demonstrate the topic.

Where many books already cover tools, this book covers what many don't: insights and experience. It covers many topics, with enough explanation of each method and where to use it.

Highly recommended
Donn Lee
365 reviews, 6 followers
October 12, 2016
I love most O'Reilly books and this one doesn't disappoint. There is plenty of great material in here for [aspiring] data scientists; very nicely chapterised with a nice mix of tools. Was doing a forecasting project and had this book constantly by my side for inspiration - wonderful reference.
Vuk Trifkovic
522 reviews, 55 followers
February 17, 2011
Very good. Focus is firmly on the methods, but with just enough tooling or practical data. I'd start from this, and then dive into specific toolsets, say R or some Python libs.
629 reviews, 11 followers
January 3, 2012
I haven't touched this for a while, so it seemed more appropriate to move it to the on-hold category. I was really intrigued by the sections that I did read though, so I plan to get back to it soon.
Andries Burger
22 reviews, 2 followers
May 5, 2011
Just started on this book. Have lots of data to turn into information. Some obscure, some blatant.

Will add to this review as I work my way through the book...
Matt Heavner
1,060 reviews, 14 followers
October 24, 2011
This is a good, thought-provoking read. It is a reminder of lots of techniques I've already "learned", but also a great practical review and refresher.
160 reviews, 4 followers
April 4, 2012
Good refresher on data analysis for more detailed work
Sefa
57 reviews
December 7, 2021
A data analysis book that uses Python and R and focuses more on methods, in a way that is not math-heavy, than on implementation details.
Daniel
Author of 3 books, 38 followers
February 1, 2016
Pleasantly focussed on methods, so that the somewhat dated view on technologies doesn't hurt too much.