Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.
This is a very useful book, available online. It introduces the major concepts of IR in a clear way. At the end of most chapters is an optional discussion of advanced topics. I wish the book went a little bit into examples of programming, but that's for someone else to take on, I guess.
Librarians should be forced to read at least parts of this book in school, to better understand why Google is eating our lunch.
Great book that provides an introduction to information retrieval and related topics, such as elements of machine learning. There is a good balance between theoretical foundations and simplicity, but it isn't as simple as other "popular" books, such as "Programming Collective Intelligence". You can also find a "pre-release" version of the book as a PDF (see links in other reviews)...
This is an excellent theoretical and foundational book on information retrieval (AKA document search), and it also covers some document classification. It does not cover any specific software packages or tools. You don't need this book to throw a bunch of documents into Elasticsearch, but you do need it to understand why you're not getting the results you want back and how to fix it.
Note that this book was written in 2008 and it advocates support vector machines for classification; modern practice now largely favors neural networks instead.
The contents of this book have been made available for free online at
Another great and more conceptual book is the standard reference Introduction to Information Retrieval by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, which describes fundamental algorithms in information retrieval, NLP, and machine learning.
The book provides the theoretical foundation needed to understand the details of building search engines, whether local ones or those that index the web: preprocessing documents before indexing, dictionaries and indexes, index compression, document scoring, document ranking, evaluating search results, building indexes for the web, and other topics.
The book is enjoyable. There is a great deal of new detail, but the style of explanation and the progression through the topics are excellent. We studied the book as part of the CS 3308 course.
A classic book that is still relevant (hehe) today.
It was interesting to see the ideas and methods that powered early search engines, before distributed representations with embedding vectors appeared.
Many of these are still unbeatable today, like BM25, and we can use them together with newer semantic matching algorithms.
Recommended reading for anyone working on or interested in understanding search systems.
After buying a hard copy of this book, I tried to implement a simple IR system in Python using the data structures and algorithms outlined in the book, in an effort to learn the material in more depth. I think most of the core concepts can be translated to code without much challenge. On the other hand, the writing sometimes makes the book a bit challenging to fully appreciate. It is also sometimes difficult to see which of the ideas are still relevant in today's IR context and which are already outdated.
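For readers wondering what such an exercise might look like, here is a minimal Python sketch of an inverted index with Boolean AND queries, the kind of data structure an introductory IR text typically starts from. It is an illustration only, not the reviewer's code: the function names (`tokenize`, `build_index`, `intersect`, `search_and`), the whitespace tokenizer, and the four sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tokenization."""
    return text.lower().split()


def build_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def search_and(index, query):
    """Return IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Process the shortest postings lists first to keep intermediates small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Hypothetical sample collection, used only for this illustration.
docs = [
    "new home sales top forecasts",
    "home sales rise in july",
    "increase in home sales in july",
    "july new home sales rise",
]
index = build_index(docs)
print(search_and(index, "home sales july"))  # prints [1, 2, 3]
```

A real system would add proper tokenization, stemming or normalization, and ranked retrieval on top of this, but the linear merge in `intersect` is the core step of Boolean retrieval over sorted postings lists.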
A strong review from plus the free pdf (hurrah!) at equals an excited dankling, especially as I'm pretty weak overall on search (we leave the n-dimensional SVMs -- and machine learning in general -- to our scientists, whereas my job is to take their data and operator schemata and set it to soaking every last cycle and flooding every last bus), but it's becoming pretty much a canonical element of the "advanced implementation" suite; i.e., I expect any decent applicant to reasonably and volubly discourse upon:
- high-level design of interpreters, compilers, debuggers, profilers
- basic PLC (scoping choices, typing choices, OO theory fundamentals)
- how network stacks and high-speed servers are constructed
- combinatorial logic, fundamentals of processor design, memory hierarchies
- parallelism as applied to threads, instructions, and memory
- algorithms for discrete sort/search (knuth book 3), complexity
- design of many-way simulation software, basic numeric computing
- arbitrary graph theory / combinatorics, to keep 'em on their toes
This might need to expand to include less discrete sort/search, in which I'd include SVM and NLP-y problems, as well as randomized methods like skip lists and smaller-space stochastics like Bloom filters and random walks.
Christopher Manning is a rock star in both the NLP and information retrieval fields.
I used this book as a guide and source for the IR course at Sofia University. It is well written and well paced, and it covers most aspects of IR, with some machine learning, computational linguistics and algorithmic flavours. I recommend it to anyone interested in the field.
Amazing book. An information retrieval book written by computer scientists/engineers rather than librarians. That shouldn't be something worth getting excited about, but unfortunately all of the other good people apparently just went to work for Google instead of writing good textbooks.
Scanned through quickly. A bit disappointed, since I don't feel I learned much that was new. The most important concepts are along the lines of either data management or data mining techniques. Maybe I've missed the main part.
It's a very good book for beginners entering a data science specialization. It gives beginners a good grounding in IR concepts and basic search engine architecture.