Alex's bookshelf: ir-dm-nlp-ml-search en-US Tue, 26 Dec 2023 02:10:49 -0800
AI-Powered Search 50311847 325 Trey Grainger 161729697X Alex 4 ir-dm-nlp-ml-search
P.S. I was the technical proofreader for this book and read the prerelease version, not the final one.]]>
3.94 AI-Powered Search
author: Trey Grainger
name: Alex
average rating: 3.94
book published:
rating: 4
read at: 2023/12/20
date added: 2023/12/26
shelves: ir-dm-nlp-ml-search
review:
A good overview of ML-based approaches to improving search quality. One big selling point is that the book was updated to include the latest developments in the area of LLMs and how they can be applied to search-related tasks.

P.S. I was the technical proofreader for this book and read the prerelease version, not the final one.
]]>
<![CDATA[Introducing MLOps: How to Scale Machine Learning in the Enterprise]]> 56169347
This book introduces the key concepts of MLOps to help data scientists and application engineers not only operationalize ML models to drive real business change but also maintain and improve those models over time. Through lessons based on numerous MLOps applications around the world, nine experts in machine learning provide insights into the five steps of the model life cycle--Build, Preproduction, Deployment, Monitoring, and Governance--uncovering how robust MLOps processes can be infused throughout.

This book helps you:


Fulfill data science value by reducing friction throughout ML pipelines and workflows
Refine ML models through retraining, periodic tuning, and complete remodeling to ensure long-term accuracy
Design the MLOps life cycle to minimize organizational risks with models that are unbiased, fair, and explainable
Operationalize ML models for pipeline deployment and for external business systems that are more complex and less standardized]]>
308 Mark Treveil 1098116437 Alex 4 ir-dm-nlp-ml-search 3.73 Introducing MLOps: How to Scale Machine Learning in the Enterprise
author: Mark Treveil
name: Alex
average rating: 3.73
book published:
rating: 4
read at: 2021/12/31
date added: 2023/03/02
shelves: ir-dm-nlp-ml-search
review:
A good intro to MLOps. I'm not new to it, but some aspects were well formulated.
]]>
<![CDATA[Natural Language Processing with Python and spaCy: A Practical Introduction]]> 51344063 An introduction to natural language processing with Python using spaCy, a leading Python natural language processing library.

Natural Language Processing with Python and spaCy will show you how to create NLP applications like chatbots, text-condensing scripts, and order-processing tools quickly and easily. You'll learn how to leverage the spaCy library to extract meaning from text intelligently; how to determine the relationships between words in a sentence (syntactic dependency parsing); identify nouns, verbs, and other parts of speech (part-of-speech tagging); and sort proper nouns into categories like people, organizations, and locations (named entity recognition). You'll even learn how to transform statements into questions to keep a conversation going.

You'll also learn how to:

Work with word vectors to mathematically find words with similar meanings (Chapter 5)
Identify patterns within data using spaCy's built-in displaCy visualizer (Chapter 7)
Automatically extract keywords from user input and store them in a relational database (Chapter 9)
Deploy a chatbot app to interact with users over the internet (Chapter 11)

"Try This" sections in each chapter encourage you to practice what you've learned by expanding the book's example scripts to handle a wider range of inputs, add error handling, and build professional-quality applications.

By the end of the book, you'll be creating your own NLP applications with Python and spaCy.]]>
217 Yuli Vasiliev 171850053X Alex 3 ir-dm-nlp-ml-search 3.75 Natural Language Processing with Python and spaCy: A Practical Introduction
author: Yuli Vasiliev
name: Alex
average rating: 3.75
book published:
rating: 3
read at: 2021/08/29
date added: 2021/08/29
shelves: ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Deep Learning: A Practitioner's Approach]]> 25753498
Authors Adam Gibson and Josh Patterson present the latest relevant papers and techniques in a non-academic manner, and implement the core mathematics in their DL4J library. If you work in the embedded, desktop, and big data/Hadoop spaces and really want to understand deep learning, this is your book.]]>
200 Josh Patterson 1491914254 Alex 0 3.74 2015 Deep Learning: A Practitioner's Approach
author: Josh Patterson
name: Alex
average rating: 3.74
book published: 2015
rating: 0
read at:
date added: 2020/01/04
shelves: ir-dm-nlp-ml-search, own-ebook, to-read
review:

]]>
<![CDATA[Learning Spark: Lightning-Fast Big Data Analysis]]> 17318146
Written by the developers of Spark, this book will have you up and running in no time. You’ll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoop’s raw Java API.


Quickly dive into Spark capabilities such as collect, count, reduce, and save
Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
Learn how to run interactive, iterative, and incremental analyses
Integrate with Scala to manipulate distributed datasets like local collections
Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming]]>
274 Holden Karau 1449358624 Alex 4 ir-dm-nlp-ml-search, big-data 3.87 2013 Learning Spark: Lightning-Fast Big Data Analysis
author: Holden Karau
name: Alex
average rating: 3.87
book published: 2013
rating: 4
read at: 2015/04/01
date added: 2019/11/09
shelves: ir-dm-nlp-ml-search, big-data
review:
Quite a good introduction to Spark. It covers all components and is not too outdated: the book covers 1.1 plus parts of 1.2.
]]>
<![CDATA[Data Mining: Concepts, Models, Methods, and Algorithms]]> 974463 360 Mehmed Kantardzic 0471228524 Alex 0 to-read, ir-dm-nlp-ml-search 3.89 2002 Data Mining: Concepts, Models, Methods, and Algorithms
author: Mehmed Kantardzic
name: Alex
average rating: 3.89
book published: 2002
rating: 0
read at:
date added: 2019/08/04
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Bayesian Reasoning and Machine Learning]]> 10144695 735 David Barber 0521518148 Alex 0 ir-dm-nlp-ml-search, to-read 4.08 2012 Bayesian Reasoning and Machine Learning
author: David Barber
name: Alex
average rating: 4.08
book published: 2012
rating: 0
read at:
date added: 2018/11/11
shelves: ir-dm-nlp-ml-search, to-read
review:

]]>
<![CDATA[Ensemble Methods (Chapman & Hall/CRC Machine Learning & Pattern Recognition)]]> 14616969 236 Zhi-Hua Zhou 1439830037 Alex 0 ir-dm-nlp-ml-search, to-read 4.11 2012 Ensemble Methods (Chapman & Hall/CRC Machine Learning & Pattern Recognition)
author: Zhi-Hua Zhou
name: Alex
average rating: 4.11
book published: 2012
rating: 0
read at:
date added: 2018/11/11
shelves: ir-dm-nlp-ml-search, to-read
review:

]]>
<![CDATA[Deep Learning (Adaptive Computation and Machine Learning series)]]> 30422361 An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.]]>
778 Ian Goodfellow 0262035618 Alex 0 4.49 2016 Deep Learning (Adaptive Computation and Machine Learning series)
author: Ian Goodfellow
name: Alex
average rating: 4.49
book published: 2016
rating: 0
read at:
date added: 2018/11/11
shelves: ir-dm-nlp-ml-search, own-pbook, to-read
review:

]]>
Deep Learning for Search 37482070 325 Tommaso Teofili 1617294799 Alex 0 to-read, ir-dm-nlp-ml-search 4.24 Deep Learning for Search
author: Tommaso Teofili
name: Alex
average rating: 4.24
book published:
rating: 0
read at:
date added: 2018/03/18
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Advanced Analytics with Spark: Patterns for Learning from Data at Scale]]> 25314169 In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.

You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

Patterns include:

Recommending music and the Audioscrobbler data set
Predicting forest cover with decision trees
Anomaly detection in network traffic with K-means clustering
Understanding Wikipedia with Latent Semantic Analysis
Analyzing co-occurrence networks with GraphX
Geospatial and temporal data analysis on the New York City Taxi Trips data
Estimating financial risk through Monte Carlo simulation
Analyzing genomics data and the BDG project
Analyzing neuroimaging data with PySpark and Thunder]]>
278 Sandy Ryza 1491912723 Alex 4 big-data, ir-dm-nlp-ml-search 3.78 2015 Advanced Analytics with Spark: Patterns for Learning from Data at Scale
author: Sandy Ryza
name: Alex
average rating: 3.78
book published: 2015
rating: 4
read at: 2015/11/03
date added: 2018/01/04
shelves: big-data, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Serving Machine Learning Models]]> 36490565
This practical report demonstrates a more standardized approach to model serving and model scoring. Author Boris Lublinsky, Principal Architect at Lightbend, introduces architecture for serving models in real time as part of input stream processing. This approach would also enable data science teams to update models without restarting existing applications.

Using Python, Beam, Flink, Spark, Kafka streams and Akka code examples (available on GitHub), Lublinsky examines different ways to build this model-scoring solution, using several popular stream processing engines and frameworks.

You’ll explore:
- Methods for exporting models, using Predictive Model Markup Language (PMML) and TensorFlow as examples
- Implementing Lightbend’s architecture with stream processing engines: Spark, Flink, and Beam
- Implementing the same solution with stream processing libraries: Kafka Streams and Akka Streams
- Methods for monitoring the architecture with queryable state]]>
104 Boris Lublinsky 1492024066 Alex 3 ir-dm-nlp-ml-search 2.25 Serving Machine Learning Models
author: Boris Lublinsky
name: Alex
average rating: 2.25
book published:
rating: 3
read at: 2017/11/09
date added: 2017/11/09
shelves: ir-dm-nlp-ml-search
review:

]]>
Deep Learning with Python 33986067
In particular, deep learning excels at solving machine perception problems: understanding the content of image data, video data, or sound data. Here's a simple example: say you have a large collection of images, and that you want tags associated with each image, for example, "dog," "cat," etc. Deep learning can allow you to create a system that understands how to map such tags to images, learning only from examples. This system can then be applied to new images, automating the task of photo tagging. A deep learning model only has to be fed examples of a task to start generating useful results on new data.]]>
350 François Chollet 1617294438 Alex 5
A very good practical introduction to deep learning based on Keras. The author explains all major neural network architectures without digging into the mathematics, using practical examples that are easy to adapt to your own tasks.
]]>
4.56 2017 Deep Learning with Python
author: François Chollet
name: Alex
average rating: 4.56
book published: 2017
rating: 5
read at: 2017/10/15
date added: 2017/10/15
shelves: ir-dm-nlp-ml-search, own-ebook
review:
(maybe close to 9/10)

A very good practical introduction to deep learning based on Keras. The author explains all major neural network architectures without digging into the mathematics, using practical examples that are easy to adapt to your own tasks.

]]>
<![CDATA[Machine Learning: A Probabilistic Perspective]]> 15857489
Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.

The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package—PMTK (probabilistic modeling toolkit)—that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.]]>
1104 Kevin P. Murphy 0262018020 Alex 0 ir-dm-nlp-ml-search, to-read 4.37 Machine Learning: A Probabilistic Perspective
author: Kevin P. Murphy
name: Alex
average rating: 4.37
book published:
rating: 0
read at:
date added: 2017/09/27
shelves: ir-dm-nlp-ml-search, to-read
review:

]]>
<![CDATA[Boosting: Foundations and Algorithms (Adaptive Computation and Machine Learning)]]> 13392506 An accessible introduction and essential reference for an approach to machine learning that creates highly accurate prediction rules by combining many weak and inaccurate ones.

Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.

This book, written by the inventors of the method, brings together, organizes, simplifies, and substantially extends two decades of research on boosting, presenting both theory and applications in a way that is accessible to readers from diverse backgrounds while also providing an authoritative reference for advanced researchers. With its introductory treatment of all material and its inclusion of exercises in every chapter, the book is appropriate for course use as well.

The book begins with a general introduction to machine learning algorithms and their analysis; then explores the core theory of boosting, especially its ability to generalize; examines some of the myriad other theoretical viewpoints that help to explain and understand boosting; provides practical extensions of boosting for more complex learning problems; and finally presents a number of advanced theoretical topics. Numerous applications and practical illustrations are offered throughout.]]>
526 Robert E. Schapire 0262017180 Alex 0 ir-dm-nlp-ml-search, to-read 3.95 2012 Boosting: Foundations and Algorithms (Adaptive Computation and Machine Learning)
author: Robert E. Schapire
name: Alex
average rating: 3.95
book published: 2012
rating: 0
read at:
date added: 2017/09/27
shelves: ir-dm-nlp-ml-search, to-read
review:

]]>
<![CDATA[Foundations of Machine Learning]]> 15844113
This book is a general introduction to machine learning that can serve as a textbook for graduate students and a reference for researchers. It covers fundamental modern topics in machine learning while providing the theoretical basis and conceptual tools needed for the discussion and justification of algorithms. It also describes several key aspects of the application of these algorithms. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics.

Foundations of Machine Learning is unique in its focus on the analysis and theory of algorithms. The first four chapters lay the theoretical foundation for what follows; subsequent chapters are mostly self-contained. Topics covered include the Probably Approximately Correct (PAC) learning framework; generalization bounds based on Rademacher complexity and VC-dimension; Support Vector Machines (SVMs); kernel methods; boosting; on-line learning; multi-class classification; ranking; regression; algorithmic stability; dimensionality reduction; learning automata and languages; and reinforcement learning. Each chapter ends with a set of exercises. Appendixes provide additional material including concise probability review.

This second edition offers three new chapters, on model selection, maximum entropy models, and conditional entropy models. New material in the appendixes includes a major section on Fenchel duality, expanded coverage of concentration inequalities, and an entirely new entry on information theory. More than half of the exercises are new to this edition.]]>
480 Mehryar Mohri 026201825X Alex 0 ir-dm-nlp-ml-search, to-read 4.15 Foundations of Machine Learning
author: Mehryar Mohri
name: Alex
average rating: 4.15
book published:
rating: 0
read at:
date added: 2017/09/27
shelves: ir-dm-nlp-ml-search, to-read
review:

]]>
Semantic Web Programming 5140760 656 Hebeler 047041801X Alex 4 3.62 2009 Semantic Web Programming
author: Hebeler
name: Alex
average rating: 3.62
book published: 2009
rating: 4
read at: 2010/03/01
date added: 2017/08/26
shelves: own-pbook, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine]]> 21557290 721 Clinton Gormley 1449358543 Alex 5 4.27 2014 Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine
author: Clinton Gormley
name: Alex
average rating: 4.27
book published: 2014
rating: 5
read at: 2015/09/10
date added: 2017/08/26
shelves: big-data, ir-dm-nlp-ml-search, own-ebook
review:
A great book on Elasticsearch. Besides introducing Elasticsearch's main features, it also covers many relevant areas, including geo-search and the selection and customization of analyzers. The chapter about putting ES into production, with its DO/DON'T list, could be very helpful for many people who are starting to use ES.
]]>
Solr in Action 17364147
Solr in Action is the definitive guide to implementing fast and scalable search using Apache Solr 4. It uses well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. Readers will gain a deep understanding of how to implement core Solr capabilities such as faceted navigation through search results, matched snippet highlighting, field collapsing and search results grouping, spell checking, query auto-complete, querying by functions, and more.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.]]>
664 Trey Grainger 1617291021 Alex 4
In reality, I would give it 9/10]]>
4.08 2013 Solr in Action
author: Trey Grainger
name: Alex
average rating: 4.08
book published: 2013
rating: 4
read at: 2015/04/09
date added: 2017/08/26
shelves: own-ebook, ir-dm-nlp-ml-search
review:
A very detailed description of Solr with many useful examples.

In reality, I would give it 9/10
]]>
Hadoop: The Definitive Guide 8788370 Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework — an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use Pig, a high-level query language for large-scale data processing
Analyze datasets with Hive, Hadoop’s data warehousing system
Take advantage of HBase, Hadoop’s database for structured and semi-structured data
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master — not only of the technology, but also of common sense and plain talk."

—Doug Cutting, Cloudera

]]>
624 Tom White 1449389732 Alex 4
I actually read the 1st edition as well, but I found many new and useful additions in the new edition]]>
3.18 2009 Hadoop: The Definitive Guide
author: Tom White
name: Alex
average rating: 3.18
book published: 2009
rating: 4
read at: 2011/10/08
date added: 2017/08/26
shelves: ir-dm-nlp-ml-search, programming, own-ebook
review:
A good book on the basics of Hadoop (HDFS, MapReduce, and other related technologies). It provides all the details needed to start working with Hadoop, programming against it, administering it, etc.

I actually read the 1st edition as well, but I found many new and useful additions in the new edition.
]]>
<![CDATA[Network Security Through Data Analysis: Building Situational Awareness]]> 17623613
The book is divided into three large sections: data collection, analysis, and taking action. These can be iterative, as each discovery alerts the administrator to data that should be collected. Several forms of analysis and visualization are covered. Topics include:


What data to capture on your systems
Data fusion
Structures and storage systems for data
Using R, SiLK, and Python for analysis
Visualization and exploratory data analysis
Graph analysis
Network mapping
Address forensics: determining where traffic originates
Handling malware]]>
348 Michael S. Collins 1449357903 Alex 0 3.80 2013 Network Security Through Data Analysis: Building Situational Awareness
author: Michael S. Collins
name: Alex
average rating: 3.80
book published: 2013
rating: 0
read at:
date added: 2017/08/26
shelves: to-read, ir-dm-nlp-ml-search, own-ebook, security
review:

]]>
<![CDATA[Neural Networks and Learning Machines]]> 589000 936 Simon Haykin 0131471392 Alex 0 4.28 1994 Neural Networks and Learning Machines
author: Simon Haykin
name: Alex
average rating: 4.28
book published: 1994
rating: 0
read at:
date added: 2017/08/26
shelves: to-read, ir-dm-nlp-ml-search, own-pbook
review:

]]>
<![CDATA[The Modern Algebra of Information Retrieval (The Information Retrieval Series, 24)]]> 8213330 344 Dominich 3540776583 Alex 0 4.00 2008 The Modern Algebra of Information Retrieval (The Information Retrieval Series, 24)
author: Dominich
name: Alex
average rating: 4.00
book published: 2008
rating: 0
read at:
date added: 2017/08/26
shelves: ir-dm-nlp-ml-search, math, to-read, own-pbook
review:

]]>
<![CDATA[Data Analysis Using Regression and Multilevel/Hierarchical Models]]> 737071 648 Andrew Gelman 052168689X Alex 0 to-read, ir-dm-nlp-ml-search 4.36 2006 Data Analysis Using Regression and Multilevel/Hierarchical Models
author: Andrew Gelman
name: Alex
average rating: 4.36
book published: 2006
rating: 0
read at:
date added: 2017/05/08
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Multilabel Classification: Problem Analysis, Metrics and Techniques]]> 30209908
The special characteristics of multi-labeled data and the metrics available to measure them.
The importance of taking advantage of label correlations to improve the results.
The different approaches followed to face multi-label classification.
The preprocessing techniques applicable to multi-label datasets.
The available software tools to work with multi-label data.
This book is beneficial for professionals and researchers in a variety of fields because of the wide range of potential applications for multilabel classification. Besides its multiple applications to classify different types of online information, it is also useful in many other areas, such as genomics and biology. No previous knowledge about the subject is required. The book introduces all the needed concepts to understand multilabel data characterization, treatment and evaluation.]]>
210 Francisco Herrera 3319411101 Alex 4 ir-dm-nlp-ml-search
P.S. So the size of my reading queue has actually increased instead of decreasing ;-)]]>
4.00 Multilabel Classification: Problem Analysis, Metrics and Techniques
author: Francisco Herrera
name: Alex
average rating: 4.00
book published:
rating: 4
read at: 2016/09/16
date added: 2016/09/19
shelves: ir-dm-nlp-ml-search
review:
A very good survey on the topic of multilabel classification. If you need more details on any of the described algorithms or approaches, you will need access to the referenced papers.

P.S. So the size of my reading queue has actually increased instead of decreasing ;-)
]]>
Machine Learning with Spark 25030367
- Combine various techniques and models into an intelligent machine learning system

- Use Spark's powerful tools to load, analyze, clean, and transform your data

Apache Spark is a framework for distributed computing that is designed from the ground up to be optimized for low latency tasks and in-memory data storage. It is one of the few frameworks for parallel computing that combines speed, scalability, in-memory processing, and fault tolerance with ease of programming and a flexible, expressive, and powerful API design.

This book guides you through the basics of Spark's API used to load and process data and prepare the data to use as input to the various machine learning models. There are detailed examples and real-world use cases for you to explore common machine learning models including recommender systems, classification, regression, clustering, and dimensionality reduction. You will cover advanced topics such as working with large-scale text data, and methods for online machine learning and model evaluation using Spark Streaming.]]>
329 Nick Pentreath 1783288515 Alex 4 big-data, ir-dm-nlp-ml-search 3.76 2015 Machine Learning with Spark
author: Nick Pentreath
name: Alex
average rating: 3.76
book published: 2015
rating: 4
read at: 2016/05/23
date added: 2016/05/25
shelves: big-data, ir-dm-nlp-ml-search
review:
Good content on the use of Spark for ML, but keep in mind that the book is slightly outdated, as it covers Spark 1.2.
]]>
<![CDATA[Evaluating Machine Learning Models]]> 26402733
In this overview, Zheng first introduces the machine-learning workflow, and then dives into evaluation metrics and model selection. The latter half of the report focuses on hyperparameter tuning and A/B testing, which may benefit more seasoned machine-learning practitioners.

With this report, you will:

Learn the stages involved when developing a machine-learning model for use in a software application
Understand the metrics used for supervised learning models, including classification, regression, and ranking
Walk through evaluation mechanisms, such as hold-out validation, cross-validation, and bootstrapping
Explore hyperparameter tuning in detail, and discover why it’s so difficult
Learn the pitfalls of A/B testing, and examine a promising alternative: multi-armed bandits
Get suggestions for further reading, as well as useful software packages]]>
59 Alice Zheng Alex 4 ir-dm-nlp-ml-search 4.05 2015 Evaluating Machine Learning Models
author: Alice Zheng
name: Alex
average rating: 4.05
book published: 2015
rating: 4
read at: 2015/10/01
date added: 2015/10/01
shelves: ir-dm-nlp-ml-search
review:
A short, quite good book (freely available from O'Reilly) on how to evaluate the quality of machine learning models.
]]>
<![CDATA[Optimization for Machine Learning (Neural Information Processing Series)]]> 13153519 Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.]]> 494 Suvrit Sra 026201646X Alex 0 3.93 2011 Optimization for Machine Learning (Neural Information Processing Series)
author: Suvrit Sra
name: Alex
average rating: 3.93
book published: 2011
rating: 0
read at:
date added: 2014/08/20
shelves: to-read, ir-dm-nlp-ml-search, own-pbook
review:

]]>
<![CDATA[Semi-supervised Learning (Adaptive Computation And Machine Learning)]]> 739788 A comprehensive review of an area of machine learning that deals with the use of unlabeled data in classification problems: state-of-the-art algorithms, a taxonomy of the field, applications, benchmark experiments, and directions for future research.

In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction.]]>
598 Olivier Chapelle 0262033585 Alex 0 4.00 2006 Semi-supervised Learning (Adaptive Computation And Machine Learning)
author: Olivier Chapelle
name: Alex
average rating: 4.00
book published: 2006
rating: 0
read at:
date added: 2014/08/20
shelves: own-pbook, ir-dm-nlp-ml-search, to-read
review:

]]>
<![CDATA[Information Retrieval: Implementing and Evaluating Search Engines]]> 8147668 606 Stefan Büttcher 0262026511 Alex 0 4.12 2010 Information Retrieval: Implementing and Evaluating Search Engines
author: Stefan Büttcher
name: Alex
average rating: 4.12
book published: 2010
rating: 0
read at:
date added: 2014/08/20
shelves: to-read, ir-dm-nlp-ml-search, own-pbook
review:

]]>
<![CDATA[Large-Scale Kernel Machines (Neural Information Processing)]]> 1928938 Solutions for learning from large-scale datasets, including kernel learning algorithms that scale linearly with the volume of the data and experiments carried out on realistically large datasets.

Pervasive and networked computers have dramatically reduced the cost of collecting and distributing large datasets. In this context, machine learning algorithms that scale poorly could simply become irrelevant. We need learning algorithms that scale linearly with the volume of the data while maintaining enough statistical efficiency to outperform algorithms that simply process a random subset of the data. This volume offers researchers and engineers practical solutions for learning from large scale datasets, with detailed descriptions of algorithms and experiments carried out on realistically large datasets. At the same time it offers researchers information that can address the relative lack of theoretical grounding for many useful algorithms. After a detailed description of state-of-the-art support vector machine technology, an introduction of the essential concepts discussed in the volume, and a comparison of primal and dual optimization techniques, the book progresses from well-understood techniques to more novel and controversial approaches. Many contributors have made their code and data available online for further experimentation. Topics covered include fast implementations of known algorithms, approximations that are amenable to theoretical guarantees, and algorithms that perform well in practice but are difficult to analyze theoretically.

Contributors
Léon Bottou, Yoshua Bengio, Stéphane Canu, Eric Cosatto, Olivier Chapelle, Ronan Collobert, Dennis DeCoste, Ramani Duraiswami, Igor Durdanovic, Hans-Peter Graf, Arthur Gretton, Patrick Haffner, Stefanie Jegelka, Stephan Kanthak, S. Sathiya Keerthi, Yann LeCun, Chih-Jen Lin, Gaëlle Loosli, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, Gunnar Rätsch, Vikas Chandrakant Raykar, Konrad Rieck, Vikas Sindhwani, Fabian Sinz, Sören Sonnenburg, Jason Weston, Christopher K. I. Williams, Elad Yom-Tov]]>
396 0262026252 Alex 0 to-read, ir-dm-nlp-ml-search 3.50 2007 Large-Scale Kernel Machines (Neural Information Processing)
author: Research Scientist Léon Bottou
name: Alex
average rating: 3.50
book published: 2007
rating: 0
read at:
date added: 2014/08/07
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
Bad Data Handbook 15808128 262 Q. Ethan McCallum 1449321887 Alex 3
]]>
3.55 2012 Bad Data Handbook
author: Q. Ethan McCallum
name: Alex
average rating: 3.55
book published: 2012
rating: 3
read at: 2014/07/19
date added: 2014/07/19
shelves: own-ebook, ir-dm-nlp-ml-search, big-data
review:
A set of essays on data cleanup, preprocessing, etc., but many of the points are too obvious, too much a matter of common sense.
Maybe I've been working with imperfect data for too long...

P.S. Between 2 & 3
]]>
Clojure for Machine Learning 22062479
It explores many machine learning techniques and also describes how to use Clojure to build machine learning systems. This book starts off by introducing the simple machine learning problems of regression and classification. It also describes how you can implement these machine learning techniques in Clojure. The book also demonstrates several Clojure libraries, which can be useful in solving machine learning problems.

Clojure for Machine Learning familiarizes you with several pragmatic machine learning techniques. By the end of this book, you will be fully aware of the Clojure libraries that can be used to solve a given machine learning problem.]]>
292 Akhil Wali 1783284358 Alex 4
]]>
3.53 2014 Clojure for Machine Learning
author: Akhil Wali
name: Alex
average rating: 3.53
book published: 2014
rating: 4
read at: 2014/06/25
date added: 2014/06/30
shelves: func-prog, ir-dm-nlp-ml-search
review:
Disclaimer: I got this book from Packt Publishing, but I planned to read it anyway...

The real rating is between 3 & 4; the 3 is mostly for the poor formatting. I read the mobi version of the book and expected that referenced external resources/papers would be hyperlinked, as in most books (from Manning & O'Reilly, for example), providing an easy way to access them.

The book's content itself is OK, and its organization is similar to Programming Collective Intelligence: Building Smart Web 2.0 Applications or Machine Learning in Action: there is some amount of theory (with links to external resources/papers), followed by how to implement the task in Clojure or how to use an existing wrapper for a Java library.

The book starts by covering basic matrix operations using core.matrix, and then describes the most popular ML tasks/algorithms: linear regression, data categorization (providing a naive Bayes implementation), neural networks, SVMs, clustering & anomaly detection. It also shows how to perform cross-validation of models, using a spam classifier as an example.

The book is easy to follow: you need to know a relatively small subset of Clojure (but you need to know it already!) and have a basic understanding of ML tasks. In some cases the book provides complete implementations using Incanter, core.matrix & built-in Clojure functions, and in other cases (neural networks, SVMs, etc.) it uses wrappers for existing libraries like Weka, liblinear, etc. The code style is sometimes inconsistent and could be improved, but in general it is OK.

I can recommend this book if you're interested in implementing ML tasks in Clojure; you can take some pieces of code from it. But don't expect to get a deep understanding of the theory behind ML; for that you need other books.
]]>
<![CDATA[Taming Text: How to Find, Organize, and Manipulate It]]> 10478952
Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. It explores how to automatically organize text, using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. This book gives examples illustrating each of these topics, as well as the foundations upon which they are built.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.]]>
322 Grant S. Ingersoll 193398838X Alex 4
]]>
3.79 2011 Taming Text: How to Find, Organize, and Manipulate It
author: Grant S. Ingersoll
name: Alex
average rating: 3.79
book published: 2011
rating: 4
read at: 2013/12/15
date added: 2013/12/25
shelves: ir-dm-nlp-ml-search, own-ebook, short-pile
review:
A good overview of different systems (Solr, OpenNLP & Mahout), approaches & algorithms for working with (unstructured) text: searching, analyzing, clustering, etc.
The book itself is entirely practical, with references to articles & books for people interested in more detailed/theoretical information.

]]>
<![CDATA[Neural Networks And Learning Machines]]> 11720073 864 Simon Haykin 0131293761 Alex 0 3.97 1993 Neural Networks And Learning Machines
author: Simon Haykin
name: Alex
average rating: 3.97
book published: 1993
rating: 0
read at:
date added: 2013/06/01
shelves: ir-dm-nlp-ml-search, own-pbook
review:

]]>
<![CDATA[Clojure Data Analysis Cookbook: Over 110 Recipes to Help You Dive into the World of Practical Data Analysis Using Clojure]]> 17716085
"The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.

You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.]]>
326 Eric Rochester 178216264X Alex 4
]]>
3.64 2013 Clojure Data Analysis Cookbook: Over 110 Recipes to Help You Dive into the World of Practical Data Analysis Using Clojure
author: Eric Rochester
name: Alex
average rating: 3.64
book published: 2013
rating: 4
read at: 2013/05/26
date added: 2013/05/26
shelves: func-prog, math, ir-dm-nlp-ml-search, own-ebook
review:
(Disclaimer: I got this book from Packt Publishing for review. As one of Incanter's maintainers, I was interested in how the package is described there.)

The book is a classic cookbook: the recipes are grouped by common topic into chapters but are almost independent of each other, so you can pick any of them and experiment with it. The book doesn't describe the theoretical foundations of the examples; it only gives a solution and explains how it works, but you can find more information by following the links to additional material. Nor does the book describe Clojure itself; you need to grab another book for that, such as "Programming Clojure" or "Clojure Programming".

The book covers a wide range of problems, starting with data import (from different sources & formats) and cleanup. After that, the author shows how processing performance can be improved with different tricks, starting with basic type hints and continuing with Clojure's primitives for concurrent programming (agents, STM, pmap, reducers, etc.). A separate chapter is dedicated to data analysis using Cascalog, a Clojure-based framework for data transformation & analysis.

Several chapters cover Incanter, from operations on datasets to statistical analysis and charting; this information is enough to start using Incanter for data analysis. The book describes the latest version of Incanter, 1.4.1, although some information (such as importing data from SQL databases) will become outdated in Incanter 1.5.0 (I hope that it will be released soon).

Besides Incanter, the book describes how to use Clojure together with other programs and libraries, such as Weka, Mathematica & R. The last chapter describes how to create charts in web applications using Clojure web frameworks, ClojureScript & existing JavaScript charting libraries.

I can recommend this book if you're interested in data analysis and have some experience with, or interest in, Clojure.

I can't say that this is a perfect book: there are some problems with code formatting, and in the electronic version URLs aren't clickable (that would be very useful when you want to read additional information). But my overall impression of this book is very good.

I want to thank the author, Eric Rochester, for writing such a useful book.

]]>
<![CDATA[Artificial Intelligence: A Modern Approach]]> 27543 1080 Stuart Russell 0137903952 Alex 0 4.20 1994 Artificial Intelligence: A Modern Approach
author: Stuart Russell
name: Alex
average rating: 4.20
book published: 1994
rating: 0
read at:
date added: 2012/12/31
shelves: ir-dm-nlp-ml-search, own-pbook, favorites, to-read, short-pile
review:

]]>
<![CDATA[Modern Information Retrieval: The Concepts and Technology Behind Search]]> 6482209 913 Ricardo Baeza-Yates 0321416910 Alex 0 to-read, ir-dm-nlp-ml-search 3.90 1999 Modern Information Retrieval: The Concepts and Technology Behind Search
author: Ricardo Baeza-Yates
name: Alex
average rating: 3.90
book published: 1999
rating: 0
read at:
date added: 2012/12/28
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Kernel Adaptive Filtering: A Comprehensive Introduction]]> 7895289 240 Weifeng Liu 0470447532 Alex 0 to-read, ir-dm-nlp-ml-search 5.00 2010 Kernel Adaptive Filtering: A Comprehensive Introduction
author: Weifeng Liu
name: Alex
average rating: 5.00
book published: 2010
rating: 0
read at:
date added: 2012/12/27
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Scaling up Machine Learning: Parallel and Distributed Approaches]]> 13129107 492 Ron Bekkerman 0521192242 Alex 0 4.00 2011 Scaling up Machine Learning: Parallel and Distributed Approaches
author: Ron Bekkerman
name: Alex
average rating: 4.00
book published: 2011
rating: 0
read at:
date added: 2012/10/11
shelves: to-read, ir-dm-nlp-ml-search, big-data
review:

]]>
Machine Learning in Action 12404631

The ability to take raw data, access it, filter it, process it, visualize it, understand it, and communicate it to others is possibly the most essential business problem for the coming decades. "Machine learning," the process of automating tasks once considered the domain of highly-trained analysts and mathematicians, is the key to efficiently extracting useful information from this sea of raw data.

Machine Learning in Action is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. In it, the author uses the flexible Python programming language to show how to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.

]]>
384 Peter Harrington 1617290181 Alex 4 3.78 2011 Machine Learning in Action
author: Peter Harrington
name: Alex
average rating: 3.78
book published: 2011
rating: 4
read at: 2012/10/08
date added: 2012/10/08
shelves: ir-dm-nlp-ml-search, own-ebook, own-pbook
review:

]]>
<![CDATA[Introduction to Information Retrieval]]> 3278309 506 Christopher D. Manning 0521865719 Alex 5 ]]> 4.22 2008 Introduction to Information Retrieval
author: Christopher D. Manning
name: Alex
average rating: 4.22
book published: 2008
rating: 5
read at: 2012/10/07
date added: 2012/10/07
shelves: ir-dm-nlp-ml-search, own-pbook
review:
A great book that provides an introduction to information retrieval plus related topics, such as elements of machine learning. There is a good balance between theoretical foundations & simplicity, but it isn't as simple as other "popular" books, such as "Programming Collective Intelligence".
You can also find a "pre-release" version of the book in PDF form (see links in other reviews)...
]]>
<![CDATA[Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition]]> 6298639 1024 Dan Jurafsky 0135041961 Alex 5 ]]> 3.89 2000 Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
author: Dan Jurafsky
name: Alex
average rating: 3.89
book published: 2000
rating: 5
read at: 2012/10/03
date added: 2012/10/03
shelves: ir-dm-nlp-ml-search, own-pbook
review:
A very interesting & useful book for everybody who is interested in natural language processing. I read it when I took the online NLP class from Prof. Jurafsky at Coursera, and found many interesting things there.
I didn't actually read it completely; for example, I skipped the speech recognition part, but I plan to return to this book in the near future...
]]>
Mining of Massive Datasets 12818088 326 Jure Leskovec 1107015359 Alex 0 4.35 2011 Mining of Massive Datasets
author: Jure Leskovec
name: Alex
average rating: 4.35
book published: 2011
rating: 0
read at:
date added: 2012/09/05
shelves: to-read, big-data, ir-dm-nlp-ml-search, short-pile
review:

]]>
<![CDATA[Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications]]> 15808115 339 James Pustejovsky 1449306667 Alex 0 3.54 2012 Natural Language Annotation for Machine Learning: A Guide to Corpus-Building for Applications
author: James Pustejovsky
name: Alex
average rating: 3.54
book published: 2012
rating: 0
read at:
date added: 2012/09/04
shelves: to-read, ir-dm-nlp-ml-search, own-ebook
review:

]]>
<![CDATA[Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology]]> 145058 556 Dan Gusfield 0521585198 Alex 0 4.08 1997 Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
author: Dan Gusfield
name: Alex
average rating: 4.08
book published: 1997
rating: 0
read at:
date added: 2012/03/13
shelves: ir-dm-nlp-ml-search, own-pbook, to-read
review:

]]>
<![CDATA[Quantitative Corpus Linguistics with R: A Practical Introduction]]> 5084828 260 Stefan Th. Gries 0415962706 Alex 0 to-read, ir-dm-nlp-ml-search 4.64 2008 Quantitative Corpus Linguistics with R: A Practical Introduction
author: Stefan Th. Gries
name: Alex
average rating: 4.64
book published: 2008
rating: 0
read at:
date added: 2012/02/24
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
Tika in Action 11201938
Tika in Action is a hands-on guide for developers working with search engines, content management systems and other similar applications who want to exploit the information locked in digital documents. It introduces you to the world of mining text and binary documents and other information sources like Internet media types and Dublin Core metadata. The book shows where Tika fits within this landscape and how readers can use Tika to build and extend applications. The book's many case studies give real-world experience from domains ranging from search engines to digital asset management and scientific data processing.

In addition to the architectural overviews, developers will find more detailed information in chapters that focus on advanced features like XMP metadata processing, automatic language detection and custom parser extensions. The book also describes common file formats like MS Word, PDF, HTML, and ZIP and the open source libraries used to process files in these formats. The included code examples are designed to support hands-on experimentation.

This book requires no previous knowledge of Tika or text mining techniques, and will be most valuable to readers with a working knowledge of Java. Tika in Action fits perfectly with other Manning books including Lucene in Action, Mahout in Action, Taming Text, Algorithms of the Intelligent Web, and Collective Intelligence in Action.]]>
225 Chris A. Mattmann 1935182854 Alex 4
]]>
3.88 2011 Tika in Action
author: Chris A. Mattmann
name: Alex
average rating: 3.88
book published: 2011
rating: 4
read at: 2011/04/25
date added: 2012/01/25
shelves: ir-dm-nlp-ml-search, programming, own-pbook, own-ebook
review:
A very good book on media type detection & content extraction using the Apache Tika framework. By using Tika for text & metadata extraction you can index & search documents in many existing formats. You can also extend Tika to support new formats that you need in your work. Its open source nature makes it very attractive for both open source & corporate developers, allowing flexible integration with many different systems, such as ManifoldCF, Lucene, UIMA, etc.

The book provides a comprehensive description of the framework itself, how to use it for different tasks (file format & language detection, text/metadata extraction, etc.), and how to extend it to support new file formats (both detection & data extraction). Besides this, there are several chapters dedicated to real-world use cases: how Apache Tika is used in different projects.

I would recommend this book to everybody who needs to perform media type detection and/or text extraction, especially those working with indexing & searching of heterogeneous documents.

P.S. I gave it 4 stars only because I would have liked a more detailed description of how to create complex signatures for file formats (although this information can be found on the project's pages).

]]>
<![CDATA[Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit]]> 6392569
Packed with examples and exercises, Natural Language Processing with Python will help



This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.]]>
502 Steven Bird 0596516495 Alex 0 4.10 2009 Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit
author: Steven Bird
name: Alex
average rating: 4.10
book published: 2009
rating: 0
read at:
date added: 2012/01/21
shelves: to-read, own-ebook, ir-dm-nlp-ml-search
review:

]]>
Mahout in Action 9546513

When computers harness prior experience to improve future performance, a type of artificial intelligence called machine learning has been applied. The Apache Mahout project is focused on three types of machine learning that are of particular interest to modern web developers: recommendation systems, classification, and clustering.



Through real-world examples, Mahout in Action introduces the sorts of problems that these techniques are appropriate for, and then illustrates how Mahout can be applied to solve them. It places particular focus on issues of scalability, and how to apply these techniques at very large scale with the Apache Hadoop framework.

]]>
415 Sean Owen 1935182684 Alex 5
]]>
3.64 2011 Mahout in Action
author: Sean Owen
name: Alex
average rating: 3.64
book published: 2011
rating: 5
read at: 2011/06/26
date added: 2011/10/15
shelves: ir-dm-nlp-ml-search, own-ebook, own-pbook
review:
This book doesn't provide deep coverage of the theoretical foundations of machine learning (I would recommend looking at other books, like "Introduction to Machine Learning (Adaptive Computation and Machine Learning series)", "Machine Learning in Action" or "Programming Collective Intelligence: Building Smart Web 2.0 Applications", if you want more background), but concentrates on explaining how to use Mahout to solve some machine learning problems: making recommendations, data clustering and classification.

For each class of these problems, the description starts with the basics and continues with more complex examples, including complete solutions that can be easily adapted to your own machine learning problems. All examples that come with the book were checked against the current release of Apache Mahout (version 0.5).

The book is written in succinct but understandable language and provides many code snippets that make the topics much easier to understand. An interesting feature of the e-book version of Mahout in Action is the inclusion of audio and video snippets that explain and/or show the "hard places". There is also an interesting description of one of Mahout's real-world deployments, where it's used in e-commerce.

So I recommend this book if you're interested in solving machine learning problems that involve very large data sets.
]]>
<![CDATA[Lucene in Action, Second Edition: Covers Apache Lucene 3.0]]> 8597368
Some things remain the same, though. Lucene still delivers high-performance search features in a disarmingly easy-to-use API. Due to its vibrant and diverse open-source community of developers and users, Lucene is relentlessly improving, with evolutions to APIs, significant new features such as payloads, and a huge increase (as much as 8x) in indexing speed with Lucene 2.3.

And with clear writing, reusable examples, and unmatched advice on best practices, Lucene in Action, Second Edition is still the definitive guide to developing with Lucene.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.]]>
475 Michael McCandless 1933988177 Alex 5 Highly recommended for anyone who is planning to use Lucene]]> 4.00 2004 Lucene in Action, Second Edition: Covers Apache Lucene 3.0
author: Michael McCandless
name: Alex
average rating: 4.00
book published: 2004
rating: 5
read at: 2010/09/29
date added: 2011/07/19
shelves: ir-dm-nlp-ml-search, own-ebook
review:
Very good, detailed book about Lucene, providing different levels of detail, from a quick start to a detailed discussion of Lucene's internals, contrib modules, and success stories.
Highly recommended for anyone who is planning to use Lucene
]]>
ManifoldCF in Action 11201928
ManifoldCF in Action is a comprehensive tutorial and reference that shows you how to integrate search with enterprise-level document repositories using ManifoldCF. The book begins with an architectural overview of ManifoldCF and how it fits into your application infrastructure. After covering the basics, it dives into examples showing typical integration tasks, such as setting up connections, using ManifoldCF as an engine under the control of another enterprise system, and integrating ManifoldCF's user-based security model with a search engine.

Although ManifoldCF provides connectors for a large number of repositories and search technologies, including Solr, FileNet, Windows shares, JDBC, Documentum, Meridio, and SharePoint, there are many for which no ManifoldCF connector yet exists. As you explore the ManifoldCF architecture, you'll learn how ManifoldCF interacts with individual connectors so that you can design your own custom connectors.]]>
400 Karl D. Wright 161729019X Alex 4 The book provides a good description of the system's architecture, together with a description of how to extend it to support new input and output connectors.
]]>
3.50 2012 ManifoldCF in Action
author: Karl D. Wright
name: Alex
average rating: 3.50
book published: 2012
rating: 4
read at: 2011/04/25
date added: 2011/05/05
shelves: ir-dm-nlp-ml-search, programming, own-ebook
review:
This book describes the use of the ManifoldCF framework, which lets you integrate different content repositories (SharePoint, Documentum, etc.) with document processing systems such as indexers.
The book provides a good description of the system's architecture, together with a description of how to extend it to support new input and output connectors.

]]>
<![CDATA[Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies, 7)]]> 8346166 178 Jimmy Lin 1608453421 Alex 5 If you read this book, you should also look at the Cloud9 library]]> 4.13 2010 Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies, 7)
author: Jimmy Lin
name: Alex
average rating: 4.13
book published: 2010
rating: 5
read at: 2010/10/27
date added: 2010/10/27
shelves: ir-dm-nlp-ml-search, programming, favorites
review:
Great book on developing algorithms for map/reduce-based solutions (Hadoop is mentioned, but knowing it is not required to understand this book). The book describes how to adapt algorithms to map/reduce for different tasks, such as graph processing and machine learning, and covers common questions of map/reduce design, including performance optimization.
If you read this book, you should also look at the Cloud9 library
]]>
<![CDATA[Beautiful Visualization: Looking at Data through the Eyes of Experts]]> 7405941
This book examines the methods of two dozen visualization experts who approach their projects from a variety of perspectives -- as artists, designers, commentators, scientists, analysts, statisticians, and more. Together they demonstrate how visualization can help us make sense of the world.


Explore the importance of storytelling with a simple visualization exercise
Learn how color conveys information that our brains recognize before we're fully aware of it
Discover how the books we buy and the people we associate with reveal clues to our deeper selves
Recognize a method to the madness of air travel with a visualization of civilian air traffic
Find out how researchers investigate unknown phenomena, from initial sketches to published papers

Contributors include:

Nick Bilton, Michael E. Driscoll, Jonathan Feinberg, Danyel Fisher, Jessica Hagy, Gregor Hochmuth, Todd Holloway, Noah Iliinsky, Eddie Jabbour, Valdean Klump, Aaron Koblin, Robert Kosara, Valdis Krebs, JoAnn Kuchera-Morin et al., Andrew Odewahn, Adam Perer, Anders Persson, Maximilian Schich, Matthias Shapiro, Julie Steele, Moritz Stefaner, Jer Thorp, Fernanda Viegas, Martin Wattenberg, and Michael Young.]]>
416 Julie Steele 1449379869 Alex 0 to-read, ir-dm-nlp-ml-search 3.80 2010 Beautiful Visualization: Looking at Data through the Eyes of Experts
author: Julie Steele
name: Alex
average rating: 3.80
book published: 2010
rating: 0
read at:
date added: 2010/08/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Beautiful Data: The Stories Behind Elegant Data Solutions (Theory In Practice, #31)]]> 6492189
With Beautiful Data, you That's only a small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book. Contributors ]]>
384 Toby Segaran 0596157118 Alex 0 to-read, ir-dm-nlp-ml-search 3.67 2009 Beautiful Data: The Stories Behind Elegant Data Solutions (Theory In Practice, #31)
author: Toby Segaran
name: Alex
average rating: 3.67
book published: 2009
rating: 0
read at:
date added: 2010/08/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
Hadoop: The Definitive Guide 6308439 528 Tom White 0596521979 Alex 4 I recommend this book to all developers who want to learn about Hadoop, its usage, and programming for it]]> 4.00 2009 Hadoop: The Definitive Guide
author: Tom White
name: Alex
average rating: 4.00
book published: 2009
rating: 4
read at: 2010/07/04
date added: 2010/07/04
shelves: ir-dm-nlp-ml-search, programming
review:
Very good book that gives a high-level overview of Hadoop, together with descriptions of other Hadoop-related projects: Pig, HBase, and others.
I recommend this book to all developers who want to learn about Hadoop, its usage, and programming for it
]]>
<![CDATA[Kernel Methods for Pattern Analysis]]> 1059951 478 John Shawe-Taylor 0521813972 Alex 0 to-read, ir-dm-nlp-ml-search 3.96 2004 Kernel Methods for Pattern Analysis
author: John Shawe-Taylor
name: Alex
average rating: 3.96
book published: 2004
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp]]> 83884 976 Peter Norvig 1558601910 Alex 0 4.33 1991 Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp
author: Peter Norvig
name: Alex
average rating: 4.33
book published: 1991
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, func-prog, ir-dm-nlp-ml-search, own-pbook, cs
review:

]]>
<![CDATA[Neural Network Learning: Theoretical Foundations]]> 2651869 404 Martin Anthony 052157353X Alex 0 to-read, ir-dm-nlp-ml-search 0.0 1999 Neural Network Learning: Theoretical Foundations
author: Martin Anthony
name: Alex
average rating: 0.0
book published: 1999
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Handbook of Statistical Analysis and Data Mining Applications]]> 6056058 864 Robert A. Nisbet 0123747651 Alex 0 3.82 2009 Handbook of Statistical Analysis and Data Mining Applications
author: Robert A. Nisbet
name: Alex
average rating: 3.82
book published: 2009
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search, math
review:

]]>
<![CDATA[Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)]]> 626460 772 Jiawei Han 1558609016 Alex 0 to-read, ir-dm-nlp-ml-search 3.97 2000 Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)
author: Jiawei Han
name: Alex
average rating: 3.97
book published: 2000
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models (Complex Adaptive Systems)]]> 256398 608 Vojislav Kecman 0262112558 Alex 0 to-read, ir-dm-nlp-ml-search 3.75 2001 Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models (Complex Adaptive Systems)
author: Vojislav Kecman
name: Alex
average rating: 3.75
book published: 2001
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Introduction to Machine Learning (Adaptive Computation and Machine Learning)]]> 7895679 584 Ethem Alpaydin 026201243X Alex 0 3.50 2004 Introduction to Machine Learning (Adaptive Computation and Machine Learning)
author: Ethem Alpaydin
name: Alex
average rating: 3.50
book published: 2004
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search, own-pbook
review:

]]>
<![CDATA[Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning)]]> 213033 644 Bernhard Schölkopf 0262194759 Alex 0 to-read, ir-dm-nlp-ml-search 4.03 2001 Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning)
author: Bernhard Schölkopf
name: Alex
average rating: 4.03
book published: 2001
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[An Introduction to Support Vector Machines and Other Kernel-based Learning Methods]]> 264011 204 Nello Cristianini 0521780195 Alex 0 to-read, ir-dm-nlp-ml-search 3.97 2000 An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
author: Nello Cristianini
name: Alex
average rating: 3.97
book published: 2000
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[The Elements of Statistical Learning: Data Mining, Inference, and Prediction]]> 148009 552 Trevor Hastie 0387952845 Alex 0 4.41 2001 The Elements of Statistical Learning: Data Mining, Inference, and Prediction
author: Trevor Hastie
name: Alex
average rating: 4.41
book published: 2001
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search, math
review:

]]>
<![CDATA[Data Mining: Practical Machine Learning Tools and Techniques]]> 213031
]]>
560 Ian H. Witten 0120884070 Alex 0 to-read, ir-dm-nlp-ml-search 3.92 1999 Data Mining: Practical Machine Learning Tools and Techniques
author: Ian H. Witten
name: Alex
average rating: 3.92
book published: 1999
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, ir-dm-nlp-ml-search
review:

]]>
<![CDATA[Information Retrieval: Algorithms and Heuristics (The Information Retrieval Series)]]> 128033 352 David A. Grossman 1402030045 Alex 0 3.67 1998 Information Retrieval: Algorithms and Heuristics (The Information Retrieval Series)
author: David A. Grossman
name: Alex
average rating: 3.67
book published: 1998
rating: 0
read at:
date added: 2010/05/15
shelves: to-read, own-pbook, ir-dm-nlp-ml-search
review:

]]>