Name: Data-Intensive Text Processing with MapReduce (Synthesis Lectures on Human Language Technologies, 7)
Rating: 4.14 (37 ratings, 4 reviews)
ISBN: 9781608453429


Data-Intensive Text Processing with MapReduce

Jimmy Lin, Chris Dyer, Graeme Hirst (Editor)


Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader "think in MapReduce" but also discusses limitations of the programming model.

Table of Contents: Introduction / MapReduce Basics / MapReduce Algorithm Design / Inverted Indexing for Text Retrieval / Graph Algorithms / EM Algorithms for Text Processing / Closing Remarks
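To make the programming model described above concrete, here is a minimal single-process sketch of the classic word-count job: a mapper emits (term, count) pairs, a shuffle step groups values by key (the part an execution framework such as Hadoop would handle across machines), and a reducer sums each group. It also shows one local-aggregation pattern, in-mapper combining, which shrinks the number of intermediate pairs. All function names here are hypothetical, whitespace tokenization is an assumption, and this is not the Hadoop API or the book's own code.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

Pair = Tuple[str, int]

def map_words(doc_id: str, text: str) -> Iterator[Pair]:
    """Hypothetical mapper: emit (term, 1) for every whitespace-separated token."""
    for term in text.lower().split():
        yield term, 1

def map_words_combining(doc_id: str, text: str) -> Iterator[Pair]:
    """Hypothetical mapper with in-mapper combining: count terms locally,
    then emit one (term, partial_count) pair per distinct term."""
    counts: Dict[str, int] = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    yield from counts.items()

def shuffle(pairs: List[Pair]) -> Dict[str, List[int]]:
    """Group intermediate values by key, simulating the framework's shuffle."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_sum(term: str, values: List[int]) -> Pair:
    """Reducer: sum all partial counts for one term."""
    return term, sum(values)

def run_job(docs: Dict[str, str],
            mapper: Callable[[str, str], Iterator[Pair]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in one process; keys are processed in sorted order."""
    intermediate = [pair for doc_id, text in docs.items()
                    for pair in mapper(doc_id, text)]
    grouped = shuffle(intermediate)
    return dict(reduce_sum(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    docs = {"d1": "one fish two fish", "d2": "red fish blue fish"}
    print(run_job(docs, map_words))
    # {'blue': 1, 'fish': 4, 'one': 1, 'red': 1, 'two': 1}
```

Both mappers yield the same final counts; with the two sample documents, the plain mapper emits 8 intermediate pairs while the combining mapper emits 6, which is the kind of network-traffic saving local aggregation aims for on a real cluster.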

Genres: Programming, Technology, Technical, Computer Science, Nonfiction

178 pages, Paperback

First published April 30, 2010

7 people are currently reading

69 people want to read

About the author

Jimmy Lin

6 books, 1 follower


Community Reviews

5 stars: 14 (37%)
4 stars: 16 (43%)
3 stars: 5 (13%)
2 stars: 2 (5%)
1 star: 0 (0%)

Displaying 1 - 4 of 4 reviews

Kai Weber

505 reviews, 43 followers

June 23, 2018

This is a very concise text on the given topic. The first half breaks down the basic structure of MapReduce, often with a short glance at the implementations in Apache Hadoop or at Google. There are already some simple yet practical examples in this first part. The second half elaborates on some mathematically more complex problems, which are explained more theoretically than practically: the main focus is EM (expectation maximization) on hidden Markov models, and though that topic uses some advanced mathematical notation, the presentation is still clear and easy to follow. The book is rounded off with a few hints on what MapReduce cannot do.
If you want to understand the concepts before you decide on an implementation, this is a good book for you.

56 reviews

Author of 3 books, 208 followers

October 27, 2010

Great book on developing algorithms for map/reduce-based solutions (Hadoop is mentioned, but not required to understand this book). It describes how to adapt algorithms to map/reduce for different tasks, such as graph processing and machine learning, and covers common questions of map/reduce design, including performance optimization.
If you read this book, you should also look at the Cloud9 library.

Shelves: favorites, ir-dm-nlp-ml-search, programming

Janno Teelem

50 reviews

August 9, 2015

Some really good examples, but also sidetracks (about half of the book in total) into relatively complex topics that are only loosely related to text processing.
