Å·±¦ÓéÀÖ

Jump to ratings and reviews
Rate this book

Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences

Rate this book
Recent years have witnessed a dramatic increase of interest in sophisticated string matching problems, especially in information retrieval and computational biology. This book presents a practical approach to string matching problems, focusing on the algorithms and implementations that perform best in practice. It covers searching for simple, multiple and extended strings, as well as regular expressions, and exact and approximate searching. It includes all the most significant new developments in complex pattern searching. The clear explanations, step-by-step examples, algorithm pseudocode, and implementation efficiency maps will enable researchers, professionals and students in bioinformatics, computer science, and software engineering to choose the most appropriate algorithms for their applications.

232 pages, Paperback

First published May 27, 2002

3 people are currently reading
56 people want to read

About the author

Gonzalo Navarro

16Ìýbooks4Ìýfollowers

Ratings & Reviews

What do you think?
Rate this book

Friends & Following

Create a free account to discover what your friends think of this book!

Community Reviews

5 stars
4 (44%)
4 stars
4 (44%)
3 stars
1 (11%)
2 stars
0 (0%)
1 star
0 (0%)
Displaying 1 of 1 review
Profile Image for Nick Black.
AuthorÌý2 books865 followers
December 5, 2007
There is no better collection of state-of-the-art multipattern matching algorithms, and certainly no better review of the quite recent bit-parallel approaches (such as the Shift-OR algorithm of Baeza-Yates). Old favorites such as [Advanced] Aho-Corasick (with a detailed construction of its DFA and δ-function) and Backward Oracle / Backward Nondeterministic Dawg are covered and related, and the regex chapter's development of the Glushkov and Thompson automata is not unacceptable. Matching with errors has come a long way since Navarro hit his pattern-matching peak in 2001 or so (FPMiS was published 2002, but largely culled from previous journal articles) -- anyone looking for fuzzy matching would do well to look into recent bioinformatics research. The Wu-Manber coverage is far more readable than that team's original literature (see Wu 1994 and 1996).

The fourth chapter is a gem for practitioners, introducing several low-cost partial solutions to multiple regex matching in a DFA or bit-parallel approach (these techniques will. admittedly. be trivially derived by any competent theorist).

More fleshing-out of time- and space-costs, especially for wider sets of more diverse inputs, would be a fine addition. Average-case analyses are presented unrigorously -- not merely sans proof, but without even contextual specification. The book is sorely in need of detail regarding the memory profiles of these algorithms, beyond the occasional, trite "state fits into cache". Especially when performing exact multimatch on irregularly-quantized data, and especially with modern encodings, the memory-intensive automata approaches provide a keenly elegant computational model, flexibility and generous assurances of correctness. At the same time, their naive implementations are incredible underperformers on today's deeply pipelined processors and horribly latent processor/DRAM subsystems -- making optimal use of (preferably on-die) SRAM cache, minimizing unpredictable branches and eliminating data dependency is of absolutely critical importance, and seperates the men from the boys. Parallelized approaches also deserve more than the scant attention they're here paid.

This book ought only get four stars, but there's not yet a superior authority regarding this fascinating, utterly exquisite branch of computer science, a deliciously pure intersection of theory and application. Perhaps I will one day write it [shrug].
Displaying 1 of 1 review

Can't find what you're looking for?

Get help and learn more about the design.