Name: Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences
Rating: 4.33 (1 reviews)
ISBN: 9780521039932

Nick Black

Author · 2 books · 865 followers

December 5, 2007

There is no better collection of state-of-the-art multipattern matching algorithms, and certainly no better review of the quite recent bit-parallel approaches (such as the Shift-Or algorithm of Baeza-Yates). Old favorites such as [Advanced] Aho-Corasick (with a detailed construction of its DFA and δ-function) and Backward Oracle / Backward Nondeterministic Dawg are covered and related, and the regex chapter's development of the Glushkov and Thompson automata is not unacceptable. Matching with errors has come a long way since Navarro hit his pattern-matching peak in 2001 or so (FPMiS was published in 2002, but was largely culled from previous journal articles); anyone looking for fuzzy matching would do well to look into recent bioinformatics research. The Wu-Manber coverage is far more readable than that team's original literature (see Wu 1994 and 1996).
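For readers who have not met the bit-parallel family, here is a minimal Shift-Or sketch. It is written for this review rather than taken from the book, and the function name `shift_or_search` is hypothetical. The whole automaton for a pattern of length m lives in one integer: bit i is 0 when the first i+1 pattern characters match the text ending at the current position. Each text character then costs one shift, one OR, and one table lookup, with no per-state branching.

```python
def shift_or_search(pattern: str, text: str) -> list[int]:
    """Return the start offset of every exact occurrence of pattern in text.

    Shift-Or: bit i of `state` is 0 iff pattern[:i+1] matches the text
    ending at the current position. Python ints are unbounded, so this
    sketch ignores the usual m <= machine-word-size restriction.
    """
    m = len(pattern)
    if m == 0:
        return []
    full = (1 << m) - 1  # the m meaningful bits, all set
    # masks[c] has bit i cleared where pattern[i] == c, every other bit set.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, full) & ~(1 << i)
    state = full          # no prefix matched yet
    hit = 1 << (m - 1)    # bit for the complete pattern
    matches = []
    for j, c in enumerate(text):
        # Shifting brings in a 0 at bit 0 (the empty prefix always matches);
        # characters absent from the pattern map to all ones.
        state = ((state << 1) | masks.get(c, full)) & full
        if not state & hit:
            matches.append(j - m + 1)
    return matches


print(shift_or_search("aba", "abababa"))  # [0, 2, 4]
```

Note that overlapping occurrences fall out for free, since every prefix is tracked simultaneously.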

The fourth chapter is a gem for practitioners, introducing several low-cost partial solutions to multiple regex matching within a DFA or bit-parallel framework (these techniques will, admittedly, be trivially derived by any competent theorist).
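As one illustration of how cheaply the bit-parallel model extends beyond plain strings, the sketch below lets each pattern position be a class of characters. It is offered under stated assumptions rather than as a transcription of the chapter. The pattern representation (a list of strings, each listing the characters allowed at that position) and the name `class_shift_and` are hypothetical. It uses the Shift-And formulation (1 bits mean "matching"), and only the mask preprocessing differs from the exact-string case. The scan loop is unchanged.

```python
def class_shift_and(classes: list[str], text: str) -> list[int]:
    """Return start offsets where a pattern of character classes matches.

    classes[i] lists the characters allowed at pattern position i, e.g.
    ["AG", "C", "GT"]. Shift-And: bit i of `state` is 1 iff the first
    i+1 classes match the text ending at the current position.
    """
    m = len(classes)
    if m == 0:
        return []
    # masks[c] has bit i set when c is allowed at position i; a character
    # may appear in several classes, so its bits are OR-ed together.
    masks: dict[str, int] = {}
    for i, allowed in enumerate(classes):
        for c in set(allowed):
            masks[c] = masks.get(c, 0) | (1 << i)
    hit = 1 << (m - 1)  # bit for the complete pattern
    state = 0           # no prefix matched yet
    starts = []
    for j, c in enumerate(text):
        # Extend every live prefix by one position, always starting a fresh
        # one at bit 0, and keep only those the current character allows.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & hit:
            starts.append(j - m + 1)  # each class consumes exactly one char
    return starts


# Purine, then C, then G or T:
print(class_shift_and(["AG", "C", "GT"], "ACGATCGGCT"))  # [0, 7]
```

Because the class information is folded into the masks ahead of time, the per-character cost is identical to exact Shift-And, which is what makes this kind of extension nearly free.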

More fleshing-out of time and space costs, especially for wider sets of more diverse inputs, would be a fine addition. Average-case analyses are presented unrigorously: not merely sans proof, but without even contextual specification. The book is sorely in need of detail regarding the memory profiles of these algorithms, beyond the occasional, trite "state fits into cache". Especially when performing exact multimatch on irregularly quantized data, and especially with modern encodings, the memory-intensive automata approaches provide a keenly elegant computational model, flexibility and generous assurances of correctness. At the same time, their naive implementations are incredible underperformers on today's deeply pipelined processors and horribly high-latency processor/DRAM subsystems. Making optimal use of (preferably on-die) SRAM cache, minimizing unpredictable branches and eliminating data dependencies are of absolutely critical importance, and separate the men from the boys. Parallelized approaches also deserve more than the scant attention they are paid here.

This book ought only to get four stars, but there's not yet a superior authority on this fascinating, utterly exquisite branch of computer science, a deliciously pure intersection of theory and application. Perhaps I will one day write it [shrug].

Gonzalo Navarro, Mathieu Raffinot