From: J Strother Moore
Hi. This week I will talk about string searching.
This talk will have nothing to do with theorem
proving!
A New (?) Variation of the Boyer-Moore Fast String Searching Algorithm
J Moore
Wednesday, May 23, 2007
ACES 3.116
Abstract: The Boyer-Moore fast string searching
algorithm is an fast algorithm for finding the
first occurrence of one string (`the pattern')
inside another (`the text'). It preprocesses the
pattern to build two tables that allow it to skip
through the text in jumps that can be as long as
the pattern itself. It tends to be more efficient
when the pattern and text are over a large
alphabet. It is widely used in computer virus
detection, medical literature searching, and other
applications where the material being searched is
prohibitively large and unstructured.
Preprocessing is linear in the length of the
pattern. Worse case behavior was proved by Knuth
(and eventually several others) to be linear in
the length of the text. The algorithm is
typically "sublinear" in that it does not even
fetch every character from the text. It's most
interesting characteristic is that it gets faster
as patterns get longer. This, however, tends to
reach an asymptote determined in part by the size
of the alphabet.
In this talk I will present two variations. The
first combines the two tables of the original
Boyer-Moore algorithm into one table. The
algorithm is strictly better than Boyer-Moore.
Like the original algorithm, the improved version
tends to be less efficient on small alphabets.
The second variation corrects this deficiency:
even on small alphabets (e.g., size 4) the
algorithm tends (longer than the other variation)
to keep improving as the patterns get longer.
I do not know yet whether this algorithm is new or
of any interest to anybody else!