From: J Strother Moore Hi. This week I will talk about string searching. This talk will have nothing to do with theorem proving! A New (?) Variation of the Boyer-Moore Fast String Searching Algorithm J Moore Wednesday, May 23, 2007 ACES 3.116 Abstract: The Boyer-Moore fast string searching algorithm is an fast algorithm for finding the first occurrence of one string (`the pattern') inside another (`the text'). It preprocesses the pattern to build two tables that allow it to skip through the text in jumps that can be as long as the pattern itself. It tends to be more efficient when the pattern and text are over a large alphabet. It is widely used in computer virus detection, medical literature searching, and other applications where the material being searched is prohibitively large and unstructured. Preprocessing is linear in the length of the pattern. Worse case behavior was proved by Knuth (and eventually several others) to be linear in the length of the text. The algorithm is typically "sublinear" in that it does not even fetch every character from the text. It's most interesting characteristic is that it gets faster as patterns get longer. This, however, tends to reach an asymptote determined in part by the size of the alphabet. In this talk I will present two variations. The first combines the two tables of the original Boyer-Moore algorithm into one table. The algorithm is strictly better than Boyer-Moore. Like the original algorithm, the improved version tends to be less efficient on small alphabets. The second variation corrects this deficiency: even on small alphabets (e.g., size 4) the algorithm tends (longer than the other variation) to keep improving as the patterns get longer. I do not know yet whether this algorithm is new or of any interest to anybody else!