RPE Nathan Clement

Contact Name: Nathan Clement
May 16, 2013 11:00am - 12:30pm

Title:  Approximate and exact string matching algorithms with applications to DNA sequencing



DNA sequencing technologies have advanced significantly in the past decade, enabling many applications in biology and personalized medicine.  Their advantage is speed; their drawbacks are the short length of the DNA sequences they produce and the frequency of errors.  Referenced DNA sequence mapping (the process of determining the optimal location for each of these short DNA strings on a longer reference genome) has also seen many advances in the past several years.  The current best fast-matching algorithm uses suffix trees with a randomized walk through the possible search space to find the “optimal” match.

This research examines fast algorithms for matching short strings and evaluates the tradeoffs between accuracy and speed.  An exact algorithm would be slower, but how accurate is “good enough”?  Is it possible, in the context of DNA sequences, to achieve 100% accuracy?
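
To make the exact-versus-approximate distinction concrete, here is a minimal Python sketch of mapping one short read onto a reference string. It is a brute-force illustration only, not the suffix-tree method mentioned above and not the speaker's algorithm. The function names, the use of Hamming distance (substitutions only, no insertions or deletions), and the `max_mismatches` parameter are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_positions(reference, read):
    """All start positions where read occurs in reference with no mismatches."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def best_positions(reference, read, max_mismatches):
    """Start positions with the fewest mismatches, within max_mismatches.

    Brute force (an assumption of this sketch): the read is compared
    against every window of the reference, so the cost grows with
    len(reference) * len(read).
    Returns (mismatch_count, positions), or (None, []) if nothing qualifies.
    """
    best, positions = None, []
    for i in range(len(reference) - len(read) + 1):
        d = hamming(reference[i:i + len(read)], read)
        if d > max_mismatches:
            continue
        if best is None or d < best:
            best, positions = d, [i]
        elif d == best:
            positions.append(i)
    return best, positions
```

A read containing a single sequencing error, such as "ACGA" against the reference "ACGTACGTTAGC", has no exact match, while the approximate search still reports the candidate locations that are one mismatch away. The exhaustive scan always finds the best location under this distance, at a cost that would be impractical for a full genome, which is the accuracy-versus-speed tradeoff the abstract raises.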