CS 394C: Algorithms for Computational Biology

The course topic is algorithm design in computational molecular biology. This year we will focus our study on multiple sequence alignment, phylogeny (evolutionary history) reconstruction, metagenomics, and historical linguistics. These problems are individually enormously computationally intensive - computational approaches generally involve attempts to solve NP-hard optimization problems, and months of analysis can be used to estimate the phylogeny, without any guarantee of optimality. Thus, algorithmic development, firmly grounded in mathematical theory, is needed by the biological research community.

The goal of the course is to enable the students to do high quality research (both mathematical and algorithmic) for these problems. The mathematical foundations of phylogeny reconstruction are quite elegant and deep, and so most of our discussion will be focused there.

We will cover the statistical models used to describe evolution of molecular sequences, the primary optimization problems, and the mathematical theory needed to predict the performance of reconstruction methods under the models. The course will also present a large spectrum of algorithms (both deterministic with established theory, and heuristics) that have been developed for these problems. In addition, we will have several lectures from guest biologists with whom I collaborate.

No background in biology is assumed for this course. The mathematics used in developing algorithms for both phylogeny reconstruction and multiple sequence alignment combines combinatorics and graph theory (including the theory of chordal graphs), complexity theory, and probability and statistics. Students with good preparation in at least one of these areas will be able to begin doing algorithm development early on. However, abundant research opportunities exist for those students without significant mathematical training: simulation studies of existing methods is an important methodology within the field, and has resulted in many highly influential and important publications in major scientific journals. Therefore, students of all backgrounds (including from biology) are welcome.

The grading scheme is:

Class presentations

Final Project

Final Exam

Homework

Readings

News articles

Additional resources