CS 395T: Algorithms for Computational Biology

The course topic is algorithm design in computational molecular biology, but we will focus our study on two related topics: multiple sequence alignment and phylogeny (evolutionary history) reconstruction. These problems are intimately related because the inference of an evolutionary history generally involves first obtaining a multiple alignment of the sequences, and then reconstructing a tree on the aligned sequences (though a simultaneous approach can also be taken). Both these problems are individually enormously computationally intensive - computational approaches generally involve attempts to solve NP-hard optimization problems, and months of analysis can be used to estimate the phylogeny, without any guarantee of optimality. Thus, algorithmic development, firmly grounded in mathematical theory, is needed by the biological research community.

The goal of the course is to enable the students to do high quality research (both mathematical and algorithmic) for both multiple sequence alignment and phylogeny reconstruction. The mathematical foundations of phylogeny reconstruction are quite elegant and deep, and so most of our discussion will be focused there.

We will cover the statistical models used to describe evolution of molecular sequences, the primary optimization problems, and the mathematical theory needed to predict the performance of reconstruction methods under the models. The course will also present a large spectrum of algorithms (both deterministic with established theory, and heuristics) that have been developed for these problems. In addition, we will have several lectures from guest biologists with whom I collaborate.

No background in biology is assumed for this course. The mathematics used in developing algorithms for both phylogeny reconstruction and multiple sequence alignment combines combinatorics and graph theory (including the theory of chordal graphs), complexity theory, and probability and statistics. Students with good preparation in at least one of these areas will be able to begin doing algorithm development early on. However, abundant research opportunities exist for those students without significant mathematical training: simulation studies of existing methods is an important methodology within the field, and has resulted in many highly influential and important publications in major scientific journals. Therefore, students of all backgrounds (including from biology) are welcome.

The grading scheme is:

Final Project

The course requires a final project of each student: click here for a list of suggested topics. Your final project should be a paper of about 15 pages, in a format and style appropriate for submission to a journal. The grade on the final project will be 30% writing, 30% summary of the literature you discuss, and 40% commentary (i.e., insight, critical and thoughtful discussion of the issues that come up). The final project is due at the beginning of the final exam.

Final Exam

The final exam will cover all the material discussed in the course, and will be closed book. You are also responsible for four of the papers that were presented by students: the paper by Wayne Maddison (presented on March 19) (PDF), the paper on phylogenetic networks by Jin et al. (presented on March 25) (PDF), the paper by Nakhleh et al. (presented on April 8) (PDF), and the paper by Gusfield et al. (presented on April 24) (PPT). You are not responsible for the proofs but are responsible for knowing the results (but only the ones that appear in the talks - not everything).

Required reaading

Recommended (not required) reading

Additional resources

  • Bioinformatics: Sequence Analysis, a course taught by Luay Nakhleh at Rice University. His course website has slides from his course, a reading list, and a list of software tools (some for alignment and some for phylogeny reconstruction), all of which are likely to be helpful to you.
  • See Evolution 101, a website provided by UC Berkeley for teaching about evolution.
  • Lectures:



    Final project

    Papers on networks to read

    Click here for the "Who's Who of Phylogenetic Networks" and some of these papers.