CS 394C: Algorithms for Computational Biology

The course topic is algorithm design in computational molecular biology. This year we will focus our study on genome assembly, multiple sequence alignment, phylogeny (evolutionary history) reconstruction, and metagenomics. These problems are individually enormously computationally intensive - computational approaches generally involve attempts to solve NP-hard optimization problems. Algorithmic development, firmly grounded in mathematical theory, is needed by the biological research community.

The goal of the course is to enable the students to do high quality research (both mathematical and algorithmic) for these problems. The mathematical foundations of phylogeny reconstruction are quite elegant and deep, and so most of our discussion will be focused there.

We will cover statistical models, primary optimization problems, and the mathematical theory needed to predict the performance of estimation methods under the models. The course will also present a large spectrum of algorithms (both deterministic with established theory, and heuristics) that have been developed for these problems. In addition, we will have several lectures from guest biologists with whom I collaborate.

No background in biology is assumed for this course. The mathematics used in developing algorithms for these problems combines combinatorics and graph theory (including the theory of chordal graphs), complexity theory, and probability and statistics. Students with good preparation in at least one of these areas will be able to begin doing algorithm development early on. However, abundant research opportunities exist for those students without significant mathematical training: simulation studies of existing methods is an important methodology within the field, and has resulted in many highly influential and important publications in major scientific journals. Therefore, students of all backgrounds (including from biology) are welcome.

Undergraduates registered for the course will be given assignments at a level different from that given to the graduate students. However, research opportunities exist for both undergraduate and graduate students.

The grading scheme is:

Final Project

The course requires a final project of each student. You are strongly encouraged to do a research project, but you can also do a survey paper on some topic relevant to the course material. In both cases, your project should be a paper (of about 15 pages) in a format and style appropriate for submission to a journal. Research projects can involve two students, but survey papers must be done by yourself.

Grades on the final project depend upon the kind of project you do. For a research paper, your grade will be 30% writing, 40% scientific/algorithmic rigor, and 30% impact. If you do a survey paper, the grade will be 30% writing, 30% summary of the literature you discuss, and 40% commentary (i.e., insight, critical and thoughtful discussion of the issues that come up).

Schedule:

Note: all too often, I have found that some students do not understand that copying text (with, perhaps, slight modifications) amounts to plagiarism, unless the copied text is put in quotes, and the original source correctly attributed. Please do not plagiarize, even unintentionally.

Click here for more about the final project.

Final Exam

The final exam is here. This will be a take home exam, and you are free to consult your notes. Please return this by email as a PDF by noon on Monday, May 7, and also deliver it in hardcopy to Laurie Alvarez in PAT 141.

Homework

The majority of the homework problems for the course can be found here; these are based upon the textbook, available here.

Readings

News articles

Additional resources

Lectures