Suggested topics for the CS 395T final project
You have at least two options for your final project. One is to
do a project that involves software development, either to
produce a new method and study it, or to compare existing methods.
The other main option is to do a survey paper on some topic related
to phylogenetic analysis and multiple sequence alignment. Of these
two basic categories, the survey paper is much easier, but the
first category is much more fun! So if you want to do a project
that involves developing a new method or testing existing methods,
please come see me early in the semester. I will help you learn
how these studies are done, and we'll begin the planning for this
kind of work early.
TREE RECONSTRUCTION
- Survey of Bayesian MCMC methods in phylogenetics for various kinds of data
- Supertree methods
- Supermatrix methods
- Missing Data - how handled, and what is the impact
- Fast evolving sites - detection, and then what?
- Models of evolution which aren't identifiable
- Computational complexity of maximum likelihood (theory)
- Heuristics for ML - how they *actually* operate
SITE VARIATION
- Heterotachy
- Covarion model
- Rates-across-sites (distributions and debates)
- Evidence (pro and con) regarding the molecular clock
- Deviation from the molecular clock (how defined, and how estimated)
ALIGNMENT
- Pairwise alignment of genomic sequences (allowing rearrangement events, duplications, etc.)
- Space-efficient algorithms
- Algorithms for (fixed tree) Tree Alignment - exact and approximate
- Gap penalty models (linear, affine, other)
- Models of evolution that incorporate indel events
- Generalized Tree Alignment algorithms and heuristics
- Techniques used for "cleaning up" the alignment
MISCELLANEOUS
- Rooting trees
- Software for simulation of evolution
- Gene Tree/Species Tree reconciliation
- Reticulation detection
- Inference of reticulation
- Models of speciation and "shape" effects
- The non-parametric bootstrap applied to large datasets
- The parametric bootstrap
- Phylogeny reconstruction on whole genomes, using gene order and content
- Reconstruction of ancestral states
- "Rogue taxa" identification
- Calculating evolutionary distances under conditions where the model formula cannot be applied
General comments:
You will need to write a proposal for your project,
which will include the literature you'll read. We'll
discuss this proposal and refine it, to help make
the proposal interesting to you.
If you are having a hard time picking a topic,
try reading the overview chapter and see if anything strikes
your fancy.
Those of you who are mathematically inclined
may find inspiration on
Mike Steel's webpage (warning: the papers there tend to be very
mathematical).
Another source of a lot of papers on phylogeny from the computer
science side (with an emphasis on whole genome phylogenetics)
is Bernard Moret's webpage.
Everyone should be aware of the ethical standards regarding
writing, and in particular the definition of "plagiarism."
The grade for a survey article will depend in part on writing
quality. Please take this part of the work seriously. (I am glad
to give you feedback on your writing if you submit drafts ahead of the
deadline.)