Suggested topics for the CS 394C final project
You have at least two options for your final project. One is to
do a project that involves software development, either to
produce a new method and study it, or to compare existing methods.
The other main option is to do a survey paper on some topic related
to phylogenetic analysis and multiple sequence alignment. Of these
two basic categories, the survey paper is much easier, but the
first category is much more fun! So if you want to do a project
that involves developing a new method or testing existing methods,
please come see me early in the semester. I will help you learn
how these studies are done, and we'll begin the planning for this
kind of work early.
TREE RECONSTRUCTION
- Survey of Bayesian MCMC methods in phylogenetics for various kinds of data
- Supertree methods
- Supermatrix methods
- Missing data: how handled, and what is the impact
- Fast-evolving sites: detection, and then what?
- Models of evolution which aren't identifiable
- Computational complexity of maximum likelihood (theory)
- Heuristics for ML: how they *actually* operate
SITE VARIATION
- Heterotachy
- Covarion model
- Rates-across-sites (distributions and debates)
- Evidence (pro and con) regarding the molecular clock
- Deviation from the molecular clock (how defined, and how estimated)
ALIGNMENT
- Pairwise alignment of genomic sequences (allowing rearrangement
  events, duplications, etc.)
- Space-efficient algorithms
- Algorithms for (fixed tree) Tree Alignment: exact and approximate
- Gap penalty models (linear, affine, other)
- Models of evolution that incorporate indel events
- Generalized Tree Alignment algorithms and heuristics
- Techniques used for "cleaning up" the alignment
MISCELLANEOUS
- Rooting trees
- Software for simulation of evolution
- Gene Tree/Species Tree reconciliation
- Reticulation detection
- Inference of reticulation
- Models of speciation and "shape" effects
- The non-parametric bootstrap applied to large datasets
- The parametric bootstrap
- Phylogeny reconstruction on whole genomes, using gene order and content
- Reconstruction of ancestral states
- "Rogue taxa" identification
- Calculating evolutionary distances under conditions where the model
  formula cannot be applied
General comments:
- You will need to write a proposal for your project,
which will include the literature you'll read. We'll
discuss this proposal and refine it, to help make
the proposal interesting to you.
- If you are having a hard time picking a topic,
try reading the overview chapter and see if anything strikes
your fancy.
- If you are mathematically inclined,
you may find inspiration from
Mike Steel's webpage (warning: his papers tend to be very
mathematical).
- Other sources of a lot of papers on phylogeny from the computer
science side include
  - Bernard Moret's webpage (with an emphasis on whole genome
    phylogenetics),
  - Papers by Alexandros Stamatakis, which are largely
    concerned with high performance computing for
    maximum likelihood estimation. He is
    the developer of the RAxML software, and
    quite serious about the programming and algorithm
    design issues. Try this page and this page.
    Neither seems to be up to date, however.
  - Papers by Daniel Huson, available here. Daniel's work covers
    a wide range, including metagenomics, genome assembly,
    visualization of trees, estimation of reticulations,
    and more.
- The grade for a survey article will depend in part on writing
quality. Please take this part of the work seriously. (I am glad
to give you feedback on your writing if you submit drafts ahead of the
deadline.)
Everyone should be aware of the ethical standards regarding
writing, and in particular the definition of "plagiarism."
The consequences for plagiarism (intentional or not) can
be severe.
The most common form of plagiarism I've encountered
among students is copying phrases from papers they are
summarizing, sometimes with the wording slightly changed.
If you wish to use someone else's wording, please do one of
two things: either put the words in quotes and give the proper
attribution, or open the section by stating that you
are paraphrasing someone else (and give the appropriate
reference), making it clear exactly which phrases
are paraphrased, and from where.
However, paraphrasing someone else means you are not
writing it yourself, and can come quite close to plagiarism.
Please be careful about this.
It is much better to just read the paper, understand it, and then
write it up without using any of the language in the paper.