The CS 394C final project
You have at least two options for your final project. One is to
do a project that involves software development, either to
produce a new method and study it, or to compare existing methods.
The other main option is to write a survey paper on some topic related
to the research discussed in the course
(for example, some aspect of phylogenetic analysis and multiple sequence alignment).
Of these two basic categories, the survey paper is much easier, but the
research project is much more fun!
Research Project
If you want to do a project
that involves developing a new method or testing existing methods,
please come see me early in the semester. I will help you learn
how these studies are done, and we'll begin the planning for this
kind of work early.
Journals you might consider submitting a research paper to
include PLoS Currents: Tree of Life.
The advantage of PLoS Currents is that its acceptance criteria
do not focus on perceived impact
(on which reviewers can disagree substantially).
However, you are not required to submit your research paper to a journal!
Some examples of research projects that you might consider
doing:
- Evaluate the impact on phylogeny estimation of encoding multi-state
characters as binary characters. Use at least two methods and a
collection of datasets. (This is relevant to linguistic phylogenetics,
where this encoding is commonly used.)
- Evaluate how correcting (or not correcting) distances affects
phylogeny estimation. Be sure to include datasets with different
rates of evolution.
- Compare an alignment-free tree estimation method to
a standard two-phase method (first estimate an alignment, then
estimate a tree from it).
- Compare the 2013 method of Bouchard-Côté and Jordan
(PNAS 110:1160-1166)
for co-estimating alignments and trees under a statistical model
of evolution that includes indels
to another co-estimation method, such as BAli-Phy,
which also co-estimates alignments and trees under a statistical model.
- Analyze a biological or linguistic dataset using new
methods, and compare the results to previous analyses.
This is especially interesting if the dataset is reasonably
large and the original analysis used weak methods (a poor
alignment method or a poor tree estimation method).
- Evaluate the impact of alignment estimation error on
estimating parameters of the model tree other than its
topology, such as branch lengths and the 4x4 GTR rate matrix.
You might also evaluate the impact of
alignment estimation error on
the detection of selection, ancestral sequence estimation,
and the estimation of dates at internal nodes.
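For the distance-correction project, one standard correction is the Jukes-Cantor (JC69) formula d = -(3/4) ln(1 - (4/3) p), where p is the uncorrected proportion of differing sites (the p-distance). The Python sketch below is only illustrative: the function names are hypothetical, sites with a gap in either sequence are simply skipped (an assumption), and a real study would more likely use an existing package and possibly other corrections (for example, under GTR or with rate variation across sites).

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Assumption for this sketch: a site is skipped if either sequence has a gap ('-').
    """
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: -(3/4) * ln(1 - (4/3) * p).

    The formula is undefined for p >= 3/4 (saturation), so this sketch returns infinity there.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences out of 10 sites
    print(p, jc69_distance(p))
```

Here p = 0.2 and the corrected distance is about 0.233. The gap between corrected and uncorrected distances grows rapidly as p approaches 3/4, which is why datasets with different rates of evolution matter for this kind of study.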
Some projects that would require more substantial
programming include:
- Design a method for the Maximum Weighted Quartet
Compatibility problem, and use it to infer trees
from weighted quartet trees.
- Re-implement DCM1 (see paper #40 for the
theoretical description of the
algorithm and paper #44 for
a heuristic implementation and
its performance) and test it.
- Test SuperFine for boosting the
maximum likelihood supertree method.
- Test DACTAL as a heuristic for
maximum likelihood by requiring
that the input include an alignment
and then skipping the re-alignment of the sequences.
- Implement a maximum likelihood
heuristic for the model
of linguistic evolution described in
paper #78, and test it
on linguistic data.
- Test multiple sequence alignment methods
when the input data contain fragmentary
sequences.
- Evaluate the impact of "missing data"
on "species tree" methods, i.e.,
methods that combine estimated
gene trees into a species tree.
Here, "missing data" means that
some of the given gene trees do not contain
all the species.
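For the Maximum Weighted Quartet Compatibility project, the objective can be stated concretely: an unrooted tree displays the quartet tree ab|cd exactly when some edge of the tree separates {a, b} from {c, d}, and a tree's score is the total weight of the input quartet trees it displays. The Python sketch below only scores one candidate tree, which (by assumption) is represented as a list of splits, each given as one side of a bipartition of the leaf set; the names `displays` and `quartet_score` are hypothetical. An actual method would still need a search or optimization strategy over trees.

```python
from typing import FrozenSet, Iterable, Sequence, Tuple

# Hypothetical representation: a split is one side of a bipartition of the leaf set.
Split = FrozenSet[str]
# A quartet (a, b, c, d) stands for the quartet tree ab|cd.
Quartet = Tuple[str, str, str, str]


def displays(splits: Sequence[Split], leaves: FrozenSet[str], q: Quartet) -> bool:
    """True if some split separates {a, b} from {c, d}, i.e., the tree induces ab|cd."""
    a, b, c, d = q
    ab, cd = {a, b}, {c, d}
    for side in splits:
        other = leaves - side
        if (ab <= side and cd <= other) or (cd <= side and ab <= other):
            return True
    return False


def quartet_score(
    splits: Sequence[Split],
    leaves: FrozenSet[str],
    weighted_quartets: Iterable[Tuple[Quartet, float]],
) -> float:
    """Total weight of the input quartet trees displayed by the candidate tree."""
    return sum(w for q, w in weighted_quartets if displays(splits, leaves, q))


if __name__ == "__main__":
    leaves = frozenset("ABCDE")
    # Hypothetical tree ((A,B),C,(D,E)), given by its two non-trivial splits.
    splits = [frozenset("AB"), frozenset("ABC")]
    quartets = [
        (("A", "B", "C", "D"), 3.0),  # AB|CD: displayed
        (("A", "C", "B", "D"), 2.0),  # AC|BD: not displayed
        (("A", "B", "D", "E"), 1.0),  # AB|DE: displayed
        (("A", "C", "D", "E"), 0.5),  # AC|DE: displayed
    ]
    print(quartet_score(splits, leaves, quartets))  # 4.5
```

Representing the tree by its splits keeps the compatibility test to a few set operations per edge; a search method could reuse the same scoring function to compare candidate trees.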
Survey Paper
Writing a good survey paper is not trivial. You will need
to understand the papers you are reading and have some insight into
the contributions made by each of them.
The quality of your writing is very important: you should
treat the paper as something you would be willing to submit to
a journal in the same form in which you submit it for a grade. That means,
among other things, no typos, no grammatical mistakes, a
proper bibliography (with full bibliographic information), and
thoughtful exposition.
Also hand in hard copies of the main papers you reference.
Be careful, of course, not to include text from any
other paper unless you put it in quotation marks and attribute it properly.
When you write a survey paper, you need to identify the
specific question you are interested in, and explain why
it is interesting and important.
You should explain any controversies,
the leading approaches,
and the evidence for and against each approach.
As always, you need to be genuinely critical: do not simply
accept what the authors say, but point out the
limitations of their approaches.
Examples of possible topics for a survey paper include:
- Alignment-free tree estimation methods
- Methods for detecting horizontal gene transfer
- Methods for estimating species trees from gene trees
when gene trees can differ due to incomplete lineage sorting
- Methods for estimating species trees from gene trees
when gene trees can differ due to duplication and loss
- Models of evolution that are more complex than GTR
(for example, models that allow dependencies between sites)
- Techniques for dating ancestral nodes
- Techniques for inferring ancestral sequences
- Genome-scale multiple alignment methods (taking rearrangements into account)
- Genome rearrangement phylogeny