GRACS Speaker Series-Kevin Liu/UT-Austin: "Rapid and Accurate Estimation of Large-scale Phylogenies and Sequence Alignments," TAY 3.128, Tuesday, May 4, 2010, 2:00 p.m.

Contact Name: Jenna Whitney
May 4, 2010 2:00pm - 3:00pm

Type of Talk: GRACS Speaker Series

Kevin Liu/University of Texas at Austin

Date/Time: Tuesday, May 4, 2010, 2:00 p.m.

Location: TAY 3.128



Title: Rapid and Accurate Estimation of Large-scale Phylogenies and Sequence Alignments

Talk Abstract:

Computational phylogenetics creates and evaluates algorithms that use present-day biological sequence data to estimate evolutionary history. This history is represented both as a phylogeny and as a multiple sequence alignment. Scientists from biology, chemistry, and other fields use phylogenies and alignments to address many different problems, including the origin of life, epidemiology, proteomics, and biomedical …

In the simplest case, a phylogeny is represented as a tree. The tree's leaves represent present-day groups of organisms, also known as taxa, and the tree's edges show how those taxa are related. A multiple sequence alignment shows relationships among the sequences themselves by inserting dashes ("-"), called indels, into the sequences to line up sequence letters. A pair of lined-up letters represents homology, or shared evolutionary ancestry, of the two letters. Indels represent historical insertions or deletions of letters.
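
To make the alignment vocabulary concrete, here is a minimal Python sketch (not taken from the talk or from SATe). It stores a small, made-up alignment as equal-length rows with "-" as the gap character and lists which letters of two sequences sit in the same column, i.e. the letter pairs the alignment asserts to be homologous. The taxon names, sequences, and the `homology_pairs` helper are hypothetical and exist only for illustration.

```python
# Minimal sketch: a multiple sequence alignment (MSA) as equal-length rows,
# where "-" marks a gap (indel). Taxon names and sequences are hypothetical.
from itertools import combinations

GAP = "-"

alignment = {
    "taxon_A": "ACG-T",
    "taxon_B": "AC-GT",
    "taxon_C": "A-CGT",
}


def homology_pairs(row_a: str, row_b: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs of letters that share an alignment column.

    i and j are 0-based positions in the *unaligned* sequences (gaps removed),
    so each pair reads "letter i of A is homologous to letter j of B".
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = []
    i = j = 0  # current positions within the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs.append((i, j))  # two lined-up letters: a homology claim
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


for (name_a, row_a), (name_b, row_b) in combinations(alignment.items(), 2):
    print(name_a, name_b, homology_pairs(row_a, row_b))
```

For example, `homology_pairs("ACG-T", "AC-GT")` returns `[(0, 0), (1, 1), (3, 3)]`: the G in each sequence lines up with a gap in the other, so the alignment claims no homology for either G.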

Most computational phylogenetic studies proceed by aligning sequences in the first phase and then estimating a tree from the alignment in the second phase. The main advantage of these so-called two-phase methods is their speed. However, two-phase methods are also inaccurate under moderate to high rates of evolution.
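
The second phase of a two-phase method can be illustrated with a toy Python sketch. It assumes the first phase has already produced an alignment (the four-taxon alignment below is made up), computes uncorrected p-distances over gap-free columns, and builds a tree with UPGMA. UPGMA is swapped in here only because it is short: it is a textbook distance method, not the tree estimator used in the talk, in SATe, or in any particular study, and the helper names `p_distance` and `upgma` are hypothetical.

```python
# Toy sketch of phase 2 of a two-phase pipeline: fixed alignment -> tree.
# Assumptions: phase 1 already produced the (hypothetical) alignment below;
# UPGMA on p-distances stands in for a real tree-estimation method.
from itertools import combinations

GAP = "-"


def p_distance(row_a: str, row_b: str) -> float:
    """Fraction of mismatched letters over columns where neither row has a gap."""
    compared = mismatches = 0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            continue  # gap columns carry no substitution information here
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def upgma(alignment: dict[str, str]) -> str:
    """Average-linkage clustering; returns a Newick string without branch lengths."""
    # Each cluster is keyed by its Newick label and maps to its leaf count.
    sizes = {name: 1 for name in alignment}
    dist = {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of clusters.
        x, y = sorted(min(dist, key=dist.get))
        merged = f"({x},{y})"
        sizes[merged] = sizes[x] + sizes[y]
        for z in sizes:
            if z in (x, y, merged):
                continue
            # Size-weighted average of the distances to the two merged clusters.
            dist[frozenset((merged, z))] = (
                sizes[x] * dist[frozenset((x, z))]
                + sizes[y] * dist[frozenset((y, z))]
            ) / sizes[merged]
        # Drop the merged clusters and every distance that mentions them.
        del sizes[x], sizes[y]
        dist = {k: v for k, v in dist.items() if x not in k and y not in k}
    return next(iter(sizes)) + ";"


alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTGC",
    "rat": "ACGAACTTGA",
}
print(upgma(alignment))  # prints ((chimp,human),(mouse,rat));
```

Because the tree is built from a fixed alignment, this phase has no way to revisit alignment errors made in phase 1. UPGMA also assumes a roughly constant rate of evolution, so the sketch only illustrates the "align first, then build a tree" structure, not a method suitable for real data.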

To address these and other shortcomings, a new generation of methods simultaneously estimates an alignment and a tree from an input of unaligned sequences. However, these methods are either prohibitively slow or have not been shown to be more accurate in practice than the best two-phase methods.

Due to exponential growth in sequence data and computing power, biologists now perform phylogenetic studies that are orders of magnitude larger than those of past decades in number of taxa, sequence length, and number of markers. Moreover, these datasets span greater evolutionary timescales and involve more complex evolutionary events than ever before. As the ambitions of phylogenetic studies grow, the state of the art in computational phylogenetic algorithms must keep up in terms of *both* scalability and accuracy.

To this end, I present SATe, short for Simultaneous Alignment and Tree Estimation. SATe is the first algorithm that can accurately estimate phylogenies and alignments with thousands of taxa and thousands of aligned sites. This work is part of my larger goal of creating algorithms for large-scale, accurate estimation of evolutionary history under complex evolutionary models.

Speaker Bio:

Kevin Liu is a Ph.D. student in the Warnow lab in the Department of Computer Science at the University of Texas at Austin.