GRACS Speaker Series-Kevin Liu/UT-Austin: "Rapid and Accurate Estimation of Large-scale Phylogenies and Sequence Alignments," TAY 3.128, Tuesday, May 4, 2010, 2:00 p.m.

Contact Name: Jenna Whitney
Date: May 4, 2010 2:00pm - 3:00pm

Type of Talk: GRACS Speaker Series

Speaker/Affiliation:
Kevin Liu/University of Texas at Austin

Date/Time: Tuesday, May 4, 2010, 2:00 p.m.

Location: TAY 3.128

Host: GRACS

Talk Title: Rapid and Accurate Estimation of Large-scale Phylogenies and Sequence Alignments

Talk Abstract:

Computational phylogenetics creates and evaluates algorithms that use present-day biological sequence data to estimate evolutionary history. This history is represented both as a phylogeny and a multiple sequence alignment. Scientists from biology, chemistry, and other fields use phylogenies and alignments to address many different problems, including the origin of life, epidemiology, proteomics, and biomedical research.

In the simplest case, a phylogeny is represented as a tree. The tree's leaves represent present-day groups of organisms, also known as taxa, and the tree's edges show how those taxa are related. A multiple sequence alignment shows relationships among the sequences themselves by inserting dashes "-", called indels, into sequences to line up sequence letters. A pair of lined-up letters represents homology, or shared evolutionary ancestry, of the two letters. Indels represent historical insertion or deletion of subsequences.

Most computational phylogenetic studies proceed by aligning sequences in the first phase, and then estimating a tree using the alignment in the second phase. The main advantage of these so-called two-phase methods is their speed. However, two-phase methods are also inaccurate under moderate to high rates of evolution.

To address these and other shortcomings, a new generation of methods simultaneously estimate an alignment and tree from an input of unaligned sequences. These methods are either prohibitively slow or have not been shown to be more accurate than the best two-phase methods in practice.

Due to exponential growth in sequence data and computing power, biologists now perform phylogenetic studies that have grown by orders of magnitude in terms of number of taxa, sequence length, and number of markers, as compared with past decades. Moreover, these datasets span greater evolutionary timescales and involve more complex evolutionary events than ever before. As the ambitions of phylogenetic studies grow, the state of the art of computational phylogenetic algorithms must keep up *both* in terms of scalability and accuracy.

To this end, I present SATe, short for Simultaneous Alignment and Tree Estimation. SATe is the first algorithm that can accurately estimate phylogenies and alignments with thousands of taxa and thousands of aligned sites. This work is part of my larger goal of creating algorithms for large-scale, accurate estimation of evolutionary history under complex evolutionary models.

Speaker Bio:

Kevin Liu is a Ph.D. student in the Warnow lab in the Department of Computer Science at the University of Texas at Austin.