UTCS Colloquium: Alexandros Stamatakis, Ecole Polytechnique Federale de Lausanne/School of Comp & Communication Sciences, "Faster Algorithms for Support Value Computation & Emerging Parallel Architectures for Phylogeny Reconstruction", TAY 3.128 (East

Contact Name: Jenna Whitney
Date: Apr 17, 2007 2:00pm - 3:30pm

There is a signup schedule for this event.

Speaker: Alexandros Stamatakis

Affiliation: Ecole Polytechnique Federale de Lausanne/School of Comp & Communication Sciences

Date/Time: 2:00 p.m. - 3:30 p.m.

Location: TAY 3.128 - East wall (chalkboard)

Host: Tandy Warnow

Talk Title: Faster Algorithms for Support Value Computation & Emerging Parallel Architectures for Phylogeny Reconstruction

Talk Abstract:
Despite the impressive progress that has been achieved with the new
generation of Maximum Likelihood (ML) search algorithms, the
computation of support values based on non-parametric bootstrapping
(BS) still represents a major computational challenge.

Initially, I will discuss why the Randomized Estimated Log Likelihood
(RELL) method is probably very hard to apply to large real-world
datasets. Thereafter, I will present new heuristics to accelerate the
BS procedure in RAxML (Randomized Axelerated Maximum Likelihood). In
comparison to the standard BS procedure, these heuristics yield run
time improvements ranging from a factor of 7 on datasets with 500
sequences to a factor of 14 on datasets with 1,700 sequences. At the
same time, the support values obtained by the new BS heuristics show
correlation coefficients ranging between 0.94 and 0.96 compared to
those obtained via the standard method.

In absolute numbers, this means that 100 bootstrap replicates on
single-gene datasets of up to 2,000 taxa can be conducted in less than
24 hours on a single, reasonably fast processor.

In the second part of my talk, I will outline how the computation of
large multi-gene datasets with ML can be parallelized efficiently on
hardware platforms with very distinct architectures, such as the IBM
Cell and the IBM BlueGene. The parallelization on BlueGene scales well
up to 512 processors on the largest dataset analyzed under ML to date,
which consists of 270 sequences and 500,000 base pairs.

I will conclude with an overview of current work
on related projects.

Related papers (PDF) and software (open source code for Mac/Linux)
are available at: icwww.epfl.ch/%7Estamata