My research combines mathematics, computer science, probability, and statistics to develop more accurate algorithms for large-scale, complex estimation problems in phylogenomics and metagenomics. My major interests are multiple sequence alignment, phylogeny estimation (both gene trees and species trees), and metagenomic analysis, but I also work in historical linguistics. My current work aims to develop methods for ultra-large datasets (anywhere from 10,000 to 1,000,000 sequences), including datasets that are highly fragmentary and present other real-world challenges. We evaluate the methods we develop on real data and in massive simulation studies, and we collaborate closely with biologists and linguists on data analysis. I will be moving to the University of Illinois at Urbana-Champaign in Fall 2014, where I will be a Professor with a split position between Bioengineering and Computer Science and a courtesy appointment in Mathematics.


Our current collaborations include the 1KP (Thousand Transcriptome Project) and the Avian Phylogenomics Project. They involve both data analysis and the development of new methods for estimating alignments and trees (both gene trees and species trees). We welcome collaborations with biologists whose data are difficult to analyze, either because the datasets are too large for current methods or because current methods are not sufficiently accurate.


My current research is funded by the National Science Foundation (DEB-0733029 and DBI-1062335). I also recently received support from the John Simon Guggenheim Memorial Foundation, and early support from the David and Lucile Packard Foundation, the Radcliffe Institute for Advanced Study at Harvard University, and the Program for Evolutionary Dynamics at Harvard University.

