Summer School on Big Data for Biology
Introduction to Phylogenomic Analysis
May 19-22, 2014
The summer school on
Introduction to Phylogenomic Analysis
a survey of phylogenomic pipelines, and then
in the use of new software for several steps in
Participants should bring their laptops and download the software for each
tutorial in advance.
This summer school session will teach both basic and advanced material in multiple sequence
alignment, phylogenetic tree estimation, and phylogenetic network estimation.
Students will learn how to compute multiple sequence alignments and trees using PASTA, and how to
use PhyloNet (a software suite) to estimate phylogenetic trees and networks from multiple gene
trees. They will also learn how to compute trees from sequence datasets that contain a mixture
of full-length and fragmentary sequences, and how to insert sequences into an existing tree using SEPP.
All students should download, install, and test all the software (PASTA, PhyloNet,
and SEPP) well in advance of the course. Some of these programs may not run on Windows, or on
older machines with limited memory. Please make sure to run the software on some
datasets (either the datasets provided with the software, or your own). Report any problems
installing or running the software no later than Thursday, May 15.
PASTA and SEPP:
Instructions and software for PASTA and SEPP can be found at
Please go to the website and download the software before coming to class.
We will also provide a virtual machine with the software
installed for people who use Windows. The website also includes test
cases to check that the software has been properly installed and
runs on your machine. Please run the test cases after downloading
our tools. If you have problems running the test cases, contact Siavash
(firstname.lastname@example.org) or Nam (email@example.com) for help.
PhyloNet:
Please go to the PhyloNet website and download the
executable jar file. Make sure you have Java 1.6.0 or later installed on your system; you
should then be able to launch PhyloNet from the command line. For more information on
installation, usage, available commands, and examples, please visit
https://wiki.rice.edu/confluence/pages/viewpage.action?pageId=8898533. Contact Yun Yu
(firstname.lastname@example.org) if you have trouble installing or running the software.
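To confirm the Java requirement before the course, the following is a minimal sketch, assuming Python 3.7 or later, that runs `java -version` and parses the version string it prints (to stderr, not stdout). It handles both the legacy "1.x" numbering (1.6.0 counts as 6) and the numbering used from Java 9 onward (11.0.2 counts as 11). The function names are hypothetical helpers, not part of PhyloNet.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output.

    Legacy scheme: "1.6.0_65" -> 6. Java 9+ scheme: "11.0.2" -> 11.
    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    numbers = [int(n) for n in re.findall(r"\d+", match.group(1))]
    if not numbers:
        return None
    if numbers[0] == 1 and len(numbers) > 1:
        return numbers[1]  # "1.x" scheme: the real major version is x
    return numbers[0]


def java_is_recent_enough(version_output: str, minimum_major: int = 6) -> bool:
    """True if the reported Java is at least 1.6.0 (major version 6)."""
    major = parse_java_major(version_output)
    return major is not None and major >= minimum_major


def installed_java_output() -> str:
    # `java -version` writes its banner to stderr.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


if __name__ == "__main__":
    try:
        output = installed_java_output()
    except OSError:
        print("No runnable `java` executable found on PATH.")
    else:
        if java_is_recent_enough(output):
            print("Java 1.6.0 or later is installed.")
        else:
            print("Java is missing or older than 1.6.0:", output.strip())
```

If the script reports an older Java, install a newer JDK or JRE before trying to launch the PhyloNet jar file.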
Also, please send an email to the course instructors (Yun Yu, Siavash Mirarab, Tandy Warnow, and
Luay Nakhleh) describing the laptop you will bring, its operating system (e.g., OS X,
version 10.8.5), and its amount of available memory. This will help us prepare for the course.
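The laptop details requested above can be gathered with a short script. The following is a minimal sketch, assuming Python 3, that uses the standard `platform` and `os` modules. It reports total installed physical memory (not memory currently free), and because `os.sysconf` does not exist on Windows, it prints "unknown" for memory there. The function names are hypothetical.

```python
import os
import platform
from typing import Optional


def total_memory_gb() -> Optional[float]:
    """Total physical memory in GB on POSIX systems, or None if unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (Windows); ValueError: name unsupported.
        return None
    if page_size <= 0 or page_count <= 0:
        return None  # sysconf may return -1 when the value is indeterminate
    return page_size * page_count / 1024**3


def describe_system() -> dict:
    """Collect the operating system, CPU architecture, and total memory."""
    system = platform.system()
    if system == "Darwin":
        os_name = "OS X " + platform.mac_ver()[0]  # e.g. "OS X 10.8.5"
    else:
        os_name = f"{system} {platform.release()}"
    return {"os": os_name, "machine": platform.machine(), "memory_gb": total_memory_gb()}


if __name__ == "__main__":
    info = describe_system()
    print(f"Operating system: {info['os']}")
    print(f"Architecture: {info['machine']}")
    mem = info["memory_gb"]
    print("Physical memory:", f"{mem:.1f} GB" if mem is not None else "unknown")
```

The printed lines can be pasted directly into the email to the instructors.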
- Monday: overview (taught by Tandy Warnow)
- Tuesday: Large-scale multiple sequence alignment
See this page for downloadable software.
PASTA (a new version of SATé that is faster,
more accurate on large datasets, and able to analyze
much larger datasets, up to 200,000 sequences). Both
PASTA and SATé co-estimate multiple sequence alignments and phylogenetic trees,
and have almost identical GUIs. Even if you are an experienced SATé
user, this will be a useful tutorial.
Taught by Siavash Mirarab (Texas).
See this page for the SATé tutorial, which we will extend to PASTA.
- Wednesday: Phylogenetic tree and network estimation
from multiple genes, using PhyloNet.
PhyloNet is a software suite for estimating species trees
and networks under conditions where gene trees can differ
from the species tree due to incomplete lineage sorting
and hybridization. The tutorial on how to use PhyloNet will
be taught by Yun Yu (Rice).
the following webpages:
- Thursday: Phylogenetic placement and Q&A.
- Phylogenetic placement:
Many phylogenetic datasets are hard to align
because some of the sequences are fragmentary rather than
full-length genes. We will teach techniques for
estimating trees using fragmentary sequences, focusing
on the SEPP method.
See this page for downloadable software.
Instructor: Siavash Mirarab.
- Q&A with students.
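As a toy illustration of Wednesday's topic (why gene trees can disagree with the species tree, and why many genes are needed), consider three taxa under incomplete lineage sorting alone. Under the multispecies coalescent, the gene-tree topology that matches the species tree has probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T), where T is the internal branch length in coalescent units. So the most frequent gene-tree topology estimates the species tree. The Python sketch that follows implements only this majority rule. It is not how PhyloNet works, and it ignores hybridization, under which the rule can fail. The pair-of-taxa representation and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_topologies(gene_trees: Iterable[Tuple[str, str]]) -> Counter:
    """Count rooted three-taxon gene trees, each given by its cherry.

    The pair ("A", "B") stands for the rooted tree ((A,B),C).
    """
    counts: Counter = Counter()
    for cherry in gene_trees:
        pair = frozenset(cherry)
        if len(pair) != 2:
            raise ValueError(f"expected two distinct taxa, got {cherry!r}")
        counts[pair] += 1
    return counts


def majority_species_tree(gene_trees: Iterable[Tuple[str, str]], taxa: Sequence[str]) -> str:
    """Return the most frequent gene-tree topology as a Newick string.

    Ties are broken by first appearance, not resolved statistically.
    """
    counts = count_topologies(gene_trees)
    if not counts:
        raise ValueError("no gene trees given")
    cherry, _ = counts.most_common(1)[0]
    outgroup = [t for t in taxa if t not in cherry]
    if len(outgroup) != 1:
        raise ValueError("taxa must be the cherry plus exactly one outgroup")
    a, b = sorted(cherry)
    return f"(({a},{b}),{outgroup[0]});"


if __name__ == "__main__":
    # Hypothetical data: 6 gene trees agree on ((A,B),C); 4 are discordant.
    genes = [("A", "B")] * 6 + [("A", "C")] * 2 + [("B", "C")] * 2
    print(count_topologies(genes))
    print(majority_species_tree(genes, ["A", "B", "C"]))  # ((A,B),C);
```

With more taxa, or with hybridization, simple counting like this is no longer enough, which is where PhyloNet's tree and network inference methods come in.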
Registration is closed at this time.
If you are bringing a Windows laptop, it is essential
that you contact the instructors before the tutorials.