
SEPP

    SEPP (SATé-enabled phylogenetic placement) is a method for the following problem:

    • Input: tree T and alignment A for a set of full-length gene sequences, and set X of fragmentary sequences for the same gene
    • Output: placement of each fragment in X into the tree T, and alignment of each fragment in X to the alignment A.

    SEPP operates by using a divide-and-conquer strategy to improve the alignment produced by running HMMER (code by Sean Eddy). It then places each fragment into the user-provided tree using pplacer (code by Erick Matsen). Our study shows that SEPP provides improved accuracy for quickly evolving genes as compared to other methods.

    Developers:

    Tandy Warnow, Nam Nguyen, and Siavash Mirarab

    Publication

    S. Mirarab, N. Nguyen, and T. Warnow, SEPP: SATé-enabled phylogenetic placement, PSB 2012. (PDF).

    Software

    NOTE: Development and maintenance of SEPP have moved to this GitHub repository. Please refer to the GitHub repository for the latest version and installation instructions (README file). Alternatively, refer to this page for a VM image of an Ubuntu machine with SEPP pre-installed.

    This section gives instructions for installing and running version 1.0 of SEPP, the version used in the submission of the SEPP paper. Current users should use the latest GitHub version instead of the following instructions. Version 1.0 is provided on this page only to enable replication of the results reported in the paper.

    • Version 1.0 instructions:

    • We ran SEPP on a machine running Linux release 2.6.32-33-server. If you experience difficulty installing or running the software, please contact one of us (Tandy Warnow, Nam Nguyen, or Siavash Mirarab).

    • 0. Installing the HMMER package.
      • Download the HMMER binaries from the HMMER website. Modify your PATH environment variable to include the directory of the HMMER binaries.

      1. Installing pplacer.
      • Download the pplacer binary from the pplacer website. Modify your PATH environment variable to include the location of the pplacer binary.
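The two PATH changes described in steps 0 and 1 can be sketched as follows. The directories are hypothetical placeholders, not locations defined by HMMER or pplacer. The names hmmbuild, hmmsearch, and hmmalign are standard HMMER programs; which of them SEPP actually calls is not stated on this page, so the check is only an illustration.

```shell
# Hypothetical install locations: replace these with the directories
# where you actually unpacked the HMMER and pplacer binaries.
HMMER_BIN="$HOME/tools/hmmer/binaries"
PPLACER_BIN="$HOME/tools/pplacer"

# Prepend both directories to PATH for the current shell session.
export PATH="$HMMER_BIN:$PPLACER_BIN:$PATH"

# Report whether each program can now be found on PATH.
for tool in hmmbuild hmmsearch hmmalign pplacer; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

To make the change permanent, the export line would typically go in a shell startup file such as ~/.bashrc.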

      2. Installing python packages.
      • The software packages listed below are Python source distributions. To use them, you must first have Python installed on your system; for details on obtaining and installing Python, please visit the Python home page. We used Python version 2.6.

        To uncompress and inflate each distribution file, run "tar -xzf <package>.tar.gz". To install each package, run "python setup.py install" from inside the uncompressed package directory; this step requires root access to the system.

        If you do not have root access, invoke the setup script as follows: "python setup.py install --prefix=/some/path/on/your/system", where "/some/path/on/your/system" is the path to a directory on your system to which you do have read and write access. If you use the "--prefix" option, you must ensure that the "lib/python2.x/site-packages" subdirectory (where "x" denotes the minor version number of your Python install) of the directory you specify following "--prefix=" is on Python's search path. To add a directory to Python's search path, modify your PYTHONPATH environment variable.

        More instructions on installing Python packages can be found on this Python page.
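For the case without root access, the "--prefix" install and the matching PYTHONPATH change might look like the sketch below. The prefix directory and the helper name install_pkg are hypothetical, and the helper assumes that each <package>.tar.gz unpacks into a directory named <package>, which may not hold for every distribution.

```shell
# Hypothetical prefix: any directory you can read and write.
PREFIX="$HOME/sepp-python"
# Minor version of your Python install (the page reports using 2.6).
PYVER="2.6"
SITE_PACKAGES="$PREFIX/lib/python$PYVER/site-packages"

# Create the site-packages directory and put it on Python's search path.
mkdir -p "$SITE_PACKAGES"
export PYTHONPATH="$SITE_PACKAGES${PYTHONPATH:+:$PYTHONPATH}"

# Hypothetical helper: unpack one source distribution and install it
# under $PREFIX. Assumes <package>.tar.gz unpacks to ./<package>/.
install_pkg() {
    tarball="$1"
    tar -xzf "$tarball" || return 1
    pkg_dir=$(basename "$tarball" .tar.gz)
    (cd "$pkg_dir" && python setup.py install --prefix="$PREFIX")
}
```

A call such as "install_pkg sepp-1.0.tar.gz" would then cover the unpack and install steps for one package.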

    • 3. Install the DendroPy 3 package.

    • 4. Install the NumPy package.

    • 5. Install the Biopython package.

    • 6. Install the SEPP package.
      • sepp-1.0.tar.gz

        After installing, add an environment variable MERGE_JAR containing the absolute path to merge.jar. This jar file is included in the archive at sepp-1.0/sepp/tools/merge.jar.

        A README file for running SEPP can be found in sepp-1.0/.
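Setting MERGE_JAR could be done as in this sketch. The directory SEPP_SRC is a hypothetical location for the unpacked sepp-1.0 archive; only the relative path sepp/tools/merge.jar comes from the instructions above.

```shell
# Hypothetical location where sepp-1.0.tar.gz was unpacked.
SEPP_SRC="$HOME/src/sepp-1.0"

# MERGE_JAR must hold an absolute path to merge.jar.
export MERGE_JAR="$SEPP_SRC/sepp/tools/merge.jar"

# Warn early if the jar file is not where we expect it.
if [ -f "$MERGE_JAR" ]; then
    echo "merge.jar found at $MERGE_JAR"
else
    echo "warning: no file at $MERGE_JAR" >&2
fi
```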

    • 7. Running SEPP.
      • To run SEPP, invoke the "run_sepp.py" script from the "bin" subdirectory of the location in which you installed the Python packages. To see options for running the script, use the command
        "python <bin>/run_sepp.py -h"

        The general command for running SEPP is:

        "python <bin>/run_sepp.py -t <tree_file> -a <alignment_file> -f <fragment_file> -r <raxml_info_file> -A <alignment_set_size> -P <placement_set_size> "

        SEPP can also be run using a configuration file. Sample configuration files and input files can be found in the folder location sepp-1.0/sample/. Change to that directory to run SEPP on the sample files. To run using command options, run

        "python <bin>/run_sepp.py -t test.tree -a test.fasta -f test.fas -r test.RAxML_info -A 250 -P 250"

        and to run using a configuration file, run

        "python <bin>/run_sepp.py -c sample.config"

        The output of SEPP is a .json file, written in the pplacer format. Please refer to here for more information on the format of the JSON file. Also note that the pplacer package provides a program called guppy that can read .json files and perform downstream steps such as visualization.
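As one possible downstream step, the sketch below wraps guppy's "tog" subcommand, which writes a tree with each placed sequence attached as a pendant edge. The function name make_placement_tree and the output file naming are hypothetical, and the name of the .json file SEPP produces is not specified here, so it is passed in as an argument. Check the guppy documentation for your pplacer version before relying on the exact options.

```shell
# Hypothetical helper: turn a pplacer-format .json file into a tree
# that includes the placed fragments, using "guppy tog".
make_placement_tree() {
    json_file="$1"
    if [ ! -f "$json_file" ]; then
        echo "error: no such file: $json_file" >&2
        return 1
    fi
    if ! command -v guppy >/dev/null 2>&1; then
        echo "error: guppy not found on PATH" >&2
        return 1
    fi
    # Output name is an assumption: same base name with a .tog.tre suffix.
    guppy tog -o "${json_file%.json}.tog.tre" "$json_file"
}
```

For example, "make_placement_tree output.json" would write output.tog.tre next to the input, where output.json stands in for whatever file SEPP produced.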

    Data

    • Empirical data
      • Downloads for the empirical data from our study are provided below. Uncompress files ending in ".tar.gz" using the command "tar -zxvf <file>". Within each resulting directory is a file named "README" which describes the contents of the download.

      • empirical.tar.gz

    • Simulated data
      • Downloads for the simulated data from our study are provided below. Uncompress files ending in ".tar.gz" using the command "tar -zxvf <file>". Within each resulting directory is a file named "README" which describes the contents of the download.

      • sims.tar.gz

    Contact

    • SEPP is under active research development at UTCS by the Warnow Lab, in particular by Tandy Warnow's PhD students Siavash Mirarab and Nam Nguyen. We welcome research collaborations. Please contact Tandy Warnow directly by email (not by phone, please).

Copyright 2009-2010 Computational Phylogenetics Lab | ACES 3.304 | University of Texas | Austin, TX 78712