
FastSP

Overview

  • FastSP is a Java program that computes alignment error (SP-FN) quickly and with little memory.

Download

Version 1.6.0:
Binary (jar file): FastSP_1.6.0.jar
Code (java file): Available in this github repository.

Older versions are available here and a change log is available here.

Usage

java -jar FastSP_1.6.0.jar -r reference_alignment_file -e estimated_alignment_file
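
When FastSP has to be run over many alignment pairs, the command above can be assembled from a script. The following is a minimal sketch in Python, not part of FastSP: the function names fastsp_command and run_fastsp and the max_heap_mb parameter are hypothetical, and only the -r and -e options shown above and the JVM's -Xmx option (see the FAQ) are used.

```python
import subprocess


def fastsp_command(jar, reference, estimated, max_heap_mb=None):
    """Build the argument list for one FastSP run (hypothetical helper)."""
    cmd = ["java"]
    if max_heap_mb is not None:
        # JVM option that raises the maximum heap size, e.g. -Xmx2048m
        cmd.append(f"-Xmx{max_heap_mb}m")
    cmd += ["-jar", jar, "-r", reference, "-e", estimated]
    return cmd


def run_fastsp(jar, reference, estimated, max_heap_mb=None):
    """Run FastSP and return the CompletedProcess with captured output.

    Requires a Java runtime on the PATH; check=True raises
    CalledProcessError if the java process exits with a non-zero status.
    """
    cmd = fastsp_command(jar, reference, estimated, max_heap_mb)
    return subprocess.run(cmd, capture_output=True, text=True, check=True)


# Placeholder file names, as in the usage line above.
example = fastsp_command(
    "FastSP_1.6.0.jar", "reference_alignment_file", "estimated_alignment_file"
)
```

The sketch returns the captured output as is rather than parsing it, because the output format is best checked with the -h option.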

FAQ

  1. Is FastSP sensitive to case? Should I expect different results if I change the alignments from upper case to lower case or vice versa?
    No. By default, FastSP is not sensitive to case. In fact, it is not even sensitive to which characters appear in the alignment, and it does not need to be: FastSP only cares whether a given position in the alignment is a residue or a gap. So, by default, lower case letters are considered aligned, just like upper case letters. Note that qscore is sensitive to case: it treats lower case letters as not aligned.

  2. Since version 1.5.0, FastSP has an option (-ml) that makes it case-sensitive. With -ml, homologies in the estimated alignment in which at least one of the two characters is lower case are ignored (those characters are considered not aligned). Thus, if an entire site is in lower case, that site is considered entirely unaligned and is excluded from the calculation. Individual characters in a site can also be lower case; in that case, only the homologies involving those characters are excluded. So, if a site mixes lower and upper case, homologies between two upper case characters are still included, but any homology in which one of the two characters is lower case is excluded. Since version 1.6.0, a similar option (-mlr) removes lower case homologies from the reference alignment.
  3. What do I do if I get an OutOfMemoryError?
    By default, Java limits the memory available to programs. If you run out of memory, try increasing the maximum memory available to the JVM with the -Xmx option. For example, to make 2GB available to the JVM, use:

    java -Xmx2048m -jar FastSP.jar -r reference_alignment_file -e estimated_alignment_file

    2GB has been more than enough for the largest alignments we have looked at so far (more than 1,000,000,000 cells). However, if you have more memory available, increasing the limit could make FastSP run faster.
  4. What is the output?
    Run FastSP with the -h option to see the output format. The main output is:
    • SP-Score: number of shared homologies (aligned pairs) / total number of homologies in the reference alignment.
    • Modeler: number of shared homologies (aligned pairs) / total number of homologies in the estimated alignment.
    • SP-FN: 1 - SP-Score
    • SP-FP: 1 - Modeler
    • TC: number of correctly aligned columns / total number of aligned columns in the reference alignment.
    • Compression Factor: number of columns in the estimated alignment / number of columns in the reference alignment
    But FastSP also outputs:
    • MaxLenNoGap: maximum number of non-gap characters
    • NumSeq: Number of sequences
    • LenRef: Length of reference alignment
    • LenEst: Length of estimated alignment
    • Cells: (LenEst+LenRef)*NumSeq
    • Number of shared homologies
    • Number of homologies in the reference alignment
    • Number of homologies in the estimated alignment
    • Number of correctly aligned columns
    • Number of aligned columns in reference alignment
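
The definitions above can be illustrated with a small Python sketch. It is a naive reimplementation for explanation only, not FastSP's linear-time algorithm, and it enumerates every pair in each column. The function names, the "-" gap character, and the dictionary input format are assumptions. The TC part also assumes that an aligned column is one with at least two residues and that a reference column counts as correct when the estimated alignment contains a column with exactly the same residues. The two mask_lower flags mimic the effect described for -ml and -mlr.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def columns(alignment, mask_lower=False):
    """Return one set per column holding (sequence name, residue index) entries.

    Residues are numbered per sequence ignoring gaps, so a residue carries
    the same label in the reference and in the estimated alignment.
    """
    names = sorted(alignment)
    cols = [set() for _ in range(len(alignment[names[0]]))]
    for name in names:
        residue = 0
        for pos, char in enumerate(alignment[name]):
            if char == GAP:
                continue
            # With masking on, lower case residues are treated as unaligned.
            if not (mask_lower and char.islower()):
                cols[pos].add((name, residue))
            residue += 1
    return cols


def homologies(cols):
    """All unordered pairs of residues that share a column."""
    pairs = set()
    for col in cols:
        pairs.update(combinations(sorted(col), 2))
    return pairs


def sp_metrics(reference, estimated, mask_lower_est=False, mask_lower_ref=False):
    """Compute the main scores; assumes both alignments contain homologies."""
    ref_cols = columns(reference, mask_lower_ref)
    est_cols = columns(estimated, mask_lower_est)
    ref_pairs, est_pairs = homologies(ref_cols), homologies(est_cols)
    shared = len(ref_pairs & est_pairs)
    aligned_ref = [frozenset(c) for c in ref_cols if len(c) >= 2]
    est_col_set = {frozenset(c) for c in est_cols}
    correct = sum(1 for c in aligned_ref if c in est_col_set)
    sp_score = shared / len(ref_pairs)
    modeler = shared / len(est_pairs)
    return {
        "SP-Score": sp_score,
        "Modeler": modeler,
        "SP-FN": 1 - sp_score,
        "SP-FP": 1 - modeler,
        "TC": correct / len(aligned_ref),
        "Compression Factor": len(est_cols) / len(ref_cols),
    }


# Toy alignments (made up for illustration).
ref = {"A": "AC-GT", "B": "ACGGT", "C": "A--GT"}
est = {"A": "ACG-T", "B": "ACGGT", "C": "A-G-T"}
est_low = {"A": "ACg-T", "B": "ACGGT", "C": "A-G-T"}

scores = sp_metrics(ref, est)                               # SP-Score 0.8, TC 0.75
masked = sp_metrics(ref, est_low, mask_lower_est=True)      # SP-Score 0.7
```

In the toy example, 8 of the 10 reference homologies are recovered, so SP-Score is 0.8 and SP-FN is 0.2. Changing one estimated residue to lower case leaves the scores unchanged unless masking is switched on, which matches the default case-insensitive behaviour described in the FAQ.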

Publication

FastSP: Linear time calculation of alignment accuracy
by Siavash Mirarab and Tandy Warnow
Bioinformatics 2011; doi: 10.1093/bioinformatics/btr553