FastSP
Overview
- FastSP is a Java program for computing alignment error
(SP-FN) quickly and using little memory.
Download
Version 1.3:
Binary (jar file): FastSP_1.3.jar
Code (Java source): available in this GitHub repository.
Older versions are available here and a change log is available here.
Usage
java -jar FastSP_1.3.jar -r reference_alignment_file -e estimated_alignment_file
FAQ
- Is FastSP sensitive to case? Should I expect different
results if I change the alignments from upper case to lower case or
vice versa?
No. FastSP is not sensitive to case. In fact, it is not sensitive
to which characters appear in the alignment at all, and it does not
need to be: FastSP only cares whether a given position in the
alignment is a residue or a gap. Lower case letters are therefore
considered aligned, just like upper case letters. Note that qscore is
sensitive to case: it treats lower case letters as not aligned.
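The residue-or-gap view described above can be illustrated with a small sketch. This is not FastSP's code: the class GapMaskSketch and the method residueIndices are hypothetical names, and the sketch assumes '-' is the only gap symbol, which this page does not state.

```java
import java.util.Arrays;

class GapMaskSketch {
    // Assumption: '-' is the only gap symbol (not confirmed by the FastSP page).
    static final char GAP = '-';

    // For each column of one aligned row, returns the 0-based index of the
    // residue sitting in that column, or -1 if the column holds a gap.
    // The letter itself is never inspected beyond the gap test.
    static int[] residueIndices(String alignedRow) {
        int[] idx = new int[alignedRow.length()];
        int next = 0;
        for (int col = 0; col < alignedRow.length(); col++) {
            idx[col] = (alignedRow.charAt(col) == GAP) ? -1 : next++;
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] upper = residueIndices("AC-GT");
        int[] lower = residueIndices("ac-gt");
        System.out.println(Arrays.toString(upper));       // prints [0, 1, -1, 2, 3]
        System.out.println(Arrays.equals(upper, lower));  // prints true
    }
}
```

Because only the gap test touches the characters, changing the case of a row cannot change the result in this sketch.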
- What do I do if I get an OutOfMemoryError?
By default,
Java limits the memory available to programs. If you run out of memory,
try increasing the maximum memory available to the JVM using the -Xmx
option. For example, to make 2GB available to the JVM, use:
java -Xmx2048m -jar FastSP.jar -r reference_alignment_file -e estimated_alignment_file
2GB has been more than enough for the largest alignments we have looked
at so far (more than 1,000,000,000 cells). However, if your machine has more memory, making more of it available to the JVM could make FastSP run faster.
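To check that an -Xmx setting actually took effect, a tiny stand-alone program can print the heap limit the JVM reports. This is a sketch only: HeapCheck is a hypothetical helper, not part of FastSP, and it relies on the standard Runtime.maxMemory() call.

```java
class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in mebibytes.
    // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as java -Xmx2048m HeapCheck should report a value close to 2048; the exact number can be somewhat lower depending on the JVM and garbage collector.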
- What is the output?
Run FastSP with the -h option to see the output format. The main output is:
- SP-Score: number of shared homologies (aligned pairs) / total number of homologies in the reference alignment.
- Modeler: number of shared homologies (aligned pairs) / total number of homologies in the estimated alignment.
- SP-FN: 1 - SP-Score
- SP-FP: 1 - Modeler
- TC: number of correctly aligned columns / total number of aligned columns in the reference alignment.
- Compression Factor: number of columns in the estimated alignment / number of columns in the reference alignment
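The ratios listed above follow directly from a handful of counts. The sketch below just restates those definitions in Java. The class and method names are hypothetical, the counts in main are made-up illustration values, and the sketch assumes every denominator is nonzero.

```java
class SpMetricsSketch {
    // SP-Score: shared homologies / homologies in the reference alignment.
    static double spScore(long shared, long refHomologies) {
        return (double) shared / refHomologies;
    }

    // Modeler: shared homologies / homologies in the estimated alignment.
    static double modeler(long shared, long estHomologies) {
        return (double) shared / estHomologies;
    }

    // SP-FN and SP-FP are the complements of SP-Score and Modeler.
    static double spFn(long shared, long refHomologies) {
        return 1.0 - spScore(shared, refHomologies);
    }

    static double spFp(long shared, long estHomologies) {
        return 1.0 - modeler(shared, estHomologies);
    }

    // TC: correctly aligned columns / aligned columns.
    static double tc(long correctColumns, long alignedColumns) {
        return (double) correctColumns / alignedColumns;
    }

    // Compression Factor: estimated alignment length / reference alignment length.
    static double compressionFactor(long lenEst, long lenRef) {
        return (double) lenEst / lenRef;
    }

    public static void main(String[] args) {
        // Illustration values only, not taken from any real alignment.
        long shared = 6, refHomologies = 8, estHomologies = 12;
        System.out.println("SP-Score: " + spScore(shared, refHomologies)); // 0.75
        System.out.println("Modeler:  " + modeler(shared, estHomologies)); // 0.5
        System.out.println("SP-FN:    " + spFn(shared, refHomologies));    // 0.25
        System.out.println("SP-FP:    " + spFp(shared, estHomologies));    // 0.5
    }
}
```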
FastSP also outputs:
- MaxLenNoGap: maximum number of non-gap characters
- NumSeq: Number of sequences
- LenRef: Length of reference alignment
- LenEst: Length of estimated alignment
- Cells: (LenEst+LenRef)*NumSeq
- Number of shared homologies
- Number of homologies in the reference alignment
- Number of homologies in the estimated alignment
- Number of correctly aligned columns
- Number of aligned columns in reference alignment
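For very small alignments, the raw counts above can be reproduced by brute force, which is handy for sanity-checking. The sketch below is deliberately naive and is not FastSP's linear-time algorithm. All names are hypothetical, '-' is assumed to be the gap symbol, and both alignments are assumed to list the same sequences in the same order.

```java
import java.util.HashSet;
import java.util.Set;

class NaiveHomologySketch {
    // Assumption: '-' is the only gap symbol.
    static final char GAP = '-';

    // Collects every homology (pair of residues in the same column) as a
    // string key "seqI:resA|seqJ:resB" with seqI < seqJ.
    static Set<String> homologies(String[] rows) {
        int n = rows.length;
        int len = rows[0].length();
        int[] next = new int[n]; // next unused residue index per sequence
        Set<String> pairs = new HashSet<>();
        for (int col = 0; col < len; col++) {
            int[] res = new int[n];
            for (int i = 0; i < n; i++) {
                res[i] = (rows[i].charAt(col) == GAP) ? -1 : next[i]++;
            }
            for (int i = 0; i < n; i++) {
                if (res[i] < 0) continue;
                for (int j = i + 1; j < n; j++) {
                    if (res[j] >= 0) {
                        pairs.add(i + ":" + res[i] + "|" + j + ":" + res[j]);
                    }
                }
            }
        }
        return pairs;
    }

    // Number of homologies present in both alignments.
    static long shared(String[] ref, String[] est) {
        Set<String> common = new HashSet<>(homologies(ref));
        common.retainAll(homologies(est));
        return common.size();
    }

    // Cells: (LenEst + LenRef) * NumSeq.
    static long cells(int lenEst, int lenRef, int numSeq) {
        return (long) (lenEst + lenRef) * numSeq;
    }

    public static void main(String[] args) {
        String[] ref = {"AC-G", "ACTG"};
        String[] est = {"ACG-", "ACTG"};
        System.out.println(homologies(ref).size()); // prints 3
        System.out.println(homologies(est).size()); // prints 3
        System.out.println(shared(ref, est));       // prints 2
        System.out.println(cells(4, 4, 2));         // prints 16
    }
}
```

In this toy case the SP-Score would be 2/3. The set of string keys grows with the number of residue pairs, so this approach is only practical for toy inputs.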
Publication
FastSP: Linear time calculation of alignment accuracy
by Siavash Mirarab and Tandy Warnow
Bioinformatics 2011; doi: 10.1093/bioinformatics/btr553
Copyright © 2009-2010 Computational Phylogenetics Lab | ACES 3.304 | University of Texas | Austin, TX 78712