- FastSP is a Java program for computing alignment error
(SP-FN) quickly and using little memory.
Binary (jar file): FastSP_1.6.0.jar
Code (java file): Available in this github repository.
Older versions are available here and a change log is available here.
Datasets are available here.
java -jar FastSP_1.6.0.jar -r reference_alignment_file -e estimated_alignment_file
- Is FastSP sensitive to case? Should I expect different
results if I change the alignments from upper case to lower case or vice versa?
No. By default, FastSP is not sensitive to case. In fact, it
is not even sensitive to which characters appear in the alignment (and
it doesn't need to be). FastSP only cares about whether a given position in
the alignment is a residue or a gap. So, by default, lower case letters are
considered aligned, just like upper case letters. Note that qscore is sensitive to case:
qscore treats lower case letters as not aligned.
Since version 1.5.0, FastSP has an option (-ml) that makes it case-sensitive. With -ml,
homologies in the estimated alignment where at least one of the two characters is in lower case are ignored (so those characters are considered not aligned).
Thus, if entire sites are in lower case, those sites are considered entirely unaligned and are excluded from the calculation.
Note also that individual characters in a site can be lower case. In such cases, only homologies involving those characters
are excluded. So, if a site is a mix of lower and upper case, the homologies between upper case characters are still included, but any homology where one of the
two characters is lower case is excluded.
Since version 1.6.0, a similar option (-mlr) removes lower case homologies from the reference alignment.
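The effect of case masking can be illustrated with a minimal Python sketch. This is not FastSP's code: the function name `homologies`, the `mask_lower` parameter, and the use of "-" as the only gap character are assumptions made for illustration. It enumerates aligned pairs column by column and, when masking is on, skips any pair in which at least one character is lower case, as described for -ml above.

```python
from itertools import combinations

GAP = "-"  # assumption: "-" is the only gap character

def homologies(alignment, mask_lower=False):
    """Return the set of aligned residue pairs (homologies) in an alignment.

    alignment: list of equal-length strings, one per sequence.
    Each pair is ((row_a, residue_index_a), (row_b, residue_index_b)).
    With mask_lower=True, pairs involving a lower case character are skipped.
    """
    counters = [0] * len(alignment)  # residues seen so far in each row
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []  # (row, residue index, character) for this column
        for row, seq in enumerate(alignment):
            ch = seq[col]
            if ch != GAP:
                residues.append((row, counters[row], ch))
                counters[row] += 1
        for (r1, i1, c1), (r2, i2, c2) in combinations(residues, 2):
            if mask_lower and (c1.islower() or c2.islower()):
                continue  # at least one character is lower case: not aligned
            pairs.add(((r1, i1), (r2, i2)))
    return pairs

# Third column mixes lower and upper case: g, G, G
estimated = ["ACg-", "A-GT", "ACGT"]
all_pairs = homologies(estimated)
masked_pairs = homologies(estimated, mask_lower=True)
print(len(all_pairs), len(masked_pairs))
```

In the mixed-case column, only the pair between the two upper case G characters survives masking; the two pairs involving the lower case g are dropped.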
- What do I do if I get an OutOfMemoryError?
Java limits the memory available to programs. If you run out of memory,
try increasing the maximum memory available to the JVM using the -Xmx
option. For example, to make 2GB available to the JVM, use:
java -Xmx2048m -jar FastSP.jar -r reference_alignment_file -e estimated_alignment_file
2GB has been more than enough for the largest alignments we have looked
at so far (with more than 1,000,000,000 cells). However, if you have more memory available, increasing the limit could make FastSP run faster.
- What is the output?
Run FastSP with a -h option to see the output format. The main output is:
But FastSP also outputs:
- SP-Score: number of shared homologies (aligned pairs) / total number of homologies in the reference alignment.
- Modeler: number of shared homologies (aligned pairs) / total number of homologies in the estimated alignment.
- SP-FN: 1 - SP-Score
- SP-FP: 1 - Modeler
- TC: number of correctly aligned columns / total number of aligned columns.
- Compression Factor: number of columns in the estimated alignment / number of columns in the reference alignment
- MaxLenNoGap: maximum number of non-gap characters
- NumSeq: Number of sequences
- LenRef: Length of reference alignment
- LenEst: Length of estimated alignment
- Cells: (LenEst+LenRef)*NumSeq
- Number of shared homologies
- Number of homologies in the reference alignment
- Number of homologies in the estimated alignment
- Number of correctly aligned columns
- Number of aligned columns in reference alignment
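To make the definitions above concrete, here is a small Python sketch that computes several of these quantities for a toy pair of alignments. It is a naive illustration, not FastSP's linear-time algorithm: the names `homologies` and `sp_metrics` are hypothetical, "-" is assumed to be the only gap character, and both alignments are assumed to list the same sequences in the same order. TC and the column counts are left out for brevity.

```python
from itertools import combinations

def homologies(alignment, gap="-"):
    """Set of aligned residue pairs: ((row_a, idx_a), (row_b, idx_b))."""
    counters = [0] * len(alignment)  # residues seen so far in each row
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for row, seq in enumerate(alignment):
            if seq[col] != gap:
                residues.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(residues, 2))
    return pairs

def sp_metrics(reference, estimated):
    """Compute the ratios listed above (assumes both alignments have homologies)."""
    ref_pairs = homologies(reference)
    est_pairs = homologies(estimated)
    shared = len(ref_pairs & est_pairs)  # number of shared homologies
    sp_score = shared / len(ref_pairs)
    modeler = shared / len(est_pairs)
    num_seq = len(reference)
    len_ref, len_est = len(reference[0]), len(estimated[0])
    return {
        "SP-Score": sp_score,
        "Modeler": modeler,
        "SP-FN": 1 - sp_score,
        "SP-FP": 1 - modeler,
        "Compression Factor": len_est / len_ref,
        "NumSeq": num_seq,
        "LenRef": len_ref,
        "LenEst": len_est,
        "Cells": (len_est + len_ref) * num_seq,
    }

reference = ["AC-GT", "A-CGT"]  # 3 homologies: A-A, G-G, T-T
estimated = ["ACGT", "ACGT"]    # 4 homologies, one of them (C-C) not in the reference
metrics = sp_metrics(reference, estimated)
for name, value in metrics.items():
    print(name, value)
```

In this toy case every reference homology is recovered, so SP-Score is 1 and SP-FN is 0. One of the four estimated homologies is absent from the reference, so Modeler is 0.75 and SP-FP is 0.25.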
FastSP: Linear time calculation of alignment accuracy
by Siavash Mirarab and Tandy Warnow
Bioinformatics 2011; doi: 10.1093/bioinformatics/btr553