- PASTA (Practical Alignment using SATé and TrAnsitivity)
is an improvement on SATé: it reuses parts of
SATé's design but is faster, produces more accurate alignments
and trees, and scales to much larger datasets.
PASTA computes alignments on very large
datasets using a divide-and-conquer
technique, as follows. It divides the dataset into smaller, evolutionarily less divergent subsets,
aligns each subset, merges some pairs of these subset alignments to obtain a
set of overlapping, compatible alignments, and finally uses transitivity to
merge all of these overlapping alignments into a final alignment. The novel transitivity-based merge
technique makes PASTA highly scalable and also improves its accuracy compared to
its predecessor, SATé.
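The transitivity idea can be illustrated with a toy sketch. This is not PASTA's implementation: the function name `merge_overlapping` and the dictionary representation of an alignment (sequence name mapped to a gapped string) are assumptions made for illustration. The sketch assumes the two input alignments agree exactly on the sequences they share. Columns that contain a residue of a shared sequence are fused; all other columns are kept as separate columns and padded with gaps.

```python
def merge_overlapping(aln_a, aln_b):
    """Toy transitivity merge of two alignments that overlap in some sequences.

    Each alignment is a dict mapping a sequence name to a gapped string.
    Hypothetical helper for illustration only, not part of PASTA.
    """
    shared = sorted(set(aln_a) & set(aln_b))
    if not shared:
        raise ValueError("alignments must share at least one sequence")
    len_a = len(next(iter(aln_a.values())))
    len_b = len(next(iter(aln_b.values())))
    only_a = [name for name in aln_a if name not in aln_b]
    only_b = [name for name in aln_b if name not in aln_a]

    def shared_column(aln, k):
        # Characters of the shared sequences in column k.
        return tuple(aln[name][k] for name in shared)

    def has_shared_residue(aln, k):
        return any(ch != "-" for ch in shared_column(aln, k))

    merged = {name: [] for name in list(aln_a) + only_b}
    i = j = 0
    while i < len_a or j < len_b:
        a_free = i < len_a and not has_shared_residue(aln_a, i)
        b_free = j < len_b and not has_shared_residue(aln_b, j)
        if a_free:
            # Column of A with no shared residue: keep it, pad B-only rows.
            for name in aln_a:
                merged[name].append(aln_a[name][i])
            for name in only_b:
                merged[name].append("-")
            i += 1
        elif b_free:
            # Column of B with no shared residue: keep it, pad A-only rows.
            for name in aln_b:
                merged[name].append(aln_b[name][j])
            for name in only_a:
                merged[name].append("-")
            j += 1
        else:
            # Both columns hold shared residues, so they must be the same
            # column of the shared sequences; fuse them by transitivity.
            if (i >= len_a or j >= len_b
                    or shared_column(aln_a, i) != shared_column(aln_b, j)):
                raise ValueError("alignments disagree on shared sequences")
            for name in aln_a:
                merged[name].append(aln_a[name][i])
            for name in only_b:
                merged[name].append(aln_b[name][j])
            i += 1
            j += 1
    return {name: "".join(cols) for name, cols in merged.items()}


# Sequence "y" appears in both alignments and links "x" to "z".
example = merge_overlapping({"x": "AC-G", "y": "A-TG"},
                            {"y": "AT-G", "z": "ATCG"})
for name, row in example.items():
    print(name, row)
```

In this example the residues of `x` and `z` end up in the same column only where both are aligned to the same residue of `y`; residues with no such link stay in their own columns.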
Several options are available for downloading and installing PASTA.
If you have a Windows machine, the VM is your only option. If you have Linux, you can
still use the VM, but downloading the code from GitHub and installing it is a better option.
On a Mac, you have three options: the VM, installing from the code, and downloading the .dmg file.
If you have a Mac and mostly use the GUI,
then the .dmg file is a good option (although it can sometimes lag behind the latest code).
- The PASTA code is available from GitHub. The README file gives
detailed installation instructions (which are fairly simple).
- The .dmg file for the Mac application is available here (version 1.6.0).
- The VM image (mostly for Windows users) is available here for download.
Note that the VM image is 1.7 GB and can take a long time to download.
Once the image is downloaded, you need to run it using a VM environment.
If you don't have a virtual machine environment, VirtualBox is a good option.
It's free and easy to use.
Download VirtualBox and install it on your machine.
After you install VirtualBox, use File/Import to import the Phylolab.ova image that you downloaded.
When importing the VM image, you are given a set of options that you can tweak.
The VM image tries to allocate 1GB of RAM by default.
If your machine has 4GB or more of RAM, that default value should be fine.
If you have less than that, you might wish to reduce the memory to something like 512MB, but that could affect the maximum dataset size you can
analyze using PASTA. You can always modify this value later.
Once the VM is imported, you can start it from VirtualBox.
If you are asked to log in, the username is phylolab and the password is phylolab.
PASTA is already installed on the VM, so you can simply proceed
by opening a terminal and running it using run_pasta.py.
A tutorial for PASTA is available here.
You can also consult the README file.
A presentation is also available.
Refer to the README file in the GitHub repository.
On a Mac, download the image file, open it, copy the PASTA application to a location of your choice, and run the application.
On Linux, to run PASTA using the GUI, run:
Basic command-line usage is:
python run_pasta.py -i input.fasta -t starting.tree --auto
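If PASTA needs to be launched from a Python script rather than typed at a terminal, the command above can be wrapped as sketched below. The helper names `build_pasta_command` and `run_pasta` are hypothetical, and the sketch assumes that run_pasta.py is in the current working directory; only the options shown in the command above (-i, -t, --auto) are used. The current interpreter (`sys.executable`) stands in for `python`.

```python
import subprocess
import sys


def build_pasta_command(input_fasta, starting_tree, script="run_pasta.py"):
    """Return the argument list for the basic PASTA invocation.

    Hypothetical helper: `script` is assumed to be a path to run_pasta.py.
    """
    return [sys.executable, script,
            "-i", input_fasta,
            "-t", starting_tree,
            "--auto"]


def run_pasta(input_fasta, starting_tree, script="run_pasta.py"):
    """Run PASTA and raise CalledProcessError if it exits with an error."""
    command = build_pasta_command(input_fasta, starting_tree, script)
    return subprocess.run(command, check=True)
```

Calling `run_pasta("input.fasta", "starting.tree")` is then equivalent to the command line shown above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.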
S. Mirarab, N. Nguyen, and T. Warnow, 2014. "PASTA: ultra-large multiple sequence alignment". Proceedings of RECOMB 2014
Supplementary materials are available
The 10 small AA datasets are available here.
The HomFam datasets are available here.
The FastTree COG datasets are available here.
The Indelible 10K datasets are available here.
The 1000-taxon simulated datasets are available at the SATé paper website.
The three 16S RNA biological datasets
(16S.3, 16S.T, and 16S.B.ALL)
can be found at this page. However, we also used
thresholds other than 75% for these datasets. The reference biological datasets without edge contraction can be found here.
Our RNASim datasets were obtained by taking random subsets of the
RNASim dataset created by S. Guo, L.-S. Wang, and J. Kim and described here.
True alignments and tree are given here for our random subsamples.
All questions and inquiries should be addressed to our user email group: email@example.com