The Effect of the Guide Tree on Multiple Sequence Alignments
and Subsequent Phylogenetic Analyses
These are the auxiliary materials for our Pacific Symposium on Biocomputing 2008 paper.
Individual model condition graphs:
Program Commands
Please note that these are not necessarily the optimal ways of running these programs, and some options may not be available in the publicly released versions. They are, however, the commands we used for our study.
Clustal
default: clustalw -align -infile=< raw sequence file > -outfile=< output file > -output=fasta -newtree=< file for default guide >
passing guide: clustalw -infile=< raw sequence file > -outfile=< output file > -output=fasta -usetree=< guide tree file name >
Muscle
default: muscle -in < raw sequence file > -fastaout < output file > -tree1 < file for default guide >
passing guide: muscle -in < raw sequence file > -fastaout < output file > -usetree_nowarn < guide tree file name >
ProbCons
default: probcons < raw sequence file > < output file>
passing guide: probcons < raw sequence file > -g < guide tree file name>
FTA
passing guide: fta -t < guide tree file name> -f < raw sequence file> -w /tmp/ -g < gapOpen> -G < gapExtend> -m < misMatch> -median DCA -out < output file>
MAFFT-linsi
default: mafft --localpair --maxiterate 1000 < raw sequence file>
passing guide: mafft --topin < guide tree file name> --localpair --maxiterate 1000 < raw sequence file>
MAFFT-fftnsi
default: mafft --retree 2 --maxiterate 2 < raw sequence file>
passing guide: mafft --topin < guide tree file name> --retree 2 --maxiterate 2 < raw sequence file>
MAFFT-fftns2
default: mafft --retree 2 --maxiterate 0 < raw sequence file>
passing guide: mafft --topin < guide tree file name> --retree 2 --maxiterate 0 < raw sequence file>
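The three MAFFT variants differ only in their flag sets, so the command lines can be assembled by one small helper. The following is a minimal Python sketch, not the script used in the study. The flags are copied from the commands listed above; the function names, the variant labels, and the file names in the usage note are hypothetical. It relies on the fact that MAFFT writes the alignment to standard output, so the output is redirected to a file.

```python
import subprocess

# Flag sets copied from the MAFFT commands listed above.
# The dictionary keys are labels chosen for this sketch only.
MAFFT_VARIANTS = {
    "linsi": ["--localpair", "--maxiterate", "1000"],
    "fftnsi": ["--retree", "2", "--maxiterate", "2"],
    "fftns2": ["--retree", "2", "--maxiterate", "0"],
}


def mafft_command(variant, raw_sequences, guide_tree=None):
    """Build the argument list for one MAFFT run.

    If guide_tree is given, it is passed with --topin, as in the
    "passing guide" commands above.
    """
    args = ["mafft"]
    if guide_tree is not None:
        args += ["--topin", guide_tree]
    args += MAFFT_VARIANTS[variant]
    args.append(raw_sequences)
    return args


def run_mafft(variant, raw_sequences, output_file, guide_tree=None):
    """Run MAFFT and save the alignment it prints on stdout."""
    with open(output_file, "w") as out:
        subprocess.run(
            mafft_command(variant, raw_sequences, guide_tree),
            stdout=out,
            check=True,  # raise if MAFFT exits with an error
        )
```

For example, run_mafft("linsi", "raw.fasta", "aligned.fasta", guide_tree="guide.tre") would run the L-INS-i settings with a supplied guide tree, assuming a mafft executable that supports --topin is on the PATH.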
RAxML
raxmlHPC -m GTRGAMMA -w ./ -n < output file > -s < aligned sequence file in phylip format >
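The aligners above write FASTA, while the RAxML command expects PHYLIP input, so a conversion step is needed in between. How that conversion was done for the study is not stated here; the following is a minimal Python sketch of one way to do it. It assumes that sequence names contain no whitespace and that the relaxed sequential PHYLIP layout (name, whitespace, sequence on one line) is acceptable to the RAxML version in use. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the taxon name.
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned (name, sequence) pairs as relaxed sequential PHYLIP."""
    if not records:
        raise ValueError("no sequences to write")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Pad names to a common width, leaving at least one space before the data.
    width = max(len(name) for name, _ in records) + 1
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records:
        lines.append(f"{name.ljust(width)}{seq}")
    return "\n".join(lines) + "\n"
```

A typical use would be to read the FASTA alignment file, pass its contents through read_fasta and to_phylip, and write the result to the file given to the -s option.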
PAUP
upgma: upgma brlens=yes showtree=no treefile=< upgma tree file name >;
midpoint: roottrees rootmethod=midpoint userbrlens=yes;
r8s Script
#NEXUS
begin rates;
simulate diversemodel=bdback seed=< random seed > ntaxa=< 25 or 100 > T=0 ;
describe tree=0 plot=tree_description;
end;
ROSE Script
An example ROSE script:
InputType = 4 // DNA
TheAlphabet = "ACGT"
TheFreq = [.25,.25,.25,.25]
TheInsFunc = < either [0.4613, 0.2527, 0.1545, 0.0896, 0.0419] or
[0.1028, 0.0899, 0.0792, 0.0702, 0.0627,
0.0565, 0.0514, 0.0470, 0.0433, 0.0400,
0.0369, 0.0341, 0.0314, 0.0289, 0.0266,
0.0245, 0.0225, 0.0206, 0.0188, 0.0171,
0.0155, 0.0141, 0.0127, 0.0114, 0.0100,
0.0087, 0.0075, 0.0063, 0.0052, 0.0042]>
TheDelFunc = < either [0.4613, 0.2527, 0.1545, 0.0896, 0.0419] or
[0.1028, 0.0899, 0.0792, 0.0702, 0.0627,
0.0565, 0.0514, 0.0470, 0.0433, 0.0400,
0.0369, 0.0341, 0.0314, 0.0289, 0.0266,
0.0245, 0.0225, 0.0206, 0.0188, 0.0171,
0.0155, 0.0141, 0.0127, 0.0114, 0.0100,
0.0087, 0.0075, 0.0063, 0.0052, 0.0042]>
TheDNAmodel = "HKY"
MeanSubstitution = < either 0.01 or 0.005 for 100 taxon datasets, or
0.004 or 0.008 for 25 taxon datasets>
TransitionBias = 2.0
TTratio = 0.0
TheInsertThreshold = one of {0.0001, 0.0005, 0.002500}
TheDeleteThreshold = one of {0.0001, 0.0005, 0.002500}
SequenceLen = 1000
TheTree = < model tree with 100 taxa in Newick format >
# the three parameters below control whether the output dataset contains
# only the leaf sequences or all nodes, including internal nodes
ChooseFromLeaves = True
AlignmentWithAncestors = False
TreeWithAncestors = False
SequenceNum = 100
# the below parameter seeds the pseudorandom number generator used
# for ROSE dataset creation
SeedVal = < random number >
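Because the example script contains several placeholders, one script has to be written per dataset. The following Python sketch shows one way to fill in the placeholders; it is an illustration, not the generator used for the study. The fixed settings and the short gap-length distribution are copied from the example above. The function name, the argument names, and the choice to set SequenceNum from an ntaxa argument for 25 taxon datasets are assumptions.

```python
# Short gap-length distribution copied from the example script above.
SHORT_GAPS = [0.4613, 0.2527, 0.1545, 0.0896, 0.0419]

# Fixed settings copied from the example; braces mark the placeholders.
ROSE_TEMPLATE = """InputType = 4 // DNA
TheAlphabet = "ACGT"
TheFreq = [.25,.25,.25,.25]
TheInsFunc = {ins_func}
TheDelFunc = {del_func}
TheDNAmodel = "HKY"
MeanSubstitution = {mean_substitution}
TransitionBias = 2.0
TTratio = 0.0
TheInsertThreshold = {insert_threshold}
TheDeleteThreshold = {delete_threshold}
SequenceLen = 1000
TheTree = {tree}
ChooseFromLeaves = True
AlignmentWithAncestors = False
TreeWithAncestors = False
SequenceNum = {ntaxa}
SeedVal = {seed}
"""


def format_distribution(probabilities):
    """Render a gap-length distribution as a bracketed list."""
    return "[" + ", ".join(f"{p:.4f}" for p in probabilities) + "]"


def render_rose_script(tree, ntaxa, seed, mean_substitution,
                       insert_threshold, delete_threshold,
                       ins_func=SHORT_GAPS, del_func=SHORT_GAPS):
    """Fill in the placeholders of the example ROSE script.

    Assumption: SequenceNum is set to the number of taxa in the model tree.
    """
    return ROSE_TEMPLATE.format(
        ins_func=format_distribution(ins_func),
        del_func=format_distribution(del_func),
        mean_substitution=mean_substitution,
        insert_threshold=insert_threshold,
        delete_threshold=delete_threshold,
        tree=tree,
        ntaxa=ntaxa,
        seed=seed,
    )
```

Calling render_rose_script with a Newick tree string, ntaxa=25, a seed, mean_substitution=0.004, and thresholds of 0.0005 would produce one script for a 25 taxon dataset with short gaps; the long gap-length distribution from the example can be passed through ins_func and del_func in the same way.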
Serita Nelesen