The Effect of the Guide Tree on Multiple Sequence Alignments
and Subsequent Phylogenetic Analyses

These are the auxiliary materials for our Pacific Symposium on Biocomputing 2008 paper.

Individual model condition graphs:

- Guide tree error
- Alignment SP-error
- Tree estimation error
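The alignment graphs report SP-error. The sketch below shows one way to compute it. It assumes the common definition: SP-error is the fraction of homologous residue pairs in the true (reference) alignment that the estimated alignment fails to recover. It also assumes that alignments are dicts mapping the same sequence names to gapped strings and that "-" is the only gap character. The function names are hypothetical.

```python
from itertools import combinations


def homologous_pairs(alignment):
    """Return every pair of residues that share a column.

    A residue is identified by (sequence name, index in the ungapped
    sequence), so pairs from two different alignments of the same
    sequences can be compared directly.
    """
    names = sorted(alignment)
    next_index = {name: 0 for name in names}  # next residue index per sequence
    pairs = set()
    for col in range(len(alignment[names[0]])):
        residues = []
        for name in names:
            if alignment[name][col] != "-":
                residues.append((name, next_index[name]))
                next_index[name] += 1
        # every two residues in the same column form one homology pair
        pairs.update(combinations(residues, 2))
    return pairs


def sp_error(true_aln, est_aln):
    """Fraction of true homology pairs missing from the estimated alignment."""
    true_pairs = homologous_pairs(true_aln)
    est_pairs = homologous_pairs(est_aln)
    return 1.0 - len(true_pairs & est_pairs) / len(true_pairs)
```

For example, if the true alignment is {"a": "AC-T", "b": "ACGT"} and the estimate is {"a": "A-CT", "b": "ACGT"}, the estimate recovers two of the three true pairs, so the SP-error is 1/3.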

Program Commands

Please note that these are not necessarily the optimal, or even generally available, ways of running these programs. They are, however, exactly how we ran them for our study.

Clustal

default: clustalw -align -infile=<raw sequence file> -outfile=<output file> -output=fasta -newtree=<file for default guide>
passing guide: clustalw -infile=<raw sequence file> -outfile=<output file> -output=fasta -usetree=<guide tree file name>
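The two ClustalW commands differ only in how the guide tree is handled. -newtree saves ClustalW's own guide tree, and -usetree reads a supplied one. The sketch below assembles either command as an argument list. The helper name clustalw_args is hypothetical, and the function only builds the list without running ClustalW.

```python
from typing import List, Optional


def clustalw_args(raw: str, out: str, guide_tree: Optional[str] = None,
                  new_tree: Optional[str] = None) -> List[str]:
    """Assemble one of the two ClustalW commands listed on this page.

    guide_tree=None gives the 'default' command, which saves ClustalW's
    own guide tree to new_tree. Otherwise this returns the 'passing guide'
    command, which reads guide_tree through -usetree.
    """
    common = [f"-infile={raw}", f"-outfile={out}", "-output=fasta"]
    if guide_tree is None:
        if new_tree is None:
            raise ValueError("the default command needs a file for the default guide tree")
        return ["clustalw", "-align", *common, f"-newtree={new_tree}"]
    return ["clustalw", *common, f"-usetree={guide_tree}"]
```

With subprocess imported, subprocess.run(clustalw_args(...), check=True) would run the command without going through a shell. All file names here are placeholders.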

Muscle

default: muscle -in <raw sequence file> -fastaout <output file> -tree1 <file for default guide>
passing guide: muscle -in <raw sequence file> -fastaout <output file> -usetree_nowarn <guide tree file name>
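In the MUSCLE commands, the same tree-file slot is used in both directions. -tree1 saves the guide tree that MUSCLE builds in its first iteration, and -usetree_nowarn reads a supplied guide tree; as its name suggests, the _nowarn form suppresses MUSCLE's warning about user trees. A sketch of that choice, using a hypothetical helper muscle_args:

```python
from typing import List


def muscle_args(raw: str, out: str, tree_file: str, use_given_tree: bool) -> List[str]:
    """Assemble the MUSCLE command from this page.

    use_given_tree=False: MUSCLE writes its first-iteration guide tree
    to tree_file (-tree1).
    use_given_tree=True: MUSCLE reads its guide tree from tree_file
    (-usetree_nowarn).
    """
    tree_flag = "-usetree_nowarn" if use_given_tree else "-tree1"
    return ["muscle", "-in", raw, "-fastaout", out, tree_flag, tree_file]
```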

ProbCons

default: probcons <raw sequence file> <output file>
passing guide: probcons <raw sequence file> -g <guide tree file name>

FTA

passing guide: fta -t <guide tree file name> -f <raw sequence file> -w /tmp/ -g <gapOpen> -G <gapExtend> -m <misMatch> -median DCA -out <output file>
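FTA is only run with a supplied guide tree. The page does not give the gap-open, gap-extend, and mismatch values, so any numbers passed to this hypothetical helper are placeholders, not the study's settings. The sketch mainly keeps the case-sensitive -g (gapOpen) and -G (gapExtend) flags straight.

```python
from typing import List


def fta_args(guide_tree: str, raw: str, out: str, gap_open: float,
             gap_extend: float, mismatch: float, workdir: str = "/tmp/") -> List[str]:
    """Assemble the FTA 'passing guide' command from this page.

    -g takes the gap-open value and -G the gap-extend value; the flags
    differ only in case. -median DCA is kept exactly as listed.
    """
    return ["fta", "-t", guide_tree, "-f", raw, "-w", workdir,
            "-g", str(gap_open), "-G", str(gap_extend), "-m", str(mismatch),
            "-median", "DCA", "-out", out]
```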

MAFFT-linsi

default: mafft --localpair --maxiterate 1000 <raw sequence file>
passing guide: mafft --topin <guide tree file name> --localpair --maxiterate 1000 <raw sequence file>

MAFFT-fftnsi

default: mafft --retree 2 --maxiterate 2 <raw sequence file>
passing guide: mafft --topin <guide tree file name> --retree 2 --maxiterate 2 <raw sequence file>

MAFFT-fftns2

default: mafft --retree 2 --maxiterate 0 <raw sequence file>
passing guide: mafft --topin <guide tree file name> --retree 2 --maxiterate 0 <raw sequence file>
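The three MAFFT variants differ only in their tree-building and iteration options, and the guide-tree runs add --topin at the front. The table-driven sketch below uses hypothetical names. For FFT-NS-i it uses --retree 2 --maxiterate 2, which are the options of the passing-guide command.

```python
from typing import List, Optional

# Options for each MAFFT variant, taken from the commands on this page.
MAFFT_VARIANTS = {
    "linsi": ["--localpair", "--maxiterate", "1000"],
    "fftnsi": ["--retree", "2", "--maxiterate", "2"],
    "fftns2": ["--retree", "2", "--maxiterate", "0"],
}


def mafft_args(variant: str, raw: str, guide_tree: Optional[str] = None) -> List[str]:
    """Assemble a MAFFT command; a guide tree adds --topin before the options."""
    args = ["mafft"]
    if guide_tree is not None:
        args += ["--topin", guide_tree]
    args += MAFFT_VARIANTS[variant]
    args.append(raw)
    return args
```

MAFFT writes the alignment to standard output, so the output file comes from redirecting stdout. For example: with open(out, "w") as fh: subprocess.run(mafft_args("linsi", raw), stdout=fh, check=True).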

RAxML

raxmlHPC -m GTRGAMMA -w ./ -n <output file> -s <aligned sequence file in PHYLIP format>
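The aligners above write FASTA, but this RAxML command reads PHYLIP, so a conversion step is needed. The minimal sketch below makes three assumptions: the aligned FASTA fits in memory, taxon names contain no whitespace, and the RAxML version in use accepts relaxed sequential PHYLIP (name, whitespace, then the whole sequence on one line). Strict PHYLIP would instead need names padded or truncated to 10 characters. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs.

    The name is the first whitespace-separated word of the header line.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned records as relaxed sequential PHYLIP text."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is the input aligned?")
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in records]
    return "\n".join(lines) + "\n"
```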

PAUP

upgma: upgma brlens=yes showtree=no treefile=<upgma tree file name>;
midpoint: roottrees rootmethod=midpoint userbrlens=yes;
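The PAUP upgma command builds a UPGMA tree with branch lengths. The sketch below is a textbook UPGMA in Python that illustrates what this computes from a precomputed distance matrix. It is not PAUP's implementation. In particular, it does not reproduce how PAUP computes distances from the data, and it breaks ties arbitrarily.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: taxon labels; dist[i][j]: distance between names[i] and names[j].
    Ties between equally close clusters are broken arbitrarily by min().
    """
    def pair(a, b):
        return (a, b) if a < b else (b, a)

    # cluster id -> (Newick text, number of leaves, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): float(dist[i][j]) for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        del d[(i, j)]
        ni, si, hi = clusters.pop(i)
        nj, sj, hj = clusters.pop(j)
        height = dij / 2.0  # the new node sits at half the merged distance
        newick = f"({ni}:{height - hi:g},{nj}:{height - hj:g})"
        for k in clusters:
            # size-weighted average distance from the merged cluster to k
            dk = (si * d.pop(pair(i, k)) + sj * d.pop(pair(j, k))) / (si + sj)
            d[pair(k, next_id)] = dk
        clusters[next_id] = (newick, si + sj, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"
```

Because every leaf ends up at the same distance from the root, the resulting tree is ultrametric.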

r8s Script

#NEXUS
begin rates;
simulate diversemodel=bdback seed=<random seed> ntaxa=<25 or 100> T=0 ;
describe tree=0 plot=tree_description;
end;
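Only the seed and the number of taxa change between runs, so the r8s script can be generated once per replicate. The sketch below fills those two slots and leaves every other line exactly as written above. The function name is hypothetical.

```python
def r8s_simulate_script(seed: int, ntaxa: int) -> str:
    """Return the r8s simulation script from this page for one replicate.

    Only the random seed and the number of taxa (25 or 100 here) vary;
    every other line is copied unchanged.
    """
    if ntaxa not in (25, 100):
        raise ValueError("this page uses 25- or 100-taxon trees")
    return (
        "#NEXUS\n"
        "begin rates;\n"
        f"simulate diversemodel=bdback seed={seed} ntaxa={ntaxa} T=0 ;\n"
        "describe tree=0 plot=tree_description;\n"
        "end;\n"
    )
```

Writing the returned text to one file per seed gives a separate r8s input script for each replicate.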

ROSE Script

An example ROSE script:

InputType = 4 // DNA
TheAlphabet = "ACGT"
TheFreq = [.25,.25,.25,.25]

TheInsFunc = < either [0.4613, 0.2527, 0.1545, 0.0896, 0.0419] or
[0.1028, 0.0899, 0.0792, 0.0702, 0.0627,
0.0565, 0.0514, 0.0470, 0.0433, 0.0400,
0.0369, 0.0341, 0.0314, 0.0289, 0.0266,
0.0245, 0.0225, 0.0206, 0.0188, 0.0171,
0.0155, 0.0141, 0.0127, 0.0114, 0.0100,
0.0087, 0.0075, 0.0063, 0.0052, 0.0042]>
TheDelFunc = < either [0.4613, 0.2527, 0.1545, 0.0896, 0.0419] or
[0.1028, 0.0899, 0.0792, 0.0702, 0.0627,
0.0565, 0.0514, 0.0470, 0.0433, 0.0400,
0.0369, 0.0341, 0.0314, 0.0289, 0.0266,
0.0245, 0.0225, 0.0206, 0.0188, 0.0171,
0.0155, 0.0141, 0.0127, 0.0114, 0.0100,
0.0087, 0.0075, 0.0063, 0.0052, 0.0042]>

TheDNAmodel = "HKY"
MeanSubstitution = < either 0.01 or 0.005 for 100-taxon datasets, or
0.004 or 0.008 for 25-taxon datasets >
TransitionBias = 2.0
TTratio = 0.0

TheInsertThreshold = one of {0.0001, 0.0005, 0.0025}
TheDeleteThreshold = one of {0.0001, 0.0005, 0.0025}
SequenceLen = 1000
TheTree = < model tree with 100 taxa in Newick format >
# the three parameters below control whether the output dataset is
# restricted to the leaves or covers all nodes, including internal nodes
ChooseFromLeaves = True
AlignmentWithAncestors = False
TreeWithAncestors = False
SequenceNum = 100
# the parameter below seeds the pseudorandom number generator that
# ROSE uses to create the dataset
SeedVal = < random number >
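The ROSE settings above vary across model conditions. The sketch below writes the text of one parameter file per condition and checks that each indel-length distribution sums to 1. It reads entry i of TheInsFunc/TheDelFunc as the probability of an indel of length i (counting from 1), which is our reading of the format. It also assumes, beyond what is stated above, that insertions and deletions share the same length distribution and threshold, and that SequenceNum equals the number of taxa. Comment lines are omitted, and all names are hypothetical.

```python
import math

# Indel-length distributions from the script above
# (entry i is read as the probability of an indel of length i + 1).
SHORT_INDELS = [0.4613, 0.2527, 0.1545, 0.0896, 0.0419]
LONG_INDELS = [
    0.1028, 0.0899, 0.0792, 0.0702, 0.0627,
    0.0565, 0.0514, 0.0470, 0.0433, 0.0400,
    0.0369, 0.0341, 0.0314, 0.0289, 0.0266,
    0.0245, 0.0225, 0.0206, 0.0188, 0.0171,
    0.0155, 0.0141, 0.0127, 0.0114, 0.0100,
    0.0087, 0.0075, 0.0063, 0.0052, 0.0042,
]
# MeanSubstitution values listed above, per number of taxa.
MEAN_SUBSTITUTION = {100: (0.01, 0.005), 25: (0.004, 0.008)}
THRESHOLDS = (0.0001, 0.0005, 0.0025)


def mean_indel_length(probs):
    """Check that probs sums to 1 and return the mean indel length."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("indel-length probabilities must sum to 1")
    return sum((i + 1) * p for i, p in enumerate(probs))


def rose_params(tree_newick, ntaxa, mean_sub, threshold, indel_probs, seed):
    """Return the text of one ROSE parameter file for a model condition."""
    if mean_sub not in MEAN_SUBSTITUTION.get(ntaxa, ()):
        raise ValueError(f"MeanSubstitution {mean_sub} is not listed for {ntaxa} taxa")
    if threshold not in THRESHOLDS:
        raise ValueError(f"unexpected indel threshold {threshold}")
    mean_indel_length(indel_probs)  # raises if the distribution is invalid
    func = "[" + ", ".join(f"{p:.4f}" for p in indel_probs) + "]"
    lines = [
        "InputType = 4 // DNA",
        'TheAlphabet = "ACGT"',
        "TheFreq = [.25,.25,.25,.25]",
        f"TheInsFunc = {func}",
        f"TheDelFunc = {func}",
        'TheDNAmodel = "HKY"',
        f"MeanSubstitution = {mean_sub}",
        "TransitionBias = 2.0",
        "TTratio = 0.0",
        f"TheInsertThreshold = {threshold}",
        f"TheDeleteThreshold = {threshold}",
        "SequenceLen = 1000",
        f"TheTree = {tree_newick}",
        "ChooseFromLeaves = True",
        "AlignmentWithAncestors = False",
        "TreeWithAncestors = False",
        f"SequenceNum = {ntaxa}",
        f"SeedVal = {seed}",
    ]
    return "\n".join(lines) + "\n"
```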

Serita Nelesen