FastTreeMP on XSEDE

FastTreeMP on XSEDE 2.1.10
Fast (Approximate) Maximum Likelihood tree construction - run on XSEDE
M.N. Price, P.S. Dehal, A.P. Arkin

Price, M.N., Dehal, P.S., and Arkin, A.P. (2009) FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix. Molecular Biology and Evolution 26:1641-1650, doi:10.1093/molbev/msp077
Price, M.N., Dehal, P.S., and Arkin, A.P. (2010) FastTree 2 -- Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE, 5(3):e9490. doi:10.1371/journal.pone.0009490

Phylogeny / Alignment
fasttree_xsede fasttree_trestles perl "" 0

number_nodes perl !$more_memory 2 scheduler.conf perl


									"nodes=1\\n" .
									"node_exclusive=0\\n" .
									"request_mem=15G\\n" .
									"threads_per_process=4\\n"

number_nodes2 perl $more_memory 2 scheduler.conf perl


									"nodes=1\\n" .
									"node_exclusive=0\\n" .
									"request_mem=60G\\n" .
									"threads_per_process=4\\n"

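The two scheduler.conf fragments above differ only in the request_mem value, which depends on more_memory. As a rough illustration only, that selection could be sketched in Python as below. The function name scheduler_conf and its arguments are hypothetical; the sketch writes real newlines rather than the escaped \\n sequences of the Perl fragments, and it takes the runhours line and the 0.05 to 168 hour limits from the runtime parameter. It is not the portal's actual implementation.

```python
def scheduler_conf(more_memory: bool, runtime_hours: float) -> str:
    """Build scheduler.conf text for the standard or the high-memory case.

    Hypothetical helper: it mirrors the number_nodes / number_nodes2
    fragments and the runhours line written by the runtime parameter.
    """
    # Same limits as the runtime checks: reject > 168.0 and < 0.05.
    if runtime_hours > 168.0 or runtime_hours < 0.05:
        raise ValueError("runtime must be between 0.05 and 168 hours")

    # more_memory selects the 60G request; otherwise 15G is requested.
    request_mem = "60G" if more_memory else "15G"

    lines = [
        "nodes=1",
        "node_exclusive=0",
        f"request_mem={request_mem}",
        "threads_per_process=4",
        f"runhours={runtime_hours}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(scheduler_conf(more_memory=False, runtime_hours=0.5), end="")
```

Keeping the memory request as the only branch makes it easy to see that both configurations ask for one shared node and four threads per process.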
infile Input File (Fasta or interleaved PHYLIP format.) perl "infile" 99 infile all_results *

runtime 1 Maximum Hours to Run (up to 168 hours) scheduler.conf 0.5
The maximum hours to run must be 168 or less perl $runtime > 168.0
The maximum hours to run must be at least 0.05 perl $runtime < 0.05
perl "runhours=$value\\n"
The job will run on 4 processors as configured. If it runs for the entire configured time, it will consume 4 x $runtime cpu hours perl $runtime ne 0 && !$more_memory
The job will run on 16 processors as configured. If it runs for the entire configured time, it will consume 16 x $runtime cpu hours perl $runtime ne 0 && $more_memory
Estimate the maximum time your job will need to run. We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may run sooner than jobs configured for the full 168 hours.

datatype 5 Please Specify your data type DNA Nucleotide PROTEIN Amino acid DNA "-nt" PROTEIN "" DNA
This option will help the application determine how to run the job

more_memory I need more memory
This option will help when more memory is needed

intree Starting Tree in Newick Format (-intree) 5 perl (defined $value) ? "-intree starting_tree": "" starting_tree
-intree newick_file to set the starting tree. Tree must be in Newick format

write_log Write intermediate trees to a log file (-log) 5 perl ($value) ? "-log logfile.txt":"" 1
Save intermediate trees so you can extract the trees and restart long-running jobs if they crash; -log also reports the per-site rates (1 means slowest category)

write_quoted Quote sequence names in output (-quote) 5 perl ($value) ? "-quote":""
Quote sequence names in the output and allow spaces, commas, and colons in them but not ' characters (fasta files only)

distances Distances
Default: For protein sequences, log-corrected distances and an amino acid dissimilarity matrix derived from BLOSUM45 or for nucleotide sequences, Jukes-Cantor distances

dist_choice Use non-default distances? usedefault usedefault Use Default rawdist Raw Distances matrix User Specified Matrix nomatrix No Matrix usedefault "" rawdist " -rawdist " matrix " -matrix user_matrix" nomatrix " -nomatrix "
Use -rawdist to turn the log-correction off or to use %different instead of Jukes-Cantor

my_matrix Substitution matrix file (-matrix) perl $dist_choice eq "matrix" user_matrix.distances
Please select a substitution matrix file perl $dist_choice eq "matrix" && !defined $my_matrix

transition_matrix Read a transition matrix (and stationary distribution) from a file (-trans) perl $datatype eq "PROTEIN" perl defined $value ? "-trans transmatrix.txt":"" transmatrix.txt

pseudo Use pseudocounts to estimate distances between sequences with little or no overlap. (-pseudo weight)
Use pseudocounts to estimate distances between sequences with little or no overlap. (Off by default.) Recommended if the alignment has sequences with little or no overlap. If the weight is not specified, it is 1.0

pseudo_value Weight value for pseudocounts perl $pseudo perl " -pseudo $value" 1.0
Please enter a weight for pseudocounts perl !defined $pseudo_value
If the weight is not specified, it is 1.0

topology_refinement Topology Refinement
By default, FastTree tries to improve the tree with up to 4*log2(N) rounds of minimum-evolution nearest-neighbor interchanges (NNI), where N is the number of unique sequences, 2 rounds of subtree-prune-regraft (SPR) moves (also min. evo.), and up to 2*log(N) rounds of maximum-likelihood NNIs.

nni 40 perl !$noml Number of rounds of nearest-neighbor interchanges (-nni) perl (defined $value) ? " -nni $value" : "" 10
Please enter the number of nearest neighbor interchanges perl !defined $nni
Use -nni to set the number of rounds of minimum-evolution nearest-neighbor interchanges.

spr 40 perl !$noml Rounds of subtree-prune-regraft (SPR) moves (-spr) perl " -spr $value" 2
Please enter the number of rounds of subtree-prune-regraft moves (default = 2) perl !defined $spr
By default, there are 2 rounds of subtree-prune-regraft (SPR) moves (also min. evo.)

noml 40 Turn off both min-evo NNIs and SPRs (-noml) 0 perl ($value) ? " -noml" : ""
Use -noml to turn off both min-evo NNIs and SPRs (useful if refining an approximately maximum-likelihood tree with further NNIs)

sprlength 40 Maximum length of a SPR move (-sprlength) perl " -sprlength $value " 10

mlnni 40 Set the number of rounds of maximum-likelihood NNIs. (-mlnni) perl (defined $value) ? " -mlnni $value " : ""

mlacc Number of rounds of optimization for NNIs (-mlacc) default default 2 2 3 3 default "" 2 " -mlacc 2 " 3 " -mlacc 3 " default
Use -mlacc 2 or -mlacc 3 to always optimize all 5 branches at each NNI, and to optimize all 5 branches in 2 or 3 rounds.

mllen Optimize branch lengths without ML NNIs. (-mllen) perl !defined $intree perl ($value) ? " -mllen " : ""
Use -mllen to optimize branch lengths without ML NNIs

mllen_fixedtopo Optimize branch lengths on a fixed topology (-mllen with a Newick tree) perl defined $intree && !$mllen perl ($value) ? " -mllen " : ""
Use -mllen with -intree to optimize branch lengths on a tree of fixed topology

slownni Turn off heuristics to avoid constant subtrees. (-slownni)
Use -slownni to turn off heuristics to avoid constant subtrees (affects both ML and ME NNIs) perl ($value) ? " -slownni" : ""

evolutionary_models Evolutionary Models

protein_models 50 Substitution Model (AA) perl $datatype eq "PROTEIN" jtt JTT+CAT Model (Default) wag WAG+CAT Model jtt "" wag "-wag" jtt
-wag - Whelan-And-Goldman 2001 model instead of (default) Jones-Taylor-Thornton 1992 model (a.a. only).
-gtr - generalized time-reversible instead of (default) Jukes-Cantor (nt only);
-cat # - specify the number of rate categories of sites (default 20).
-nocat - no CAT model (just 1 category);
-gamma - after the final round of optimizing branch lengths with the CAT model, report the likelihood under the discrete gamma model with the same number of categories. FastTree uses the same branch lengths but optimizes the gamma shape parameter and the scale of the lengths. The final tree will have rescaled lengths. Used with -log, this also generates per-site likelihoods for use with CONSEL, see GammaLogToPaup.pl and documentation on the FastTree web site.

nucleotide_models 50 Substitution Model (NT) perl $datatype eq "DNA" jk Jukes-Cantor + CAT Model (Default) gtr Generalized Time-Reversible jk "" gtr "-gtr" jk

cat 50 perl !$nocat The number of rate categories of sites. (-cat) 20 perl (defined $value) ? " -cat $value" : ""
The number of rate categories of sites (default 20)

nocat 50 No CAT model (just 1 category) (-nocat) 0 perl ($value) ? " -nocat" : ""

optimize_gamma 50 After optimizing the tree under the CAT approximation, rescale the lengths to optimize the Gamma20 likelihood. (-gamma) perl ($optimize_gamma) ? " -gamma" : ""

support_value_options Support value options
By default, FastTree computes local support values by resampling the site likelihoods 1,000 times and the Shimodaira Hasegawa test. If you specify -nome, it will compute minimum-evolution bootstrap supports instead. In either case, the support values are proportions ranging from 0 to 1. Use -nosupport to turn off support values or -boot 100 to use just 100 resamples. Use -seed to initialize the random number generator

nosupport Turn off support values. (-nosupport) perl ($nosupport) ? " -nosupport" : ""

boot 60 Number of bootstraps for a Shimodaira-Hasegawa test. (Default 1000) (-boot) perl !$nosupport perl (defined $value)? " -boot $value " : ""

nome Compute minimum-evolution bootstrap supports (-nome)
Compute minimum-evolution bootstrap supports instead of performing the Shimodaira-Hasegawa test. perl ($value) ? " -nome " : ""

searching_for_the_best_join Searching for the best join

search_speed Search Speed (-slow) and (-fastest) default default slow -slow fastest -fastest default "" slow " -slow " fastest " -fastest " default
-slow -- exhaustive search (like NJ or BIONJ, but different gap handling); takes half an hour instead of 8 seconds for 1,250 proteins;
-fastest -- search the visible set (the top hit for each node) only. Unlike the original fast neighbor-joining, -fastest updates visible(C) after joining A and B if join(AB,C) is better than join(C,visible(C)). -fastest also updates out-distances in a very lazy way. -fastest sets -2nd on as well; use -fastest -no2nd to avoid this

top_hit_heuristics Top-hit Heuristics
By default, FastTree combines the 'visible set' of fast neighbor-joining with local hill-climbing as in relaxed neighbor-joining

notop Turn off top-hit list. (-notop)
By default, FastTree uses a top-hit list to speed up search. Use -notop (or -slow) to turn this feature off and compare all leaves to each other, and all new joined nodes to each other perl ($value) ? " -notop " : ""

topm Top-Hit list size, as a proportion of sqrt(N) (-topm) perl !$notop perl (defined $value) ? " -topm $value " : "" 1.0
Sorry, the value for -topm must be greater than 0 perl $topm <= 0
Set the top-hit list size to parameter*sqrt(N). FastTree estimates the top m hits of a leaf from the top 2*m hits of a 'close' neighbor, where close is defined as d(seed,close) < 0.75 * d(seed, hit of rank 2*m), and updates the top-hits as joins proceed

close Enter a value to modify the close heuristic (default = 0.75) (-close) perl (defined $value) ? " -close $value " : "" 0.75
Sorry, the -close value must be greater than 0 and less than 1 perl $close <= 0 || $close >= 1
-close 0.75 -- modify the close heuristic, lower is more conservative.

refresh Enter a value to modify the refresh value (default = 0.8) (-refresh) perl !$notop perl (defined $value) ? " -refresh $value" : "" 0.8
Sorry, the value for -refresh must be greater than 0 and less than 1 perl $refresh <= 0 || $refresh >= 1
-refresh 0.8 -- compare a joined node to all other nodes if its top-hit list is less than 80% of the desired length, or if the age of the top-hit list is log2(m) or greater.

use_second Use 2nd-level top hits (-2nd) uncheck for (-no2nd) perl ($value) ? "-2nd":"-no2nd" 1
-2nd or -no2nd to turn 2nd-level top hits heuristic on or off. This reduces memory usage and running time but may lead to marginal reductions in tree quality. (By default, -fastest turns on -2nd.)

join_options Join Options
-nj: regular (unweighted) neighbor-joining (default).
-bionj: weighted joins as in BIONJ. FastTree will also weight joins during NNIs

bionj 90 Weighted joins as in BIONJ. FastTree will also weight joins during NNIs. (default is -nj) (-bionj) perl ($value) ? " -bionj " : ""

constrained_topology Constrained topology search options

constraints Select a split constraints alignment file (-constraints)
-constraints alignmentfile -- an alignment with values of 0, 1, and -. Not all sequences need be present. A column of 0s and 1s defines a constrained split. Some constraints may be violated (see 'violating constraints:' in standard error). perl " -constraints constraints_file " constraints_file

constraint_weight Constraint weight (-constraintWeight) perl (defined $constraints) perl "-constraintWeight $value " 100
Please enter a weight for the constraints file perl !defined $constraint_weight
-constraintWeight -- how strongly to weight the constraints. A value of 1 means a penalty of 1 in tree length for violating a constraint. Default: 100.0