Seqboot

Seqboot
Phylip 3.66
Bootstrap, Jackknife, or Permutation Resampling
Felsenstein
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
Phylogeny / Alignment
http://bioweb.pasteur.fr/docs/gensoft-evol.html#PHYLIP
seqboot
scheduler_input scheduler.conf perl


									"ChargeFactor=1.0\\n" .
									"nodes=1\\n" .
									"node_exclusive=0\\n" .
									"threads_per_process=1\\n"

runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) 1.0
Estimate the maximum time your job will need to run (up to 72 hrs). Your job will be killed if it doesn't finish within the time you specify; however, jobs with shorter maximum run times are often scheduled sooner than longer jobs.
Maximum Hours to Run must be between 0.1 and 72.0. perl $runtime < 0.1 || $runtime > 72.0
Please set a value for the runtime perl !defined $runtime
perl "runhours=$value\\n"

seqboot perl "seqboot < params" 0

infile Alignment File perl "ln -s infile.phylip infile; " -5 12 infile.phylip

interleaved Input Sequences Interleaved perl ($value)? "": "I\\n$" 1 1 params

data_type Data type (D) sequence 1
sequence Molecular sequences
morph Discrete Morphology
rest Restriction Sites
freq Gene Frequencies
sequence ""
morph "D\\n"
rest "D\\nD\\n"
freq "D\\nD\\nD\\n"
params

method Resampling methods (J) perl "$value" bootstrap 1
bootstrap Bootstrap
jackknife Delete-half jackknife
permute Permute species for each character
permute "J\\nJ\\n"
bootstrap ""
jackknife "J\\n"

1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data.

2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others.
The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986).

3. Permuting species within characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just a pair of sibling species).
params

seed Random number seed (must be odd) perl "$value\\n" 99 params

replicates How many replicates (R) perl ($value && $value != $vdef)? "R\\n$value\\n" : "" 100 1
This server allows no more than 1000 replicates perl $replicates > 1000
params

freq_opt Gene Frequencies options perl $data_type eq "freq"
alleles All alleles present at each locus (default: no, one absent at each locus) (A) perl ($value)? "A\\n" : "" 0 1 perl $data_type eq "freq" params

percentage What fraction of the characters
The % option allows the user control over what fraction of the characters are sampled in the bootstrap and jackknife methods. Normally the bootstrap samples a number of times equal to the number of characters, and the jackknife samples half that number. This option permits you to specify a smaller fraction of characters to be sampled. Note that doing so is "statistically incorrect", but it is available here for whatever other purposes you may have in mind.
Note that the fraction you will be asked to enter is the fraction of characters sampled, not the fraction left out. If you specify 100 as the fraction of sites retained and are using the jackknife, the data set will simply be rewritten. Note (as mentioned below) that this can be used together with the W (Weights) option to rewrite a data set while omitting a particular set of sites.
perl (defined $value && $value < 100)? "%\\n$value\\n" : "" 1 params

Bbootstrap Block Bootstrap
The B option selects the Block Bootstrap. When you select option B the program will ask you to enter the block length. When the block length is 1, this means that we are doing regular bootstrapping rather than block-bootstrapping.
perl "B\\n$value\\n" 1 params

weights Weights
The W (Weights) option allows weights to be read from a file whose default name is "weights". The weights follow the format described in the Phylip main documentation file. Weights can only be 0 or 1, and act to select the characters (or sites) that will be used in the resampling, the others being ignored and always omitted from the output data sets. If you use W together with the S (just weights) option, you write a file of weights (whose default name is "outweights"). In that file, any character whose original weight is 0 will have weight 0, the other weights varying according to the resampling. Note that if you write out data sets rather than weights (not using the S option), this output weights file is not written, as the characters are written different numbers of times in the data output file. Note that with restriction sites, the weights are not used by some of the programs. Writing out files of weights will not be helpful with those programs. For the moment, with all gene frequencies programs the weights are also not used.
perl ($value)? "W\\n$" : "" 1 0 params

weights_file Weights Input File perl "" 1 perl $weights weights

categories Categories
The C (Categories) option can be used with molecular sequence programs to allow assignment of sites or amino acid positions to user-defined rate categories. The assignment of rates to sites is then made by reading a file whose default name is "categories". It should contain a string of digits 1 through 9. A new line or a blank can occur after any character in this string. Thus the categories file might look like this:
122231111122411155
1155333333444
The only use of the Categories information in Seqboot is that they are sampled along with the sites (or amino acid positions) and are written out onto a file whose default name is "outcategories", which has one set of categories information for each bootstrap or jackknife replicate.
perl ($value)? "C\\n$" : "" 1 0 params

categories_file Categories Input File perl "" 1 perl $categories categories

multiple_weights Produce multiple weights file
The S option is a particularly important one. It controls whether the program produces multiple output data sets or multiple sets of weights. If your data set is large, a file with (say) 1000 such data sets can be very large and may use up too much space on your system. If you choose the S option, the program will instead produce a weights file with multiple sets of weights. The default name of this file is "outweights". Except for some programs that cannot handle multiple sets of weights, PHYLIP programs have an M (multiple data sets) option that asks the user whether to use multiple data sets or multiple sets of weights. If the latter is selected when running those programs, they read one data set, but analyze it multiple times, each time reading a new set of weights. As both bootstrapping and jackknifing can be thought of as reweighting the characters, this accomplishes the same thing (the multiple weights option is not available for the various kinds of permutation).
As the file with multiple sets of weights is much smaller than a file with multiple data sets, this can be an attractive way to save file space. When multiple sets of weights are chosen, they reflect the sampling as well as any set of weights that was read in, so that you can use Seqboot's W option as well.
©Copyright 1980-2007. University of Washington.
perl ($value)? "S\\n$" : "" 1 0 params

outweights perl "" 1 perl $multiple_weights outweights

rest_opt Restriction enzymes options perl $data_type eq "rest"
enzymes_nb Number of enzymes: not present in input file (E) perl (! $value)? "E\\n" : "" 1 1 perl $data_type eq "rest" params

all_results *

confirm perl "y\\n" 90 params

terminal_type perl "0\\n" 0 params
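
The three resampling methods described in the help text for the method parameter (bootstrap, delete-half jackknife, and permuting species within characters) can be illustrated with a minimal Python sketch. This is not Seqboot's code: the function names, the toy alignment, and the column-list representation are hypothetical, the jackknife rounds half the characters down for odd counts as an assumption, and Python's random number generator differs from Seqboot's, so the replicates will not match what the program writes even with the same seed.

```python
import random


def bootstrap(columns, rng):
    """Sample len(columns) characters at random, with replacement."""
    n = len(columns)
    return [columns[rng.randrange(n)] for _ in range(n)]


def delete_half_jackknife(columns, rng):
    """Keep a random half of the characters, none duplicated.

    Assumption: for an odd number of characters, half is rounded down.
    Kept characters stay in their original order.
    """
    n = len(columns)
    kept = sorted(rng.sample(range(n), n // 2))
    return [columns[i] for i in kept]


def permute_within_characters(columns, rng):
    """Shuffle the species independently within each character (column)."""
    permuted = []
    for column in columns:
        shuffled = list(column)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


# Hypothetical toy alignment: 4 species (rows) by 6 sites (columns).
alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "TCGAAC"]
# Store the data as a list of columns, one per character.
columns = [list(site) for site in zip(*alignment)]

rng = random.Random(99)  # 99 mirrors the default seed in the form above
boot = bootstrap(columns, rng)
jack = delete_half_jackknife(columns, rng)
perm = permute_within_characters(columns, rng)

print(len(boot), len(jack), len(perm))
```

The bootstrap replicate has as many characters as the original, the jackknife replicate has half as many, and the permuted replicate keeps every character's set of states while destroying the association between species.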