ProbCons

ProbCons 1.12
Probabilistic Consistency-based Multiple Alignment of Amino/Nucleic Acid Sequences
C. Do, M. Brudno, S. Batzoglou
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment. Genome Research 15: 330-340.
Nucleic Acid Sequence, Protein Sequence, Phylogeny / Alignment
http://probcons.stanford.edu/
probcons

scheduler_input scheduler.conf perl !$more_memory perl


									"ChargeFactor=1.0\\n" .
									"nodes=1\\n" .
									"mem=2G\\n" .
									"node_exclusive=0\\n" .
									"threads_per_process=1\\n"

scheduler_input2 scheduler.conf perl $more_memory && !$more_memory2 perl


									"ChargeFactor=1.0\\n" .
									"nodes=1\\n" .
									"mem=8G\\n" .
									"node_exclusive=0\\n" .
									"threads_per_process=1\\n"

scheduler_input3 scheduler.conf perl $more_memory2 perl


									"ChargeFactor=1.0\\n" .
									"nodes=1\\n" .
									"mem=64G\\n" .
									"node_exclusive=0\\n" .
									"threads_per_process=1\\n"

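The three scheduler.conf fragments above differ only in the memory request (2G, 8G, 64G) and in the Perl guard that selects them. As a rough illustration, the selection could be sketched in Python as follows. The function name `scheduler_conf` and the use of real newlines in the output are assumptions made for this sketch; only the guard expressions and the key/value pairs come from the fragments themselves.

```python
# Hypothetical Python rendering of the three guarded scheduler.conf fragments.
# Guards and values mirror the Perl shown above; the function name and the
# use of real newline characters are assumptions made for illustration.

def scheduler_conf(more_memory: bool, more_memory2: bool) -> str:
    """Return the scheduler.conf text contributed by the guards that hold."""

    def block(mem: str) -> str:
        # All three fragments share every setting except the memory request.
        return (
            "ChargeFactor=1.0\n"
            "nodes=1\n"
            f"mem={mem}\n"
            "node_exclusive=0\n"
            "threads_per_process=1\n"
        )

    # (guard, memory request) pairs in the order the fragments are listed.
    guarded = [
        (not more_memory, "2G"),
        (more_memory and not more_memory2, "8G"),
        (more_memory2, "64G"),
    ]
    # Each guard is evaluated independently, as in the fragments above.
    return "".join(block(mem) for holds, mem in guarded if holds)


if __name__ == "__main__":
    print(scheduler_conf(more_memory=True, more_memory2=False))
```

With `more_memory` set and `more_memory2` unset, only the 8G fragment is produced.
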
invocation_string perl "/expanse/projects/ngbt/opt/expanse/probcons/1.12/probcons"

runtime 1 scheduler.conf Maximum Hours to Run (click here for help setting this correctly) 1.0
Estimate the maximum time your job will need to run (up to 168 hrs). Your job will be killed if it doesn't finish within the time you specify; however, jobs with shorter maximum run times are often scheduled sooner than longer jobs.
Maximum Hours to Run must be between 0.1 and 168.0. perl $runtime < 0.1 || $runtime > 168.0
Please set a value for the runtime perl !defined $runtime
perl "runhours=$value\\n"
This job will run on one core as configured. If it runs for the entire time, it will consume 1 x $runtime cpu hours perl !$more_memory
This job will run on four cores as configured. If it runs for the entire time, it will consume 4 x $runtime cpu hours perl $more_memory

infile Sequences File (FASTA format) input.fasta perl " input.fasta" 98

sequenceType Data Type aminoAcid aminoAcid Amino Acid 0
Amino acid inputs are processed by ProbCons version 1.12; nucleic acid inputs are processed by ProbConsRNA.

outputFormat Output file format MFA Multi-FASTA clustal ClustalW MFA perl (defined $value && $value ne $vdef) ? " -clustalw" : "" 10
For detailed descriptions of the Multi-FASTA and ClustalW formats, please consult the ProbCons User Manual at http://probcons.stanford.edu/.

more_memory I need more memory 0

outputMFAFile output.mfa perl " > output.mfa" perl defined $outputFormat && $outputFormat eq "MFA" 99
outputMFAPairwise *.fasta perl defined $outputFormat && $outputFormat eq "MFA" && $pairwise
outputClustalWFile output.clustal perl " > output.clustal" perl defined $outputFormat && $outputFormat eq "clustal" 99
outputClustalWPairwise *.aln perl defined $outputFormat && $outputFormat eq "clustal" && $pairwise

numConsistencyReps Number of passes of consistency transformation (--consistency) 0 0 1 1 2 2 3 3 4 4 5 5 2 perl (defined $value && $value ne $vdef) ? " -c $value" : "" 12
Each pass applies one round of the consistency transformation on the set of sequences. The consistency transformation is described in detail in the publication. In each round, the aligner computes the consistency transformation for each pair of sequences using all other sequences. The aligner then updates the posterior probability matrices of the pairwise alignments.

numRefinementReps Number of passes of iterative refinement, up to 1000 (--iterative-refinement) perl (defined $value && $value > -1 && $value < 1001 && $value ne $vdef) ? " -ir $value" : "" 100
Values for "--iterative-refinement" must be between 0 and 1000, inclusive. perl $value < 0 || $value > 1000 13
This specifies the number of iterations of iterative refinement to be performed. In each stage of iterative refinement, the set of sequences in the alignment is randomly partitioned into two groups. After projecting the alignments to these groups, the two groups are realigned, resulting in an alignment whose objective score is guaranteed to be at least that of the original alignment.

numPretrainingReps Number of rounds of pretraining, up to 20 (--pre-training) perl (defined $value && $value > -1 && $value < 21 && $value ne $vdef) ? " -pre $value" : "" 0
Values for "--pre-training" must be between 0 and 20, inclusive. perl $value < 0 || $value > 20 14
This specifies the number of rounds of EM to be applied on the set of sequences being aligned. This option is used in case the default parameters are not appropriate for the particular sequences being aligned; in general, this option is not recommended as it may lead to unstable alignment parameters.

pairwise Generate only pairwise alignments? (-pairs) perl $value ? " -pairs" : "" 0 15
When this option is selected, PROBCONS generates all-pairs pairwise maximum expected accuracy alignments using the posterior matrices without generating a full multiple alignment. The names of the files are based on the header comments for each of the sequences in the original input file with .fasta appended. When the ClustalW output option is selected, .aln is used as the suffix instead.

viterbi Use Viterbi decoding (-viterbi) perl $value ? " -viterbi" : "" 0 perl $pairwise 16
Generates all-pairs pairwise alignments using the Viterbi algorithm. Note that this option requires the -pairs option to be enabled. This option is not recommended but is available for comparison to the maximum expected accuracy alignments.

writeAnnotation Write annotation for multiple alignment (-annot) perl $value ? " -annot output.annotations" : "" 0 17
Turning on this option causes the program to write quality scores for columns in the produced alignment to a file called output.annotations. The quality score for each column of the alignment is given on a separate line and is an integer between 0 and 100 inclusive, representing the expected percentage of correct pairwise matches in the column. Columns containing only one non-gap character automatically have quality score 0.
annotationResults output.annotations perl $writeAnnotation

writeTraining Write EM transition probabilities (--train) perl $value ? " --train trained.params" : "" 0 18
This option is used to train the aligner using a set of sequences. The test sequences are read from the specified input file. This performs exactly one round of EM training on the sequences; multiple calls to PROBCONS are needed in order to obtain convergence. The training parameters are written to a file called trained.params as three lines:
initMatchProb initInsertXProb initInsertYProb
startInsertXProb startInsertYProb
extendInsertXProb extendInsertYProb
trainingResults trained.params perl $writeTraining

paramsFile Trained ProbCons parameter file (--paramfile) input.params perl defined $value ? " --paramfile input.params" : "" 19
Reads initial/final and transition probabilities from a user-specified file. This file should specify the initial/final probabilities and transition probabilities for the HMM used by the aligner. The HMM consists of a Match state, an Insert X state, and an Insert Y state, and is described in more detail in the publication. The file format consists of three lines, containing:
initMatchProb initInsertXProb initInsertYProb
startInsertXProb startInsertYProb
extendInsertXProb extendInsertYProb
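
To make the three-line parameter layout concrete, here is a minimal Python sketch that reads such a file into a dictionary keyed by the seven names listed above. It assumes the values are whitespace-separated numbers, which the help text does not state explicitly. The helper name `parse_params` and the sample numbers are invented for illustration and are not part of ProbCons.

```python
from typing import Dict

# Field names per line, in the order given by the help text above.
LAYOUT = [
    ("initMatchProb", "initInsertXProb", "initInsertYProb"),
    ("startInsertXProb", "startInsertYProb"),
    ("extendInsertXProb", "extendInsertYProb"),
]


def parse_params(text: str) -> Dict[str, float]:
    """Parse the first three non-empty lines of a parameter file.

    Assumption: values are whitespace-separated floats. Any further
    lines in the file are ignored by this sketch.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < len(LAYOUT):
        raise ValueError("expected at least three non-empty lines")

    params: Dict[str, float] = {}
    for names, line in zip(LAYOUT, lines):
        values = line.split()
        if len(values) != len(names):
            raise ValueError(
                f"expected {len(names)} values, got {len(values)}: {line!r}"
            )
        for name, value in zip(names, values):
            params[name] = float(value)
    return params


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the parser.
    sample = "0.6 0.2 0.2\n0.02 0.02\n0.4 0.4\n"
    print(parse_params(sample))
```

A check like this could be run on input.params before submitting a job, so that a malformed file is caught before the run starts.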