<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pise PUBLIC "pise2.dtd" "pise2.dtd" >
<pise>
	<head>
		<title>ExaBayes on XSEDE</title>
		<version>1.5.1</version>
		<description>Massively Parallel Bayesian Tree Inference - run on XSEDE</description>
		<authors>Aberer, A. J., Kobert, K., and Stamatakis, A.</authors>
		<reference>Aberer, A. J., Kobert, K., and Stamatakis, A. (2014) ExaBayes: Massively Parallel Bayesian Tree Inference for the Whole-Genome Era. Molecular Biology and Evolution 31, 2553-2556</reference>
		<category>Phylogeny / Alignment</category>
	</head>
	
	<command>exabayes_xsede</command>
<!--
*****************************************************************************************************************************************
Created by Mark Miller for CSG, 6/22/2018 or thereabouts
*****************************************************************************************************************************************
Command Line Options

Command line options specify how ExaBayes will carry out the analyses. In contrast, the config file (see below) specifies which kind of analyses will be executed.
5.1 Mandatory Arguments

•-f alignmentFile provides an alignment file. If this file is the binary output produced by parser (see Section 7.1), then no further arguments are required. If you provide a plain (un-processed) Phylip file, then either -m (single partition model) or -q (model file) are mandatory. 

•-m DNA | PROT specifies the data type used, when a Phylip-formatted alignment has been passed via -f. This way, the alignment is parsed as a single partition with either DNA or amino acid (PROT) data. 

•-q modelFile specifies a raxml-style partitioning/model scheme for the alignment. For this option, a Phylip-formatted alignment must be passed via -f. See Section 9 for a description of the file format. 

•-s seed provides a random seed. This number makes the run reproducible. The same seed, data set, and configuration file will produce exactly the same result (apart from the limitations given in Section 8.5). If you restart from a checkpoint file, this option is ignored. 

•-n id provides a run id used for naming output files 

•-r runid restarts your run from a previous run id. If your previous ExaBayes run did not finish (because of a manual abort or walltime restrictions), this option can be used to continue the run. It is essential that you pass the same configuration and alignment file. Some adaptations to the configuration file are possible (e.g., a larger number of generations to be run or a lower topological convergence threshold). Furthermore, all files that carry the previous runid in their name must be located in the current folder. 

Example: 
$ mpirun -np 8 ./exabayes -s $RANDOM -n myId -c myConfig -f myBinaryAlnFile.bin 
$ [running....] -> aborted!
$ mpirun -np 2  ./exabayes -r myId -n myIdContinued -c myConfig -f myBinaryAlnFile.bin -S


5.2 Optional Arguments
•-d carries out a dry run. It only checks your config and alignment file and does not compute anything. Strongly recommended before submitting a large run to a cluster. 
•-T n executes Yggdrasil with n threads. We recommend using the multi-threaded version of Yggdrasil only on systems where no MPI installation is available. 
•-c configFile passes a configuration file that specifies how the MCMC will be carried out (see ./examples/configFile-all-options.nex and Section 6 for details) 
•-w workDir specifies a location for output files 
•-R num (exabayes-only) specifies the number of runs (i.e., independent chains) to be executed in parallel. Large runs should be carried out as separate runs, see Section 8 for further details. 
•-C num (exabayes-only) specifies the number of chains (i.e., coupled chains per independent run) to be executed in parallel. Employing this option may be less efficient in terms of runtime and memory than data-level parallelism, see Section 8 for further details. 
•-S tries to save memory using the SEV technique for gap columns on large, gappy alignments. On very gappy alignments this option yields considerable runtime improvements. 
•-M mode specifies the memory versus runtime trade-off. <mode> is a value between 0 (fastest, highest memory consumption) and 3 (slowest, least memory consumption). See Section 8.3 for details. 
-->

<!-- there are postprocessing options. I tried to add these in. MMiller 6/24/2019 -->
<!-- cipres@comet-ln2:/projects/ps-ngbt/backend/comet_workspace>consense -h

consense computes various flavours of consensus trees from sets of trees.

Usage: ./consense -n id  -f file[..] [-t threshold] [-b burnin] 

        -n runid         an id for the output file
        -t thresh        a threshold for the consensus tree. Valid values:
                         values between 50 (majority rule) and 100 (strict) or MRE
                         (the greedily refined MR consensus).  Default: MRE
        -b relBurnin     proportion of trees to discard as burn-in (from start). Default: 0.25
        -f file[..]      one or more exabayes topology files
 -->
 <!-- move to expanse 4/20/2021 -->
 <!-- rules from wayne p 
Rules for running ExaBayes 1.5.1 on Expanse via the CIPRES gateway

All runs are on one standard node of Expanse.

- Ask the user for the following:

  the total number of patterns in the data set

- Specify the number of MPI processes, Slurm partition, and Slurm memory
  according to the following table.

                   MPI    Slurm
          Data    pro-   parti-   Slurm
      patterns  cesses     tion  memory

        <4,000      24   shared     46G
  4,000-24,999      48   shared     92G
      >=25,000     128  compute    243G

- For runs in the Slurm shared partition, include the following in the run script.

  #SBATCH -p shared
  #SBATCH -*qos=shared-cipres
  #SBATCH -N 1
  #SBATCH -*ntasks-per-node=<MPIprocesses>
  #SBATCH -*cpus-per-task=1
  #SBATCH -*mem=<memory>
  ...
  export CIPRES_NP=$SLURM_TASKS_PER_NODE
  export CIPRES_THREADSPP=$SLURM_CPUS_PER_TASK
  ...
  srun -n $CIPRES_NP -*mpi-pmi2 /expanse/projects/ngbt/home/cipres/ngbw/contrib/scripts/exabayes_1.5.1.clang_expanse ...
 
- For runs in the Slurm compute partition, replace the first two lines in the
  run script by the following.

  #SBATCH -p compute
  #SBATCH -*qos=cipres
 
 -->
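<!-- Illustrative sketch only (not used by the gateway): the resource table in the rules above can be read as a simple lookup on the pattern count. The Python names below (Resources, select_resources, max_core_hours) are hypothetical; only the thresholds, process counts, partitions, and memory sizes come from the table.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    mpi_processes: int
    partition: str
    memory: str


def select_resources(num_patterns: int) -> Resources:
    """Map a pattern count to the resource tier listed in the table."""
    if num_patterns < 1:
        raise ValueError("num_patterns must be a positive integer")
    if num_patterns < 4000:
        return Resources(24, "shared", "46G")
    if num_patterns < 25000:
        return Resources(48, "shared", "92G")
    return Resources(128, "compute", "243G")


def max_core_hours(num_patterns: int, runtime_hours: float) -> float:
    """Upper bound on core hours: processes times the requested wall time."""
    return select_resources(num_patterns).mpi_processes * runtime_hours


# Boundary values of the three tiers.
for n in (3999, 4000, 24999, 25000):
    print(n, select_resources(n))
```

The hidden scheduler.conf parameters below encode the same tiers as Perl preconditions on $num_patterns.
-->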
<parameters>
<!--  submission on expanse: the invocation line and any needed thread specification  -->
					<parameter ismandatory="1" ishidden="1" type="String">
						<name>exabayes_15</name>
						<attributes> 
							<format> 
								<language>perl</language>		
								<code>"<![CDATA[exabayes_1.5.1.clang_expanse]]>"</code>
							</format>  
							<group>0</group>
						</attributes>
					</parameter>									
								
<!--  configuration of scheduler.conf depending on num_patterns -->

<!--  patterns    <4,000      24   shared     46G  -->
								
				<parameter type="String" ishidden="1" >
					<name>number_threads1</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
						<precond>
							<language>perl</language>
							<code>$num_patterns &lt; 4000</code>
						</precond>
						<format>
							<language>perl</language>
							<code>
									"jobtype=mpi\\n" .
									"mpi_processes=24\\n" .
									"mem=46G\\n" .
									"node_exclusive=0\\n" .
									"nodes=1\\n"
							</code>
						</format>
					</attributes>
				</parameter>
				
<!--  4,000-24,999      48   shared     92G	  -->	
								
				<parameter type="String" ishidden="1" >
					<name>number_threads2</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
						<precond>
							<language>perl</language>
							<code>$num_patterns &gt;  3999 &amp;&amp; $num_patterns &lt; 25000  </code>
						</precond>
						<format>
							<language>perl</language>
							<code>
									"jobtype=mpi\\n" .
									"mpi_processes=48\\n" .
									"mem=92G\\n" .									
									"node_exclusive=0\\n" .
									"nodes=1\\n"
							</code>
						</format>
					</attributes>
				</parameter>
				
<!--  	      >=25,000     128  compute    243G -->

								
				<parameter type="String" ishidden="1" >
					<name>number_threads3</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
						<precond>
							<language>perl</language>
							<code>$num_patterns &gt; 24999 </code>
						</precond>
						<format>
							<language>perl</language>
							<code>
									"jobtype=mpi\\n" .
									"mpi_processes=128\\n" .
									"mem=243G\\n" .
									"node_exclusive=1\\n" .
									"nodes=1\\n"
							</code>
						</format>
					</attributes>
				</parameter>

<!-- end number of nodes 
				<parameter issimple="1" ismandatory="1" type="Integer">
					<name>num_procs</name>
					<attributes>
					<prompt>Enter the number of processes you need (required)</prompt>
					<format>
						<language>perl</language>
						<code>"mpirun -np $num_procs /projects/ps-ngbt/opt/comet/exabayes/exabayes-1.5/exabayes"</code>
					</format>
					<group>1</group>
					<ctrls>
						<ctrl>
							<message>Please enter the number of mpi processes</message>
							<language>perl</language>
							<code>!defined $num_procs</code>
						</ctrl>
						<ctrl>
							<message>The number of mpi processes must be less than or equal to 24</message>
							<language>perl</language>
							<code>$num_nodes == 1 &amp;&amp; $num_procs &gt; 24 </code>
						</ctrl>
						<ctrl>
							<message>The number of mpi processes must be less than or equal to 48</message>
							<language>perl</language>
							<code>$num_nodes == 2 &amp;&amp; $num_procs &gt; 48 </code>
						</ctrl>
					</ctrls>

					</attributes>
				</parameter>  -->	

 <!-- exabayes 1.5-->
 <!-- configuration here when worked out. -->		

<!-- input file specification -->
<!-- the input file to be operated on ends the command line -->
		<parameter issimple="1" ismandatory="1" isinput="1" type="Sequence">
			<name>infile</name>
			<attributes>
				<prompt>Input File (must be in relaxed Phylip format)</prompt>
				<format>
					<language>perl</language>
					<code>"-f infile.phy"</code>
				</format>
				<group>5</group>
<!-- this file designator seems to come at the end of the command string, so we set it for 99 currently -->
				<filenames>infile.phy</filenames>
			</attributes>
		</parameter>
	
<!-- Results section. To start we return all results files available-->
		<parameter ishidden="1" type="Results">
			<name>all_results</name>
			<attributes>
				<filenames>*</filenames>
			</attributes>
		</parameter>
		
<!-- This section provides visible queries that help configure the interface  -->

<!-- this sets the run time -->
				<parameter type="Float" issimple="1" ismandatory="1">
					<name>runtime</name> 
					<attributes>
						<group>1</group>
						<paramfile>scheduler.conf</paramfile>
						<prompt>Maximum Hours to Run (up to 168 hours)</prompt>
						<vdef>
							<value>0.5</value>
						</vdef>
						<ctrls>
							<ctrl>
								<message>The maximum hours to run must be less than or equal to 168</message>
								<language>perl</language>
								<code>$runtime &gt; 168.0</code>
							</ctrl>
							<ctrl>
								<message>The maximum hours to run must be at least 0.05</message>
								<language>perl</language>
								<code>$runtime &lt; 0.05</code>
							</ctrl>
						</ctrls>
						<format>
							<language>perl</language>
							<code>"runhours=$value\\n"</code>
						</format>
						<warns>
							<warn>
								<message>The job will run on 24 processors as configured. If it runs for the entire configured time, it will consume 24 X $runtime cpu hours</message>
								<language>perl</language>
								<code>$num_patterns &lt; 4000</code>
							</warn>
							<warn>
								<message>The job will run on 48 processors as configured. If it runs for the entire configured time, it will consume 48 X $runtime cpu hours</message>
								<language>perl</language>
								<code>$num_patterns &lt; 25000 &amp;&amp; $num_patterns &gt; 3999  </code>
							</warn>
							<warn>
								<message>The job will run on 128 processors as configured. If it runs for the entire configured time, it will consume 128 X $runtime cpu hours</message>
								<language>perl</language>
								<code>$num_patterns &gt; 24999  </code>
							</warn>
						</warns>						
						<comment>
<value>Estimate the maximum time your job will need to run. We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. 
Once you are sure the configuration is correct, you can then increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for one or a few hours may
	run sooner than jobs configured for the full 168 hours. 
</value>
						</comment>
					</attributes>
				</parameter>
				
<!-- •-m DNA | PROT specifies the data type used, when a Phylip-formatted alignment has been passed via -f. This way, the alignment is parsed as a single partition with either DNA or amino acid (PROT) data.  -->				
				<parameter issimple="1"  type="Excl">
					<name>datatype</name>
					<attributes>
						<prompt>Specify the datatype (required if there is no partitioning)</prompt>
						<precond>
							<language>perl</language>
							<code>!$is_partitioned</code>
						</precond>
						<vlist>
							<value>DNA</value>
							<label>DNA</label>
							<value>PROT</value>
							<label>Protein</label>
						</vlist>
						<format>
							<language>perl</language>
							<code>"-m $value"</code>
						</format>
						<ctrls>
							<ctrl>
								<message>Please select a datatype</message>
								<language>perl</language>
								<code>!defined $datatype &amp;&amp; !$is_partitioned </code>
							</ctrl>
						</ctrls>
					</attributes>
				</parameter>

				<parameter issimple="1" type="InFile">
					<name>config_file</name>
					<attributes>
						<group>12</group>
						<prompt>Select configuration file (optional)</prompt>
						<format>
							<language>perl</language>
							<code>defined $config_file ? "-c config.nex":""</code>
						</format>
						<filenames>config.nex</filenames>
						<comment>
							<value>The configuration file specifies how the MCMC will be carried out; it is passed to ExaBayes with -c.</value>
						</comment>
					</attributes>
				</parameter> 		

<!-- Determine whether the data set is partitioned, this will determine how the job is run. -->
					<parameter issimple="1" ismandatory="1" type="Switch">
					<name>is_partitioned</name>
					<attributes>
						<prompt>My data set is partitioned</prompt>
						<vdef>
							<value>0</value>
						</vdef>

					</attributes>
				</parameter> 
				
<!-- user must specify how many patterns they have -->
				<parameter issimple="1" ismandatory="1" type="Integer">
					<name>num_patterns</name>
					<attributes>
						<prompt>How many patterns does your data have?</prompt>
						<ctrls>
							<ctrl>
								<message>Please enter a positive integer for the number of patterns.</message>
								<language>perl</language>
								<code>!defined $num_patterns || $num_patterns &lt; 1</code>
							</ctrl>
						</ctrls>
						<comment>
							<value>The number of patterns determines how many MPI processes (24, 48, or 128) and how much memory the job is given.</value>
						</comment>
					</attributes>
				</parameter> 
			
<!-- user must specify how many partitions they have 
				<parameter issimple="1" ismandatory="1" type="Integer">
					<name>nu_partitions</name>
					<attributes>
						<prompt>How many partitions does your data have?</prompt>
						<precond>
							<language>perl</language>
							<code>$is_partitioned</code>
						</precond>
						<ctrls>
							<ctrl>
								<message>Please enter an integer greater than 1 for the number of partitions. If you have just one partition, please uncheck the box that says "My data set is partitioned"</message>
								<language>perl</language>
								<code>$is_partitioned &amp;&amp; $nu_partitions &lt; 2 </code>
							</ctrl>
						</ctrls>
						<comment>
							<value>This option will help the application determine how to run the job</value>
						</comment>
					</attributes>
				</parameter> -->
				
<!-- •-q modelFile specifies a raxml-style partitioning/model scheme for the alignment. For this option, a Phylip-formatted alignment must be passed via -f. See Section 9 for a description of the file format.  -->
				<parameter issimple="1" type="InFile">
					<name>model_file</name>
					<attributes>
						<prompt>Select a partition file for the alignment</prompt>
						<group>7</group>
						<format>
							<language>perl</language>
							<code>(defined $value) ? "-q part.txt":""</code>
						</format>
						<filenames>part.txt</filenames>
						<ctrls>
							<ctrl>
								<message>Please select a partition file</message>
								<language>perl</language>
								<code>$is_partitioned &amp;&amp; !defined $model_file </code>
							</ctrl>
						</ctrls>
					</attributes>
				</parameter>

<!-- The user is permitted to specify a seed 
		<parameter issimple="1" type="Switch">
			<name>spec_seed</name>
			<attributes>
				<prompt>Specify a seed for this run (by default a random seed is used)</prompt>
				<vdef>
					<value>0</value>				
				</vdef>
				<comment>
<value>This option provides a random seed. This number makes the run reproducible. The same seed, data set, and configuration file will produce exactly the same result (apart from the limitations given in Section 8.5). If you restart from a checkpoint file, this option is ignored.</value>
				</comment>
			</attributes>
		</parameter> -->

		<parameter issimple="1" ishidden="0" type="Integer">
			<name>seed_val</name>
			<attributes>
<!--   				<precond>
					<language>perl</language>
					<code>$spec_seed</code>
				</precond> 		  -->
				<prompt>Enter a seed value here</prompt>	
				<format>
					<language>perl</language>
					<code> "-s $value"</code>
				</format>
				<group>6</group>
				<ctrls>
					<ctrl>
						<message>Please specify a seed value; it is required</message>
						<language>perl</language>
						<code>!defined $seed_val</code>
					</ctrl>
				</ctrls>
			</attributes>
		</parameter>
<!-- I heard variations of this question before, so I guess I have failed at
documenting properly what the difference between -R X -C Y and
numRuns/numCoupledChains is.

In the config file you specify how many coupled chains or independent
runs you want to execute. The command line options in comparison
indicate how many of each type will be executed *in parallel*.

Example:
numRuns            4
numCoupledChains   4
(=> 16 chains will be run in total)

invoked with
mpirun -np 64 exabayes -R 4 -C 2  ...

=> the 64 processes are divided into 4 x 2 = 8 groups (with 8 processes each)
* group 1 computes two coupled chains of the first independent run
* group 2 computes the other two coupled chains of the first independent run
* group 3 computes two coupled chains of the second independent run
* group 4 computes the other two coupled chains of the second independent run
...

In other words with -R x -C y you can calibrate the parallelization
scheme.

Since independent runs are (well) independent, -R <numRuns> is almost
always a good idea w.r.t. runtime, however, you increase the total
amount of memory needed.

For small datasets increasing values of -C help to ensure that every
process has enough work load (and thus reduces parallel overhead).  -->
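<!-- Illustrative sketch only: the process grouping described in the comment above, written out as arithmetic. The function name parallel_layout and its argument names are hypothetical, and the requirement that everything divides evenly is a simplification of this sketch, not a documented ExaBayes rule.

```python
from typing import Dict


def parallel_layout(np_total: int, runs_parallel: int, chains_parallel: int,
                    num_runs: int, num_coupled_chains: int) -> Dict[str, int]:
    """Work out how MPI processes are split for given -R / -C values."""
    if chains_parallel > num_coupled_chains:
        raise ValueError("-C cannot exceed numCoupledChains")
    # -R x -C y divides the processes into x * y groups.
    groups = runs_parallel * chains_parallel
    # Assumption of this sketch: only evenly divisible layouts are handled.
    if np_total % groups or num_coupled_chains % chains_parallel:
        raise ValueError("this sketch only handles evenly divisible layouts")
    return {
        "total_chains": num_runs * num_coupled_chains,
        "groups": groups,
        "processes_per_group": np_total // groups,
        "chains_per_group": num_coupled_chains // chains_parallel,
    }


# The example from the comment: 64 processes, -R 4 -C 2,
# with numRuns 4 and numCoupledChains 4 in the config file.
print(parallel_layout(64, 4, 2, num_runs=4, num_coupled_chains=4))
```

For the example in the comment this gives 16 chains in total and 8 groups of 8 processes, each group computing 2 coupled chains.
-->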
		
		<parameter issimple="1" ishidden="0" type="Integer">
			<name>num_indanalyses</name>
			<attributes>
				<prompt>Number of independent analyses</prompt>
				<format>
					<language>perl</language>
					<code> "-R $value"</code>
				</format>
				<vdef>
					<value>2</value>
				</vdef>
				<group>6</group>
			</attributes>
		</parameter>
		
<!-- We should also allow the user to specify -R, -C, and -M. I think that the default 
values for these are 1, 1, and 0. Normally -R should be the same as numruns in config.nex; 
if that file is not provided, then -R must be 1. 
In addition, C cannot be greater than numCoupledChains in config.nex. -->
  		<parameter issimple="1" ishidden="0" type="Integer">
			<name>num_coupledchains</name>
			<attributes>
				<prompt>Number of chains to be run in parallel (-C)</prompt>
				<format>
					<language>perl</language>
					<code> "-C $value"</code>
				</format>
				<warns>
					<warn>
						<message>Advisory only: The value for C cannot be larger than numCoupledChains</message>
						<language>perl</language>
						<code>defined $value &amp;&amp; defined $config_file</code>
					</warn>
				</warns>
				<group>6</group>
			</attributes>
		</parameter>
		
<!-- •-n id provides a run id used for naming output files   -->		
		<parameter issimple="1" type="String">
			<name>output_name</name>
			<attributes>
				<prompt>Provide a name for output files (-n)</prompt>
				<format>
					<language>perl</language>
					<code>(defined $value ) ? "-n $value":""</code>
				</format>
				<group>12</group>
				<ctrls>
					<ctrl>
						<message>Please enter a name for the output files</message>
						<language>perl</language>
						<code>!defined $output_name</code>
					</ctrl>
				</ctrls>
			</attributes>
		</parameter>
		
<!-- •-M mode specifies the memory versus runtime trade-off. <mode> is a value between 0 (fastest, highest memory consumption) and 
3 (slowest, least memory consumption). See Section 8.3 for details.  -->
		<parameter issimple="1" type="Excl">
			<name>mode_type</name>
			<attributes>
				<prompt>Select the memory mode</prompt>
				<vlist>
					<value>0</value>
					<label>0</label>
					<value>1</value>
					<label>1</label>
					<value>2</value>
					<label>2</label>
					<value>3</value>
					<label>3</label>
				</vlist>
				<format>
					<language>perl</language>
					<code>(defined $value ) ? "-M $value":""</code>
				</format>
				<group>13</group>
			</attributes>
		</parameter>

<!-- consense computes various flavours of consensus trees from sets of trees.

Usage: ./consense -n id  -f file[..] [-t threshold] [-b burnin] 

        -n runid        an id for the output file
        -t thresh       a threshold for the consensus tree. Valid values:
                        values between 50 (majority rule) and 100 (strict) or MRE
                        (the greedily refined MR consensus).  Default: MRE
        -b relBurnin    proportion of trees to discard as burn-in (from start). Default: 0.25
        -f file[..]     one or more exabayes topology files. -->
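<!-- Illustrative sketch only: how the consense options in the usage text above fit together, with the threshold going to -t and the burn-in proportion going to -b. The function name consense_args is hypothetical, and the accepted burn-in range (0 up to but not including 1) is an assumption of this sketch; the flags and defaults are taken from the usage text.

```python
from typing import List, Sequence, Union


def consense_args(run_id: str, topology_files: Sequence[str],
                  threshold: Union[int, str] = "MRE",
                  burnin: float = 0.25) -> List[str]:
    """Build an argument list for consense from the documented options."""
    if threshold != "MRE":
        # Numeric thresholds run from 50 (majority rule) to 100 (strict).
        if not 50 <= int(threshold) <= 100:
            raise ValueError("threshold must be MRE or between 50 and 100")
    if not 0.0 <= burnin < 1.0:
        raise ValueError("burn-in is a proportion of trees to discard")
    if not topology_files:
        raise ValueError("at least one topology file is required")
    args = ["consense", "-n", run_id, "-t", str(threshold), "-b", str(burnin), "-f"]
    args.extend(topology_files)
    return args


print(" ".join(consense_args("myId", ["ExaBayes_topologies.run-0.myId"], threshold=70)))
```

The topology file name in the example call is a placeholder.
-->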

  	<parameter issimple="1"  type="Switch">
		<name>run_consense</name>
		<attributes>
			<prompt>Use consense to compute consensus trees from sets of trees</prompt>
			<format>
				<language>perl</language>
				<code>($value) ? "&amp;&amp; exabayes_consense_wrapper_1.5.0":""</code>
			</format>
			<group>80</group>
		</attributes>
	</parameter>  

	<parameter issimple="1" type="Switch">
		<name>use_MRE</name>
		<attributes>
			<prompt>Use MRE for the threshold</prompt>
<!--  			<precond>
				<language>perl</language>
				<code>$use_MRE</code>
			</precond> -->
			<format>
				<language>perl</language>
				<code>($value) ? "-t MRE ":""</code>
			</format>
			<group>81</group>
		</attributes>
	</parameter> 
	
	<parameter issimple="1" type="Integer">
		<name>specify_threshold</name>
		<attributes>
			<prompt>Specify a threshold for the consensus tree</prompt>
			<precond>
				<language>perl</language>
				<code>$run_consense &amp;&amp; !$use_MRE</code>
			</precond>
			<format>
				<language>perl</language>
				<code>($value) ? "-t $value ":""</code>
			</format>
  		<ctrls>
				<ctrl>
					<message>Please choose a consensus threshold value less than or equal to 100</message>
					<language>perl</language>
					<code>$specify_threshold &gt; 100</code>
				</ctrl>
	  			<ctrl>
					<message>Please choose a consensus threshold value greater than or equal to 50</message>
					<language>perl</language>
					<code>$specify_threshold &lt; 50</code>
				</ctrl>
				<ctrl>
					<message>Please choose MRE or specify a Threshold</message>
					<language>perl</language>
					<code>!defined $specify_threshold &amp;&amp; !$use_MRE</code>
				</ctrl> 
			</ctrls>
			<group>81</group>
		</attributes>
		</parameter> 
	
 	<parameter issimple="1"  type="Float">
		<name>specify_burnin</name>
		<attributes>
			<prompt>Specify burnin value for consensus trees</prompt>
			<precond>
				<language>perl</language>
				<code>$run_consense</code>
			</precond>
			<format>
				<language>perl</language>
				<code>(defined $value) ? "-b $value":"" </code>
			</format>
			<vdef>
				<value>0.25</value>
			</vdef>
			<group>82</group>
		</attributes>
	</parameter>  
	
	 	<parameter ishidden="1" type="String">
<!-- this adds the commands to specify the run and the file name -->
		<name>specify_nf</name>
		<attributes>
			<precond>
				<language>perl</language>
				<code>$run_consense</code>
			</precond>
			<format>
				<language>perl</language>
				<code>"-f *topologies.* -n $output_name"</code>
			</format>
			<group>82</group>
		</attributes>
	</parameter>  
		
</parameters>
</pise>


