<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pise PUBLIC "pise2.dtd" "pise2.dtd" >
<pise>
    <head>
      <title>BWA on XSEDE</title>
      <version>0.36</version>
      <description>Burrows-Wheeler Alignment Tool. (Make sure there is no white space in any of the alphabet input fields on this page. This does not apply to the 'Enter your data below' field.)</description>
      <authors></authors>
      <reference></reference>
      <category>Assembly:Assemble_reads</category>
      <doclink>http://bio-bwa.sourceforge.net/</doclink>
      <doclink></doclink>
  </head>
	
	<command>bwa_xsede</command>

<!-- XML Author: Kisurb Choe                                               -->
<!-- Distributed under LGPLv2 Licence. Please refer to the COPYING.LIB document. -->

<parameters>
<!-- 

The basic function of BWA is to align sequence reads (the query) to a reference genome.
The user can specify:
a single file with single-end reads,
a single file with both paired-end reads shuffled (interleaved),
or two separate paired-end read files.

We designed our XMLs to look in a specific folder in our upload directory, /mobyle_share/data. This is just for convenience.
In CIPRES, the files will be found in the target folder for that job.

Typically the two paired read files from trimmo are specified as inputs for Query File 1 and Query File 2.
One or two query files may be uploaded. Typically it is both,
but we sometimes have only single-end reads, so the command line has to be built to allow the user to upload one or two files.

Here is an example command line

bwa mem -M -t 16 TSWV_genome.fasta 1_S1_L001.1.paired.fq.gz 1_S1_L001.2.paired.fq.gz > 1_S1_L001.1_mem.sam

I attach the reference fasta database file and two paired end read files (zip archive) so you can try a test run.  The output is a .sam file.
* files_for_bwa.zip (89 MB): https://velocity.ncsu.edu/dl/TcdTlUO/302979
 
 -->
<!-- bwa mem -M -t 16 TSWV_genome.fasta 1_S1_L001.1.paired.fq.gz 1_S1_L001.2.paired.fq.gz > 1_S1_L001.1_mem.sam 
Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:

       -t INT        number of threads [16]
       -k INT        minimum seed length [19]
       -w INT        band width for banded alignment [100]
       -d INT        off-diagonal X-dropoff [100]
       -r FLOAT      look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
       -y INT        seed occurrence for the 3rd round seeding [20]
       -c INT        skip seeds with more than INT occurrences [500]
       -D FLOAT      drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
       -W INT        discard a chain if seeded bases shorter than INT [0]
       -m INT        perform at most INT rounds of mate rescues for each read [50]
       -S            skip mate rescue
       -P            skip pairing; mate rescue performed unless -S also in use

Scoring options:

       -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
       -B INT        penalty for a mismatch [4]
       -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
       -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
       -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
       -U INT        penalty for an unpaired read pair [17]

       -x STR        read type. Setting -x changes multiple parameters unless overriden [null]
                     pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0  (PacBio reads to ref)
                     ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0  (Oxford Nanopore 2D-reads to ref)
                     intractg: -B9 -O16 -L5  (intra-species contigs to ref)

Input/output options:

       -p            smart pairing (ignoring in2.fq)
       -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
       -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
       -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)

       -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
       -T INT        minimum score to output [30]
       -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
       -a            output all alignments for SE or unpaired PE
       -C            append FASTA/FASTQ comment to SAM output
       -V            output the reference FASTA header in the XR tag
       -Y            use soft clipping for supplementary alignments
       -M            mark shorter split hits as secondary

       -I FLOAT[,FLOAT[,INT[,INT]]]
                     specify the mean, standard deviation (10% of the mean if absent), max
                     (4 sigma from the mean if absent) and min of the insert size distribution.
                     FR orientation only. [inferred]
-->

			<parameter isinput="1" type="InFile">
				<name>dbin</name>
				<attributes>
					<prompt>Database sequences (FASTA format)</prompt>
					<filenames>ref_genome.fasta</filenames>
				</attributes>
			</parameter>
			
			<parameter ishidden="1" type="String">
				<name>command_line_single</name>
				<attributes>
					<precond>
						<language>perl</language>
						<code>!defined $queryinpair</code>
					</precond>
					<format>
						<language>perl</language>
						<code>"&amp;&amp; bwa_expanse mem -M -t 16"</code>
					</format>
					<group>50</group>
				</attributes>
			</parameter>
			
			<parameter ishidden="1" type="String">
				<name>command_line_single2</name>
				<attributes>
					<precond>
						<language>perl</language>
						<code>!defined $queryinpair</code>
					</precond>
					<format>
						<language>perl</language>
						<code>"ref_genome.fasta query1.fq.gz > $SAM_NAME"</code>
					</format>
					<group>70</group>
				</attributes>
			</parameter>
			
			<parameter ishidden="1" type="String">
				<name>command_line_paired</name>
				<attributes>
					<precond>
						<language>perl</language>
						<code>defined $queryinpair</code>
					</precond>
					<format>
						<language>perl</language>
						<code>"&amp;&amp; bwa_expanse mem -M -t 16 "</code>
					</format>
					<group>50</group>
				</attributes>
			</parameter>
			
						
			<parameter ishidden="1" type="String">
				<name>command_line_paired2</name>
				<attributes>
					<precond>
						<language>perl</language>
						<code>defined $queryinpair</code>
					</precond>
					<format>
						<language>perl</language>
						<code>" ref_genome.fasta query1.fq.gz query2.fq.gz > $SAM_NAME"</code>
					</format>
					<group>70</group>
				</attributes>
			</parameter>
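			<!--
			Sketch of the command line that the hidden command-line parameters in this
			file are meant to produce, assuming the framework joins each parameter's
			format string in ascending group order, separated by spaces (an assumption
			about the Pise/CIPRES runtime, not something this file states). File names
			are the fixed <filenames> values; aln.sam is the default SAM_NAME and "is"
			is the default indexing algorithm.

			Single-end (queryinpair not supplied), groups 10, 12, 14, 50, 70:
			  bwa_expanse index -a is ref_genome.fasta && bwa_expanse mem -M -t 16 ref_genome.fasta query1.fq.gz > aln.sam

			Paired-end (queryinpair supplied):
			  bwa_expanse index -a is ref_genome.fasta && bwa_expanse mem -M -t 16 ref_genome.fasta query1.fq.gz query2.fq.gz > aln.sam

			The optional switches slot in by group: -c (group 13) after "index -a is",
			and -p (group 51) after "mem -M -t 16".
			-->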
			
		<parameter ishidden="1" type="String">
			<name>bwa_scheduler</name>
				<attributes>
					<paramfile>scheduler.conf</paramfile>
					<format>
						<language>perl</language>
							<code>
									"threads_per_process=12\\n" .
									"node_exclusive=0\\n" .
									"mem=23G\\n" .
									"nodes=1\\n"
								</code>
					</format>
					<group>0</group>
				</attributes>
		</parameter>
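		<!--
		Sketch of the scheduler.conf this file is expected to yield, assuming the
		framework writes every format string tagged with
		<paramfile>scheduler.conf</paramfile> to that file in group order, and that
		the escaped \\n ends up as a newline (both are assumptions). The runhours
		line comes from the visible runtime parameter and is shown at its default:

		  threads_per_process=12
		  node_exclusive=0
		  mem=23G
		  nodes=1
		  runhours=0.25
		-->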

<parameter type="Paragraph">
	<paragraph>
			<name>inputs</name>
			<prompt>Input Files</prompt>
		<parameters>

<!-- Parameters with visible controls start here -->
		<parameter type="Float" issimple="1" ismandatory="1">
			<name>runtime</name>
			<attributes>
				<group>1</group>
				<paramfile>scheduler.conf</paramfile>
				<prompt>Maximum Hours to Run (click here for help setting this correctly)</prompt>
				<vdef>
					<value>0.25</value>
				</vdef>
				<comment>
<value>Estimate the maximum time your job will need to run. We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue.
Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where a job configured for one or a few hours may
run sooner than a job configured for the full 168 hours.
</value>
				</comment>
				<ctrls>
					<ctrl>
						<message>Maximum Hours to Run must not exceed 168</message>
						<language>perl</language>
						<code>$runtime &gt; 168.0</code>
					</ctrl>
					<ctrl>
						<message>Maximum Hours to Run must be at least 0.1</message>
						<language>perl</language>
						<code>$runtime &lt; 0.1</code>
					</ctrl>
				</ctrls>
				<warns>
					<warn>
						<message>The job will run on 12 processors as configured. If it runs for the entire configured time, it will consume 12 x $runtime CPU hours.</message>
						<language>perl</language>
						<code>$runtime != 0</code>
					</warn>
				</warns>
				<format>
					<language>perl</language>
					<code>"runhours=$value\\n"</code>
				</format>
			</attributes>
		</parameter>
			
		<parameter type="InFile">
				<name>queryin</name>
				<attributes>
					<prompt>Query File 1</prompt>
					<filenames>query1.fq.gz</filenames>
				</attributes>
		</parameter>
		
 		<parameter type="InFile">
			<name>queryinpair</name>
			<attributes>
				<prompt>Query File 2 (paired reads)</prompt>
				<filenames>query2.fq.gz</filenames>
			</attributes>
		</parameter> 
		
			<parameter type="String">
          		<name>SAM_NAME</name>
          		<attributes>
          			<prompt>Output File Name</prompt>
          			<vdef>
            			<value>aln.sam</value>
          			</vdef>
          		</attributes>
        	</parameter>
		 
		</parameters>	
	</paragraph>		
</parameter>
 
<parameter type="Paragraph">
	<paragraph>
			<name>index</name>
			<prompt>BWA indexing settings</prompt>
			<parameters>	
			
			<parameter ishidden="1" type="String">
					<name>buildindex</name>
					<attributes>			
						<format>
							<language>perl</language>
							<code>"bwa_expanse"</code>
						</format> 
						<group>10</group>
					</attributes>
				</parameter>

        		
        		<parameter ismandatory="1" type="Excl">
					<name>indexalg</name>
					<attributes>
					<prompt>Indexing algorithm</prompt>
					<vlist>
							<value>is</value>
							<label>is</label>
							<value>bwtsw</value>
							<label>bwtsw</label>
					</vlist>
					<vdef>
						<value>is</value>
					</vdef>
					<format>
						<language>perl</language>
						<code>"index -a $value"</code>
					</format>
					<comment>
						<value>is - linear-time algorithm for constructing the suffix array. It requires 5.37N memory, where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB.</value>
						<value>bwtsw - algorithm implemented in BWT-SW. This method works with the whole human genome, but it does not work with databases smaller than 10MB and is usually slower than IS.</value>
					</comment>
					<group>12</group>
					</attributes>
				</parameter>
        		
        		<parameter type="Switch">
					<name>colorspace</name>
					<attributes>
						<prompt>Build color-space index?</prompt>
            			<format>
            				<language>perl</language>
            				<code>($value ? "-c":"")</code>
            			</format> 
						<vdef>
							<value>0</value>
						</vdef>
						<group>13</group>
					</attributes>
				</parameter>
				
			<parameter ishidden="1" type="String">
					<name>buildindex2</name>
					<attributes>			
						<format>
							<language>perl</language>
							<code>"ref_genome.fasta"</code>
						</format> 
						<group>14</group>
					</attributes>
				</parameter>			
			</parameters>
	</paragraph>
</parameter>
			
 <!-- Now for BWA mem options  --> 
 <parameter type="Paragraph">
 	<paragraph>
 		<name>memopts</name>
 		<prompt>BWA mem options</prompt>
 			<parameters>
			    <parameter type="Switch">
            		<name>Pvalue2</name>
            		<attributes>
            			<prompt>Submitting a single interleaved fastq?</prompt>
						<precond>
							<language>perl</language>
							<code>!defined $queryinpair</code>
						</precond>
            			<format>
            				<language>perl</language>
            				<code>($value) ? "-p":""</code>
            			</format>
            			<vdef>
                			<value>0</value>
            			</vdef>
            			<group>51</group>
            		</attributes>
        		</parameter>
 		
 		</parameters>	
 	</paragraph>
 </parameter>

		
		<parameter type="OutFile">
			<name>stdoutfile</name>
			<attributes>
				<filenames>std.out</filenames>
			</attributes>
		</parameter>
	
		<parameter type="Results">
			<name>all_results</name>
			<attributes>
				<filenames>*</filenames>
			</attributes>
		</parameter>
	</parameters>
</pise>