<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pise PUBLIC "pise2.dtd" "pise2.dtd" >
<pise>
	<head>
		<title>ParallelStructure on XSEDE</title>
		<version>2.3.4</version>
		<description>A program to investigate population structure using multi-locus genotype data</description>
		<authors>Pritchard J.K., Stephens M., and Donnelly P.</authors>
		<reference>Pritchard J.K., Stephens M., and Donnelly P. (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945-959.</reference>
		<reference>Hubisz M.J., Falush D., Stephens M., and Pritchard J.K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9:1322-1332.</reference>
		<reference>Besnier F. and Glover K.A. (2013) ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers. PLoS One 8(7): e70651.</reference>
		<category>Population Genetics</category>
	</head>
	
	<command>parallelstructure_xsede</command>
	
<!--
*****************************************************************************************************************************************
Created by Mark Miller for CSG, 10_6_2016 or thereabouts.
*****************************************************************************************************************************************
7  Running structure from the command line
There are a number of program parameters that are set by the user. These are in two files (mainparams and extraparams), which are read every time the program executes. mainparams specifies the input format for the data file and the most basic run parameters. extraparams specifies a wider variety of program options. You will need to set all the values in mainparams, while the default values in extraparams are probably OK to begin with. Note that the default model assumes admixture and does not make use of the user-defined PopData.
Each parameter is printed in all-caps in one of these two files, preceded by the word “#define”. (They are also printed in all-caps throughout this document.) The value is set immediately following the name of the parameter (e.g., “#define NUMREPS 1000” sets the number of MCMC repetitions to 1000).
Following each parameter definition, there is a brief comment (marked “//”) describing the parameter. This includes an indication of what sort of value is expected: “(str)” for string (used for the names of the input and output files); “(int)” for integer; “(d)” for double (i.e., a real number such as 3.14); and “(B)” for Boolean (i.e., the parameter takes the value TRUE or FALSE, set as 1 or 0, respectively).
The program is insensitive to the order of the parameters, so you can re-arrange them or add comments, etc. The values of all parameters used for a given run are printed at the end of the output file.
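The “#define NAME value // comment” layout described above is simple enough to read with a few lines of Python, for example to check a parameter file before uploading it. This is a minimal sketch, not part of STRUCTURE or of this interface; the function name read_params is hypothetical, and it assumes one definition per line with an optional trailing “//” comment.

```python
import re

# Matches lines such as "#define NUMREPS 1000  // (int) number of MCMC reps"
DEFINE_RE = re.compile(r"^\s*#define\s+([A-Z0-9_]+)\s+(\S+)")

def read_params(text):
    """Return {NAME: value string} for every #define line in a parameter file."""
    params = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop the trailing comment, if any
        match = DEFINE_RE.match(line)
        if match:
            name, value = match.groups()
            params[name] = value  # parameter order does not matter, so a dict is enough
    return params

example = """
#define MAXPOPS 3    // (int) number of populations assumed
#define BURNIN 10000 // (int) length of burnin period
#define NUMREPS 1000 // (int) number of MCMC reps after burnin
"""
print(read_params(example))
```

Values are kept as strings; converting them according to the (str), (int), (d) or (B) tag is left to the caller.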

7.1	Program parameters
In this section we list all of the parameters that can be set by the user. These are ordered according to the parameter files that are used in the command-line version of structure.

7.2	Parameters in file mainparams.
The user will need to set all of these parameters before running the program. Several of these parameters (LABEL, POPDATA, POPFLAG, PHENOTYPE, EXTRACOLS) indicate whether particular types of data are present in the input file; these are described in Section 2.
Basic Program Parameters.

MAXPOPS (int) Number of populations assumed for a particular run of the program. Pritchard et al. (2000a) call this K. Sometimes (depending on the nature of the data) there is a natural value of K that can be used; otherwise K can be estimated by checking the fit of the model at different values of K (see Section 5).
BURNIN (int) Length of burnin period before the start of data collection. (See Section 3.3.)
NUMREPS (int) Number of MCMC reps after burnin. (See Section 3.3.)

Input/Output files.

INFILE (string) Name of input data file. Max length 30 characters (or possibly less depending on operating system).
OUTFILE (string) Name for program output files (the suffixes “_1”, “_2”, ..., “_m” (for intermediate results) and “_f” (final results) are added to this name). Existing files with these names will be overwritten. Max length of name 30 characters (or possibly less depending on operating system).
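Given OUTFILE, the files a run should leave behind can be listed in advance, which helps when checking returned results. This sketch assumes the suffixes “_1” to “_m” for intermediate saves and “_f” for final results, plus “_q” (PRINTQHAT) and “_ss” (SITEBYSITE), consistent with the results*_f and results*_q patterns this interface collects; the helper expected_outputs is hypothetical, and the exact intermediate names are worth confirming on a short test run.

```python
def expected_outputs(outfile, intermedsave=0, printqhat=False, sitebysite=False):
    """List the file names one STRUCTURE run is expected to write (sketch)."""
    names = [f"{outfile}_{i}" for i in range(1, intermedsave + 1)]  # intermediate saves
    names.append(f"{outfile}_f")  # final results
    if printqhat:
        names.append(f"{outfile}_q")  # point estimates of Q (PRINTQHAT)
    if sitebysite:
        names.append(f"{outfile}_ss")  # site-by-site output (SITEBYSITE)
    return names

print(expected_outputs("results", intermedsave=2, printqhat=True))
# ['results_1', 'results_2', 'results_f', 'results_q']
```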

Data file format.
NUMINDS (int) Number of individuals in data file.
NUMLOCI (int) Number of loci in data file.
PLOIDY (int) Ploidy of the organism.  Default is 2 (diploid).
MISSING (int) Value given to missing genotype data. Must be an integer, and must not appear elsewhere in the data set. Default is -9.
ONEROWPERIND (Boolean) The data for each individual are arranged in a single row. E.g., for diploid data, this would mean that the two alleles for each locus are in consecutive order in the same row, rather than being arranged in the same column, in two consecutive rows. See section 2 for details about input formats.
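For diploid data, converting from two rows per individual to the ONEROWPERIND=1 layout amounts to interleaving the two rows locus by locus. A minimal sketch, assuming the label and other non-genotype columns have already been removed; the function name to_one_row is hypothetical.

```python
def to_one_row(row_a, row_b):
    """Interleave the two rows of a diploid individual into ONEROWPERIND=1 order.

    row_a and row_b hold the first and second allele at each locus
    (genotype columns only).
    """
    if len(row_a) != len(row_b):
        raise ValueError("both rows must cover the same loci")
    merged = []
    for first, second in zip(row_a, row_b):
        merged.extend([first, second])  # the two alleles of a locus sit side by side
    return merged

# Two loci; the second allele at locus 2 is missing (MISSING = -9)
print(to_one_row([145, 66], [148, -9]))  # [145, 148, 66, -9]
```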
LABEL (Boolean) Input file contains labels (names) for each individual. 1 = Yes; 0 = No.
POPDATA (Boolean) Input file contains a user-defined population-of-origin for each individual. 1 = Yes; 0 = No.
POPFLAG (Boolean) Input file contains an indicator variable which says whether to use popinfo when USEPOPINFO==1 (see below). 1 = Yes; 0 = No.
LOCDATA (Boolean) Input file contains a user-defined sampling location for each individual. 1 = Yes; 0 = No. For use in the LOCPRIOR model. Can set LOCISPOP=1 to use the POPDATA instead in the LOCPRIOR model.
PHENOTYPE (Boolean) Input file contains a column of phenotype information. 1 = Yes; 0 = No.
EXTRACOLS (int) Number of additional columns of data after the Phenotype before the genotype data start. These are ignored by the program. 0 = no extra columns.
MARKERNAMES (Boolean) The top row of the data file contains a list of L names corresponding to the markers used.
RECESSIVEALLELES (Boolean) Next row of data file contains a list of L integers indicating which alleles are recessive at each locus. Setting this to 1 implies that the dominant marker model is in use.
MAPDISTANCES (Boolean) The next row of the data file (or the first row if MARKERNAMES==0) contains a list of map distances between neighboring loci.
Advanced data file options.
PHASED (Boolean) For use with the linkage model. Indicates that data are in correct phase. If (LINKAGE=1, PHASED=0), then PHASEINFO can be used; this is an extra line in the input file that gives phase probabilities. When PHASEINFO=0, each value is set to 0.5, implying no phase information. When the linkage model is used with polyploids, PHASED=1 is required.
PHASEINFO (Boolean) The row(s) of genotype data for each individual are followed by a row of information about haplotype phase. This is for use with the linkage model only. See sections 2 and 3.1 for further details.
MARKOVPHASE (Boolean) The phase information follows a Markov model. See sections 2.2 and 9.6 for details.
NOTAMBIGUOUS (int) For use with polyploids when RECESSIVEALLELES=1. Defines the code indicating that genotype data at a marker are unambiguous. Must not match MISSING or any allele value in the data.

7.3	Parameters in file extraparams.
These options allow the user to refine the model in various ways and to do more involved analyses. The default values are probably fine to begin with. For Boolean options, type 1 for “Yes” or “Use this option”; 0 for “No” or “Don’t use this option”.
Program options.
NOADMIX (Boolean) Assume the model without admixture (Pritchard et al., 2000a). (Each individual is assumed to be completely from one of the K populations.) In the output, instead of printing the average value of Q as in the admixture case, the program prints the posterior probability that each individual is from each population. 1 = no admixture; 0 = model with admixture.
LINKAGE (Boolean) Use the linkage model. See section 3.1. LOG10RSTART sets the initial value of the recombination rate r per unit distance (on the log10 scale). LOG10RMIN and LOG10RMAX set the minimum and maximum allowed values for log10 r. LOG10RPROPSD sets the size of the proposed changes to log10 r in each update. The front end makes some guesses about these, but some care on the part of the user is required to be sure that the values are sensible for the particular application.
USEPOPINFO (Boolean) Use prior population information to assign individuals to clusters. See also MIGRPRIOR and GENSBACK. Must have POPDATA=1.
LOCPRIOR (Boolean) Use location information to improve the performance on data that are weakly informative about structure.
FREQSCORR (Boolean) Use the “F model”, in which the allele frequencies are correlated across populations (Falush et al., 2003a). More specifically, rather than assuming a prior in which the allele frequencies in each population are independent draws from a uniform Dirichlet distribution, we start with a distribution which is centered around the mean allele frequencies in the sample. This model is more realistic for very closely related populations (where we expect the allele frequencies to be similar across populations), and can produce better clustering (section 3.2). The prior of Fk is set using FPRIORMEAN and FPRIORSD. There may be a tendency to overestimate K when FREQSCORR is turned on.
ONEFST (Boolean) Assume the same value of Fk for all populations (analogous to Wright’s traditional FST). This is not recommended for most data, because in practice you probably expect different levels of divergence in each population. When K = 2 it may sometimes be difficult to estimate two values of FST separately (but see Harter et al. (2004)). When you’re trying to estimate K, you should use the same model for all K (we suggest ONEFST=0).
INFERALPHA (Boolean) Infer the value of the model parameter α from the data; otherwise α is fixed at the value ALPHA, which is chosen by the user. This option is ignored under the NOADMIX model. (The prior for the ancestry vector Q is Dirichlet with parameters (α, α, ..., α). Small α implies that most individuals are essentially from one population or another, while α > 1 implies that most individuals are admixed.)
POPALPHAS (Boolean) Infer a separate α for each population. Not recommended in most cases, but may be useful for situations with asymmetric admixture.
ALPHA (double) Dirichlet parameter (α) for degree of admixture (this is the initial value if INFERALPHA==1).
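To see what α does to Q under the prior, the sketch below draws ancestry vectors from the symmetric Dirichlet(α, ..., α) using the standard construction (normalize independent Gamma(α, 1) draws) and reports the average size of the largest ancestry share. It illustrates the prior only; the helper names are hypothetical, and this is not how STRUCTURE updates Q.

```python
import random

def sample_q(alpha, k, rng):
    """Draw one ancestry vector Q from the symmetric Dirichlet(alpha, ..., alpha)."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    return [g / total for g in gammas]

def mean_largest_share(alpha, k=3, draws=2000, seed=1):
    """Average of max(Q) over many prior draws; near 1 means little admixture."""
    rng = random.Random(seed)
    return sum(max(sample_q(alpha, k, rng)) for _ in range(draws)) / draws

for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(mean_largest_share(alpha), 2))
```

With small α the largest share is typically close to 1 (individuals mostly from one population); with α well above 1 it moves toward 1/K (admixed individuals).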
INFERLAMBDA (Boolean) Infer a suitable value for λ. Not recommended for most analyses.
POPSPECIFICLAMBDA (Boolean) Infer a separate λ for each population.
LAMBDA (double) Parameterizes the allele frequency prior; for most data the default value of 1 seems to work pretty well. If the frequencies at most markers are very skewed towards low/high frequencies, a smaller value of λ may potentially lead to better performance. It doesn’t seem to work very well to estimate λ at the same time as the other hyperparameters, α and F.
Priors.
These values are used to parametrize the assumed probability models. In most cases the default settings should be fairly sensible and you may not want to worry about these.
FPRIORMEAN, FPRIORSD (double) See FREQSCORR. The prior for Fk is taken to be Gamma with mean FPRIORMEAN and standard deviation FPRIORSD. Our default settings place a lot of weight on small values of F. We find that this makes the algorithm sensitive to subtle structure, but at some increased risk of overestimating K (Falush et al., 2003a).
UNIFPRIORALPHA (Boolean), ALPHAMAX (double) Assume a uniform prior for α which runs between 0 and ALPHAMAX. This model seems to work fine; the alternative model (when UNIFPRIORALPHA=0) is to take α as having a Gamma prior, with mean ALPHAPRIORA × ALPHAPRIORB and variance ALPHAPRIORA × ALPHAPRIORB².
LOG10RMIN, LOG10RMAX, LOG10RPROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being used.
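The two Gamma parameterizations above (mean and standard deviation for Fk; ALPHAPRIORA and ALPHAPRIORB for α) are linked through a shape a and a scale b, with mean a·b and variance a·b². A small conversion sketch; the helper names are hypothetical, and the input values are illustrative rather than STRUCTURE's defaults.

```python
def gamma_from_mean_sd(mean, sd):
    """Shape and scale of a Gamma distribution with the given mean and sd."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = (mean / sd) ** 2  # a = mean**2 / sd**2
    scale = sd ** 2 / mean    # b = sd**2 / mean
    return shape, scale

def gamma_mean_var(shape, scale):
    """Mean a*b and variance a*b**2 of Gamma(shape a, scale b)."""
    return shape * scale, shape * scale ** 2

shape, scale = gamma_from_mean_sd(0.01, 0.05)  # illustrative FPRIORMEAN / FPRIORSD values
print(shape, scale)                  # about 0.04 and 0.25
print(gamma_mean_var(shape, scale))  # recovers about (0.01, 0.0025)
```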
Using prior population information (USEPOPINFO).
GENSBACK (int) This corresponds to G (Pritchard et al., 2000a). When using prior population information for individuals (USEPOPINFO=1), the program tests whether each individual has an immigrant ancestor in the last G generations, where G = 0 corresponds to the individual being an immigrant itself. In order to have decent power, G should be set fairly small (2, say) unless the data are highly informative.
MIGRPRIOR (double) Must be in [0,1]. This is ν in Pritchard et al. (2000a). Sensible values might be in the range 0.001 to 0.1.
PFROMPOPFLAGONLY (Boolean) This option, new with version 2.0, makes it possible to update the allele frequencies, P, using only a prespecified subset of the individuals. To use this, include a POPFLAG column, and set POPFLAG=1 for individuals who should be used to update P, and POPFLAG=0 for individuals who should not be used to update P. This can be used with or without USEPOPINFO turned on.
This option will be useful, for example, if you have a standard reference set of individuals from known populations, and then you want to estimate the ancestry of some unknown individuals. Using this option, the q estimate for each unknown individual depends only on the reference set, and not on the other unknown individuals in the sample. This property is sometimes desirable.
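Which individuals inform P under PFROMPOPFLAGONLY can be illustrated with plain allele counting. This is only a sketch under assumed inputs: the dictionary layout and the allele_freqs helper are hypothetical, and STRUCTURE itself samples P from its posterior rather than using raw counts.

```python
from collections import Counter

def allele_freqs(individuals, locus, missing=-9, flagged_only=True):
    """Allele frequencies at one locus, counted from POPFLAG=1 individuals only.

    Each individual is a dict with a "popflag" entry (0 or 1) and a
    "genotypes" list holding one tuple of alleles per locus.
    """
    counts = Counter()
    for ind in individuals:
        if flagged_only and ind["popflag"] != 1:
            continue  # unknown individuals do not contribute to P
        for allele in ind["genotypes"][locus]:
            if allele != missing:
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}

sample = [
    {"popflag": 1, "genotypes": [(101, 101)]},
    {"popflag": 1, "genotypes": [(101, 103)]},
    {"popflag": 0, "genotypes": [(105, 105)]},  # unknown individual, ignored
]
print(allele_freqs(sample, locus=0))  # {101: 0.75, 103: 0.25}
```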
LOCPRIOR model for using location information.
LOCISPOP (Boolean) This option instructs the program to use the PopData column in the input file as location data when the LOCPRIOR model is turned on. When LOCISPOP=0, the program requires a LocData column to use LOCPRIOR.
LOCPRIORINIT (double) Initial value for the LOCPRIOR parameter r, which parameterizes how informative the populations are (Hubisz et al., 2009). We found that LOCPRIORINIT=1 helped achieve good convergence.
MAXLOCPRIOR (double) Range of r is from (0,MAXLOCPRIOR). We suggest MAXLOCPRIOR=20.
Output options
PRINTNET (Boolean) Print the “net nucleotide distance” between clusters. This distance between populations A and B, D_AB, is calculated from the estimated allele frequencies of the clusters.
The net nucleotide distance is the average probability that a pair of alleles, one each from populations A and B, are different, less the average within-population heterozygosity. Perhaps more intuitively, this can be thought of as the average amount of pairwise difference between alleles from different populations, beyond the amount of variation found within each population. The distance has the appropriate property that similar populations have distances near 0, and in particular, D_AA = 0. Notice that the distance is symmetric, so that D_AB = D_BA. This distance is suitable for drawing trees of populations to help visualize the levels of difference among the clusters (Falush et al., 2003b).
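The verbal definition of D_AB translates directly into code. In this sketch the allele frequencies are supplied by hand (STRUCTURE uses its estimated cluster frequencies), each quantity is averaged over loci, and the “average within-population heterozygosity” is assumed to be the simple mean of H_A and H_B, which is what makes D_AA = 0. The function names are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum_j p_j**2."""
    return 1.0 - sum(p * p for p in freqs.values())

def between_diff(freqs_a, freqs_b):
    """Probability that one allele from A and one from B differ at a locus."""
    shared = sum(p * freqs_b.get(allele, 0.0) for allele, p in freqs_a.items())
    return 1.0 - shared

def net_distance(pop_a, pop_b):
    """Net nucleotide distance D_AB, averaged over loci.

    pop_a and pop_b are lists with one {allele: frequency} dict per locus.
    """
    n_loci = len(pop_a)
    d_ab = sum(between_diff(a, b) for a, b in zip(pop_a, pop_b)) / n_loci
    h_a = sum(heterozygosity(a) for a in pop_a) / n_loci
    h_b = sum(heterozygosity(b) for b in pop_b) / n_loci
    return d_ab - (h_a + h_b) / 2.0

# Two clusters fixed for different alleles at a single locus
print(net_distance([{1: 1.0}], [{2: 1.0}]))  # 1.0
```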
PRINTKLD (Boolean) [Deprecated] This option is no longer available.
PRINTLAMBDA (Boolean) Print current value of λ to screen.
PRINTQSUM (Boolean) Print summary of current Q estimates to screen; this prints an average for each value of PopData.
SITEBYSITE (Boolean) (Linkage model) Print a complete summary of assignment probabilities for every genotype in the data. This is printed to a separate file with the suffix “_ss”. This file can be big!
PRINTQHAT (Boolean) When this is turned on, the point estimate for Q is printed not only into the main results file, but also into a separate file with the suffix “_q”. This file is required in order to run the companion program STRAT.
UPDATEFREQ (int) Frequency of printing updates to the screen. If this is set to 0, the frequency is chosen automatically.
PRINTLIKES (Boolean) Print the current value of the likelihood to the screen in every iteration.
INTERMEDSAVE (int) If you’re impatient to see preliminary results before the end of the run, you can have results printed to file at intervals during the MCMC run. A total of INTERMEDSAVE such files are printed, at equal intervals following the completion of the BURNIN. Turn this off by setting it to 0. The names of these files are created from the OUTFILE name.
ECHODATA (Boolean) Print a brief summary of the data set to the screen and output file. (Prints the beginnings and ends of the top and bottom lines of the input file to allow the user to check that it has been read correctly.)
ANCESTDIST (Boolean) Collect information about the distribution of Q for each individual, as well as just estimating the mean. When this is turned on, the output file includes the left- and right-hand ends of the probability intervals for each q(i). (A probability interval is the Bayesian analog of a confidence interval.) The values printed show the middle 100p% of the probability interval, where p is a number in the range 0.0 to 1.0 and is set using ANCESTPINT. The distribution of Q is estimated by recording the number of hits in each of a number of boxes between 0 and 1, to form a sort of histogram. The number of these equal-width boxes (and hence their width, 1/NUMBOXES) is set using NUMBOXES.
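The histogram approach behind ANCESTDIST can be sketched as follows: q values are binned into NUMBOXES equal-width boxes on [0, 1], and the interval ends are read off where the lower and upper tails each exceed (1 - ANCESTPINT)/2 of the samples. The function name and defaults are illustrative, and the way STRUCTURE treats box edges may differ.

```python
def probability_interval(samples, numboxes=1000, pint=0.90):
    """Central 100*pint% interval for q, read from an equal-width histogram.

    Each sample is a q value in [0, 1]; pint plays the role of ANCESTPINT
    and numboxes the role of NUMBOXES.
    """
    counts = [0] * numboxes
    for q in samples:
        box = min(int(q * numboxes), numboxes - 1)  # q == 1.0 goes in the last box
        counts[box] += 1
    tail = (1.0 - pint) / 2.0 * len(samples)  # samples to cut from each end
    width = 1.0 / numboxes

    lower, cumulative = 0.0, 0
    for i, count in enumerate(counts):
        cumulative += count
        if cumulative > tail:
            lower = i * width  # left edge of the box where the lower tail ends
            break

    upper, cumulative = 1.0, 0
    for i in range(numboxes - 1, -1, -1):
        cumulative += counts[i]
        if cumulative > tail:
            upper = (i + 1) * width  # right edge of the box where the upper tail ends
            break
    return lower, upper

samples = [0.55] * 90 + [0.05] * 5 + [0.95] * 5
print(probability_interval(samples, numboxes=10, pint=0.8))  # roughly (0.5, 0.6)
```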
 
Miscellaneous
COMPUTEPROB (Boolean) Print the log-likelihood of the data at each update, and estimate the probability of the data given K and the model (see section 5). This is used in estimating K, and is also a useful diagnostic for whether the burnin is long enough. The main reason for turning this off would be to speed up the program (∼ 10–15%).
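The estimate of the probability of the data is described in section 5. As we understand the estimator of Pritchard et al. (2000a), it takes D = -2 ln L over the post-burnin iterations and uses ln Pr(X|K) ≈ -mean(D)/2 - var(D)/8, which equals mean(ln L) - var(ln L)/2. The sketch below assumes the population variance (dividing by n); whether STRUCTURE divides by n or n - 1 is not stated here, and the function name is hypothetical.

```python
from statistics import fmean, pvariance

def estimated_ln_prob(log_likelihoods):
    """Estimate ln Pr(X | K) from post-burnin log-likelihood values.

    Uses D = -2 ln L and ln Pr(X|K) ~ -mean(D)/2 - var(D)/8.
    """
    d = [-2.0 * value for value in log_likelihoods]
    return -fmean(d) / 2.0 - pvariance(d) / 8.0

print(estimated_ln_prob([-10.0, -12.0]))  # -11.5
```

The variance term penalizes runs whose likelihood fluctuates widely, which is one reason a too-short burnin shows up in this value.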
ADMBURNIN (int) (For use when LINKAGE=1.) When using the linkage model, a short burnin with the admixture model (say 500 iterations) is strongly recommended in most circumstances. Without such a burnin, the linkage model often produces peculiar results.
Set ADMBURNIN < BURNIN. We have dropped a related parameter (NOADMBURNIN) that was in Version 1.
ALPHAPROPSD (double) The Metropolis-Hastings update step for α involves picking a value α′ from a Normal distribution with mean α and standard deviation ALPHAPROPSD > 0. The value of ALPHAPROPSD does not affect the asymptotic behaviour of the Markov chain, but may have a substantial impact on the rate of convergence. If there is a lot of information about α, small values of ALPHAPROPSD are preferable to obtain a reasonable acceptance rate. If there’s not much information about α, larger values produce better mixing.
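The trade-off described for ALPHAPROPSD shows up even on a toy target. In this sketch the target density for α is an Exponential(1) stand-in chosen for illustration (not STRUCTURE's actual posterior, which depends on Q and the prior), and mh_alpha_step is a hypothetical helper; proposals at or below 0 are rejected because α must be positive.

```python
import math
import random

def mh_alpha_step(alpha, log_post, propsd, rng):
    """One random-walk Metropolis-Hastings update for alpha.

    The proposal alpha' is drawn from Normal(alpha, propsd). Values <= 0 are
    rejected because the target density is zero there. log_post returns the
    log target density of alpha up to an additive constant.
    """
    proposal = rng.gauss(alpha, propsd)
    if proposal <= 0:
        return alpha, False
    log_ratio = log_post(proposal) - log_post(alpha)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return alpha, False

def acceptance_rate(propsd, steps=5000, seed=0):
    """Fraction of accepted moves on the stand-in Exponential(1) target."""
    rng = random.Random(seed)
    alpha, accepted = 1.0, 0
    for _ in range(steps):
        alpha, ok = mh_alpha_step(alpha, lambda a: -a, propsd, rng)
        accepted += ok
    return accepted / steps

print(acceptance_rate(0.05), acceptance_rate(5.0))  # small steps are accepted far more often
```

A high acceptance rate alone does not mean good mixing: very small steps are nearly always accepted but move α slowly, which is the balance the ALPHAPROPSD description refers to.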
STARTATPOPINFO (Boolean) Use given populations as the initial condition for population origins. (Need POPDATA==1.) This option provides a check that the Markov chain is converging properly in cases where you expected the inferred structure to match the input labels, and it did not. This option assumes that the PopData in the input file are between 1 and k, where k ≤ MAXPOPS. Individuals for whom the PopData are not in this range are initialized at random.
RANDOMIZE (Boolean) Use a different random number seed for each run, taken from the system clock. (See also SEED.)
SEED (Integer) If RANDOMIZE==0, then the simulation seed is initialized to SEED. This allows runs to be repeated exactly. If RANDOMIZE=1, then any value specified in SEED is ignored. Note that even when RANDOMIZE==1, the program output still indicates the starting seed value, so that it is possible to repeat particular runs if desired.
METROFREQ (int) Frequency of using a Metropolis-Hastings step to update Q under the admixture model. When this is used, a new proposal q(i)′ is chosen for each q(i). This proposal is sampled from the prior (i.e., q(i)′ ∼ D(α, α, ..., α)). The rationale for having this update is that it may improve mixing when α is quite small, by making it easier for individuals to jump between populations. The Metropolis-Hastings move is used once every METROFREQ iterations. If METROFREQ is set to 0, it is never used.
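A sketch of the METROFREQ move as described above: on every METROFREQ-th iteration a new q(i)′ is drawn from the Dirichlet prior, and because the proposal is the prior, the prior and proposal terms cancel in the Metropolis-Hastings ratio, leaving only the likelihood ratio. The helper names and the log_lik callback are assumptions made for illustration, as is treating iteration 0 as a move iteration.

```python
import math
import random

def uses_metropolis(iteration, metrofreq):
    """True on iterations where the proposal-from-prior move for Q is made."""
    return metrofreq > 0 and iteration % metrofreq == 0

def prior_proposal_step(q, log_lik, alpha, rng):
    """Independence Metropolis-Hastings move for one q(i), proposing from the prior.

    The proposal q(i)' ~ Dirichlet(alpha, ..., alpha) is the prior itself, so
    the acceptance ratio reduces to the likelihood ratio.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in q]
    total = sum(gammas)
    proposal = [g / total for g in gammas]
    log_ratio = log_lik(proposal) - log_lik(q)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return q, False

rng = random.Random(2)
q = [0.5, 0.5]
for it in range(1, 31):
    if uses_metropolis(it, 10):  # METROFREQ = 10: iterations 10, 20 and 30
        q, accepted = prior_proposal_step(q, lambda v: 2.0 * v[0], 0.5, rng)
```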
REPORTHITRATE (Boolean) Report acceptance rate of Metropolis update for q(i) (see METROFREQ).

7.4	Command-line changes to parameter values
In order to simplify batch runs and make it easier to run simulations involving structure, we have added command-line flags that update the values of certain parameters, over-riding the values set in mainparams. These are as follows:
-m (mainparams) Read a different parameter input file instead of mainparams.
-e (extraparams) Read a different parameter input file instead of extraparams.
-s (stratparams) Read a different parameter input file instead of stratparams. (For use with the accompanying program, STRAT, for association mapping.)
-K (MAXPOPS) Change the number of populations.
-L (NUMLOCI) Change the number of loci.
-N (NUMINDS) Change the number of individuals.
-i (input file) Read data from a different input file.
-o (output file) Print results to a different output file.
-D (SEED) (Not supported in ParallelStructure.) Initialize the random number generator using the value SEED. Note that RANDOMIZE must be set to 0 to use this option.

Thus, to over-ride one of the preset parameter values, we invoke structure and then use the relevant flag, followed by the new parameter value. The flag and new value are separated by a space. The flags can be used in any order.
For example, to change the number of assumed populations to 5, and direct the output to a file called output5, we could call structure as follows:
./structure -K 5 -o output5 -->

<parameters> 																																		
<!-- here is what the basic PS command line looks like  
parallel_structure(structure_path=STR_path,joblist='joblist1.txt',n_cpu=4,infile='example_data.txt',outpath='testResults/',numinds=987,numloci=9,printqhat=1)-->
<!-- this will create the non-configurable parts -->					
<!--  submission on comet: the invocation line and any needed thread specification  -->					
					<parameter ismandatory="1" ishidden="1" type="String">
						<name>pstructure_comet</name>
						<attributes>
						<!-- (structure_path=STR_path, joblist='joblist.txt', n_cpu=12, infile='data.txt', -->
							<format>
								<language>perl</language>		
								<code>"<![CDATA[parallelstructure_2.3.4_expanse]]>"</code>
							</format>
							<group>1</group>
						</attributes>
					</parameter>					
			
<!--  submission: the invocation line and any needed thread specification  -->
<!-- 	The submission command line will be divided into several parts. First there is the submission itself, then the main input file,
		and then 1 to 3 auxiliary files.
		
		Here is the basic protocol. 
		
-m (mainparams) Read a different parameter input file instead of mainparams. (This is auxiliary file one.)
-e (extraparams) Read a different parameter input file instead of extraparams. (This is auxiliary file two.)
-s (stratparams) Read a different parameter input file instead of stratparams. (This is auxiliary file three.)
(For use with the accompanying program, STRAT, for association mapping.)
-K (MAXPOPS) Change the number of populations.
-L (NUMLOCI) Change the number of loci.
-N (NUMINDS) Change the number of individuals.
-i (input file) Read data from a different input file.
-o (output file) Print results to a different output file.
-D (SEED) (Not supported in ParallelStructure.) Initialize the random number generator using the value SEED. Note that RANDOMIZE must be set to 0 to use this option.
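The override rule (each flag followed by its value as a separate argument, flags in any order) can be sketched as a small argument builder. The mapping covers only the flags listed above, leaves out -s (STRAT) and -D (not supported in ParallelStructure), and the helper name structure_argv is hypothetical.

```python
import shlex

# Flags from the list above, keyed by the parameter each one overrides.
FLAG_FOR = {
    "mainparams": "-m",
    "extraparams": "-e",
    "MAXPOPS": "-K",
    "NUMLOCI": "-L",
    "NUMINDS": "-N",
    "infile": "-i",
    "outfile": "-o",
}

def structure_argv(overrides, program="./structure"):
    """Build an argument list that overrides preset parameter values.

    Each flag is followed by its new value as a separate argument; structure
    accepts the flags in any order.
    """
    argv = [program]
    for key, value in overrides.items():
        if key not in FLAG_FOR:
            raise KeyError(f"no command-line flag for {key!r}")
        argv += [FLAG_FOR[key], str(value)]
    return argv

argv = structure_argv({"MAXPOPS": 5, "outfile": "output5"})
print(shlex.join(argv))  # ./structure -K 5 -o output5
```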
 -->
								
<!-- ******************************************************************************************************************************** -->
<!-- here is an example of the parallel structure command line
parallel_structure(structure_path=STR_path,joblist='joblist1.txt',n_cpu=4,infile='example_data.txt',outpath='testResults/',numinds=987,numloci=9,printqhat=1)  -->
<!-- All runs are on a single node of Expanse.

- Ask the user for the number of jobs in joblist.txt.

- Set

  threads = min(jobs,32) = CIPRES_THREADSPP

  memory = 2G if threads = 1 or
         = floor(threads*248/128)G otherwise

- Include the following in the run script.

  #SBATCH -p shared
  #SBATCH -qos=shared-cipres
  #SBATCH -N 1
  #SBATCH -*ntasks-per-node=1
  #SBATCH -*cpus-per-task=<threads>
  #SBATCH -*mem=<memory>
  ...
  srun -n 1 -*mpi=pmi2 <wrapper> ...
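The sizing rule above, written out as a Python sketch for checking values by hand. The function name is hypothetical and this is not the code CIPRES runs; the single-thread 2G case is a separate branch from the floor formula.

```python
import math

def expanse_resources(jobs):
    """Threads and memory for one shared Expanse node, per the rule above.

    threads = min(jobs, 32); memory = 2G for a single thread, otherwise
    floor(threads * 248 / 128) G.
    """
    if jobs < 1:
        raise ValueError("the joblist must contain at least one job")
    threads = min(jobs, 32)
    memory_gb = 2 if threads == 1 else math.floor(threads * 248 / 128)
    return threads, f"{memory_gb}G"

print(expanse_resources(1))   # (1, '2G')
print(expanse_resources(12))  # (12, '23G')
print(expanse_resources(50))  # (32, '62G')
```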
 -->
<!-- the hidden config files below are placeholders for now -->								
				<parameter type="String" ishidden="1" >
					<name>number_nodes</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
 						<precond>
							<language>perl</language>
							<code>$specify_threads &lt; 32</code>
						</precond>  
						<format>
							<language>perl</language>
							<code>
									"nodes=1\\n" .
									"node_exclusive=0\\n" .
									"mem=" . (($specify_threads == 1) ? 2 : int($specify_threads*(248/128))) . "G\\n" .
									"threads_per_process=$specify_threads\\n"
							</code>
						</format>
					</attributes>
				</parameter>
				
				<parameter type="String" ishidden="1" >
					<name>number_nodes2</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
 						<precond>
							<language>perl</language>
							<code>$specify_threads &gt;= 32</code>
						</precond>  
						<format>
							<language>perl</language>
							<code>
									"nodes=1\\n" .
									"node_exclusive=0\\n" .
									"mem=" . (int(32*(248/128))) . "G\\n" .
									"threads_per_process=32\\n"
							</code>
						</format>
					</attributes>
				</parameter>
				
<!--  			<parameter type="String" ishidden="1" >
					<name>number_nodes3</name>
					<attributes>
						<group>2</group>
						<paramfile>scheduler.conf</paramfile>
 						<precond>
							<language>perl</language>
							<code>$specify_threads == 48 </code>
						</precond>  
						<format>
							<language>perl</language>
							<code>
									"nodes=2\\n" .
									"node_exclusive=1\\n" .
									"threads_per_process=$specify_threads\\n"
							</code>
						</format>
					</attributes>
				</parameter> -->	
							
<!--END System CONFIGURATION FOR STRUCTURE -->
		
<!-- input file specification -->
<!-- the input file to be operated on ends the command line -->
		<parameter issimple="1" ismandatory="1" isinput="1" type="InFile">
			<name>infile</name>
			<attributes>
				<prompt>Input File (must be in proper Structure format)</prompt>
<!-- 			<format>
					<language>perl</language>
					<code>"infile"</code>
				</format>  -->
				<filenames>data.txt</filenames>
			</attributes>
		</parameter>
<!-- Results section. To start we return all results files available-->

		<parameter ishidden="1" type="Results">
			<name>pdf_results</name>
			<attributes>
				<filenames>*.pdf</filenames>
			</attributes>
		</parameter>
		
		<parameter ishidden="1" type="Results">
			<name>txt_results</name>
			<attributes>
				<filenames>*.txt</filenames>
			</attributes>
		</parameter>
		
		<parameter ishidden="1" type="Results">
			<name>txt_results2</name>
			<attributes>
				<filenames>*.TXT</filenames>
			</attributes>
		</parameter>
		
		<parameter ishidden="1" type="Results">
			<name>results_results_f</name>
			<attributes>
				<filenames>results*_f</filenames>
			</attributes>
		</parameter>
		
		<parameter ishidden="1" type="Results">
			<name>results_results_q</name>
			<attributes>
				<filenames>results*_q</filenames>
			</attributes>
		</parameter>
		
		<parameter ishidden="1" type="Results">
			<name>csv_results</name>
			<attributes>
				<filenames>*.csv</filenames>
			</attributes>
		</parameter>
		

<!-- 		<parameter ishidden="1" type="Results">
			<name>all_results</name>
			<attributes>
				<filenames>*</filenames>
			</attributes>
		</parameter>  -->
		
<!-- This section provides visible queries that help configure the interface  -->

<!-- Don't need this yet
		<parameter issimple="1" ismandatory="1" type="Excl">
			<name>which_beast</name>
			<attributes>
				<prompt>Which Version of Structure do you wish to run?</prompt>
				<vlist>
					<value>0</value>
					<label>Version 1.8.0</label>
					<value>1</value>
					<label>Version 1.8.1</label>
					<value>2</value>
					<label>Version 1.8.2</label>
					<value>3</value>
					<label>Version 1.8.3</label>
				</vlist>
				<vdef>
					<value>3</value>				
				</vdef>
				<ctrls>
					<ctrl>
						<message>Please choose a BEAST version</message>
						<language>perl</language>
						<code>!defined $which_beast</code>
					</ctrl>
				</ctrls>
				<group>4</group>
				<comment>
<value></value>
				</comment>
			</attributes>
		</parameter>  -->
		
<!-- this sets the run time -->
				<parameter type="Float" issimple="1" ismandatory="1">
					<name>runtime</name> 
					<attributes>
						<group>1</group>
						<paramfile>scheduler.conf</paramfile>
						<prompt>Maximum Hours to Run (up to 168 hours)</prompt>
						<vdef>
							<value>0.5</value>
						</vdef>
						<ctrls>
							<ctrl>
								<message>The maximum hours to run must be 168 or less</message>
								<language>perl</language>
								<code>$runtime &gt; 168.0</code>
							</ctrl>
							<ctrl>
								<message>The maximum hours to run must be at least 0.05</message>
								<language>perl</language>
								<code>$runtime &lt; 0.05</code>
							</ctrl>
						</ctrls>
						<format>
							<language>perl</language>
							<code>"runhours=$value\\n"</code>
						</format>
						<warns>
							<warn>
								<message>The job will run on $specify_threads processors as configured. If it runs for the entire configured time, it will consume $specify_threads X $runtime cpu hours</message>
								<language>perl</language>
								<code>$specify_threads &lt; 32</code>
							</warn>
							<warn>
								<message>The job will run on 32 processors as configured. If it runs for the entire configured time, it will consume 32 X $runtime cpu hours</message>
								<language>perl</language>
								<code>$specify_threads &gt; 31</code>
							</warn>
						</warns>
						<comment>
<value>Estimate the maximum time your job will need to run. We recommend testing initially with a run of 0.5 h or less, because jobs set for 0.5 h or less dependably run immediately in the "debug" queue. 
Once you are sure the configuration is correct, increase the time. Jobs longer than 0.5 h are submitted to the "normal" queue, where jobs configured for 1 or a few hours may
	run sooner than jobs configured for the full 168 hours. 
</value>
						</comment>
					</attributes>
				</parameter>
				
		<parameter issimple="1" ismandatory="1" type="InFile">
			<name>joblist_file</name>
			<attributes>
				<prompt>Joblist File</prompt>
<!-- 			<format>
					<language>perl</language>
					<code>"infile"</code>
				</format>  -->
				<filenames>joblist.txt</filenames>
				<ctrls>
					<ctrl>
						<message>Please select a joblist file</message>
						<language>perl</language>
						<code>!defined $joblist_file </code>	
					</ctrl>
				</ctrls>
			</attributes>
		</parameter>
		
		<parameter issimple="1" ismandatory="1" type="Integer">
			<name>specify_threads</name>
			<attributes>
				<prompt>How many jobs are in your job list file?</prompt>
				<ctrls>
					<ctrl>
						<message>Please specify the number of jobs in your joblist file</message>
						<language>perl</language>
						<code>!defined $specify_threads</code>	
					</ctrl>
				</ctrls>
			</attributes>
		</parameter>
		
<!-- here is what the basic command line looks like  
parallel_structure(structure_path=STR_path,joblist='joblist1.txt',n_cpu=4,infile='example_data.txt',outpath='testResults/',numinds=987,numloci=9,printqhat=1)-->
<!-- -->

<parameter type="Paragraph">
	<paragraph>
		<name>data_file_params</name>
		<prompt>Data File Configuration</prompt>
		<parameters>

<!-- apply all the model parameters selected here to all partitions-->
					<parameter type="Integer">
						<name>set_numinds</name>
						<attributes>
							<prompt>Number of individuals in the population (NUMINDS)</prompt>
							<group>3</group>
<!-- 						<precond>
									<language>perl</language>
									<code></code>
							</precond>  -->
							<format>
								<language>perl</language>
								<code>"numinds=$value,"</code>
							</format>
							<ctrls>
								<ctrl>
									<message>Please specify the number of individuals in the population</message>
									<language>perl</language>
									<code>!defined $set_numinds</code>
								</ctrl>
							</ctrls>
							<comment>
								<value>
								</value>
							</comment>
						</attributes>			
					</parameter>
					
					<parameter type="Integer">
						<name>set_numloci</name>
						<attributes>
							<prompt>Number of loci in the dataset (NUMLOCI)</prompt>
							<group>4</group>
<!-- 						<precond>
									<language>perl</language>
									<code></code>
							</precond>  -->
							<format>
								<language>perl</language>
								<code>"numloci=$value,"</code>
							</format>
							<ctrls>
								<ctrl>
									<message>Please specify the number of loci in the dataset</message>
									<language>perl</language>
									<code>!defined $set_numloci</code>
								</ctrl>
							</ctrls>
							<comment>
								<value>
								</value>
							</comment>
						</attributes>			
					</parameter>
<!-- PLOIDY (int) Ploidy of the organism.  Default is 2 (diploid). -->
					<parameter type="Integer">
						<name>set_ploidy</name>
						<attributes>
							<prompt>Ploidy of the dataset (PLOIDY)</prompt>
							<group>5</group>
							<format>
								<language>perl</language>
								<code>(defined $set_ploidy) ? "ploidy=$value," :""</code>
							</format>
							<comment>
								<value>Ploidy of the organism. Default is 2 (diploid).</value>
							</comment>
						</attributes>			
					</parameter>
<!-- 
MISSING (int) Value given to missing genotype data. Must be an integer, and must not appear elsewhere in the data set. Default is -9.-->
					<parameter type="Integer">
						<name>set_missing</name>
						<attributes>
							<prompt>Value given to missing genotype data (MISSING)</prompt>
							<group>6</group>
							<format>
								<language>perl</language>
								<code>(defined $set_missing) ? "missing=$value," :""</code>
							</format>
							<comment>
							<value>Must be an integer, and must not appear elsewhere in the data set. Default is -9.</value>
							</comment>
						</attributes>			
					</parameter>

<!-- ONEROWPERIND (Boolean) The data for each individual are arranged in a single row. E.g., for diploid data, this would mean that the two alleles for each locus are in consecutive order  in the same row, rather than being arranged in the same column, in two  consecutive  rows. See section 2 for details about input   formats. -->
					<parameter type="Switch">
						<name>set_onerowperind</name>
						<attributes>
							<prompt>The data for each individual are arranged in a single row (ONEROWPERIND)</prompt>
							<group>7</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "onerowperind=1," :""</code>
							</format>
							<comment>
							<value> ONEROWPERIND (Boolean) The data for each individual are arranged in a single row. E.g.,
for diploid data, this would mean that the two alleles for each locus are in consecutive order in the same row, rather than being arranged in the same column, in two consecutive rows.
							</value>
							</comment>
						</attributes>			
					</parameter>
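<!-- Sketch (not part of the interface): the ONEROWPERIND layout in concrete terms. For diploid data, the default layout stores
an individual's two alleles at each locus in the same column of two consecutive rows. ONEROWPERIND=1 places them next to each
other in one row. This converts the first layout into the second, assuming a hypothetical in-memory representation of the two
rows as lists of allele integers with no label, population, or extra columns.

```python
def to_one_row(row_a, row_b):
    """Merge an individual's two rows into one ONEROWPERIND-style row.

    row_a[i] and row_b[i] are the two alleles at locus i; the result lists
    them consecutively: a0, b0, a1, b1, ...
    """
    if len(row_a) != len(row_b):
        raise ValueError("both rows must cover the same loci")
    merged = []
    for first, second in zip(row_a, row_b):
        merged.extend((first, second))
    return merged
```
-->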
					
<!--LABEL (Boolean) Input file contains labels (names) for each individual. 1 = Yes; 0 = No.-->
<!--  					<parameter type="Switch">
						<name>set_labels</name>
						<attributes>
							<prompt>Input file contains labels (names) for each individual (LABEL)</prompt>
							<group>8</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "labels=1," :""</code>
							</format>
						</attributes>			
					</parameter> -->

<!-- USEPOPINFO (Boolean) Use prior population information to assign individuals to clusters. See also MIGRPRIOR and GENSBACK. 
Must have POPDATA=1. -->
    					<parameter type="Switch">
						<name>use_popinfo</name>
						<attributes>
							<prompt>Use prior population information to assign individuals to clusters (USEPOPINFO)</prompt>
							<group>23</group>
							<format> 
								<language>perl</language>
								<code>( $value) ? "usepopinfo=1," :""</code>
							</format>
							<ctrls>
								<ctrl>
									<message>To use prior population information (USEPOPINFO), you must indicate that your input file contains an indicator variable that says whether to use popinfo (POPFLAG)</message>
									<language>perl</language>
									<code>$use_popinfo &amp;&amp; !$set_popflag</code>
								</ctrl>
							</ctrls>
						</attributes>			
					</parameter>
					
<!-- POPDATA (Boolean) Input file contains a user-defined population-of-origin for each individual.
1 = Yes; 0 = No. -->
					<parameter type="Switch">
						<name>set_popdata</name>
						<attributes>
							<prompt>Input file contains a user-defined population-of-origin for each individual (POPDATA)</prompt>
							<group>9</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "popdata=1," :""</code>
							</format>
							<vdef>
								<value>1</value>
							</vdef>
							<warns>
								<warn>
									<message>ParallelStructure requires a population data column. The values can all be 1 if you do not want to use this parameter.</message>
									<language>perl</language>
									<code>!$set_popdata</code>
								</warn>
							</warns>
						</attributes>			
					</parameter>


<!--  POPFLAG (Boolean) Input file contains an indicator variable which says whether to use popinfo when USEPOPINFO==1 (see below). 1 = Yes; 0 = No.-->
					<parameter type="Switch">
						<name>set_popflag</name>
						<attributes>
							<prompt>Input file contains an indicator variable which says whether to use popinfo (POPFLAG)</prompt>
							<group>10</group>
 							<precond>
								<language>perl</language>
								<code>$use_popinfo</code>
							</precond>  
							<format>
								<language>perl</language>
								<code>( $value) ? "popflag=1," :""</code>
							</format>
						</attributes>			
					</parameter>
					
<!-- LOCDATA (Boolean) Input file contains a user-defined sampling location for each individual. 1 = Yes; 0 = No. For use in the LOCPRIOR model. Can set LOCISPOP=1 to use the POPDATA instead in the LOCPRIOR model.  -->
					<parameter type="Switch">
						<name>set_locdata</name>
						<attributes>
							<prompt>Input file contains a user-defined sampling location for each individual (LOCDATA)</prompt>
							<group>11</group>
<!-- For use in the LOCPRIOR model. Can set LOCISPOP=1 to use the POPDATA instead in the LOCPRIOR model. -->
  						<precond>
								<language>perl</language>
								<code>$use_locprior</code>
							</precond>  
							<format>
								<language>perl</language>
								<code>( $value) ? "locdata=1," :""</code>
							</format>
						</attributes>			
					</parameter>
					
<!-- PHENOTYPE (Boolean) Input file contains a column of phenotype information. 1 = Yes; 0 = No.  -->
					<parameter type="Switch">
						<name>set_phenotype</name>
						<attributes>
							<prompt>Input file contains a column of phenotype information (PHENOTYPE)</prompt>
							<group>12</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "phenotype=1," :""</code>
							</format>
						</attributes>			
					</parameter>

 
<!--  EXTRACOLS (int) Number of additional columns of data after the Phenotype before the genotype data start. These are ignored by the program. 0 = no extra  columns. -->
<!-- 				<parameter type="Integer">
						<name>set_extracols</name>
						<attributes>
							<prompt>Number of additional columns of data after the Phenotype before the genotype data start (EXTRACOLS)</prompt>
							<group>13</group>
							<format>
								<language>perl</language>
								<code>(defined $set_extracols) ? "extracols=$set_extracols," :""</code>
							</format>
							<comment>
							<value>These are ignored by the program</value>
							</comment>
						</attributes>			
					</parameter>  -->
					
<!--  MARKERNAMES (Boolean) The top row of the data file contains a list of L names corresponding to the markers used. -->
					<parameter type="Switch">
						<name>set_markernames</name>
						<attributes>
							<prompt>The top row of the data file contains a list of L names corresponding to the markers used (MARKERNAMES)</prompt>
							<group>14</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "markernames=1,":""</code>
							</format>
						</attributes>			
					</parameter>

<!--  RECESSIVEALLELES (Boolean) Next row of data file contains a list of L integers indicating which alleles are recessive at each locus. Setting this to 1 implies that the dominant marker model is in use.-->
					<parameter type="Switch">
						<name>set_recessivealleles</name>
						<attributes>
							<prompt>Next row of data file contains a list of L integers indicating which alleles are recessive at each locus (RECESSIVEALLELES)</prompt>
							<group>15</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "recessivealleles=1," :""</code>
							</format>
							<comment>
								<value>Setting this to 1 implies that the dominant marker model is in use.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- MAPDISTANCES (Boolean) The next row of the data file (or the first row if MARKERNAMES==0) contains a list of mapdistances between 
neighboring loci. Advanced data file option. -->
<!--  					<parameter type="Switch">
						<name>set_mapdistances</name>
						<attributes>
							<prompt>The next row of the data file contains a list of mapdistances between neighboring loci (MAPDISTANCES)</prompt>
							<group>16</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "mapdistances=1," :""</code>
							</format>
							<comment>
								<value>The next row of the data file (or the first row if MARKERNAMES==0) contains a list of mapdistances between neighboring loci.
Advanced data file option.</value>
							</comment>
						</attributes>			
					</parameter> -->

<!-- LINKAGE (Boolean) Use the linkage model. See Section 3.1. RLOG10START sets the initial value of recombination rate r per unit distance. 
RLOG10MIN and RLOG10MAX set the minimum and maximum allowed values for log10r. RLOG10PROPSD sets the size of the proposed changes to log10r in
each update. The front end makes some guesses about these, but some care on the part of the user is required to be sure that the values are
sensible for the  particular application. -->
    					<parameter type="Switch">
						<name>use_linkagemodel</name>
						<attributes>
							<prompt>Use the linkage model (LINKAGE)</prompt>
							<group>22</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "linkage=1," :""</code>
							</format>
							<comment>
								<value>RLOG10START sets the initial value of recombination rate r per unit distance. 
RLOG10MIN and RLOG10MAX set the minimum and maximum allowed values for log10r. RLOG10PROPSD sets the size of the proposed changes to log10r in
each update. The front end makes some guesses about these, but some care on the part of the user is required to be sure that the values are
sensible for the  particular application.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- PHASED (Boolean) For use with linkage model. Indicates that data are in correct phase. If (LINKAGE=1, PHASED=0), then PHASEINFO can be 
used – this is an extra line in the input file that gives phase probabilities. When PHASEINFO=0 each value is set to 0.5, implying no phase 
information. When the linkage model is used with polyploids, PHASED=1 is required. -->
					<parameter type="Switch">
						<name>set_phased</name>
						<attributes>
							<prompt>Indicates that data are in correct phase (PHASED)</prompt>
							<group>17</group>
<!-- For use with linkage model. -->
	  						<precond>
								<language>perl</language>
								<code>$use_linkagemodel</code>
							</precond>
							<format>
								<language>perl</language>
								<code>( $value) ? "phased=1," :""</code>
							</format>
							<ctrls>
								<ctrl>
									<message>When the linkage model is used with polyploids, PHASED=1 is required.</message>
									<language>perl</language>
									<code>$set_ploidy &gt; 2 &amp;&amp; !$set_phased </code>
								</ctrl>
							</ctrls>
							<comment>
								<value>For use with the linkage model. Indicates that data are in correct phase. If (LINKAGE=1, PHASED=0), then PHASEINFO can be used; this is an extra line in the input file that gives phase probabilities. When PHASEINFO=0 each value is set to 0.5, 
								implying no phase information. When the linkage model is used with polyploids, PHASED=1 is required.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- PHASEINFO (Boolean) The row(s) of genotype data for each individual are followed by a row of information about haplotype phase. This is for use with the linkage model only.  See sections  2 and 3.1 for further  details. -->
					<parameter type="Switch">
						<name>set_phaseinfo</name>
						<attributes>
							<prompt>The row(s) of genotype data for each individual are followed by a row of information about haplotype phase (PHASEINFO)</prompt>
							<group>18</group>
<!-- For use with linkage model. -->
	  						<precond>
								<language>perl</language>
								<code>$use_linkagemodel &amp;&amp; !$set_phased</code>
							</precond>
							<format>
								<language>perl</language>
								<code>( $value) ? "phaseinfo=1," :""</code>
							</format>
							<comment>
								<value>The row(s) of genotype data for each individual are followed by a row of information about haplotype phase. This is for use with the linkage model only. See sections 2 and 3.1 of the STRUCTURE manual for further details.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- MARKOVPHASE (Boolean) The phase information follows a Markov model. See sections 2.2  and  9.6  for details.-->
<!--  					<parameter type="Switch">
						<name>set_markovphase</name>
						<attributes>
							<prompt>The phase information follows a Markov model (MARKOVPHASE)</prompt>
							<group>19</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "markovphase=1," :""</code>
							</format>
							<comment>
								<value>See sections 2.2 and 9.6 for details.</value>
							</comment>
						</attributes>			
					</parameter> -->
					
<!-- NOTAMBIGUOUS (int) For use with polyploids when RECESSIVEALLELES=1. Defines the code indicating that genotype data at a marker are unambiguous.  Must not match MISSING or any allele value in the  data.-->
<!--  					<parameter type="Integer">
						<name>set_notambiguous</name>
						<attributes>
 							<prompt>Provide an integer that indicates genotype data at a marker are unambiguous (NOTAMBIGUOUS)</prompt> 
  							<group>20</group> 
 For use with polyploids when RECESSIVEALLELES=1. 
 							<precond>
								<language>perl</language>
								<code>$set_recessivealleles &amp;&amp; $set_ploidy &gt; 2 </code>
							</precond>  
							<format>
								<language>perl</language>
								<code>(defined $value) ? "notambiguous=$value," :""</code>
							</format>
							<comment>
							<value>Structure allows the data to consist of a mixture of loci for which there is, and isn’t genotypic ambiguity. 
							If some loci are not ambiguous, set the code NOTAMBIGUOUS to an integer that does not match any of the alleles in 
							the data, and that does not equal MISSING. Then in the recessive alleles line at the top of the input file put the 
							NOTAMBIGUOUS code for the unambiguous loci. If instead, at a particular locus the alleles are all codominant, but 
							there is ambiguity about the number of each (eg for microsatellites in a tetraploid) then set the recessive allele 
							code to MISSING. Finally, if there is a recessive allele, and there is also ambiguity about the number of each allele, 
							then set the recessive allele code to indicate which allele is recessive. Coding of alleles where there is copy number
						 	ambiguity is analogous to that where there are dominant markers. So for example in a tetraploid where three codominant
							alleles B, C and D are observed, this should be coded as B C D D or equivalently B B C D or any other combination including
							each of the three alleles. It should not be coded as B C D (MISSING), as this indicates that the particular individual
							is triploid at the locus in question. Nor should it be coded B C D A if there is a recessive allele A at the locus.
							For use with polyploids when RECESSIVEALLELES=1.</value>
							</comment>
						</attributes>			
					</parameter> -->
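<!-- Sketch (not part of the interface): the copy-number coding rule in the NOTAMBIGUOUS notes above. It pads by repeating
the last observed allele. The notes say any combination that includes each observed allele is equivalent, so this is one
valid choice among several. Alleles are shown as labels rather than STRUCTURE's integer codes for readability.
code_ambiguous_genotype is hypothetical.

```python
def code_ambiguous_genotype(observed, ploidy):
    """Code a genotype with copy-number ambiguity for a polyploid individual.

    Every observed allele appears at least once, and the list is padded to
    `ploidy` entries by repeating the last observed allele. Three alleles
    B, C, D in a tetraploid therefore become B C D D, not B C D plus MISSING.
    """
    distinct = list(dict.fromkeys(observed))  # drop repeats, keep first-seen order
    if not distinct:
        raise ValueError("at least one allele must be observed")
    if len(distinct) > ploidy:
        raise ValueError("more distinct alleles than the ploidy allows")
    return distinct + [distinct[-1]] * (ploidy - len(distinct))
```
-->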
					
			</parameters>		
	</paragraph>				
</parameter>

<parameter type="Paragraph">
	<paragraph>
		<name>second_options</name>
		<prompt>Run Configuration Options (file extraparams)</prompt>
			<parameters>
<!-- NOADMIX (Boolean) Assume the model without admixture (Pritchard et al., 2000a). (Each individual is assumed to be completely from one of the K
 populations.) In the output, instead of printing the average value of Q as in the admixture case, the program prints the posterior probability that
  each individual is from each population. 1 = no admixture; 0 = model with admixture. -->
  					<parameter type="Switch">
						<name>set_noadmix</name>
						<attributes>
							<prompt>Assume the model without admixture (NOADMIX)</prompt>
							<group>21</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "noadmix=1," :""</code>
							</format>
							<comment>
								<value>Each individual is assumed to be completely from one of the K populations. In the output, instead of printing the average value of Q as in the admixture case, the program prints the posterior probability that
  each individual is from each population. 1 = no admixture; 0 = model with admixture.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- LOCPRIOR (Boolean) Use location information to improve the performance on data that are weakly informative about structure.-->
    					<parameter type="Switch">
						<name>use_locprior</name>
						<attributes>
							<prompt>Use location information to improve the performance on data that are weakly informative about structure (LOCPRIOR)</prompt>
							<group>24</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "locprior=1," :""</code>
							</format>
						</attributes>			
					</parameter>
					
<!-- INFERALPHA (Boolean) Infer the value of the model parameter α from the data; otherwise   α is fixed at the value ALPHA which is chosen by the 
user.  This option is ignored under  the NOADMIX model. (The  prior  for  the ancestry  vector Q is Dirichlet  with  parameters (α, α, ..., α). 
Small α implies that most individuals are essentially from one population or another, while alpha > 1 implies that most individuals are admixed.) -->
    					<parameter type="Switch">
						<name>use_inferalpha</name>
						<attributes>
							<prompt>Infer the value of the model parameter alpha from the data (INFERALPHA)</prompt>
							<group>27</group>
  	  						<precond>
								<language>perl</language>
								<code>!$set_noadmix</code>
							</precond>
							<format>
								<language>perl</language>
								<code>( $value) ? "inferalpha=1," :""</code>
							</format>
							<comment>
								<value>Infer the value of the model parameter α from the data; otherwise α is fixed at the value ALPHA chosen by the user. This option is ignored under the NOADMIX model. 
(The prior for the ancestry vector Q is Dirichlet with parameters (α, α, ..., α). Small α implies that most individuals are essentially from one population or another, 
while α &gt; 1 implies that most individuals are admixed.)</value>
							</comment>
						</attributes>			
					</parameter>
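<!-- Sketch (not part of the interface): the effect of α in the Dirichlet(α, ..., α) prior on Q described in the INFERALPHA
notes can be seen by sampling from it. This uses only the standard library: a Dirichlet draw is built by normalizing
independent Gamma(α, 1) draws. K = 3, the sample size and the seed are arbitrary illustration values.

```python
import random


def sample_q(alpha: float, k: int, rng: random.Random) -> list:
    """Draw one ancestry vector Q from a symmetric Dirichlet(alpha, ..., alpha).

    Uses the standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_max_share(alpha: float, k: int = 3, n: int = 2000, seed: int = 1) -> float:
    """Average of the largest entry of Q over n draws (near 1 means little admixture)."""
    rng = random.Random(seed)
    return sum(max(sample_q(alpha, k, rng)) for _ in range(n)) / n


for a in (0.1, 1.0, 5.0):
    print(a, round(mean_max_share(a), 2))
```

With small α, the largest share is typically close to 1 (individuals mostly from one cluster). With α well above 1, it
moves toward 1/K (most individuals admixed).
-->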

<!-- POPALPHAS (Boolean) Infer a separate alpha for each population. Not recommended in most cases but may be useful for situations with asymmetric 
admixture. -->
    					<parameter type="Switch">
						<name>use_popalphas</name>
						<attributes>
							<prompt>Infer a separate α for each population (POPALPHAS)</prompt>
							<group>28</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "popalphas=1," :""</code>
							</format>
							<comment>
								<value>Not recommended in most cases but may be useful for situations with asymmetric admixture.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- ALPHA (double) Dirichlet parameter (alpha) for degree of admixture (this is the initial value if INFERALPHA==1). -->
    					<parameter type="Float">
						<name>set_alpha</name>
						<attributes>
							<prompt>Dirichlet parameter (α) for degree of admixture (ALPHA)</prompt>
							<group>29</group>
	  	  					<precond>
								<language>perl</language>
								<code>!$set_noadmix</code>
							</precond>   
							<format>
								<language>perl</language>
								<code>($set_alpha) ? "alpha=$value,":""</code>
							</format>
							<comment>
								<value>Dirichlet parameter (α) for degree of admixture (this is the initial value if INFERALPHA==1).</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- UNIFPRIORALPHA (Boolean), ALPHAMAX (double) Assume a uniform prior for α which runs between 0 and ALPHAMAX. This model seems to work fine; 
the alternative model (when UNIFPRIORALPHA=0) is to take α as having a Gamma prior, with mean ALPHAPRIORA × ALPHAPRIORB, and variance 
ALPHAPRIORA × ALPHAPRIORB^2. -->
<!--  					<parameter type="Switch">
						<name>use_unifprioalpha</name>
						<attributes>
							<prompt>Assume a uniform prior for alpha which runs between 0 and ALPHAMAX (UNIFPRIORALPHA)</prompt>
							<group>35</group>
  	  					<precond>
								<language>perl</language>
								<code>!$set_noadmix</code>
							</precond>   
							<format>
								<language>perl</language>
								<code>( $value) ? "unifprioalpha=1," :""</code>
							</format>
							<comment>
<value>ALPHAMAX (double) Assume a uniform prior for α which runs between 0 and ALPHAMAX. This model seems to work fine; the alternative model 
(when UNIFPRIORALPHA=0) is to take α as having a Gamma prior, with mean ALPHAPRIORA × ALPHAPRIORB, and variance ALPHAPRIORA × ALPHAPRIORB^2.</value>
							</comment>
						</attributes>			
					</parameter> -->
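<!-- Sketch (not part of the interface): the two priors for α described above, written out. This restates the comment,
reading ALPHAPRIORA as the Gamma shape k and ALPHAPRIORB as its scale θ. That is the parameterization that gives the
stated mean and variance.

```latex
\text{UNIFPRIORALPHA}=1:\quad \alpha \sim \mathrm{Uniform}(0,\ \mathrm{ALPHAMAX})

\text{UNIFPRIORALPHA}=0:\quad \alpha \sim \mathrm{Gamma}(k=\mathrm{ALPHAPRIORA},\ \theta=\mathrm{ALPHAPRIORB}),
\qquad \mathbb{E}[\alpha]=k\theta,\qquad \mathrm{Var}[\alpha]=k\theta^{2}
```
-->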
					
<!-- LAMBDA (double) parameterizes the allele frequency prior, and for most data the default value of 1 seems to work pretty well. 
If the frequencies at most markers are very skewed towards low/high frequencies, a smaller value of λ may potentially lead to better performance. 
It doesn’t seem to work very well to estimate λ at the same time as the other hyperparameters, α and F.
Prior settings such as λ are used to parameterize the assumed probability models. In most cases the default settings should be fairly sensible and you may not 
want to worry about these. -->

<!--   				<parameter type="Float">
						<name>set_lambda</name>
						<attributes>
							<prompt>Parameterize the allele frequency prior (LAMBDA)</prompt>
							<group>32</group>
							<format>
								<language>perl</language>
								<code>(defined $set_lambda) ? "lambda=$value,":""</code>
							</format>
							<vdef>
								<value>1</value>
							</vdef>
							<warns>
								<warn>
									<message>The use of lambda with alpha or F does not work out well usually</message>
									<language>perl</language>
									<code>$set_lambda &amp;&amp; $set_alpha</code>
								</warn>
							</warns>
							<comment>
								<value>LAMBDA (double) parameterizes the allele frequency prior, and for most data the default value of 1 seems to work pretty well. 
If the frequencies at most markers are very skewed towards low/high frequencies, a smaller value of λ may potentially lead to better performance. 
It doesn’t seem to work very well to estimate λ at the same time as the other hyperparameters, α and F.
Prior settings such as λ are used to parameterize the assumed probability models. In most cases the default settings should be fairly sensible and you may not 
want to worry about these</value>
							</comment>
						</attributes>			
					</parameter> --> 
					
<!-- INFERLAMBDA (Boolean) Infer a suitable value for lambda. Not recommended for most analyses. -->
 <!--    					<parameter type="Switch">
						<name>use_inferlambda</name>
						<attributes>
							<prompt>Infer a suitable value for lambda (INFERLAMBDA)</prompt>
							<group>30</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "inferlambda=1," :""</code>
							</format>
							<comment>
								<value>Not recommended for most analyses.</value>
							</comment>
						</attributes>			
					</parameter>  -->
					
<!-- POPSPECIFICLAMBDA (Boolean) Infer a separate lambda for each population. -->
<!--     					<parameter type="Switch">
						<name>use_popspecificlambda</name>
						<attributes>
							<prompt>Infer a separate lambda for each population (POPSPECIFICLAMBDA)</prompt>
							<group>31</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "popspecificlambda=1," :""</code>
							</format>
						</attributes>			
					</parameter> -->
					
<!-- FREQSCORR (double) Use the “F model”, in which the allele frequencies are correlated across populations (Falush et al., 2003a).  More
specifically, rather than assuming a prior in which the allele frequencies in each population are independent draws from a uniform Dirichlet 
distribution, we start with a distribution which is centered around the mean allele frequencies in the sample. This model is more realistic for
very closely related populations (where we expect the allele frequencies to be similar across populations), and can produce better clustering 
(section 3.2). The prior of Fk is set using FPRIORMEAN, and FPRIORSD. There may be a tendency to overestimate K when FREQSCORR is turned on.-->
     					<parameter type="Float">
						<name>use_freqscorr</name>
						<attributes>
							<prompt>Use the F model, in which the allele frequencies are correlated across populations (FREQSCORR)</prompt>
							<group>25</group>
							<format>
								<language>perl</language>
								<code>(defined $use_freqscorr) ? "freqscorr=$value," :""</code>
							</format>
							<comment>
								<value>FREQSCORR (double) Use the “F model”, in which the allele frequencies are correlated across populations (Falush et al., 2003a).  More
specifically, rather than assuming a prior in which the allele frequencies in each population are independent draws from a uniform Dirichlet 
distribution, we start with a distribution which is centered around the mean allele frequencies in the sample. This model is more realistic for
very closely related populations (where we expect the allele frequencies to be similar across populations), and can produce better clustering 
(section 3.2). The prior of Fk is set using FPRIORMEAN, and FPRIORSD. There may be a tendency to overestimate K when FREQSCORR is turned on.</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- FPRIORMEAN (double) See FREQSCORR. The prior for Fk is taken to be Gamma with mean FPRIORMEAN, and standard deviation FPRIORSD. 
Our default settings place a lot of weight on small values of F . We find that this makes the algorithm sensitive to subtle structure, but at 
some increased risk of overestimating K (Falush et al., 2003a). -->
    					<parameter type="Float">
						<name>set_fpriormean</name>
						<attributes>
							<prompt>Set the mean of the Gamma prior for Fk (FPRIORMEAN)</prompt>
							<group>33</group>
							<format>
								<language>perl</language>
								<code>(defined $set_fpriormean) ? "fpriormean=$value,":""</code>
							</format>
							<comment>
								<value>The prior for Fk is taken to be Gamma with mean FPRIORMEAN, and standard deviation FPRIORSD. 
Our default settings place a lot of weight on small values of F. We find that this makes the algorithm sensitive to subtle structure, but at 
some increased risk of overestimating K (Falush et al., 2003a)</value>
							</comment>
						</attributes>			
					</parameter>
					
<!-- FPRIORSD (double) See FREQSCORR. The prior for Fk is taken to be Gamma with mean FPRIORMEAN, and standard deviation FPRIORSD. 
Our default settings place a lot of weight on small values of F . We find that this makes the algorithm sensitive to subtle structure, but at 
some increased risk of overestimating K (Falush et al., 2003a). -->
    					<parameter type="Float">
						<name>set_fpriorsd</name>
						<attributes>
							<prompt>Set the standard deviation of the Gamma prior for Fk (FPRIORSD)</prompt>
							<group>33</group>
							<format>
								<language>perl</language>
								<code>(defined $set_fpriorsd) ? "fpriorsd=$value,":""</code>
							</format>
							<comment>
								<value>The prior for Fk is taken to be Gamma with mean FPRIORMEAN, and standard deviation FPRIORSD. 
Our default settings place a lot of weight on small values of F. We find that this makes the algorithm sensitive to subtle structure, but at 
some increased risk of overestimating K (Falush et al., 2003a)</value>
							</comment>
						</attributes>			
					</parameter>
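<!-- Sketch (not part of the interface): FPRIORMEAN and FPRIORSD describe the Gamma prior on Fk by its mean and standard
deviation. This is the standard conversion to shape/scale form, which can help when judging a given pair of values. A
shape below 1 makes the density rise without bound toward F = 0, which is one way to see how a prior can "place a lot of
weight on small values of F". The helper name and the example values are hypothetical, not the program's defaults.

```python
def gamma_shape_scale(mean: float, sd: float) -> tuple:
    """Convert a Gamma prior's mean and standard deviation to (shape, scale).

    With shape k and scale theta, mean = k * theta and variance = k * theta**2,
    so k = (mean / sd)**2 and theta = sd**2 / mean.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    return (mean / sd) ** 2, sd ** 2 / mean


shape, scale = gamma_shape_scale(0.01, 0.05)  # example values only
print(shape, scale)  # shape is below 1 here, so the density rises toward F = 0
```
-->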
					
<!-- ONEFST (Boolean) Assume the same value of Fk for all populations (analogous to Wright’s traditional FST ). This is not recommended for most 
data, because in practice you probably expect different levels of divergence in each population. When K = 2 it may sometimes be difficult to 
estimate two values of FST separately (but see Harter et al. (2004)). When you’re trying to estimate K, you should use the same model for all K 
(we suggest ONEFST=0). -->
    					<parameter type="Switch">
						<name>use_onefst</name>
						<attributes>
							<prompt>Assume the same value of Fk for all populations (ONEFST)</prompt>
							<group>26</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "onefst=1," :""</code>
							</format>
							<comment>
								<value>Assume the same value of Fk for all populations (analogous to Wright’s traditional FST). This is not recommended for most 
data, because in practice you probably expect different levels of divergence in each population. When K = 2 it may sometimes be difficult to 
estimate two values of FST separately (but see Harter et al. (2004)). When you’re trying to estimate K, you should use the same model for all K 
(we suggest ONEFST=0).</value>
							</comment>
						</attributes>			
					</parameter>			


<!-- LOG10RMIN, LOG10RMAX, LOG10PROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior
 on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being
 used. -->

<!-- 					<parameter type="Float">
						<name>set_log10rmin</name>
						<attributes>
							<prompt>Set min prior for switch rate r (LOG10RMIN)</prompt>
							<group>36</group>
							<format>
								<language>perl</language>
								<code>($set_log10rmin) ? "log10rmin=$value,":""</code>
							</format>
							<comment>
								<value>LOG10RMIN, LOG10RMAX, LOG10PROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior
 on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being
 used.</value>
							</comment>
						</attributes>			
					</parameter>	 -->

<!--  					<parameter type="Float">
						<name>set_log10rmax</name>
						<attributes>
							<prompt>Set max prior for switch rate r (LOG10RMAX)</prompt>
							<group>37</group>
							<format>
								<language>perl</language>
								<code>($set_log10rmax) ? "log10rmax=$value,":""</code>
							</format>
							<comment>
								<value>LOG10RMIN, LOG10RMAX, LOG10PROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior
 on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being
 used.</value>
							</comment>
						</attributes>			
					</parameter> -->
					
<!-- 					<parameter type="Float">
						<name>set_log10rstart</name>
						<attributes>
							<prompt>Set start value for switch rate r (LOG10RSTART)</prompt>
							<group>39</group>
							<format>
								<language>perl</language>
								<code>($set_log10rstart) ? "log10rstart=$value,":""</code>
							</format>
							<comment>
								<value>LOG10RMIN, LOG10RMAX, LOG10PROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior
 on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being
 used.</value>
							</comment>
						</attributes>			
					</parameter>	 -->

<!-- 					<parameter type="Float">
						<name>set_log10propsd</name>
						<attributes>
							<prompt>Set standard deviation for switch rate r (LOG10PROPSD)</prompt>
							<group>38</group>
							<format>
								<language>perl</language>
								<code>($set_log10propsd) ? "log10propsd=$value,":""</code>
							</format>
							<comment>
								<value>LOG10RMIN, LOG10RMAX, LOG10PROPSD, LOG10RSTART (double) When the linkage model is used, the switch rate r is taken to have a uniform prior
 on a log scale, between LOG10RMIN and LOG10RMAX. These values need to be set by the user to make sense in terms of the scale of map units being
 used.</value>
							</comment>
						</attributes>			
					</parameter>	 -->	
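<!-- Illustrative sketch only, not STRUCTURE's code; the function names and example bounds are hypothetical. It shows what "a uniform prior on a log scale between LOG10RMIN and LOG10RMAX" means for the switch rate r: log10(r) is drawn uniformly, so each decade of r receives equal prior mass, and the density of r itself falls off as 1/r.

```python
import math
import random


def sample_switch_rate(log10rmin: float, log10rmax: float, rng: random.Random) -> float:
    """Draw r such that log10(r) is uniform on [log10rmin, log10rmax]."""
    if log10rmin >= log10rmax:
        raise ValueError("LOG10RMIN must be smaller than LOG10RMAX")
    return 10 ** rng.uniform(log10rmin, log10rmax)


def switch_rate_density(r: float, log10rmin: float, log10rmax: float) -> float:
    """Prior density of r itself (zero outside the allowed range)."""
    if r <= 0:
        return 0.0
    x = math.log10(r)
    if x < log10rmin or x > log10rmax:
        return 0.0
    # Change of variables: p(r) = p(x) * dx/dr = 1 / ((max - min) * r * ln 10)
    return 1.0 / ((log10rmax - log10rmin) * r * math.log(10))


# Hypothetical bounds: r between 1e-4 and 10.
rng = random.Random(42)
draws = [sample_switch_rate(-4.0, 1.0, rng) for _ in range(10_000)]
below_mid = sum(r < 10 ** -1.5 for r in draws) / len(draws)
print(f"{below_mid:.2f}")  # roughly half fall below the log-scale midpoint
```

Because the bounds live on the log10 scale, they have to be chosen with the units of the genetic map in mind, as the comment above notes.
-->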
 
<!-- GENSBACK (int) This corresponds to G (Pritchard et al., 2000a). When using prior population information for individuals (USEPOPINFO=1), the 
program tests whether each individual has an immigrant ancestor in the last G generations, where G = 0 corresponds to the individual being an 
immigrant itself. In order to have decent power, G should be set fairly small (2, say) unless the data are highly informative.-->
<!--					<parameter type="Integer">
						<name>set_gensback</name>
						<attributes>
							<prompt>Set value for G (GENSBACK)</prompt>
							<group>40</group>
 	  	  				
							<precond>
								<language>perl</language>
								<code>$use_popinfo</code>
							</precond>    
							<format>
								<language>perl</language>
								<code>($set_gensback) ? "gensback=$value,":""</code>
							</format>
							<comment>
								<value>This corresponds to G (Pritchard et al., 2000a). When using prior population information for individuals (USEPOPINFO=1), the 
program tests whether each individual has an immigrant ancestor in the last G generations, where G = 0 corresponds to the individual being an 
immigrant itself. In order to have decent power, G should be set fairly small (2, say) unless the data are highly informative.</value>
							</comment>
						</attributes>			
					</parameter>	-->

<!-- MIGRPRIOR (double) Must be in [0,1]. This is ν in Pritchard et al. (2000a). Sensible values might be in the range 0.001 to 0.1. -->
<!--  					<parameter type="Float">
						<name>set_migrprior</name>
						<attributes>
							<prompt>Set migration prior (MIGRPRIOR)</prompt>
							<group>41</group>
							<format>
								<language>perl</language>
								<code>(defined $set_migrprior) ? "migrprior=$value,":""</code>
							</format>
							<ctrls>
								<ctrl>
									<message>Please enter a value between 0 and 1 (inclusive)</message>
									<language>perl</language>
									<code>$set_migrprior &gt; 1 || $set_migrprior &lt; 0 </code>
								</ctrl>
							</ctrls>
							<warns>
								<warn>
									<message>The value you have entered for migration prior is outside the recommended range</message>
									<language>perl</language>
									<code>defined $set_migrprior &amp;&amp; ($set_migrprior &gt; 0.1 || $set_migrprior &lt; 0.001) </code>
								</warn>
							</warns>
							<comment>
								<value>MIGRPRIOR (double) Must be in [0,1]. This is ν in Pritchard et al. (2000a). Sensible values might be in the range 0.001 to 0.1.</value>
							</comment>
						</attributes>			
					</parameter>	-->
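<!-- Illustrative sketch only; the function name is hypothetical. It is a Python rendering of the two-tier check in the commented-out MIGRPRIOR parameter above: values outside [0, 1] are rejected outright, while values inside [0, 1] but outside the suggested 0.001 to 0.1 range only trigger a warning. Unlike the Perl ctrl/warn pair, which would both fire for a value above 1, this sketch reports only the more severe problem.

```python
def check_migrprior(value: float) -> list[str]:
    """Return error/warning messages for a proposed MIGRPRIOR value (empty if fine)."""
    if value < 0 or value > 1:
        return ["error: MIGRPRIOR must be in [0, 1]"]
    if value < 0.001 or value > 0.1:
        return ["warning: MIGRPRIOR is outside the suggested range 0.001 to 0.1"]
    return []


for v in (0.05, 0.5, 1.5):
    print(v, check_migrprior(v))
```
-->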

<!-- PFROMPOPFLAGONLY (Boolean) This option, new with version 2.0, makes it possible to update the allele frequencies, P, using only a prespecified
subset of the individuals. To use this, include a POPFLAG column, and set POPFLAG=1 for individuals who should be used to update P, and POPFLAG=0 
for individuals who should not be used to update P. This can be used either with or without USEPOPINFO turned on. 
This option will be useful, for example, if you have a standard reference set of individuals from known populations and want to estimate the ancestry of some unknown individuals. Using this option, the q estimate for each unknown individual depends only on the reference set, and not on the other unknown individuals in the sample. This property is sometimes desirable. -->
<!--   					<parameter type="Switch">
						<name>use_pfrompopflagonly</name>
						<attributes>
							<prompt>Update the allele frequencies, P, using only a prespecified subset of the individuals (PFROMPOPFLAGONLY)</prompt>
							<group>42</group>
	  						<precond>
								<language>perl</language>
								<code>$set_popflag</code>
							</precond>
							<format>
								<language>perl</language>
								<code>( $value) ? "pfrompopflagonly=1," :""</code>
							</format>
							<comment>
<value>This option, new with version 2.0, makes it possible to update the allele frequencies, P, using only a prespecified subset of the individuals. 
To use this, include a POPFLAG column, and set POPFLAG=1 for individuals who should be used to update P, and POPFLAG=0 for individuals who should 
not be used to update P. This can be used either with or without USEPOPINFO turned on. This option will be useful, for example, if you have a 
standard reference set of individuals from known populations and want to estimate the ancestry of some unknown individuals. Using this 
option, the q estimate for each unknown individual depends only on the reference set, and not on the other unknown individuals in the sample. 
This property is sometimes desirable.</value>
							</comment>
						</attributes>			
					</parameter>  -->
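<!-- Illustrative sketch only. STRUCTURE updates P inside its MCMC rather than by direct allele counting, so this is not what the program computes; it only illustrates which individuals are allowed to contribute to P when PFROMPOPFLAGONLY=1. The function name, the genotype layout (one pair of alleles per diploid individual at a single locus), and the missing-data code are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes, popflags, missing=-9):
    """Allele frequencies at one locus, counted only over individuals with POPFLAG == 1.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    popflags:  list of 0/1 flags, same length as genotypes.
    missing:   hypothetical missing-allele code; set it to match your data.
    """
    if len(genotypes) != len(popflags):
        raise ValueError("need one POPFLAG per individual")
    counts = Counter()
    for (a1, a2), flag in zip(genotypes, popflags):
        if flag != 1:
            continue  # POPFLAG=0 individuals do not inform P
        for allele in (a1, a2):
            if allele != missing:
                counts[allele] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


genos = [(101, 101), (101, 103), (103, 103), (105, 105)]
flags = [1, 1, 1, 0]  # the last individual is an "unknown"
print(allele_frequencies(genos, flags))  # {101: 0.5, 103: 0.5}
```

The unknown individual's allele 105 never enters the estimate, mirroring the comment above: with this option, reference individuals alone determine P.
-->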

<!-- LOCISPOP (Boolean) This option instructs the program to use the PopData column in the input file as location data when the LOCPRIOR model is 
turned on. When LOCISPOP=0, the program requires a LocData column to use LOCPRIOR. -->
 <!-- 					<parameter type="Switch">
						<name>use_locispop</name>
						<attributes>
							<prompt>Use the PopData Column for Location data (LOCISPOP)</prompt>
							<group>43</group>
 locprior must be 1 
	  	  					<precond>
								<language>perl</language>
								<code>$use_locprior</code>
							</precond>
							<format>
								<language>perl</language>
								<code>( $value) ? "locispop=1," :""</code>
							</format>
							<comment>
<value>This option instructs the program to use the PopData column in the input file as location data when the LOCPRIOR model is 
turned on. When LOCISPOP=0, the program requires a LocData column to use LOCPRIOR. </value>
							</comment>
						</attributes>			
					</parameter> -->

<!-- LOCPRIORINIT (double) Initial value for the LOCPRIOR parameter r, which parameterizes how informative the populations are (Hubisz et al., 2009). 
We found that LOCPRIORINIT=1 helped achieve good convergence. -->
<!--  					<parameter type="Float">
						<name>set_locpriorinit</name>
						<attributes>
							<prompt>Initial value for the LOCPRIOR parameter r (LOCPRIORINIT)</prompt>
							<group>44</group>
							<format>
								<language>perl</language>
								<code>(defined $set_locpriorinit) ? "locpriorinit=$value,":""</code>
							</format>
							<vdef>
								<value>1</value>
							</vdef>
							<comment>
								<value>Initial value for the LOCPRIOR parameter r, which parameterizes how informative the populations are (Hubisz et al., 2009). 
We found that LOCPRIORINIT=1 helped achieve good convergence.</value>
							</comment>
						</attributes>			
					</parameter> -->

<!-- MAXLOCPRIOR (double) Range of r is from (0,MAXLOCPRIOR). We suggest MAXLOCPRIOR=20. -->
<!--  					<parameter type="Float">
						<name>set_maxlocprior</name>
						<attributes>
							<prompt>Maximum value for the LOCPRIOR parameter r (MAXLOCPRIOR)</prompt>
							<group>45</group>
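							<precond>
								<language>perl</language>
								<code>$use_locprior</code>
							</precond>    
							<format>
								<language>perl</language>
								<code>(defined $set_maxlocprior) ? "maxlocprior=$value,":""</code>
							</format>
							<vdef>
								<value>20</value>
							</vdef>
							<comment>
								<value>The range of the LOCPRIOR parameter r is (0, MAXLOCPRIOR). We suggest MAXLOCPRIOR=20.</value>
							</comment>
						</attributes>			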
					</parameter> -->


		</parameters>
	</paragraph>
</parameter>

<!--  Output options  -->
<parameter type="Paragraph">
	<paragraph>
		<name>output_options</name>
		<prompt>Output Options</prompt>
			<parameters>
			
<!-- PRINTNET (Boolean) Print the “net nucleotide distance” between clusters. The distance between populations A and B is denoted DAB. 
In words, the net nucleotide distance is the average probability that a pair of alleles, one from each of populations A and B, are different, 
less the average of the within-population heterozygosities. Perhaps more intuitively, this can be thought of as the average amount of pairwise
difference between alleles from different populations, beyond the amount of variation found within each population. 
The distance has the appropriate property that similar populations have distances near 0, and in particular, DAA = 0. 
Notice that the distance is symmetric, so that DAB = DBA. This distance is suitable for drawing trees of populations to help visualize the 
levels of difference among the clusters (Falush et al., 2003b). -->

<!-- 					<parameter type="Switch">
						<name>set_printnet</name>
						<attributes>
							<prompt>Print the net nucleotide distance between clusters (PRINTNET)</prompt>
							<group>70</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "printnet=1," :""</code>
							</format>
							<comment>
								<value>The net nucleotide distance between populations A and B is denoted DAB. In words, it is the average probability that a pair of alleles, 
								one from each of populations A and B, are different, less the average of the within-population heterozygosities. Perhaps more intuitively, this can be thought 
								of as the average amount of pairwise difference between alleles from different populations, beyond the amount of variation found within each 
								population. The distance has the appropriate property that similar populations have distances near 0, and in particular, DAA = 0. Notice that the distance 
								is symmetric, so that DAB = DBA. This distance is suitable for drawing trees of populations to help visualize the levels of difference among the clusters (Falush et al., 2003b).</value>
							</comment>
						</attributes>			
					</parameter>  -->
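<!-- Illustrative sketch only, following the verbal definition in the PRINTNET comment above: per locus, DAB is the probability that one allele drawn from each cluster differs, minus the mean of the two within-cluster heterozygosities (1 minus the sum of squared allele frequencies). Averaging over loci, the function name, and the data layout are assumptions, not a description of STRUCTURE's internals. Algebraically the per-locus value reduces to half the sum of squared allele-frequency differences, which makes DAA = 0 and DAB = DBA easy to see.

```python
def net_distance(freqs_a, freqs_b):
    """Net nucleotide distance between clusters A and B, averaged over loci.

    freqs_a, freqs_b: one dict per locus, mapping allele to its frequency in that cluster.
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("both clusters need the same, nonzero number of loci")
    total = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        alleles = set(pa) | set(pb)
        # Probability that one allele from A and one from B differ.
        d_ab = 1.0 - sum(pa.get(x, 0.0) * pb.get(x, 0.0) for x in alleles)
        # Within-cluster heterozygosities: 1 minus the sum of squared frequencies.
        h_a = 1.0 - sum(p * p for p in pa.values())
        h_b = 1.0 - sum(p * p for p in pb.values())
        total += d_ab - (h_a + h_b) / 2
    return total / len(freqs_a)


cluster_a = [{"x": 0.9, "y": 0.1}]
cluster_b = [{"x": 0.2, "y": 0.8}]
print(round(net_distance(cluster_a, cluster_b), 6))  # 0.49
print(round(net_distance(cluster_a, cluster_a), 6))  # 0.0
```
-->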
<!--PRINTLAMBDA (Boolean) Print current value of λ to screen. Not supported by CIPRES. -->

<!--PRINTQSUM (Boolean) Print summary of current Q estimates to screen; this prints an average for each value of PopData. Not supported by CIPRES. -->
<!--SITEBYSITE (Boolean) (Linkage model) Print a complete summary of assignment 
probabilities for every genotype in the data. 
This is printed to a separate file with the suffix “ss”. This file can be big!  -->
					<parameter type="Switch">
						<name>set_sitebysite</name>
						<attributes>
							<prompt>Print a complete summary of assignment probabilities for every genotype in the data (SITEBYSITE)</prompt>
							<group>71</group> 
 	  						<precond>
								<language>perl</language>
								<code>$use_linkagemodel</code>
							</precond>  
							<format>
								<language>perl</language>
								<code>( $value) ? "sitebysite=1," :""</code>
							</format>
							<comment>
								<value>Print a complete summary of assignment probabilities for every genotype in the data.
									This is printed to a separate file with the suffix “ss”. This file can be big! </value>
							</comment>
						</attributes>			
					</parameter>
					
<!--PRINTQHAT (Boolean) When this is turned on, the point estimate for Q is printed not only into the main results file, but also into a separate file with suffix “q”. This file is required in order to run the companion program STRAT. -->
					<parameter type="String" ishidden="1">
						<name>set_printqhat</name>
						<attributes>
							<prompt>Print Q estimates to a separate file with suffix q (PRINTQHAT)</prompt>
							<group>72</group>
							<format>
								<language>perl</language>
								<code>"printqhat=1,"</code>
							</format>
							<comment>
								<value>When this is turned on, the point estimate for Q is printed not only into the main results file, but also into a separate file with suffix “q”. This file is required in order to run the companion program STRAT.</value>
							</comment>
						</attributes>			
					</parameter>

<!--UPDATEFREQ (int) Frequency of printing updates to the screen. Set automatically if this is 0. Not supported by CIPRES. -->

<!--PRINTLIKES (Boolean) Print the current value of the likelihood to the screen in every iteration. Not supported by CIPRES. -->

<!--INTERMEDSAVE (int) If you’re impatient to see preliminary results before the end of the run, you can have results printed to file at intervals during the MCMC run. A total of INTERMEDSAVE such files are printed, at equal intervals following the completion of the BURNIN. Turn this off by setting it to 0. The names of these files are derived from the OUTFILE name. -->
<!-- 					<parameter type="Integer">
						<name>set_intermedsave</name>
						<attributes>
							<prompt>Print this many intermediate results (INTERMEDSAVE)</prompt>
							<group>74</group>
							<format>
								<language>perl</language>
								<code>(defined $set_intermedsave) ? "intermedsave=$value," :""</code>
							</format>
							<comment>
								<value>If you’re impatient to see preliminary results before the end of the run, you can have results printed
								to file at intervals during the MCMC run. A total of INTERMEDSAVE such files are printed, at equal intervals 
								following the completion of the BURNIN. Turn this off by setting it to 0. The names of these files are derived
								from the OUTFILE name.</value>
							</comment>
						</attributes>			
					</parameter>  -->
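<!-- Illustrative sketch only. The function name and example run lengths are hypothetical, and the exact spacing STRUCTURE uses is an assumption here. It shows one reading of "at equal intervals following the completion of the BURNIN": the NUMREPS post-burn-in iterations are divided into INTERMEDSAVE equal stretches, and a file is written at the end of each.

```python
def intermediate_save_points(burnin: int, numreps: int, intermedsave: int) -> list[int]:
    """Iteration numbers (burn-in included) at which intermediate files would be written.

    ASSUMPTION: saves are spaced evenly over the NUMREPS post-burn-in iterations,
    with the last one at the end of the run; STRUCTURE's exact spacing may differ.
    """
    if intermedsave <= 0:
        return []  # INTERMEDSAVE=0 turns intermediate saving off
    step = numreps / intermedsave
    return [burnin + round(step * k) for k in range(1, intermedsave + 1)]


print(intermediate_save_points(burnin=10_000, numreps=50_000, intermedsave=5))
# [20000, 30000, 40000, 50000, 60000]
```
-->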
					
<!--ECHODATA (Boolean) Print a brief summary of the data set to the screen and output file. (Prints the beginnings and ends of the top and bottom lines of the input file to allow the user to check that it has been read correctly.)  -->
<!--  					<parameter type="Switch">
						<name>set_echodata</name>
						<attributes>
							<prompt>Print a brief summary of the data set to the output file (ECHODATA)</prompt>
							<group>75</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "echodata=1," :""</code>
							</format>
							<comment>
								<value>Print a brief summary of the data set to the screen and output file. (Prints the beginnings and ends of the top and 
								bottom lines of the input file to allow the user to check that it has been read correctly.) </value>
							</comment>
						</attributes>			
					</parameter> -->

<!--ANCESTDIST (Boolean) Collect information about the distribution of Q for each individual, in addition to estimating the mean. When this is turned on, the output file includes the left- and right-hand ends of the probability intervals for each q(i). (A probability interval is the Bayesian analog of a confidence interval.) The printed interval covers the middle 100p% of the distribution, where p is a number in the range 0.0 to 1.0 and is set using ANCESTPINT. The distribution of Q is estimated by recording the number of hits in each of a number of boxes between 0 and 1, to form a sort of histogram. The boxes are of equal size; their number, and hence their width, is set using NUMBOXES. -->
  					<parameter type="Switch">
						<name>set_ancestdist</name>
						<attributes>
							<prompt>Collect information about the distribution of Q for each individual (ANCESTDIST)</prompt>
							<group>76</group>
							<format>
								<language>perl</language>
								<code>( $value) ? "ancestdist=1," :""</code>
							</format>
							<comment>
								<value>Collect information about the distribution of Q for each individual, in addition to estimating the mean. 
								When this is turned on, the output file includes the left- and right-hand ends of the probability intervals for each q(i).</value>
							</comment>
						</attributes>			
					</parameter>
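<!-- Illustrative sketch only, of the histogram idea described in the ANCESTDIST comment above. It is not STRUCTURE's code, and the function name and example defaults are hypothetical. MCMC draws of q for one individual and one cluster are tallied into NUMBOXES equal-width boxes on [0, 1], and the middle 100p% interval (p = ANCESTPINT) is read off the cumulative counts, trimming (1 - p)/2 of the draws from each tail. Because each box has width 1/NUMBOXES, the reported ends are only as precise as the box width.

```python
def probability_interval(q_samples, numboxes=1000, ancestpint=0.90):
    """Approximate central 100*p% interval for q; returns (lower, upper) box edges."""
    if not q_samples or numboxes <= 0:
        raise ValueError("need at least one draw and a positive number of boxes")
    counts = [0] * numboxes
    for q in q_samples:
        # Clamp so that q == 1.0 lands in the last box and stray values stay in range.
        box = min(max(int(q * numboxes), 0), numboxes - 1)
        counts[box] += 1
    n = len(q_samples)
    tail = (1.0 - ancestpint) / 2 * n  # draws trimmed from each end
    cum = 0
    lower = upper = None
    for i, c in enumerate(counts):
        if lower is None and cum + c > tail:
            lower = i / numboxes  # left edge of the first box past the lower tail
        cum += c
        if upper is None and cum >= n - tail:
            upper = (i + 1) / numboxes  # right edge of the box reaching the upper tail
            break
    return lower, upper


samples = [0.05] * 4 + [0.5] * 92 + [0.95] * 4
print(probability_interval(samples, numboxes=10, ancestpint=0.90))  # (0.5, 0.6)
```
-->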
			
		</parameters>
	</paragraph>
</parameter>

<!-- this should be the last of the input things -->					
<!-- 					<parameter issimple="1" type="String">
						<name>set_outputname</name>
						<attributes>
							<prompt>Name of the output file</prompt>
							<group>99</group>
							<format>
								<language>perl</language>
								<code>"-o $value"</code>
							</format>
							<vdef>
								<value>output.txt</value>
							</vdef>
							<ctrls>
								<ctrl>
									<message>Please specify a name for your output file</message>
									<language>perl</language>
									<code>!defined $set_outputname</code>
								</ctrl>
							</ctrls>
						</attributes>			
					</parameter>	  -->

</parameters>
</pise>


