GeLL comes with a driver that should be capable of doing many basic analyses. The usage of the driver and the formats of its associated files are described here.
java -jar GeLL.jar settings {var1 var2 var3 ...}
where settings is the name of a settings file and var1 etc. are variables that can be "passed" to the settings file. A $ followed by a number in the settings file will be replaced by the corresponding command line argument. The format of the settings file is described below.
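The substitution rule above can be illustrated with a short Java sketch. This is not GeLL's own code: the class name `SettingsArgs`, the method `substitute`, and the example line are hypothetical. It also assumes that numbering starts at 1 (so `$1` is the first variable after the settings file name) and that a placeholder with no matching argument is left untouched; the text above does not confirm either point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SettingsArgs {
    // Matches a dollar sign followed by one or more digits, e.g. $1 or $12.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    /**
     * Replaces each $n in a line with vars[n - 1].
     * Assumptions (not confirmed by the documentation): numbering starts at 1,
     * and placeholders without a matching argument are left unchanged.
     */
    static String substitute(String line, String[] vars) {
        Matcher m = PLACEHOLDER.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            String replacement = (index >= 1 && index <= vars.length)
                    ? vars[index - 1]
                    : m.group();
            // quoteReplacement stops '$' and '\' in the value being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical line and variables, for illustration only.
        String[] vars = {"primates", "3"};
        System.out.println(substitute("results/$1/run_$2.txt", vars));
    }
}
```

Passing the same settings file with different variables is then enough to run one analysis over several data sets.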
A run of the driver is controlled by the settings file.
The settings file has four optional sections. Each section should
begin with the section's name in square brackets,
e.g. [Control]. The control section contains general control settings, while the
likelihood section controls likelihood optimisation. The ancestral
and simulation sections control ancestral reconstruction and simulation
respectively. Although each section is optional, a section that is present
may have settings that must be set. These are shown by the darker background.
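As a rough illustration of the section layout, the following Java sketch groups the lines of a settings file by the bracketed header they follow. It is not GeLL's parser: the class `SettingsSections` and its method are hypothetical, lines are kept verbatim because no assumption is made here about how an individual setting is written, and dropping lines that appear before the first header is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SettingsSections {
    /**
     * Groups non-empty lines by the [Section] header that precedes them.
     * Assumption: lines before the first header are ignored.
     */
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // Strip the square brackets to obtain the section name.
                String name = line.substring(1, line.length() - 1);
                current = sections.computeIfAbsent(name, k -> new ArrayList<>());
            } else if (current != null) {
                current.add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Placeholder body lines; real settings lines are not modelled here.
        List<String> lines = List.of("[Control]", "first line", "[Likelihood]", "second line");
        System.out.println(split(lines).keySet());
    }
}
```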
Control section
DebugLevel | The amount of debug information that is displayed when an error occurs. Valid values are: |
---|---|
DebugFile | File to log debug information to. If no file is given debug information is printed to screen. |
Distributions | How stationary and quasi-stationary distributions are calculated. Valid values are: |
MatrixExponentation | How matrix exponentiations are calculated. Valid values are: |
ForceSquare | The minimum number of repeated squaring steps to use when calculating matrix exponentiations using the Taylor method. Defaults to 0. |
Likelihood section
AlignmentType | The type of alignment input. See alignment files below for a description of the file formats. Valid values are: |
---|---|
Alignment | Path to the alignment file. |
TreeInput | Path to the input tree file. This file should contain one line containing a tree in Newick format. |
Model | Path to the model file. See the Model file description below for format. |
ParameterInput | Required unless Restart is used. Path to the parameters input file. See the Parameter file description below for format. |
Ambig | Path to a file describing any ambiguous states in the alignment. |
Missing | Path to an alignment that gives the unobserved data. In the same format as the alignment. |
MissingAmbig | Path to a file describing any ambiguous states in the missing alignment. |
Optimizer | The optimiser to use. Valid values are: |
Checkpoint | File to write checkpoints to. This allows the optimization to be restarted using the Restart setting should it be interrupted. |
CheckpointFreq | How often (in minutes) the checkpoint file should be written. |
Restart | Checkpoint file to restart optimisation from. |
TreeOutput | File to output the estimated tree to. If this option is not given then no output is written. |
ParameterOutput | File to output the estimated parameters to. If this option is not given then no output is written. |
Rescale | Whether to rescale the matrix to one event per time unit. Any value beginning with f is false, all other values are true. Defaults to true. |
OptimizeTree | Whether to optimize the tree branch lengths or use those provided. Any value beginning with f is false, all other values are true. Defaults to true. |
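The rule used by Rescale and OptimizeTree ("any value beginning with f is false, all other values are true") has a consequence worth spelling out: a value such as no still counts as true. The Java sketch below illustrates this. It is hypothetical code, not GeLL's; it assumes that the check ignores case and leading whitespace, and that a missing option falls back to the documented default.

```java
final class BooleanSetting {
    /**
     * Any value beginning with f is false; every other value is true.
     * Assumptions (not stated in the documentation): the comparison is
     * case-insensitive, leading whitespace is ignored, and a missing
     * (null) value yields the supplied default.
     */
    static boolean parse(String value, boolean defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        String v = value.trim();
        if (v.isEmpty()) {
            return true; // does not begin with f
        }
        char first = v.charAt(0);
        return !(first == 'f' || first == 'F');
    }

    public static void main(String[] args) {
        System.out.println(parse("false", true));
        System.out.println(parse("no", true));
        System.out.println(parse(null, true));
    }
}
```

Under this reading, writing false or f is the safe way to switch either option off.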
Ancestral section
AlignmentType | Required if no Likelihood section. Same meaning as in Likelihood section. |
---|---|
Alignment | Required if no Likelihood section. Same meaning as in Likelihood section. |
Tree | Required if no Likelihood section. Same meaning as TreeInput in Likelihood section. |
Model | Required if no Likelihood section. Same meaning as in Likelihood section. |
Parameters | Required if no Likelihood section. Same meaning as ParameterInput in Likelihood section. |
Type | The type of reconstruction to do. Valid values are: |
Output | File to write the reconstructed alignment to. |
Simulate section
AlignmentType | Required if no Likelihood section. Same meaning as in Likelihood section. |
---|---|
Tree | Required if no Likelihood section. Same meaning as TreeInput in Likelihood section. |
Model | Required if no Likelihood section. Same meaning as in Likelihood section. |
Parameters | Required if no Likelihood section. Same meaning as ParameterInput in Likelihood section. |
Missing | Path to an alignment that gives the unobserved data. In the same format as the alignment. |
Length | The length of the simulated alignment. |
Output | File to write the simulated alignment to. |
Alignment files can be in one of two different formats:
*class* is assumed not to be a taxon but rather gives the class of each site (which can be any single character).
*class* is assumed not to be a taxon but rather gives the class of each site (which can be any string).

Each line represents a single parameter. Lines are tab separated. The first field is the type of the parameter and the second is the name of the parameter. Subsequent fields depend on the parameter type. Type values are:
EB - Estimated bounded parameter that is in a rate matrix. The 3rd field is the lower bound, the 4th the upper.
EP - Estimated positive parameter that is in a rate matrix.
E - Estimated (unbounded) parameter that is in a rate matrix.
F - Fixed parameter. The 3rd field is the value.

The first line controls the type of model. Possible types and the subsequent format of the rest of the file are:
**G followed by a tab, followed by the number of categories desired. This should be followed by a tab and the parameter name the alpha value is to be called by.
**E
**F
To use different models for different site classes the format of this file is different. In this instance each line of the file will represent one class and will contain two fields tab separated. The first field is the class identifier while the second is the file name of a file in the normal model format (above) that defines the model for that class.
The file format is described below. See Equation Format below for a description of the format of the equations that can be in the rate matrix and root distribution.
**S
**Q
**F
The file should be tab delimited with one ambiguous character per line. The first field on each line is the ambiguous character, while each subsequent field is a character that it could represent.
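To make the ambiguity file layout concrete, here is a minimal Java sketch that reads such lines into a map from each ambiguous character to the characters it may stand for. The class `AmbiguityFile` is hypothetical and is not part of GeLL. The example uses the standard IUPAC nucleotide codes R (A or G) and Y (C or T) purely as illustration; skipping blank lines is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AmbiguityFile {
    /**
     * Parses tab-delimited lines: the first field is the ambiguous
     * character, every later field a character it could represent.
     * Assumption: blank lines are skipped.
     */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            // Everything after the first field is a possible state.
            Set<String> states =
                    new LinkedHashSet<>(Arrays.asList(fields).subList(1, fields.length));
            table.put(fields[0], states);
        }
        return table;
    }

    public static void main(String[] args) {
        // Illustrative content only (IUPAC purine and pyrimidine codes).
        List<String> lines = List.of("R\tA\tG", "Y\tC\tT");
        System.out.println(parse(lines));
    }
}
```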
Variables are represented by a letter followed by any number of alphanumeric characters. Multiplication (represented by *) should be stated explicitly, e.g. a * b NOT a b or ab (the latter of which would be parsed as a single variable).
Functions should be represented by f[a,b,...] where f is the function name and a, b etc. are inputs. Inputs cannot contain other functions but can otherwise contain an expression. The following functions are defined:
ln[a] - The natural logarithm.
g[a,b,c] - The rate modifier of the bth class of c classes using a gamma distribution with alpha value of a as per Yang 1993.
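The reason ab is read as one variable rather than a product can be seen from a small tokenizer. The Java sketch below is illustrative only and is not GeLL's equation parser: the class `EquationTokens` is hypothetical, letters are assumed to be ASCII, and the handling of numeric literals is an assumption, since the format of numbers is not described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EquationTokens {
    // Alternatives, tried in order:
    //   1. a variable or function name: a letter, then any alphanumerics
    //   2. a number (assumed syntax: digits with an optional fraction)
    //   3. any other single non-whitespace character, e.g. * [ ] ,
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z][A-Za-z0-9]*|[0-9]+(?:\\.[0-9]+)?|\\S");

    /** Splits an expression into tokens, discarding whitespace. */
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a * b")); // three tokens: a, *, b
        System.out.println(tokenize("ab"));    // one token: ab
        System.out.println(tokenize("ln[a]")); // four tokens: ln, [, a, ]
    }
}
```

Because the variable rule consumes as many alphanumeric characters as it can, ab never reaches the point where a multiplication could be inferred, which is why the * must be written out.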