GeLL
GeLL is a General Likelihood Library for
use in phylogenetics. It is intended to provide classes that facilitate
investigating new models and techniques. It is not intended for everyday
use in phylogenetic studies as it is optimised for easy use of new models
rather than speed.
GeLL achieves this versatility by allowing models to be defined using
an array of Strings where each string represents an equation. This allows
new models to be defined quickly and easily. The included driver executable
allows many different models to be run simply by changing the text file
that defines the model.
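As a rough illustration of defining a model as an array of strings, the
sketch below builds a K80-style rate matrix from string equations. It does
not use GeLL's actual syntax or API. The class name StringModelSketch, the
row format (comma-separated entries with "-" marking the diagonal) and the
restriction to products of numbers and parameter names are all assumptions
made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not GeLL code: a rate matrix defined by string equations.
public class StringModelSketch {

    // Evaluate a product such as "k*0.5": each factor is either a number
    // or the name of a parameter held in params.
    public static double evaluate(String equation, Map<String, Double> params) {
        double value = 1.0;
        for (String factor : equation.trim().split("\\*")) {
            String f = factor.trim();
            Double p = params.get(f);
            value *= (p != null) ? p : Double.parseDouble(f);
        }
        return value;
    }

    // Each row string holds comma-separated equations for one row of the matrix.
    // The diagonal entry is ignored and set so that the row sums to zero.
    public static double[][] buildRateMatrix(String[] rows, Map<String, Double> params) {
        int n = rows.length;
        double[][] q = new double[n][n];
        for (int i = 0; i < n; i++) {
            String[] cells = rows[i].split(",");
            if (cells.length != n) {
                throw new IllegalArgumentException("Row " + i + " must have " + n + " entries");
            }
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue;
                }
                q[i][j] = evaluate(cells[j], params);
                rowSum += q[i][j];
            }
            q[i][i] = -rowSum;
        }
        return q;
    }

    public static void main(String[] args) {
        Map<String, Double> params = new HashMap<>();
        params.put("k", 2.0); // transition/transversion rate ratio

        // States in the order T, C, A, G; transitions (T<->C, A<->G) have rate k.
        String[] k80 = {
            "-, k, 1.0, 1.0",
            "k, -, 1.0, 1.0",
            "1.0, 1.0, -, k",
            "1.0, 1.0, k, -"
        };
        for (double[] row : buildRateMatrix(k80, params)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Changing the model here means editing only the strings, which is the kind of
flexibility described above. A real equation parser would need to support
more than products.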
Quick Start
A quick start guide (quickstart.pdf), including installation instructions,
is available. It is also included in every download package.
Capabilities
GeLL has the following capabilities:
- Likelihood - Using the method of Felsenstein 1981[1]. Unobserved
data can also be accounted for, per Felsenstein 1992[2]. Confidence intervals
for a parameter can also be calculated as discussed in Felsenstein 2003[3].
- Ancestral Reconstruction
  - Marginal Reconstruction - Using the basic method of Yang,
  Kumar and Nei 1995[4].
  - Joint Reconstruction - Using the methods of Pupko et al 2000[5] and
  Pupko et al 2002[6].
- Simulated Data - Using the standard methods as described, for example,
in Yang 2006[7].
- Non-reversible Processes - Using the method described by Boussau and Gouy 2006[8].
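For readers unfamiliar with the likelihood method of Felsenstein 1981[1],
the sketch below shows the pruning recursion for a single site on a
three-taxon tree. It is a minimal, self-contained illustration and does not
use GeLL classes. The class name PruningSketch, the choice of the
Jukes-Cantor (JC69) model, the tree and the branch lengths are all
assumptions made for this example.

```java
// Hypothetical sketch, not GeLL code: Felsenstein's pruning algorithm under JC69.
public class PruningSketch {
    static final int STATES = 4; // nucleotides in the order T, C, A, G

    // JC69 probability of going from one state to another along a branch of
    // length t (expected substitutions per site).
    public static double jc69(int from, int to, double t) {
        double e = Math.exp(-4.0 * t / 3.0);
        return (from == to) ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    }

    // Partial likelihoods at a tip: 1 for the observed state, 0 otherwise.
    public static double[] tip(int state) {
        double[] partial = new double[STATES];
        partial[state] = 1.0;
        return partial;
    }

    // Partial likelihoods at a parent node from its two children:
    // L[x] = (sum_y P(x->y, tLeft) * left[y]) * (sum_y P(x->y, tRight) * right[y])
    public static double[] combine(double[] left, double tLeft,
                                   double[] right, double tRight) {
        double[] parent = new double[STATES];
        for (int x = 0; x < STATES; x++) {
            double sumLeft = 0.0;
            double sumRight = 0.0;
            for (int y = 0; y < STATES; y++) {
                sumLeft += jc69(x, y, tLeft) * left[y];
                sumRight += jc69(x, y, tRight) * right[y];
            }
            parent[x] = sumLeft * sumRight;
        }
        return parent;
    }

    // Site likelihood: root partials weighted by the JC69 root frequencies (all 1/4).
    public static double siteLikelihood(double[] root) {
        double sum = 0.0;
        for (double value : root) {
            sum += 0.25 * value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Tree ((A:0.1,B:0.1):0.05,C:0.2); one site with A = T, B = T, C = C.
        double[] ab = combine(tip(0), 0.1, tip(0), 0.1);
        double[] root = combine(ab, 0.05, tip(1), 0.2);
        System.out.println("Site likelihood: " + siteLikelihood(root));
    }
}
```

The likelihood of an alignment is the product of such site likelihoods,
usually accumulated as a sum of logarithms.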
Packages
GeLL comes in three different packages:
- Basic Package - This includes just the JAR file
and the javadoc documentation
- Source Package - This additionally includes the
source code
- Test Package - This additionally includes both
the source code and test classes
Documentation
GeLL comes with full javadoc documentation in the javadoc
directory.
GeLL includes a driver that should be capable of doing
most basic computations. This is documented in the driver
documentation.
The GeLL packages also include an example program. As well
as the source code file, an annotated version of the source is
available. If you are thinking of writing
your own program using the GeLL library, it is recommended that you
take a look at this example.
The core capability of GeLL is the ability to define and use
new models. As such, how models are defined is covered in more
detail in separate documentation, which also includes a
more in-depth discussion of how parameters are defined and
constrained.
A description of the test cases is included
in the test package.
Tips & Tricks
- Gaps
Gaps can be dealt with in two ways. They can be treated
as an extra state, in which case the extra state should simply
be added to the model. Alternatively, to sum across all states, as many
models do, define the gap character as an ambiguous
character that could be any one of the states.
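The second approach can be pictured as follows. In a likelihood calculation,
a tip that could be any state contributes a partial likelihood of 1 for
every state, so the calculation sums over all of them. The sketch below is
a self-contained illustration only. The class name GapAsAmbiguitySketch and
the way the ambiguity map is encoded are assumptions made for this example
and are not GeLL's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not GeLL code: treating a gap as a fully ambiguous character.
public class GapAsAmbiguitySketch {
    static final String STATES = "TCAG";

    // Map each alignment character to the states it may represent.
    public static Map<Character, String> ambiguityMap() {
        Map<Character, String> map = new HashMap<>();
        for (char c : STATES.toCharArray()) {
            map.put(c, String.valueOf(c)); // an unambiguous character maps to itself
        }
        map.put('R', "AG");   // purine
        map.put('Y', "TC");   // pyrimidine
        map.put('-', STATES); // gap: could be any one of the states
        return map;
    }

    // Tip partial likelihoods: 1 for every state the character could be, 0 otherwise.
    public static double[] tipPartials(char observed, Map<Character, String> map) {
        String possible = map.get(observed);
        if (possible == null) {
            throw new IllegalArgumentException("Unknown character: " + observed);
        }
        double[] partial = new double[STATES.length()];
        for (char c : possible.toCharArray()) {
            partial[STATES.indexOf(c)] = 1.0;
        }
        return partial;
    }

    public static void main(String[] args) {
        Map<Character, String> map = ambiguityMap();
        System.out.println("A: " + Arrays.toString(tipPartials('A', map)));
        System.out.println("-: " + Arrays.toString(tipPartials('-', map)));
    }
}
```

Under the first approach the gap would instead be appended to the state
list (for example "TCAG-") and given its own row and column in the model.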
Citation
Money, D. and Whelan, S. (2012) GeLL: Generalized Likelihood Library
(http://phylo.bio.ku.edu/GeLL)
Known Issues
- Eigendecomposition
Eigendecompositions may not always converge and may not throw
an error when they fail. This seems to be especially the case for
gene family size models. The eigendecomposition code is taken from
JAMA and has not been debugged.
- Parameters (1)
There is an issue where using the same Parameters object in two or more
calls to different Calculator objects will cause an error. This
can be solved by passing a cloned copy of the parameters in the second call.
- Parameters (2)
The way branch lengths are treated as parameters is a little odd
and could well cause confusion. It is intended to overhaul
this in a future version.
- Exceptions
Some exceptions are thrown in circumstances where it is known
that the exception should not occur, normally because we know
that the calling class has already checked (either implicitly
or explicitly) that the exception can't occur. It is intended
to fix this in a later release.
- Threaded Computation - Likelihood
Currently threaded computation is always used for likelihood
computation. It is known that this can be slower for small
trees and small state-space models. However, it has not
yet been determined under what conditions threaded calculation
is slower so, for the moment, it is always used.
- Threaded Computation - Ancestral reconstruction
At the moment ancestral reconstruction is not threaded.
- Optimising No Parameters
Trying to optimise a parameters object with no optimisable
parameters (i.e. they are all fixed values) can result in an
infinite loop.
Release Notes
1.0 2 February 2012
1.1 4 April 2012
- Not publicly released due to memory problems
- Added support for model hypothesis testing
- Allow sites to be in different classes
1.2 23 April 2012
- Not (quite) backwards compatible although only minimal changes should be necessary
- Changes to internal storage structures to sort memory problems
- Large amounts of code optimisation
1.3 14 August 2012
- Not (quite) backwards compatible although only minimal changes should be necessary
- Added Conjugate Gradient search
- Added an ID field to Site
- Added split support to Tree and also the ability to calculate
various distances between trees (RF, weighted RF and branch score).
- Minor Changes:
  - Optimisers don't store node likelihoods until after optimisation
  is complete, to save memory.
  - Attempt to make calculating distributions using the
  "repeated" method more stable.
  - Fixed bug in test for quasi-stationary distribution
  convergence.
  - Parameters toString method now only outputs estimated
  parameters by default.
  - Ability to write Phylip sequence alignments
  in PAML format, i.e. taxa names have a maximum of 15 characters.
  This is buggy as names are simply truncated and duplicate names
  are not checked for.
  - Fix to PossibleSettings, where settings in no group did
  not work properly.
  - Parameter now throws an error if an attempt is made to set it
  to an invalid value.
  - Additional error checking in likelihood calculations to check
  for probabilities greater than one and other errors.
  - Various updates to throw statements caused by the above.
  - Moved the storing of whether matrices need to be recalculated
  from Parameters to RateCategory. This is more intuitive, as
  which parameters are in the rate category determines when changes to
  parameters require matrix recalculation. Should, hopefully, be largely
  transparent.
2.0 14 April 2014
- Not (quite) backwards compatible although only minimal changes should be necessary
- Added the ability to work with very small likelihoods. This
is optional as it results in a significant slow down. There
is a slight performance decrease even when the small option
is not being used.
- Added the ability to use the FitzJohn (2009) method as the
root distribution.
- Major changes to how calculators etc are structured to
allow more unusual ways of calculating likelihoods.
- Removed code to do with Constraints as it was implemented
in a very odd way and was no longer needed.
- Added BDIE (Birth-Death-Innovation-Extinction) and
Birth-Death (with no zero state) models to the duplication
model factory.
- Minor Changes:
  - Removed ArrayMap as it was not providing any speed
  increase so was just complicating the code.
  - Removed ToDoubleHashMap, again as it was complicating
  the code.
  - Removed deprecated SequenceAlignment. PhylipAlignment
  should be used instead.
  - Removed unused MapUtils.
  - Added an additional example executable that implements
  the model of chromosome evolution of Mayrose et al
  (2009). This example shows how to implement a new model.
  - DNAModel factory now has two options to create models.
  One function takes in and updates the parameters list
  with relevant parameters whereas the other just creates
  the model.
  - Code now automatically checks whether a parameter is
  in the matrix so this is no longer part of the Parameter.
  Relevant constructors removed.
  - Added an ability to generate a random unrooted tree.
License
GeLL is available under GPL v3.
The EigenvalueDecomposition and LUDecomposition classes are from JAMA
and as such are in the public domain.
The test package contains both executables and example datasets from
PAML. This example dataset is also used for the example execution using
Example.java. The applicable license for these components is as
follows:
© Copyright 1993-2008 by Ziheng Yang
The software package is provided "as is" without warranty of any kind.
In no event shall the author or his employer be held responsible for any
damage resulting from the use of this software, including but not limited
to the frustration that you may experience in using the package. The program
package, including source codes, example data sets, executables, and this
documentation, is distributed free of charge for academic use only.
Permission is granted to copy and use programs in the package provided no
fee is charged for it and provided that this copyright notice is not removed.
Contact
References
[1] Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood
approach. Journal of Molecular Evolution, 17:368–376.
[2] Felsenstein, J. 1992. Phylogenies from restriction sites: A maximum-likelihood
approach. Evolution, 46:159–173.
[3] Felsenstein, J. 2003. Inferring phylogenies. Sinauer Associates, Sunderland, Mass., USA.
[4] Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide
and amino acid sequences. Genetics, 141:1641–1650.
[5] Pupko, T., I. Pe’er, R. Shamir, and D. Graur. 2000. A fast algorithm for joint reconstruction
of ancestral amino acid sequences. Molecular Biology and Evolution, 17:890–896.
[6] Pupko, T., I. Pe’er, M. Hasegawa, D. Graur, and N. Friedman. 2002. A branch-and-bound
algorithm for the inference of ancestral amino-acid sequences when the
replacement rate varies among sites: Application to the evolution of five gene families.
Bioinformatics, 18:1116–1123.
[7] Yang, Z. 2006. Computational molecular evolution. Oxford University Press, USA.
[8] Boussau, B. and M. Gouy. 2006. Efficient likelihood computations with nonreversible
models of evolution. Systematic Biology, 55:756–768.