GeLL Example

Here we work through an example use of GeLL code, explaining the various components. It is based on example.java, which is included in the distributed packages. Comments have been removed, and the file names and the taxa names in the Newick tree have been shortened, to keep the code snippet small. For brevity, the descriptions do not repeat the inputs to each function. Please follow the links for a full description of each function.

    public static void main(String[] args) throws Exception
    {
        Tree t = Tree.fromNewickString("(((Hum, Chi)A, Gor)B, Ora, Gib)C;");
        Alignment a = PhylipAlignment.fromFile(new File("b.nuc"));

        Parameters p = t.getParametersForEstimation();
        Model m = DNAModelFactory.GTR_Gamma(p, 4);

        StandardCalculator c = new StandardCalculator(m, a, t);
        Optimizer o = new GoldenSection();
        StandardLikelihood l = o.maximise(c, p);
        p = l.getParameters();

        System.out.println("Likelihood: " + l.getLikelihood());
        System.out.println();
        System.out.println("Parameters:");
        System.out.println(p);

        AncestralJoint aj = AncestralJoint.newInstance(m, a, t);
        Alignment anc = aj.calculate(p);
        PhylipAlignment.writeFile(anc, new File("ancestor.dat"));

        Simulate s = new Simulate(m, t, p);
        Alignment sim = s.getAlignment(500);
        PhylipAlignment.writeFile(sim, new File("simulated.dat"));
    }

Code Explanation

Tree

All of the computation classes of GeLL make use of a tree object. The easiest way to input a tree is in Newick format. The Newick string can either be passed directly to Tree.fromNewickString() or be read from a file using Tree.fromFile().

If the Newick string does not contain names for the internal nodes, default names are assigned, each consisting of an underscore followed by a number, e.g. _1.
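
To make the naming rule concrete, the following is a minimal, self-contained sketch of reading a Newick string and listing its internal node names, filling in an underscore-plus-number name where none is given. It is not GeLL's parser: the class NewickNamesSketch is hypothetical, and the order in which default numbers are handed out (here, the order in which closing brackets are read, starting from 1) is an assumption that may differ from what GeLL does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GeLL: lists internal node names of a Newick string.
class NewickNamesSketch {
    private final String s;
    private int pos = 0;
    private int unnamed = 0; // counter for default names (numbering order is an assumption)
    private final List<String> internal = new ArrayList<>();

    private NewickNamesSketch(String s) { this.s = s; }

    /** Returns internal node names in the order their closing bracket is read. */
    static List<String> internalNames(String newick) {
        NewickNamesSketch p = new NewickNamesSketch(newick.replaceAll("\\s+", ""));
        p.node();
        return p.internal;
    }

    // Parses one node: an optional bracketed list of children, then a label and branch length.
    private void node() {
        boolean isInternal = false;
        if (pos < s.length() && s.charAt(pos) == '(') {
            isInternal = true;
            do {
                pos++; // skip '(' or ','
                node();
            } while (pos < s.length() && s.charAt(pos) == ',');
            pos++; // skip ')'
        }
        String name = readLabel();
        if (pos < s.length() && s.charAt(pos) == ':') {
            pos++;
            readLabel(); // skip the branch length
        }
        if (isInternal) {
            internal.add(name.isEmpty() ? "_" + (++unnamed) : name);
        }
    }

    // Reads characters up to the next structural Newick character.
    private String readLabel() {
        int start = pos;
        while (pos < s.length() && "(),:;".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        return s.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(internalNames("(((Hum, Chi)A, Gor)B, Ora, Gib)C;"));
        System.out.println(internalNames("((Hum, Chi), Gor);"));
    }
}
```

With the tree from the example above the names A, B and C are kept as given; in the second call both internal nodes are unnamed and receive default names.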

Alignment

As well as a tree, most GeLL computations need to be supplied with an alignment. Alignments can be read in three formats, each of which has its own class for reading and writing the alignment. In this example we use PhylipAlignment, but FastaAlignment or DuplicationAlignment could have been used in the same way.

Parameters

A Parameters object contains information on the parameters to be used in the calculation - namely the model parameters and the branch lengths.

Rather than defining each branch length parameter individually, the set of branch length parameters for a tree can be returned with a call to getParameters() or getParametersForEstimation(). The first sets the parameter value for any branch lengths given in the tree, whereas the second returns parameters that will be estimated even if a branch length was provided.

More information on parameters can be found on the Models and Parameters page.

Model

If you wish to use one of the pre-programmed models then it is simply a case of using one of the packaged model factories - either DNAModelFactory or DuplicationModelFactory. Passing a Parameters object to the factory function, as here, will populate that object with the parameters the model needs, and is the easiest way to do simple calculations.

Defining new models is one of the key capabilities of GeLL. To find out how to use models other than the pre-programmed ones, please see the Models and Parameters page.

Calculator

The calculator class performs the actual likelihood calculation. The standard calculator should suffice for the vast majority of cases and will work on any model that can be defined. It should only be necessary to create bespoke calculators where the calculation does not follow the standard Felsenstein method (e.g. the Rivas-Eddy method).

The calculator is passed a model (m), alignment (a) and tree (t) at creation, and can then be used to perform likelihood calculations using different parameters.
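
For readers unfamiliar with the standard Felsenstein method mentioned above, here is a minimal sketch of it for a single alignment site. It is an illustration only and not GeLL's StandardCalculator: the class PruningSketch and its Node type are hypothetical, the tree is assumed to be strictly binary, and the simple Jukes-Cantor model is swapped in for GTR with gamma rates to keep the code short.

```java
// Hypothetical illustration of Felsenstein's pruning algorithm, not GeLL code.
class PruningSketch {
    /** A tree node: a tip carries an observed state, an internal node carries two children. */
    static final class Node {
        final Node left, right;    // both null for tips
        final int state;           // 0..3 (A, C, G, T) at tips, -1 at internal nodes
        final double branchLength; // length of the branch above this node

        Node(int state, double branchLength) {
            this.left = null;
            this.right = null;
            this.state = state;
            this.branchLength = branchLength;
        }

        Node(Node left, Node right, double branchLength) {
            this.left = left;
            this.right = right;
            this.state = -1;
            this.branchLength = branchLength;
        }
    }

    /** Jukes-Cantor probability of changing from state i to state j along a branch of length t. */
    static double transition(int i, int j, double t) {
        double e = Math.exp(-4.0 * t / 3.0);
        return i == j ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    }

    /** Conditional likelihoods: result[s] = P(data below n | n is in state s). */
    static double[] conditionals(Node n) {
        double[] out = new double[4];
        if (n.left == null) {
            out[n.state] = 1.0; // a tip is certain to be in its observed state
            return out;
        }
        double[] l = conditionals(n.left);
        double[] r = conditionals(n.right);
        for (int s = 0; s < 4; s++) {
            double sumL = 0.0;
            double sumR = 0.0;
            for (int x = 0; x < 4; x++) {
                sumL += transition(s, x, n.left.branchLength) * l[x];
                sumR += transition(s, x, n.right.branchLength) * r[x];
            }
            out[s] = sumL * sumR; // the two subtrees are independent given state s
        }
        return out;
    }

    /** Site likelihood: root conditionals weighted by the uniform Jukes-Cantor base frequencies. */
    static double siteLikelihood(Node root) {
        double[] c = conditionals(root);
        double total = 0.0;
        for (int s = 0; s < 4; s++) {
            total += 0.25 * c[s];
        }
        return total;
    }

    public static void main(String[] args) {
        // ((A:0.1, C:0.1):0.05, G:0.2) with one observed base per tip
        Node cherry = new Node(new Node(0, 0.1), new Node(1, 0.1), 0.05);
        Node root = new Node(cherry, new Node(2, 0.2), 0.0);
        System.out.println("Site likelihood: " + siteLikelihood(root));
    }
}
```

The likelihood of a whole alignment is the product of the site likelihoods (in practice, the sum of their logarithms), which is the quantity an optimiser then maximises over the parameters.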

Optimizer

GeLL has three built-in optimisation methods - GoldenSection, NelderMead and ConjugateGradient. GoldenSection is the most mature of these and so the most likely to produce the expected result, but it may not be the fastest. Any of the methods can fail to optimise properly for specific datasets, so it is sometimes necessary to try a different method.

It is possible to create custom optimisation methods. Please either contact the author or read the javadoc and look at the existing optimisers.
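
As background for the name, the sketch below shows a textbook golden section search that maximises a function of one variable. It is an assumption-laden illustration rather than GeLL's GoldenSection class: the source does not describe how GeLL applies the method to many parameters, the class GoldenSectionSketch is hypothetical, and the function is assumed to have a single peak inside the search interval.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical one-dimensional illustration, not GeLL's optimiser.
class GoldenSectionSketch {
    // 1/phi: the factor by which the search interval shrinks at each step
    private static final double INV_PHI = (Math.sqrt(5.0) - 1.0) / 2.0;

    /** Returns the x in [lo, hi] that maximises f, assuming f has a single peak there. */
    static double maximise(DoubleUnaryOperator f, double lo, double hi, double tol) {
        double x1 = hi - INV_PHI * (hi - lo);
        double x2 = lo + INV_PHI * (hi - lo);
        double f1 = f.applyAsDouble(x1);
        double f2 = f.applyAsDouble(x2);
        while (hi - lo > tol) {
            if (f1 < f2) {
                // The peak lies in [x1, hi]; the old x2 becomes the new x1
                lo = x1;
                x1 = x2;
                f1 = f2;
                x2 = lo + INV_PHI * (hi - lo);
                f2 = f.applyAsDouble(x2);
            } else {
                // The peak lies in [lo, x2]; the old x1 becomes the new x2
                hi = x2;
                x2 = x1;
                f2 = f1;
                x1 = hi - INV_PHI * (hi - lo);
                f1 = f.applyAsDouble(x1);
            }
        }
        return (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        // Toy stand-in for a log-likelihood curve, peaking at x = 0.3
        double best = maximise(x -> -(x - 0.3) * (x - 0.3), 0.0, 1.0, 1e-8);
        System.out.println("Maximum near x = " + best);
    }
}
```

Because each step reuses one of the two previous function evaluations, only one new evaluation is needed per iteration, which matters when each evaluation is a full likelihood calculation.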

Likelihood

By passing a calculator and a set of parameters to an optimiser's maximise() function, the optimiser will attempt to optimise the parameters to find the maximum likelihood. It then returns a likelihood object, which contains the likelihood obtained and the parameter values used to calculate it (which in this case will be the optimised parameters). Here we replace the unoptimised parameters with the optimised ones for use in the following calculations, using getParameters() to retrieve them from the likelihood object.

Output

In this section we output the optimised likelihood and the parameters that produced it. The getLikelihood() function of the likelihood object returns the likelihood as a double, which is then printed. Parameters has a toString() function that lists each parameter value, one per line, so the final println call here outputs all the parameters to the screen.

Ancestor

GeLL includes two methods for calculating ancestral sequences: joint (AncestralJoint) and marginal (AncestralMarginal). Both can be created by calling their static newInstance() method, which takes as input a model, alignment and tree.

Calling the calculate method calculates the ancestral sequences. It returns an alignment that contains both the original sequences and sequences for each internal node of the tree.

Simulate

The Simulate class can be used to simulate alignments based on a model (m), tree (t) and a set of parameters (p). Once created the object can be used to create alignments using its getAlignment() method. This method is passed the length of alignment to simulate.
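
The source does not describe how Simulate works internally, so the following is only a sketch of the usual core step of such a simulation: drawing a root sequence and evolving it site by site along one branch. The class SimulateSketch is hypothetical, and the Jukes-Cantor model is assumed in place of whatever model is passed to GeLL.

```java
import java.util.Random;

// Hypothetical illustration of simulating sequence evolution, not GeLL's Simulate class.
class SimulateSketch {
    static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Jukes-Cantor probability of changing from state i to state j along a branch of length t. */
    static double transition(int i, int j, double t) {
        double e = Math.exp(-4.0 * t / 3.0);
        return i == j ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    }

    /** Draws a root sequence from the uniform Jukes-Cantor base frequencies. */
    static int[] randomRoot(int length, Random rng) {
        int[] seq = new int[length];
        for (int site = 0; site < length; site++) {
            seq[site] = rng.nextInt(4);
        }
        return seq;
    }

    /** Evolves each site of the parent sequence independently along a branch of length t. */
    static int[] evolve(int[] parent, double t, Random rng) {
        int[] child = new int[parent.length];
        for (int site = 0; site < parent.length; site++) {
            double u = rng.nextDouble();
            double cumulative = 0.0;
            int next = 3; // fallback guards against rounding in the cumulative sum
            for (int j = 0; j < 4; j++) {
                cumulative += transition(parent[site], j, t);
                if (u < cumulative) {
                    next = j;
                    break;
                }
            }
            child[site] = next;
        }
        return child;
    }

    /** Converts state indices to a string of bases. */
    static String render(int[] seq) {
        StringBuilder sb = new StringBuilder(seq.length);
        for (int state : seq) {
            sb.append(BASES[state]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int[] root = randomRoot(20, rng);
        int[] tip = evolve(root, 0.5, rng);
        System.out.println("root: " + render(root));
        System.out.println("tip:  " + render(tip));
    }
}
```

Repeating the evolve step down every branch of a tree, from the root to the tips, yields one simulated sequence per taxon, which together form a simulated alignment of the requested length.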