Partitioned GARLI

An experimental version of GARLI that allows the use of partitioned likelihood models is available here.

Gap encoding

This version also allows simultaneous use of molecular sequence data and character data with an arbitrary number of states, for example morphological characters. One use of this functionality is to incorporate information about the location of gaps (indel events) in phylogenetic inference. Gaps are typically treated as equivalent to missing data, despite the fact that their locations convey some information about shared ancestry and phylogenetic relationships.

Such an analysis can be run on any existing sequence matrix by first uploading it to this Holder lab web service, which automatically creates a NEXUS-formatted file containing characters blocks for both the input matrix and a corresponding "gap-encoded" binary character matrix. Each binary character records either the presence of a gap (state 0) or the presence of a base (state 1) at a given cell in the sequence matrix. Only variable columns (i.e., those in which at least one sequence has a gap) are output.
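The encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the web service's actual code: the function name gap_encode and the example taxa are hypothetical, and the sketch assumes that every symbol other than "-" (including ambiguity codes and "?") counts as a base, which may differ from how the service handles such symbols.

```python
def gap_encode(alignment):
    """Return a binary gap matrix for an aligned set of sequences.

    alignment maps taxon name -> aligned sequence string.
    Each cell becomes '0' for a gap ('-') or '1' for anything else.
    Only columns in which both states occur are kept.
    """
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()

    encoded = {name: [] for name in names}
    for col in range(n_cols):
        # Assumption: any non-gap symbol is treated as a base.
        states = ["0" if alignment[name][col] == "-" else "1" for name in names]
        if len(set(states)) == 2:  # keep variable columns only
            for name, state in zip(names, states):
                encoded[name].append(state)
    return {name: "".join(chars) for name, chars in encoded.items()}


# Hypothetical three-taxon alignment
alignment = {
    "taxonA": "ACGT-ACG",
    "taxonB": "ACG--ACG",
    "taxonC": "ACGTTACG",
}
print(gap_encode(alignment))
# {'taxonA': '10', 'taxonB': '00', 'taxonC': '11'}
```

Only the fourth and fifth alignment columns contain a gap, so the encoded matrix has two binary characters per taxon.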

Phylogenetic analyses can then be performed in GARLI. Any nucleotide, amino acid or codon model may be applied to the sequence data, with the "Mkv" model (Lewis, 2001, "A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Data") applied to the binary gap data. An appropriate GARLI configuration file is here.

Note that the properties of this sort of analysis are not well studied, and we make no claims regarding phylogenetic accuracy. A possible shortcoming of the method is that it treats every gapped column as an independent character. Thus, a single long insertion or deletion will be scored as many individual characters and will be given excessive weight. This problem is less severe when indel events are short.
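The weighting concern can be made concrete with a small Python sketch that compares the number of contiguous gap runs in a sequence with the number of gapped cells. The function name gap_run_lengths and the example sequence are hypothetical, and a contiguous run of gaps in one sequence is only a rough proxy for a single indel event.

```python
import re


def gap_run_lengths(sequence):
    """Return the length of each contiguous run of '-' in an aligned sequence."""
    return [len(match.group()) for match in re.finditer(r"-+", sequence)]


# Hypothetical aligned sequence with one long and one short gap run
sequence = "ACG------TTA-GC"
runs = gap_run_lengths(sequence)
print(len(runs), sum(runs))
# 2 7
```

Here two gap runs account for seven gapped cells. Under the gap encoding, the six-column run can contribute up to six binary characters, while the one-column run contributes at most one.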