Figure 1. The relationship between the per-site LRT and the P-value in a hypothetical case.

Consider a dataset with 100 characters (or sites in a sequence alignment) in which the alternative tree has a log-likelihood that is 5.0 higher than that of the null-hypothesis tree. If we do not rely on asymptotic assumptions to connect 2 × (the difference in log-likelihoods) to the chi-square distribution, then we can instead try to infer a null distribution based on the per-site differences in log-likelihood.

If we construct the null distribution in this way, we find that the variability of the support across characters is very important in assessing the statistical significance of a log-likelihood difference.
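To see why the variability matters, here is a rough sketch; it is not the method used to draw the figure. It assumes that the per-site differences are independent, have mean 0 under the null, and share a common standard deviation (the hypothetical parameter `per_site_sd`), and that the total over all sites is approximately normal. The one-sided tail is also an assumption made for illustration.

```python
import math


def normal_approx_p_value(total_diff, n_sites, per_site_sd):
    """One-sided P-value for a total log-likelihood difference.

    Assumption (not from the text): per-site differences are independent
    with mean 0 and standard deviation `per_site_sd` under the null, so
    the total is roughly normal with SD per_site_sd * sqrt(n_sites).
    """
    sd_total = per_site_sd * math.sqrt(n_sites)
    z = total_diff / sd_total
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Same total difference (5.0) and number of sites (100) as in the text,
# with several hypothetical amounts of per-site variability.
for sd in (0.1, 0.25, 0.5, 1.0):
    p = normal_approx_p_value(5.0, 100, sd)
    print(f"per-site SD = {sd:<4}  approximate P = {p:.3g}")
```

Under these assumptions, a total difference of 5.0 over 100 sites lies five standard deviations from 0 when the per-site standard deviation is 0.1, but only one standard deviation from 0 (a P-value of about 0.16) when it is 0.5.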

Typically we want to test the null hypothesis that the two trees are equally good explanations of the data. Thus, under the null, the expected per-site difference in log-likelihood is 0. So, if we have a method of producing a plausible amount of variability in the per-site log-likelihood differences, we can generate a null distribution at the per-site level.

Because the total difference in log-likelihoods is simply the sum of the differences for all sites, we can get the null distribution of the total log-likelihood difference from the per-site null distribution.
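The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not necessarily how the demo builds its null. It takes the "plausible amount of variability" from the observed per-site differences themselves: the differences are centered so that their mean is 0, then sites are resampled with replacement (a nonparametric bootstrap of centered differences). The function name, the one-sided comparison, and the example data are all hypothetical.

```python
import random


def simulate_null_p_value(per_site_diffs, n_replicates=10000, seed=0):
    """Approximate P-value for the total log-likelihood difference.

    Assumption (not from the text): the null per-site distribution is the
    observed per-site differences shifted to have mean 0, and each null
    replicate resamples sites from it with replacement.
    """
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    rng = random.Random(seed)
    n_sites = len(per_site_diffs)
    observed_total = sum(per_site_diffs)
    mean_diff = observed_total / n_sites
    # Per-site null: same spread as the data, but centered on 0.
    centered = [d - mean_diff for d in per_site_diffs]
    n_as_extreme = 0
    for _ in range(n_replicates):
        # The total for a replicate is the sum over resampled sites.
        null_total = sum(rng.choices(centered, k=n_sites))
        if null_total >= observed_total:  # one-sided, by assumption
            n_as_extreme += 1
    return observed_total, n_as_extreme / n_replicates


# Hypothetical data: 100 sites whose differences sum to 5.0.
data_rng = random.Random(1)
noise = [data_rng.gauss(0.0, 0.5) for _ in range(100)]
noise_mean = sum(noise) / len(noise)
example_diffs = [x - noise_mean + 0.05 for x in noise]

observed, p_value = simulate_null_p_value(example_diffs, n_replicates=2000, seed=2)
print(f"Difference = {observed:.2f}")
print(f"Based on 2000 replicates, the approximate P-value = {p_value}")
```

The reported P-value is only as precise as the number of replicates allows; with 2000 replicates it cannot resolve values below 1/2000.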

[Interactive figure. Panels: "Distribution of per-site differences" and "Simulated from null". The page reports the observed difference and the approximate P-value based on the number of simulated replicates.]

Thanks to


Further information on topological testing will be available in a forthcoming "Encyclopedia of Evolution" article by Emily Jane B. McTavish and Mark T. Holder.


Source code at

Thanks to the U.S. National Science Foundation and the Heidelberg Institute for Theoretical Studies for support.