Figure 1. The relationship between the per-site LRT and the P-value in a hypothetical case.

Consider a dataset that has 100 characters (or sites in a sequence alignment), in which the alternative tree has a log-likelihood that is 5.0 higher than that of the null hypothesized tree. If we do not use asymptotic assumptions to connect 2 × (the difference in log-likelihoods) to the chi-square distribution, we can instead try to infer a null distribution based on the per-site differences in log-likelihoods.

If we construct the null distribution in this way, we find that the variability of the support across characters is very important for assessing the statistical significance of a log-likelihood difference.

Typically we want to test a null hypothesis that the two trees are equally good explanations of the data. Thus, under the null, the expected mean per-site difference in log-likelihoods should be 0. So, if we have a method of producing a plausible amount of variability in the per-site log-likelihood differences, we can generate a null distribution at the per-site level.

Because the total difference in log-likelihoods is simply the sum of the differences for all sites, we can get the null distribution of the total log-likelihood difference from the per-site null distribution.
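The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the demo: it assumes the per-site differences under the null are independent draws from a normal distribution with mean 0 and a user-chosen standard deviation, and it reports a one-sided P-value. The function names `simulate_null_totals` and `approx_p_value` are hypothetical.

```python
import random


def simulate_null_totals(n_sites, per_site_sd, n_reps, seed=0):
    """Simulate total log-likelihood differences under the null.

    Assumption (not from the original page): per-site differences are
    independent Normal(0, per_site_sd) draws, so the null mean is 0.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_reps):
        # The total difference is the sum of the per-site differences.
        total = sum(rng.gauss(0.0, per_site_sd) for _ in range(n_sites))
        totals.append(total)
    return totals


def approx_p_value(null_totals, observed_diff):
    """One-sided P-value: fraction of null totals >= the observed total."""
    n_extreme = sum(1 for t in null_totals if t >= observed_diff)
    return n_extreme / len(null_totals)


if __name__ == "__main__":
    # 100 sites and an observed total difference of 5.0, as in the text.
    for sd in (0.1, 0.5, 2.0):
        totals = simulate_null_totals(n_sites=100, per_site_sd=sd, n_reps=2000)
        print(f"per-site sd = {sd}: approximate P-value = "
              f"{approx_p_value(totals, 5.0):.3f}")
```

Under these assumptions the total for 100 sites has a standard deviation of 10 times the per-site standard deviation, so the same observed difference of 5.0 lies far in the tail when sites vary little and well inside the null distribution when they vary a lot.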

[Interactive plot: distribution of per-site differences simulated from the null, with the observed difference and the approximate P-value based on the simulated replicates.]









Thanks to http://bl.ocks.org/mbostock/3048450



Further information on topological testing will be available in a forthcoming "Encyclopedia of Evolution" article by Emily Jane B. McTavish and Mark T. Holder.


Source code at https://github.com/mtholder/mephytis

Thanks to the U.S. National Science Foundation and the Heidelberg Institute for Theoretical Studies for support.