Consider a dataset with 100 characters (or sites in a sequence alignment) for which the alternative tree has a log-likelihood 5.0 units higher than that of the tree specified by the null hypothesis. If we do not want to rely on asymptotic assumptions to connect 2 × (the difference in log-likelihoods) to the chi-square distribution, we can instead try to infer a null distribution from the per-site differences in log-likelihood.
If we construct the null distribution in this way, we find that the variability of the support across characters is very important for assessing the statistical significance of a log-likelihood difference.
Typically we want to test the null hypothesis that the two trees are equally good explanations of the data. Thus, under the null, the expected mean per-site difference in log-likelihood is 0. So, if we have a method of producing a plausible amount of variability in the per-site log-likelihood differences, we can generate a null distribution at the per-site level.
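One way to produce a plausible amount of per-site variability, used for example in the nonparametric Kishino-Hasegawa (KH) test, is to subtract the mean from the observed per-site differences (so the null mean is 0 while the spread is kept) and then resample sites with replacement. The sketch below illustrates that idea; it is not necessarily the method this demo uses, and the function names and the five example values are hypothetical.

```python
import random
from typing import List, Sequence


def centered_site_differences(diffs: Sequence[float]) -> List[float]:
    """Shift per-site log-likelihood differences so that their mean is 0.

    The spread across sites is preserved, but the null hypothesis
    (both trees fit equally well on average) is imposed.
    """
    mean = sum(diffs) / len(diffs)
    return [d - mean for d in diffs]


def resample_null_sites(diffs: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one pseudo-dataset of per-site differences under the null.

    Sites are sampled with replacement from the mean-centered differences
    (a nonparametric bootstrap), so the pseudo-dataset has as many sites
    as the original data.
    """
    centered = centered_site_differences(diffs)
    return [rng.choice(centered) for _ in range(len(centered))]


# Hypothetical per-site differences (alternative minus null) for 5 sites.
observed = [0.4, -0.1, 1.2, 0.3, 0.2]
rng = random.Random(1)
print(centered_site_differences(observed))
print(resample_null_sites(observed, rng))
```

Each call to `resample_null_sites` yields one draw of per-site differences consistent with the null; repeating it many times gives the per-site null distribution.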
Because the total difference in log-likelihoods is simply the sum of the per-site differences, we can obtain the null distribution of the total log-likelihood difference from the per-site null distribution.
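The following sketch builds the null distribution of the total difference by summing simulated per-site differences, and shows why per-site variability matters. It assumes, purely for illustration, that per-site differences under the null are independent draws from a Normal distribution with mean 0 and a chosen standard deviation `site_sd` (a hypothetical parameter); the test is one-sided because the alternative tree is the better-scoring one.

```python
import random


def simulate_total_null(n_sites: int, site_sd: float, n_reps: int, seed: int = 0) -> list:
    """Null distribution of the total log-likelihood difference.

    Each replicate draws n_sites per-site differences from Normal(0, site_sd)
    (an assumed model of per-site variability) and sums them.
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, site_sd) for _ in range(n_sites)) for _ in range(n_reps)]


def approx_p_value(observed_total: float, null_totals: list) -> float:
    """One-sided P-value: fraction of null replicates at least as large as observed."""
    hits = sum(1 for t in null_totals if t >= observed_total)
    return hits / len(null_totals)


observed_total = 5.0  # the alternative tree is 5.0 log-likelihood units better
for site_sd in (0.1, 1.0):
    null_totals = simulate_total_null(n_sites=100, site_sd=site_sd, n_reps=5000)
    print(site_sd, approx_p_value(observed_total, null_totals))
```

With 100 sites, the sum has standard deviation 10 × `site_sd`. When `site_sd` is 0.1, a total of 5.0 lies about five standard deviations above 0 and essentially no replicate reaches it; when `site_sd` is 1.0, 5.0 is only half a standard deviation above 0, and the approximate P-value is roughly 0.3. The same 5.0 difference is therefore strong or weak evidence depending on how variable the per-site support is.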
[Interactive figure: distribution of per-site differences, and of the difference simulated from the null over a number of replicates.]
Thanks to http://bl.ocks.org/mbostock/3048450.
Further information on topological testing will be available in a forthcoming "Encyclopedia of Evolution" article by Emily Jane B. McTavish and Mark T. Holder.
Source code at https://github.com/mtholder/mephytis
Thanks to the U.S. National Science Foundation and the Heidelberg Institute for Theoretical Studies for support.