BIOL 848: Phylogenetic Methods

Instructor: Mark Holder

Today's lab will be pretty short. We are going to cover how we can use PAUP and the combination of PAUP+CONSEL to perform some of the topology tests that we have talked about recently.

Get a data file

Download (again) the data file algae.nex.

PAUP's tree testing commands are in the -score commands: pscore, lscore and dscore.

As we discussed yesterday, the testing context is clearest when you have two trees before you conduct your analysis. Then you can see if you reject the null hypothesis that each of them would be just as good an explanation if we had all of the possible data in the world. Expressed this way, it may seem like a strange null hypothesis, but you can think of it this way. If you have 2 trees it is very probable that the dataset will support one of them more than the other (ties in scores are relatively rare). The question is can you reject the tree that appears to be suboptimal? Can you reject the null that suboptimal tree would in fact be just as good an explanation of the data if there were no sampling error?

The key is seeing if we can reject the null that the characters are being drawn from a distribution that does not favor one tree over the other. Any particular data set may favor one tree, but the expectation of the difference in score is 0.

I do not have two a priori trees to test for the algae.nex data set, so we will use to random trees.

Execute the data file in PAUP, and then, to get two trees in memory execute the command:

generate random  model=equiprobable ntrees=2
This draws two trees into memory and the distribution from which they are drawn places an equal probability on each tree topology. Use the PScore command to get their parsimony scores. If (by chance) you end up getting 2 trees that have the same score, use the generate command again to get a couple of new trees in memory (we are going to look at methods to test if we can reject a tree that appears to be suboptimal, and the answer will be pretty obvious if the trees appear to be equally good on our data set).

You might want them back so it is a good idea to save them to a file:

savetrees file = randalg.tre

Templeton and Winning sites tests

First lets look at the very general nonparametric tests that test the hypothesis that characters are equally likely to support each of the trees:

PScore / NonparamTest TestDetails
You should see a table that summarizes the results for all characters that differ in parsimony length between the two trees. You'll see the length on each tree, the difference in lengths, and finally the difference's "rank."

Ranking the differences means that PAUP sorts all of the length differences according to there absolute value (starting at one for a character that has the smallest difference). If some of your characters differ between the trees by 2 steps and others by 1, then you will see differences in the ranks. The numbers may look a little confusing; when more than one character are tied for the same difference, then the rank they are given is the mean rank that would be assigned if you could order them. So if there were 10 variable characters and 8 had an absolute value of 1, while one had a length difference of 2, and the other character showed a length difference of -2. Then a fully ordered list would be assign the ranks 1, 2, 3, 4, 5, 6, 7, and 8 to the characters with an absolute value of the difference of one, and the other two ranks would be 9 and 10. Because of the 8-way tie for the low ranks and the 2 way tie for the high ranks, we use the means. So the low ranked characters would have a rank of 4.5 (which is (1+2+3+4+5+6+7+8)/8), while the other characters would have a rank of 9.5 (the mean of 9 and 10).

Next in the output is PAUP's summary of the winning sites test and the Templeton test. The former test just treats the characters as binary decisions (for or against the first tree -- I used coin-flip analogy in lecture to describe this). The Templeton test is the one that uses the magnitude of the differences as encapsulated by ranking the absolute values of the differences.

Because the Templeton test uses more of the information, it tends to be more powerful. So it usually leads to lower p-values than the winning sites test. Was that the case for your data set?

KH Test

PScore / KHTest
to perform the KHTest using the normal approximation to assess significance. You should see length difference, and the standard deviation of the difference in length (where the standard deviation is calculated over the class of characters whose length differs between the trees).

When we use the Normal approximation for the KHTest, we transform the length difference by dividing it by the standard deviation. This results in a t-statistic. We can get the p-value from a t table (with the degrees of freedom determined by the number of sites that vary between the two trees). Make sure that you understand the output of the test.

Often the KHTest will be more powerful than the Templeton test because it exploits properties of the Normal distribution rather than trying to assume as little as possible (as the non-parametric tests do). Was the p-value lower for your two trees when you conducte the KHTest than the p-value from the Templeton test?

Tests for the ML criterion

KH Tests

Switch PAUP to likelihood (it is in the set command). Do a simple search to get a good tree, and then find an appropriate model by adding parameters and seeing if the increase in lnL justifies the addition of the parameters (you can use the AIC or the LRT to do this -- the AIC is easier to do in your head because the mean increase in lnL per parameter has to be greater than 1.0 to lead to a preference for the more complex model).

To speed of computation execute the LSet BaseFreq=prev RMat = prev so that PAUP does not re-estimate the base frequencies and the rate matrix for all of the subsequent commands.

When we are using ML then we have much larger set of sites that prefer one tree over another. In fact, it is quite common for all of the sites to display different likelihoods between the trees.

Why would a constant site show a higher likelihood on one tree than another?

Because we have a lot more sites that vary in score between trees, it becomes feasible to use bootstrapping to estimate the variance of the distribution of the difference in log-likelihoods (if we have a very small sample, bootstrapping is unreliable).

Why would a constant site show a higher likelihood on one tree than another?

Conduct use LScore ? to see the options and then see if you can conduct the KHTest using the normal approximation, the RELL bootstrap and the bootstrap with full optimization of likelihoods for each resampled dataset.

Do the results of the different forms of the KHTest agree? How do they compare in terms of running time?

The SH Tests

Generate 100 random trees and do the KH and SH Tests. Which test is more powerful? (which rejects more trees).

Important Note: that this is an inappropriate use of the KH Test because we don't have 2 trees a priori. PAUP is comparing the best of the 100 trees to each of other 99 trees and seeing if it can reject each of the other trees. The test is based on the expectation of the site lnL difference being 0, so we have violated the assumptions of the test by focussing on the best of the 100 trees.

Consel and the AU-Test

Consel is software written by Hidetoshi Shimodaira (the "S" is the SH-Test, and the author of the AU test).

Recall from lecture, that the SH test (and the AU test) require you to test the plausible set of trees. For the algae data set, there might be a wide range of trees that we would accept going in to the study (obviously the exact candidate set would depend on your botanical knowledge).

Imagine that our candidate set of trees before we analyze the data is any tree that has land plants monophyletic and vascular plants are monophyletic. To conduct the SH-test and the AU test, we'll need to get the candidate set of trees in memory, calculate the site likelihoods for the trees, and then feed those site likelihoods to CONSEL. Consel uses bootstrapping of these site-likelihoods (and multi-scale bootstrapping in the case of the AU test) to calculate a P-value for each tree in the candidate set.

In PAUP (assuming that you have not quit it, and it still has the algae data set in memory), you will need to use

command to get rid of the previous trees. Next we'll need to set a constraint to make sure that we only consider the trees that are part of our candidate set. It is easiest to do this if you know the numbers for the taxa:
TStatus full
will report the numbers.

You can use

constraint land (Monophyly) = YourNewickTreeDescriptionHere
to specify a constraint named "land". (Rice and Tobacco are the vascular plants, and the vascular plants + Marchantia are the land plants in this data set). Whenever you specify a constraint in PAUP, you should look at the tree to make sure that specified (use the "ShowConstr" command).

Paup can generate all of the trees that are compatible with a constraint:

generate all constraint=land
(don't do this with too many taxa in memory, or you won't be able to store all of the trees). Now we'll save the trees to a file so that we can look at them later (savetrees command).

Choose a simple model (such as HKY with the base frequencies parameters set to their empirical values) so that the tree scoring will not be too slow for the lab. After you've chosen the model, use:

lscore / sitelike = yes scorefile=algaesitelike.txt
To calculate the likelihood scores and save the site likelihoods to the file "algaesitelike.txt"

Reformatting fun

Unfortunately, Consel won't read the sitelikelihood output from the most recent versions of PAUP (it will read output from other software - including RAxML if you specify the "puzzle" option for consel). So we'll have to transform paup's output before we can proceed. I wrote a python script ( that will help. Download the script and run it from a terminal passing the path to algaesitelike.txt to it. It should write a file called "sitelikesforconsel.txt".

On Mac download the mac version of consel that I compiled. Windows users should grab this windows version of consel that I compiled with cygwin.

Consel has a pdf manual included which you may want to look at. Depending on where you unpack consel the directories for the next few commands may differ slightly (I wrote the commands assuming that you unpacked the archive such that they make a subdirectory called consel in the directory that you are working in). Basically want to run:

./consel/bin/makermt --paup sitelikesforconsel.txt 
./consel/bin/consel sitelikesforconsel
./consel/bin/catpv sitelikesforconsel
The first command formats the input for consel (into two scary binary files with extensions .vt and .rmt). consel does the hypothesis testing an stores the output in file with .pv and .ci extensions. Then catpv displays the output of consel in a more readable form.

We have not talked about the WeightedSH test, but the guide at should help you interpret the output (or you can ask me if you have questions).

Which trees can be rejected on the basis of this dataset?

Can you find the non-rejected trees in the tree file that you saved from PAUP?