BIOL 701 Likelihood Methods in Biology: Homework #4
Due Wednesday, March 30
Background:
You are interested in the proportion of crabs in a population that are
asymmetric with respect to claw size. A rival group has reported that half of
the crabs in this species are asymmetric. You would like to test the null
hypothesis that the proportion of asymmetric individuals is 0.5.
You sample individuals by placing a trap out, and checking it every day. You
catch one crab per day and release it after scoring it as 0 if its pincer lengths
are within 5% of each other, and 1 if they are more asymmetric
than that.
Here is your data for each day (conveniently formatted as a python list. The
same data appears at the end in a form that can be read by R):
data = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
This dataset consist of 100 observations. 40 of them correspond to "asymmetric."
You get back home and a colleague points out that the species that you worked
on has a strong tendency to learn about small areas that it considers to be
good hunting areas and return to those areas. So your 100 observations are not
necessarily independent - there is a good chance that the same individual will
return to the trap location and be caught again.
The crabs forget after a few hours. So if you do not catch the same crab
immediately then you probably won't recapture it during the rest of the study.
Model:
After further discussions, you decide that it is appropriate to analyze your data
if you consider the there is some probability that the individual that you capture
at day "i" is actually the same individual that was captured on day "i - 1".
Call the probability of recapture of the same individual on consecutive days "r"
and the probability of a randomly selected individual being asymmetric "a"
You are going to treat the recapture events as independent events.
So the probability that you recapture the same individual in (at least)
three consective days would simply be r*r, and the probability of catching the
same individual at least four times in a row would be r to the third power, etc.
You can assume that if you fail to recapture a crab on consective days then it
will be out of your study area for the remainder of the time.
Assignment:
Part 1. Describe how you will calculate the likelihood in written form. Any
combination of prose and equations is fine. The point of this part of the
problem is to describe the likelihood before you start to program it.
Part 2. Replace the log-likelihood calculation code in one of the "template"
scripts. You can use R or python (or any other computational tools you like).
Part 3. Replace the data simulation code so that you create data sets
according to the null model (a=0.5) with the value of "r" that best fits the data
when the null is enforced.
Part 4. Use your modified code to calculate the LRT statistic between for
the null hypothesis that a = 0.5 (which is what the "rival group" reported for
the probability of asymmetry).
Part 5. Run the parametric bootstrapping parts of the script which will use
your simulator to generate new realizations. What is the approximate P-value for
your test of the null hypothesis?
Computational issues and hints:
A python template to work from is posted at:
http://phylo.bio.ku.edu/slides/templateParametricBoot.py.txt
You'll have to change parts that are flagged with TEXT IN ALL CAPS.
After downloading the code, you can run it through python, assuming that you
have installed the SciPy package: a free download from http://www.scipy.org
Python is white-space sensitive (remember that the blocks of code are denoted
using indentation). Edit code in a plain text-editor, not a word processor.
Don't mix tabs and spaces.
For the template script, you can specify the parameters to constrain in the
null hypothesis from the command line. To indicate that a parameter is free
to vary in the null, then you pass in the word "None". The last argument
that you should supply on the command line is the number of parametric
bootstrapping replicates.
So, if you want to:
1. constrain the first parmeter to be 0.6 in the null hypotheses,
2. let the second parameter of the model vary in the null model, and
3. perform 1000 parametric bootstrapping replicates
then the invocation that you would type at the terminal prompt would is:
python templateParametricBoot.py 0.6 None 1000
On Windows:
Install python 2.6
Use can use the CMD.exe program to give you a command-line interface.
Use the "cd" to change the working directory of your terminal
Use the "dir" command lists the contents of your current directory
jEdit is a decent free text editor
On Linux:
- Use any standard terminal.
On Mac:
- Use /Applications/Utilities/Terminal
- TextWrangler is a good free text editor
On Linux or Mac:
- Use:
python --version
to see what version of python you have so that you can choose the appropriate
version of SciPy.
- Use the "cd" to change the working directory of your terminal
- Use the "ls" command lists the contents of your current directory
################################################################################
If you want to use R (or python) and read the data from a file, you can
save the content below the following line of # characters into a file, data.txt
so that R's
d = read.table("data.txt", header=TRUE);
syntax will read the tabular data.
################################################################################
asymm
1
0
0
0
0
0
1
1
1
1
0
0
1
0
0
1
1
0
0
0
1
1
1
1
0
1
1
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
1
0
1
1
0
1
1
0
1
1
1
0
0
0
1
1
1
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
1
0
1
1
1
0
0
0
0
0
1
1
0
0
1
1
0
1
1
1