SATé Tutorial

The main SATé site is http://phylo.bio.ku.edu/software/sate/sate.html. This page just has information about the tutorial (first given at the Washington DC Workshop on May 21, 2012).

The PDF version of the tutorial, which contains step-by-step instructions, is at http://phylo.bio.ku.edu/software/sate/sate_tutorial.pdf

Additional SATé Instructions:

To run the command-line version on Mac, you'll need to:

  1. Download setuptools for your version of Python from http://pypi.python.org/pypi/setuptools#files.
  2. Install it by opening the terminal, using cd to get to the download directory, and running sh followed by the name of the setuptools egg.
  3. Follow the *nix instructions on page 6 of the tutorial. Macs come with Python, so you can skip the first step (the installation of Python).
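Step 1 asks for the setuptools download that matches your version of Python. A short sketch like the one below (save it under any name, for example the hypothetical check_python.py, and run python check_python.py in the terminal) prints the interpreter's version and whether setuptools is already importable. It is written to run under both Python 2 and Python 3 and only inspects the interpreter that runs it.

```python
# Sketch: check which Python the Mac will use before picking a setuptools
# download. Written to run under both Python 2 and Python 3.
import sys


def python_version_string():
    """Return the running interpreter's version as 'major.minor'."""
    return "%d.%d" % (sys.version_info[0], sys.version_info[1])


def has_setuptools():
    """Return True if this interpreter can already import setuptools."""
    try:
        import setuptools  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Python version: %s" % python_version_string())
    print("setuptools already installed: %s" % has_setuptools())
```

If you have more than one Python installed, run the check with the same python command you will later use for SATé, since each interpreter has its own set of installed packages.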

64-bit Linux:

  1. On 64-bit Linux systems, the bundled 32-bit FastTree executables may not work. If you encounter problems when trying to use FastTree, you can download the source or 64-bit binaries from the FastTree website. Rename the newly downloaded or compiled binaries as "fasttree" (for the serial version) and "fasttreeMP" (for the multi-threaded version) and replace the executables of the same names in the sate-tools-linux directory.
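Before swapping binaries, it can help to confirm what you actually have. The Python 3 sketch below reads the ELF header of fasttree and fasttreeMP and reports whether each is a 32-bit or 64-bit executable (byte 4 of an ELF file, EI_CLASS, is 1 for 32-bit and 2 for 64-bit). It assumes you run it from the directory that contains sate-tools-linux, or pass that directory's path as the first argument; elf_class is a hypothetical helper name.

```python
# Sketch: report whether the FastTree executables in sate-tools-linux are
# 32-bit or 64-bit ELF binaries, by reading the EI_CLASS byte of the header.
import os
import platform
import sys

ELF_MAGIC = b"\x7fELF"


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not an ELF file."""
    with open(path, "rb") as fh:
        header = bytearray(fh.read(5))
    if len(header) < 5 or bytes(header[:4]) != ELF_MAGIC:
        return None
    # Byte 4 (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    # Assumption: sate-tools-linux is in the current directory unless a
    # different path is given on the command line.
    tools_dir = sys.argv[1] if len(sys.argv) > 1 else "sate-tools-linux"
    print("machine: %s" % platform.machine())
    for name in ("fasttree", "fasttreeMP"):
        path = os.path.join(tools_dir, name)
        if not os.path.isfile(path):
            print("%s: not found in %s" % (name, tools_dir))
            continue
        bits = elf_class(path)
        if bits is None:
            print("%s: not an ELF executable" % name)
        else:
            print("%s: %d-bit" % (name, bits))
```

On a 64-bit system platform.machine() typically reports x86_64; if the bundled executables show up as 32-bit and FastTree fails, that points toward the 64-bit replacement described above. Running the same check after copying in the new binaries is a quick way to confirm the swap worked.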

You can browse the open issues for developers at https://github.com/sate-dev/sate-core/issues?state=open (many of these were raised during the morning session of the workshop).

Please join the SATé User Group for announcements about the software.


GARLI-2.0 Demo

For Derrick Zwickl's demo of GARLI and the DIMM model in GARLI, download http://phylo.bio.ku.edu/software/garli-SI.zip

Given the limited time we have to use GARLI today, the goal is to get the program downloaded and do a quick set of runs with the gap models.

Some general things to know about GARLI:

  • GARLI has no graphical user interface and is generally run as a command-line program (although it can be double-clicked on Windows).
  • All configuration information is read from a text file, by default named garli.conf. The program will look for this file in the current directory, but any filename can be passed on the command line.
  • There are a lot of settings in the configuration file, but very few need to be changed for a basic analysis.
  • All settings are extensively documented on the GARLI wiki: http://www.nescent.org/wg_garli/Main_Page
  • Optionally, you can look at the full-length GARLI tutorial that is used in the Workshop on Molecular Evolution at http://phylo.bio.ku.edu/slides/GarliDemo/garliExerciseDC.html. The files for this optional demo are included with the package that you downloaded above.
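To make the "settings in a text file" idea concrete, here is a small Python 3 sketch that reads a garli.conf-style file with the standard configparser module. It assumes the "[section]" and "key = value" layout of GARLI's example configuration files; the setting names used (datafname, ofprefix, searchreps, nindivs) are GARLI settings as we understand them, the values are made up, and GARLI itself does not use this code. Check the GARLI wiki for the real list of settings and their meanings.

```python
# Sketch: read settings from a garli.conf-style text file using Python 3's
# standard configparser. GARLI itself does not use this code.
import configparser
import sys

# Illustrative fragment only: setting names follow GARLI's documented
# settings, but the values here are made up for the example.
EXAMPLE_CONF = """\
[general]
datafname = forTutorial.nex
ofprefix = tutorialRun
searchreps = 2

[master]
nindivs = 4
"""


def read_garli_settings(text, section="general"):
    """Parse garli.conf-style text and return one section as a dict of strings."""
    # interpolation=None: treat '%' literally; strict=False: tolerate repeats.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return dict(parser[section])


if __name__ == "__main__":
    # Like GARLI, default to garli.conf in the current directory,
    # but accept any filename on the command line.
    path = sys.argv[1] if len(sys.argv) > 1 else "garli.conf"
    try:
        with open(path) as fh:
            text = fh.read()
    except FileNotFoundError:
        print("%s not found; using the built-in example" % path)
        text = EXAMPLE_CONF
    settings = read_garli_settings(text)
    print("data file:", settings.get("datafname"))
    print("output prefix:", settings.get("ofprefix"))
```

The default-filename logic mirrors how GARLI looks for garli.conf in the current directory unless you name another file. A real configuration file may contain layout details that configparser rejects, in which case the sketch would need adjusting.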
Today's demo: Once you've downloaded and expanded the above zip file, you'll see a number of directories. Our quick exercise with the gap models will be done in gapModels or gapModels-WIN. The other directories are:
  • distributions: These are just the packages that one would get when downloading the program from its website (http://garli.googlecode.com), included here for your convenience.
  • bin: The executables for Windows and OS X are here, as well as some helper files.
  • generalTutorial: A more extensive tutorial that you could go through if you are interested in learning more about GARLI. Instructions are available here: http://phylo.bio.ku.edu/slides/GarliDemo/garliExerciseDC.html
Preparing the data:

To use the DIMM or Mkv gap models we first need to create a "gap coded" matrix that is just a direct copy of the DNA matrix with gaps as "0" and bases as "1". This is done with an external program called "gapcode". Scripts are provided to format nexus or fasta datasets for you, and to start the analyses. For the demo the dataset we'll use is called forTutorial.nex.

  • Format data for gap analysis: From the command line in the gapModels or gapModels-Win directory, type ./prepareNexusData.sh forTutorial.nex (OS X and Linux) or prepareNexusData.bat forTutorial.nex (Windows).
  • This will create some alignments in the preparedData directory. (These files are actually already there in the demo package, in case you have difficulty running gapcode.)
  • You can also use these same scripts to prepare your own nexus or fasta alignment.
Running GARLI gap models:

Now we run the DIMM and Mkv gap analyses by using other scripts:

  • Type ./runGarli.dna+gapModels.sh (OS X and Linux) or runGarli.dna+gapModels.bat (Windows).
  • This will do several GARLI searches on the same data. First a quick run will be done to generate starting trees for the other analyses. Then DNA-only searches will be run, as well as the DIMM and Mkv gap model searches.
  • This will create output that you can look at in the garliOutput directory. Looking at the XXX.screen.log files will tell you about the details of the run and parameter estimates. The XXX.best.tre files contain the best trees found in each search.
  • Note that the DIMM model indicates its inferred root by adding a dummy outgroup taxon named ROOT in the tree files. On OS X and Linux the prepareData scripts will create alignment files containing this taxon in the preparedData directory. I haven't managed to automate this on Windows.
  • If you use the above prepareData scripts on your own dataset, these runGarli scripts will also work properly to automatically analyze your data.
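With several searches writing into garliOutput, it can help to see which screen log belongs to which best tree. The Python 3 sketch below pairs files by their shared XXX prefix. It assumes only the XXX.screen.log and XXX.best.tre naming described above; collect_runs is a hypothetical helper, and the actual prefixes depend on what the runGarli scripts use.

```python
# Sketch: pair each run's XXX.screen.log with its XXX.best.tre in garliOutput.
import os
import sys

LOG_SUFFIX = ".screen.log"
TREE_SUFFIX = ".best.tre"


def collect_runs(output_dir):
    """Return {run_prefix: {"log": path or None, "tree": path or None}}."""
    runs = {}
    for fname in sorted(os.listdir(output_dir)):
        for key, suffix in (("log", LOG_SUFFIX), ("tree", TREE_SUFFIX)):
            if fname.endswith(suffix):
                prefix = fname[: -len(suffix)]
                entry = runs.setdefault(prefix, {"log": None, "tree": None})
                entry[key] = os.path.join(output_dir, fname)
    return runs


if __name__ == "__main__":
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "garliOutput"
    if not os.path.isdir(out_dir):
        print("%s not found; run the runGarli script first" % out_dir)
    else:
        for prefix, files in sorted(collect_runs(out_dir).items()):
            print(prefix)
            print("  screen log:", files["log"] or "(missing)")
            print("  best tree: ", files["tree"] or "(missing)")
```

A run listed with a screen log but no best tree is usually worth a look in its screen log, for example if a search was interrupted.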
That's it for this demo. Some final comments:
  • The gap models are still fairly experimental, so please let me know if you run into any problems or have any questions (garli.support@gmail.com)!
  • The DIMM model is currently very slow relative to the other models, but this will be improved.
  • I will hopefully change GARLI to do the gapcoding internally, so that the external data preparation step is not necessary.

Links

SATé
Datasets: http://phylo.bio.ku.edu/software/sate/tutorial.zip
SATé user's group: http://groups.google.com/group/sate-user?hl=en

Other software for the demo
Mesquite project: http://mesquiteproject.org/mesquite/download/download.html
Mesquite manual: http://mesquiteproject.org/Mesquite_Folder/docs/mesquite/manual.html
FigTree: http://tree.bio.ed.ac.uk/software/figtree/
SuiteMSA: http://bioinfolab.unl.edu/~canderson/SuiteMSA/
Java (if you don't have it): http://www.java.com/en/

Text editors
TextWrangler (Mac): http://www.barebones.com/products/TextWrangler/
Notepad++ (Win): http://notepad-plus-plus.org/
jEdit: http://www.jedit.org/

Installation for the command line
Python (if you don't have it): http://www.python.org/
pip: http://pypi.python.org/pypi/pip#downloads
wxPython (if *nix): http://www.wxpython.org/

Other phylo (not needed for the demo)
BAliBASE: http://www-bio3d-igbmc.u-strasbg.fr/balibase/
FastTree: http://meta.microbesonline.org/fasttree/#Support
MrBayes (wiki): http://mrbayes.sourceforge.net/wiki/index.php/Introduction_3.2
MrBayes (downloads): http://sourceforge.net/projects/mrbayes/files/
BAli-Phy: http://www.biomath.ucla.edu/msuchard/bali-phy/index.php
Prank: http://code.google.com/p/prank-msa/

We gratefully acknowledge funding by the National Science Foundation grant (DEB-0732920 to Mark Holder) which supported the development of this software. See http://www.cs.utexas.edu/~tandy/ATOL-MSA.html for more information on this collaborative effort.
