Workshop

Information for a tutorial is at http://phylo.bio.ku.edu/software/sate/tutorial.html

SATé - Simultaneous Alignment and Tree Estimation

About SATé

SATé is a software package for inferring a sequence alignment and phylogenetic tree. The iterative algorithm involves repeated alignment and tree searching operations. The original data set is divided into smaller subproblems by a tree-based decomposition. These subproblems are aligned and further merged for phylogenetic tree inference.
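The tree-based decomposition can be illustrated with a toy sketch. This is not SATé's own code: the adjacency-dictionary tree, the function names `side_of` and `decompose`, and the integer `max_size` are all hypothetical. It also assumes that a "centroid" edge is the edge whose removal splits the taxa most evenly.

```python
def side_of(adj, start, blocked):
    """Return the nodes reachable from `start` once the edge start-blocked is cut."""
    seen = {start, blocked}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(blocked)
    return seen


def decompose(adj, taxa, max_size):
    """Cut the most balanced edge repeatedly until each subset has <= max_size taxa.

    adj maps every node name to the set of its neighbours (an unrooted tree);
    taxa is the set of leaf names. Both structures are illustrative assumptions.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    here = sorted(t for t in taxa if t in adj)
    if len(here) <= max_size:
        return [here]
    best = None
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v:  # visit each undirected edge once
                part = side_of(adj, u, v)
                n_part = sum(1 for t in here if t in part)
                imbalance = abs(len(here) - 2 * n_part)
                if best is None or imbalance < best[0]:
                    best = (imbalance, part)
    part = best[1]
    rest = set(adj) - part
    sub_a = {n: adj[n] & part for n in part}
    sub_b = {n: adj[n] & rest for n in rest}
    return decompose(sub_a, taxa, max_size) + decompose(sub_b, taxa, max_size)


# A six-taxon "caterpillar" tree with internal nodes n1..n4.
tree = {
    "A": {"n1"}, "B": {"n1"}, "C": {"n2"},
    "D": {"n3"}, "E": {"n4"}, "F": {"n4"},
    "n1": {"A", "B", "n2"}, "n2": {"n1", "C", "n3"},
    "n3": {"n2", "D", "n4"}, "n4": {"n3", "E", "F"},
}
print(decompose(tree, set("ABCDEF"), 3))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

In this toy version each resulting subset would be handed to an aligner, and the sub-alignments merged afterwards.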

The implementation developed at the University of Kansas was written by Jiaye Yu, Mark Holder, Jeet Sukumaran, Siavash Mirarab, and Jamie Oaks. By default, this implementation uses the "SATé-II fast" settings (but see the "known issues" section below about bugs that affect versions prior to 2.2.2).

The alignment and tree searching routines are implemented by calling "external" programs that were not written by us (but which are bundled with the SATé distribution).

The original style of SATé was described in Liu et al. 2009. The version posted here enables the SATé-II searching described in Liu et al. 2012.

Currently, the following tools are supported, and are bundled with the SATé distribution:

Our implementation also uses functionality provided by the DendroPy library (Sukumaran and Holder).

Please see the Citations section of this page for references.


Please join the SATé User Group for announcements about the software.


Version 2.2.2 fixed several significant bugs (see the "known issues" section below). Please make sure that you are using version 2.2.2 (or later).


Download and Installation (Latest Release: v2.2.7, Feb 15, 2013)

This implementation of SATé has not been the subject of intensive testing. Please check your results carefully, and notify us of any apparent bugs. Stay current on the status of SATé by joining the SATé User Group.

A history of changes in this and previous releases can be found in the change log. Older releases can be downloaded directly from the archives: Archived older versions.

SATé runs on three platforms: Microsoft Windows, Mac OS X, and Linux (on Linux, SATé is distributed as source code). If clicking the SATé box above does not download the correct version for your platform, follow the instructions below, or check out all of the archived versions in the Downloads section. Simple GUIs are provided for Windows and Mac OS X users, and a command-line version is available on Mac OS X and Linux. If you do need to download an older version, please contact us.

See README.txt for brief instructions, and for frequently asked questions, see here.

  • Microsoft Windows

  1. Install Microsoft Visual C++ 2008 Redistributable Package (it is free), if you do not have it.
  2. Download SATé for Windows here.
  3. Unzip the entire package to your favorite directory.
  4. Double click run_sate_gui.exe to start the program.

Note: We have not tested SATé on Windows 95/98/2000, and we do not intend to support these operating systems.

  • Mac OS X (Universal)

  1. Download SATé for OS X here.
  2. Mount the downloaded disk image.
  3. Drag SATé into a folder on your hard drive.
  4. Double click SATé to run it.

Note: SATé has been tested on 10.4, 10.5, 10.6 and 10.7. We have not tested it on version 10.3 or earlier.

  • Linux

  1. Install Python 2.6 or Python 2.7 and Dendropy, if you do not have them.
  2. Download the SATé source code here.
  3. Unpack the tarball to your favorite directory.
  4. To install SATé, open a terminal, navigate to the unzipped source directory, and enter python setup.py develop. If this fails due to a permission error, try sudo python setup.py develop (you will be prompted to enter your user-account password).
  5. From a terminal within the SATé source directory use the command python run_sate.py to launch SATé.
  6. To run the SATé GUI on Linux, install wxPython first and then use the command python run_sate_gui.py. We do not provide support for the SATé GUI on Linux.

Note: The external alignment and tree inference tools have been pre-compiled on Ubuntu 8.04.
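Before running the Linux installation steps, the stated prerequisites can be checked from Python. The following is a minimal sketch; the helper names `python_ok` and `missing_modules` are hypothetical and not part of SATé. It only tests the interpreter version and whether DendroPy (module name `dendropy`) and, for the GUI, wxPython (module name `wx`) can be imported.

```python
import sys

# Interpreter versions named in the Linux instructions.
SUPPORTED = ((2, 6), (2, 7))


def python_ok(version_info=sys.version_info):
    """Return True when the major.minor version is one SATé lists as supported."""
    return tuple(version_info[:2]) in SUPPORTED


def missing_modules(names=("dendropy",)):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python version supported: %s" % python_ok())
    # Add "wx" only if you plan to use run_sate_gui.py.
    print("Missing modules: %s" % missing_modules(("dendropy", "wx")))
```

The script uses only constructs available in both Python 2.6 and later interpreters, so it can be run with whichever python is on your path.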

Usage

  • GUI version

Please refer to Liu et al. for a description of the algorithm. The options and labels of the GUI are explained in the README.txt file.

Use the "Help" menu of the SATé program for more information.

Note: To use the OPAL alignment merger in SATé, Java must be installed. SATé 2.0.3 onwards allows you to specify the maximum memory of the Java virtual machine (see the "Maximum MB" option). This defaults to 1024 MB. If your machine has less memory than this, you will have to lower the value manually. OPAL requires quite a bit of memory: if SATé fails when using OPAL, increase the "Maximum MB" value, or, if your machine does not have enough memory, use a less memory-intensive program such as MUSCLE.
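The memory rule in the note can be written down as a small decision helper. This is only a sketch: `pick_merger` is a hypothetical function, not a SATé API, and you have to supply the physical memory figure yourself. It assumes that a "Maximum MB" value larger than physical memory will make Java fail, in which case MUSCLE is the fallback.

```python
DEFAULT_MAX_MB = 1024  # default "Maximum MB" in SATé 2.0.3 and later


def pick_merger(physical_mb, max_mb=DEFAULT_MAX_MB):
    """Return (merger, java_max_mb) for a machine with physical_mb of RAM.

    If the requested Java memory does not fit in physical memory, fall back
    to MUSCLE, which does not need a Java memory setting.
    """
    if max_mb <= physical_mb:
        return ("opal", max_mb)
    return ("muscle", None)


print(pick_merger(4096))        # ('opal', 1024)
print(pick_merger(512))         # ('muscle', None)
print(pick_merger(4096, 3000))  # ('opal', 3000)
```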

  • Command-line version

First-time users of SATé can try the following command to get a feel for the command-line version.

python run_sate.py -i data/small.fasta -t data/small.tree -j test --auto

For all of the command-line options and help information, run

python run_sate.py -h
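For scripted runs, the first-time command above can be assembled in Python and handed to the standard subprocess module. This is a sketch only: `build_sate_command` is a hypothetical helper, and it uses just the options that appear in the example command (-i, -t, -j and --auto).

```python
import subprocess


def build_sate_command(input_fasta, start_tree, job_name, python_exe="python"):
    """Return the argument list for the example run_sate.py invocation."""
    return [
        python_exe, "run_sate.py",
        "-i", input_fasta,   # input sequences
        "-t", start_tree,    # starting tree
        "-j", job_name,      # job name
        "--auto",
    ]


cmd = build_sate_command("data/small.fasta", "data/small.tree", "test")
print(" ".join(cmd))

# To launch the run, execute from within the SATé source directory, e.g.:
# subprocess.call(cmd, cwd="/path/to/sate/source")
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.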

Advanced users may want to set more options with a configuration file; one example is shown here.

[clustalw2]
path = /home/jiaye/Projects/msaml/sate/bin/clustalw2

[mafft]
path = /home/jiaye/Projects/msaml/sate/bin/mafft

[muscle]
path = /home/jiaye/Projects/msaml/sate/bin/muscle

[opal]
path = /home/jiaye/Projects/msaml/sate/bin/opal.jar

[prank]
path = /home/jiaye/Projects/msaml/sate/bin/prank

[raxml]
path = /home/jiaye/Projects/msaml/sate/bin/raxml

[fasttree]
path = /home/jiaye/Projects/msaml/sate/bin/fasttree

[commandline]
datatype = dna
input = /path/to/file.fasta
treefile = /path/to/starting.tre
job = satejob

[sate]
time_limit = 86400
max_subproblem_size = 0.2
aligner = mafft
merger = muscle
tree_estimator = raxml
output_directory = sateout
num_cpus = 2
break_strategy = centroid
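
Because the example uses the familiar INI layout, it can be sanity-checked with Python's standard configparser module before a long run. The sketch below is an illustration, not part of SATé: `load_config` and `tools_without_path` are hypothetical helpers, and the check assumes that each tool named under [sate] should have its own section with a path entry, as in the example. It is written for Python 3, independently of the interpreter SATé itself runs on.

```python
import configparser

# A shortened copy of the example configuration above.
EXAMPLE = """
[mafft]
path = /home/jiaye/Projects/msaml/sate/bin/mafft

[muscle]
path = /home/jiaye/Projects/msaml/sate/bin/muscle

[raxml]
path = /home/jiaye/Projects/msaml/sate/bin/raxml

[commandline]
datatype = dna
input = /path/to/file.fasta
treefile = /path/to/starting.tre
job = satejob

[sate]
time_limit = 86400
max_subproblem_size = 0.2
aligner = mafft
merger = muscle
tree_estimator = raxml
output_directory = sateout
num_cpus = 2
break_strategy = centroid
"""


def load_config(text):
    """Parse configuration text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def tools_without_path(parser):
    """Return tools selected in [sate] that lack a [tool] section with a path."""
    missing = []
    for role in ("aligner", "merger", "tree_estimator"):
        tool = parser.get("sate", role)
        if not parser.has_option(tool, "path"):
            missing.append(tool)
    return missing


cfg = load_config(EXAMPLE)
print(tools_without_path(cfg))            # []
print(cfg.get("sate", "break_strategy"))  # centroid
```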

License

This implementation of SATé is distributed under the GPL. The external tools included in the SATé package have their own licenses.

Status

At this time, we are still actively adding features and making changes to SATé. The most recent snapshot can be obtained through GitHub (NOTE: in addition to the SATé core, you will also need the OS X, Linux, or Windows tool binaries, as appropriate for your operating system). Versions found there but not released here are much more likely to be in development, and less stable, than the versions that we post here.

Known Issues

  1. Versions prior to 2.2.2 contain a bug that causes the tree decomposition to use the 'centroid' option, even if the user selects the 'longest' option. The resulting tree and alignment should be reasonable outputs, but they do not conform exactly to the run settings described in the SATé-II paper. Such results should instead be described as using the centroid decomposition.
  2. Version 2.2.2 fixed multiple issues in the handling of very small subsets of taxa. Such subsets arise frequently in the "longest" edge decomposition, but are rare when using centroid decomposition. That version also fixed bugs related to the handling of input sequences which are all gaps (which happens in some multilocus analyses).
  3. Multilocus analyses of amino acid sequence data in which some taxa are missing data for some loci should not be performed with SATé version 2.1.x or earlier. In these cases, the concatenated datafile that is the subject of tree searching will have the incorrect padding character inserted. This will result in incorrect tree inference. The bug has been fixed in version 2.2.0 and higher.
  4. RAxML 7.2.6 fails with large protein sequence data sets on 32-bit machines.
  5. OPAL 1.0.3 can fail with Java version 6 update 21. Use MUSCLE instead if you encounter this problem.
  6. OPAL requires a lot of memory to run. If you get errors running OPAL, try increasing the 'Maximum MB' option, up to the limit of your machine's available physical memory. Java will fail if you set a memory size larger than the amount of physical memory available on your machine. If this happens, you should use MUSCLE instead of OPAL. Note that SATé version 2.0.3 or later uses 1024 MB as the default maximum memory for running Java. Previous versions of SATé used 2048 MB.
  7. The GUI currently does not support non-ASCII characters, so please leave accents, umlauts and the like out of your file names. The command-line interface does support non-ASCII characters.
  8. On 64-bit Linux systems, the bundled 32-bit FastTree executables may not work. If you encounter problems when trying to use FastTree, you can download the source or 64-bit binaries from the FastTree website. Rename the newly downloaded or compiled binaries as "fasttree" (for the serial version) and "fasttreeMP" (for the multi-threaded version) and replace the executables of the same names in the sate-tools-linux directory.
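
The non-ASCII limitation of the GUI (issue 7) can be caught before a run with a short check. This is a sketch for Python 3; `non_ascii_parts` is a hypothetical helper, and checking every component of the path, rather than only the file name, is a cautious assumption.

```python
import re


def non_ascii_parts(path):
    """Return the path components that contain characters outside ASCII."""
    parts = [p for p in re.split(r"[\\/]", path) if p]
    return [p for p in parts if any(ord(ch) > 127 for ch in p)]


print(non_ascii_parts("data/small.fasta"))  # []
```

An empty list means the path should be safe to open from the GUI. Otherwise, rename the listed components or use the command-line interface.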

Features Requested

Listed below are some features requested by users. We will add them to future releases of SATé as they are implemented.

  1. Mixed-datatype analyses
  2. GARLI as a tree estimation tool
  3. Codon-based analysis

Citations

If you use the software in a publication, please cite the software, the papers describing the method, and the appropriate citation for the external tools.

Algorithm citations

Citations for the SATé software itself and its dependencies

  • Jiaye Yu and Mark T. Holder. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html DATE DOWNLOADED. (for version 1.2 or earlier)
  • Jiaye Yu, Mark T. Holder, Jeet Sukumaran, and Siavash Mirarab. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html DATE DOWNLOADED. (for version 1.2.1 to 2.1.0)
  • Jiaye Yu, Mark T. Holder, Jeet Sukumaran, Siavash Mirarab, and Jamie Oaks. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html DATE DOWNLOADED. (for version 2.2.0 or later)
  • Sukumaran, J. and M. T. Holder. 2010. "DendroPy: A Python library for phylogenetic computing". Bioinformatics 26: 1569-1571. (for all SATé versions from this website)

External tool citations

Please remember to cite the aligner and tree inference tools that you use during the course of a SATé run. The exact citation will depend on what tools you choose to use:

Contacts

If you have any questions, want to report a bug, or request a specific feature regarding SATé, please visit the SATé User Group.


Copyright © 2009 - 2012. Jiaye Yu and Mark Holder, University of Kansas


We gratefully acknowledge funding by the National Science Foundation grant (DEB-0732920 to Mark Holder) which supported the development of this software. See http://www.cs.utexas.edu/~tandy/ATOL-MSA.html for more information on this collaborative effort.
