Open Tree of Life - Collecting published trees

An early step is to collect published phylogenies to include in the first draft tree of life. We are collecting these in a public Mendeley group.

Adding publications to the list

The first step is to create a Mendeley account, if you don't already have one. Then, join the OpenTree Mendeley group. There are a few ways to add papers to the group:

use the web importer: when you add a paper to your library, you can choose to also add it to a group
in Mendeley Desktop: simply drag a paper from your library into the group
from your library on the Mendeley website: select the checkboxes beside the papers in your library, then choose 'OpenTree' from the 'Add selected documents to...' pulldown list.

Tagging papers by data availability

We can only include a published phylogeny in the draft tree of life if the data (tree and alignment files) are available. We are using the following tags to track data availability. Feel free to use other tags, but please document how you are using them.

treebase: the data is in TreeBASE
datadryad: the data is in Dryad
data_other_site: the data is available somewhere other than TreeBASE or Dryad (according to the text of the publication)
nodata: we can't find any data deposited in either repository :(
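
To keep the tags consistent across the group, it can help to check and tally them. The following is a minimal sketch that assumes the group's papers have already been exported as (title, tags) pairs; that export step, the example titles, and the extra "untagged" and "conflict" statuses are illustrative assumptions, not part of Mendeley or of the tagging scheme above.

```python
from collections import Counter

# The four data-availability tags described above.
DATA_TAGS = {"treebase", "datadryad", "data_other_site", "nodata"}


def data_status(tags):
    """Summarize a paper's data-availability tags as one status string.

    Assumption: papers with none of the four tags are reported as "untagged",
    and papers tagged "nodata" together with another data tag as "conflict",
    so that a curator can re-check them.
    """
    found = sorted(DATA_TAGS & set(tags))
    if not found:
        return "untagged"
    if "nodata" in found and len(found) > 1:
        return "conflict"
    return "+".join(found)


def tally(papers):
    """Count papers by status; `papers` is a list of (title, tags) pairs."""
    return Counter(data_status(tags) for _title, tags in papers)


# Hypothetical example records, not real group entries.
papers = [
    ("Paper A", ["treebase"]),
    ("Paper B", ["treebase", "datadryad"]),
    ("Paper C", ["nodata"]),
    ("Paper D", ["angiosperms"]),  # a topical tag, but no data tag yet
]
for status, n in sorted(tally(papers).items()):
    print(status, n)
```

Reporting untagged and contradictory papers separately, rather than silently dropping them, makes it easy to see which entries still need someone to look for data.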

Metadata about trees

in collaboration with the MIAPA group, we are collecting information on which metadata are important for input phylogenies: please do fill out our survey
initial results from survey
beginnings of a manuscript by Karen Cranston and Elliot Hauser

Pilot project in angiosperms

We've done a pilot project collecting publications, data sets and metadata about angiosperm phylogenies. For each of 21 papers focused on the angiosperm framework, we searched for trees and data sets, uploaded the paper to Mendeley, and compiled basic information: citation; supermatrix or supertree; method, if matrix-based (MP, ML, Bayesian); whether bootstrap support (BS) values are given; and comments where appropriate (e.g., "best phylogenetic tree for this clade"; "multiple trees: which one to use?").
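
The per-paper information listed above maps naturally onto a small record type. This is a sketch only: the field names and the choice of CSV as the file format are assumptions for illustration, since the pilot does not say how its file was structured.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

APPROACHES = {"supermatrix", "supertree"}
METHODS = {"MP", "ML", "Bayesian"}


@dataclass
class PaperRecord:
    """Basic information compiled for one paper (hypothetical field names)."""
    citation: str
    approach: str                   # "supermatrix" or "supertree"
    method: Optional[str] = None    # MP, ML or Bayesian, only for matrix-based studies
    bs_values_given: bool = False   # are bootstrap support (BS) values reported?
    comments: str = ""              # e.g. "multiple trees: which one to use?"

    def __post_init__(self):
        # Reject values outside the categories used in the pilot.
        if self.approach not in APPROACHES:
            raise ValueError(f"unknown approach: {self.approach!r}")
        if self.method is not None and self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")


def write_records(records, stream):
    """Write records as CSV (one row per paper) after a header row."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(PaperRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Placeholder citation, not a real paper from the pilot.
buf = io.StringIO()
write_records([PaperRecord("Author et al. (year)", "supermatrix", "ML", True,
                           "best phylogenetic tree for this clade")], buf)
print(buf.getvalue())
```

Validating the categorical fields when a record is created catches typos such as "Baysian" before they end up in the shared file.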

Basic summary

Of the 21 papers, 4 had tree files.
Only 7 of the 21 papers had complete data matrices. Two of these 7 data sets still require manipulation because they were submitted as PDF rather than in NEXUS format. Also, only 4 of these 7 papers with data matrices were associated with the available tree files (Dryad/TreeBASE).
However, GenBank/EMBL accession numbers are provided for 20 of the 21 papers.
Basic data for each paper were entered into a file.
Estimated total time per paper for all tasks: ~15 minutes (by an expert in the field)
- Note that this estimate does not include the time still needed to contact the authors of 15 to 17 papers for data sets and/or trees.
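
To put the summary in proportion, the counts above can be converted to percentages and an overall time estimate. This small sketch uses only the pilot's own figures. It assumes the ~15-minute estimate applies uniformly to all 21 papers and, like the estimate itself, leaves out the time needed to contact authors.

```python
PILOT_PAPERS = 21

# Counts reported in the pilot summary.
counts = {
    "papers with tree files": 4,
    "papers with complete data matrices": 7,
    "papers with GenBank/EMBL numbers": 20,
}


def proportions(counts, total):
    """Fraction of `total` papers falling in each category."""
    return {label: n / total for label, n in counts.items()}


def curation_hours(n_papers, minutes_per_paper=15):
    """Total curation time in hours at a fixed per-paper estimate."""
    return n_papers * minutes_per_paper / 60


for label, share in proportions(counts, PILOT_PAPERS).items():
    print(f"{label}: {share:.0%}")
print(f"estimated curation time: {curation_hours(PILOT_PAPERS):.2f} h")
```

This gives roughly 19% with tree files, 33% with complete matrices and 95% with accession numbers, and about 5.25 hours of expert time for the 21 papers.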

Some problems noted

Some recent papers stated that the data set could be obtained from a link, but the link was no longer active (one of these papers was from 2011!).
Sometimes the authors stated that the paper had supporting data, but no supplement could be found.
Sometimes a data file was ultimately submitted to TreeBASE or Dryad, but this was not noted in the paper itself.
Some papers (3 studies) have multiple trees in a file, and it is not always clear which tree to use. The multiple trees arose for several reasons: different methods of analysis (MP, ML, BI), different data sets (e.g., with third codon positions excluded), or multiple shortest trees.
The same problem arises when multiple matrices are available (3 studies).
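
When a file contains several trees, listing the tree names is a quick first step toward choosing one, since the names often hint at the analysis (e.g., ML vs. MP). The following is a minimal regex-based sketch that assumes a well-formed NEXUS TREES block. It is not a full NEXUS parser: it ignores comments and TRANSLATE tables, which a dedicated phylogenetics library would handle more robustly. The example file content is made up.

```python
import re

# A TREES block: "BEGIN TREES; ... END;" (case-insensitive, may span lines).
TREES_BLOCK = re.compile(r"begin\s+trees\s*;(.*?)end\s*;", re.IGNORECASE | re.DOTALL)

# A tree statement: "TREE [*] name = ...", where the optional '*' marks a
# default tree and the name may be single-quoted.
TREE_STMT = re.compile(r"^\s*tree\s+(?:\*\s*)?('[^']*'|[^\s=]+)\s*=",
                       re.IGNORECASE | re.MULTILINE)


def list_tree_names(nexus_text):
    """Return the names of all trees in the TREES blocks of a NEXUS string."""
    names = []
    for block in TREES_BLOCK.findall(nexus_text):
        for raw in TREE_STMT.findall(block):
            names.append(raw.strip("'"))
    return names


# Made-up example file with two trees from different analyses.
example_nexus = """#NEXUS
BEGIN TREES;
  TREE ML_best = [&R] ((A,B),C);
  TREE 'MP strict consensus' = [&U] ((A,C),B);
END;
"""
print(list_tree_names(example_nexus))
```

For the example above this prints the two names, `ML_best` and `MP strict consensus`. In practice, the name alone may not settle which tree to use, but it narrows down what to ask the authors.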