This file has moved

As of January 2014, the latest version of this file is now kept at the following location:
https://github.com/OpenTreeOfLife/phylesystem-api/wiki/NexSON

The text below (probably obsolete) is retained as a backup copy.








Overview


NexSON is a translation of NeXML to JSON using the BadgerFish conventions. Each NexSON file represents a Phylografter study, though it may contain either exactly one tree or all of the trees included in the study. The set of elements used in these files is (currently) limited to nexml, otus, otu, trees, tree, node, and edge; note that there is currently no support for data matrices. There is also an associated metadata vocabulary, currently used for annotating elements of type nexml, otu, tree, and node.

As of October 2013, discussion of the syntax of validation-related annotations is taking place on the api.opentreeoflife.org wiki; when that discussion stabilizes, the content there will be migrated to this page.

BadgerFish

BadgerFish is one of several schemes for rendering XML as JSON. Several sites, including one that appears to be the original, and several refinements of the scheme were consulted in developing the mapping appropriate for NeXML.

Correctness of translation was verified by using a backtranslator and validating the resulting XML using the validator on the NeXML home page.
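
As a rough illustration of the BadgerFish conventions mentioned above, the following Python sketch converts a small XML fragment to JSON: attributes become properties prefixed with "@", text content goes into a "$" property, and repeated child elements become a list. This is not the translator used for NexSON. The function name badgerfish and the sample fragment are made up for this example, and namespace handling is left out entirely.

```python
import json
import xml.etree.ElementTree as ET


def badgerfish(elem):
    """Convert one ElementTree element to a BadgerFish-style dict (hypothetical helper)."""
    obj = {}
    # Attributes become properties whose names start with "@".
    for name, value in elem.attrib.items():
        obj["@" + name] = value
    # Non-empty text content goes in the "$" property.
    text = (elem.text or "").strip()
    if text:
        obj["$"] = text
    # Child elements become nested properties; repeated names become a list.
    for child in elem:
        converted = badgerfish(child)
        if child.tag in obj:
            existing = obj[child.tag]
            if isinstance(existing, list):
                existing.append(converted)
            else:
                obj[child.tag] = [existing, converted]
        else:
            obj[child.tag] = converted
    return obj


# Illustrative fragment only; a real NeXML document also carries namespaces.
xml_text = """
<tree id="tree1">
  <node id="n1" root="true"/>
  <node id="n2" otu="otu1"/>
  <edge id="e1" source="n1" target="n2"/>
</tree>
"""
root = ET.fromstring(xml_text)
nexson_like = {root.tag: badgerfish(root)}
print(json.dumps(nexson_like, indent=2))
```

In this sketch the two node elements end up as a list under "node", while the single edge element stays a plain object under "edge".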

MetaData

OpenTree's NexSON metadata vocabulary uses the URI prefix http://purl.org/opentree-terms#, which is abbreviated to ot:. The vocabulary consists of a number of predicates and, for predicates that take a choice of values, a set of terms naming the permitted values. The predicates and the types of their values are listed in Table I; the value terms are listed in Table II.
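
The prefix abbreviation can be sketched as a pair of small Python helpers. The helper names expand_ot and abbreviate_ot are hypothetical; only the prefix ot: and the URI http://purl.org/opentree-terms# come from the text above.

```python
OT_PREFIX = "ot:"
OT_NAMESPACE = "http://purl.org/opentree-terms#"


def expand_ot(term):
    """Expand an abbreviated term such as 'ot:studyId' to its full URI."""
    if not term.startswith(OT_PREFIX):
        raise ValueError("not an ot: term: " + term)
    return OT_NAMESPACE + term[len(OT_PREFIX):]


def abbreviate_ot(uri):
    """Abbreviate a full vocabulary URI back to its 'ot:' form."""
    if not uri.startswith(OT_NAMESPACE):
        raise ValueError("not in the opentree-terms namespace: " + uri)
    return OT_PREFIX + uri[len(OT_NAMESPACE):]


print(expand_ot("ot:studyId"))  # http://purl.org/opentree-terms#studyId
```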

Table I. Predicate Vocabulary
| Element | Name | Type | Description |
|---|---|---|---|
| nexml (study) | ot:studyPublicationReference | (long) string | A reference (bibliographic citation string) to the publication describing the associated study |
|  | ot:studyPublication | URI | URI (DOI preferred) identifying the publication describing the associated study |
|  | ot:studyYear | integer | Year the study was published |
|  | ot:curatorName | string | Name of the person who curated this study in opentree |
|  | ot:dataDeposit | URI | The data publication in which the data in this nexml object may be found |
|  | ot:studyId | string | Short identifier used by phylografter for the study |
|  | ot:focalClade | integer | ottolid of the root of the clade specified as focal in the study |
|  | ot:focalCladeOTTTaxonName | string | label (name) assigned for this node, if any (else empty string) |
|  | ot:tag | string | tag attached to the study; may indicate deprecation; may occur multiple times |
|  | ot:notIntendedForSynthesis | boolean | default = false; curator can choose this to relax validation (allow un-rooted trees, unmapped taxa, etc.) |
|  | ot:notUsingRootedTrees | boolean | default = false; curator can choose this to relax validation (allow un-rooted trees), e.g. for microbes |
|  | ot:candidateTreeForSynthesis | string | id of the tree marked as a candidate for synthesis; may occur multiple times? |
|  | ot:specifiedRoot | string | id of the node tagged as root of the tree. This node should be the same as the node bearing the @root identifier. If the ot:specifiedRoot property is absent, then the tree should be assumed to be arbitrarily rooted (thus the node bearing the @root identifier may not necessarily be the biologically correct root). Note: phylografter does not write a value for this, but will in the future |
| otu | ot:ottid (was ottolid) | integer | taxon id from OTT |
|  | ot:originalLabel | string | label (name) assigned to the otu in the uploaded tree |
|  | ot:treebaseOTUId | string | Treebase id for the otu (for studies from treebase) |
| tree | ot:branchLengthMode | choice | see Table II |
|  | ot:inGroupClade | string | id of the node tagged as the ingroup root (note this node may not have an assigned otu) |
|  | ot:tag | string | tag attached to the tree; may indicate deprecation or inference method; may occur multiple times |
|  | ot:branchLengthTimeUnit | string | currently phylografter only writes "Myr", which reflects its internal default. Not meaningful if ot:branchLengthMode is not ot:time |
|  | ot:curatedType | string | curator-provided type of tree; should specify inference method as text |
| node | ot:isLeaf | boolean | a boolean set to true on terminal otu nodes. This is redundant with having no edges that refer to the node as a source. It is included to enable fast checking of whether a node is a leaf. |
|  | ot:ottTaxonName | string | the name of the ott taxon |
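
The note on ot:isLeaf in Table I says the flag is redundant with a node never appearing as the source of an edge. A minimal Python sketch of that check follows. It assumes nodes and edges are lists of BadgerFish-style objects carrying the NeXML attributes id, source, and target as "@id", "@source", and "@target". The function name leaf_node_ids and the sample data are hypothetical.

```python
def leaf_node_ids(nodes, edges):
    """Return ids of nodes that no edge names as its source (hypothetical helper)."""
    # Every node that has at least one child appears as an edge source.
    sources = {edge["@source"] for edge in edges}
    return {node["@id"] for node in nodes if node["@id"] not in sources}


# Illustrative three-node tree: n1 is the root, n2 and n3 are tips.
nodes = [{"@id": "n1", "@root": "true"}, {"@id": "n2"}, {"@id": "n3"}]
edges = [
    {"@source": "n1", "@target": "n2"},
    {"@source": "n1", "@target": "n3"},
]
leaves = leaf_node_ids(nodes, edges)
print(sorted(leaves))  # ['n2', 'n3']
```

Storing ot:isLeaf on each node avoids this pass over the edge list, which is the fast check the table entry refers to.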

Table II - Object (value) vocabulary
| Element/Predicate | Name | "meaning" |
|---|---|---|
| tree / ot:branchLengthMode | ot:substitutionCount | branch lengths represent number of substitutions |
|  | ot:changesCount | branch lengths represent number of changes |
|  | ot:time (was ot:years) | branch lengths represent time. Units specified with ot:branchLengthTimeUnit |
|  | ot:bootstrapValues | branch lengths represent bootstrap values |
|  | ot:posteriorSupport | branch lengths represent posterior support values |
|  | ot:other | branch lengths represent defined values but are not among the known types; refer to ot:branchLengthDescription |
|  | ot:undefined | branch lengths represent undefined values |
| tree / ot:nodeLabelMode | ot:taxonNames | node labels represent taxon names |
|  | ot:bootstrapValues | node labels represent bootstrap values |
|  | ot:posteriorSupport | node labels represent posterior support |
|  | ot:other | node labels represent defined values but are not among the known types; refer to ot:nodeLabelDescription |
|  | ot:undefined | node labels represent undefined values |
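
The choice values in Table II, together with the Table I note that ot:branchLengthTimeUnit is not meaningful unless ot:branchLengthMode is ot:time, could be checked roughly as sketched below. This is not the project's validator. The function name check_tree_annotations and the flat dict of predicate to value are assumptions made for the example, and ot:nodeLabelMode is itself still a proposed predicate (Table III).

```python
BRANCH_LENGTH_MODES = {
    "ot:substitutionCount", "ot:changesCount", "ot:time",
    "ot:bootstrapValues", "ot:posteriorSupport", "ot:other", "ot:undefined",
}
NODE_LABEL_MODES = {
    "ot:taxonNames", "ot:bootstrapValues", "ot:posteriorSupport",
    "ot:other", "ot:undefined",
}


def check_tree_annotations(annotations):
    """Return a list of problems found in a tree's annotations (hypothetical helper).

    `annotations` is assumed to be a flat dict mapping predicate to value.
    """
    problems = []
    mode = annotations.get("ot:branchLengthMode")
    if mode is not None and mode not in BRANCH_LENGTH_MODES:
        problems.append("unknown ot:branchLengthMode: " + str(mode))
    label_mode = annotations.get("ot:nodeLabelMode")
    if label_mode is not None and label_mode not in NODE_LABEL_MODES:
        problems.append("unknown ot:nodeLabelMode: " + str(label_mode))
    # A time unit only makes sense when branch lengths represent time.
    if "ot:branchLengthTimeUnit" in annotations and mode != "ot:time":
        problems.append(
            "ot:branchLengthTimeUnit has no meaning unless "
            "ot:branchLengthMode is ot:time"
        )
    return problems


ok = check_tree_annotations(
    {"ot:branchLengthMode": "ot:time", "ot:branchLengthTimeUnit": "Myr"}
)
bad = check_tree_annotations(
    {"ot:branchLengthMode": "ot:years", "ot:branchLengthTimeUnit": "Myr"}
)
print(ok)
print(bad)
```

Here the retired value ot:years is reported as unknown, and the time unit attached to it is reported as well because the mode is not ot:time.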

Table III - Proposed (not yet implemented) Predicates
| Element | Name | Priority | Type | Description |
|---|---|---|---|---|
| nexml | ot:studyLabel |  | String |  |
|  | ot:studyUploaded | Medium | String (datetime) | Time stamp for when the study was initially uploaded |
|  | ot:studyModified | Medium | String (datetime) | Time stamp for when the study was last modified |
|  | ot:studyLastEditor | Medium | String | Username of the last user to modify the study |
| tree | ot:nodeLabelMode (was ot:cladeLabelMmode) | Medium | choice | see Table II |
|  | ot:nodeLabelDescription (was ot:cladeLabelsComment) |  | String |  |
|  | ot:branchLengthDescription (was ot:branchLengthComment) |  | String |  |
|  | ot:branchLengthTimeUnit |  | String (can be choice) | the unit of time used for the branch lengths. Has no meaning if the value of ot:branchLengthMode is not ot:time |
|  | ot:inferenceMethod |  | String (can be choice) | the type of inference method used to infer this tree, e.g. parsimony, likelihood, bayesian, distance, etc. |
|  | ot:authorContributed | High | choice | Many trees indicate this in a type field. This is boolean. |
|  | ot:treebaseTreeId | High |  |  |
|  | ot:comment | Medium | String |  |
|  | ot:treeModified | Medium | String (datetime) | Time stamp for when the tree was modified (not necessarily the same time as the study) |
|  | ot:treeLastEdited |  |  |  |
|  | ot:curatorType | Medium | String | In many cases this will be the inference method, but may be other free text |
| node | ot:cladeLabel |  | String | pertains to the clade rooted at the node; see ot:clade_label_mode |
|  | ot:isIngroup |  | Boolean | a boolean set to true on the most inclusive ingroup node (ingroup root) |
|  | ot:parent | Low |  |  |
|  | ot:age | Medium | Number | assigned age |
|  | ot:ageMin | Low | Number | lower bound of assigned age |
|  | ot:ageMax | Low | Number | upper bound of assigned age |
|  | ot:bootstrapSupport | Medium | Number | (this appears to be redundant with the branch length mode) |
|  | ot:posteriorSupport | Medium | Number | (this appears to be redundant with the branch length mode) |
|  | ot:otherSupport | Low | Number | (this appears to be redundant with the branch length mode) |
|  | ot:otherSupportType | Low | String | specifies alternative support statistic (this appears to be redundant with the branch length mode) |
|  | ot:originalRoot | High | Boolean | The first time a tree is rerooted, it should note the original rooting position by flagging the node that was the original root of the tree. |