The NEXUS Class Library (NCL) is an integrated collection of C++ classes that parses several file formats used in evolutionary biology (NEXUS, PHYLIP, relaxed PHYLIP, and FASTA). NCL does not detect the file format for you, but a parser configured to read several formats lets you extract the data from NCL's data structures using the same API regardless of the file format.
This documentation is written for C++ programmers.
Version 2.0 of NCL itself was published as:
Lewis, P. O. 2003. NCL: a C++ class library for interpreting data files in NEXUS format. Bioinformatics 19 (17): 2330-2331.
See http://hydrodictyon.eeb.uconn.edu/ncl for documentation on version 2.0.
The NEXUS data file format was specified in the publication cited below. Please read this paper for further information about the format specification itself; the NCL documentation does not attempt to explain the structure of a NEXUS data file.
Maddison, D. R., D. L. Swofford, and Wayne P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Systematic Biology 46(4): 590-621.
Despite several fundamental changes in the implementation of the library, we strive to keep NCL v2.1 backward compatible with version 2.0. A program that relied on version 2.0 should still work. If you discover that your client code works with version 2.0 of NCL but not with 2.1, please let us know.
Version 2.1 extends the functionality significantly by allowing NCL to parse files that use extended forms of NEXUS. Both Mesquite and MrBayes rely on extensions to NEXUS. Particularly difficult to handle is Mesquite's support for multiple blocks of the same type within a file (accompanied by linking blocks by title).
Version 2.0 of NCL followed a model of creating a NxsReader object and adding NxsBlock objects, each of which handles parsing of a particular type of NEXUS content. Client code would typically inherit from base classes such as NxsCharactersBlock, or would extract the information when a block was completely read. The NxsBlock instance would be reset (by a call to NxsBlock::Reset) before it was asked to handle another block.
Unfortunately, not all NEXUS blocks are autonomous (for example, commands in an ASSUMPTIONS block may rely on information in a CHARACTERS block). Combining inter-block dependencies with the need to store information from multiple blocks of the same type means that NCL's version 2.0 API can be quite cumbersome. To read a file with a single instance of a NxsCharactersBlock, the client code must carefully offload all of the information in a block before allowing parsing to proceed to the next block. Subsequent references between blocks have to be corrected so that blocks refer to the new location of the information (rather than the NxsBlock instance that originally held the data).
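The offloading burden described above can be illustrated with a small stand-alone sketch. It does not use NCL: ToyCharactersBlock, StoredMatrix, and readAllWithSingleHandler are hypothetical names invented for this illustration, and a "file" is modeled simply as a list of blocks whose rows are already tokenized.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a reusable block reader; NOT an NCL class.
struct ToyCharactersBlock {
    std::vector<std::string> rows;
    void Reset() { rows.clear(); }  // wipe state before handling the next block
    void Read(const std::vector<std::string> &content) { rows = content; }
};

// Hypothetical client-side storage for the data offloaded from one block.
struct StoredMatrix {
    std::vector<std::string> rows;
};

// Reads every block with ONE reusable handler. The client has to copy the
// data out after each block, because the next Reset() destroys it.
inline std::vector<StoredMatrix> readAllWithSingleHandler(
        const std::vector<std::vector<std::string>> &blocksInFile) {
    ToyCharactersBlock handler;
    std::vector<StoredMatrix> stored;
    for (const auto &content : blocksInFile) {
        handler.Reset();
        handler.Read(content);
        stored.push_back(StoredMatrix{handler.rows});  // offload before the next Reset
    }
    assert(stored.size() == blocksInFile.size());
    return stored;
}
```

In this toy model, anything that pointed at the handler's rows while the first block was being read would refer to different data after the second block is read, which is the kind of stale reference the client has to repair by hand.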
A more natural design pattern for processing files which may have multiple blocks of the same type is to use a factory method. Users of NCL v2.1 can register NxsBlockFactory instances with a NxsReader. Then the reader can parse an entire file before having to pull the parsed information from the blocks.
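The factory approach can be sketched in a stand-alone way as follows. This is not NCL's API: ToyBlock, ToyBlockFactory, SimpleFactory, and ToyReader are hypothetical classes that only mimic the roles of NxsBlock, NxsBlockFactory, and NxsReader, and the "file" is again modeled as a list of (block type ID, content) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed block; NOT NCL's NxsBlock.
struct ToyBlock {
    std::string typeId;
    std::string payload;
};

// Hypothetical factory interface, analogous only in role to NxsBlockFactory.
struct ToyBlockFactory {
    virtual ~ToyBlockFactory() = default;
    virtual std::unique_ptr<ToyBlock> Create(const std::string &typeId) = 0;
};

struct SimpleFactory : ToyBlockFactory {
    std::unique_ptr<ToyBlock> Create(const std::string &typeId) override {
        std::unique_ptr<ToyBlock> block(new ToyBlock());
        block->typeId = typeId;
        return block;
    }
};

// Hypothetical reader that keeps every block it creates.
class ToyReader {
public:
    void AddFactory(const std::string &typeId, ToyBlockFactory *factory) {
        assert(factory != nullptr);
        factories_[typeId] = factory;
    }
    // Creates a fresh block for each occurrence of a registered type,
    // so nothing has to be offloaded while the file is being read.
    void ReadAll(const std::vector<std::pair<std::string, std::string>> &file) {
        for (const auto &entry : file) {
            auto it = factories_.find(entry.first);
            if (it == factories_.end())
                continue;  // no factory registered for this type: skip it
            std::unique_ptr<ToyBlock> block = it->second->Create(entry.first);
            block->payload = entry.second;
            blocks_.push_back(std::move(block));
        }
    }
    std::size_t GetNumBlocks() const { return blocks_.size(); }
    const ToyBlock &GetBlock(std::size_t i) const { return *blocks_.at(i); }

private:
    std::map<std::string, ToyBlockFactory *> factories_;
    std::vector<std::unique_ptr<ToyBlock>> blocks_;
};
```

Because each block lives in its own object until the reader is destroyed, references between blocks stay valid for the whole parse and the client can query everything afterwards.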
The SVN branch for version 2.2 is very similar to 2.1. It only contains the changes necessary to allow NCL to be callable from other languages via SWIG. Some of these changes were not backward compatible, so they were made on the 2.2 branch rather than in 2.1.
C++ programmers should probably use 2.1 (although their code will almost certainly work on v2.2 as well).
The best place to get the latest stable version of NCL is the SourceForge NCL project page.
If you want SVN access to the latest code, please use branches/v2.1 from the SVN checkout on SourceForge.
The NCL has been designed to be as portable as possible for a C++ class library. Please let us know if you find a (reasonably modern) compiler that does not accept NCL.
NCL does not rely on any external libraries. There are hooks (such as NxsToken::OutputComment) that allow client code to intercept messages that you may want to display on standard output (in command-line executables) or in a graphical window (in GUIs).
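The idea of such a hook can be sketched without NCL as an overridable virtual method. In the sketch below, ToyToken and CollectingToken are hypothetical classes, and the scanner is deliberately simplistic: it treats any text of the form [!...] as an output comment and does not handle nested comments. Only the name OutputComment is borrowed from NCL; its signature here is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical tokenizer with an overridable hook; it only mimics the
// role of NxsToken::OutputComment and is NOT NCL code.
class ToyToken {
public:
    virtual ~ToyToken() = default;

    // Passes the body of every "[!...]" comment in text to the hook.
    void Scan(const std::string &text) {
        std::string::size_type pos = 0;
        while ((pos = text.find("[!", pos)) != std::string::npos) {
            const std::string::size_type end = text.find(']', pos + 2);
            if (end == std::string::npos)
                break;  // unterminated comment: stop scanning
            OutputComment(text.substr(pos + 2, end - pos - 2));
            pos = end + 1;
        }
    }

protected:
    // Default behavior: discard the message.
    virtual void OutputComment(const std::string &) {}
};

// Client subclass that collects messages, e.g. to show them in a GUI later.
class CollectingToken : public ToyToken {
public:
    std::vector<std::string> messages;

protected:
    void OutputComment(const std::string &msg) override {
        messages.push_back(msg);
    }
};
```

A command-line client could instead override the hook to write each message to standard output; the parsing code does not need to know which destination was chosen.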
NCL can interpret the following blocks and commands.
Note that the TITLE and BLOCKID commands are handled by all block types. LINK commands are handled where appropriate.
|Block Type ID||Commands||NCL Block Reader type|
|ASSUMPTIONS||CharPartition, CharSet, CodeSet, CodonPosSet, Options, TaxSet, TaxPartition, TreeSet, TreePartition, TypeSet, UserType, WtSet||NxsAssumptionsBlock|
|CHARACTERS||Dimensions, Format, TaxLabels, CharStateLabels, CharLabels, StateLabels, Matrix||NxsCharactersBlock|
|CODONS||(same as ASSUMPTIONS)||NxsAssumptionsBlock|
|DATA||(same as CHARACTERS)||NxsCharactersBlock|
|DISTANCES||Dimensions, Format, TaxLabels, Matrix||NxsDistancesBlock|
|SETS||(same as ASSUMPTIONS)||NxsAssumptionsBlock|
|UNALIGNED||Dimensions, Format, TaxLabels, Matrix||NxsUnalignedBlock|
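The mapping in the table above, including the aliases that share a reader type (DATA with CHARACTERS; CODONS and SETS with ASSUMPTIONS), can be written down as a small lookup. This helper is purely illustrative and is not part of NCL: readerTypeForBlockId is a hypothetical name, it covers only the rows listed above, and the case-insensitive matching is a choice made for this sketch.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (NOT an NCL function): returns the name of the NCL
// reader type listed in the table for a block type ID, or an empty string
// if the ID is not one of the rows shown.
inline std::string readerTypeForBlockId(std::string blockId) {
    // Normalize to upper case so that "data" and "DATA" match the same row.
    std::transform(blockId.begin(), blockId.end(), blockId.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    static const std::map<std::string, std::string> table = {
        {"ASSUMPTIONS", "NxsAssumptionsBlock"},
        {"CHARACTERS", "NxsCharactersBlock"},
        {"CODONS", "NxsAssumptionsBlock"},
        {"DATA", "NxsCharactersBlock"},
        {"DISTANCES", "NxsDistancesBlock"},
        {"SETS", "NxsAssumptionsBlock"},
        {"UNALIGNED", "NxsUnalignedBlock"},
    };

    const auto it = table.find(blockId);
    return it == table.end() ? std::string() : it->second;
}
```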