Introduction to Phylogenetic Data Objects

Primary Phylogenetic Data Objects

Phylogenetic data in DendroPy is represented by one or more objects of the following classes:

Taxon
A representation of an operational taxonomic unit, with an attribute, label, corresponding to the taxon label.
TaxonNamespace
A collection of Taxon objects representing a distinct definition of taxa (for example, as specified explicitly in a NEXUS “TAXA” block, or implicitly in the set of all taxon labels used across a Newick tree file).
Tree
A collection of Node and Edge objects representing a phylogenetic tree. Each Tree object maintains a reference to a TaxonNamespace object in its attribute, taxon_namespace, which specifies the set of taxa that are referenced by the tree and its nodes. Each Node object has a taxon attribute (which points to a particular Taxon object if there is an operational taxonomic unit associated with this node, or is None if not), a parent_node attribute (which will be None if the Node has no parent, e.g., a root node), an edge attribute (the Edge connecting the node to its parent), as well as a list of references to child nodes, a copy of which can be obtained by calling child_nodes(). In addition, advanced operations with tree data often make use of a Bipartition object associated with each Edge on a Tree (see “Bipartitions” for more information).
TreeList
A list of Tree objects. A TreeList object has an attribute, taxon_namespace, which specifies the set of taxa that are referenced by all member Tree elements. This is enforced when a Tree object is added to a TreeList, with the TaxonNamespace of the Tree object and all Taxon references of the Node objects in the Tree mapped to the TaxonNamespace of the TreeList.
CharacterMatrix
Representation of character data, with specializations for different data types: DnaCharacterMatrix, RnaCharacterMatrix, ProteinCharacterMatrix, StandardCharacterMatrix, ContinuousCharacterMatrix, etc. A CharacterMatrix can be treated very much like a dict object, with Taxon objects as keys and character data as values associated with those keys.
DataSet
A meta-collection of phylogenetic data, consisting of lists of multiple TaxonNamespace objects (DataSet.taxon_namespaces), TreeList objects (DataSet.tree_lists), and CharacterMatrix objects (DataSet.char_matrices).
TreeArray
A high-performance container designed to efficiently store and manage the structures of (potentially) large collections of (potentially) large trees for processing.

Creating New (Empty) Objects

All of the above names are imported into the dendropy namespace, so to instantiate new, empty objects of these classes, you only need to import dendropy:

>>> import dendropy
>>> tree1 = dendropy.Tree()
>>> tree_list1 = dendropy.TreeList()
>>> dna1 = dendropy.DnaCharacterMatrix()
>>> dataset1 = dendropy.DataSet()

Or import the names directly:

>>> from dendropy import Tree, TreeList, DnaCharacterMatrix, DataSet
>>> tree1 = Tree()
>>> tree_list1 = TreeList()
>>> dna1 = DnaCharacterMatrix()
>>> dataset1 = DataSet()

More details on how to create and populate new objects of various kinds programmatically are given in later chapters (e.g., “Trees”, “Character Matrices”, “Data Sets”).

Reading, Writing, and Annotating Phylogenetic Data

DendroPy provides a rich set of tools for reading and writing phylogenetic data in various formats, such as NEXUS, Newick, PHYLIP, etc., with many options to customize and control how the data is ingested and parsed, as well as how it is formatted and written out. For example:

>>> import dendropy
>>> tree_list1 = dendropy.TreeList()
>>> tree_list1.read_from_path("pythonidae.mcmc1.nex",
...     schema="nexus",
...     collection_offset=0,
...     tree_offset=100)
>>> tree_list1.read_from_path("pythonidae.mcmc2.nex",
...     schema="nexus",
...     collection_offset=0,
...     tree_offset=100)
>>> tree_list1.write_to_path("combined.newick",
...     suppress_edge_lengths=True,
...     schema="newick")

These are covered in detail in the next chapter, “Reading and Writing Phylogenetic Data”.

Support is also available for adding, accessing, and managing rich and expressive metadata annotations on many of the above objects and their components. This is covered in detail in the “Working with Metadata Annotations” chapter.
