DendroPy 4 Changes and Migration Primer

Introduction

  • Updated for full Python 3.x compatibility, while retaining support for Python 2.7.
  • Faster, better, stronger! Core serialization/deserialization infrastructure rewritten from the ground up, with many optimizations for speed and reliability.

Python Version Compatibility

  • Compatibility: Python 3 is fully supported. The only version of Python 2 supported is Python 2.7.

    • Python 2: Python 2.7
    • Python 3: Python 3.1, 3.2, 3.3, 3.4

Library-Wide Changes

Public Module Reorganization

A number of modules have been renamed, moved, or split into multiple modules. Calls to the old modules should continue to work, albeit with warnings urging you to update to the new locations.

  • dendropy.treecalc has been split into three submodules depending on whether the statistic or value being calculated is on a single tree, a single tree and a dataset, or two trees:

  • The functionality provided by dendropy.treesplit has been largely subsumed by the new Bipartition class.

  • The functionality provided by dendropy.treesum has been largely subsumed by the new TreeArray class, a high-performance class for efficiently managing and carrying out operations on large collections of large trees.

  • dendropy.reconcile has been moved to dendropy.model.reconcile.

  • dendropy.coalescent has been moved to dendropy.model.coalescent.

  • dendropy.popgenstat has been moved to dendropy.calculate.popgenstat.

  • dendropy.treesim has been moved to dendropy.simulate.treesim.

  • dendropy.popgensim has been moved to dendropy.simulate.popgensim.
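
The straightforward relocations listed above lend themselves to a mechanical rewrite. The following is a minimal sketch, not an official migration tool: the helper name update_module_paths and the MOVED_MODULES table are hypothetical, and only fully dotted references (e.g., "import dendropy.treesim") are handled. Statements of the form "from dendropy import treesim", as well as the modules that were split or subsumed, still need manual attention.

```python
import re

# Old -> new module paths, taken from the relocations listed above.
MOVED_MODULES = {
    "dendropy.reconcile": "dendropy.model.reconcile",
    "dendropy.coalescent": "dendropy.model.coalescent",
    "dendropy.popgenstat": "dendropy.calculate.popgenstat",
    "dendropy.treesim": "dendropy.simulate.treesim",
    "dendropy.popgensim": "dendropy.simulate.popgensim",
}

# Match an old dotted path only when it is not part of a longer name.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in MOVED_MODULES)
    + r")(?!\w)"
)


def update_module_paths(source):
    """Return `source` with relocated dendropy module paths rewritten."""
    return _PATTERN.sub(lambda m: MOVED_MODULES[m.group(1)], source)


print(update_module_paths("import dendropy.treesim"))
```

Because the new paths never contain the old ones as substrings, running the helper twice over the same text leaves it unchanged.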

Behind-the-Scenes Module Reorganization

Unique Object Identifier (“oid”) Attributes Removed

  • The entire oid system (“object identifier”), i.e., the unique id assigned to every data object, has been removed. This was an implementation artifact from NEXML parsing that greatly slowed down a number of operations without any benefit or utility for most normal operations.

TaxonSet is now TaxonNamespace

  • The dendropy.TaxonSet class has been renamed TaxonNamespace, and the corresponding taxon_set attribute of phylogenetic data objects that reference a taxonomic context has been renamed taxon_namespace.

  • The TaxonNamespace class replaces the TaxonSet class as the manager for the Taxon objects.

  • The API is largely similar with the following differences:

    • Calls to the __getitem__ and __delitem__ methods (e.g. TaxonNamespace[x]) now only accept integer values as arguments (representing indexes into the list of Taxon objects in the internal array).

    • TaxonSet.has_taxon and TaxonSet.has_taxa have been replaced by TaxonNamespace.has_taxon_label and TaxonNamespace.has_taxa_labels respectively.

    • Various new methods for accessing and managing the collection of Taxon objects have been added (e.g., findall, remove_taxon, remove_taxon_label, discard_taxon_label, __delitem__).

    • Numerous look-up methods previously took case_insensitive as an argument that determined whether the look-up was case-sensitive (when retrieving, for example, a Taxon object corresponding to a particular label). If not specified, it defaulted to False, i.e., case-sensitive matching. In all cases, this argument has been changed to case_sensitive, with a default of True. That is, searches are still case-sensitive by default, but you now have to specify case_sensitive=False instead of case_insensitive=True to perform a case-insensitive search. This change was made for consistency with the rest of the library.

  • In most cases, a simple global search-and-replace of “TaxonSet” with “TaxonNamespace” and “taxon_set” with “taxon_namespace” should be sufficient to bring existing code into line with DendroPy 4.

  • For legacy support, a class called TaxonSet exists. This derives with no modifications from TaxonNamespace. Instantiating objects of this class will result in warnings being emitted. As long as usage of TaxonSet conforms to the above API change notes, old or legacy code should continue to work unchanged (albeit with some warning noise). This support is temporary and will be removed in upcoming releases: code should update to using TaxonNamespace as soon as expedient.

  • For legacy support, “taxon_set” continues to be accepted and processed as an attribute name and keyword argument synonymous with “taxon_namespace”. Usage of this will result in warnings being emitted, but code should continue to function as expected. This support is temporary and will be removed in upcoming releases: code should update to using “taxon_namespace” as soon as expedient.
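
The global search-and-replace described above can be sketched with regular expressions. This is a rough illustration under stated assumptions rather than a supported tool: the function name migrate_taxon_api is hypothetical, and the case_insensitive rewrite only handles literal True/False values, so calls that pass a variable must be inverted by hand.

```python
import re


def migrate_taxon_api(source):
    """Apply the taxon-related renames described above to a source string."""
    # Class and attribute/keyword renames; \b leaves longer identifiers intact.
    source = re.sub(r"\bTaxonSet\b", "TaxonNamespace", source)
    source = re.sub(r"\btaxon_set\b", "taxon_namespace", source)

    # The keyword was renamed AND its meaning inverted, so flip the literal.
    def flip(match):
        new_value = "False" if match.group(2) == "True" else "True"
        return "case_sensitive" + match.group(1) + new_value

    return re.sub(r"\bcase_insensitive(\s*=\s*)(True|False)\b", flip, source)


print(migrate_taxon_api("tns = dendropy.TaxonSet()"))
```

The word-boundary anchors keep identifiers that merely contain the old names from being rewritten by accident.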

The Node Class

  • Constructor now only accepts keyword arguments (and oid is not one of them!).
  • add_child no longer accepts pos as an argument to indicate the position at which a child should be inserted. Use insert_child, which takes a position specified by index and a node specified by node, for this functionality instead.

The Edge Class

  • Constructor now only accepts keyword arguments (and oid is not one of them!).
  • Because tail_node is no longer an independent attribute but a dynamic property, bound to the Node._parent_node attribute of the head_node (see below), the Edge constructor does not accept tail_node as an argument.
  • The tail_node of an Edge object is now a dynamic property, referencing the Node._parent_node attribute of the Edge._head_node of the Edge object. So updating Edge._tail_node of an Edge object now sets the Node._parent_node of its Edge._head_node to the new value, and vice versa. This avoids the need for independent book-keeping logic to keep Node._parent_node and Edge._tail_node synchronized on the same Node object, along with the errors such logic might introduce.
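
The idea of a tail node that is derived from, rather than stored alongside, the parent reference can be illustrated with a pair of toy classes. These are simplified stand-ins written for this example, not DendroPy's actual Node and Edge implementations; only the attribute names mentioned above are borrowed.

```python
class Node:
    """Toy node: owns its parent reference and the edge subtending it."""

    def __init__(self):
        self._parent_node = None
        self.edge = Edge(head_node=self)


class Edge:
    """Toy edge: stores only its head node; the tail node is derived."""

    def __init__(self, head_node=None):
        self._head_node = head_node

    @property
    def tail_node(self):
        # No stored copy: always read through the head node's parent.
        if self._head_node is None:
            return None
        return self._head_node._parent_node

    @tail_node.setter
    def tail_node(self, node):
        if self._head_node is None:
            raise ValueError("edge has no head node to re-parent")
        # Writing the tail node re-parents the head node.
        self._head_node._parent_node = node


parent, other, child = Node(), Node(), Node()
child._parent_node = parent       # set via the node ...
seen_via_edge = child.edge.tail_node
child.edge.tail_node = other      # ... or via the edge
seen_via_node = child._parent_node
```

Since there is a single underlying reference, the two views cannot drift apart, which is the point of the design described above.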

The Tree Class

NEWICK-format Reading

  • The suppress_external_taxon_labels and suppress_external_node_labels keyword arguments have been replaced by suppress_leaf_taxon_labels and suppress_leaf_node_labels, respectively. This is for consistency with the rest of the library (including writing in NEWICK-format), which uses the term “leaf” rather than “external”.

  • The various boolean rooting directive switches (as_rooted, default_as_rooted, etc.) have been replaced by a single argument: rooting. This can take on one of the following (string) values:

    • rooting="default-unrooted"

      Interpret trees following rooting token (“[&R]” for rooted, “[&U]” for unrooted) if present; otherwise, interpret trees as unrooted.

    • rooting="default-rooted"

      Interpret trees following rooting token (“[&R]” for rooted, “[&U]” for unrooted) if present; otherwise, interpret trees as rooted.

    • rooting="force-unrooted"

      Unconditionally interpret all trees as unrooted.

    • rooting="force-rooted"

      Unconditionally interpret all trees as rooted.

    The value of the “rooting” argument defaults to “default-unrooted”, i.e., all trees are assumed to be unrooted unless a rooting token is present that explicitly specifies the rooting state.
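
The decision rule for the four rooting values can be written out as a small function. This is a sketch of the semantics described above, not the parser's actual code; the function name is_rooted and its token parameter are hypothetical.

```python
ROOTING_VALUES = (
    "default-unrooted",
    "default-rooted",
    "force-unrooted",
    "force-rooted",
)


def is_rooted(rooting="default-unrooted", token=None):
    """Return True if a tree should be interpreted as rooted.

    `token` is the rooting token found with the tree statement:
    "[&R]", "[&U]", or None if no token is present.
    """
    if rooting not in ROOTING_VALUES:
        raise ValueError("unrecognized rooting directive: %r" % (rooting,))
    if rooting.startswith("force-"):
        # Any token in the data is ignored.
        return rooting == "force-rooted"
    if token is not None:
        normalized = token.strip().upper()
        if normalized == "[&R]":
            return True
        if normalized == "[&U]":
            return False
        raise ValueError("unrecognized rooting token: %r" % (token,))
    # No token present: fall back to the stated default.
    return rooting == "default-rooted"


print(is_rooted(token="[&R]"))
```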

NEWICK-format Writing

  • Previously, if annotations_as_nhx was True, metadata annotations would be written out even if suppress_annotations was True. Now, suppress_annotations must be False for annotations to be written out, even if annotations_as_nhx is True.

The DataSet Class

  • Constructor no longer supports the stream keyword argument to construct the new DataSet object from a data source. Use the factory class method DataSet.get_from_stream instead.
  • Constructor only accepts one unnamed (positional) argument: either a DataSet instance to be cloned, or an iterable of TaxonNamespace, TreeList, or CharacterMatrix-derived instances to be composed (added) into the new DataSet instance.
  • TaxonNamespace no longer managed.