Collections of Trees
TreeList objects are collections of Tree objects constrained to sharing the same TaxonNamespace. Any Tree object added to a TreeList will have its taxon_namespace attribute assigned to the TaxonNamespace object of the TreeList, and all referenced Taxon objects will be mapped to the same or corresponding Taxon objects of this new TaxonNamespace, with new Taxon objects created if no suitable match is found.
Objects of the TreeList class have an “annotations” attribute, which is an AnnotationSet object, i.e., a collection of Annotation instances tracking metadata.
More information on working with metadata can be found in the “Working with Metadata Annotations” section.
The TreeList class supports the “get” factory class method for simultaneously instantiating and populating TreeList instances, taking a data source (given as a keyword argument such as “path” or “data”) and a schema specification string (“nexus”, “newick”, “nexml”, etc.):

    import dendropy

    treelist = dendropy.TreeList.get(path='pythonidae.mcmc.nex', schema='nexus')

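The “read” instance method adds trees from a data source to an existing TreeList, taking the same kinds of arguments:

    import dendropy

    trees = dendropy.TreeList()
    trees.read(path="sometrees.nex", schema="nexus", tree_offset=10)
    trees.read(data="(A,(B,C));((A,B),C);", schema="newick")
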
A TreeList can be written to an external resource using the “write” method:

    import dendropy

    treelist = dendropy.TreeList.get(
        path="trees1.nex",
        schema="nexus",
    )
    treelist.write(
        path="trees1.newick",
        schema="newick",
    )

It can also be represented as a string using the “as_string” method:

    import dendropy

    treelist = dendropy.TreeList.get(
        path="trees1.nex",
        schema="nexus",
    )
    print(treelist.as_string(schema="newick"))

More information on reading operations is available in the Reading and Writing Phylogenetic Data section.
Using and Managing the Collections of Trees
TreeList behaves very much like a list, supporting iteration, indexing, slicing, removal, sorting, etc.:

    import dendropy
    from dendropy.calculate import treecompare

    trees = dendropy.TreeList.get(
        path="pythonidae.random.bd0301.tre",
        schema="nexus")

    # Iteration
    for tree in trees:
        print(tree.as_string("newick"))

    print(len(trees))
    print(trees.as_string("nexus"))

    # Indexing: the distance functions compare two individual trees
    print(treecompare.robinson_foulds_distance(trees[0], trees[1]))
    print(treecompare.weighted_robinson_foulds_distance(trees[0], trees[1]))

    # Slicing
    first_10_trees = trees[:10]
    last_10_trees = trees[-10:]

    # Note that the TaxonNamespace is propagated to slices
    assert first_10_trees.taxon_namespace is trees.taxon_namespace
    assert last_10_trees.taxon_namespace is trees.taxon_namespace

    # Item assignment: both positions now refer to the same Tree
    print(id(trees[4]))
    print(id(trees[5]))
    trees[4] = trees[5]
    print(id(trees[4]))
    print(id(trees[5]))
    print(trees[4] in trees)

    # Removal and lookup
    trees.remove(trees[-1])
    tx = trees.pop()
    print(trees.index(trees[0]))

    # Sorting, reversing, clearing
    trees.sort(key=lambda t: t.label)
    trees.reverse()
    trees.clear()

The TreeList class supports the native Python list interface methods for adding Tree instances, such as append, extend, insert, and other methods, but with the added aspect of taxon namespace migration:

    import dendropy

    trees = dendropy.TreeList.get(
        path="pythonidae.random.bd0301.tre",
        schema="nexus")
    print(len(trees))

    tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")

    # As we did not specify a TaxonNamespace instance to use above, by default
    # 'tree' will get its own, distinct TaxonNamespace
    original_tree_taxon_namespace = tree.taxon_namespace
    print(id(original_tree_taxon_namespace))
    assert tree.taxon_namespace is not trees.taxon_namespace

    # This operation adds the Tree, 'tree', to the TreeList, 'trees',
    # *and* migrates the Taxon objects of the tree over to the TaxonNamespace
    # of 'trees'. This will break things if the tree is contained in another
    # TreeList with a different TaxonNamespace!
    trees.append(tree)

    # In contrast to before, the TaxonNamespace of 'tree' is now the same
    # as the TaxonNamespace of 'trees'. The Taxon objects have been imported
    # and/or remapped based on their label.
    assert tree.taxon_namespace is trees.taxon_namespace
    print(id(original_tree_taxon_namespace))
    print(id(tree.taxon_namespace))

You can make a shallow copy of a TreeList by calling dendropy.datamodel.treecollectionmodel.TreeList.clone with a “depth” argument value of 0 or by slicing:

    import dendropy

    # original list
    s1 = "(A,(B,C));(B,(A,C));(C,(A,B));"
    treelist1 = dendropy.TreeList.get(
        data=s1,
        schema="newick")

    # shallow copy by calling TreeList.clone(depth=0)
    treelist2 = treelist1.clone(depth=0)

    # shallow copy by slicing
    treelist3 = treelist1[:]

    # same tree instances are shared
    for t1, t2 in zip(treelist1, treelist2):
        assert t1 is t2
    for t1, t2 in zip(treelist1, treelist3):
        assert t1 is t2

    # note: (necessarily) sharing same TaxonNamespace
    assert treelist2.taxon_namespace is treelist1.taxon_namespace
    assert treelist3.taxon_namespace is treelist1.taxon_namespace

For a taxon namespace-scoped deep copy, on the other hand, i.e., where the Tree instances are also cloned but the Taxon and TaxonNamespace references are preserved, you can call dendropy.datamodel.treecollectionmodel.TreeList.clone with a “depth” argument value of 1 or use copy construction:

    import dendropy

    # original list
    s1 = "(A,(B,C));(B,(A,C));(C,(A,B));"
    treelist1 = dendropy.TreeList.get(
        data=s1,
        schema="newick")

    # taxon namespace-scoped deep copy by calling TreeList.clone(depth=1)
    # I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
    treelist2 = treelist1.clone(depth=1)

    # taxon namespace-scoped deep copy by copy-construction
    # I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
    treelist3 = dendropy.TreeList(treelist1)

    # *different* tree instances
    for t1, t2, t3 in zip(treelist1, treelist2, treelist3):
        assert t1 is not t2
        assert t1 is not t3
        assert t2 is not t3

    # Note: TaxonNamespace is still shared
    assert treelist2.taxon_namespace is treelist1.taxon_namespace
    assert treelist3.taxon_namespace is treelist1.taxon_namespace

Finally, for a full deep copy, in which the Taxon and TaxonNamespace instances are cloned as well, use copy.deepcopy:

    import copy
    import dendropy

    # original list
    s1 = "(A,(B,C));(B,(A,C));(C,(A,B));"
    treelist1 = dendropy.TreeList.get(
        data=s1,
        schema="newick")

    # Full deep copy by calling copy.deepcopy()
    # I.e. Everything cloned including Taxon and TaxonNamespace instances
    treelist2 = copy.deepcopy(treelist1)

    # *different* tree instances
    for t1, t2 in zip(treelist1, treelist2):
        assert t1 is not t2

    # Note: TaxonNamespace is also different
    assert treelist2.taxon_namespace is not treelist1.taxon_namespace
    for tx1 in treelist1.taxon_namespace:
        assert tx1 not in treelist2.taxon_namespace
    for tx2 in treelist2.taxon_namespace:
        assert tx2 not in treelist1.taxon_namespace

Efficiently Iterating Over Trees in a File
If you need to process a collection of trees defined in a file source, you can, of course, read the trees into a
TreeList object and iterate over the resulting collection:

    import dendropy

    trees = dendropy.TreeList.get(path='pythonidae.beast-mcmc.trees', schema='nexus')
    for tree in trees:
        print(tree.as_string('newick'))

In the above, the entire data source is parsed and stored in the
trees object before being processed in the subsequent lines.
In some cases, you might not need to maintain all the trees in memory at the same time.
For example, you might be interested in calculating the distribution of a statistic over a collection of trees, but have no need to refer to any of the trees after the statistic has been calculated.
In this case, it will be more efficient to use the Tree.yield_from_files function.
This takes a list or any other iterable of file-like objects or strings (giving filepaths) as the first argument (“files”) and a mandatory schema specification string as the second argument (“schema”).
Additional keyword arguments to customize the parsing are the same as those for the general “get” and “read” methods.
For example, the following script reads a model tree from a file, and then iterates over a collection of MCMC trees in a set of files, calculating and storing the symmetric distance between the model tree and each of the MCMC trees one at a time:

    #! /usr/bin/env python

    import dendropy
    from dendropy.calculate import treecompare

    distances = []
    taxa = dendropy.TaxonNamespace()
    mle_tree = dendropy.Tree.get(
        path='pythonidae.mle.nex',
        schema='nexus',
        taxon_namespace=taxa)
    burnin = 20
    source_files = [
        open("pythonidae.mcmc1.nex", "r"),  # Note: for 'Tree.yield_from_files',
        open("pythonidae.mcmc2.nex", "r"),  # sources can be specified as file
        "pythonidae.mcmc3.nex",             # objects or strings, with strings
        "pythonidae.mcmc4.nex",             # assumed to specify file paths
    ]
    tree_yielder = dendropy.Tree.yield_from_files(
        files=source_files,
        schema='nexus',
        taxon_namespace=taxa,
    )
    for tree_idx, mcmc_tree in enumerate(tree_yielder):
        if tree_idx < burnin:
            # skip burnin (counted over the combined stream of trees)
            continue
        distances.append(treecompare.symmetric_difference(mle_tree, mcmc_tree))
    print("Mean symmetric distance between MLE and MCMC trees: %.2f"
          % (float(sum(distances)) / len(distances)))

Note how a
TaxonNamespace object is created and passed to both the
get and the
yield_from_files functions using the
taxon_namespace keyword argument.
This is to ensure that the corresponding taxa in both sources get mapped to the same
Taxon objects in DendroPy object space, so as to enable comparisons of the trees.
If this were not done, then each tree would have its own distinct
TaxonNamespace object (and associated
Taxon objects), making comparisons impossible.
When the number of trees is large, the trees themselves are large, or both, iterating over trees in files using yield_from_files will almost always give the best performance, sometimes by orders of magnitude.
This is because the trees do not all have to be held in memory at once, which keeps heavy memory usage from slowing down the Python virtual machine.
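The same streaming idea can be carried over to the statistic itself: if only summary values are needed, the per-tree results do not have to be stored either. The following is a sketch using Welford's online algorithm for the running mean and variance; RunningStats and summarize are hypothetical helpers written for this illustration, not part of DendroPy.

```python
class RunningStats:
    """Accumulates count, mean, and sample variance one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 for fewer than two values.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(values):
    """Consume any iterable (including a generator) without storing it."""
    stats = RunningStats()
    for value in values:
        stats.add(value)
    return stats


# Stand-in for a stream of per-tree distances (illustrative numbers).
stats = summarize(iter([2, 4, 4, 6]))
print(stats.count, stats.mean, stats.variance)
```

In a script that iterates over a tree yielder, each computed distance could be passed to add() in place of appending it to a list.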