A phylogenetic data object that coordinates collections of TaxonNamespace, TreeList, and (various kinds of) CharacterMatrix objects.
A DataSet has three attributes:
- taxon_namespaces
- A list of TaxonNamespace objects, each representing a distinct namespace for operational taxonomic unit concept definitions.
- tree_lists
- A list of TreeList objects, each representing a collection of Tree objects.
- char_matrices
- A list of CharacterMatrix-derived objects (e.g. DnaCharacterMatrix).
Multiple TaxonNamespace objects within a DataSet are allowed so as to support reading/loading of data from external sources that have multiple independent taxon namespaces defined within the same source or document (e.g., a Mesquite file with multiple taxa blocks, or a NeXML file with multiple OTU sections). Ideally, however, this would not be how data is managed. Recommended idiomatic usage would be to use a DataSet to manage multiple types of data that all share and reference the same, single taxon namespace.
This convention can be enforced by setting the DataSet instance to “attached taxon namespace” mode:
ds = dendropy.DataSet()
tns = dendropy.TaxonNamespace()
ds.attach_taxon_namespace(tns)
After setting this mode, all subsequent data read or created will be coerced to use the same, common operational taxonomic unit concept namespace.
Note that unless there is a need to collect and serialize a collection of data to the same file or external source, it is probably better semantically to use more specific data structures (e.g., a TreeList object for trees or a DnaCharacterMatrix object for an alignment). Similarly, when deserializing an external data source, if just a single type or collection of data is needed (e.g., the collection of trees from a file that includes both trees and an alignment), then it is semantically cleaner to deserialize the data into a more specific structure (e.g., a TreeList to get all the trees). However, when deserializing a mixed external data source with, e.g. multiple alignments or trees and one or more alignments, and you need to access/use more than a single collection, it is more efficient to read the entire data source at once into a DataSet object and then independently extract the data objects as you need them from the various collections.
The constructor can take one argument. This can either be another DataSet instance or an iterable of TaxonNamespace, TreeList, or CharacterMatrix-derived instances.
In the former case, the newly-constructed DataSet will be a shallow-copy clone of the argument.
In the latter case, the newly-constructed DataSet will have the elements of the iterable added to the respective collections (taxon_namespaces, tree_lists, or char_matrices, as appropriate). This is essentially like calling DataSet.add on each element separately.
x.__delattr__('name') <==> del x.name
default object formatter
x.__getattribute__('name') <==> x.name
helper for pickle
helper for pickle
x.__repr__() <==> repr(x)
x.__setattr__('name', value) <==> x.name = value
size of object in memory, in bytes
x.__str__() <==> str(x)
Generic add for TaxonNamespace, TreeList or CharacterMatrix objects.
Adds a CharacterMatrix or CharacterMatrix-derived instance to this dataset if it is not already there.
| Parameters: | char_matrix (CharacterMatrix) – The CharacterMatrix object to be added. |
|---|
Adds a taxonomic unit concept namespace represented by a TaxonNamespace instance to this dataset if it is not already there.
| Parameters: | taxon_namespace (TaxonNamespace) – The TaxonNamespace object to be added. |
|---|
Adds a TreeList instance to this dataset if it is not already there.
| Parameters: | tree_list (TreeList) – The TreeList object to be added. |
|---|
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
- schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Forces all read() calls of this DataSet to use the same TaxonNamespace. If taxon_namespace is None, then a new TaxonNamespace will be created, added to self.taxon_namespaces, and that is the TaxonNamespace that will be attached.
Creates and returns a copy of self.
| Parameters: | depth (integer) – The depth of the copy:
|
|---|
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
| Parameters: |
|
|---|
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
Instantiate and return a new DataSet object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
- file (file) – File or file-like object of data opened for reading.
- path (str) – Path to file of data.
- url (str) – URL of data.
- data (str) – Data given directly.
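Mandatory Schema-Specification Keyword Argument:
- schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.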
Optional General Keyword Arguments:
- exclude_trees (bool) – If True, then all tree data in the data source will be skipped.
- exclude_chars (bool) – If True, then all character data in the data source will be skipped.
- taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
- ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
dataset1 = dendropy.DataSet.get(
path="pythonidae.chars_and_trees.nex",
schema="nexus")
dataset2 = dendropy.DataSet.get(
url="http://purl.org/phylo/treebase/phylows/study/TB2:S1925?format=nexml",
schema="nexml")
Factory method to return new object of this class from file specified by string src.
| Parameters: |
|
|---|---|
| Returns: | pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source. |
Factory method to return new object of this class from file-like object src.
| Parameters: |
|
|---|---|
| Returns: | pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source. |
Factory method to return new object of this class from string src.
| Parameters: |
|
|---|---|
| Returns: | pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source. |
Factory method to return a new object of this class from URL given by src.
| Parameters: |
|
|---|---|
| Returns: | pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source. |
Creates a new CharacterMatrix (of class char_matrix_type) and adds it to the char_matrices of self.
Creates a new TaxonNamespace object, according to the arguments given (passed to TaxonNamespace()), and adds it to this DataSet.
Creates a new TreeList instance and adds it to this DataSet.
| Parameters: | |
|---|---|
| Returns: | t (TreeList) – The new TreeList instance created. |
Add data to self from data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
- file (file) – File or file-like object of data opened for reading.
- path (str) – Path to file of data.
- url (str) – URL of data.
- data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
- schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
- exclude_trees (bool) – If True, then all tree data in the data source will be skipped.
- exclude_chars (bool) – If True, then all character data in the data source will be skipped.
- taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created unless the DataSet object is in attached taxon namespace mode (self.attached_taxon_namespace is not None but assigned to a specific TaxonNamespace instance).
- ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
ds = dendropy.DataSet()
ds.read(
path="pythonidae.chars_and_trees.nex",
schema="nexus")
ds.read(
url="http://purl.org/phylo/treebase/phylows/study/TB2:S1925?format=nexml",
schema="nexml")
Reads data from file specified by filepath.
| Parameters: |
|
|---|---|
| Returns: | n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:
|
Reads from file (exactly equivalent to just read(), provided here as a separate method for completeness).
| Parameters: |
|
|---|---|
| Returns: | n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:
|
Reads a string.
| Parameters: |
|
|---|---|
| Returns: | n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:
|
Reads a URL source.
| Parameters: |
|
|---|---|
| Returns: | n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:
|
Reindexes taxa across all subcomponents, mapping to a single taxon set.
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
- file (file) – File or file-like object opened for writing.
- path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
- schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
d.write(path="path/to/file.dat",
schema="nexus",
preserve_underscores=True)
# The file must be opened for writing, not reading.
with open("path/to/file.dat", "w") as f:
    d.write(file=f,
            schema="nexus",
            preserve_underscores=True)
Writes to file specified by dest.
Writes to file-like object dest.