Population Genetic Summary StatisticsΒΆ

The dendropy.calculate.popgenstat module provides functions that calculate some common population genetic summary statistics.

For example, given a DnaCharacterMatrix as an argument, the num_segregating_sites function returns the raw number of segregating sites, average_number_of_pairwise_differences returns the average number of pairwise differences, and nucleotide_diversity returns the nucleotide diversity.

More complex statistics are provided by the PopulationPairSummaryStatistics class. Objects of this class are instantatiated with two lists of DnaCharacterDataSequence objects as arguments, each representing a sample of DNA sequences drawn from two distinct but related populations. Once instantiated, the following attributes of the PopulationPairSummaryStatistics object are available:

average_number_of_pairwise_differences
The average number of pairwise differences between every sequence across both populations.
average_number_of_pairwise_differences_between
The average number of pairwise differences between every sequence between both populations.
average_number_of_pairwise_differences_within
The average number of pairwise differences between every sequence within each population.
average_number_of_pairwise_differences_net
The net number of pairwise differences.
num_segregating_sites
The number of segregating sites.
wattersons_theta
Watterson’s theta.
wakeleys_psi
Wakeley’s psi.
tajimas_d
Tajima’s D.

The following example calculates the suite of population genetic summary statistics for sequences drawn from two populations of sticklebacks. The original data consists of 23 sequences, with individuals from Eastern Pacific populations identified by their taxon labels beginning with “EPAC” and individuals from Western Pacific populations identified by their taxon labels beginning with “WPAC”. The taxon labels thus are used as the basis for sorting the sequences into the required lists of DnaCharacterDataSequence objects, p1 and p2.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import dendropy
from dendropy.calculate import popgenstat

seqs = dendropy.DnaCharacterMatrix.get(
        path="orti1994.nex",
        schema="nexus")
p1 = []
p2 = []
for idx, t in enumerate(seqs.taxon_namespace):
    if t.label.startswith('EPAC'):
        p1.append(seqs[t])
    else:
        p2.append(seqs[t])
pp = popgenstat.PopulationPairSummaryStatistics(p1, p2)

print('Average number of pairwise differences (total): %s' \
    % pp.average_number_of_pairwise_differences)
print('Average number of pairwise differences (between populations): %s' \
    % pp.average_number_of_pairwise_differences_between)
print('Average number of pairwise differences (within populations): %s' \
    % pp.average_number_of_pairwise_differences_within)
print('Average number of pairwise differences (net): %s' \
    % pp.average_number_of_pairwise_differences_net)
print('Number of segregating sites: %s' \
    % pp.num_segregating_sites)
print("Watterson's theta: %s" \
    % pp.wattersons_theta)
print("Wakeley's Psi: %s" \
    % pp.wakeleys_psi)
print("Tajima's D: %s" \
    % pp.tajimas_d)

Lines 6-12 build up the two lists of DnaCharacterDataSequence objects by sorting the original sequences into their source populations based on the taxon label (with operational taxonomic units with labels beginning with “EPAC” coming from the Eastern Pacific, and assigned to the list p1, while those that begin with “WPAC” coming from the Western Pacific, and assigned to the list p2). These lists are then passed as the instantiation arguments to the PopulationPairSummaryStatistics constructor in line 14. The calculations are performed immediately, and the results reported in the following lines.

Previous topic

Phylogenetic Character Analyses

Next topic

Phylogenetic Distance-Based Operations

Documentation

Obtaining

DiscussionGoogle Groups

Join the " DendroPy Users" group to follow and participate in discussion, troubleshooting, help, information, suggestions, etc. on the usage and development of the DendroPy phylogenetic computing library.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the "dendropy-users" group, or click here to visit this group page directly.

AnnouncementsGoogle Groups

Join the " DendroPy Announcements" group to receive announcements of new releases, updates, changes and other news of interest to DendroPy users and developers.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the " dendropy-announce" group, or click here to visit this group page directly.

DevelopmentGitHub

  • Issues - Report bugs or request features
  •  Watch - Follow development activity
  •   Fork - Contribute and collaborate
  •   Star - Throw some glitter, add some glamour