On Monophyletic Classification
Peter Coxhead
Terminology
Trees
In talking about trees, the following terminology will be used.
- A tree consists of either a single node or two or more nodes connected by links. Note that a single node is a tree according to this definition.
- Each link connects exactly two nodes, one of which is the parent node, the other the child node. The length of a link is of no significance. Exactly one node, the root, has no parent. This node defines the direction of the tree: parents are closer to the root than their children. Every other node has exactly one parent.
- Nodes with no children are exterior nodes. Nodes with children are interior nodes.
- All the nodes which can be reached from a given node by a sequence of links each following the parent-to-child direction are the descendants of that node. All the nodes which can be reached from a given node by a sequence of links each following the child-to-parent direction are the ancestors of that node.
- The most recent common ancestor of a set of nodes is the ancestor node which can be reached from all of the nodes by the least total number of links.
- A subtree of a tree is any part of the tree which is itself a tree.
- To make absolutely clear that a tree with the definition given here is meant, the term proper tree will sometimes be used. There is no difference between a tree and a proper tree.
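The terminology above maps directly onto a small data structure. The following is a minimal sketch in Python, assuming a tree stored as a child-to-parent mapping. The node names, the `PARENT` table and all function names are hypothetical, chosen only for illustration. One assumption goes beyond the text: when the most recent common ancestor is computed, a node that is itself in the set may be the answer, so that the function is also defined for a single node.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical five-node tree: root "r" has children "x" and "c";
# "x" has children "a" and "b". Stored as child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "r": None,  # the root is the only node with no parent
    "x": "r",
    "c": "r",
    "a": "x",
    "b": "x",
}


def ancestors(node: str) -> List[str]:
    """Ancestors of `node`, nearest first (child-to-parent direction)."""
    chain: List[str] = []
    current = PARENT[node]
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def descendants(node: str) -> Set[str]:
    """All nodes reachable from `node` in the parent-to-child direction."""
    return {n for n in PARENT if node in ancestors(n)}


def exterior_nodes() -> Set[str]:
    """Nodes that are nobody's parent, i.e. nodes with no children."""
    return set(PARENT) - set(PARENT.values())


def mrca(nodes: Iterable[str]) -> str:
    """Most recent common ancestor of a non-empty collection of nodes.

    Assumption: a node in the collection may itself be the answer.
    """
    paths = [[n] + ancestors(n) for n in nodes]
    # Walk rootwards from the first node; the first node shared by every
    # path is the common ancestor reached by the fewest links in total.
    for candidate in paths[0]:
        if all(candidate in path for path in paths):
            return candidate
    raise ValueError("nodes do not share a root")


print(ancestors("a"))            # ['x', 'r']
print(sorted(descendants("x")))  # ['a', 'b']
print(mrca(["a", "b"]))          # x
print(mrca(["a", "c"]))          # r
```

Storing only the parent of each node reflects the definition directly: every node except the root has exactly one parent, and the order of a node's children plays no role.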
Classifications
Given a particular tree, we can select groups (sets) of its nodes. These groups can be exhaustively divided into three kinds.
- A monophyletic group of nodes consists of all of the descendants of the most recent common ancestor of the members of the group, plus the most recent common ancestor itself, and no other nodes. (It can be argued that the literal meaning of the term 'monophyletic' is that there should be a single ancestor but not that all of its descendants should be included, so that for this situation, the term 'holophyletic' should be employed. However, the use of monophyletic in the first sense is now well-established.)
- A paraphyletic group consists of a monophyletic group with one or more smaller monophyletic groups removed. A paraphyletic group of nodes thus contains only some of the descendants of the most recent common ancestor of the members of the group, with the missing descendants forming one or more monophyletic groups. Paraphyletic groups are thus 'nearly' monophyletic. (When one group is removed, the result is a singly paraphyletic group; when two, a twice paraphyletic group; and so on. Singly paraphyletic groups are perhaps the most commonly used in classification.)
- A polyphyletic group of nodes is any group of nodes which is other than monophyletic or paraphyletic.
As defined here, these terms form a hierarchy:
- Monophyletic
- Non-monophyletic
  - Paraphyletic
  - Polyphyletic
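The three definitions can be turned into a decision procedure. The sketch below is one possible reading, not a definitive implementation: groups are sets drawn from all nodes of the tree, interior and exterior alike, and the most recent common ancestor is allowed to be a member of the group. The tree, the root name `"r"` and the function names `clade` and `classify` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tree as parent -> children; nodes without an entry are exterior.
CHILDREN: Dict[str, List[str]] = {
    "r": ["x", "c"],
    "x": ["a", "b"],
}


def clade(node: str) -> Set[str]:
    """`node` together with all of its descendants."""
    members = {node}
    for child in CHILDREN.get(node, []):
        members |= clade(child)
    return members


def classify(group: Set[str]) -> Tuple[str, int]:
    """Return (kind, number of monophyletic groups removed).

    Assumption: the most recent common ancestor may itself belong to the
    group; it is found as the top of the smallest clade containing the group.
    """
    all_nodes = clade("r")  # "r" is the root of the hypothetical tree
    smallest = min((clade(n) for n in all_nodes if group <= clade(n)), key=len)
    missing = smallest - group
    if not missing:
        return ("monophyletic", 0)
    # The missing nodes are a union of monophyletic groups exactly when
    # every missing node takes all of its descendants with it.
    if all(clade(n) <= missing for n in missing):
        # Each removed group is topped by a missing node whose parent stayed.
        tops = [n for n in missing
                if not any(n in CHILDREN.get(p, []) for p in missing)]
        return ("paraphyletic", len(tops))
    return ("polyphyletic", 0)


print(classify({"x", "a", "b"}))  # ('monophyletic', 0)
print(classify({"r", "c"}))       # ('paraphyletic', 1)
print(classify({"r", "x", "c"}))  # ('paraphyletic', 2)
print(classify({"a", "c"}))       # ('polyphyletic', 0)
```

The second element of the result distinguishes singly from twice paraphyletic groups. Under this all-nodes reading, a group that omits its own most recent common ancestor comes out as polyphyletic.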
Classifying exterior nodes
Consider a group of entities to be classified where the relationships between the entities can be shown as a tree. The entities might be species and the tree one produced by phylogenetic analysis. All that is necessary is that the entities are considered both discrete and related by a tree.
An example with nine species is shown below (Fig. 1). The nine species (shown as coloured circles) are to be 'classified', i.e. divided into groups. Note that at this stage the focus is only on the nine species which form the exterior nodes of the tree; the interior nodes (shown as smaller open circles) may or may not be included in any particular classification. The tree is treated only as a way of showing the pattern of descent of the exterior nodes.
Fig. 1 – Example phylogenetic tree
A monophyletic classification must ensure that every group is monophyletic, consisting of all the descendants of the group's most recent common ancestor. In tree terms, the exterior nodes must be partitioned such that each group forms the exterior nodes of a subtree of the original tree. Thus the following (Fig. 2) is a monophyletic classification with four groups, where the blue triangles enclose the four groups. Notice that each of the four groups is described by a proper tree and that there is also a proper tree describing the four groups.
Fig. 2 – a 4 group monophyletic classification
There are other equally valid monophyletic classifications, for example the rather extreme two group classification shown below (Fig. 3). Again, each of the two groups is described by a proper tree and there is a proper tree describing the two groups (given that formally a tree can consist of a single node).
Fig. 3 – a 2 group monophyletic classification
A further monophyletic classification, this time with three groups, is shown below (Fig. 4).
Fig. 4 – a 3 group monophyletic classification
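The condition that a monophyletic classification partitions the exterior nodes into groups, each of which is the set of exterior nodes of some subtree, can be checked mechanically. The following is a minimal sketch under stated assumptions: the five-species tree is hypothetical (it is not the nine-species tree of Fig. 1), and the function names are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical tree (not the one in Fig. 1), stored as parent -> children.
CHILDREN: Dict[str, List[str]] = {
    "root": ["n1", "n2"],
    "n1": ["s1", "s2"],
    "n2": ["s3", "n3"],
    "n3": ["s4", "s5"],
}


def exterior_below(node: str) -> FrozenSet[str]:
    """Exterior nodes of the subtree rooted at `node`."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return frozenset([node])
    return frozenset().union(*(exterior_below(k) for k in kids))


def is_monophyletic_classification(groups: Iterable[Set[str]]) -> bool:
    frozen = [frozenset(g) for g in groups]
    all_exterior = exterior_below("root")
    # The groups must partition the exterior nodes: no overlap, none left out.
    covered = frozenset().union(*frozen)
    if covered != all_exterior or sum(len(g) for g in frozen) != len(all_exterior):
        return False
    # Each group must be exactly the exterior nodes of some subtree.
    every_node = list(CHILDREN) + list(all_exterior)
    subtree_sets = {exterior_below(n) for n in every_node}
    return all(g in subtree_sets for g in frozen)


# Three groups, each the exterior nodes of a subtree.
print(is_monophyletic_classification([{"s1", "s2"}, {"s3"}, {"s4", "s5"}]))  # True
# An extreme two-group classification is equally valid.
print(is_monophyletic_classification([{"s1", "s2"}, {"s3", "s4", "s5"}]))    # True
# {"s1", "s2", "s3"} is not the exterior node set of any subtree.
print(is_monophyletic_classification([{"s1", "s2", "s3"}, {"s4", "s5"}]))    # False
```

A single exterior node counts as a valid group because a single node is a tree under the definition used here.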
The diagram below (Fig. 5) shows a non-monophyletic classification with two groups. The group to the right is monophyletic (i.e. can be described by a proper tree). The group to the left is not. The most recent common ancestor of the species in this group is at the root of the tree. The left group does not include all the descendants of that ancestor, since six of them have been placed in the right group. The left group is paraphyletic.
Non-monophyletic classifications produce some links which are neither strictly internal to a group (dashed black lines in the figures) nor strictly external, linking groups (solid black lines as in Fig. 4), but which instead cross group boundaries (the solid red line in Fig. 5).
Fig. 5 – a 2 group 'crown and stem' classification
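The three kinds of link can be illustrated with a rough sketch. It rests on an interpretation that is not stated in the text: each group is taken to 'own' the nodes on the paths joining its members to their most recent common ancestor (called its span here). A link is then internal if both ends lie in one group's span, crossing if its ends lie in the spans of different groups, and external otherwise. The tree, the names `span` and `link_kinds`, and this definition are all assumptions made for illustration. When spans overlap, as they can for polyphyletic groups, the sketch simply reports the shared links as internal.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical tree as child -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "r": None, "s1": "r", "n1": "r",
    "s2": "n1", "c": "n1",
    "s3": "c", "s4": "c",
}


def path_to_root(node: str) -> List[str]:
    """`node` followed by its ancestors, nearest first."""
    path = [node]
    parent = PARENT[node]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path


def span(group: Set[str]) -> Set[str]:
    """Nodes on the paths from the group's members up to their
    most recent common ancestor, inclusive."""
    paths = [path_to_root(m) for m in group]
    top = next(n for n in paths[0] if all(n in p for p in paths))
    return {n for p in paths for n in p[: p.index(top) + 1]}


def link_kinds(groups: List[Set[str]]) -> Dict[Tuple[str, str], str]:
    """Label every (parent, child) link as internal, crossing or external."""
    spans = [span(g) for g in groups]
    kinds: Dict[Tuple[str, str], str] = {}
    for child, parent in PARENT.items():
        if parent is None:
            continue
        child_in = {i for i, s in enumerate(spans) if child in s}
        parent_in = {i for i, s in enumerate(spans) if parent in s}
        if child_in & parent_in:
            kinds[(parent, child)] = "internal"
        elif child_in and parent_in:
            kinds[(parent, child)] = "crossing"
        else:
            kinds[(parent, child)] = "external"
    return kinds


# A 'crown and stem' split: the stem group's span reaches the root.
stem_and_crown = link_kinds([{"s1", "s2"}, {"s3", "s4"}])
print([link for link, kind in stem_and_crown.items() if kind == "crossing"])
# [('n1', 'c')]

# A monophyletic classification of the same tree has no crossing links.
monophyletic = link_kinds([{"s1"}, {"s2"}, {"s3", "s4"}])
print("crossing" in monophyletic.values())  # False
```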
The topology of the classification above (Fig. 5) can also be described in terms of 'crown' and 'stem' groups. (The term 'crown group' when applied to real species is usually restricted to extant species and 'stem group' to extinct species. Only the shape produced by the classification is at issue here.) The right group is a 'crown group': a monophyletic subtree of the original full tree. The left group is a 'stem group': a paraphyletic grouping of all the remaining species descended from the common ancestor of the stem group and the crown group.
It should be noted that forming any monophyletic group from the original group of entities produces a 'crown and stem' classification, although this may not be immediately obvious. Consider the diagram shown below (Fig. 6). At first sight it looks as though the embedded group consisting of only two species is not a crown group, so that the other larger group is not a stem group.
Fig. 6 – another 2 group 'crown and stem' classification
However, the ordering of species in a tree is arbitrary, so that the tree below (Fig. 7) is entirely equivalent. Now there is more obviously a very small crown group and a large stem group.
Fig. 7 – re-ordered 2 group 'crown and stem' classification
By contrast the diagram shown below (Fig. 8) shows a truly polyphyletic classification. The embedded group of five nodes (species) is not monophyletic as it does not contain all the descendants of its members' most recent common ancestor. Nor is it paraphyletic, since the missing descendants do not form a monophyletic group. Hence it is polyphyletic.
Fig. 8 – a non-monophyletic classification
No amount of re-ordering alters this conclusion. If we re-order the tree to move the polyphyletic group of five species to the outside, as shown below (Fig. 9), it remains polyphyletic.
Fig. 9 – re-ordered non-monophyletic classification
Monophyletic classifications based entirely on exterior nodes have desirable properties. The groups clearly match the phylogenetic tree: every group has a proper internal tree, and the groups themselves are the nodes of a proper tree. This latter property means that the classification is hierarchical: we can produce a monophyletic classification of the groups. Thus, for example, species can be grouped into genera, genera into families, families into orders, and so on, and proper trees can be drawn at each level.
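The hierarchical property can be made concrete by collapsing each group to a single node and then classifying the resulting tree again. The sketch below assumes a hypothetical five-species tree and invented group names (`G1` to `G3` standing for genera, `F1` and `F2` for families). The function `group_tree` is illustrative and assumes its input really is a monophyletic classification of the exterior nodes.

```python
from typing import Dict, FrozenSet, List, Set

Tree = Dict[str, List[str]]  # parent -> children; exterior nodes have no entry

# Hypothetical species tree (not the one in the figures).
SPECIES_TREE: Tree = {
    "root": ["n1", "n2"],
    "n1": ["s1", "s2"],
    "n2": ["s3", "n3"],
    "n3": ["s4", "s5"],
}


def exterior_below(tree: Tree, node: str) -> FrozenSet[str]:
    """Exterior nodes of the subtree of `tree` rooted at `node`."""
    kids = tree.get(node, [])
    if not kids:
        return frozenset([node])
    return frozenset().union(*(exterior_below(tree, k) for k in kids))


def group_tree(tree: Tree, root: str, groups: Dict[str, Set[str]]) -> Tree:
    """Collapse each named group to one node.

    Assumes the groups form a monophyletic classification of the
    exterior nodes of `tree`.
    """
    label = {frozenset(members): name for name, members in groups.items()}
    collapsed: Tree = {}

    def visit(node: str) -> str:
        below = exterior_below(tree, node)
        if below in label:  # the whole subtree is one group
            return label[below]
        collapsed[node] = [visit(kid) for kid in tree[node]]
        return node

    visit(root)
    return collapsed


# Species grouped into genera, then genera grouped into families.
genera = group_tree(SPECIES_TREE, "root",
                    {"G1": {"s1", "s2"}, "G2": {"s3"}, "G3": {"s4", "s5"}})
families = group_tree(genera, "root", {"F1": {"G1"}, "F2": {"G2", "G3"}})
print(genera)    # {'n2': ['G2', 'G3'], 'root': ['G1', 'n2']}
print(families)  # {'root': ['F1', 'F2']}
```

Because the output of `group_tree` has the same shape as its input, the same function can be applied at each rank in turn.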
Including interior (ancestor) nodes
If a tree truly represents the evolution of the species represented as the terminal nodes, then there should be some internal ancestor species. There should, for example, be at least one species along the link between two interior nodes, representing the ancestral species which later divided at the node above into two different species. Consider the four group monophyletic classification presented above, but now including an internal ancestor node (black circle in Fig. 10 below) which is to be included in the classification.
Fig. 10 – a 4 group monophyletic classification with an unclassified internal node
Including this species in a non-hierarchical monophyletic classification is straightforward. The smallest monophyletic group which includes the ancestor species is shown in yellow in the diagram below (Fig. 11). The classification is non-hierarchical: if solid nodes represented species, blue groups genera and the yellow group a family, then the ancestral species included in the classification belongs to a family but not to a genus. This is not allowed in the Linnean system.
Fig. 11 – a 4 group monophyletic classification with a classified internal node
If we try to include the ancestor node in one of the existing groups, this group will no longer be monophyletic. For example, the diagram below (Fig. 12) shows what happens if the ancestor of the two rightmost groups is included in the first of these. The expanded group is now paraphyletic.
Fig. 12 – non-monophyletic classification with 1 classified internal node
If a classification includes an ancestor of more than one monophyletic group, the resulting classification cannot be both hierarchical and monophyletic.
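The dilemma can be reproduced on a toy tree. In the sketch below, which is illustrative only, the interior node `anc` is the ancestor of two groups rooted at `g1` and `g2`. The names are hypothetical, and groups are taken to include the interior nodes of their own subtrees.

```python
from typing import Dict, List, Set

# Hypothetical tree, parent -> children. "anc" is the ancestor of two genera.
CHILDREN: Dict[str, List[str]] = {
    "anc": ["g1", "g2"],
    "g1": ["s1", "s2"],
    "g2": ["s3", "s4"],
}


def clade(node: str) -> Set[str]:
    """`node` together with all of its descendants."""
    members = {node}
    for child in CHILDREN.get(node, []):
        members |= clade(child)
    return members


def is_monophyletic(group: Set[str]) -> bool:
    """True if the group is some node plus all of that node's descendants."""
    return any(clade(n) == group for n in group)


genus_1 = clade("g1")
genus_2 = clade("g2")

# Option 1: place the ancestor in the smallest monophyletic group containing it.
family = clade("anc")
print(is_monophyletic(family))       # True
print("anc" in genus_1 | genus_2)    # False: in a family but in no genus

# Option 2: place the ancestor in one of the existing genera.
expanded = genus_1 | {"anc"}
print(is_monophyletic(expanded))             # False
print(clade("anc") - expanded == genus_2)    # True: exactly genus 2 is missing
```

Option 1 keeps every group monophyletic but leaves the ancestor without a genus, so the classification is not hierarchical. Option 2 keeps the hierarchy but the expanded genus is paraphyletic, since the descendants it lacks form the monophyletic group `genus_2`.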
There are several possible ways of dealing with this problem.
- Those who work only with extant species can simply ignore ancestor species and always produce hierarchical monophyletic classifications. Clearly this 'solution' is not possible for paleontologists or those interested in the whole tree of life.
- Hierarchy can be maintained by allowing paraphyletic 'stem' groups alongside monophyletic 'crown' groups.
- Maintaining monophyletic groups requires the abandonment of the hierarchical Linnean approach to classification (which goes back at least to Aristotle). The PhyloCode is a proposal for a non-hierarchical monophyletic classification system. It does not seem that most biologists are (yet?) willing to forgo the familiarity and advantages of the Linnean system.
Consequences of a strictly monophyletic classification
Suppose a widely dispersed species (Species A) becomes divided into four populations by geographical barriers, such that although there remains gene flow across all four populations, this is markedly less than within the populations, which develop genetic and morphological differences. We would then say that the four populations form subspecies. The temporal order in which the geographical barriers came into effect will define the branching order of the subspecies. Suppose this is as shown in Fig. 13. The classification into one species with four subspecies is both hierarchical and monophyletic.
Fig. 13 – 4 subspecies of one species
Now suppose that a population within Subspecies I becomes completely separated and undergoes rapid evolution, producing distinct morphological and behavioural differences which reduce to zero the gene flow between this population and the rest of Subspecies I. A new species (Species B) has been formed. The rest of the population of Subspecies I continues to exchange genes with the other subspecies of Species A. There is some evidence (Waits et al. 2008) that this may be the case with the Brown/Grizzly Bear and the Polar Bear. The Polar Bear appears more closely related to some populations of Brown Bear than these populations are to other populations of Brown Bear. If this is correct, the situation is as shown in Fig. 14, with Species A = Ursus arctos (Brown/Grizzly Bear) and Species B = Ursus maritimus (Polar Bear). However, U. arctos is then paraphyletic. A strictly monophyletic classification must use one, three or five groups. If Species B in the diagram is indeed treated as a species, then at least "Species A subsp. I" must also be treated as a species.
Fig. 14 – Non-monophyletic classification
Even if this is not the correct phylogeny for the Brown and Polar Bears, the problem remains. If one entity within an initially relatively homogeneous group undergoes much faster diversification than the rest, the unity of the remaining members of the group cannot be maintained in a monophyletic classification, regardless of their biological similarities. Why should the existence of one divergent member of a group force us to ignore the similarity of the others?