Phylogenetic Networks are Fundamentally Different from Other Kinds of Biological Networks (pp. 23-68)
Authors: David A. Morrison, Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences, Uppsala, Sweden
Abstract: Complex networks are found in all parts of biology, but there are at least two distinct
types of biological networks. In the most common type, the nodes and edges are
empirically observed, and the network analysis involves summarizing the characteristics
of the network. In the second type, only the leaf nodes are observed, and the internal
nodes and all of the edges must be inferred from information available about the leaf
nodes. Perhaps the most widespread of this inferred type of network is the phylogenetic
network, which illustrates the genealogical history and the connection of all life.
Evolution involves a series of unobservable historical events, each of which is unique,
and we can neither make direct observations of them nor perform experiments to
investigate them. This makes a phylogenetic study one of the hardest forms of data
analysis known, as there is no mathematical algorithm for discovering unique historical
accidents. This chapter summarizes the essential differences of this network type and
discusses the consequences of these differences.
Due to the complexity of evolutionary history, two types of phylogenetic networks
have been developed, which have been actively used in parallel by biologists for 150
years: (1) rooted evolutionary networks, in which the internal nodes represent ancestors
of the leaf nodes, and the directed edges represent historical pathways of transfer of
genetic information between ancestors and their descendants; and (2) unrooted data-display
networks, in which the internal nodes do not represent ancestors, and the
undirected edges represent affinity (e.g. similarity) relationships among the leaf nodes.
The latter type of network is the most commonly encountered in phylogenetics, because
there is a wide range of available mathematical techniques that work well. These networks
have been put to a number of uses by phylogeneticists, including exploratory data analysis,
displaying similarity patterns, displaying data conflicts, summarizing analysis results, and
testing phylogenetic hypotheses; and I illustrate each of these with an empirical example.
There are, as yet, few mathematical techniques available for evolutionary networks,
and recent focus has therefore been on the development of practical and effective
methods. There is, however, a wide range of methodological questions that need to be
answered before this can happen; and I raise a number of these here, along with a
preliminary discussion of them. There are also issues related to the realism of the
common mathematical constraints, the evolutionary units in a network, and the concept
of a most recent common ancestor.
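
The structural difference between the two network types described above can be made concrete with a small sketch. This is only an illustration, not a method from the chapter: the node names, the edge lists, and the helper functions classify_rooted and leaves_unrooted are all hypothetical. The rooted example uses directed ancestor-to-descendant edges with one reticulation node (a node with two parents), and the unrooted example uses undirected edges whose internal nodes carry no ancestral meaning.

```python
from collections import defaultdict

# Hypothetical rooted evolutionary network: each directed edge runs from an
# ancestor to a descendant. Node "h" has two parents, so it is a reticulation
# (for example, a hybridization event between lineages "x" and "y").
rooted_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]

def classify_rooted(edges):
    """Return (roots, leaves, reticulations) of a directed network."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))
    roots = sorted(n for n in nodes if indeg[n] == 0)
    leaves = sorted(n for n in nodes if outdeg[n] == 0)
    reticulations = sorted(n for n in nodes if indeg[n] > 1)
    return roots, leaves, reticulations

# Hypothetical unrooted data-display network: edges are undirected and the
# internal nodes u, v, w, z are not ancestors. The cycle u-v-z-w-u displays
# conflicting groupings among the leaves rather than a historical pathway.
unrooted_edges = [
    ("A", "u"), ("B", "v"), ("C", "z"), ("D", "w"),
    ("u", "v"), ("v", "z"), ("z", "w"), ("w", "u"),
]

def leaves_unrooted(edges):
    """Return the degree-1 nodes (the observed leaves) of an undirected network."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d == 1)

roots, leaves, reticulations = classify_rooted(rooted_edges)
unrooted_leaves = leaves_unrooted(unrooted_edges)
```

In the rooted case, edge direction identifies a root and lets reticulation nodes be read as historical events. In the unrooted case only the leaves have an empirical interpretation, which is why such a network displays patterns in the data without asserting a genealogy.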