Chapter 7: Phylogenetic Prediction
A phylogenetic analysis of a family of related nucleic acid or protein sequences is a determination of how the family members might have been derived during evolution. The evolutionary relationships among the sequences are depicted by using a graph called a tree. Sequences are placed as outer branches of a tree, and the branching relationships of the inner part of the tree then reflect the degree to which different sequences are related. For example, two sequences that are very much alike will be located at neighboring outside branches and will be joined to a common branch beneath them. Less related sequences will be on branches that are more distant from each other on the tree. The object of phylogenetic analysis is to discover the branch arrangements and branch lengths in trees that best represent the relationship among all the sequences.
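As a rough illustration of how position on a tree reflects relatedness, the following minimal sketch stores a toy tree as nested Python dictionaries and sums the branch lengths on the path joining two sequences. The sequence names (`seqA`, `seqB`, `seqC`), the internal node `anc1`, the branch lengths, and the helper functions `leaf_paths` and `tree_distance` are all hypothetical and chosen only for illustration; they are not taken from any particular phylogenetic program.

```python
# Hypothetical toy tree: each node name maps to (branch_length, subtree).
# A leaf (an observed sequence) has an empty subtree. All values are invented.
TREE = {
    "anc1": (0.3, {
        "seqA": (0.1, {}),
        "seqB": (0.1, {}),
    }),
    "seqC": (0.6, {}),
}


def leaf_paths(tree, prefix=()):
    """Map each leaf name to the tuple of (node, branch_length) edges from the root."""
    paths = {}
    for name, (length, subtree) in tree.items():
        edge_path = prefix + ((name, length),)
        if subtree:
            # Internal node: keep descending toward the leaves.
            paths.update(leaf_paths(subtree, edge_path))
        else:
            paths[name] = edge_path
    return paths


def tree_distance(tree, leaf1, leaf2):
    """Sum the branch lengths on the path joining two leaves."""
    paths = leaf_paths(tree)
    p1, p2 = paths[leaf1], paths[leaf2]
    # Skip the edges the two leaves share on the way down from the root.
    shared = 0
    while shared < min(len(p1), len(p2)) and p1[shared] == p2[shared]:
        shared += 1
    return sum(l for _, l in p1[shared:]) + sum(l for _, l in p2[shared:])


if __name__ == "__main__":
    print(tree_distance(TREE, "seqA", "seqB"))  # neighboring branches
    print(tree_distance(TREE, "seqA", "seqC"))  # more distant branches
```

In this toy tree, seqA and seqB sit on neighboring outer branches joined to a common branch, so the path between them is shorter than the path from either one to seqC.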
Phylogenetic analysis of nucleic acid and protein sequences is an important area of sequence analysis, used, for example, to study the evolution of a family of sequences. In this type of analysis, the most closely related sequences can be identified because they occupy neighboring branches on a tree. Thus, when a gene family is found in an organism or group of organisms, phylogenetic relationships among the genes can help to predict which ones might have an equivalent function that has been conserved during evolution of the corresponding organisms. Such functional predictions can then be tested by genetic experiments.
Phylogenetic analysis may also be used to follow the changes occurring in a rapidly changing species, such as a virus. Analysis of the types of changes within a population can reveal, for example, whether a particular gene is under selection (McDonald and Kreitman 1991; Nielsen and Yang 1998), and it can be used to study the timing of genetic variation in the human genome (Toomajian et al. 2003).
Procedures for phylogenetic analysis are strongly linked to those for sequence alignment, which is discussed in Chapters 3 and 5, and similar problems are encountered. For example, just as two very similar sequences can be easily aligned even by eye, a group of sequences that are very similar but have a small level of variation throughout can easily be organized into a tree. Conversely, as sequences become more and more different through evolutionary change, they become much more difficult to align. A phylogenetic analysis of very different sequences is also difficult because so many possible evolutionary paths could have been followed to produce the observed sequence variation. Because of the complexity of this problem, considerable expertise is required in difficult cases.