Theoretical And Practical Utility Of Gene Sequences in Phylogenetic and Phylogeographic Analysis
Makowsky, Robert Aaron
Phylogenetics, or the study of evolutionary relationships among organisms, is a rapidly changing field due primarily to the dramatic increase in available molecular characters and increasingly sophisticated theoretical and computational methods. Current phylogenetic methods, though, handle such large datasets poorly because of the extremely large number of calculations required. In this dissertation, I focus on a method that can reduce datasets with a large number of molecular characters and at the same time optimize the performance of phylogenetic methods. Chapter 1 provides background information about phylogenetic methods, with a specific emphasis on Bayesian phylogenetic methods. MrBayes is the most common program used for Bayesian phylogenetic analyses and has promise with larger datasets due to its ability to partition datasets and easily utilize parallel processing techniques. Therefore, I focus on the methods implemented in MrBayes. Specifically, I discuss some of the aspects associated with search parameters, the utilization of Metropolis Coupled Markov Chain Monte Carlo analyses, as well as the calculation of Bayes factors. Chapter 2 focuses on determining the appropriate genes for phylogeny reconstruction, which can be a difficult process. Rapidly evolving genes tend to perform best for resolving recent relationships, but suffer from alignment issues and increased homoplasy (e.g., sequence saturation) among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 widely accepted (based on multiple independent sources) relationships within the subphylum Vertebrata.
Sequences of 12 genes were downloaded from GenBank and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although, as expected, this level of divergence depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple-to-measure property of nucleotide sequences (genetic distance) is related to phylogenetic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene(s) to resolve a wide range of relationships, especially those that are difficult to resolve, as well as minimizing both cost and confounding information during project design.
In Chapter 3, the findings in Chapter 2 were taken into account when deciding what genes to include in the analysis. This chapter is a detailed analysis of the evolutionary history of the plain-bellied watersnake, Nerodia erythrogaster. Here, I sought to determine if the currently defined subspecies in the plain-bellied watersnake are concordant with results based on relatively neutral genetic markers. Species with morphological varieties (such as the plain-bellied watersnake) that are subdivided geographically have often been divided into subspecies. The morphological pattern, though, may not be congruent with the organism's evolutionary history (e.g., it may result from genetic drift or be environmentally determined rather than shaped by selection). I chose this species because it occurs across multiple biogeographic barriers (Mississippi River, Apalachicola River) and contains multiple subspecies. My goals are to 1) provide a rigorous genetic analysis of N. erythrogaster throughout its range and determine what, if any, genetic lineages can be identified using mitochondrial DNA; 2) test whether monophyletic lineages are concordant with the current taxonomy or probable biogeographic barriers (Mississippi and Apalachicola Rivers); and 3) assess the degree of ecological niche differentiation among lineages. To identify evolutionary lineages, we sequenced three genes (NADH II, Cyt-b, Cox I) from 156 geo-referenced specimens. Ecological niches were defined using bioclimatic layers for the five recovered genetic lineages, only one of which is concordant with a currently recognized subspecies (N. e. erythrogaster) and biogeographic barrier (Apalachicola River). The recovered phylogeny is weakly supported overall, although some major genetic lineages exist. All previous taxonomic and biogeographic hypotheses are strongly disfavored compared to the best tree, and ecological separation among lineages is minimal. Overall, we found no genetic support for the subspecies based on geography and conclude that while some genetic and niche differentiation is evident, it is not enough to warrant taxonomic changes.
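
Chapter 2 relates genetic distance to reconstruction ability and notes that the optimal level of divergence depends on the distance metric. As a rough illustration of why the metric matters, the sketch below computes two common pairwise measures for aligned nucleotide sequences: the uncorrected p-distance and the Jukes-Cantor (JC69) corrected distance. The abstract does not state which metrics the dissertation used, so the choice of these two, the function names, and the toy sequences are all assumptions made for illustration only.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: fraction of compared sites that differ.

    Assumes the two sequences are already aligned. Sites where either
    sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed p-distance.

    The correction is undefined once p reaches 0.75 (full saturation),
    so infinity is returned in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment (not data from the dissertation).
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAA"
p = p_distance(toy_a, toy_b)
d = jc69_distance(p)
print(f"p-distance = {p:.3f}, JC69 distance = {d:.3f}")
```

The corrected distance is always at least as large as the p-distance and grows without bound as sequences approach saturation, which is one reason a numerical "optimal range" stated under one metric does not carry over directly to another.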