Payton, Adam, Burleigh, Gordon.
Creating an Updatable Community Resource for Chloroplast Phylogenomics.
We created a Perl-based pipeline to automate the creation and updating of very large chloroplast-based phylogenetic datasets. Chloroplast genes have played a substantial role in uncovering the phylogenetic relationships among land plants, and they represent one of the most taxon-rich sequence sets in GenBank. Our initial alignments and the resulting supermatrix tree consist of protein-coding sequences from 78 chloroplast genes for over 50,000 taxa, including representatives from all major clades of embryophytes. The pipeline has six components: 1) mine GenBank for all chloroplast sequences from embryophytes; 2) extract the coding regions from each sequence of interest; 3) align the protein sequences to reference alignments; 4) using the original nucleotide sequence for each taxon, back-translate the protein alignment to create an in-frame nucleotide alignment for each gene; 5) check taxon names against the iPlant Taxonomic Name Resolution Service to resolve synonyms, identify misspellings, and flag sequences with questionable names for further evaluation; and 6) use these alignments either to generate gene trees or to concatenate them into a supermatrix for phylogenetic analysis. At various points in the pipeline, sequences and alignments are evaluated to ensure a high-quality final product. Collectively, the pipeline will serve as the back end for a web-based resource that will provide access to updatable, high-quality, curated sequence data, gene alignments, and trees reflecting the current chloroplast phylogenetic data.
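As a rough illustration of steps 4 and 6, here is a minimal Python sketch; it is not the authors' Perl code, and the function names `back_translate` and `concatenate`, the toy taxa and sequences, and the use of `?` for missing data are all assumptions made for illustration. Back-translation walks the gapped protein alignment and consumes one codon of the original coding sequence per residue, writing `---` for each gap so the nucleotide alignment stays in frame. The sketch does not re-translate codons to confirm they match their residues.

```python
"""Minimal sketch of two pipeline steps (hypothetical, not the authors' Perl code):
back-translating a protein alignment and concatenating gene alignments."""

GAP = "-"
MISSING = "?"  # assumed missing-data symbol for taxa lacking a gene


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the original CDS through a gapped protein alignment.

    Each residue consumes the next codon of `cds`; each gap becomes "---",
    so the nucleotide alignment stays in frame. A single trailing codon
    (e.g. a stop codon) in `cds` is tolerated and dropped. Codons are not
    re-translated or checked against the residues.
    """
    n_residues = len(aligned_protein.replace(GAP, ""))
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the ungapped protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == GAP:
            codons.append(GAP * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


def concatenate(gene_alignments: dict[str, dict[str, str]]) -> dict[str, str]:
    """Build a supermatrix from {gene: {taxon: aligned sequence}}.

    Genes are joined in sorted order; a taxon absent from a gene is padded
    with MISSING so every row has the same total length.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    rows = {t: [] for t in taxa}
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        if not aln:
            continue  # skip genes with no sequences
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = widths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, MISSING * width))
    return {t: "".join(parts) for t, parts in rows.items()}


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    protein_aln = {"taxonA": "M-K", "taxonB": "MEK"}
    cds = {"taxonA": "ATGAAA", "taxonB": "ATGGAAAAA"}
    gene_x = {t: back_translate(protein_aln[t], cds[t]) for t in protein_aln}
    print(gene_x)  # {'taxonA': 'ATG---AAA', 'taxonB': 'ATGGAAAAA'}
    supermatrix = concatenate({"geneX": gene_x, "geneY": {"taxonA": "ACGT"}})
    print(supermatrix)  # {'taxonA': 'ATG---AAAACGT', 'taxonB': 'ATGGAAAAA????'}
```

Padding taxa that lack a gene keeps every row of the supermatrix the same length; `?` is a common convention for missing data in phylogenetic matrices, while `-` is kept for alignment gaps so the two can be told apart.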
1 - University of Florida, Biology, Carr Hall, Gainesville, FL, 32611, USA
2 - University of Florida, P.O. Box 118526, Gainesville, FL, 32611, USA
Presentation Type: Oral Paper: Papers for Topics
Candidate for Awards: None