Payton, Adam [1], Burleigh, Gordon [2].
Creating an Updatable Community Resource for Chloroplast Phylogenomics.
We created a Perl-based pipeline to automate the creation and updating of very large chloroplast-based phylogenetic datasets. Chloroplast genes have played a substantial role in uncovering the phylogenetic relationships among land plants and represent one of the most taxon-rich sequence sets in GenBank. Our initial alignments and the resulting supermatrix tree consist of protein-coding sequences from 78 chloroplast genes for over 50,000 taxa, including representatives from all major clades of embryophytes. The pipeline consists of six components: 1) mine GenBank for all chloroplast sequences from embryophytes; 2) extract the coding regions from each sequence of interest; 3) align protein sequences to reference alignments; 4) using the original nucleotide sequence for each taxon, back-translate the protein alignment to create an in-frame nucleotide alignment for each gene; 5) evaluate taxon names against the iPlant Taxonomic Name Resolution Service to synonymize names, identify misspellings, and flag sequences with names of concern for further evaluation; and 6) use these alignments to generate gene trees or concatenate them for a supermatrix-based phylogenetic analysis. At various points in the pipeline, sequences and alignments are evaluated to ensure a high-quality final product. Collectively, the pipeline will serve as the back end for a web-based resource that will provide access to updatable, high-quality, curated sequence data, gene alignments, and trees reflecting the current chloroplast phylogenetic data.
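Step 4 of the pipeline (back-translating a protein alignment into an in-frame nucleotide alignment) can be illustrated with a minimal sketch. The pipeline itself is described as Perl-based; the Python below is only an illustration, and the function name `back_translate`, the `-` gap character, and the handling of a trailing stop codon are assumptions rather than the authors' implementation.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes the next codon of the
    coding sequence; each gap column becomes three gap characters, so the
    resulting nucleotide alignment stays in frame.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)

    # Assumption: the CDS may end in a stop codon that the protein lacks.
    if len(cds) == 3 * (residues + 1):
        cds = cds[:-3]
    if len(cds) != 3 * residues:
        raise ValueError(
            f"CDS length {len(cds)} does not match {residues} residues"
        )

    codons = []
    pos = 0  # index of the next unread codon in the CDS
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# One gap column in the protein alignment becomes three gap columns.
print(back_translate("M-KV", "ATGAAAGTT"))  # ATG---AAAGTT
```

Rejecting a coding sequence whose length disagrees with its protein is one simple example of the kind of check that could be run at this stage; the abstract does not specify which evaluations the pipeline actually performs.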
1 - University of Florida, Biology, Carr Hall, Gainesville, FL, 32611, USA
2 - University of Florida, P.O. Box 118526, Gainesville, FL, 32611, USA
Presentation Type: Oral Paper:Papers for Topics
Location: Salon 11/The Shaw Conference Centre
Date: Tuesday, July 28th, 2015
Time: 10:30 AM
Candidate for Awards: None