Abstract Detail



Comparative Genomics/Transcriptomics

Ou, Shujun [1], Jiang, Ning [2].

The genomic composition and domestication of Asian rice revealed by 3,400 rice genomes.

Rice is a model species for studying monocot plants and understanding the mechanisms of crop domestication. Since the publication of the rice genome in 2002, many efforts have focused on understanding its genetic variation at the population level. In 2012, a research group from China published a resequencing study of 1,543 rice varieties and wild rice accessions. Two years later, an international consortium released the sequences of 3,024 rice varieties with an average sequencing depth of 14X. These studies identified genomic regions that have undergone selective sweeps; however, the sequential order of rice domestication events remains largely unknown. To understand how the genomic landscape shifted during rice domestication, we combined these public “big data” sets. In this study, a total of 26,000 sequencing files (SRA format) with a total size of 17.6 TB were trimmed, mapped, and then PCR-duplicate-removed and InDel-realigned using Cutadapt, BWA, and Picard tools, respectively, at the MSU High-Performance Computation Center (HPCC). To accurately call variants from these mapping files, a joint variant-calling procedure was carried out using GATK. Genomic Variant Call Format (GVCF) files were first produced for each sample to accelerate the whole procedure, and joint genotyping was then performed on all samples. To filter out unreliable variants, the machine learning-guided Variant Quality Score Recalibration (VQSR) from the GATK package was carried out using the rice SNP database (dbSNP) from NCBI. After further hard filtering, a total of 23.8 million high-confidence variants were retained, equivalent to only 3% of the original data size. With this high-quality, high-density variant dataset for the largest sequenced plant population, phylogenetic analysis, genomic admixture analysis, and principal component clustering were performed to understand the genomic composition of Asian rice.
The ultimate goal of this project is to recover the temporal model of how rice was domesticated.
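
The two-stage calling procedure described in the abstract (one GVCF per sample, then joint genotyping across all samples) can be sketched as a small command builder. This is only an illustration under stated assumptions: it uses GATK4-style command lines, which may differ from the GATK version the authors actually ran, and the reference, sample, and output file names are hypothetical. The functions only assemble the commands; they do not run GATK.

```python
from pathlib import Path
from typing import List


def gvcf_command(reference: str, bam: str, out_dir: str) -> List[str]:
    """Per-sample step: call variants on one BAM and emit a GVCF."""
    sample = Path(bam).stem  # e.g. "bams/sampleA.bam" -> "sampleA"
    gvcf = f"{out_dir}/{sample}.g.vcf.gz"
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", gvcf,
        "-ERC", "GVCF",  # emit reference confidence so samples can be joined later
    ]


def joint_genotyping_commands(
    reference: str, gvcfs: List[str], combined: str, output: str
) -> List[List[str]]:
    """Cohort step: merge the per-sample GVCFs, then genotype them jointly."""
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", combined]
    genotype = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined,
        "-O", output,
    ]
    return [combine, genotype]


if __name__ == "__main__":
    # Hypothetical inputs, used only to show the shape of the commands.
    reference = "rice_reference.fa"
    bams = ["bams/sampleA.bam", "bams/sampleB.bam"]

    per_sample = [gvcf_command(reference, bam, "gvcf") for bam in bams]
    gvcfs = [cmd[cmd.index("-O") + 1] for cmd in per_sample]
    cohort = joint_genotyping_commands(
        reference, gvcfs, "cohort.g.vcf.gz", "cohort.vcf.gz"
    )
    for cmd in per_sample + cohort:
        print(" ".join(cmd))
```

The per-sample commands are independent of one another, so they can be distributed across cluster nodes, which is the sense in which producing GVCFs first accelerates the procedure. Only the final merge and genotyping steps need to see all samples at once.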


Related Links:
Raw sequencing data of 3000 rice (NCBI)
Raw sequencing data for 446 wild rice (ENA)


1 - Michigan State University, 1066 Bogue St, Room A326, East Lansing, MI, 48823, United States
2 - Michigan State University, 1066 Bogue St, Room A330, East Lansing, MI, 48823, United States

Keywords:
Oryza
domestication
Rice
selective sweep
population structure.

Presentation Type: Oral Paper
Number: 0009
Abstract ID: 657
Candidate for Awards: Margaret Menzel Award


Copyright © 2000-2018, Botanical Society of America. All rights reserved