Crops and Wild Relatives

Soriano Chavez, Emilio [1], Bellis, Emily [1].

Multimodal Data Fusion for Phenotype Prediction across Environments.

A challenging problem in biology is integrating large-scale data from multiple sources to understand and predict trait variation across diverse environments. Combining high-dimensional datasets often impedes model fitting because there may be far more predictors than observations. One example is integrating genomic features (hundreds of thousands of single nucleotide polymorphisms) with environmental variables (multivariate environmental data recorded over different time intervals) to predict phenotypes. We explore several dimensionality reduction techniques and their impact on the performance of phenotype prediction models in untested environments, using publicly available data from the Maize Genomes2Fields Initiative. To account for population structure, genomic data were condensed into informative predictor features using either principal component analysis (PCA) or a variational autoencoder (VAE), a deep learning-based generative model. To summarize one axis of environmental variation (temperature), we applied a discrete wavelet transform, a signal-processing technique that decomposes a time series into components at different temporal scales. The resulting genomic and environmental representations were then used to train phenotype prediction models with gradient-boosted regression trees. For plant height, models that used VAE-derived population structure features predicted phenotypes at unobserved phenotyping locations better than models that used principal components. For grain yield, genomic features were less important than environmental information at most unobserved locations, and environmental information contributed more to model accuracy. These results demonstrate the utility of deep learning-based methods for improving plant trait prediction in new environments.
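As a rough illustration of how the steps described above could fit together, the following Python sketch computes PCA scores from a SNP dosage matrix, summarizes a daily temperature series with a multilevel Haar discrete wavelet transform, and fits least-squares gradient-boosted regression stumps, holding out one whole location to mimic an untested environment. It is a hypothetical, NumPy-only sketch on synthetic data, not the authors' pipeline: the Haar wavelet, four decomposition levels, the wavelet summary features (coarsest approximation coefficients plus per-level detail energy), depth-1 trees, the synthetic phenotype model, and all function names (`genomic_pcs`, `haar_dwt`, `wavelet_features`, `fit_boosted_stumps`, etc.) are assumptions. The variational autoencoder branch is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the synthetic example is reproducible

def genomic_pcs(snps: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal-component scores of a (lines x SNPs) dosage matrix, via SVD."""
    centered = snps - snps.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def haar_dwt(signal, levels: int) -> list:
    """Multilevel orthonormal Haar DWT, returned as [cA_L, cD_L, ..., cD_1]."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        if approx.size % 2:  # pad odd lengths by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail (high-pass) coefficients
        approx = (even + odd) / np.sqrt(2)         # approximation (low-pass) coefficients
    return [approx] + details[::-1]

def wavelet_features(temps, levels: int = 4) -> np.ndarray:
    """Hypothetical summary: coarsest approximation coefficients + energy of each detail level."""
    coeffs = haar_dwt(temps, levels)
    return np.concatenate([coeffs[0], [np.sum(d ** 2) for d in coeffs[1:]]])

def fit_stump(X, r):
    """Depth-1 regression tree minimising squared error on residuals r (None if no split)."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        # Candidate thresholds: deciles of the feature, de-duplicated.
        for t in np.unique(np.quantile(X[:, j], np.linspace(0.1, 0.9, 9))):
            left = X[:, j] <= t
            if left.all():  # right leaf would be empty
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    return best

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_boosted_stumps(X, y, n_rounds: int = 100, lr: float = 0.1):
    """Least-squares gradient boosting: each stump is fit to the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)  # residuals = negative gradient of squared loss
        if stump is None:
            break
        pred += lr * predict_stump(stump, X)
        stumps.append(stump)
    return base, stumps, lr

def predict_boosted(model, X):
    base, stumps, lr = model
    pred = np.full(len(X), base)
    for s in stumps:
        pred += lr * predict_stump(s, X)
    return pred

# --- Synthetic stand-in data (NOT the G2F data): 40 lines, 500 SNPs, 5 locations ---
n_lines, n_snps, n_locs, n_days = 40, 500, 5, 128
snps = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)  # 0/1/2 allele dosages
beta = rng.normal(0, 0.1, n_snps)                                 # hypothetical SNP effects
days = np.arange(n_days)
temps = np.array([20 + 3 * loc + 5 * np.sin(2 * np.pi * days / n_days)
                  + rng.normal(0, 1, n_days) for loc in range(n_locs)])

G = genomic_pcs(snps, k=5)                          # (40, 5) genomic features
E = np.array([wavelet_features(t) for t in temps])  # (5, 8 + 4) temperature features

# One row per (line, location); phenotype = genetic + temperature effect + noise.
rows, y, loc_id = [], [], []
for loc in range(n_locs):
    for line in range(n_lines):
        rows.append(np.concatenate([G[line], E[loc]]))
        y.append(snps[line] @ beta + 0.5 * temps[loc].mean() + rng.normal(0, 0.5))
        loc_id.append(loc)
X, y, loc_id = np.array(rows), np.array(y), np.array(loc_id)

# Hold out one whole location to mimic prediction in an untested environment.
test = loc_id == n_locs - 1
model = fit_boosted_stumps(X[~test], y[~test])
train_mse = np.mean((predict_boosted(model, X[~test]) - y[~test]) ** 2)
test_rmse = np.sqrt(np.mean((predict_boosted(model, X[test]) - y[test]) ** 2))
print(f"train MSE {train_mse:.3f} vs. variance {y[~test].var():.3f}; "
      f"held-out-location RMSE {test_rmse:.3f}")
```

The hand-written stumps and SVD-based PCA are used only so the sketch runs with NumPy alone; a real analysis would more plausibly use an established gradient-boosting library, and for the VAE variant the genomic columns of `X` would presumably be replaced by a learned latent representation. Because the held-out location here is warmer than any training location, the sketch also shows how tree-based models can struggle to extrapolate to environments outside the training range.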

1 - Arkansas State University, Computer Science

Machine Learning
Phenotype Prediction
Environment Interactions

Presentation Type: Poster
Number: PCW002
Abstract ID: 272
Candidate for Awards: None

Copyright © 2000-2022, Botanical Society of America. All rights reserved