Abstract
As soon as different strains of the same prokaryotic species had their genomes sequenced, the enormous intraspecific variability in their genomic content became apparent. The high rates of horizontal gene transfer observed in prokaryotes, combined with other evolutionary and ecological factors (high mutation rates, large effective population sizes, adaptation to environmental changes and migration across different ecological niches), may explain the genomic fluidity of these organisms. New terms became necessary to describe this variability, such as “core genome” (the set of genes shared by all strains of the same species), “accessory genome” (all genes that are not included in the core genome) and “pangenome” (the total gene repertoire of the species, core plus accessory). A larger accessory genome in a species could be explained by a broader ecological range. This work studies that association by comparing pangenome size and beta-diversity across around 250 species of prokaryotes.
We used the metagenomic dataset from the Earth Microbiome Project. Owing to the current lack of standardized protocols for quantifying beta-diversity across highly dissimilar environments, we explored several alternative methods. One is based on supervised classification of the environmental samples according to broad physicochemical properties (Shannon entropy); the others are based on assessing differences in the biological composition of the samples (weighted prevalence, dendrogram-based diversity, PCoA-based diversity). To measure pangenome sizes, we built orthologous gene clusters for each species from the genomes available in the NCBI genome database. The results of this study show a positive, although weak, association between pangenome size and beta-diversity, and they point future investigation towards other sources of genome variability in prokaryotes for a better understanding of pangenome size.
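
As an illustration of the Shannon entropy measure mentioned above, the following is a minimal sketch, not the procedure used in this study. It assumes that each sample in which a species is detected carries a single broad environment label, and that the ecological range of the species is summarised as the entropy of its occurrences across those labels. The function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(environment_labels):
    """Shannon entropy (natural log) of a species' occurrences across environment classes.

    `environment_labels` holds one label per sample in which the species was detected.
    This input layout is an assumption made for illustration only.
    """
    counts = Counter(environment_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over classes of -p * ln(p), where p is the fraction of occurrences in a class
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Hypothetical species found in one environment only: lowest possible entropy.
specialist = ["soil"] * 8
# Hypothetical species spread evenly over four environments: entropy equals ln(4).
generalist = ["soil", "marine", "freshwater", "host"] * 2

print(shannon_entropy(specialist))
print(shannon_entropy(generalist))
```

Under these assumptions, a species restricted to one environment class scores zero, and the score grows as its occurrences spread more evenly over more classes.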