Genome content predicts the carbon catabolic preferences of heterotrophic bacteria

Nature Microbiology (2023)

Heterotrophic bacteria—bacteria that utilize organic carbon sources—are taxonomically and functionally diverse across environments. It is challenging to map metabolic interactions and niches within microbial communities due to the large number of metabolites that could serve as potential carbon and energy sources for heterotrophs. Whether their metabolic niches can be understood using general principles, such as a small number of simplified metabolic categories, is unclear. Here we perform high-throughput metabolic profiling of 186 marine heterotrophic bacterial strains cultured in media containing one of 135 carbon substrates to determine growth rates, lag times and yields. We show that, despite high variability at all levels of taxonomy, the catabolic niches of heterotrophic bacteria can be understood in terms of their preference for either glycolytic (sugars) or gluconeogenic (amino and organic acids) carbon sources. This preference is encoded by the total number of genes found in pathways that feed into the two modes of carbon utilization and can be predicted using a simple linear model based on gene counts. This allows for coarse-grained descriptions of microbial communities in terms of prevalent modes of carbon catabolism. The sugar–acid preference is also associated with genomic GC content and thus with the carbon–nitrogen requirements of their encoded proteome. Our work reveals how the evolution of bacterial genomes is structured by fundamental constraints rooted in metabolism.
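As a rough illustration of what "a simple linear model based on gene counts" can look like, the sketch below fits a preference score to two predictors (sugar-pathway and acid-pathway gene counts) by ordinary least squares. The function names, the two-predictor form and the numbers are all assumptions made for illustration; the exact predictors and fitting procedure used in the study are not given in this text.

```python
def solve_linear_system(a, b):
    """Solve a small square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_model(sugar_counts, acid_counts, preference):
    """Least-squares fit of preference = b0 + b1 * sugar + b2 * acid (hypothetical form)."""
    rows = [[1.0, s, a] for s, a in zip(sugar_counts, acid_counts)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, preference)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up gene counts and preference scores for five hypothetical strains.
sugar = [10, 30, 50, 20, 40]
acid = [40, 20, 10, 30, 35]
score = [0.1 + 0.02 * s - 0.01 * a for s, a in zip(sugar, acid)]
intercept, sugar_coef, acid_coef = fit_linear_model(sugar, acid, score)
```

In this toy setting a positive coefficient on sugar genes and a negative one on acid genes would correspond to a score that rises with sugar-pathway content.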

All growth and genomic data are available at https://doi.org/10.17632/xfh8t8568g.1. All isolates are available from either M.G. (Europe) or O.X.C. (USA) on request. All genome assemblies are available under BioProjects PRJNA319196 and PRJNA478695, with the exception of strains 1A06 (PRJNA318805), 12B01 (PRJNA13568), 13B01 (PRJNA318805), DSS-3 (BioSample SAMN02604003) as well as AS40, AS56, AS88 and AS94 (PRJNA996876). Source data are provided with this paper.

All code needed to reproduce the figures is available at https://doi.org/10.17632/xfh8t8568g.1.

Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).

Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).

Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).

Pontrelli, S. et al. Metabolic cross-feeding structures the assembly of polysaccharide degrading communities. Sci. Adv. 8, eabk3076 (2022).

Gralka, M., Szabo, R., Stocker, R. & Cordero, O. X. Trophic interactions and the drivers of microbial community assembly. Curr. Biol. 30, R1176–R1188 (2020).

Pollak, S. et al. Public good exploitation in natural bacterioplankton communities. Sci. Adv. 7, eabi4717 (2021).

Moran, M. A. The global ocean microbiome. Science 350, aac8455 (2015).

Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. Nat. Commun. 7, 11965 (2016).

Enke, T. N. et al. Modular assembly of polysaccharide-degrading marine microbial communities. Curr. Biol. 29, 1528–1535 (2019).

Fahimipour, A. K. & Gross, T. Mapping the bacterial metabolic niche space. Nat. Commun. 11, 4887 (2020).

Kehe, J. et al. Positive interactions are common among culturable bacteria. Sci. Adv. 7, eabi7159 (2021).

Kirchman, D. L. The ecology of Cytophaga–Flavobacteria in aquatic environments. FEMS Microbiol. Ecol. 39, 91–100 (2002).

Buchan, A., LeCleir, G. R., Gulvik, C. A. & González, J. M. Master recyclers: features and functions of bacteria associated with phytoplankton blooms. Nat. Rev. Microbiol. 12, 686–698 (2014).

Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).

Barberán, A., Caceres Velazquez, H., Jones, S. & Fierer, N. Hiding in plain sight: mining bacterial species records for phenotypic trait information. mSphere 2, e00237-17 (2017).

Mende, D. R. et al. ProGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. Nucleic Acids Res. 48, D621–D625 (2020).

Sueoka, N. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. Proc. Natl Acad. Sci. USA 47, 1141–1149 (1961).

Hellweger, F. L., Huang, Y. & Luo, H. Carbon limitation drives GC content evolution of a marine bacterium in an individual-based genome-scale model. ISME J. 12, 1180–1187 (2018).

Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).

Mende, D. R. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. Nat. Microbiol. 2, 1367–1373 (2017).

Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun. 347, 1–3 (2006).

Estrela, S. et al. Functional attractors in microbial community assembly. Cell Syst. 13, 29–42 (2022).

Amarnath, K. et al. Stress-induced metabolic exchanges between complementary bacterial types underly a dynamic mechanism of inter-species stress resistance. Nat. Commun. 14, 3165 (2023).

Estrela, S., Diaz-Colunga, J., Vila, J. C., Sanchez-Gorostiaga, A. & Sanchez, A. Diversity begets diversity under microbial niche construction. Preprint at bioRxiv https://doi.org/10.1101/2022.02.13.480281 (2022).

Schink, S. J. et al. Glycolysis/gluconeogenesis specialization in microbes is driven by biochemical constraints of flux sensing. Mol. Syst. Biol. 18, e10704 (2022).

Basan, M. et al. A universal trade-off between growth and lag in fluctuating environments. Nature 584, 470–474 (2020).

Plucain, J. et al. Epistasis and allele specificity in the emergence of a stable polymorphism in Escherichia coli. Science 343, 160–164 (2014).

Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl Acad. Sci. USA 105, 7899–7906 (2008).

Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 109, 9487–9492 (2012).

Hershberg, R. & Petrov, D. A. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6, e1001115 (2010).

Ely, B. Genomic GC content drifts downward in most bacterial genomes. PLoS ONE 16, e0244163 (2021).

Maddamsetti, R. & Grant, N. A. Divergent evolution of mutation rates and biases in the long-term evolution experiment with Escherichia coli. Genome Biol. Evol. 12, 1591–1603 (2020).

Yakovchuk, P., Protozanova, E. & Frank-Kamenetskii, M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 34, 564–574 (2006).

Lassalle, F. et al. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11, e1004941 (2015).

Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. Science 370, 683–687 (2020).

Smriga, S., Ciccarese, D. & Babbin, A. R. Denitrifying bacteria respond to and shape microscale gradients within particulate matrices. Commun. Biol. 4, 570 (2021).

Gowda, K., Ping, D., Mani, M. & Kuehn, S. Genomic structure predicts metabolite dynamics in microbial communities. Cell 185, 530–546 (2022).

Moran, M. A. et al. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432, 910–913 (2004).

Ben-Haim, Y. et al. Vibrio coralliilyticus sp. nov., a temperature-dependent pathogen of the coral Pocillopora damicornis. Int. J. Syst. Evol. Microbiol. 53, 309–315 (2003).

Hehemann, J. H. et al. Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. Nat. Commun. 7, 12860 (2016).

Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).

Huerta-Cepas, J. et al. EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).

Zhang, H. et al. DbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).

Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927 (2020).

Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).

Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-based reconstruction and analysis for Python. BMC Syst. Biol. 7, 74 (2013).

Wolfram Mathematica v. 13.2 (Wolfram, 2022).

R: A Language and Environment for Statistical Computing (R Core Team, 2022).

Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. Ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).

Paradis, E. & Schliep, K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).

Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).

Schliep, K. P. phangorn: Phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).

Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01628-0 (2023).

Heinken, A., Magnúsdóttir, S., Fleming, R. M. T. & Thiele, I. DEMETER: efficient simultaneous curation of genome-scale reconstructions guided by experimental data and refined gene annotations. Bioinformatics 37, 3974–3975 (2021).

Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

Hubert, B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. Sci. Data 9, 92 (2022).

Lagadec, E., Småge, S. B., Trösse, C. & Nylund, A. Phylogenetic analyses of Norwegian Tenacibaculum strains confirm high bacterial diversity and suggest circulation of ubiquitous virulent strains. PLoS ONE 16, e0259215 (2021).

Ekborg, N. A. et al. Saccharophagus degradans gen. nov., sp. nov., a versatile marine degrader of complex polysaccharides. Int. J. Syst. Evol. Microbiol. 55, 1545–1549 (2005).

We thank S. Estrela (Yale University and Stanford University) for providing community composition data from their enrichment experiments (Fig. 4d); A. Sichert for assembling genomes; and M. d. Bello, X. Shan, T. Hwa as well as all members of the Cordero laboratory and Simons PriME collaboration for their enriching discussions. We acknowledge funding from the Simons Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.) and Simons Foundation Postdoctoral Fellowship Award number 599207 (M.G.).

Matti Gralka

Present address: Systems Biology Group, Amsterdam Institute for Life and Environment (A-LIFE) and Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Shaul Pollak

Present address: Division of Microbial Ecology, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

Matti Gralka, Shaul Pollak & Otto X. Cordero

M.G. designed the study, performed all experiments, analysed all data and wrote the initial manuscript. S.P. analysed the genomic data from the proGenomes database. M.G., S.P. and O.X.C. discussed the results. O.X.C. directed the project and edited the manuscript.

Correspondence to Matti Gralka or Otto X. Cordero.

The authors declare no competing interests.

Nature Microbiology thanks Sara Mitri, Seppe Kuehn and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The tree and taxonomy were created using the GTDB-tk classify workflow using standard parameters from an alignment of 120 marker genes. The legend corresponds to expected substitutions per site. A full list of all strains is provided in Supplementary Table 1.

a, Number of carbon sources supporting growth per strain. b, Fraction of all strains that were able to use a given substrate as their sole carbon and energy source. c, There was a lack of strong correlation between the number of carbon sources that support growth, growth rate and yield. Average yield (blue dots) and rate (red squares) binned by the number of carbon sources that supported growth, shown as the mean ± s.d. (for a total of n = 182 strains showing growth on at least one substrate). Lines and P values are derived from linear regressions. More generalist species (more carbon sources consumed) achieve slightly higher average yield but the effect size is likely not practically relevant. d, For each condition (substrate × strain), we plotted the growth rate and yield, which are very slightly positively correlated (linear regression P = 2 × 10⁻⁶, R² = 0.005). Points on the far right correspond to the maximal detectable growth rate given our spacing of experimental time points. e, Linear slopes for the per-strain regression of yield with growth rate; only 3/186 strains exhibited a statistically significant correlation (linear regression) between rate and yield. The vertical line corresponds to the slope of the regression over all conditions.

a–c, Phenotype distance, defined as the cosine distance between consumption vectors, as a function of genomic distance between pairs of strains, where the genomic distance is the GTDB-tk distance (a), the Bray–Curtis distance between gene content (b; based on copy numbers of KEGG KO) or module content (c; based on abundance of KEGG modules). Points are the mean ± s.d. of logarithmic bins; n = 16,471 total comparisons.
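The two distance measures named in this caption can be sketched as follows, under the assumption that each strain is represented by a plain numeric vector (growth on each substrate for the phenotype, KEGG KO copy numbers for the gene content). The function names and example vectors are hypothetical and are not taken from the authors' code.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two consumption vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def bray_curtis(u, v):
    """Sum of absolute differences divided by the sum of all counts."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical growth rates of two strains on four substrates.
strain_a_growth = [0.3, 0.0, 0.5, 0.1]
strain_b_growth = [0.0, 0.4, 0.5, 0.0]
phenotype_distance = cosine_distance(strain_a_growth, strain_b_growth)

# Hypothetical copy numbers of three KEGG orthologues in the same two strains.
strain_a_genes = [1, 2, 0]
strain_b_genes = [3, 2, 0]
gene_content_distance = bray_curtis(strain_a_genes, strain_b_genes)
```

Both measures are zero for identical vectors and reach one when the vectors share no non-zero entries, which makes them comparable on the same axis.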

a, Principal component analysis of the full growth rate matrix, reproduced from Fig. 1 in the main text. b, Averaged loadings of fine-grained categories of substrates normalized to unit length. Detailed loadings of all substrates in the principal component analysis in a. The full principal component analysis shows a clear separation of preferences for organic acids (including alcohols and aromatics) and amino acids. c,d, Individual loadings per substrate for each principal component (PC; left). Note that all acids have negative loadings on PC1, whereas all but two organic acids switch sign on PC2 relative to amino acids. Scatter plots of the first principal component (based on full growth rate matrix) versus the SAP as defined in the main text, and the second principal component versus the amino acid–organic acid preference defined analogously (right). Each point is a different isolate, coloured by taxonomic order (as in Fig. 1). P values are derived from linear regressions.

a, Re-analysis of data from Kehe and colleagues (ref. 11). The heat map corresponds to their extended data fig. 2 (final optical density in each condition) except with rows and columns sorted by cosine similarity. b, Principal component analysis of this matrix shows the clustering of the two taxonomic orders and their alignment with the average loadings of acids and sugars. c, Phylogenetic tree based on GTDB-tk of species contained in the IJSEM and DEMETER trait databases as well as proGenomes (by species name). Note that two large phyla, Actinobacteriota and Firmicutes, are not at all represented in our strain library.

a, Smooth histograms of the pairwise correlation coefficients between the growth vectors of strains across all three experiments (V1, V2, V3; V3 is the experiment primarily discussed in the main text). b, Scatter plots of the SAP measured for each strain between all three replicate experiments. P values are derived from linear regressions.

Completeness, coverage, and duplication are defined in detail in Methods. a, Predicting coverage from completeness (linear model) generally yields higher quality fits than predicting coverage from duplication. b, After correcting for completeness, duplication tends to explain more of the residuals than completeness does after correcting for duplication. c, Neither duplication nor coverage of any individual pathway correlated very strongly with SAP, and whether duplication or coverage of a given pathway was more predictive of SAP depended on the pathway. d, Illustrating the concept of functional duplication on the example of the galactose degradation pathway (KEGG pathway ko00052). Shown is the central part of the pathway that converts lactose and other oligosaccharides first to β-d-galactose, which is transformed through multiple steps to α-d-glucose-6-phosphate, which then enters glycolysis. For some reactions, we found multiple orthologues in the same strain (for example, up to six orthologues of K01785 (galM, aldose 1-epimerase, EC:5.1.3.3)). These orthologues are not exact duplicates, as illustrated by the tree on the right. The tree is based on a multiple sequence alignment of all sequences annotated K01785 across all strains. We have highlighted the six copies found in the Zobellia strain A2M03, which are spread around the tree and often grouped with orthologues found in distantly related species. In fact, across all highly duplicated orthologues (maximum number of orthologues per strain of at least six), the pairwise distance (computed from the multiple sequence alignments for each KEGG orthologue using the dist.ml function of the phangorn package in R) was about equally likely to be greater between orthologues in the same strain relative to orthologues in different strains as it was to be smaller. Thus, ‘duplicated’ orthologues in a strain probably represent functional variants of different evolutionary origin.
e,f, Average distances between KEGG orthologues within and between strains for genes associated with sugar and acid catabolism. The KEGG orthologues in black have a more than 10% difference between the two distances. Points represent the mean ± s.e.m.; the number of comparisons differs for each gene, from n = 496 to n = 179,101. g, Comparison between measured and predicted growth on individual substrates. Predicted growth was derived from FBA simulations of genome-scale metabolic models created using CarveMe using standard parameters (no gapfilling). This procedure yielded 58% correct predictions (vertical line), which was within the range of correct predictions achieved when the comparison was performed with shuffled labels (distribution, obtained by shuffling labels 1,000 times, each time measuring the proportion of correct predictions).
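The shuffled-label comparison described in panel g can be sketched as below: the observed fraction of correct growth predictions is compared with the fractions obtained after repeatedly shuffling the predicted labels. The growth calls, function names and random seed are made up for illustration; this is not the authors' code, and the FBA step that would produce the predictions is not shown.

```python
import random


def accuracy(measured, predicted):
    """Fraction of conditions where predicted growth matches measured growth."""
    matches = sum(m == p for m, p in zip(measured, predicted))
    return matches / len(measured)


def shuffled_accuracies(measured, predicted, n_shuffles=1000, seed=0):
    """Accuracies obtained after shuffling the predicted labels n_shuffles times."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(predicted)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        results.append(accuracy(measured, shuffled))
    return results


# Hypothetical growth/no-growth calls for eight strain-substrate conditions.
measured = [True, False, True, True, False, False, True, False]
predicted = [True, False, False, True, False, True, True, True]

observed = accuracy(measured, predicted)
null = shuffled_accuracies(measured, predicted)
# Share of shuffles doing at least as well as the real predictions.
fraction_at_least = sum(a >= observed for a in null) / len(null)
```

When the observed accuracy falls well inside the shuffled distribution, as the caption reports for the 58% figure, the predictions carry little information beyond the overall frequency of growth.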

a–d, Number of CAZymes (a,b, glycosyl hydrolases; and c,d, polysaccharide lyases) and their correlation with SAPs (b,d). b,d, The insets show −log₁₀P per order, the negative log₁₀ of the P value obtained from linear regressions of CAZyme number with SAP within each order; −log₁₀P > 2 (vertical line) corresponds to a significant correlation at the 5% level, Bonferroni corrected for multiple testing. b, The square symbols correspond to the squares in Fig. 1d. These are exceptions to the median metabolic preference per order, such as the acid-specialist Tenacibaculum genus in the Flavobacteriales, which includes fish pathogens (ref. 60). Conversely, the orders Pseudomonadales and Rhodobacterales (commonly thought to specialize in simple substrates; ref. 13) tended to prefer acids (SAP < 0), but we also found the sugar-specialist Pseudomonadales genus Saccharophagus, whose members are known sugar degraders (ref. 61). The Flavobacteriales and Pseudomonadales strains with atypical phenotypes for their taxonomy tended to have fewer/more CAZymes than their close relatives, respectively. Small points correspond to individual isolates, large points with error bars indicate the mean ± s.d. for each order (a,c, n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales), 32 (Flavobacteriales)) or SAP bin (b,d, total number of strains n = 182).
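The threshold of 2 on the −log₁₀P axis can be checked with a short calculation. The sketch assumes that the number of tests being corrected for is the five orders shown in the figure, which is consistent with the stated threshold but is not spelled out in the caption; the variable names are illustrative only.

```python
import math

alpha = 0.05   # family-wise significance level stated in the caption
n_tests = 5    # assumption: one regression per taxonomic order

# Bonferroni correction divides the significance level by the number of tests.
per_test_alpha = alpha / n_tests

# A P value below per_test_alpha is the same as -log10(P) above this threshold.
threshold = -math.log10(per_test_alpha)
```

With these inputs the per-test level is 0.01, so the threshold evaluates to 2, matching the vertical line described above.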

a, The GC content (measured across all predicted coding regions) is relatively conserved at the order level across our strain library (n = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales) and 32 (Flavobacteriales)). b, The GC content predicts the carbon and nitrogen requirements per coded amino acid. All protein sequences were manually scored according to the number of carbon and nitrogen atoms of each amino acid. c, Same data as Fig. 3b without binning: SAP is correlated with genomic GC content across the whole set of strains but not within orders, possibly because GC content evolves very slowly and is thus relatively conserved below the order level. Notably, this correlation was much stronger than the correlation between GC content and other basic characteristics of the genomes, such as the number of coding regions (linear model fit, P = 0.2), and there was no practically significant difference between the GC content of genes in sugar- and acid-catabolic pathways (e). d, Because of the correlation between GC content and both nutrient requirements and SAP, SAP is positively/negatively correlated with the number of carbon/nitrogen atoms per coded amino acid, respectively. Small points correspond to individual strains, large points with error bars indicate the mean ± s.d. for the five main orders. Lines and P values are derived from linear regressions. e, The average GC content of sugar- and acid-catabolic genes is very similar. Scatter plot of the GC content of all genes annotated as sugar/acid genes (Supplementary Table 5), extracted from the genomes and averaged per strain. The line corresponds to equal GC content in sugar/acid genes. f, Residuals of the linear fit in a, showing a weak but statistically significant (P = 6 × 10⁻¹⁶) trend for high GC genomes to have a slightly higher GC content in sugar genes than acid genes.
g, Example for the correlation and linear regression of pathway abundance with GC content in more than n = 11,000 diverse reference genomes (proGenomes). h, Extracting the linear regression coefficients (slopes) for each pathway, all of which were highly significant, yields a picture similar to Fig. 2b, that is, sugar pathways tended to decrease and acid pathways tended to increase in abundance as a function of GC content. The slopes for sugar (n = 7) and acid (n = 26) pathways are significantly different from each other (t-test, dof = 31, T = −4.26, P = 0.00017).
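The per-residue scoring mentioned in panel b can be illustrated with a small sketch that counts carbon and nitrogen atoms per coded amino acid and computes GC content for a coding sequence. The atom counts are the standard values for the 20 proteinogenic amino acids; the function names and example sequences are hypothetical, and the handling of ambiguous characters (they are skipped) is an assumption rather than the authors' stated procedure.

```python
# Carbon and nitrogen atoms per amino acid (one-letter codes).
CARBON = {"A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "Q": 5, "E": 5, "G": 2,
          "H": 6, "I": 6, "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3,
          "T": 4, "W": 11, "Y": 9, "V": 5}
NITROGEN = {"A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "Q": 2, "E": 1, "G": 1,
            "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
            "T": 1, "W": 2, "Y": 1, "V": 1}


def atoms_per_residue(protein):
    """Mean number of (carbon, nitrogen) atoms per amino acid in a sequence."""
    residues = [aa for aa in protein.upper() if aa in CARBON]  # skip unknowns
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    n = len(residues)
    carbon = sum(CARBON[aa] for aa in residues) / n
    nitrogen = sum(NITROGEN[aa] for aa in residues) / n
    return carbon, nitrogen


def gc_content(dna):
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return sum(b in "GC" for b in bases) / len(bases)


# Hypothetical short peptide and coding sequence.
carbon_mean, nitrogen_mean = atoms_per_residue("MKRA")
gc = gc_content("ATGAAACGCGCG")
```

Averaging these per-protein values over a proteome gives one carbon and one nitrogen number per genome, which can then be regressed against that genome's GC content.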

a, Taxonomic distribution and distribution of SAPs in the synthetic communities, coloured by order (Fla, Flavobacteriales; Vib, Vibrionales; Alt, Alteromonadales; Pse, Pseudomonadales; Rho, Rhodobacterales; Cyt, Cytophagales). b, Richness over time in synthetic communities growing on one of four carbon sources (Fig. 4a). Points with error bars indicate the mean ± s.d. across six replicates. c, Abundance-weighted average GC content of communities enriched on acids or sugars. Genome-average GC for individual OTUs was estimated using SkewDB (Methods). The distributions are statistically significantly different (two-sided Welch’s t-test, T = 6.95, dof = 13.8, P = 7.5 × 10⁻⁶). d, Final richness in synthetic communities growing on four different concentrations of GlcNAc. The communities consisted of a complex mixture of strains, of which only about half were capable of consuming GlcNAc in monoculture (consumers). The remaining species (crossfeeders) therefore must have been crossfeeding on metabolites excreted by the consumers. e,f, Average number of C or N atoms per coded amino acid in the communities, weighted by the abundance of each strain. Shown is the average over the last five time points. Asterisks indicate significant differences between conditions (P = {2, 0.2, 5.8, 6.2} × 10⁻⁶ from top to bottom in e and P = {0.01, 3.0, 3.8, 1.4} × 10⁻⁵ from top to bottom in f) in a two-tailed Mann–Whitney test (using Bonferroni correction for multiple testing). d–f,h, Small points correspond to replicates (including different dilution factors, n = 12 points per condition), large points with error bars indicate the mean ± s.d. g, Functional composition of synthetic communities growing on four different concentrations of GlcNAc as the sole carbon (but not nitrogen) source. Final species compositions are shown as bar charts, where each species is coloured according to its SAP.
At low GlcNAc concentrations, more acid-specialist species (negative SAP, green tones) dominated. This trend was driven not by a change in the relative abundance of consumers (which was roughly constant across conditions) but by both consumers and crossfeeders with lower SAP dominating at lower carbon concentrations. h, This pattern remained when perturbing the communities. All four replicate communities at the intermediate dilution factor, grown for six cycles at the highest and lowest concentrations (20 and 0.02 mM GlcNAc, respectively), were transferred into all of the other concentrations, in parallel to the unperturbed communities. Consistent with the unperturbed observation, an increase/decrease in GlcNAc concentration led to an increase/decrease in cSAP, respectively. This effect was overall stronger for more severe perturbations; for example, compare the 20 mM to 2 mM switched communities (yellow) with the 20 mM to 0.02 mM switched communities (red).
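The abundance-weighted average GC content in panel c, and the Welch's t statistic used to compare the two sets of communities, can be sketched as follows. The function names and the numbers are invented for illustration, and the sketch computes only the statistic and its degrees of freedom, not the P value.

```python
import math
from statistics import mean, variance


def weighted_mean_gc(gc_by_otu, abundance_by_otu):
    """Community GC content as the abundance-weighted mean over OTUs."""
    total = sum(abundance_by_otu)
    return sum(g * a for g, a in zip(gc_by_otu, abundance_by_otu)) / total


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)  # squared standard error of mean a
    var_b = variance(sample_b) / len(sample_b)  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Hypothetical community with two OTUs at relative abundances 1:3.
community_gc = weighted_mean_gc([0.40, 0.60], [1, 3])

# Hypothetical community-level GC values for two enrichment conditions.
acid_communities = [0.58, 0.61, 0.60, 0.57]
sugar_communities = [0.45, 0.50, 0.47, 0.44]
t_stat, dof = welch_t(acid_communities, sugar_communities)
```

Welch's version of the test does not assume equal variances in the two groups, which is why the degrees of freedom reported in the caption (13.8) are not an integer.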

Supplementary Table 1. List of strains. Supplementary Table 2. List of substrates. Supplementary Table 3. Full dataset of growth rates. Supplementary Table 4. KEGG pathways used for SAP predictions. Supplementary Table 5. KOs used for SAP predictions. Supplementary Table 6. List of sugar/acid KOs in our strains. Supplementary Table 7. Predicted SAP for reference genomes. Supplementary Table 8. OTUs for synthetic communities on four carbon sources.

Source data for Figs. 1–4.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Gralka, M., Pollak, S. & Cordero, O.X. Genome content predicts the carbon catabolic preferences of heterotrophic bacteria. Nat Microbiol (2023). https://doi.org/10.1038/s41564-023-01458-z

Received: 08 February 2023

Accepted: 24 July 2023

Published: 31 August 2023

DOI: https://doi.org/10.1038/s41564-023-01458-z
