Friday, September 16, 2022

Drivers and determinants of strain dynamics following fecal microbiota transplantation


Data overview

The study dataset comprised 22 independent cohorts recruited at centers in the US, the Netherlands and Australia. In total, 316 FMTs were performed in 311 patients, spanning the following indications:

- rCDI (n = 62 FMTs26,27,28,32,36)
- infection with ESBL (n = 59 (refs. 37,38,39))
- MetS (n = 50 (refs. 18,25,40))
- UC (n = 42 (refs. 29,41,42,43))
- anti-PD1 therapy resistance in patients with melanoma (n = 37 (refs. 9,10))
- IBS (n = 30 (ref. 44))
- Crohn’s disease (n = 18 (ref. 45))
- chemotherapy-induced diarrhea in patients with renal carcinoma (n = 10 (ref. 46))
- Tourette’s syndrome (n = 5 (ref. 47))
- healthy volunteers (n = 3 (ref. 48))

On average, 4.11 recipient stool samples were available per FMT time series, including baseline samples taken before the intervention (pre-FMT). Overall, 7.9 terabases (Tb) of sequencing data were analyzed across 1,492 fecal metagenomes. Of these, 269 metagenomes (for 76 time series) were generated as part of the present study, for cohorts UC_NL, ESBL_NL, MetS_NL_1 and div_AU.

Three cohorts (UC_NL, MetS_NL_1 and MetS_NL_Koopen) were randomized controlled trials in which a subset of patients received autologous FMTs (n = 33 FMTs), that is, transplantation of the recipient’s own stool. All other FMTs (n = 283) were allogenic, using stool donors. After filtering, 228 FMT time series had a full complement of samples: donor baseline, recipient baseline and at least one recipient post-FMT sample.

A full description of all cohorts is provided in Supplementary Table 1. Detailed information per FMT time series is given in Supplementary Table 2, and per-sample information in Supplementary Table 3.

Sample collection, processing and metagenomic sequencing

Study design and fecal sample collection for cohorts MetS_NL_1 (refs. 18,25), UC_NL41,61 and ESBL_NL37 were described previously. rCDI_AU and UC_AU samples were obtained from a single-center, proof-of-concept, parallel, controlled study run in collaboration with the Centre for Digestive Diseases (Sydney, Australia). The study assessed donor microbiota implantation in two patients with CDI and three with UC, for up to 28 days after a 2-day fecal microbiota transplantation infusion via transcolonoscopy and rectal enema. It is registered with the Australian New Zealand Clinical Trials Registry under ACTRN12614000503628 (Universal Trial no. U1111-1156-5909). Written informed participant consent and ethical approval were obtained through the Centre for Digestive Diseases Human Research Ethics Committee. Deidentified participant data associated with the study are provided in Supplementary Tables 2 and 3.

For cohorts MetS_NL_1 and UC_NL, fecal DNA extraction was described in the original studies. DNA from ESBL_NL samples was extracted using the GNOME DNA Isolation Kit (MP Biomedicals) with the following minor modifications:

- cell lysis/denaturation was performed (30 min, 55 °C) before overnight protease digestion (55 °C);
- RNase digestion (50 μl, 30 min, 55 °C) was performed after mechanical lysis.

After final precipitation, DNA was resuspended in TE buffer and stored at −20 °C for further analysis.

Metagenomic sequencing libraries for MetS_NL_1, UC_NL, ESBL_NL and div_AU samples were prepared to a target insert size of 350–400 base pairs (bp). Preparation was automated on a Biomek FXp Dual Hybrid (Beckman Coulter) fitted with high-density layout adapters, an orbital shaker, a static Peltier and a shaking Peltier, together with a robotic PCR cycler (Biometra). SPRIworks HT kits (Beckman Coulter) were used according to the supplier’s recommendation, with the following modifications: 500 ng of starting DNA, adapter dilution 1:25 and kit chemical dilution 1:1 in process. For samples with low input DNA concentrations, libraries were instead prepared manually using NEBNext Ultra II DNA Library Prep kits with NEBNext Singleplex primers. Libraries were sequenced on an Illumina HiSeq 4000 platform with 2 × 150-bp paired-end reads.

Public datasets

Based on a literature search, we included 18 datasets on FMT cohorts that met the following criteria: (1) metagenomic sequencing data were publicly available in January 2022; (2) the available description was sufficient to unambiguously match donors and recipients per FMT time series; and (3) there were no restrictions on data reuse. They were included in this study as:

- RCDI_US_Smillie (n = 22 FMT time series26)
- RCDI_US_Aggarwala (n = 14 (ref. 28))
- RCDI_US_Watson (n = 10 (ref. 32))
- RCDI_US_Podlesny (n = 8 (ref. 27))
- RCDI_US_Moss (n = 6 (ref. 36))
- MetS_NL_Koopen (n = 24 (ref. 40))
- UC_US_Damman (n = 6 (ref. 43))
- UC_US_Nusbaum (n = 4 (ref. 42))
- UC_US_Lee (n = 2 (ref. 29))
- CD_US_Vaughn (n = 18 (ref. 45))
- ABXR_div_Leo (n = 26 (ref. 39))
- ABXR_IS_BarYoseph (n = 14 (ref. 38))
- IBS_NO_Goll (n = 30 (ref. 44))
- MEL_US_Davar (n = 27 (ref. 10))
- MEL_US_Baruch (n = 10 (ref. 9))
- REN_IT_Ianiro (n = 10 (ref. 46))
- TOU_CN_Zhao (n = 5 (ref. 47))
- CTR_RU_Goloshchapov (n = 3 (ref. 48))

Contextual data, including donor–recipient matchings and information on clinical response, were curated from the study publications (Supplementary Tables 1–3). In some cases, these data were kindly amended on request by the studies’ original authors.

Metagenomic data processing and taxonomic and functional profiling

Metagenomic reads were quality trimmed to remove base calls with a Phred score of <25. Reads were then discarded if they were shorter than 45 nucleotides, or if they mapped to the human genome (GRCh38.p10) with at least 90% identity over 45 nucleotides. This processing was performed using NGLess62. Taxonomic profiles per sample were obtained using mOTUs v.2 (ref. 63).

For functional profiling, reads were mapped against the Global Microbial Gene Catalog v.1 gut subcatalog (gmgc.embl.de64), with a minimum match length of 45 nucleotides and at least 97% identity. The mappings were summarized based on antimicrobial resistance gene (ARG) annotations and on Kyoto Encyclopedia of Genes and Genomes orthologs (KOs) derived from eggNOG annotations65. From the resulting KO profiles, gut metabolic modules (GMMs)66 were quantified in each sample using omixer-rpmR (v.0.3.2)67. Taxonomic and GMM profiles per sample, normalized by read depth, are available in Supplementary Tables 7 and 8.
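The read-level QC thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the NGLess script used in the study. It assumes Phred+33 quality encoding and interprets "quality trimming" as keeping the longest contiguous run of bases with Phred ≥25. Human-read removal is omitted, and the function names are hypothetical.

```python
# Minimal sketch of the read QC thresholds described above (not the study's
# NGLess script). Assumptions: Phred+33 encoding; "quality trimming" keeps the
# longest contiguous run of bases with Phred >= 25. Human-read removal omitted.

MIN_QUALITY = 25  # Phred threshold from the text
MIN_LENGTH = 45   # minimum read length (nt) from the text


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string, min_quality=MIN_QUALITY):
    """Return the longest substring (and its qualities) in which every base
    has a Phred score >= min_quality."""
    scores = phred_scores(quality_string)
    best_start, best_len, run_start = 0, 0, None
    for i, q in enumerate(scores + [-1]):  # sentinel closes the final run
        if q >= min_quality:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    end = best_start + best_len
    return sequence[best_start:end], quality_string[best_start:end]


def keep_read(sequence, quality_string):
    """Trim, then discard reads shorter than MIN_LENGTH."""
    trimmed, _ = trim_read(sequence, quality_string)
    return len(trimmed) >= MIN_LENGTH


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(trim_read("ACGT", "II#I"))                            # ('AC', 'II')
print(keep_read("A" * 50 + "C" * 10, "I" * 50 + "#" * 10))  # True
```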

MAGs

We reconstructed metagenome-assembled genomes (MAGs) from samples of studies MetS_NL_1, UC_NL, ABXR_NL, div_AU, RCDI_US_Smillie, RCDI_US_Moss, UC_US_Damman, UC_US_Nusbaum, UC_US_Lee and CD_US_Vaughn. We combined several complementary strategies: sample-specific assemblies give high resolution, while coassemblies of multiple samples give deep coverage of low-abundance species. Unless otherwise indicated, all tools described below were run with default parameters.

To generate single-sample MAGs, fecal metagenomes were assembled individually using metaSPAdes v.3.12.0 (ref. 68). Reads were mapped back to contigs using bwa-mem v.0.7.17 (ref. 69), and contigs were binned using metaBAT v.2.12.1 (ref. 70).

Multisample MAGs were built for each cohort separately. Reads were first coassembled using megahit v.1.1.3 (ref. 71) and mapped back to contigs using bwa-mem v.0.7.17. Coassembled contigs were then binned using both CONCOCT v.0.5.0 (ref. 72) and metaBAT v.2.12.1. The resulting coassembled MAG sets were further refined using DAS Tool73 and metaWRAP74.

In total, 47,548 MAGs were reconstructed using these five approaches: single-sample MAGs and multisample coassembled CONCOCT, metaBAT2, DAS Tool and metaWRAP MAGs. In addition, we included 25,037 high-quality reference genomes from the proGenomes database75,76 in downstream analyses.

Genome quality was estimated using CheckM77 and GUNC v.0.1 (ref. 78), and all genomes were taxonomically classified using GTDB-tk79. Open reading frames (ORFs) were predicted using prodigal80 and annotated via the prokka workflow v.1.14.6 (ref. 81). Orthologs to known gene families were detected using eggNOG-mapper v.1 (ref. 82). ARGs were annotated using a workflow that combines information from two databases, CARD v.3.0.0 (via rgi v.4.2.4; ref. 83) and ResFams v.1.2.2 (ref. 84), as described previously76. The ‘specI’ set of 40 near-universal single-copy marker genes was detected in each genome using fetchMG85.

The full set of generated MAGs and contextual data is available via Zenodo (DOI 10.5281/zenodo.5534163 (ref. 86)).

Genome clustering, species metapangenomes and phylogeny

Genomes were clustered into species-level groups in several steps, using an ‘open-reference’ approach.

First, a lenient quality prefilter (CheckM-estimated completeness ≥70%, contamination ≤25%) removed 57.7% of MAGs; additional criteria were applied downstream. The remaining 20,093 MAGs were mapped to the clustered proGenomes v.1 (ref. 75) and mOTUs v.2 (ref. 63) taxonomic marker gene databases using MAPseq v.1.2.3 (ref. 87).

A total of 17,720 MAGs were confidently assigned to a ref-mOTU (specI cluster) or meta-mOTU. Assignment required (1) detection of at least 20% of the screened taxonomic marker genes and (2) a majority of markers assigning to the same mOTU at a conservative MAPseq confidence threshold of ≥0.9.
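The prefilter and marker-based assignment rules above can be written as two small predicates. In this hedged sketch, each MAG's marker hits are assumed to arrive as (mOTU identifier, MAPseq confidence) pairs. "Majority" is read as more than half of the detected markers agreeing at confidence ≥0.9. Both the input format and that reading are assumptions, and the function names are hypothetical.

```python
from collections import Counter

# Thresholds quoted in the text.
MIN_COMPLETENESS = 70.0      # CheckM completeness, %
MAX_CONTAMINATION = 25.0     # CheckM contamination, %
MIN_MARKER_FRACTION = 0.20   # fraction of screened marker genes detected
MIN_MAPSEQ_CONFIDENCE = 0.9  # MAPseq confidence per marker


def passes_prefilter(completeness, contamination):
    """Lenient first-pass quality filter applied before marker mapping."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


def assign_motu(marker_hits, n_screened_markers):
    """Assign a MAG to an mOTU, or return None.

    marker_hits: (motu_id, mapseq_confidence) pairs, one per detected marker.
    Assumption: 'majority' means more than half of the detected markers map
    to the same mOTU at confidence >= MIN_MAPSEQ_CONFIDENCE.
    """
    if n_screened_markers == 0:
        return None
    if len(marker_hits) / n_screened_markers < MIN_MARKER_FRACTION:
        return None  # too few marker genes detected
    votes = Counter(m for m, conf in marker_hits if conf >= MIN_MAPSEQ_CONFIDENCE)
    if not votes:
        return None
    motu, n_votes = votes.most_common(1)[0]
    return motu if n_votes > len(marker_hits) / 2 else None


# Hypothetical example: 6 of 10 markers detected, 5 agree on "m1".
print(assign_motu([("m1", 0.95)] * 5 + [("m2", 0.95)], 10))  # m1
```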

In an independent approach, quality-filtered MAGs and reference genomes were also clustered by average nucleotide identity (ANI), using a modified and scalable reimplementation of the dRep workflow88.

First, pairwise distances were computed with mash v.2.1 (ref. 89). Sequences were preclustered at 90% mash-ANI using single linkage, which ensures that all genome pairs sharing ≥90% mash-ANI end up in the same precluster. Each mash precluster was then resolved into 95% and 99% average-linkage ANI clusters using fastANI v.1.1 (ref. 90).

For each cluster, a representative genome was picked: either the corresponding specI cluster representative from the proGenomes database, or the MAG with the highest dRep score (calculated from estimated completeness and contamination). Genome partitions based on 95% average-linkage ANI clustering and on specI marker gene mappings matched almost perfectly, at an adjusted Rand index of >0.99.

We therefore defined a total of 1,089 species-level clusters (‘species’) in our dataset (Supplementary Table 4). These were based primarily on marker gene mappings to precomputed ref-mOTUs (or specI clusters, n = 295) and meta-mOTUs (n = 528). Genomes that did not map to either database were grouped as 95% average-linkage ANI clusters (n = 233).
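The two-step clustering logic (single-linkage preclustering, then average-linkage refinement within each precluster) can be illustrated with a small pure-Python sketch. In the study, similarities came from mash and fastANI within a dRep reimplementation. Here, a hypothetical lookup table of ANI fractions stands in for both tools, and only the 95% level is shown. All names and values are illustrative assumptions.

```python
def single_linkage_preclusters(genomes, similarity, threshold=0.90):
    """Union-find: any two genomes with similarity >= threshold end up in the
    same precluster (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genomes):
        for b in genomes[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return list(groups.values())


def average_linkage_clusters(members, similarity, threshold=0.95):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest mean pairwise similarity, while that mean is >= threshold."""
    clusters = [[m] for m in members]

    def mean_sim(c1, c2):
        return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, best_i, best_j = max(
            (mean_sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[best_i] += clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    return clusters


# Hypothetical pairwise ANI values (fractions); unlisted pairs default to 0.80.
ANI = {frozenset(p): v for p, v in [(("g1", "g2"), 0.99), (("g1", "g3"), 0.96),
                                    (("g2", "g3"), 0.955), (("g3", "g4"), 0.91)]}


def ani(a, b):
    return 1.0 if a == b else ANI.get(frozenset((a, b)), 0.80)


species = []
for precluster in single_linkage_preclusters(["g1", "g2", "g3", "g4", "g5"], ani):
    species.extend(average_linkage_clusters(precluster, ani))
print(species)  # [['g1', 'g2', 'g3'], ['g4'], ['g5']]
```

In this example, g4 joins the g1–g3 precluster through its single 0.91 link to g3. It is then split off again because its average similarity to that group falls below 0.95.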

Species pangenomes were generated by clustering all genes within each species-level cluster at 95% amino acid identity, using Roary 3.12.0 (ref. 91). Misbinned contigs in MAGs can introduce spurious, putatively contaminant gene clusters. We removed these by requiring that the underlying gene sequences originated (1) from a reference genome in the proGenomes database or (2) from at least two independent MAGs assembled from distinct samples or studies.

To account for incomplete genomes, ‘extended core genes’ were defined as gene clusters present in >80% of genomes in a species-level cluster. For some pangenomes containing many incomplete MAGs, too few gene clusters satisfied this criterion; in those cases, the 50 most prevalent gene clusters were used instead. Representative sequences for each gene cluster were ORFs originating from specI representative genomes (that is, high-quality reference genomes) where available, and otherwise the longest ORF in the cluster.
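The two pangenome filters can be sketched as follows. The input formats are hypothetical, and so is the `min_core_size` cutoff: the text does not state how many gene clusters count as "too few", so 50 is an assumption chosen to match the fallback size.

```python
def supported_clusters(cluster_members):
    """Keep gene clusters backed by a reference genome or by MAGs from >= 2
    distinct samples/studies.

    cluster_members: {cluster_id: [(genome_id, is_reference, source_sample), ...]}
    """
    kept = set()
    for cluster_id, members in cluster_members.items():
        has_reference = any(is_ref for _, is_ref, _ in members)
        mag_sources = {src for _, is_ref, src in members if not is_ref}
        if has_reference or len(mag_sources) >= 2:
            kept.add(cluster_id)
    return kept


def extended_core(prevalence, n_genomes, min_core_size=50, fallback_n=50):
    """prevalence: {cluster_id: number of genomes carrying the cluster}.

    Core = clusters present in > 80% of genomes. If fewer than min_core_size
    clusters qualify (hypothetical reading of 'too few'), fall back to the
    fallback_n most prevalent clusters (ties broken by cluster id).
    """
    core = [c for c, n in prevalence.items() if n / n_genomes > 0.8]
    if len(core) < min_core_size:
        core = sorted(prevalence, key=lambda c: (-prevalence[c], c))[:fallback_n]
    return core
```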

A phylogenetic tree of species-level cluster representatives was inferred from the ‘mOTU’ set of ten near-universal marker genes63. Marker genes were aligned in amino acid sequence space across all species using Muscle v.3.8.31 (ref. 92). The alignments were concatenated and used to build a species tree with FastTree2 (v.2.1.11)93, run with default parameters.

Inference of microbial strain populations

Metagenomic reads for each sample were mapped against gene cluster representative sequences for all species pangenomes using bwa-mem v.0.7.17 (ref. 69). Mapped reads were filtered for matches of ≥45 bp and ≥97% sequence identity. They were then sorted, and multimapping reads were removed, using samtools v.1.7 (ref. 94). Horizontal (‘breadth’) and vertical (‘depth’) coverage of each gene cluster in each sample were calculated using bedtools v.2.27.1 (ref. 95).
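The alignment filter and the two coverage measures can be sketched in pure Python. This is not the samtools/bedtools pipeline used in the study. Alignments are assumed to be half-open, 0-based intervals on a single gene, and identity is approximated as (aligned length − mismatches) / aligned length; both are simplifying assumptions.

```python
def keep_alignment(aligned_length, mismatches, min_length=45, min_identity=0.97):
    """Length/identity filter; identity is approximated here as
    (aligned_length - mismatches) / aligned_length (an assumption)."""
    identity = (aligned_length - mismatches) / aligned_length
    return aligned_length >= min_length and identity >= min_identity


def gene_coverage(gene_length, alignments):
    """Horizontal (breadth, bp covered) and vertical (mean depth) coverage of
    one gene from half-open, 0-based (start, end) alignment intervals."""
    depth = [0] * gene_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, gene_length)):
            depth[pos] += 1
    breadth = sum(1 for d in depth if d > 0)
    return breadth, sum(depth) / gene_length


print(gene_coverage(10, [(0, 5), (3, 8)]))  # (8, 1.0)
```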

A species was considered present in a sample if at least three mOTU taxonomic marker genes were confidently detected. Detection used the mOTUs v.2 profiler for specI clusters and meta-mOTUs, and pangenome-wide read mappings for non-mOTU species-level clusters.

Gene clusters within each pangenome were considered present in a sample if all three of the following held:

1. the species was detectable (see above);
2. horizontal coverage exceeded both 100 bp and 20% of the representative gene’s length;
3. average vertical coverage exceeded 0.5.

A gene cluster was considered confidently absent if it attracted no mappings in a sample where the species was present with high confidence, that is, where the species’ extended core genes (see above) were covered at a median vertical coverage of >1. Using these criteria, strain population-specific gene content profiles were computed for each species in each sample.
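The presence and absence rules above translate directly into three predicates. This is a minimal sketch; the function names and argument layout are hypothetical, but the thresholds are those stated in the text.

```python
from statistics import median


def species_present(n_marker_genes_detected):
    """At least three mOTU marker genes confidently detected."""
    return n_marker_genes_detected >= 3


def gene_cluster_present(species_detected, breadth_bp, gene_length, mean_depth):
    """Presence rule for one gene cluster in one sample."""
    return (species_detected
            and breadth_bp > 100
            and breadth_bp > 0.20 * gene_length
            and mean_depth > 0.5)


def gene_cluster_confidently_absent(n_mapped_reads, extended_core_depths):
    """No mappings while the species' extended core genes are covered at a
    median vertical coverage > 1."""
    return n_mapped_reads == 0 and median(extended_core_depths) > 1
```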

Raw microbial SNVs were called from uniquely mapping reads using metaSNV v.1.0.3 (ref. 96) with permissive parameters (-c 10 -t 2 -p 0.001 -d 1000000). Before differential downstream filtering, candidate SNVs were retained if they were supported by two or more reads in each of two or more samples in which the focal gene cluster was confidently detected (see above). At multiallelic positions, the frequency of each observed allele (A, C, G, T) was normalized by the total read depth across all alleles.
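The candidate-SNV retention rule and the allele frequency normalization can be sketched as follows. The dictionary-based inputs are hypothetical stand-ins for metaSNV output, not its actual file format.

```python
ALLELES = "ACGT"


def allele_frequencies(counts):
    """Normalize per-allele read counts at one position by the total depth
    across A, C, G and T. Returns None if the position is uncovered."""
    total = sum(counts.get(a, 0) for a in ALLELES)
    if total == 0:
        return None
    return {a: counts.get(a, 0) / total for a in ALLELES}


def retain_candidate_snv(support_reads, gene_detected, min_reads=2, min_samples=2):
    """Keep a candidate SNV if >= min_samples samples each have >= min_reads
    supporting reads and the focal gene cluster was confidently detected there.

    support_reads: {sample: reads supporting the variant allele}
    gene_detected: {sample: bool}
    """
    supporting = [s for s, n in support_reads.items()
                  if n >= min_reads and gene_detected.get(s, False)]
    return len(supporting) >= min_samples


print(allele_frequencies({"A": 6, "G": 2}))  # {'A': 0.75, 'C': 0.0, 'G': 0.25, 'T': 0.0}
```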

On this basis, each strain population was represented in each sample by both its specific gene content profile and its SNV profile.

For each species, local strain population diversity (SPD) and allele distances (AD) between strain populations across samples were estimated as follows. SPD was calculated from the inverse Simpson index of the allele frequencies p_A, p_C, p_G and p_T at each variant position i of the extended core genome (n_var variant positions in total). This sum was normalized by the total horizontal coverage cov_hor, that is, the number of covered positions:

$$\mathrm{SPD} = \frac{\sum_{i=1}^{n_{\mathrm{var}}} \left[ \left( p_{\mathrm{A}}^2 + p_{\mathrm{C}}^2 + p_{\mathrm{G}}^2 + p_{\mathrm{T}}^2 \right)^{-1} - 1 \right]}{\mathrm{cov}_{\mathrm{hor}}}$$

Thus defined, SPD can be interpreted as the average effective number of nondominant alleles per covered position in a strain population. It ranges from 0, when only one dominant strain is detected (that is, there are no multiallelic positions), to 3, when all four possible alleles are present at equal proportions at every covered position. Normalizing by the total horizontal coverage of the extended core genome, cov_hor, keeps values comparable between samples even when a species’ coverage in a sample is incomplete.
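The SPD formula can be implemented term by term. In this minimal sketch, each variant position is given as a hypothetical (p_A, p_C, p_G, p_T) tuple summing to 1, and cov_hor is the number of covered positions in the extended core genome.

```python
def strain_population_diversity(variant_freqs, cov_hor):
    """SPD: sum over variant positions of (inverse Simpson index - 1),
    divided by the number of covered positions in the extended core genome.

    variant_freqs: iterable of (pA, pC, pG, pT) tuples, each summing to 1.
    cov_hor: total horizontal coverage (number of covered positions).
    """
    total = 0.0
    for freqs in variant_freqs:
        simpson = sum(p * p for p in freqs)
        total += 1.0 / simpson - 1.0
    return total / cov_hor


print(strain_population_diversity([(0.5, 0.5, 0.0, 0.0)] * 2, 4))  # 0.5
print(strain_population_diversity([(0.25,) * 4] * 10, 10))         # 3.0
```

The second call reproduces the stated upper bound: every covered position is variant, with all four alleles at equal frequency.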

Intraspecific ADs between strain populations across samples were calculated as the average Euclidean distance between observed allele frequencies at variant positions in the species’ extended core genome. At least 20 variant positions with shared coverage between the two samples were required. If a species was not observed in a sample, ADs to that sample were set to 1.
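The allele distance can be sketched in pure Python. The text does not say what is reported when fewer than 20 positions are shared, so returning None in that case is an assumption of this sketch, as are the function name and input layout.

```python
import math


def allele_distance(freqs_a, freqs_b, min_shared=20):
    """Mean Euclidean distance between allele-frequency vectors at variant
    positions covered in both samples.

    freqs_a, freqs_b: {position: (pA, pC, pG, pT)}, or None if the species
    was not observed in that sample (distance set to 1, as in the text).
    Returns None if fewer than min_shared positions are shared (assumption).
    """
    if freqs_a is None or freqs_b is None:
        return 1.0
    shared = freqs_a.keys() & freqs_b.keys()
    if len(shared) < min_shared:
        return None
    return sum(math.dist(freqs_a[p], freqs_b[p]) for p in shared) / len(shared)
```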

Quantification of strain-level outcomes

Following FMT, we quantified three processes in the recipient microbiome for every species: colonization by donor strains, persistence of recipient strains and influx of novel strains (environmental, or previously below the detection limit). Quantification was based on determinant microbial SNVs and gene content profiles, using an approach that extends previous work25,97.

In total, 261 FMT time series (228 allogenic and 33 autologous transfers) were taken into account. Each had a donor baseline (‘D’, in allogenic FMTs), a recipient pre-FMT baseline (‘R’) and at least one recipient post-FMT (‘P’) sample, and each FMT was represented as a D-R-P sample triad. Where available, multiple post-FMT time points were scored independently. Autologous FMTs have no separate donor sample by definition, so recipient pre-FMT samples were used in its place. An overview of possible strain-level FMT outcomes is provided in Fig. 1c,d.

For each D-R-P sample triad, conspecific strain dynamics were calculated if two conditions held. First, the species was observed in all three samples (see above). Second, there were at least 100 informative (determinant) variant positions, each either covered by two or more reads or confidently absent (see below).

Determinant alleles were defined as follows:

- donor-determinant alleles: variants unique to the donor (D) relative to the recipient pre-FMT (R) sample;
- recipient-determinant alleles: variants unique to R relative to D;
- post-FMT determinant alleles: variants unique to P relative to both D and R.

Intraspecific fecal strain populations are often heterogeneous, that is, they consist of more than one strain per species. Multiple observed alleles at the same variant position were therefore taken into account. Differential gene content between strains was also taken into account. If a gene containing a putative variant position was absent from a sample although the species’ extended core genome was detected, the variant was considered ‘confidently absent’ and treated as informative (and potentially determinant).

The fractions of donor and recipient strains post FMT were quantified from the detection of donor- and recipient-determinant variants across all informative positions in the P sample. The fraction of novel strains (environmental, or previously below the detection limit in donor and recipient) was quantified as the fraction of post-FMT determinant variants.

These three readouts (fractions of donor, recipient and novel strains) were combined with cutoffs previously established by Li et al.25. On this basis, FMT outcomes were scored categorically for every species as ‘donor colonization’, ‘recipient persistence’, ‘donor–recipient coexistence’ or ‘influx of novel (previously undetected) strains’ (Supplementary Table 5).
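The determinant-allele logic and the three fractions can be sketched with Python sets. This is an illustrative reconstruction, not the study's code. It assumes that each sample is a hypothetical mapping from position to the set of observed alleles, with confidently absent genes encoded as the pseudo-allele '-'. It also assumes that fractions are normalized by the total number of determinant alleles observed in P. The categorical cutoffs of Li et al. are not reproduced.

```python
def determinant_alleles(donor, recipient, post):
    """Per-position determinant allele sets for one D-R-P triad.

    Each argument maps position -> set of observed alleles. A confidently
    absent gene can be encoded as the pseudo-allele '-' so that gene content
    differences count as informative.
    """
    donor_det, recip_det, post_det = {}, {}, {}
    for pos in donor.keys() & recipient.keys() & post.keys():
        d, r, p = donor[pos], recipient[pos], post[pos]
        if d - r:
            donor_det[pos] = d - r     # unique to D relative to R
        if r - d:
            recip_det[pos] = r - d     # unique to R relative to D
        if p - d - r:
            post_det[pos] = p - d - r  # unique to P relative to D and R
    return donor_det, recip_det, post_det


def strain_fractions(donor, recipient, post, min_informative=100):
    """(donor, recipient, novel) fractions among determinant alleles seen in P.

    Normalizing by the total count of determinant alleles observed in P is an
    assumption made for this sketch. Returns None below min_informative.
    """
    donor_det, recip_det, post_det = determinant_alleles(donor, recipient, post)
    if len(donor_det.keys() | recip_det.keys()) < min_informative:
        return None
    n_donor = sum(len(post[p] & a) for p, a in donor_det.items())
    n_recip = sum(len(post[p] & a) for p, a in recip_det.items())
    n_novel = sum(len(a) for a in post_det.values())
    total = n_donor + n_recip + n_novel
    if total == 0:
        return None
    return n_donor / total, n_recip / total, n_novel / total
```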

In addition to conspecific strain dynamics (that is, cases where a species was present in D, R and P), we also quantified FMT outcomes that involved the gain or loss of entire strain populations. For example, a species present in the recipient at baseline but not post FMT was scored as a ‘species loss’ event. See Fig. 1c and Supplementary Table 5 for a full overview of how the different FMT outcome scenarios were scored.

To assess the accuracy of our approach, we simulated FMT time series by shuffling (1) the donor sample, (2) the recipient pre-FMT sample or (3) both. Randomizations were stratified by subject and by geography. Stratifying by subject accounts for the fact that some donors were used in multiple FMTs and that some recipients received repeated treatments. For each observed D-R-P sample triad, we simulated ten triads for each of these three setups.
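One way to implement this shuffling is sketched below. The stratification rule is an assumption: "stratified by subject and geography" is read as drawing replacement samples from a different subject within the same geography. The triad dictionary keys and the function name are hypothetical.

```python
import random


def simulate_triads(triads, n_per_setup=10, seed=0):
    """Build shuffled D-R-P triads for three null setups.

    Assumptions for this sketch: a replacement sample is drawn from a
    different subject in the same geography; setups without any eligible
    replacement are skipped.
    """
    rng = random.Random(seed)
    donors = [(t["donor"], t["donor_subject"], t["geo"]) for t in triads]
    recipients = [(t["recipient"], t["recipient_subject"], t["geo"]) for t in triads]

    def draw(pool, exclude_subject, geo):
        candidates = [s for s, subj, g in pool if subj != exclude_subject and g == geo]
        return rng.choice(candidates) if candidates else None

    simulated = []
    for t in triads:
        for setup in ("donor", "recipient", "both"):
            for _ in range(n_per_setup):
                d, r = t["donor"], t["recipient"]
                if setup in ("donor", "both"):
                    d = draw(donors, t["donor_subject"], t["geo"])
                if setup in ("recipient", "both"):
                    r = draw(recipients, t["recipient_subject"], t["geo"])
                if d is not None and r is not None:
                    simulated.append({"setup": setup, "donor": d,
                                      "recipient": r, "post": t["post"]})
    return simulated
```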

Outcomes were further summarized across species by calculating a series of strain population-level metrics for each FMT, defined as follows.

Persistence index: average fraction of persistent recipient strains across all species observed post FMT (that is, the fraction of post-FMT strain populations attributable to recipient baseline strains).

Colonization index: average fraction of donor strains across all species observed post FMT.
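Both indices are plain averages over species. In this minimal sketch, the per-species (donor, recipient, novel) fractions are given for every species observed post FMT; the species names are hypothetical.

```python
def fmt_indices(species_fractions):
    """Persistence and colonization indices for one FMT.

    species_fractions: {species: (donor_fraction, recipient_fraction, novel_fraction)}
    for every species observed post FMT.
    """
    n = len(species_fractions)
    persistence = sum(r for _, r, _ in species_fractions.values()) / n
    colonization = sum(d for d, _, _ in species_fractions.values()) / n
    return persistence, colonization


example = {"sp1": (1.0, 0.0, 0.0), "sp2": (0.0, 1.0, 0.0),
           "sp3": (0.5, 0.5, 0.0), "sp4": (0.0, 0.5, 0.5)}
print(fmt_indices(example))  # (0.5, 0.375)
```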

Modeling and prediction of FMT outcomes

We explored a large set of covariates as putative predictors of FMT outcomes, grouped into the following categories (see Supplementary Table 6 for a full list of covariates and their definitions):

1. host clinical and procedural variables (for example, FMT indication, pre-FMT bowel preparation and FMT route);
2. community-level taxonomic diversity (for example, species richness and community composition);
3. community-level metabolic profiles (abundance of specific pathways);
4. abundance profiles of individual species;
5. strain-level outcomes for other species in the system;
6. focal species traits, including strain-level diversity.

We further classified each covariate as either a predictive ex ante variable (that is, knowable before the FMT is performed) or a post hoc variable (that is, pertaining to the post-FMT state, or to the relation between pre- and post-FMT states).

We built two types of model to predict FMT strain-level outcomes from these covariates:

1. FMT-wide models, using summary outcome metrics across all species in a time series (persistence index and colonization index; see above) as response variables;
2. per-species models for the 307 species observed in ≥50 FMTs, using each species’ strain-level outcome in every scored time series as the response variable.

Unless otherwise indicated, the last available time point of each FMT time series was used. Models were built for each covariate category separately, as well as for the combination of all ex ante variables and the combination of all post hoc variables.

The number of covariates greatly exceeded the number of available FMT time series, and several covariates were correlated with each other (Supplementary Fig. 3). FMT outcomes were therefore modeled using LASSO-regularized regression with fivefold cross-validation repeated ten times, as implemented in the R package glmnet (v.4.1.3)98. Regression coefficients were selected at one standard error from the cross-validated minimum lambda value and averaged across validation folds.
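The "one standard error" rule for choosing the penalty can be shown independently of any fitting library. It picks the largest lambda whose mean cross-validated error lies within one standard error of the minimum, analogous to `lambda.1se` in glmnet's `cv.glmnet`. This pure-Python sketch uses a simple unweighted standard error across folds, which may differ in detail from glmnet's computation. The error values are made up for illustration.

```python
import math
import statistics


def select_lambdas(lambdas, fold_errors):
    """Return (lambda_min, lambda_1se) from cross-validation errors.

    lambdas: candidate penalty values.
    fold_errors: for each lambda, the list of per-fold test errors.
    lambda_1se is the largest lambda whose mean error lies within one
    standard error of the minimum mean error.
    """
    means = [statistics.fmean(e) for e in fold_errors]
    ses = [statistics.stdev(e) / math.sqrt(len(e)) for e in fold_errors]
    i_min = min(range(len(lambdas)), key=means.__getitem__)
    cutoff = means[i_min] + ses[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, means) if m <= cutoff)
    return lambdas[i_min], lambda_1se


lambdas = [1.0, 0.5, 0.1, 0.01]
errors = [[2.0, 2.2, 2.1, 1.9, 2.3],
          [1.1, 1.0, 1.15, 1.05, 1.1],
          [1.0, 1.2, 0.9, 1.1, 1.05],
          [1.1, 1.3, 1.2, 1.0, 1.15]]
print(select_lambdas(lambdas, errors))  # (0.1, 0.5)
```

Choosing the larger penalty within one standard error trades a small, statistically indistinguishable loss in fit for a sparser, more stable set of selected covariates.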

Linear LASSO regression was used to model outcomes with continuous response variables. These included FMT-wide outcomes (such as the persistence index) and the per-species fractions of colonizing, persisting and coexisting strains across FMTs. For linear models, the R² of predictions on test sets was averaged across validation folds.

Logistic LASSO regression was additionally used to model binarized per-species FMT outcomes: recipient strain resilience, recipient strain turnover and donor strain takeover. These binary outcomes were derived by further summarizing the outcome categories in Supplementary Table 5. For logistic models, accuracy was assessed as the area under the receiver operating characteristic curve (AUROC), averaged across validation folds.
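AUROC, and its average over validation folds, can be computed without any modeling library. It equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. This is a minimal sketch; the fold structure shown is hypothetical.

```python
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a random positive is scored above a
    random negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_fold_auroc(folds):
    """folds: list of (labels, predicted_scores) for each held-out test set."""
    return statistics.fmean(auroc(y, s) for y, s in folds)


print(auroc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
```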

Statistical analyses

We tested for associations between clinical outcomes and FMT strain-level outcomes, excluding a subset of cohorts for which clinical success was not reported (Supplementary Table 3). Two tests were used: Wilcoxon tests comparing responders with nonresponders, and sequential ANOVA on linear regression models that accounted for additional variables. In each case, P values were then adjusted by Benjamini–Hochberg correction for multiple hypothesis tests. Differences in strain-level outcomes between species, across taxonomic clades and inferred species phenotypes, were tested using ANOVA on linear regression models.
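The Benjamini–Hochberg adjustment is the standard step-up procedure, the same as R's `p.adjust(method = "BH")`. It can be sketched in pure Python: sort the P values, scale each by m/rank, and enforce monotonicity from the largest P value downward.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```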

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
