Tuesday, September 6, 2022
Extensive gut virome variation and its associations with host and environmental factors in a population-level cohort


Sample collection and metagenomic sequencing

Written informed consent was obtained prior to participation in the project. The study protocol for the Japanese 4D (Disease, Drug, Diet, Daily life) microbiome project was approved by the medical ethics committees of the Tokyo Medical University (Approval No: T2019-0119), National Center for Global Health and Medicine (Approval No: 1690), the University of Tokyo (Approval No: 2019185NI), Waseda University (Approval No: 2018-318), and the RIKEN Center for Integrative Medical Sciences (Approval No: H30-7). We conducted a prospective cross-sectional study of 4198 individuals participating in the Japanese 4D microbiome project, which commenced in January 2015 and is ongoing20.

Participants registered in the project were those who visited hospitals in the area for disease diagnosis or a health checkup. Faecal samples were collected from both healthy and diseased participants. The eligibility criteria for participants were as follows: (1) born and raised in Japan; (2) age >15 years; (3) written informed consent provided; and (4) having an endoscopic diagnosis on colonoscopy, either having undergone a colonoscopy within the last 3 years or planning to undergo colonoscopy for colorectal cancer screening, surveillance, or diagnosis of various gastrointestinal symptoms. The exclusion criteria were as follows: (1) suspected acute infectious disease based on clinical findings (e.g. acute enterocolitis, pneumonia, tuberculosis); (2) acute bleeding; (3) hearing loss; (4) unable to understand written documents; (5) unable to write; and (6) limited ability to perform activities of daily living. No compensation was paid to participants.

Participants collected faecal samples at home using a Cary–Blair medium-containing tube60, and the samples were refrigerated for up to 2 days before the hospital visit. Immediately after participants arrived at the hospital, their faecal samples were frozen at −80 °C until DNA extraction. We avoided collecting samples within 1 month of administering bowel preparation for colonoscopy because it has a profound effect on the gut microbiome and metabolome61. Health professionals checked that the amount of stool was sufficient for analysis. Shotgun metagenomic sequencing and quality control were performed for 4241 faecal samples20, of which 43 samples were excluded from further analyses because of the low number of high-quality reads (<5 million reads), as described in detail previously20 (Supplementary Data 1). To explore the viral profiles of virus-like particles (VLPs) and whole metagenomes from the same samples, we collected additional faecal samples from 24 individuals in the same manner as described above.

Metadata collection

Details of metadata collection have been described previously20. Briefly, the participants completed a self-reported questionnaire on body weight, height, alcohol consumption, smoking, dietary habits, physical activity, and Bristol Stool Scale score62. Health professionals checked the entries to correct obvious inaccuracies and obtain any missing data. BMI was categorised into five groups according to the standard World Health Organization (WHO) classification and considering the threshold value of mortality risk63 (0, underweight, <18.5 kg/m²; 1, normal weight [low], 18.5–20.0 kg/m²; 2, normal weight [high], 20.1–24.9 kg/m²; 3, overweight, 25.0–29.9 kg/m²; and 4, obese, ≥30.0 kg/m²). Dietary habits were assessed using a 7-point Likert scale (1, never or rarely; 2, 1–3 times/month; 3, 1–3 times/week; 4, 4–6 times/week; 5, 1 time/day; 6, 2 times/day; and 7, ≥3 times/day). Physical activity was evaluated with the International Physical Activity Questionnaire–Short Form64. Exercise reported as vigorous intensity, moderate intensity, or walking for ≥60 min/week was denoted as 1, and <60 min/week was denoted as 0. The total sitting hours per day and the total metabolic equivalent of tasks65 were divided into four groups based on quartiles for the entire dataset. For the diagnosis of gastrointestinal diseases, an electronic high-resolution video endoscope was used. Comorbidities, or a history of hypertension, dyslipidemia, and any component of the Charlson comorbidity index66, were evaluated. The specific diagnosis of the disease was based on histopathological or cytological examinations or imaging modalities (e.g. computed tomography, magnetic resonance imaging and ultrasound).
For medication, health professionals evaluated entries in the participant’s medication notebook (the Okusuri-techo), made by pharmacists when filling prescriptions20, to ensure that there were no omissions or discrepancies with the self-reported data. Electronic medical records were also checked to identify medications used. Drug use was defined as oral or self-injected administration within the previous month. All medications with pharmaceutical brand names were grouped according to the WHO’s Anatomical Therapeutic Chemical classification system (4th level)67. In total, 232 metadata items were assessed and used in this study.
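The five-group BMI coding described above can be written as a small lookup. This is only an illustrative sketch, not the authors' code: the function name `bmi_category` is hypothetical, and treating the published ranges as contiguous half-open intervals (so that a value such as 20.05 falls in group 2) is an assumption.

```python
def bmi_category(bmi: float) -> int:
    """Map a BMI value (kg/m^2) to the five groups (0-4) listed in the text.

    Assumption: values that fall between the published ranges
    (e.g. 20.05) are assigned to the next higher group.
    """
    if bmi < 18.5:
        return 0  # underweight
    if bmi <= 20.0:
        return 1  # normal weight (low)
    if bmi < 25.0:
        return 2  # normal weight (high)
    if bmi < 30.0:
        return 3  # overweight
    return 4      # obese
```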

Preparation of VLP DNA and sequencing

Frozen faecal samples (30–500 mg) were suspended in 2.5 mL SM buffer with 0.01% gelatine by vortexing and centrifuged at 5000 × g for 10 min at 4 °C to remove debris. The supernatant was filtered through 5.0 μm and 0.45 μm PVDF pore membrane filters (Millex-HP Syringe Filter; Merck Millipore) to remove bacterial cells. An equal volume of 20% polyethylene glycol solution (PEG-6000-2.5 M NaCl) was added to the filtrate, which was stored overnight at 4 °C. The solution was centrifuged at 20,000 × g for 45 min at 4 °C, and the supernatant was discarded to collect VLPs. The VLP pellet was suspended in 1 mL SM buffer with lysozyme (10.0 mg/reaction; Sigma Aldrich) and incubated for 60 min at 37 °C with gentle shaking to degrade unfiltered bacterial cells. The lysate was incubated with 10 U DNase (NIPPON GENE), 5 U TURBO DNase (Thermo Fisher Scientific), 5 U Baseline-ZERO DNase (Epicentre), 25 U Benzonase (Sigma Aldrich), and RNase (25 μg/sample; NIPPON GENE) in DNase buffer (1× concentration) for 1 h at 37 °C with gentle shaking. To inactivate the DNases, EDTA (final concentration 20 mM) was added to the DNase-treated lysate, which was heated for 15 min at 70 °C. Proteinase K (0.5 mg/reaction; Sigma Aldrich) and SDS (final concentration 0.1%) were added to the VLPs and gently mixed at 55 °C for 30 min. An equal volume of phenol/chloroform/isoamyl alcohol (Life Technologies Japan, Ltd) was added to the lysate and gently mixed for 10 min at room temperature (20–25 °C). The lysate was centrifuged at 9000 × g for 10 min at 25 °C, and the aqueous phase was collected. Sodium acetate (final concentration 0.3 M) and an equal volume of isopropanol with Dr. GenTLE Precipitation Carrier (Takara Bio) were added to the DNA solution, and the DNA was pelleted by centrifugation at 12,000 × g for 15 min at 4 °C. The DNA pellet was rinsed with 75% ethanol and dissolved in TE buffer (10 mM Tris-HCl, 10 mM EDTA).
An equal volume of polyethylene glycol solution (20% PEG6000-2.5 M NaCl) was added and kept on ice for at least 10 min, and the DNA was pelleted by centrifugation at 12,000 × g for 10 min at 4 °C. Finally, the DNA was rinsed with 75% ethanol, dried, and dissolved in TE buffer. For NovaSeq shotgun metagenomic sequencing, libraries were constructed from 2.5 ng VLP DNA using a KAPA HyperPrep Kit (KAPA Biosystems) with 12 cycles of amplification. The libraries were subjected to 150-bp paired-end sequencing on a NovaSeq platform.

Whole metagenomic DNA was also prepared from the same faecal samples (10 to 250 mg faeces) with an enzymatic lysis method as described previously68. Libraries were constructed from 100 ng whole metagenomic DNA and sequenced by NovaSeq using the same method as for VLP DNA.

Identification of phages in the metagenomic data

To construct a high-quality double-stranded DNA (dsDNA) phage catalogue with minimal contamination of bacterial chromosome and plasmid sequences, we developed a custom pipeline and applied it to the 4198 whole gut metagenomes as described below. The metagenomic reads of each individual were assembled into contigs using the MEGAHIT assembler (v1.2.9)69. The circularity of the assembled contigs (>10 kb) was assessed using the check_circularity.pl script, included in the sprai assembler package (https://sprai-doc.readthedocs.io/en/latest/index.html), by modifying the threshold for terminal redundancy as follows: >97% identity and >130 bp. Encoded genes in the contigs were predicted by MetaGeneMark (3.38)70. Assembled contigs were defined as phages if they passed all of the following six criteria.

  1. A genome size threshold was applied, and contigs shorter than 10 kb were excluded, as typical dsDNA phages have genomes larger than 10 kb71.

  2. Viral-specific k-mer patterns were checked by DeepVirFinder (v1.0)22. Contigs with p-values >0.05 were excluded from further analysis.

  3. To detect viral hallmark genes (VHGs) and plasmid hallmark genes, we performed a highly sensitive HMM-HMM search against the Pfam database72. First, the encoded genes were aligned to the viral protein database, collected from complete (circular) viral genomes (n = 13,628) in the IMG/VR v2 database30, using JackHMMER. The obtained HMM profiles were searched against the Pfam database using hhblits73 with a >95% probability cut-off. These procedures were performed using the pipeline_for_high_sensitive_domain_search script (https://github.com/yosuken/pipeline_for_high_sensitive_domain_search)74,75. Contigs with plasmid hallmark genes or those without VHGs were excluded. The hallmark genes used in this analysis are summarised in Supplementary Data 3.

  4. The presence of housekeeping marker genes of prokaryotic species was checked by fetchMG (v1.0)76, and ribosomal RNA genes (5S, 16S and 23S) were identified by barrnap (0.9) (https://github.com/tseemann/barrnap). Contigs with the marker genes and ribosomal RNA genes were excluded from further analysis.

  5. The encoded genes of each contig were aligned to the viral protein database and a plasmid protein database constructed from the reference plasmids in RefSeq (n = 16,136, in April 2020) using DIAMOND (v0.9.29.130)77 with the more-sensitive option. The number of genes aligned to each database was compared, and contigs with more genes aligned to the plasmid protein database were excluded from further analysis.

  6. The proportion of provirus regions was assessed by CheckV (v0.7)24, and contigs estimated with <80% of provirus regions were excluded.
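The six criteria act as a conjunction of filters, which can be sketched as a single predicate over per-contig summary values. This is a hedged illustration only: the `ContigSummary` fields and the function name are hypothetical, the upstream tool outputs are assumed to have been parsed already, a contig is assumed to fail criterion 4 if it carries either a marker gene or an rRNA gene, ties in criterion 5 are assumed to pass, and the CheckV criterion is represented as a precomputed flag because its exact computation is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ContigSummary:
    """Hypothetical per-contig summary parsed from upstream tool outputs."""
    length_bp: int                 # contig length
    dvf_p_value: float             # DeepVirFinder p-value
    n_viral_hallmark: int          # viral hallmark genes found
    n_plasmid_hallmark: int        # plasmid hallmark genes found
    has_marker_or_rrna: bool       # fetchMG marker gene or barrnap rRNA hit
    n_viral_hits: int              # genes aligned to the viral protein database
    n_plasmid_hits: int            # genes aligned to the plasmid protein database
    passes_checkv_provirus: bool   # outcome of the CheckV provirus criterion


def passes_phage_criteria(c: ContigSummary) -> bool:
    """Return True only if the contig passes all six criteria."""
    return (
        c.length_bp >= 10_000                    # 1. genome size
        and c.dvf_p_value <= 0.05                # 2. viral k-mer pattern
        and c.n_viral_hallmark > 0               # 3. has VHGs ...
        and c.n_plasmid_hallmark == 0            #    ... and no plasmid hallmarks
        and not c.has_marker_or_rrna             # 4. no prokaryotic markers/rRNA
        and c.n_plasmid_hits <= c.n_viral_hits   # 5. not plasmid-dominated
        and c.passes_checkv_provirus             # 6. CheckV provirus check
    )
```

Keeping each criterion as a separate boolean term makes it easy to report which filter removed a given contig.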

First, we screened complete phage genomes from the circular contigs using these six criteria (Supplementary Fig. 1a). To identify phage genomes that were not complete but were of high or medium quality, we next screened possible phage contigs among the linear contigs. We aligned genes identified in the linear contigs to gene sets obtained from the complete phage genomes identified in this study (n = 1125) and the IMG/VR database (n = 13,628). The alignment was performed using DIAMOND with the more-sensitive option and e-value <1E-5 as a threshold. Contigs were defined as possible phage contigs if >40% of the genes were aligned to genes from a complete phage genome and the size of the contig was >70% and <120% of the complete genome. For these possible phage contigs, the above six criteria were applied, and those that did not pass were excluded. Finally, CheckV was used to screen for additional host bacterial genomes and exclude linear contigs defined as low quality or having >10% contamination.
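The pre-screen for linear contigs combines a gene-sharing threshold with a length-ratio window. A minimal sketch follows, assuming the gene counts and lengths have already been derived from the DIAMOND output; the function and parameter names are hypothetical.

```python
def is_possible_phage_contig(
    n_genes: int,
    n_genes_aligned: int,
    contig_len_bp: int,
    complete_genome_len_bp: int,
) -> bool:
    """Apply the >40% aligned-gene and 70-120% length-ratio thresholds."""
    if n_genes == 0 or complete_genome_len_bp <= 0:
        return False
    aligned_fraction = n_genes_aligned / n_genes
    length_ratio = contig_len_bp / complete_genome_len_bp
    # Both thresholds are strict inequalities, as stated in the text.
    return aligned_fraction > 0.40 and 0.70 < length_ratio < 1.20
```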

To evaluate the performance of this custom pipeline, we applied it to reference phage genomes (n = 2609, as positive data) and plasmid sequences (n = 16,136, as negative data) in RefSeq. The true positive rate was defined as the number of phages detected as phages by the pipeline divided by the number of reference phages. The false positive rate was defined as the number of plasmids detected as phages by the pipeline divided by the number of reference plasmids. DeepVirFinder22, VirSorter (v1.0.3)23, VirSorter2 (2.2.3)25, VIBRANT (v1.2.1)26, Seeker (v1.0.3)27 and ViralVerify (v1.1)28 were also applied to the same datasets with the default parameters, and the performance was compared among them.
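The two benchmark rates are simple ratios over the positive (phage) and negative (plasmid) reference sets. The sketch below only restates those definitions; the function name is hypothetical and the counts used in any call are placeholders, not results from the study.

```python
def benchmark_rates(
    phages_called_phage: int,
    n_reference_phages: int,
    plasmids_called_phage: int,
    n_reference_plasmids: int,
) -> tuple:
    """Return (true positive rate, false positive rate) as defined in the text."""
    tpr = phages_called_phage / n_reference_phages
    fpr = plasmids_called_phage / n_reference_plasmids
    return tpr, fpr
```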

Analysis of phage genomes

Viral operational taxonomic units (vOTUs) were constructed by clustering phage genomes with >95% identity29 using dRep (v2.2.3)78 with the default options. Representative sequences of each vOTU selected by dRep were further clustered with reference sequences in RefSeq, IMG/VR30, the gut virome database (GVD)15, the gut phage database (GPD)9, and the metagenomic gut virus (MGV) database31 with >95% identity and >85% length coverage using the aniclust.py script in the CheckV package to identify common sequences among the databases.

To further construct broader viral clusters (VCs), proportions of protein clusters shared between phages were assessed. First, to define protein clusters, similarity searches of all protein sequences from all the phages identified in this study were performed using DIAMOND with the more-sensitive option (e-value <1E-5). Based on the similarities between proteins, protein clusters were defined by MCL (v14-137)79 with an inflation factor of 2. The proportion of shared protein clusters was calculated for each phage pair, and phages sharing >20% of clusters were grouped as a VC, which corresponds roughly to family- or subfamily-level clusters7,37. Rarefaction curves of the vOTUs and VCs were estimated with the iNEXT function in the iNEXT package (v2.0.20)80. The similarity matrix of the phages based on the proportion of shared protein clusters was further projected by tSNE using the tsne function in the Rtsne package (v0.16).
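The grouping of phages into VCs by shared protein clusters can be illustrated with a small graph-style sketch. Two points here are assumptions rather than details from the text: the shared proportion is computed relative to the smaller of the two cluster sets, and phages are merged transitively (single linkage) whenever a pair exceeds the 20% threshold. All names are hypothetical.

```python
from itertools import combinations


def shared_fraction(a: set, b: set) -> float:
    """Shared protein clusters relative to the smaller set (an assumption)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def group_into_vcs(phage_clusters: dict, threshold: float = 0.20) -> list:
    """Merge phages whose pairwise shared fraction exceeds the threshold."""
    parent = {p: p for p in phage_clusters}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(phage_clusters, 2):
        if shared_fraction(phage_clusters[p], phage_clusters[q]) > threshold:
            parent[find(p)] = find(q)

    groups = {}
    for p in phage_clusters:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

The all-against-all loop is quadratic in the number of phages, so a catalogue of realistic size would need an indexed or sparse formulation.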

Taxonomy annotation of phages was performed with a voting approach described previously16 with minor modifications. First, the protein sequences of each phage were aligned to viral proteins detected from phage genomes in RefSeq (n = 2609, in April 2020) using DIAMOND with the more-sensitive option. Then, the best-hit taxonomy of each protein (family level) was counted, and the most common taxonomy was assigned to the phage if >20% of proteins in the phage were aligned to the same taxonomy.
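The voting rule amounts to a majority count with a minimum-support threshold. In this sketch, `best_hits` is a hypothetical list with one entry per predicted protein (the family of its best hit, or `None` when there is no hit), and the denominator is assumed to be all proteins in the phage, as the wording suggests. Ties are broken by first occurrence, which is an arbitrary choice.

```python
from collections import Counter


def vote_family(best_hits: list, min_fraction: float = 0.20):
    """Return the most common best-hit family if it covers >20% of proteins."""
    n_proteins = len(best_hits)
    counts = Counter(hit for hit in best_hits if hit is not None)
    if n_proteins == 0 or not counts:
        return None
    family, count = counts.most_common(1)[0]
    return family if count / n_proteins > min_fraction else None
```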

Phage lifestyles (i.e. virulent or temperate) were predicted by BACPHLIP40 and by alignments to reference bacterial genomes in RefSeq. Phages were defined as temperate if the BACPHLIP score was >0.8 or the phage genome was aligned to any reference genome with >1000 bp alignment length and >95% identity.
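The temperate call is a logical OR of two conditions. A minimal sketch, assuming the BACPHLIP score and the list of genome alignments have already been parsed; the function name and the `(length, identity)` tuple layout are hypothetical.

```python
def is_temperate(bacphlip_score: float, host_alignments: list) -> bool:
    """Temperate if BACPHLIP score > 0.8 or any qualifying host alignment.

    host_alignments: iterable of (alignment_length_bp, percent_identity).
    """
    aligned_to_host = any(
        length > 1000 and identity > 95.0
        for length, identity in host_alignments
    )
    return bacphlip_score > 0.8 or aligned_to_host
```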

Host prediction

Bacterial and archaeal genomes were downloaded from the RefSeq database (in April 2019). To reduce the redundancy of genomes from closely related strains in the same species (e.g. Escherichia coli), 10 genomes were selected randomly for species with more than 10 genomes, and the other genomes were excluded from the dataset. The reference dataset consisted of 33,215 bacterial and 822 archaeal genomes.
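The per-species down-sampling can be sketched as follows. The fixed random seed is an assumption added for reproducibility and is not mentioned in the text; the function name and input layout are hypothetical.

```python
import random


def subsample_genomes(genomes_by_species: dict, max_per_species: int = 10, seed: int = 0) -> dict:
    """Keep at most `max_per_species` randomly chosen genomes per species."""
    rng = random.Random(seed)  # seeded so that repeated runs select the same genomes
    selected = {}
    for species, genomes in genomes_by_species.items():
        genomes = list(genomes)
        if len(genomes) > max_per_species:
            genomes = rng.sample(genomes, max_per_species)
        selected[species] = genomes
    return selected
```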

Host prediction of the identified phages was performed using CRISPR spacers81. CRISPR spacers were predicted from the reference microbial genomes and assembled contigs (>10,000 bp) from the 4198 metagenomic datasets using PILER-CR (1.06)82. Short (<25 bp) or long (>100 bp) spacers were discarded. In total, 679,323 and 283,619 spacers were identified from the reference microbial genomes and assembled contigs, respectively. Taxonomy information was assigned to the assembled contigs if they were aligned to the microbial reference genomes with >90% identity and >70% length coverage thresholds using MiniMap283. The CRISPR spacers were mapped to the phage genomes using BLASTN with the options for short sequences: -a20 -m9 -e1 -G10 -E2 -q1 -W7 -F F81. CRISPR spacers that were mapped with 100% identity or 1 mismatch/indel with >95% sequence alignment were used for host assignment at the genus level. Assignments of host species were checked manually, and if any of the following non-human intestinal species were assigned, the host was excluded: Dickeya, Anaerobutyricum, Rubellimicrobium, Eisenbergiella, Harryflintia, Leucothrix, Photorhabdus, Spirosoma, Syntrophobotulus, Thermincola, Algoriphagus, Franconibacter, Kandleria, Lawsonibacter, Methylomonas, Provencibacterium, Pseudoruminoccoccus, Rhodanobacter, Romboutsia, Sharpea, Varibaculum and Thioalkalivibrio.
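The spacer length filter and the acceptance rule for spacer-to-phage matches can be expressed as two small predicates. This sketch assumes that the >95% alignment requirement applies to the fraction of the spacer covered by the alignment and that it applies to both perfect and one-mismatch hits; the text does not spell this out. Function and parameter names are hypothetical.

```python
def keep_spacer(spacer_len_bp: int) -> bool:
    """Discard spacers shorter than 25 bp or longer than 100 bp."""
    return 25 <= spacer_len_bp <= 100


def spacer_hit_ok(spacer_len_bp: int, aligned_len_bp: int, n_mismatch_or_indel: int) -> bool:
    """Accept hits with at most one mismatch/indel covering >95% of the spacer."""
    if spacer_len_bp <= 0:
        return False
    coverage = aligned_len_bp / spacer_len_bp
    return n_mismatch_or_indel <= 1 and coverage > 0.95
```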

Quantification of viral abundance and analysis of the virome profile

To quantify the viral abundances in each sample, metagenomic reads were mapped to the gene set of VHGs (Supplementary Data 3) of each representative vOTU using Bowtie2 with a >95% identity threshold, and reads per kilobase per million reads (RPKM) were calculated for each vOTU. Only VHGs were used in the analysis to avoid over-counting of viral reads, which could be caused by spurious mapping of reads from horizontally transferred genes of other phages or bacterial species. The α-diversity (Shannon diversity) of the vOTU-level viral profile was calculated using the diversity function in the vegan package. The β-diversity (Bray–Curtis distance) between individuals was assessed using the vegdist function, and the average distance to other individuals was calculated for each individual. The VC-level viral profile was obtained by summing the RPKM of all vOTUs in each VC.
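For readers who want the arithmetic behind these three quantities, the following pure-Python sketch gives the standard formulas. It is not the vegan implementation used in the study, and whether the RPKM denominator is the total or the mapped read count per sample is an assumption left to the caller. The natural logarithm in the Shannon index matches vegan's default.

```python
import math


def rpkm(read_count: float, gene_length_bp: float, total_reads: float) -> float:
    """Reads per kilobase of target per million reads."""
    return read_count * 1e9 / (gene_length_bp * total_reads)


def shannon(abundances: list) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in abundances if x > 0)


def bray_curtis(u: list, v: list) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    denominator = sum(u) + sum(v)
    if denominator == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator
```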

Phylogenetic analysis of novel VCs

To construct phylogenetic trees for the vOTUs and reference genomes, protein sequences of large terminases, portal proteins, and major capsid proteins (Supplementary Data 3), which are often used to construct phage phylogenetic trees7,9, were extracted from the vOTUs in the 10 most abundant VCs (VC_19, 1, 2, 24, 12, 15, 3, 44, 18, 6), and their homologues were searched for in the reference phage genomes in RefSeq using DIAMOND with the more-sensitive option (e-value <1E-5). The collected protein sequences were aligned by MAFFT (v7.458)84 with the linsi option, and the alignments were trimmed by trimAl (v1.4.rev15)85 with the automated1 option. Phylogenetic trees were constructed by FastTree (2.1.10)86 and visualised with iTOL (v5)87. For each VC, the vOTUs with the highest number of genomes were selected, and their genomic structures were visualised with the circlize package (v0.4.15)88.

Taxonomic and functional analysis of the bacteriome

Taxonomic and functional profiles of the bacteriome were obtained as described previously20. Briefly, bacterial profiles at the species and genus levels were obtained with the single-copy marker gene-based method using mOTUs (v2.1.1)89. Functional profiles at the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) level were obtained by mapping the metagenomic reads to a non-redundant gene set constructed from the 4198 subjects’ metagenomic data20. Functional annotation of the non-redundant genes was performed using eggNOG-mapper90, in which DIAMOND was used for alignment to the eggNOG orthology database (version 4.5)91.

KOs involved in prokaryotic defence mechanisms, such as CRISPR-Cas and restriction-modification (RM) systems (Supplementary Data 8)58, were collected, and their total relative abundance in each system was calculated. Since functional annotation for the abortive infection (Abi) system is not included in the KEGG database, we collected genes annotated as ‘abortive phage infection’ and ‘abortive phage resistance’ in the eggNOG annotation and calculated the total abundance. The 4198 individuals were classified into three groups (high, middle, and low) based on tertiles of the total abundance, and the Shannon diversity of the virome was compared among the three groups by the Wilcoxon rank-sum test.
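The tertile split can be sketched with a rank-based assignment. This is an assumption about the mechanics: splitting by rank yields three groups of near-equal size, whereas cutting at quantile values would keep tied abundances in the same group. The function name is hypothetical.

```python
def tertile_groups(values: list) -> list:
    """Label each value 'low', 'middle' or 'high' by its rank-based tertile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    names = ("low", "middle", "high")
    labels = [None] * n
    for rank, index in enumerate(order):
        # rank * 3 // n maps ranks 0..n-1 onto the three tertiles 0, 1, 2.
        labels[index] = names[rank * 3 // n]
    return labels
```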

Phage-host correlation analysis

To explore the phage-host association in the community, Spearman correlations between the relative abundances of vOTUs and of their predicted microbial hosts at the genus level were evaluated. If a vOTU was predicted to infect more than one genus (i.e. a generalist phage), the correlation was calculated for every predicted host. If a phage-host pair was absent (0 abundance) in a sample, the sample was excluded from the correlation analysis. vOTUs with average relative abundance >0.01% (n = 865) and genera with average relative abundance >0.5% (n = 32) were included in the analysis.

Analysis of VLPs and whole metagenomes from 24 faecal samples

Quality filtering of sequenced reads from the 24 VLPs and whole metagenomes was performed using fastp (version 0.20.1)92 with the default parameters. Reads derived from human (hg38) or phiX genomes were removed by mapping the reads to these genomes using Bowtie2.

To exclude bacterial DNA contamination in the VLP dataset, we performed further filtering. First, the VLP reads were assembled into contigs using MEGAHIT, and each contig was classified as viral or non-viral. Contigs were defined as viral contigs if they were predicted as viruses by DeepVirFinder (P-value <0.05) and did not encode rRNA or marker genes, as checked by barrnap and fetchMG, respectively. Then, VLP reads were mapped to the viral contigs using Bowtie2, and those not mapped to the viral contigs were excluded from the VLP dataset. Viral profiles at the vOTU and VC levels for the de-contaminated VLP and whole metagenomic datasets were obtained with the same method as for the 4198-subject metagenomic dataset described above.

Association analysis between the virome and various host/environmental factors

The association between each vOTU/VC and age/sex was assessed by multivariable regression analysis considering the effects of other covariates, as described before20. Briefly, the relative abundance of each vOTU/VC was log10-transformed, and single linear-regression analysis was performed using the transformed abundance as a response variable and metadata as an explanatory variable. This single linear-regression analysis was performed for age, sex, and other metadata (n = 230, Supplementary Data 2), and metadata significantly associated with the vOTU/VC were determined (FDR < 0.05) by taking into account the total number of single regression analyses (number of vOTU/VCs multiplied by number of metadata). Then, multiple regression analysis was performed including all the metadata that were significant in the single regression analysis as explanatory variables. To exclude confounding factors, stepwise variable selection was performed based on Akaike’s information criterion with the step function. Metadata were defined as significantly associated with the vOTU/VCs if they remained in the model with a P-value <0.05. All regression models were constructed using the glm2 function in the glm2 package (v1.2.1). In total, 390 vOTUs and 112 VCs, whose average relative abundances in the 4198 metagenomic dataset were >0.05% and >0.1%, respectively, were included in the analysis. For visualisation, individuals younger than 20 years (n = 2) and older than 80 years (n = 6) were excluded because of the low numbers of such individuals (Fig. 5a, b).

Stepwise redundancy analysis was performed to evaluate the total variance of the virome and bacteriome (relative abundance data) explained by each metadata category using the ordiR2step function in the vegan package (v2.5.7)93. To investigate the associations between the virome/bacteriome and each single metadata item, permutational analysis of variance was performed using the adonis function in the vegan package based on the Bray–Curtis distance with 10,000 permutations. P-values were corrected for multiple comparisons by the Benjamini–Hochberg method94.
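The Benjamini–Hochberg adjustment used for multiple comparisons can be illustrated in a few lines. This is a generic sketch of the standard step-up procedure, not the R routine used in the study; the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list) -> list:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

An association would then be called significant when its adjusted value falls below the chosen FDR threshold (0.05 in the text).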

Statistics

All statistical analyses were performed using R (v3.5.0) with two-sided tests and the Benjamini–Hochberg method for multiple comparisons unless otherwise stated. Of the 4241 metagenomic samples sequenced, 43 samples were excluded because they had fewer reads (<5 million) than the others. No statistical method was used to predetermine sample size.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
