Intensity of sample processing methods impacts wastewater SARS-CoV-2 whole genome amplicon sequencing outcomes

https://doi.org/10.1016/j.scitotenv.2023.162572

Highlights

  • A combination of three processing/extraction methods and two sequencing methods was used.

  • A random forest-based algorithm was used to identify important features affecting sequencing outcomes.

  • Sample processing method is the main technical factor affecting sequencing outcomes.

  • More intensive processing methods could lead to lower genome recovery because of greater RNA fragmentation.
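The random forest highlight can be illustrated with a minimal, standard-library-only sketch of how such a model ranks factors by importance. It is not the authors' implementation: the trees are deliberately limited to depth 1 (stumps), importance is measured by permuting one feature at a time on the training data, and the function names and toy data are hypothetical. In a real analysis, the columns would encode factors such as processing method, library kit, and sample type, and the target would be breadth of genome coverage.

```python
import random
from statistics import fmean


def fit_stump(X, y, features):
    """Depth-1 regression tree: best single split among the candidate features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # threshold halfway between adjacent observed values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            ml, mr = fmean(left), fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    if best is None:  # every candidate feature was constant in this bootstrap sample
        m = fmean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]  # (feature, threshold, left mean, right mean)


def predict_stump(stump, row):
    f, t, ml, mr = stump
    return ml if row[f] <= t else mr


def fit_forest(X, y, n_trees=30, max_features=2, seed=0):
    """Bagged stumps, each fit on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        feats = rng.sample(range(p), max_features)  # random subset of candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    return fmean([predict_stump(s, row) for s in forest])


def mse(forest, X, y):
    return fmean([(predict(forest, xi) - yi) ** 2 for xi, yi in zip(X, y)])


def permutation_importance(forest, X, y, seed=0):
    """Increase in mean squared error when one feature column is shuffled."""
    rng = random.Random(seed)
    base = mse(forest, X, y)
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        rng.shuffle(col)
        X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
        scores.append(mse(forest, X_perm, y) - base)
    return scores


# Toy data (hypothetical): y depends on column 0 only; columns 1 and 2 are noise.
rng = random.Random(42)
X = [[rng.choice([0, 1, 2]), rng.choice([0, 1]), rng.random()] for _ in range(120)]
y = [0.3 * row[0] + rng.gauss(0, 0.05) for row in X]

forest = fit_forest(X, y)
importances = permutation_importance(forest, X, y)
print([round(s, 4) for s in importances])
```

The informative column should receive by far the largest score. Where scikit-learn is available, `RandomForestRegressor` combined with `sklearn.inspection.permutation_importance` would be the more usual choice; the version above only makes the mechanics visible.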

Abstract

Wastewater SARS-CoV-2 surveillance has been deployed since the beginning of the COVID-19 pandemic to monitor the dynamics of virus burden in local communities. Genomic surveillance of SARS-CoV-2 in wastewater, particularly whole genome sequencing for variant tracking and identification, remains challenging because of low target concentrations, a complex microbial and chemical background, and the lack of robust experimental procedures for nucleic acid recovery. These sample limitations are inherent to wastewater and are thus unavoidable. Here, we use a statistical approach that couples correlation analyses with a random forest-based machine learning algorithm to evaluate factors potentially associated with wastewater SARS-CoV-2 whole genome amplicon sequencing outcomes, with a specific focus on the breadth of genome coverage. We collected 182 composite and grab wastewater samples from the Chicago area between November 2020 and October 2021. Samples were processed using methods reflecting different homogenization intensities (HA + Zymo beads, HA + glass beads, and Nanotrap) and were sequenced using one of two library preparation kits (the Illumina COVIDseq kit or the QIAseq DIRECT kit). The technical factors evaluated with the statistical and machine learning approaches included sample type, certain intrinsic sample features, and processing and sequencing methods. The results suggested that the sample processing method could be a predominant factor affecting sequencing outcomes, whereas the library preparation kit was a minor factor. A synthetic SARS-CoV-2 RNA spike-in experiment, performed to validate the effect of processing methods, suggested that processing intensity could lead to different RNA fragmentation patterns, which could also explain the observed inconsistency between qPCR quantification and sequencing outcomes.
Overall, extra attention should be paid to wastewater sample processing (i.e., concentration and homogenization) to obtain sufficient, good-quality SARS-CoV-2 RNA for downstream sequencing.
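Breadth of genome coverage, the main outcome above, is commonly computed as the fraction of reference positions whose read depth meets a chosen threshold. The sketch below is illustrative only: it assumes per-position depths in the three-column, tab-separated text format written by `samtools depth -a` (reference, position, depth), a hypothetical 10× threshold, and the 29,903-nt Wuhan-Hu-1 reference; the study's exact definition and threshold are not stated in this abstract, and the function name and toy data are hypothetical.

```python
# Length of the Wuhan-Hu-1 reference genome (NC_045512.2 / MN908947.3).
REFERENCE_LENGTH = 29903


def breadth_of_coverage(depth_lines, min_depth=10, genome_length=REFERENCE_LENGTH):
    """Fraction of reference positions with read depth >= min_depth.

    depth_lines: iterable of tab-separated "reference<TAB>position<TAB>depth"
    lines, e.g. the output of `samtools depth -a` for one sample.
    min_depth=10 is an assumed threshold, not the study's.
    """
    covered = set()
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        _ref, pos, depth = line.rstrip("\n").split("\t")[:3]
        if int(depth) >= min_depth:
            covered.add(int(pos))
    return len(covered) / genome_length


# Hypothetical sample: 20x depth over the first 20,000 positions, zero elsewhere.
toy = [f"MN908947.3\t{p}\t{20 if p <= 20000 else 0}" for p in range(1, REFERENCE_LENGTH + 1)]
print(round(breadth_of_coverage(toy), 4))      # default (assumed) 10x threshold
print(breadth_of_coverage(toy, min_depth=30))  # 0.0: no position reaches 30x
```

Dividing by the full reference length, rather than by the number of lines read, keeps positions missing from the depth file counted as uncovered.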

Keywords

Wastewater SARS-CoV-2
Amplicon sequencing
Sample processing methods
RNA fragmentation
Illumina COVIDseq
QIAseq DIRECT

Data availability

Data has been uploaded in the "Attach file" step.

1 Current address: Department of Microbiology and Immunology, Loyola University Chicago, Maywood, IL, USA.
