Method Article
Revised

Methods for sequencing the pandemic: benefits of rapid or high-throughput processing

[version 2; peer review: 2 approved]
Previously titled: Sequencing the pandemic: rapid and high-throughput processing and analysis of COVID-19 clinical samples for 21st century public health
PUBLISHED 21 Feb 2022

This article is included in the Cell & Molecular Biology gateway.

This article is included in the Emerging Diseases and Outbreaks gateway.

This article is included in the Bioinformatics gateway.

This article is included in the Health Services gateway.

This article is included in the Coronavirus collection.

Abstract

Genomic epidemiology has proven successful for real-time and retrospective monitoring of small and large-scale outbreaks. Here, we report two genomic sequencing and analysis strategies for rapid-turnaround or high-throughput processing of metagenomic samples. The rapid-turnaround method was designed to provide a quick phylogenetic snapshot of samples at the heart of active outbreaks, and has a total turnaround time of <48 hours from raw sample to analyzed data. The high-throughput method, first reported here for SARS-CoV-2, was designed for semi-retrospective data analysis, and is both cost-effective and highly scalable. Though these methods were developed and utilized for the SARS-CoV-2 pandemic response in Arizona, U.S., we envision their use for infectious disease epidemiology in the 21st century.

Keywords

Genomic epidemiology, SARS-CoV-2, targeted genomics, sequencing methods, phylogenetics

Revised Amendments from Version 1

In this version we have made changes to the text to reflect suggestions and omissions that were pointed out by reviewers. Specific changes include the clarification of the novel contributions of this work, as well as its scope, and statistical analysis of sequencing metrics reported. Additionally, we have highlighted improvements to the protocol that have been made since the time of first submission. In this revision, we have amended the title, and ALL tables (revised versions are included at the end of the attached revised document). Figures have not been changed, but have been re-uploaded for formatting purposes. We have also updated the supplementary data for this manuscript, and a new DOI has been added to the existing reference.

See the authors' detailed response to the review by Bronwyn L. MacInnis
See the authors' detailed response to the review by Keith Crandall

Introduction

With the advent of rapid and inexpensive next-generation sequencing, genomic epidemiology has proven to be an invaluable resource for the elucidation of disease outbreaks. Extending beyond traditional shoe-leather approaches, rapid-turnaround sequencing methods have allowed researchers to quickly gain insight into the genetic nature of pathogens at the heart of active outbreaks1–6. By monitoring pathogen evolution over the course of an outbreak, large-scale genomics have the potential to allow for transmission mapping for infection control and prevention7,8, to distinguish independent cases from those that are part of active clusters9, and to identify epidemiological patterns in time and space on both local and global scales7,10–12.

The most recent example of this has been the collaborative genomic efforts mounted in response to the SARS-CoV-2 outbreak. Not long after the initial cases were identified, whole-genome sequencing quickly established the etiologic agent as a novel coronavirus13. Following the rapid spread of SARS-CoV-2, current next-generation technology and analysis pipelines allowed viral sequencing to take place on an unprecedented global scale, with collaborative consortia forming worldwide for the specific purpose of tracking and monitoring the pandemic14–16.

In most instances, including with the SARS-CoV-2 outbreak, genomic epidemiology has provided a retrospective view of pathogen spread and evolution well after the information is useful in the public health response to the outbreak1,2. Genomic epidemiology should guide contemporaneous outbreak control measures, but can only do so if the data are generated and interpreted in real-time quickly enough to inform a response17. As technology has advanced, the potential exists to move beyond providing a retrospective genomic snapshot of an outbreak months after its occurrence, to providing actionable data in real-time for current outbreaks within hours after cases are identified4. Real-time genomic tracking has already proven valuable in a number of instances, including the recent West-African Ebola outbreak3,17.

This is not to discredit the value of large-scale, retrospective studies. While rapid-turnaround genomics may prove essential for outbreak containment, retrospective studies will continue to be necessary to track pathogen evolution, gauge success of public health interventions, and to evaluate pathogen/host movement and behavior. With the recent SARS-CoV-2 pandemic, retrospective studies have so far proven successful in identifying the timing and sources of outbreaks on a local18 and global scale5,19, in evaluating the effectiveness of early interventions5, and in identifying super-spreader events20. Thus, in addition to real-time monitoring, high-throughput, cost-effective sequencing and analysis are needed to gain a better understanding of pandemics.

Here, we report two Illumina-based sequencing and analysis strategies for either real-time monitoring or large-scale, high-throughput targeted genomic sequencing of complex samples. The plexWell method is a novel means of library construction that can facilitate high-throughput sample processing. Though these strategies were developed for use with the current SARS-CoV-2 pandemic, we envision their potential use in any situation in which a genomic response is needed.

Methods

Sample information

Remnant nasopharyngeal swab specimens or extracted RNA were obtained from, or received by, the TGen North Laboratory in Flagstaff, AZ. All samples had previously tested positive for SARS-CoV-2 by RT-PCR.

RNA extraction

RNA was extracted using the MagMax Viral Pathogen II kit and a Kingfisher Flex automated liquid handler (ThermoFisher Scientific), with a DNase treatment incorporated to maximize viral RNA recovery from low viral-burden samples, defined as having an RT-PCR cycle-threshold (Ct) value above 33.0. These methods allowed for the rapid, scalable processing of a small number to hundreds of samples at once with minimal personnel, and prevented RNA extraction from becoming a bottleneck to overall throughput (https://doi.org/10.17504/protocols.io.bnkhmct6). Remnant RNA was obtained from the TGen North Laboratory, and had been extracted following their FDA-authorized protocol for the diagnosis of COVID-19.

Targeted amplification

SARS-CoV-2 RNA was amplified for both of the sequencing methods described below following the nCoV-2019 sequencing protocol V.121 and using the ARTIC v3 primer set22. Adapters were added to the resulting amplicons by one of the two means described below. The full sample processing workflow, from raw RNA to deliverable data, is illustrated in Figure 1 for reference.


Figure 1. General workflow for the processing of complex samples.

Basic workflow for processing of either high-throughput or rapid-response samples.

Rapid-turnaround adapter addition and sequencing

For samples requiring immediate attention, e.g. those from patients potentially involved in an active outbreak, adapters were added with the DNA Prep kit (Illumina) as previously described23. Amplicons were sequenced on the MiSeq platform using a Nano 500-cycle kit with v2 chemistry (Illumina) (https://doi.org/10.17504/protocols.io.bnnbmdan). The batch size used for this method was 2–8 samples, as that was the typical sample volume requiring rapid turnaround in our facility. However, the maximum number of clusters obtainable from a Nano 500-cycle kit is 1 million; with a theoretical minimum target of ~42,000 reads per sample, this method would therefore be suitable for up to 24 samples per kit. For samples with low Ct values (less than 33), this was demonstrated to be sufficient to obtain a complete genome, based on results from the high-throughput sequencing method, which was optimized first. For Ct values less than 33, read counts as low as ~19,000 per sample were sufficient to provide 99% coverage breadth (Supplementary data).
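
The per-kit capacity quoted above follows directly from dividing the kit's cluster output by the minimum per-sample read target. A back-of-envelope sketch, using only the figures stated in the text:

```python
# Rough per-kit capacity for the rapid method (MiSeq Nano 500-cycle v2 kit).
NANO_CLUSTERS = 1_000_000   # maximum clusters per Nano v2 kit (see text)
MIN_READS = 42_000          # theoretical minimum read target per sample (see text)

print(NANO_CLUSTERS / MIN_READS)  # ~23.8, i.e. up to ~24 samples per kit
```

Because the ~42,000-read target is itself approximate, the practical ceiling is stated as 24 samples per kit.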

High-throughput adapter addition and sequencing

In instances where retrospective data were needed from large numbers of samples, adapters were added with the plexWell 384 kit (Seqwell). Samples were multiplexed in batches of 1,152 and sequenced on a NextSeq 550 with v2 chemistry and 150 × 150 bp reads (Illumina). When batch sizes were not large enough to fill a NextSeq run, samples were sequenced on a MiSeq with v3 chemistry (Illumina) (https://doi.org/10.17504/protocols.io.bnkimcue). At the time of submission of this manuscript, a maximum of 1,152 barcode combinations were available (this has since increased to 2,304); thus, the targeted read depth was an order of magnitude higher than for rapid sequencing, at nearly 348,000 reads per sample. The increased number of reads was necessary to account for high-Ct (i.e., low viral load) samples, which were not separated into distinct sequencing runs and which were pooled equally by the plexWell system.
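
The ~348,000-read target is consistent with spreading a full high-output run across the available barcode combinations. The ~400 million passing-filter reads per NextSeq 550 high-output run used below is an assumed round figure for illustration, not a number from the text:

```python
# Read budget for the high-throughput method (assumed run yield, see lead-in).
NEXTSEQ_READS = 400_000_000  # assumption: ~400M reads per high-output run
BATCH_SIZE = 1_152           # barcode combinations available at the time

print(NEXTSEQ_READS // BATCH_SIZE)  # 347222, close to the ~348,000-read target
```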

Data processing and analysis

Read data were analyzed using the statistics package “RStudio” (version 4.1.1)24. The “stats” package was used to perform Kruskal-Wallis tests and to calculate standard deviations25. Confidence intervals based on z-scores were calculated using the “BSDA” package26. Read data are reported in terms of mean, standard deviation (SD), and 95% confidence interval (95%CI).
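
The z-based intervals reported throughout can be reproduced in a few lines. The helper below is a sketch of the calculation (not the BSDA code itself), checked against the "% aligned" row for Ct 30–33 in Table 2:

```python
import math

def z_confidence_interval(mean, sd, n, z=1.96):
    """z-based 95% CI for a mean (normal approximation), mirroring the
    z-score intervals reported in the tables."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Table 2, Ct 30-33 "% aligned": mean 88.58, SD 18.99, n = 110
low, high = z_confidence_interval(88.58, 18.99, 110)
print(round(low, 2), round(high, 2))  # 85.03 92.13
```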

Virus genome consensus sequences were built using the Amplicon Sequencing Analysis Pipeline (ASAP)27,28. First, reads were adapter-trimmed using bbduk29, and mapped to the Wuhan-Hu-1 genome30 with bwa mem31 using local alignment with soft-clipping. Bam alignment files were then processed to generate the consensus sequence and statistics on the quality of the assembly by the following: 1) Individual basecalls with a quality score below 20 were discarded. 2) Remaining basecalls at each position were tallied. 3) If coverage ≥10X and ≥80% of the read basecalls agreed, a consensus basecall was made. 4) If either of these parameters were not met, an ‘N’ consensus call was made. 5) Deletions within reads, as called during the alignment, were left out of the assembly, while gaps in coverage (usually the result of a missing amplicon) were denoted by lowercase ‘n’s. Only consensus genomes covering at least 90% of the reference genome with an average depth of ≥30X were used in subsequent analyses. Consensus genomes generated using these methods include those from Ladner et al.18 and are similar to those in Peng et al.32 and Stoddard et al.33. Statistics reported for each sample included: total reads, number of reads aligned to reference, percent of reads aligned to reference, coverage breadth, average depth, and any SNPs and INDELs found in ≥10% of the reads at that position.
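
The per-position consensus rules above (discard sub-Q20 basecalls, require ≥10X coverage and ≥80% agreement, otherwise call 'N') can be sketched in a few lines. This is an illustrative re-implementation of the stated rules, not ASAP's actual code:

```python
from collections import Counter

MIN_DEPTH = 10        # consensus requires coverage of at least 10X
MIN_AGREEMENT = 0.80  # and at least 80% of basecalls in agreement
MIN_QUALITY = 20      # individual basecalls below Q20 are discarded first

def consensus_base(calls):
    """calls: list of (base, quality) tuples at one reference position.
    Returns the consensus base, or 'N' when the thresholds are not met."""
    kept = [base for base, quality in calls if quality >= MIN_QUALITY]
    if len(kept) < MIN_DEPTH:
        return "N"
    base, count = Counter(kept).most_common(1)[0]
    return base if count / len(kept) >= MIN_AGREEMENT else "N"

# 12 calls pass the quality filter; 11/12 (92%) agree, so 'A' is called.
print(consensus_base([("A", 30)] * 11 + [("G", 35)]))  # A
```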

Phylogenetic inference

Rapid phylogenetic inference was used when two initial genomic epidemiological questions needed immediate answers: i) are the samples in this set closely related, and ii) to which samples in the public database are these most closely related? To answer the former, SNP comparisons were made among a sample set of interest and output in text format directly from the ASAP results, forgoing the computational time of generating phylogenetic trees.

To answer the latter, a database of all SNPs in the GISAID global collection15 was generated and periodically updated by downloading all genomes, filtering them for quality and completeness, and aligning them to the Wuhan-Hu-1 reference to identify variants, as described34. The list of all variants and the metadata for each sample were then put into a relational database for fast querying. This database consisted of one table of SNPs, as generated by Chan et al.34, with columns for GISAID ID, position, reference base, and variant base, and a second table of metadata containing the exact data fields as downloaded from GISAID; the two tables were linked by the GISAID ID number. The GISAID database cannot be shared via secondary publications as per the database access agreement; however, it is available for public download from GISAID’s website15. The list of SNPs common to samples in a given set was then compared to this global SNP database: first, the number of global genomes containing each individual SNP was queried and the SNPs sorted from most common to least common; then, the number of global genomes containing combinations of SNPs was queried by adding each successive SNP in turn, to determine how many GISAID genomes shared all or a subset of the SNPs from the set. A stacked Venn diagram illustrating globally-shared SNPs was constructed using the statistical package “R” (version 4.0.2)25 with the “ggplot2”35 and “ggforce”36 R packages. This information was then relayed to public health partners, who used these preliminary analyses to guide contact tracing efforts and track the spread of the virus in real time.
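
The successive-combination queries can be sketched against a toy version of the two-table schema described above. Table and column names here are hypothetical, and the three inline records are fabricated examples; the real database is built from GISAID downloads, which cannot be redistributed:

```python
import sqlite3

# In-memory stand-in for the SNP/metadata relational database described above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE snps (gisaid_id TEXT, position INTEGER, ref_base TEXT, var_base TEXT);
CREATE TABLE metadata (gisaid_id TEXT PRIMARY KEY, collection_date TEXT, location TEXT);
""")
db.executemany("INSERT INTO snps VALUES (?,?,?,?)", [
    ("EPI_1", 241, "C", "T"), ("EPI_1", 23403, "A", "G"),
    ("EPI_2", 241, "C", "T"),
])

def genomes_with_all(snp_list):
    """Count genomes carrying every (position, variant base) SNP in the list."""
    query = " INTERSECT ".join(
        "SELECT gisaid_id FROM snps WHERE position = ? AND var_base = ?"
        for _ in snp_list)
    params = [value for snp in snp_list for value in snp]
    return len(db.execute(query, params).fetchall())

print(genomes_with_all([(241, "T")]))                # 2 genomes share C241T
print(genomes_with_all([(241, "T"), (23403, "G")]))  # 1 genome shares both
```

Adding SNPs one at a time, most common first, and re-running the intersection mirrors the successive-combination queries used to find how many global genomes share all or a subset of a cluster's SNPs.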

For subsequent, more sophisticated phylogenetic analyses, phylogenetic trees were constructed in NextStrain16, with genomes from GISAID subsampled by uploading our genomes of interest to the UCSC UShER (UShER: Ultrafast Sample placement on Existing tRee) tool37, identifying relevant genomes (those most related to our genomes of interest), and further reducing that set of genomes when necessary with genome-sampler18,38.

Consent

Samples used were remnant, de-identified samples from a clinical diagnostics lab. No ethical approval was required for their inclusion in this study.

Results

High-throughput workflow

For the plexWell workflow, turnaround time from raw sample to sequence data was approximately 72 hours, plus approximately 10 hours to phylogenetic inference, for 1,152 samples (Table 1). Cost per sample was ~$60, which is comparable to similar sample prep methods (Table 1)39. A complete breakdown of run statistics by Ct value can be found in Table 2. Success rates, defined as the percentage of samples with greater than 90% genome coverage, were similar to other previously described methods39,40. Of a random subset of 897 samples analyzed, approximately 83% of samples with a Ct <33 yielded genomes with ≥90% breadth of coverage of the reference genome, i.e., a complete genome, with an average breadth of coverage of 91% (SD 20.10, 95%CI 89.79-92.81) (Figure 2, Table 2). Failure to obtain a complete genome when Ct was <33 was attributed to sample preparation error, sample degradation, or poor sequencing run metrics. Beyond a Ct of 33, the success rate dropped drastically. Between Ct values of 33 and 35, the complete genome success rate was ~41% (Table 2), with an average breadth of coverage of 78% (SD 23.00, 95%CI 72.50-83.20). Above a Ct of 35, ~18% of samples yielded a complete genome, and breadth of coverage dropped below 57% (SD 31.65, 95%CI 48.70-59.08) (Table 2).

Table 1. Workflow metrics for two separate sample processing systems.

       * Listed is the number of samples that can be processed within the specified turnaround time, on a single sequencing run. This is not necessarily the upper limit of either processing system.

       **A detailed breakdown of reagent costs is available in supplementary materials.

       *** Time from raw sample to analyzed data.

| Sequencing method | Samples able to be processed* | Cost per sample** | Personnel needed | Turnaround time*** |
| --- | --- | --- | --- | --- |
| Illumina DNA Prep | 24 | $98.13 | 1 | 48 hrs |
| Seqwell plexWell | 1,152 | $60.30 | 4–6 | 82 hrs |

Table 2. Average sequencing metrics for various RT-PCR cycle threshold values of samples sequenced using the plexWell 384 system, and either an Illumina MiSeq or an Illumina NextSeq550.

       *Ct value reported is that for the nucleocapsid-2 gene.

       **Uniform depth of coverage was targeted for all samples.

| Average Cycle Threshold Value (n)* | Average % aligned | Average % coverage | Average depth** | % samples >90% coverage |
| --- | --- | --- | --- | --- |
| <20 (194) | 96.64 (SD 12.27, 95%CI 94.91-98.37) | 92.36 (SD 21.34, 95%CI 89.35-95.39) | 3371.58 (SD 2041.06, 95%CI 3084.37-3658.80) | 87.63 |
| 20–25 (210) | 96.91 (SD 11.34, 95%CI 95.38-98.44) | 92.61 (SD 17.60, 95%CI 90.22-94.99) | 2848.11 (SD 1523.21, 95%CI 2642.09-3054.12) | 85.24 |
| 25–30 (169) | 95.67 (SD 12.81, 95%CI 93.74-97.60) | 91.11 (SD 20.12, 95%CI 88.08-94.14) | 2733.37 (SD 1517.23, 95%CI 2504.62-2962.11) | 84.62 |
| 30–33 (110) | 88.58 (SD 18.99, 95%CI 85.03-92.13) | 87.18 (SD 21.89, 95%CI 83.09-91.27) | 2113.09 (SD 1389.81, 95%CI 1853.37-2372.81) | 74.55 |
| 33–35 (71) | 74.68 (SD 25.36, 95%CI 68.78-80.58) | 77.85 (SD 23.00, 95%CI 72.50-83.20) | 1221.05 (SD 1102.35, 95%CI 964.64-1477.46) | 40.85 |
| 35–37 (52) | 60.20 (SD 32.55, 95%CI 51.36-69.05) | 56.91 (SD 30.39, 95%CI 48.65-65.17) | 716.43 (SD 1140.89, 95%CI 406.33-1026.42) | 17.31 |
| 37+ (91) | 38.28 (SD 34.42, 95%CI 31.21-45.35) | 52.17 (SD 32.38, 95%CI 45.54-58.80) | 576.09 (SD 1286.98, 95%CI 311.67-840.51) | 18.68 |

Figure 2. Sequencing outcomes of SARS-CoV-2-positive samples processed using Seqwell’s plexWell method.

A. Nucleocapsid-2 Ct value vs percent genome coverage and B. percent total reads mapped to SARS-CoV-2 for 897 SARS-CoV-2-positive samples sequenced using Seqwell’s plexWell 384 system.

Uniform depth of genome coverage across samples was targeted when pooling samples for sequencing, but depth generally decreased as Ct increased. Average depth of coverage was approximately 2850X up to a Ct of 33 (SD 1712.91, 95%CI 2721.57-2978.49). Past a Ct of 33, coverage dropped off sharply, with an average depth of 1221X between Ct values of 33-35 (SD 1102.35, 95%CI 964.64-1477.46). Above a Ct of 35, depth dropped to 627X (SD 1233.64, 95%CI 424.93-829.32) (Table 2).

The number of reads targeted per sample for the high-throughput method was approximately 348,000, given the limited number of index combinations (i.e., samples per run) available at the time. In practice, reads per sample varied considerably. The average read count for samples using the plexWell method was 411,375, with an average of 393,925 reads aligning to the Wuhan-Hu-1 reference strain. Below a Ct of 33, the average number of aligned reads was 475,310 (SD 260292, 95%CI 455789-494831) per sample. Between Cts of 33 and 35, the average was 204,234 (SD 175947, 95%CI 163308-245160) per sample. Above 35, read counts dropped off sharply, with a mean of 99,388 aligned reads (SD 188499, 95%CI 68493-130283) per sample. Note that all samples were put through an equimolar pooling step by the plexWell system; these results suggest that this pooling is not entirely effective when Cts vary considerably within a sample set. This was confirmed by a Kruskal-Wallis test on the total (unaligned) read counts in the above groups (p <0.001).
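
The group comparison above can be illustrated in miniature. The sketch below computes the Kruskal-Wallis H statistic by hand (no tie correction) on fabricated toy read counts for three Ct bins; the actual analysis used R's stats package on the full dataset:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction), a minimal sketch.
    Observations are ranked jointly; H grows as group rank sums diverge."""
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):  # joint ranks start at 1
        rank_sums[gi] += rank
    n = len(pooled)
    return (12.0 / (n * (n + 1))
            * sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
            - 3 * (n + 1))

# Toy aligned-read counts (in thousands) for three Ct bins, illustrative only:
low_ct, mid_ct, high_ct = [475, 460, 490], [204, 190, 210], [99, 80, 110]
h = kruskal_wallis_h(low_ct, mid_ct, high_ct)
print(round(h, 2))  # 7.2 for three fully separated groups of three
```

For three fully separated groups, H = 7.2 exceeds the chi-squared critical value of 5.99 (df = 2, alpha = 0.05), consistent with the groups differing in read counts.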

The percentage of total reads aligning to the SARS-CoV-2 reference genome decreased as the Ct value increased (Table 2, Figure 2). Up to a Ct value of 33, most (~95%) reads mapped to SARS-CoV-2 (SD 13.74, 95%CI 94.15-96.22). From Ct values of 33 to 35, this noticeably decreased, with a mean of 75% alignment (SD 25.36, 95%CI 68.78-80.58). Beyond a Ct value of 35, the average percentage of reads aligning dropped to 46% (SD 35.26, 95%CI 40.47-52.03).

Rapid-Turnaround Workflow

Turnaround time for the Illumina DNA prep method was significantly faster than the high-throughput plexWell system. It took less than 48 hours to go from raw sample to deliverable, analyzed data (Table 1), and this time could potentially be reduced further by reducing cycle numbers per sequencing run. Illumina estimates a cycle time of ~2.5 minutes for the kit type described, meaning a reduction of just 50 cycles could reduce the overall run time by >2 hours41.
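
The cycle-time savings quoted above are simple arithmetic, sketched here with the figures from the text:

```python
# Time saved by trimming cycles from a sequencing run (figures from the text).
CYCLE_MINUTES = 2.5   # Illumina's per-cycle estimate for this kit type
CYCLES_TRIMMED = 50

minutes_saved = CYCLES_TRIMMED * CYCLE_MINUTES
print(minutes_saved / 60)  # just over 2 hours
```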

Both the plexWell and DNA Prep methods use a tagmentation system for adapter addition, rather than a ligation-based approach. This results in adapters being added ~50 bp or more from the end of overlapping amplicons generated during gene-specific PCR. A second PCR step amplifies only the tagmented regions, resulting in final libraries of ~300bp. These approaches negate the need for primer trimming prior to alignment.

Generating consensus genomes and a SNP report from the sequence data, which takes approximately 15 minutes for a small (≤24 samples) dataset, quickly shows whether the samples are part of the same outbreak. To quickly find a potential origin of a cluster or sample (assuming genomes in the public domain were collected prior to the sample(s) of interest), a global SNP database can be used. Constructing or updating a global genome SNP database takes several hours, but this can be done prior to sample sequencing (and regularly refreshed). Querying the global SNP database for particular SNPs of interest takes mere seconds.

To rapidly visualize results of the global SNP database query, a stacked Venn diagram (aka, “onion diagram”) visually describes the hierarchical nature, i.e., the parsimony, of SNPs found in the SARS-CoV-2 genomes (Figure 3), and is easily generated using the methods described above or an alternative tool.

9857c800-4597-4311-999b-db17efa33ad2_figure3.gif

Figure 3. Phylogenetic relationship of SARS-CoV-2-positive samples to the Wuhan-Hu-1 reference strain.

Stacked Venn “onion” diagram indicating the hierarchical nature of cluster-specific SNPs relative to the reference strain in a global collection of SARS-CoV-2 samples.

Although fewer data are available for the Illumina DNA prep method, as it was primarily used to process five or fewer samples at a time, the data suggest that it performed slightly worse than the plexWell system. Average percent coverage for samples with Ct values less than 25 was 94.53% (SD 3.74, 95%CI 91.54-97.52). At a Ct between 25 and 30, the average coverage was 87.6% (SD 5.62, 95%CI 83.71-91.49). No complete genomes were obtained at Ct values greater than 30, where the average percent coverage was 62.83% (SD 20.22, 95%CI 39.95-85.71) (Table 3, Supplementary Data).

Table 3. Average sequencing metrics for various RT-PCR cycle threshold values of samples sequenced using the Illumina DNA Prep system and an Illumina MiSeq.

       *Ct value reported is that for the nucleocapsid-2 gene.

       **Uniform depth of coverage was targeted for all samples.

| Average Cycle Threshold Value (n)* | Average % aligned | Average % coverage | Average depth** | % samples >90% coverage |
| --- | --- | --- | --- | --- |
| <25 (6) | 96.56 (SD 2.38, 95%CI 94.65-98.46) | 94.53 (SD 3.74, 95%CI 91.54-97.52) | 1503.00 (SD 743.91, 95%CI 908.35-2098.83) | 83.33 |
| 25–30 (8) | 91.92 (SD 19.66, 95%CI 68.29-95.53) | 87.60 (SD 5.62, 95%CI 83.71-91.49) | 1115.25 (SD 470.39, 95%CI 342.46-1888.04) | 50 |
| 30–37 (3) | 56.46 (SD 37.84, 95%CI 13.65-99.28) | 62.83 (SD 20.22, 95%CI 39.95-85.71) | 851.21 (SD 443.07, 95%CI 349.84-1352.59) | 0 |

A minimum of 42,000 reads were targeted for the Illumina DNA prep method, but as sample sizes were small for the outbreak in question, the average number of aligned reads per sample was much higher. For samples with Ct values below 25, the average number of reads aligned to the reference was 275,258 (SD 136,816, 95%CI 165,784-384,732). Between Ct values of 25 and 30, the average was 200,810 (SD 86,999, 95%CI 140,523-261,096). Above a Ct value of 30, average aligned reads dropped to 116,888 (SD 275,257, 95%CI 53,002-180,773). A Kruskal-Wallis test revealed no significant difference in average aligned reads among these groups (p = 0.08).

Discussion

The recent SARS-CoV-2 outbreak has highlighted the need for real-time sequencing and data analysis capacity in the face of active pandemics, as well as high-throughput sequencing and analysis strategies for comprehensive retrospective analysis and evaluation. We describe two different strategies that can be used in combination with Illumina platforms for the rapid or high-throughput interrogation of samples involved in disease outbreaks. Though the results reported here are specific to the SARS-CoV-2 outbreak, these methods could conceivably be modified and applied to many other situations in which a genomic epidemiological response is needed.

The Illumina DNA prep-SNP-comparison analysis method is effective in providing rapid sequence data and genomic epidemiology information for small numbers of samples (<48 hours from raw sample to rough phylogenetic placement). Though throughput is limited by both the number of available indices for multiplexing and the nature of the protocol itself, this method has the advantage of being scalable to small numbers of samples, and can be performed in the course of several hours for small sample subsets.

The Seqwell plexWell system provides a scalable, cost-effective, and high-throughput method of processing thousands of samples with minimal laboratory personnel (Table 1). Here, we report the first use of plexWell for post-amplification adapter addition for whole-genome sequencing of SARS-CoV-2. As the plexWell protocol calls for pooling samples at an initial step in the adapter tagmentation process, hundreds of samples could be taken through the later steps of the protocol by a single individual in an 8-hour timeframe. This, coupled with the availability of thousands of index combinations, allowed for a high-throughput, cost-effective means of processing large numbers of samples on a weekly basis with minimal laboratory personnel and infrastructure. With just two full-time staff members devoted to sequencing, and 2–4 part-time staff responsible for arraying samples prior to the ARTIC/plexWell protocol, 1,152 samples were successfully processed and sequenced each week. Thus, 384 genomes were completed per full-time (40-hour) employee per week, an unprecedented pace for what is normally a lengthy protocol.

Typical analyses of virus genomes include phylogenetic tree construction to understand transmission patterns. Nextstrain16 and GISAID15 have been crucial to the SARS-CoV-2 scientific community for global and local epidemiologic understanding, and tools for smart subsampling (e.g. genome-sampler38) are now necessary with the growth of the public databases. However, reconstructing phylogenies, especially paired with finding relevant subsets, takes time, and is often overkill for initial, time-sensitive public health needs. We employ a simple, rapid analysis method and visualization meant as a quick-look to determine relatedness among a sample set of interest and/or relatedness of a sample or set to the entire public database of genomes. Because of the novelty of SARS-CoV-2 and its low rate of recombination, merely comparing the low number of SNPs (at the time of first publication) across samples without applying a phylogenetic model or program is often enough to answer initial questions about COVID-19 transmission, e.g. whether samples in a given set are closely related, and/or which samples in the global database are most closely related to a given sample set. This information, then, can be used by public health officials to determine which patients might be involved in active spread of disease (requiring additional shoe-leather epidemiology to follow up) and which patients are thought to represent isolated cases without additional spread. This is not meant to replace more thorough phylogenetic analysis, but rather, to provide a quick genomic snapshot that can inform a public health ground response. Our SNP queries followed by generation of a stacked Venn diagram (onion diagram) offer a much faster alternative or antecedent to complete phylogenetic analysis, and have served as a starting point for contact tracing in active outbreaks. 
The data presented here using the rapid-turnaround method were obtained from coded, de-identified patient samples that were part of an active SARS-CoV-2 outbreak, and the resulting phylogenetic placements were successfully used to inform public health efforts to determine the potential origin of the outbreak, and to prevent further spread42. Other pathogens or situations where SNP numbers are expected to be very low may also benefit from these rapid analysis methods.

For retrospective studies, time allows for more robust phylogenetic analyses using smart subsamplers (e.g., genome-sampler38) and commonly used phylogenetic tools such as NextStrain16; however, their use can add significant time to a rapid response. The UShER tool37, which can rapidly place genomes onto an existing SARS-CoV-2 phylogenetic tree, can greatly speed up the final analysis. Parsing the output of UShER generates a subset of public genomes that are phylogenetically close to the samples of interest. This reduces the input dataset to subsamplers such as genome-sampler38, which significantly reduces the computation time for further subsampling based on geography and time, which in turn significantly reduces the computation time for a NextStrain analysis.

Each of the two methods has its limitations. We observed a reduced success rate for the rapid Illumina DNA Prep method relative to the high-throughput plexWell system, as evidenced by both decreased overall breadth of coverage and decreased success in obtaining complete genomes at Ct values above 30. It should be noted, however, that the rapid prep study was conducted on a limited sample set, using samples from active outbreak clusters that were shipped over long distances through varying ambient temperatures, whereas samples used to evaluate the plexWell system were obtained locally and processed entirely in-house. This difference in handling, coupled with the difference in sample size, may in part account for the differences in results between the two prep methods. Also, at ~$100/sample in reagent costs, the rapid Illumina DNA Prep method is less cost-effective (Table 1, Supplementary Data). And though the turnaround for the protocol is ~1.5 days, faster sequencing is achievable through other methods, such as long-read Nanopore sequencing17,39. Nanopore technology, however, has a higher per-base error rate than short-read sequencing methods17; this lack of accuracy may outweigh any potential time savings, particularly in situations where relatively few nucleotide variants can radically alter phylogenetic placement, as with SARS-CoV-2. Also, the SNP-comparison analysis is rapid and robust because of the relatively low number of SNPs so far documented in the SARS-CoV-2 genome (at the time of submission of this manuscript), owing to the virus’s novelty, the lack of recombination, and the unmatched robustness of the global SARS-CoV-2 genome database. Though this method could be applied to other types of outbreaks, rapid, precise phylogenetic placement will rely on these same factors.

The high-throughput plexWell prep system also has its drawbacks. The turnaround time of this method, when large sample numbers are processed, is not competitive39. The majority of processing time is lost to the ARTIC portion of the protocol, however, and not specifically to the plexWell adapter addition. And though the combinatorial index system allows for thousands of samples to be multiplexed in a single sequencing run, the SARS-CoV-2 pandemic has demonstrated that even this may not be sufficient to meet the data challenges presented by expansive disease outbreaks.

This paper highlights the potential for rapid turnaround of genomic data from epidemiological samples, giving genomics the potential to become invaluable in pandemic response. It is worth noting, however, that sequencing is not the only potential bottleneck in the process. For data to be delivered in real time, coordination between testing sites, sample delivery, and sequencing laboratories must be robust, so that preparation for sequencing can begin shortly after a sample has tested positive for the pathogen of interest. Failures anywhere in this chain can lead to significant delays, negating the utility of a real-time sequencing pipeline for investigation. Thus, in addition to improvements in the described sequencing and analysis pipeline, coordinating the efforts of testing sites, public health partners, and sequencing laboratories is critical for this technology to be most effective.

Despite the overwhelming benefits of employing next-generation technological advances in real time during a public health emergency, challenges remain when using genomic epidemiology as a means of pandemic control and monitoring. Genomic data are not, in and of themselves, sufficient to completely elucidate the mechanisms and transmission of all infectious diseases, particularly when asymptomatic and mild infections are known to play a role in transmission43 but are less likely to be identified and subsequently sequenced. Thus, the integration of genomic and traditional epidemiology is paramount to the success of this 21st century public health capability.

Data availability

Underlying data

Zenodo: Supplementary Data for "Sequencing the Pandemic: Rapid and High-Throughput Processing and Analysis of COVID-19 Clinical Samples for 21st Century Public Health" (DOI: 10.5281/zenodo.5822962)44.

This project contains the following underlying data:

  • Supplementary_data_final.xlsx – includes all data used in generation of figures and tables, as well as cost breakdown for two sequencing methods.

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

How to cite this article
Folkerts ML, Lemmer D, Pfeiffer A et al. Methods for sequencing the pandemic: benefits of rapid or high-throughput processing [version 2; peer review: 2 approved]. F1000Research 2022, 10:48 (https://doi.org/10.12688/f1000research.28352.2)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Version 2 (published 21 Feb 2022, revised)
Reviewer Report, 14 Mar 2022
Bronwyn L. MacInnis, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Status: Approved
"The authors have addressed my comments in this ..."
How to cite this report: MacInnis BL. Reviewer Report For: Methods for sequencing the pandemic: benefits of rapid or high-throughput processing [version 2; peer review: 2 approved]. F1000Research 2022, 10:48 (https://doi.org/10.5256/f1000research.120334.r124549)
Reviewer Report, 07 Mar 2022
Keith Crandall, Computational Biology Institute, The George Washington University, Washington, DC, USA; Department of Biostatistics & Bioinformatics, The George Washington University, Washington, DC, USA
Status: Approved
"The authors have responded well to my previous critiques. I think the paper now provides useful information for ..."
How to cite this report: Crandall K. Reviewer Report For: Methods for sequencing the pandemic: benefits of rapid or high-throughput processing [version 2; peer review: 2 approved]. F1000Research 2022, 10:48 (https://doi.org/10.5256/f1000research.120334.r124550)
Version 1 (published 26 Jan 2021)
Reviewer Report, 21 Sep 2021
Keith Crandall, Computational Biology Institute, The George Washington University, Washington, DC, USA; Department of Biostatistics & Bioinformatics, The George Washington University, Washington, DC, USA
Status: Not Approved
"This study proposes a reasonable approach to rapid assessment of viral diversity for SARS-CoV-2 taking advantage of the latest in Illumina sequencing kits and associated prep protocols as well as open source data and tools for downstream analyses. There are ..."
How to cite this report: Crandall K. Reviewer Report For: Methods for sequencing the pandemic: benefits of rapid or high-throughput processing [version 2; peer review: 2 approved]. F1000Research 2022, 10:48 (https://doi.org/10.5256/f1000research.31361.r92632)
Author Response, 21 Feb 2022
Megan Folkerts, Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, 86005, USA
"Dr. Crandall, we want to thank you for taking the time to review our paper and provide valuable insight and comments aimed at strengthening our manuscript. We have carefully ..."
Reviewer Report, 16 Mar 2021
Bronwyn L. MacInnis, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Status: Approved with Reservations
"This is a methodologically sound study describing complementary approaches for generating and analyzing SARS-CoV-2 genomic data. The workflows address two common use cases: rapid turnaround of a relatively small number of samples, i.e. for cluster investigation, and larger scale but ..."
How to cite this report: MacInnis BL. Reviewer Report For: Methods for sequencing the pandemic: benefits of rapid or high-throughput processing [version 2; peer review: 2 approved]. F1000Research 2022, 10:48 (https://doi.org/10.5256/f1000research.31361.r78623)
Author Response, 21 Feb 2022
Megan Folkerts, Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, 86005, USA
"Dr. MacInnis, we want to thank you for taking the time to review our paper and provide valuable insight and comments aimed at strengthening our manuscript. We have carefully ..."
