Introduction

Detection of viral genetic material by polymerase chain reaction (PCR) is considered the gold standard for identifying SARS-CoV-2 in nasal swab samples from symptomatic patients. Since the outbreak started, the World Health Organization (WHO) has released several SARS-CoV-2 PCR assay protocols produced by different reference institutions around the world1. In addition to these initial protocols, a growing number of studies and commercial kits propose new alternatives for identifying SARS-CoV-2 and its recent variants by molecular or immunological approaches2,3,4.

Concomitant with those advances, the rapid increase in the number of available SARS-CoV-2 sequences from different localities has revealed several polymorphic regions in the SARS-CoV-2 genome. It is therefore plausible that some of the early PCR detection kits use primers that target these polymorphic regions. Such primers could compromise the accurate identification of some viral variants and increase the number of false-negative or inconclusive results, especially with the rise of new and potentially dangerous variants.

In this context, the development of primers that target conserved regions in the genome to detect many viral variants is imperative. In this work, we identified 26 conserved segments (CS) in the SARS-CoV-2 genome based on an alignment of 2,341 full genome sequences and used these regions as a target for the design of universal primers and probes. We extended the analyses to include 211,833 SARS-CoV-2 sequences and the recent virus variants and further demonstrated that the proposed primers are still located in conserved regions, confirming their potential as universal primers.

Results

At the end of the analysis, we selected nine candidate systems (forward primer + reverse primer + probe) that met all requirements (Table 1). In general, in silico analyses revealed that the primer pairs proposed in this study (UFRN_primers) (Table 1) are more compatible with each other (evaluated by smaller differences between the Tm of the forward and reverse primers), have lower self-complementarity (both overall and at the 3' end), and have higher specificity than the previously described primers (PD_primers) (Table 2). Among the proposed probes, only UFRN_3_P and UFRN_4_P did not reach a Tm higher than that of their respective primer pairs.

Table 1 Primers designed in this study.
Table 2 Primers released by WHO to detect SARS-CoV-2 using polymerase chain reaction.

Comparing the number of SARS-CoV-2 sequences that anneal without mismatches ("No mis" in Tables 1 and 2) in the in silico PCR analyses indicates that the UFRN_primers set targets fewer polymorphic sites in the viral genome than the PD_primers set. Among the 211,833 SARS-CoV-2 genomic sequences used as targets, the UFRN_primers anneal with 100% identity to between 207,689 (UFRN_3) and 210,860 (UFRN_8) sequences. The probes from the UFRN_primers set aligned with most templates in BLAST searches against the same sequence database (Supplementary material 1 and 2).

We also compared the UFRN_primers against a specific sequence set containing the recent SARS-CoV-2 variants B.1.1.7, B.1.351, B.1.427, B.1.429, B.1.525, and P.1 (Table 3). In this test, the proposed primer set also gave better in silico results than the PD_primers set. All UFRN_primers except UFRN_4 annealed, with no mismatches allowed, to all tested sequences from variants B.1.351 (495 sequences), B.1.429, B.1.427, and B.1.525 (94 sequences in total), and P.1 (177 sequences). For the B.1.1.7 variant, the primers UFRN_3, UFRN_5, and UFRN_8 annealed to the vast majority of sequences (Table 3). Still, two primers from the PD_primers set (2019-nCoV_N2 and nCoV_IP2-12669Fw) performed as well as these three UFRN_primers (Table 4).

Table 3 Analysis of potential annealing (In silico PCR) of UFRN primers (UFRN_primers) to the genomes of the main SARS-CoV-2 variants.
Table 4 Analysis of potential annealing (In silico PCR) of WHO primers (PD_primers) to the genomes of the main SARS-CoV-2 variants.

Concerning specificity, both primer sets performed well. Tests allowing 20% mismatches against Apicomplexa targets revealed that the 2019-nCoV_N2-F / 2019-nCoV_N2-R and UFRN_8_F / UFRN_8_R primer pairs could generate 746 bp and 755 bp amplicons with Toxoplasma gondii sequences from accession codes XTG08368.2 and XM_002364674.2, respectively. The other primer pairs did not produce nonspecific amplicons at mismatch allowances between 0 and 20%.

In tests against the genomes of Gammacoronavirus, Alphacoronavirus, SARS-CoV, SARS-CoV-like, MERS-CoV, and Betacoronavirus (excluding SARS-CoV-2), only two of the nine primer pairs (UFRN_1 and UFRN_5) produced amplicons, and only when mismatches were allowed (10%). Against the 2298 sequences retrieved from the Virus Variation database, these tests predicted a single amplicon (101 bp) from one target sequence (accession code MG772934.1) for UFRN_1 and 137 amplicons (105 bp) for UFRN_5.

Discussion

Early detection of pathogens is crucial to disease prevention5 and containment, especially during epidemic outbreaks6. PCR is a reliable and relatively accessible molecular method that directly recognizes pathogen-derived material in patients' samples7. However, the optimization of PCR protocols depends strongly on the specificity and efficiency of the primers8. This dependence, combined with the increasing number of available SARS-CoV-2 sequences and the virus's growing polymorphism, led us to design a set of new primers that target highly conserved regions of the viral genome.

Therefore, to aid PCR optimization, the UFRN_primers were designed to have Tm values as close to each other as possible. This design will probably allow at least two systems to be run with the same thermal cycling parameters. It would then be possible to perform a PCR test that identifies different viral genome regions simultaneously, following the protocols already described for the PD_primers. The systems UFRN_3 and UFRN_4 may need different thermal cycling parameters from the other systems because, in these two cases, the probe Tm is similar to that of the primers (Table 1). These systems will probably require a longer annealing time to ensure that the probe has hybridized to the DNA template before amplification starts.

The higher specificity of the UFRN_primers observed in the in silico analysis is mainly due to the availability of 2,341 genome sequences, which made it possible to identify the conserved regions from the alignment with greater accuracy. The UFRN_6 and UFRN_7 primers differ by only one base and have overlapping probes. However, these small differences were sufficient to change the set of sequences to which these primers anneal (Table 1). Only 12 sequences did not anneal with the designed primers. Among them, seven were isolated from pangolins and one from bats, all from Chinese provinces. The other four sequences are from Australia and Nigeria and contained a high percentage of N bases, which might have caused the negative results.

Another striking result is that the UFRN_primers showed a higher potential than the PD_primers to identify the main recent SARS-CoV-2 variants of concern, notably B.1.351, B.1.427, B.1.429, B.1.525, and P.1. The in silico predictions indicate that the UFRN_primers are potentially less prone to generating false-negative results. Their application could make a significant difference to COVID-19 diagnostics and epidemiology, since the Food and Drug Administration (FDA) has recently warned of the negative impact of SARS-CoV-2 genetic variants on the available molecular detection tests9.

The use of universal primers makes it possible to identify several virus variants using the same PCR protocol. UFRN_primers are strong candidates to simplify the procedures and supply chain for detecting SARS-CoV-2, allowing, for example, the mass production of primers and kits that could be applied in different parts of the world with equivalent efficiency. However, the primers presented here still depend on in vitro validation. The availability of these sequences at this time will be crucial so that these new protocols can be validated promptly to assist in the control of the SARS-CoV-2 pandemic.

Another critical point is that the primers presented here were tested against the updated RNA sequence databases for bacteria, fungi, and protozoa and did not generate nonspecific amplicons in any case. Although obtained through in silico analyses, the absence of predicted nonspecific amplicons increases the potential for applying these primers to different samples such as blood, feces, or even environmental samples. Currently, the most suitable sample for detecting SARS-CoV-2 is the human nasal swab; however, studies have already reported digestive symptoms (e.g. diarrhea and vomiting)10,11 and other less frequent symptoms (e.g. conjunctivitis) in patients who tested positive for SARS-CoV-212,13,14. This diversity of symptoms makes clinical diagnosis difficult, and testing of new sample types may be needed quickly. The application of UFRN_primers to detect SARS-CoV-2 in blood or fecal samples is likely to be efficient, since these primers should not interact nonspecifically with RNAs of the main protozoa and bacteria that cause health problems in humans.

Quite possibly, at the time of publication of this work, a considerably larger number of additional sequences will be available, which may reveal new polymorphic sites in the target regions of UFRN_primers and PD_primers. In this way, our research group will continue this bioinformatics work, and whenever relevant, we will report new updates on the primer sequences or new primers.

Methods

Whole-genome sequences of SARS-CoV-2 from human isolates were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID; gisaid.org)15 and the Virus Variation resource of the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/genome/viruses/variation/)16 between Mar 30 and Nov 24, 2020. To minimize sequencing errors and artifacts, we activated the filters "complete (> 29.000 bp)", "high coverage only" and "low coverage excl" during sequence retrieval from the GISAID database, and the filter "Complete" under the option "Nucleotide completeness" in the Virus Variation database. The full list of authors and laboratories of the GISAID submissions and the accessions of the Virus Variation sequences are available in Supplementary Table 3.

Complete FASTA sequences were then aligned on a supercomputer using Clustal-Omega (version 1.2.4)17 with standard parameters. To avoid excessive misaligned gaps and to better identify conserved and polymorphic sites, we trimmed the multiple sequence alignments (MSAs) using the trimAl tool (version 1.2)18 with the "-automated1" option. We used the sequence of a Wuhan seafood market pneumonia virus (GenBank accession code MN908947)19 as the reference for all alignments to identify site and region positions.
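The step from a trimmed alignment to candidate conserved segments can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the function name, the requirement that every column hold one identical unambiguous base, and the minimum run length (`min_length`) are all assumptions made for illustration.

```python
from typing import List, Tuple


def conserved_segments(alignment: List[str], min_length: int = 20) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (0-based, end-exclusive) in which every
    aligned sequence carries the same unambiguous base (A, C, G or T).

    `min_length` is a hypothetical threshold; the text does not state one.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    segments = []
    run_start = None  # first column of the conserved run currently being extended
    for col in range(width):
        column = {seq[col].upper() for seq in alignment}
        # A column is conserved if it holds exactly one symbol and that symbol
        # is a plain base (gaps and ambiguity codes such as N break the run).
        identical = len(column) == 1 and column <= set("ACGT")
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_length:
                segments.append((run_start, col))
            run_start = None
    # Close a run that extends to the last column of the alignment.
    if run_start is not None and width - run_start >= min_length:
        segments.append((run_start, width))
    return segments
```

Column positions refer to the alignment; mapping them back to the reference genome would require accounting for gaps in the reference row.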

The CSs were submitted to the online Primer-BLAST tool20 to design primer pairs under the following criteria: PCR product size of 90–150 nt; primer melting temperature (Tm) minimum of 55 °C, optimum of 58 °C, and maximum of 63 °C; and a maximum Tm difference of 2 °C. The specificity check was performed against the complete RefSeq RNA databases for Homo sapiens (taxid: 9606), Bacteria (taxid: 2), Fungi (taxid: 4751), and Apicomplexa (taxid: 5794). We set the primer specificity stringency so that a primer must have at least 3 total mismatches to unintended targets, including at least 2 mismatches within the last 5 bp at the 3' end, while ignoring targets with 5 or more mismatches to the primer. All other Primer-BLAST parameters were kept at their default values when evaluating the features of the newly designed primer pairs.
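The design thresholds and the specificity rule above can be expressed as simple checks. The following Python sketch is illustrative only: it does not reproduce Primer-BLAST, it takes Tm values as inputs rather than computing them, and the class and function names are hypothetical. The specificity function follows a literal reading of the rule as stated in the text, assuming the primer and the off-target site are given in the same 5' to 3' orientation and have equal length.

```python
from dataclasses import dataclass

# Thresholds taken from the design criteria stated in the text.
TM_MIN, TM_MAX = 55.0, 63.0        # primer Tm range, in degrees Celsius
MAX_TM_DIFF = 2.0                  # maximum Tm difference within a pair
PRODUCT_MIN, PRODUCT_MAX = 90, 150  # PCR product size, in nt


@dataclass
class PrimerPair:
    name: str
    forward_tm: float
    reverse_tm: float
    product_size: int


def meets_design_criteria(pair: PrimerPair) -> bool:
    """Check Tm range, Tm compatibility, and product size for one primer pair."""
    tms_in_range = all(TM_MIN <= tm <= TM_MAX for tm in (pair.forward_tm, pair.reverse_tm))
    tm_compatible = abs(pair.forward_tm - pair.reverse_tm) <= MAX_TM_DIFF
    size_ok = PRODUCT_MIN <= pair.product_size <= PRODUCT_MAX
    return tms_in_range and tm_compatible and size_ok


def passes_specificity_rule(primer: str, off_target_site: str,
                            min_total: int = 3, min_3prime: int = 2,
                            window: int = 5, ignore_at: int = 5) -> bool:
    """Return True if the primer is treated as specific against one unintended site.

    Literal reading of the rule in the text (an assumption about its semantics):
    sites with `ignore_at` or more mismatches are ignored; otherwise the primer
    needs at least `min_total` mismatches overall, of which at least `min_3prime`
    fall within the last `window` bases at the 3' end.
    """
    if len(primer) != len(off_target_site):
        raise ValueError("primer and site must have the same length")
    mismatches = [a != b for a, b in zip(primer.upper(), off_target_site.upper())]
    total = sum(mismatches)
    if total >= ignore_at:
        return True
    three_prime = sum(mismatches[-window:])
    return total >= min_total and three_prime >= min_3prime
```

For example, a pair with Tm values of 58.0 °C and 59.5 °C and a 101 nt product satisfies `meets_design_criteria`, whereas the same pair with a 160 nt product does not.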

From all the primers generated by Primer-BLAST, we selected 124 primer pairs that presented low self-complementarity both for total annealing (max 5 nt) and for annealing in the 3' region (max 3 nt). After individual evaluation using the Geneious suite (version 9.1.8, 2017), we selected 9 primer pairs that target regions with 100% identity among all 2,341 initial genomes. These primers target the ORF1a, ORF1b, and S regions of the SARS-CoV-2 genome. TaqMan probes for each primer pair were also designed from the same alignment, prioritizing conserved regions inside each of the predicted amplicons.

To compare and assess the annealing specificity of the previously used and newly designed primers and probes, we used three different tools: PrimerSearch (version 6.6.0) from the EMBOSS package21, the stand-alone BLAST+ suite22, and the online Primer-BLAST. For the first two tools, we used five different custom databases: (1) SARS-CoV-2 sequences from GISAID (211,833 genome sequences retrieved on Nov 24, 2020), with the previously mentioned filters activated; (2) SARS-CoV-2 sequences from Virus Variation; (3) RefSeq RNAs from the Apicomplexa taxon, retrieved from GenBank on Mar 30, 2020; (4) RefSeq RNAs from the Toxoplasma taxon, also from GenBank (Mar 30, 2020); and (5) 2298 sequences from the Virus Variation database, including Gammacoronavirus, Alphacoronavirus, SARS-CoV, SARS-CoV-like, MERS-CoV, and Betacoronavirus (excluding SARS-CoV-2).

The first step of the specificity test was to search all forward and reverse primer pair sequences against each of the databases mentioned above using PrimerSearch, in order to identify possible amplicons. We used three mismatch allowances (0, 10, and 20%). For the probe similarity searches with stand-alone BLAST+, we also evaluated the number of subject sequences hit, the aligned start and end positions, and the number of mismatches in each alignment.
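The idea behind an in silico PCR search with a mismatch allowance can be shown in a small, self-contained Python sketch. This is not PrimerSearch and makes several simplifying assumptions: only substitutions are counted (no gaps), the forward primer is searched on the given strand only, the allowed number of mismatches is the primer length times the percentage, rounded down, and `max_amplicon` is a hypothetical cap. All function names are illustrative.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(template: str, primer: str, max_mismatches: int) -> List[int]:
    """Start positions where `primer` matches `template` with at most
    `max_mismatches` substitutions."""
    n = len(primer)
    hits = []
    for start in range(len(template) - n + 1):
        mismatches = 0
        for a, b in zip(template[start:start + n], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # loop finished without exceeding the allowance
            hits.append(start)
    return hits


def in_silico_pcr(template: str, forward: str, reverse: str,
                  mismatch_percent: int = 0,
                  max_amplicon: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, length) for each predicted amplicon on the given strand."""
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the given strand
    # its binding site reads as the reverse complement of the primer.
    reverse_site = reverse_complement(reverse)
    fwd_allow = len(forward) * mismatch_percent // 100
    rev_allow = len(reverse_site) * mismatch_percent // 100

    amplicons = []
    for f in find_sites(template, forward, fwd_allow):
        for r in find_sites(template, reverse_site, rev_allow):
            length = r + len(reverse_site) - f
            # Keep only products where the reverse site lies downstream of the
            # forward primer and the product is not longer than the cap.
            if r >= f + len(forward) and length <= max_amplicon:
                amplicons.append((f, length))
    return amplicons
```

Calling `in_silico_pcr` with `mismatch_percent` set to 0, 10, and 20 mirrors the three allowances used in the specificity test; with 20-nt primers, 10% corresponds to at most two substitutions per primer under the rounding assumed here.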

The genome sequences of the B.1.1.7, B.1.351, B.1.427, B.1.429, B.1.525, and P.1 variants were retrieved from the GISAID database with the following filters activated: "complete sequence", "excl low coverage", "high coverage", and "w/ patient status". The total number of sequences for each variant was 1931 for B.1.1.7, 495 for B.1.351, 94 for B.1.427, B.1.429, and B.1.525 combined, and 177 for P.1. The primer pairs were aligned with each set of sequences using PrimerSearch, allowing 0% and 10% mismatches. The results were processed and recorded for each primer pair and variant using a custom shell script.
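The per-variant bookkeeping can be sketched as follows. The original processing used a custom shell script that is not shown here, so this Python version is only an assumed equivalent: the input format (tuples of primer pair, variant, and sequence identifier), the function name, and the grouping key for the three variants reported together are hypothetical, while the variant totals are those given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Number of sequences per variant set, as reported in the text.
VARIANT_TOTALS = {
    "B.1.1.7": 1931,
    "B.1.351": 495,
    "B.1.427/B.1.429/B.1.525": 94,  # reported as a combined total
    "P.1": 177,
}


def coverage_by_variant(
    hits: Iterable[Tuple[str, str, str]],
    totals: Dict[str, int] = VARIANT_TOTALS,
) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Summarize amplicon hits as (count, percent of variant sequences).

    `hits` holds one (primer_pair, variant, sequence_id) tuple per predicted
    amplicon; a sequence hit more than once by the same pair is counted once.
    """
    matched = defaultdict(set)
    for primer_pair, variant, sequence_id in hits:
        matched[(primer_pair, variant)].add(sequence_id)
    return {
        key: (len(ids), 100.0 * len(ids) / totals[key[1]])
        for key, ids in matched.items()
    }
```

Running the function once on the 0% mismatch results and once on the 10% mismatch results would give one summary per allowance.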