Coronaviruses are of great interest because of the threatening spread of the human pandemic caused by the SARS-CoV2 strain [1, 2]. Viruses of this family have an outer lipid envelope (enveloped viruses) and a single-stranded genomic RNA of classic positive polarity, which, depending on the virus, encodes 25–30 specific proteins [3]. Viral proteins are expressed in two phases. In the first phase, the 5'-terminal part of the genome is translated to form 19 nonstructural proteins, nsp1–nsp19 (Fig. 1a). In the second phase, the 3'-terminal part of the genome is expressed through intermediate replication and transcription of the genome, which yields individual mRNAs encoding the structural proteins N, M, S, E, and HE and a number of nonstructural accessory proteins (Fig. 1a).

Fig. 1.

Replication scheme of the RNA genome and localization of negative-polarity genes in the genome of coronaviruses. (a) Scheme of the genomic viral RNA and the reading frame of viral genes based on the coronavirus model (acc. no. MT890462.1; SARS-Cov2/human/RUS/20200417_10/2020). The vertical shift of the reading frame in the viral gene reflects the shift in the translation phase. UTR, untranslated terminal regions in RNA. The positive-polarity nonstructural (nsp1–nsp19, 3a, 4c, 7ab, 8a, 9b) and structural (N, S, M, E, HE) genes are shown with gray boxes. The negative-polarity genes and proteins are designated NGP (negative gene proteins) and are shown with wide arrows shaded with a fine grid. Proteins NGP1 (nt positions 1335–1787), NGP2 (2505–2861), NGP3 (2234–2656), and NGP5 (28757–29095) begin with the dominant alternative CUG codon (dashed arrows); the NGP4 protein (nt 6187–6489; the zone of the “positive-polarity” protein nsp3 (Mpro)) begins with the classic AUG codon (solid arrow). (b) Schemes of localization of large negative-polarity genes (≥300 nt) of four coronaviruses: (a) MERS-CoV, (b) SARS-CoV2, (c) Pangolin-CoV, and (d) Bat SARS-like CoV RaTG13. The schemes are based on the analysis of the nucleotide sequences in the GenBank database: human coronavirus MERS (NC_019843.3), SARS-CoV (NC_004718.3), SARS-CoV2 (MT635445.1), Pangolin-CoV (MT040335.1), and Bat/RaTG13 (MN996532.1). The numbers indicate the nucleotide at the beginning of the gene, counting from the 5'-end of the viral genome. (c) Scheme of the secondary structure of the region upstream of the NGP4 gene (positions 6187–6489 nt from the 5'-end of genomic (+)RNA) obtained using the IRESPred software (HTTP://bioinfo.net.in/IRESPred) [5]; the arrow indicates the AUG initiation codon.

In silico analysis of the coronaviral genomic RNA revealed extended open reading frames (ORFs) of negative polarity that started with the AUG codon and ended with the classical termination codons UAA or UAG (Table 1, Fig. 1b). The negative-polarity zones preceding the AUG position in the identified genes, in particular in the largest gene NGP4 (negative gene protein 4; nt 6187–6489) in the SARS-CoV2 genome, were analyzed with a computer program that predicts ribosome-binding elements. The analysis showed regions with a pronounced secondary structure containing numerous hairpins. According to the criteria of the IRESPred software (HTTP://bioinfo.net.in/IRESPred) [5], these regions have a high energy stability (free energy 85 kcal/mol) and the structural properties of an internal ribosome entry site (IRES) (Fig. 1c). Such a structure of the zone upstream of the AUG can provide recognition of the mRNA by ribosomes and subsequent translation of the protein [5]. Moreover, two additional AUGs and three alternative CUG initiation codons in the translation phase +1 and/or +2 were detected in this IRES-like zone, which might also facilitate the recognition and expression of this gene by ribosomes through the scanning mechanism [4].
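The kind of ORF search described above can be illustrated with a minimal sketch. It is not the procedure used in this work; the function names, the `min_len` threshold, and the coordinate convention are assumptions made for illustration only. The sketch takes a genomic (+)RNA sequence, builds its complementary (–) strand, and reports ORFs on that strand that open with a chosen start codon and close with a stop codon.

```python
# Illustrative sketch only: all names and defaults here are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")
# The text mentions UAA and UAG; UGA is included as the third standard stop codon.
STOPS = ("UAA", "UAG", "UGA")


def reverse_complement(rna):
    """Return the complementary strand, written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_negative_orfs(plus_rna, starts=("AUG",), stops=STOPS, min_len=150):
    """Scan the (-) strand in three frames for start...stop ORFs.

    Returns (low, high, frame) tuples: low and high are 1-based positions
    on the (+)RNA counted from its 5'-end; frame is the reading frame on
    the (-) strand. ORF length includes the stop codon.
    """
    plus_rna = plus_rna.upper().replace("T", "U")  # accept DNA-style input
    minus = reverse_complement(plus_rna)
    n = len(minus)
    orfs = []
    for frame in range(3):
        open_at = None  # index of the first start codon of the current ORF
        for i in range(frame, n - 2, 3):
            codon = minus[i:i + 3]
            if open_at is None:
                if codon in starts:
                    open_at = i
            elif codon in stops:
                if i + 3 - open_at >= min_len:
                    # (-) strand index j pairs with (+) strand index n - 1 - j
                    orfs.append((n - i - 2, n - open_at, frame))
                open_at = None
    return orfs
```

Calling `find_negative_orfs(genome, starts=("CUG",))` would repeat the scan with the alternative initiation codon. The sketch ignores IRES-like structure and any evidence of expression; it only enumerates candidate frames.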

Table 1. Quantitative characteristics of the detection of negative-polarity genes (ORF) in the genomes of coronaviruses

The length of the detected genes (ORFs) varied in the range of 150–450 nucleotides (nt), which could ensure the synthesis of polypeptides with molecular weights of 5–30 kDa. Comparison of the genomes of various members of the coronavirus family showed a significant diversity both in the number of such negative-polarity genes and in the pattern of their localization in the viral genome (Table 1, Fig. 1b). For example, the Pangolin-CoV and SARS-CoV2 viruses, which, according to modern concepts, are the closest relatives (i.e., descendants of a common ancestor), were shown to contain 29 and 21 negative genes, respectively, with no coincidence of their positions in the genome (Fig. 1b). In contrast, comparison of the Bat-CoV and SARS-CoV2 viruses, which belong to the same genus of betacoronaviruses, showed that they have a similar number of classic AUG-containing negative genes (17 and 21, respectively), which, moreover, have a similar localization in the genome. Thus, the presence and similar localization of these genes in the genomes of the human and bat viruses support the genetic and evolutionary proximity of these viruses.

Conversely, the identification of 29 AUG-initiated negative-polarity genes in the genome of the Pangolin-CoV virus (Table 1) may indicate that, contrary to modern concepts, the virus from pangolins is a more distant relative of SARS-CoV2 than the bat virus.

Additional extended ORFs were detected in the genome of coronaviruses when the alternative initiation codon CUG was used as the start codon (Table 1). Like the ORFs with the classic AUG, the alternative-type ORFs had IRES-like structures and could provide the synthesis of extended polypeptides with molecular weights in the range of 5–30 kDa. The presence of additional negative-polarity genes of the alternative type in the genome of coronaviruses can significantly increase its genetic capacity.

The results of this report show the presence of extended reading frames (genes) in the genome of coronaviruses whose distinguishing feature is their negative orientation. At the same time, the genome of coronaviruses is currently considered to be of positive polarity, since all known genes of coronaviruses (approximately 25 genes for the nonstructural proteins and 5 major genes for the structural proteins E, M, S, N, and HE) are encoded in genomic RNA in a positive orientation and have an appropriate strategy of genome expression in infected target cells (Fig. 1a). The presence of new negative-polarity genes implies two possible mechanisms for their expression, that is, for the synthesis of the corresponding mRNAs and the subsequent translation of proteins: either direct translation of a replicative copy of genomic RNA, the (–)RNA (pathway I), or transcription of genomic (+)RNA with the formation of individual mRNAs of “negative polarity”, which are then translated into specific polypeptides (pathway II) (Fig. 1a, circled). Interestingly, a similar situation was found in another family, the orthomyxoviruses: in influenza viruses, whose genomic RNA follows a negative-polarity strategy, ambisense positive-polarity genes encoded in the viral negative-polarity genome were detected in a similar way [6–10].

The function and role of the newly discovered ambipolar viral genes have not yet been established. In the case of influenza viruses, it is assumed that the identified new ambisense genes may be important in the regulation of the immune response to viral proteins and/or in the regulation of the stability of viral proteins in infected cells through the protein deubiquitination system [11–14]. To understand the possible functional significance of the identified new ambipolar genes, it is necessary to take into account two features inherent in these genes. First, the evolutionary stability of ambipolar genes in viruses over a long time indicates that they are biologically determined [11]. Second, the coding of genes with opposite polarity in the same region of the RNA molecule, in the so-called gene stacking format, makes it possible to significantly increase the genetic capacity of the viral genome and opens up new opportunities for the virus in terms of variability, adaptability to the host, and biological evolution in nature [11].

The presence of multiple ambisense genes opens up a real possibility of coding a multivirionic population consisting of virions of different structural types, in which more than one type of viral particle with an identical genome but a different composition of structural proteins can be synthesized from one genome. In this case, some of the virions (possibly infectious) may remain invisible (the principle of the “dark side of the Moon”). Moreover, this multivirionic profile of the virus population, programmed by the viral genome, may have a cellular or tissue dependence, in which each type of viral particle replicates and reproduces autonomously and dominates in a particular host (organ or tissue). This as yet hypothetical phenomenon of replication of multiple types of viral particles from the same genome may be important in the cell- or organ-dependent pathogenesis of a viral disease and may create new platforms for the development of methods of treatment and vaccine prevention.

The newly discovered negative-polarity genes in the genome of coronaviruses have a strain-specific localization and number in the genome (Fig. 1b). Thus, the pattern of negative-polarity genes in the genome of a viral strain can serve as its molecular signature and be used in diagnosis and in the study of viral relationships and the biological evolution of the coronavirus family.
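One simple way to compare such patterns between two genomes can be sketched as follows. This is a hypothetical illustration and not a method used in this work: the function names, the 50-nt tolerance, and the scoring rule are all assumptions, and a real comparison would first need the genomes to be aligned, since raw coordinates shift with insertions and deletions.

```python
# Hypothetical sketch: compare the start positions of negative-polarity
# genes in two genomes. Names, tolerance, and scoring are assumptions.

def count_matched(starts_a, starts_b, tolerance):
    """Count positions in starts_a with a partner in starts_b within tolerance nt."""
    return sum(
        any(abs(a - b) <= tolerance for b in starts_b)
        for a in starts_a
    )


def pattern_similarity(starts_a, starts_b, tolerance=50):
    """Return a score in [0, 1]; 1 means every gene start has a nearby partner."""
    if not starts_a and not starts_b:
        return 1.0  # two empty patterns are trivially identical
    # Take the smaller of the two directional counts so the score is symmetric.
    shared = min(
        count_matched(starts_a, starts_b, tolerance),
        count_matched(starts_b, starts_a, tolerance),
    )
    return shared / max(len(starts_a), len(starts_b))
```

With made-up toy positions, `pattern_similarity([100, 500, 900], [120, 880])` counts two shared starts out of three and returns 2/3.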

The presence of potential negative-polarity genes in the genome of coronaviruses raises the question of the classification of this family. The detection, in infected cells or infected organisms, of protein products expressed from the “negative” gene template would give grounds for classifying the coronavirus family with the ambisense viruses, which have a bipolar genome strategy.

Currently, such ambisense viruses include viruses of four genera: phlebo-, tospo-, arena-, and tenuviruses [15]. Ambisense genes located in the genome in the stacking format were also found in influenza viruses, in which, as in coronaviruses, direct expression of these genes has not yet been identified, but there are indirect signs of such expression during natural viral infection in vivo [12, 13]. The study of the mechanisms of the possible expression of the genetic information of these new genes, as well as the elucidation of the role and significance of the detected genes and/or their protein products during viral replication, can serve as the basis for creating a new type of vaccines and antiviral chemotherapy agents for the treatment of coronavirus infection.