Influenza

Introduction to influenza viruses

The influenza virus is a recurring threat to public health. Seasonal influenza infections are associated with ~290,000–650,000 deaths annually worldwide [1], which includes ~12,000–61,000 deaths each year in the United States (US) alone [1, 2]. Unpredictably, but less frequently, global influenza pandemics occur, infecting 20–40% of the population in a single year [3,4,5,6] and dramatically raising death rates above normal levels. Influenza viruses belong to the Orthomyxoviridae family and are classified into four genera including type A, B, C, and the emerging type D [7, 8] based on their antigenic differences in the nucleoprotein and matrix 1 protein. Influenza viruses contain segmented, negative-sense, single-stranded RNA genomes. Influenza A viruses (IAVs) and influenza B viruses (IBVs) contain 8 viral RNA (vRNA) gene segments, whereas influenza C viruses (ICVs) and influenza D viruses (IDVs) contain 7 vRNA gene segments. Segments 1 (PB2), 2 (PB1), 3 (PA), 4 (HA), 5 (NP), 6 (NA), 7 (MP), and 8 (NS) of IAVs and IBVs encode polymerase basic protein 2 (PB2), polymerase basic protein 1 (PB1), polymerase acidic protein (PA), hemagglutinin (HA), nucleoprotein (NP), neuraminidase (NA), matrix proteins (M1 and M2), and nonstructural proteins (NS1 and NS2), respectively, which will be described in the subsequent sections. In addition, several novel accessory proteins of IAVs were identified that modulate viral pathogenicity, such as PB1-F2 [9] and PB1-N40 [10] encoded by the PB1 gene and PA-X [11], PA-N155, and PA-N182 [12] by the PA gene.

Evolution and antigenic variations of influenza viruses

The low-fidelity, “error-prone” RNA-dependent RNA polymerase (RdRp) of IAVs lacks 3’ to 5’ exonuclease proofreading capability, leading to a rapid mutation rate that ranges from 0.4 × 10⁻³ to 2.0 × 10⁻⁶ mutations per nucleotide per year, depending on virus strain and gene [13,14,15,16,17]. Although the outcome of most random mutations is detrimental or lethal, non-deleterious mutations may be preserved and subsequently amplified in the population if they confer a fitness advantage [18]. High mutation frequencies and within-host selective pressures create quasi-species [19,20,21,22], defined as a proliferating population of non-identical but closely related viral genomes, as seen with most RNA viruses, including influenza viruses [23, 24]. Some mutations are positively selected because they allow a virus to escape host antibody neutralization or to replicate more efficiently, leading to virus variants becoming predominant in the population [25]. Population-level fitness has also been shown to be increased by cooperative interactions between variants within a quasi-species [26,27,28,29,30].

However, the overall mutation (at the nucleotide sequence level) and amino acid substitution (at the protein sequence level; from nonsynonymous mutations) frequencies are a complex association of factors that are genus-, strain-, and gene-specific and are even environmentally influenced (i.e., temperature or pH). These within- and between-host immune selection pressures result in variable evolutionary rates [13]. The phenomenon that amino acid substitutions accumulating on surface glycoproteins of influenza viruses gradually alter their antigenicity is referred to as “antigenic drift,” which allows influenza viruses to evade immune pressures from their hosts and is responsible for seasonal influenza epidemics that necessitate annual vaccine reformulations.

Unlike mutations, reassortment results in genome restructuring. Reassortment occurs when two strains from a shared genus infect the same host cell and produce a novel viral genotype, i.e., an assembly of segments from each “parental” strain. As is the case of random mutations, most reassortant events are deleterious, usually due to segment incompatibility [31]. When reassortment leads to the introduction of a novel HA and/or NA gene into a naive population (a population without existing immunity), it is commonly referred to as “antigenic shift” [32]. Antigenic shift, in combination with sustained human-to-human transmission, is a requirement for the emergence of an influenza pandemic strain. Reassortments have led to the emergence of the 1957, 1968, and 2009 IAV pandemics [33,34,35,36], contributed to the severe epidemics of 1947, 1951, and 2003, and facilitated the rise in antiviral drug resistance [37].
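The combinatorics behind reassortment can be illustrated with a minimal Python sketch. It assumes two co-infecting parental strains (labeled "A" and "B" here purely for illustration) with the eight segments listed in the introduction, and it enumerates every possible segment constellation. The helper names are hypothetical, and the sketch says nothing about which genotypes are viable; as noted above, most reassortants are deleterious.

```python
from itertools import product

# Segment names for IAVs and IBVs, as listed in the introduction.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]


def enumerate_genotypes(parent_a="A", parent_b="B"):
    """Return every segment constellation two co-infecting parents could yield."""
    return [
        dict(zip(SEGMENTS, combo))
        for combo in product((parent_a, parent_b), repeat=len(SEGMENTS))
    ]


def is_reassortant(genotype):
    """A genotype is reassortant if it mixes segments from both parents."""
    return len(set(genotype.values())) > 1


def carries_novel_surface_gene(genotype, novel_parent="B"):
    """Illustrative screen: HA and/or NA derives from the 'novel' parent."""
    return genotype["HA"] == novel_parent or genotype["NA"] == novel_parent


genotypes = enumerate_genotypes()
reassortants = [g for g in genotypes if is_reassortant(g)]
shift_candidates = [g for g in reassortants if carries_novel_surface_gene(g)]

# 2**8 = 256 constellations, 254 of which are reassortant (2 are parental).
print(len(genotypes), len(reassortants), len(shift_candidates))
```

Of the 254 reassortant constellations, 191 carry HA and/or NA from the parent labeled "B"; whether any of them would represent an antigenic shift still depends on population immunity and transmissibility, which this enumeration does not model.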

The influenza vaccine is the most viable option for counteracting and reducing the impact of influenza outbreaks [38]. Since the antigenicity of influenza viruses is continuously changing, new strains can potentially escape the immunity of previously exposed hosts [39, 40]. To account for antigenic drift, the composition of the influenza vaccine is revised almost yearly, separately for the Northern and Southern Hemispheres, and incorporates contemporary representatives of circulating viruses identified by continuous global monitoring and surveillance led by the World Health Organization’s (WHO) Global Influenza Surveillance and Response System (GISRS) [41].

Antigenic epitopes of HA and NA

HA and NA, which are present in a ratio ranging from 4:1 to 10:1, are critical for viral attachment and release, respectively [42]. HA is a homo-trimeric glycoprotein containing a globular head subdomain spanning the primary HA sequences, and a stalk subdomain containing the N and C termini of HA1 and most HA2 sequences (Fig. 1A). As the major target for immune responses, the HA globular head has high plasticity and is highly tolerant of amino acid substitutions and glycosylation alterations. Under selective pressure from the host immune system, the HA globular head subdomain has the highest amino acid substitution rate among all influenza virus proteins [43]. In comparison, the HA stalk subdomain is much less accessible to antibodies and is more conserved than the globular head subdomain [44].

Fig. 1

Hemagglutinin (HA) structure and antigenic binding sites. (A) Structure of HA protein. (B) The five antigenic sites (i.e., Sa, Sb, Ca1, Ca2, and Cb) of H1 (A/California/04/2009; PDB 3UBE). (C) The five antigenic sites (i.e., A–E) of H3 (A/Aichi/2/1968; PDB 2YPG). (D) The four antigenic sites of influenza B viruses (B/Hong Kong/8/1973; PDB 3BT6)

Five major antibody binding sites (ABSs) are located in the HA1 globular head subdomain and are termed sites Sa, Sb, Ca1, Ca2, and Cb for subtype H1 IAVs (Fig. 1B) [45, 46] and as sites A, B, C, D, and E for subtype H3 IAVs [47, 48] (Fig. 1C). The corresponding five ABSs are reported in other subtype IAVs, particularly in subtype H5 viruses, which have been associated with rapid antigenic drift [49]. Three ABSs overlap with HA receptor binding sites (RBSs), including the residues in the 130-loop (i.e., 135, 136, 137, 138, 153) with ABS site A, the 190-helix (i.e., 186, 190, 194, 195) with site B, and the 220-loop (i.e., 226 and 228) with site D [50]. Compared to IAVs, IBVs undergo less antigenic drift. The ABSs of IBV HA are located in the 120-loop, 150-loop, 160-loop, and 190-helix (Fig. 1D) [51].

In contrast, NA is a homo-tetrameric glycoprotein that is responsible for removing terminal sialic acids and releasing virions from infected cells, mucus, or other substances. The enzyme active site, which comprises catalytic sites (i.e., R118, D151, R152, R224, E276, R292, R371, and Y406; N2 numbering) that directly interact with the substrate and framework sites (i.e., E119, R156, W178, S179, D/N198, I222, E227, H274, E277, N294, and E425), is conserved in all NA subtypes [52, 53]. However, the antigenic sites of NA are not fully understood. Antigenic epitopes have been partially mapped by sequence changes in escape mutant selection with monoclonal antibodies and by NA-antibody complex crystallography (Table 1). Amino acid substitutions at positions surrounding the enzyme’s active site can allow escape from NA antibody binding and from inhibition of NA activity [68]. Recently, positions located laterally to the enzyme active site, such as positions 329 and 390, have also been associated with antigenic drift [69, 70].

Table 1 Amino acids essential for mAb binding to NA

Antigenic drift of seasonal influenza viruses

Contemporary seasonal influenza viruses comprise four co-circulating and antigenically distinct viruses. These include two IAVs (H1N1 and H3N2 subtypes) and two IBVs (B/Victoria/2/87-like [Victoria] and B/Yamagata/16/88-like [Yamagata] lineages) (Fig. 2A). The H3N2 IAV has been circulating in the human population since it caused the 1968 pandemic. The 2009 H1N1 IAV [A(H1N1)pdm09] has been circulating in the human population since the 2009 pandemic, replacing the seasonal H1N1 virus that had circulated in humans for 32 years, from 1977 to 2009 [A(H1N1)season1977] [73]. Additionally, IBVs have been detected in the human population since their first isolation in 1940 [74], and the contemporary Victoria and Yamagata lineages of IBVs have been co-circulating in the human population since at least 1983 [75].

Fig. 2

Antigenic evolution of seasonal influenza viruses. (A) Timeline of influenza A and B virus pandemics and circulation in humans since 1918. There have been 4 pandemics during this period, which are indicated with arrows. The 1918 pandemic was caused by the H1N1pdm1918 virus, which circulated in humans until the 1957 H2N2 pandemic. The circulating H2N2 virus was then replaced by the 1968 H3N2 pandemic virus. In 1977, H1N1 reemerged, and it was replaced in 2009 by a reassortant H1N1pdm09 strain. H3N2 has been co-circulating with H1N1 since 1977. Influenza B viruses were first isolated in 1940 and have been co-circulating in humans at least since 1987 as two antigenically distinct lineages, Victoria and Yamagata. (B) Antigenic cartography of 39,370 seasonal influenza A(H3N2) viruses (1968–2016) (adapted from Han et al. [71]). A total of 16 antigenic clusters were identified during this time period. (C) Antigenic cartography of 13,591 human, swine, and avian influenza A(H1N1) viruses (adapted from Li et al. [72])

Over the past 50 years, H3N2 viruses have undergone frequent antigenic drift, and at least 18 antigenic variants have been reported (Fig. 2B and Table 2) [71]. The amino acid substitutions associated with antigenic variations are located in the five ABSs (A to E), but primarily in sites A and B, in which 7 residues neighbor the RBS (145 at site A and 155, 156, 158, 159, 189, and 193 at site B) [82]. A single amino acid substitution in HA can be sufficient to alter the antigenicity of a virus, although most antigenic drift events have involved amino acid substitutions across multiple positions at the same site or at two different sites [82]. In addition to those in HA, amino acid substitutions in NA can also cause antigenic drift. For example, amino acid substitutions at sites 329 and 390 of N1 led to the antigenic drift of A/Solomon Islands/3/2006 (H1N1) and A/Yokohama/94/2015 (H1N1) viruses, respectively [69, 70]. The acquisition of N-linked glycosylation at positions 245 and 247 (S245N/S247T) of N2 was associated with the poor vaccine performance against H3N2 viruses during the 2017–2018 influenza season [83].

Table 2 Antigenic drift events and associated amino acid substitutions in H3N2 (1968–2015) and H1N1 (1977–2009) seasonal influenza A viruses

Compared to H3N2 viruses, H1N1 viruses, including A(H1N1)season1977 (1977–2009) and A(H1N1)pdm09 (2009–present), undergo relatively less frequent antigenic drift events [84]. Between 1977 and 2009, at least six antigenic events were documented, starting with the A(H1N1)season1977 epidemic (compared with 10 antigenic events for H3N2 viruses during the same time period), and positions 156 at Sa; 192, 197, and 214 at Sb; 144 at Ca2; 77, 81, 82, and 94 at Cb; 273 and 276 at Pa; and 46 at Pb were shown to cause each of these events (Table 2) [72, 81]. Of note, Pa is located at the edge of the head and stem regions (i.e., positions 53, 55–59, 61, 271–280, 285, 297, 304, under H3 numbering) and Pb is located in the stem region (i.e., positions 23–25, 38–41, 45–48, 288–292, under H3 numbering) [85]. For H3N2 viruses, the HA protein mutates at a rate of 3.6 amino acid substitutions per year, compared to 2.5 amino acid substitutions per year for H1N1 viruses [76, 84]. Similar to those documented in H3N2 viruses, sites associated with antigenic variants are primarily located in ABSs within the protruding loops and helices of the HA molecule [47]. Of note, A(H1N1)pdm09 and A(H1N1)season1977 are antigenically distinct, and residues in the 153–157 region and N-linked glycosylation sites of HA have been shown to affect antigenic variations between these two viruses [86, 87]. Both A(H1N1)pdm09 and A(H1N1)season1977 are descendants of the 1918 H1N1 pandemic viruses (A[H1N1]pdm1918), and amino acid substitutions in the HA protein and N-linked glycosylation have been shown to affect antigenic variations among the three viruses [87]. Further attempts at characterizing antigenicity have led to the development and application of the Multi-Task Learning Sparse Group Lasso (MTL-SGL) machine learning method (discussed in a later section) by Li et al. [72] to identify key residues associated with antigenic changes for A(H1N1)season1977, A(H1N1)pdm09, and A(H1N1) swine influenza viruses. Of the 176 residues with amino acid substitutions, five at N-glycosylation sites and 78 additional residues have been associated with H1N1 antigenicity, of which 46 residues are located in ABSs [72].

Although infrequent, antigenic drift events have also been documented in both the Victoria and Yamagata lineages of IBVs. IBVs evolve more slowly than IAVs, at 2.0 × 10⁻³ substitutions/site/year [88], compared to 5.5 × 10⁻³ substitutions/site/year for H3N2 and 4.0 × 10⁻³ substitutions/site/year for H1N1 [89]. Between the two IBV lineages, the Yamagata lineage is more conserved, whereas the Victoria lineage is under greater positive selective pressure and experiences greater antigenic drift [88]. Antigenic cartography showed antigenic variations of the Victoria lineage during the period of 2002–2013, and the amino acid substitutions at the RBS, such as residues 142, 161, 218, 223, 224, and 272, were often found to be associated with the antigenic changes of IBVs [88]. Deletions have occasionally been detected in H3N2 and H1N1 IAVs during virus infection [90], whereas nucleotide deletions and/or insertions in HA and/or NA have been observed more frequently in epidemic IBVs [91, 92]. For example, among multiple genetic clades reported between the 1940s and the 1980s, multiple deletions were observed in HA proteins: compared with clade III (the precursor virus for the Victoria lineage), the viruses in clade I, which includes B/Lee/1940, the prototype strain of IBV, and the viruses in clade II (the precursor viruses for the Yamagata lineage) possessed a single amino acid deletion at position 163 on the HA protein [91]. Along with a deletion at 163, the precursor viruses of the Yamagata lineage that circulated before 1988 harbored double (positions 163 and 165) or triple (positions 162–164) deletions on HA, whereas no deletions were observed in the precursor Victoria viruses [91]. Of interest, since the period of 2011–2015, HA variants with double (positions 162–163) or triple (positions 162–164) deletions have been detected in the Victoria lineage IBVs [93].
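To give a sense of scale for the evolutionary rates quoted above, the following Python sketch converts a per-site rate into an expected number of nucleotide substitutions per year for a gene of a given length. The gene length of 1,700 nucleotides is an illustrative assumption (roughly the size of an HA gene), not a value from the cited studies, and the sketch further assumes that the quoted rates apply uniformly across that gene.

```python
# Substitution rates quoted in the text (substitutions/site/year).
RATES = {"H3N2": 5.5e-3, "H1N1": 4.0e-3, "IBV": 2.0e-3}

# ASSUMPTION: an illustrative gene length in nucleotides; real segment
# lengths differ by type, subtype, and gene.
GENE_LENGTH_NT = 1700


def expected_substitutions_per_year(rate, length_nt=GENE_LENGTH_NT):
    """Expected nucleotide substitutions per year across the whole gene."""
    return rate * length_nt


def relative_rate(virus, baseline="IBV"):
    """How many times faster `virus` evolves than the baseline virus."""
    return RATES[virus] / RATES[baseline]


for virus, rate in RATES.items():
    print(
        f"{virus}: ~{expected_substitutions_per_year(rate):.1f} substitutions/year, "
        f"{relative_rate(virus):.2f}x the IBV rate"
    )
```

Under these assumptions, the quoted H3N2 rate is 2.75 times the IBV rate and the H1N1 rate is twice the IBV rate; the absolute counts scale linearly with whatever gene length is assumed.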

Because influenza viruses are RNA viruses, genetic variants continually emerge in the human population, and it is not uncommon to see multiple genetic variants co-circulating simultaneously. In both IAVs and IBVs, a genetic variant may become predominant in the population. Antigenic variants of seasonal IAVs have emerged from these genetic variants and co-circulate in human populations for a short period of time until one antigenic variant gradually dominates the viral population and replaces the others. For example, during the 2014–2015 influenza season, multiple genetic clades (i.e., Clades 3C.2a and 3C.3a) emerged from A/Texas/50/2012 (H3N2) (TX/12)-like viruses and co-circulated. Among them, two subclades (Clade 3C.3a, A/Switzerland/9715293/2013 [SWZ/13]-like, and Clade 3C.2a, A/Hong Kong/4801/2014 [HK/14]-like) are antigenically distinct from TX/12-like viruses. Amino acid substitutions N145S-N225D-A138S-F159S and N145S-N225D-N144S-F159Y-Q311H were shown to drive the emergence of SWZ/13 and HK/14, respectively, from TX/12, and all of these amino acid substitutions are located in ABSs A or B. Thus, viruses with N145S-N225D likely served as intermediate precursors for SWZ/13 and HK/14 viruses [71]. In the following 2015–2016 season, the majority of epidemic strains were HK/14-like viruses, with SWZ/13-like viruses co-circulating at a lower proportion [94].
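The inference that N145S-N225D viruses were shared intermediates can be made explicit by intersecting the two substitution chains. The Python sketch below is illustrative only: it assumes the two chains quoted above correspond to SWZ/13 and HK/14 in the order listed, and the helper names are hypothetical.

```python
import re

# One substitution token: original residue, HA position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(chain):
    """Split a chain such as 'N145S-N225D' into (original, position, new) tuples."""
    parsed = set()
    for token in chain.split("-"):
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"Unrecognized substitution: {token!r}")
        original, position, replacement = match.groups()
        parsed.add((original, int(position), replacement))
    return parsed


def labels(substitutions):
    """Format substitutions as strings, ordered by HA position."""
    return [f"{o}{p}{r}" for o, p, r in sorted(substitutions, key=lambda s: s[1])]


# Chains reported for the emergence of each variant from TX/12.
swz13 = parse_substitutions("N145S-N225D-A138S-F159S")
hk14 = parse_substitutions("N145S-N225D-N144S-F159Y-Q311H")

shared = labels(swz13 & hk14)    # candidate intermediate-precursor changes
swz_only = labels(swz13 - hk14)  # changes specific to SWZ/13
hk_only = labels(hk14 - swz13)   # changes specific to HK/14

print("shared:", shared)
print("SWZ/13 only:", swz_only)
print("HK/14 only:", hk_only)
```

The shared set is N145S and N225D, while position 159 differs in the replacement residue (F159S versus F159Y), so it is counted as lineage-specific in both variants.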

Intra-lineage reassortments can occur among co-circulating antigenic variants for both IAVs and IBVs. Intra-lineage reassortments were likely responsible for the 1947 and 1951 A(H1N1) epidemics, as the epidemic viruses comprised genome segments with separate phylogenetic histories when separate phylogenetic trees were inferred for each gene segment [33]. Likewise, intra-lineage reassortment has been identified on the HA and NA surface glycoproteins of A(H3N2) [95]. Intra-lineage reassortment has also been identified in both Victoria and Yamagata lineages of influenza B, particularly in the HA, NA, and NP phylogenies [96, 97]. These intra-lineage reassortments of influenza viruses may contribute towards the antigenic evolution of IAVs and IBVs and cause epidemics. However, intra-lineage reassortment occurrences in H3N2 are undergoing overall negative selection, suggesting that this phenomenon may be rare [95].

Antigenic drift of avian influenza viruses (AIVs)

IAVs have been recovered from more than 105 wild bird species [98]. Migratory waterfowl, including birds in the orders Anseriformes (e.g., ducks, geese, swans) and Charadriiformes (e.g., gulls, terns, and waders), are considered to be the major natural reservoirs of IAVs [99]. Sixteen IAV HA (H1–H16) and nine NA (N1–N9) subtypes have been recovered from migratory waterfowl. Among wild birds, the prevalence of IAV infections can be up to 30% [99], and transmission typically occurs when they are exposed to viruses in the feces of infected animals [100, 101]. The antigenic evolution in migratory waterfowl may be static [102], as supported by recent studies showing minimal antigenic diversity among H3 and H7 IAVs in North American migratory waterfowl [103, 104].

Domestic poultry can be infected with non-pathogenic, lowly pathogenic, and highly pathogenic IAVs. There are a variety of HA subtypes that are less virulent and do not cause death from experimental infection. However, H5 and H7 subtypes can be highly virulent (although most H5 or H7 viruses are lowly pathogenic for poultry) and can cause a 100% death rate in experimental infections [105]. One example of a virulent H5 is the A/goose/Guangdong/1/1996 (H5N1) outbreak in 1996 [106]. The avian A/goose/Guangdong/1/1996(H5N1) was isolated from farmed geese and is the precursor for the H5N1 highly pathogenic avian influenza viruses (HPAIVs) that are currently in circulation and have spread globally to animals [106] with sporadic spillover cases to humans [107, 108].

The emergence of antigenic variants in domestic poultry has been shown to occur after escaping from immune responses generated by vaccination [109], particularly in H5 and H7 IAVs. Limited studies showed that avian influenza vaccinations led to antigenic drift of the H7N3 virus in Italy [110], H5N2 virus in Mexico [109], and H9N2 virus in Korea [111], and that H5N1 viruses rapidly evolved with increased selection in H5N1-vaccinated populations [112]. Since 2002, at least 15 countries have implemented avian influenza vaccinations for preventive and/or emergency programs [113]. Between 2002 and 2010, more than 113 billion vaccine doses were used, and China alone used over 90.9% of the total doses. For A/goose/Guangdong/1/1996(H5N1)-like H5 viruses, at least 20 antigenically different genetic clades or subclades of viruses have been reported [114, 115] and new antigenic variants are still emerging [116]. In China, similar observations, including both antigenic and genetic changes, were seen in enzootic H9N2 viruses [117,118,119,120,121,122,123,124,125,126,127]. However, whether avian influenza vaccinations facilitate the emergence of novel genetic and antigenic variants is not fully understood.

Similar to human seasonal influenza viruses, antigenic changes in AIVs are associated with amino acid substitutions in the ABSs or near the RBS of the HA protein. For example, in Italy, five amino acid substitutions (G133E, A151T, G177V, Q201H, and T112A) were identified between the antigenic variant and parent strains of H7N3 viruses during 2002–2004, and four of them reside in ABSs [128]. Of interest, for H7N9 viruses, although there was a greater than 4-fold reduction in HI titers between the viruses in the fifth wave and those in the first wave [129], in vitro and in vivo studies both showed that heterologous antisera can neutralize virus infection [130]. Further analyses showed that the reduction in HI titers is predominantly driven by increased receptor-binding avidity to red blood cells conferred by the amino acid substitution L226Q [131].
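As a worked illustration of the fold reductions discussed above, the following Python sketch computes the fold change between two HI titers and expresses it in log2 units, which is natural because HI titers are read from two-fold serial dilutions. The titer values are hypothetical, and the 4-fold figure is taken only from the comparison quoted in the text; it is not presented here as a formal cutoff.

```python
import math


def hi_fold_reduction(reference_titer, test_titer):
    """Fold reduction in HI titer of an antiserum against a test virus."""
    if reference_titer <= 0 or test_titer <= 0:
        raise ValueError("HI titers must be positive")
    return reference_titer / test_titer


def log2_units(fold):
    """Express a fold change as a number of two-fold dilution steps."""
    return math.log2(fold)


# HYPOTHETICAL titers: antiserum raised against an earlier virus, tested
# against that virus (reference) and against a later virus (test).
reference_titer = 640
test_titer = 80

fold = hi_fold_reduction(reference_titer, test_titer)
print(f"{fold:.0f}-fold reduction = {log2_units(fold):.0f} two-fold dilution steps")
print("greater than 4-fold:", fold > 4)
```

As the H7N9 example shows, a large fold reduction in HI titer does not by itself establish antigenic change, because receptor-binding avidity can also lower HI titers.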

Antigenic drift of swine influenza viruses (SIVs)

Swine may also play a role in the reassortment and adaptation of novel zoonotic viruses. It is possible for a novel virus with pandemic potential to cause a spill-over event into the human population [51, 132, 133]. Influenza surveillance in swine is important for pandemic prevention and the protection of public health. SIVs not only cause outbreaks among swine, but can also be transmitted to humans, causing sporadic infections and even pandemic outbreaks in addition to the human seasonal IAVs. Domestic swine are thought to be the intermediate host for avian-origin IAV adaptations to human infections [134,135,136,137]. Both avian-like H1N1 and H3N2 viruses have been isolated from domestic swine [137,138,139]. Avian-origin IAVs of subtypes H1 to H13 have been shown to infect and replicate in swine under experimental conditions [137], and direct avian-to-swine transmission of H2N3, H3N1, and H4N6 viruses has been documented in nature [140,141,142,143,144,145,146,147,148,149].

Genetically diverse H1N1, H1N2, and H3N2 IAVs were detected during influenza surveillance studies in domestic swine from 2009 to 2012 [144,145,146,147]. The HA genes of H1 swine viruses are grouped into three genetic lineages: 1A (classical swine H1N1 (cH1N1) lineage), 1B (human seasonal lineage), and 1C (Eurasian avian lineage) (reviewed in [150]). The 1A lineage, which was derived from the 1918 pandemic H1N1 viruses, is detected globally and is further grouped into α, β, or γ clades. The γ clade includes the A(H1N1)pdm09 virus. Of note, since the 2009 pandemic, A(H1N1)pdm09 has been repeatedly reintroduced into swine populations and has reassorted with the enzootic H1 and H3 swine viruses. The 1B lineage includes the viruses that spilled over from humans into swine in North America, and this lineage evolved into 1B.2.2 and 1B.2.1 (formerly named δ1 and δ2 subclades, respectively). In the past decade, there were several reports of either H1N1 or H1N2 viruses in South America and Vietnam with an HA gene from the 1B lineage viruses [151,152,153,154]. The 1C lineage viruses were derived from an avian H1N1 virus, which was first detected in Europe and then Asia and has remained a predominant lineage across Eurasia [150].

In contrast to the diverse sources of swine H1N1 viruses, swine H3N2 viruses are associated with multiple introductions of seasonal influenza viruses from humans [150]. The contemporary H3N2 viruses in the European swine population (“Cluster-IV”, 1990.4 lineage) were suggested to be caused by a spillover event from humans in the 1970s [155, 156], and those in the North American swine population were caused by spillover from humans during the 2010–2011 (2010.1 lineage) or 2016–2017 (2010.2 lineage) influenza seasons, which have not replaced Cluster-IV viruses in the US [150, 157, 158]. The H3N2 viruses are grouped into four major lineages, including two genetically and antigenically distinct lineages, C-IV H3 and 2010.1, which emerged and became enzootic in the past decade [158, 159]. Antigenic characterization studies demonstrate that these genetically diverse H1 and H3 viruses are antigenically distinct and show different extents of cross-reactivity in serologic assays [160, 161].

Influenza vaccination

In humans, vaccination is the primary option for counteracting and reducing the effects of influenza outbreaks [38]. However, national influenza vaccination policies differ from country to country. For example, in the US, the annual influenza vaccination has been recommended for persons ≥6 months of age since 2010 [162], especially for the elderly, very young children, pregnant women, and those with certain chronic medical conditions [163,164,165]. China [166], Canada [167], Australia [168], Estonia, and Poland follow the same national immunization guidelines as those of the US, whereas other European countries have vaccination recommendations for more specific populations or do not recommend influenza vaccinations for healthy children (reviewed in ref. [169]). Vaccine coverage during the past few years has reached ~60% in some regions (e.g., the US, Europe, Australia, and Japan). However, in other regions (e.g., Africa), influenza vaccine coverage has been non-existent or minimal. In the US, during the 2009–2010 through 2015–2016 influenza seasons, vaccination rates varied from 43.7 to 59.3% among children and from 38.8 to 43.6% among adults [170]; in France, during the 2012–2013 through 2014–2015 influenza seasons, vaccination rates were approximately 32% in the population that is targeted for vaccinations (ages 65 years and above, pregnant women, individuals with chronic respiratory diseases, and those with other comorbidities considered to place individuals at risk for severe influenza illness), varying from 42.7 to 44.2% in those ≥ 65 years of age [171]. Low vaccination rates and additional factors have led to the wide variations of influenza vaccine performance between seasons, and reduced vaccine effectiveness against some viruses has been observed (Table 3) [176, 182].

Table 3 The adjusted vaccine effectiveness in the US from 2010–2011 to the 2019–2020 influenza seasons

The performance of vaccines is assessed by vaccine effectiveness (VE), which is defined as the reduction in influenza risk among vaccinated individuals compared to unvaccinated individuals after adjusting for confounding factors. In the past decade, vaccines showed suboptimal effectiveness (<50%) against predominant circulating viruses in 8 of 10 seasons for a variety of reasons; VE peaked at 60% during the 2010–2011 season and 52% during the 2013–2014 season and fell as low as 19% during the 2014–2015 season [183]. The low VE of the 2014–2015 influenza season was attributed to a mismatch between the circulating and vaccine A(H3N2) strains, but the vaccine was effective against influenza B [176]. As for the remaining seasons, during the 2018–2019 season, the overall VE was 29%, with 44% VE against A(H1N1)pdm09 but only minor protection against A(H3N2), which had become the predominant influenza virus towards the latter part of the season [180]. Similarly, during the 2017–2018 (VE=38%) and 2016–2017 (VE=40%) seasons, the VE against A(H3N2) viruses was lower than that against A(H1N1)pdm09 and B viruses, which was speculated to be due in part to egg-adaptive amino acid substitutions in the HA protein resulting in poor inhibition of the circulating A(H3N2) viruses [178, 179, 184]. The 2015–2016 season (VE=48%) saw reduced VE of the live attenuated vaccine among children [177]. The 2012–2013 season (VE=49%) also had reduced VE against A(H3N2) among older adults, speculated to be due to egg-adapted mutations, immunosenescence, or prior influenza exposures [174]. Finally, the 2011–2012 season (VE=47%) similarly saw lower VE against A(H3N2), potentially due to antigenic drift or incongruent influenza exposure history [173].
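The definition of VE as a relative reduction in risk can be written as VE = (1 − ratio) × 100%, where the ratio compares vaccinated with unvaccinated individuals. The Python sketch below is a crude illustration that assumes a case-control layout and uses an unadjusted odds ratio with hypothetical counts; the published estimates cited above are adjusted for confounding factors, which this sketch does not attempt.

```python
def odds_ratio(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Crude odds ratio of vaccination among cases versus controls."""
    if min(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls) <= 0:
        raise ValueError("all counts must be positive")
    return (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)


def vaccine_effectiveness(ratio):
    """VE in percent: the relative reduction implied by a risk or odds ratio."""
    return (1.0 - ratio) * 100.0


# HYPOTHETICAL counts, chosen only to make the arithmetic easy to follow.
ratio = odds_ratio(
    vacc_cases=200,       # influenza-positive, vaccinated
    unvacc_cases=400,     # influenza-positive, unvaccinated
    vacc_controls=500,    # influenza-negative, vaccinated
    unvacc_controls=500,  # influenza-negative, unvaccinated
)
ve = vaccine_effectiveness(ratio)
print(f"odds ratio = {ratio:.2f}, crude VE = {ve:.0f}%")
```

With these counts the odds ratio is 0.50 and the crude VE is 50%; a ratio of 1 corresponds to a VE of 0%, and ratios above 1 yield negative VE values.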

Although available only after a vaccine has been deployed, VE data play a retrospective yet critical role in vaccine strain selection. A low VE that indicates antigenic mismatch provides an opportunity to update the mismatched component(s) for the following influenza season.

Vaccine effectiveness has been shown to be affected by host factors including age, sex, comorbidities, and pre-existing immunity [185, 186], antigenic match [187,188,189], and the type of vaccine [188, 190]. Among these factors, identification of vaccine strains antigenically matching the epidemic strain is the key to a successful vaccination program [14]. The quadrivalent vaccine comprises all four co-circulating antigenically distinct strains, whereas the trivalent vaccine contains A(H1N1), A(H3N2), and only one of the two circulating IBVs. However, studies have shown that the inactivated quadrivalent influenza vaccines may provide limited overall benefit against influenza B illness when the IBV lineage in the trivalent vaccine matches the circulating B lineage [191, 192]. The global influenza surveillance network, which is coordinated by the WHO [193], characterizes antigenic and genetic properties of the epidemic influenza viruses as well as the prevalence patterns of these viruses, and then predicts which strains will be predominant during the upcoming influenza season [194, 195]. After the vaccine strains are determined, high-yield strains of influenza viruses often need to be generated and optimized for vaccine production, although occasionally, wild-type strains (e.g., IBV) can be directly used for vaccine production [196, 197].

In domestic poultry, influenza vaccines are used not only for the prevention of disease and death, but also for the prevention of infection or reduction of viral replication in respiratory and digestive tracts to limit viral spread to uninfected flocks [198,199,200,201]. Currently, vaccines against H5 or H7 are available for a wide range of species including chickens, turkeys, ducks, geese, and zoo birds [200]. A bivalent H5 vaccine targeting clades 2.3.4.4 and 2.3.2.1 was replaced by a bivalent H5/H7 vaccine in 2017 [202], with a 97.9% coverage rate in Guangdong, China, and varying degrees of coverage in other Chinese provinces for a variety of bird species [203]. While there was no difference in protection between the prior H5 vaccine and the H5/H7 vaccine against H5 positive rates in humans, H7 positivity was reduced by 98% [203]. Additionally, the H5/H7 vaccine has been shown to reduce virus shedding and mortality in chickens [202]. Challenges in poultry vaccination against H5 and H7 influenza viruses include antigenic drift and potential cross-species transmission.

In domestic swine, vaccination is commonly used in the US. Swine vaccines are typically available as licensed commercial products consisting of culture-derived virions in crude allantoic fluid from specific-pathogen-free (SPF) chicken eggs, which are then chemically inactivated and formulated into a mineral oil emulsion vaccine [204]. Current swine influenza vaccines are strain-specific, consisting of two or more H1 and H3 isolates, and fail to induce cross-protection against genetic and antigenic virus variants [148]. Unlike human vaccines, this process avoids costly purification steps for the enrichment of the surface glycoproteins HA and NA [205], but booster vaccinations are necessary in order to achieve and maintain protective levels of systemic hemagglutination-inhibiting antibodies [206].

Selection of vaccine strains

The GISRS was established to monitor the evolution and spread of influenza viruses and is composed of 144 national influenza centers (NICs) from 123 member states, six WHO collaborative centers (WHOCC), and numerous vaccine companies [207] (Fig. 3). In the US, more than 70 participating labs in the National Respiratory and Enteric Virus Surveillance System are involved. Tens of thousands of samples are collected from patients with influenza-like illnesses year-round and sent to one of the NICs. NICs identify the type and subtype of the samples and/or isolate them, and then send representative isolates and/or the original swab samples to one of the WHOCCs. WHOCCs perform genome sequencing and antigenic analyses on these virus samples and isolates [208, 209]. Based on the epidemiological and antigenic data, the GISRS committee meets in February (Northern hemisphere) and September (Southern hemisphere) and recommends vaccine strains to be used in the next influenza season [210].

Fig. 3

Schematic of the influenza surveillance, vaccine strain-selection, and vaccine production process. (A) The process by the Global Influenza Surveillance and Response System (GISRS) was detailed in [208]. (B) Timeline of influenza vaccine production. The GISRS vaccine strain selection, production, approval process, and distribution in both the Northern and Southern Hemispheres are shown with the corresponding time of year that each procedure is performed

After the selection of vaccine components, the WHOCCs generate the virus seeds of each vaccine component and issue them to vaccine manufacturers. For inactivated vaccines, IAVs require the combination of HA and NA genes from the candidate vaccine viruses with six internal genes from the A/Puerto Rico/8/34 strain by using reverse genetics to improve growth in eggs. For live attenuated vaccines, influenza A and B viruses require reassortment between the HA and NA genes from the candidate vaccine viruses and six internal genes from a cold-adapted master donor virus (A/Ann Arbor/6/60 [H2N2] or A/Leningrad/134/17/57 [H2N2] for IAVs; B/Ann Arbor/1/66 or B/USSR/60/69 for IBVs) [211]. After the growth and antigenic characteristics are qualified, the corresponding vaccine virus seeds are used to produce vaccines. Vaccine production can take more than 6 months. Vaccination starts in September for the Northern Hemisphere and in April for the Southern Hemisphere.

During these meetings, zoonotic influenza viruses, especially those with pandemic risks, are also discussed to update the vaccine strains used in pandemic preparedness. The procedure is similar to those for seasonal influenza viruses, and the current viruses include subtype H5, H7, and H9 AIVs and subtype H1N1 and H3N2 SIVs, all of which are enzootic in animals and have caused sporadic spillovers to humans [212]. The US Centers for Disease Control and Prevention (CDC) and WHO developed the Influenza Risk Assessment Tool (IRAT) and Tool for Influenza Pandemic Risk Assessment (TIPRA) to assess the emergence risk and public health impact risk of a novel (i.e., new in humans) IAV [213, 214]. The IRAT scores the risk by using 10 criteria based on biologic domain knowledge (e.g., virus properties, such as changes with known molecular signatures, receptor binding, transmission potential in laboratory animals, and drug susceptibility/resistance); population attributes (i.e., existing immunity, susceptibility to infection, severity of illness, and antigenic relationship to vaccine candidates); and virus ecology and epidemiology (i.e., global distribution, infections in animals, and infections in humans). IRAT then prioritizes the risks for enzootic IAVs by using laboratory-accumulated data. Similar to the IRAT, TIPRA evaluates the pandemic risk by assessing the likelihood for a virus to cause sustained human-to-human transmission based on virus properties, attributes (i.e., population immunity) in the human population, and virus ecology and epidemiology in animals [213,214,215,216].

However, selection of an influenza vaccine is not trivial. The influenza virus has a high mutation rate, and antigenic variants continually emerge in human populations. During the 6-month window between vaccine strain determination and subsequent vaccine development, antigenic drift can occur, leading to mismatch of the recommended vaccine strains. In addition, virus isolation and antigenic characterization are labor-intensive, and it is not possible to collect all potential influenza samples and include them into the analyses. Thus, even an antigenic variant that emerges before the GISRS committee meeting, especially those in the early stages of an epidemic, could be missed because of low prevalence or limited samples included in the laboratory analyses. To overcome sampling challenges, the CDC and Association of Public Health Laboratories (APHL) developed an Influenza Virologic Surveillance Right Size Roadmap, which provides tools to optimize sampling methods, disease surveillance, response and control efforts, and policy decisions [217]. An additional challenge is that the viruses can gain adaptive nonsynonymous mutations during isolation and/or propagation [77, 218], and these resulting amino acid substitutions can complicate antigenic analyses of epidemic strains in humans.

In addition to antigenic mismatch, a selected strain could produce low yields during vaccine production. A low-yield vaccine strain requires additional laboratory work, such as mutagenesis or egg passage, to increase its growth rate, which could significantly delay vaccine production and delivery. For example, the 2009 H1N1 seed strain (available on May 27, 2009) was a low-yield strain and required five WHO labs over 2 months to prepare it for production (August 6, 2009) [219]. Because of this delay, vaccine-derived immunity arose only after the second wave of the pandemic. Without such a delay, the vaccine might have reduced the impact of the second wave, and it is likely that many fewer infections and deaths would have occurred. Since different strains with similar antigenic profiles could have different growth abilities in eggs, selection of a high-yield strain with an antigenic match is an important goal of vaccine strain selection.

An ideal method for antigenic variant identification includes comprehensive virus sampling, use of clinical specimens or limited virus isolation and passaging to avoid egg adaptations, and decreased time between strain selection and vaccine production. The following section will review the conventional and recent methods for antigenic characterization of influenza viruses with a discussion of their challenges.

Antigenic characterization of influenza viruses

Serological methods commonly used in antigenic characterization

Serological tests, such as hemagglutination inhibition (HI), neuraminidase inhibition (NI), and neutralization assays and the enzyme-linked lectin assay (ELLA), are routinely used to determine antigenic properties and identify antigenic variants during influenza surveillance. However, these conventional methods require a larger quantity of virus than is typically available in clinical samples; thus, a live test virus typically needs to be isolated and propagated in either cells (e.g., Madin-Darby canine kidney, MDCK) or embryonated chicken eggs. Table 4 compares the principles of the common serological assays used in antigenic characterization.

Table 4 Comparison between serological assays used in antigenic characterization of influenza viruses

HI assay

The HA of influenza viruses binds to sialic acid glycan receptors [221] of red blood cells (erythrocytes) and can agglutinate erythrocytes. Such agglutination ability of influenza viruses can be used to quantify viruses by determining the maximal dilution of viruses, which can achieve visible erythrocyte agglutination. Antibodies can inhibit erythrocyte agglutination by binding to HA ABSs at the RBS. The HI assay measures how a test influenza antigen and a reference antigen (e.g., a serum for a current vaccine strain) match through the immunological reaction between the test antigen and the reference antiserum (Fig. 4A). With its simplicity and moderate throughput, HI has been widely applied (and is expected to continue being a useful tool) in antigenic analyses of influenza viruses during surveillance and vaccine strain selection.
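The endpoint logic of the HI readout can be sketched in a few lines of Python. This is only an illustrative sketch, not a surveillance protocol: the function names `hi_titer` and `dilution_for_hau`, the 1:10 starting dilution, and the boolean encoding of each well are assumptions made for the example. The titer is taken as the reciprocal of the highest serum dilution that still fully inhibits agglutination, and a second helper shows how a virus stock with a known HA titer would be diluted to the 4 HA units commonly used per well.

```python
def hi_titer(inhibited, start_dilution=10, fold=2):
    """Return the HI titer for one serum/virus pair.

    inhibited: list of booleans in dilution order, True when the well shows
               no agglutination (antibody blocked HA binding).
    start_dilution: reciprocal of the first serum dilution (assumed 1:10).
    fold: serial dilution factor (2-fold by default).
    Returns None when even the first well agglutinates.
    """
    titer = None
    dilution = start_dilution
    for well_inhibited in inhibited:
        if not well_inhibited:
            break  # first agglutinated well ends the series
        titer = dilution
        dilution *= fold
    return titer


def dilution_for_hau(ha_titer, target_units=4):
    """Dilution factor that brings a virus stock to the target HA units."""
    return ha_titer / target_units


# Wells at 1:10, 1:20, 1:40 are inhibited; 1:80 and 1:160 agglutinate.
print(hi_titer([True, True, True, False, False]))  # 40
print(dilution_for_hau(64))                        # 16.0
```

Stopping at the first agglutinated well is a deliberate choice in this sketch; a non-monotonic plate (an inhibited well after an agglutinated one) would normally be flagged for repeat testing rather than scored.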

Fig. 4

Conventional serological assays used in influenza antigenic analyses. (A) Hemagglutination inhibition (HI) assay. After mixing 4 hemagglutination (HA) units of virus and 2-fold serially diluted reference sera, red blood cells (RBCs) are added to the reaction. If the binding of viruses to RBCs is not inhibited by the antibody, the RBCs will agglutinate in the micro-titer plate. Otherwise, the RBCs are not agglutinated, forming (1) a button or a halo when using avian RBCs or mammalian RBCs or (2) a floating pellet (when the plate is tilted) using avian RBCs. The patterns of non-agglutinated chicken and turkey RBCs are shown in the top four wells of the left column and those of agglutinated chicken and turkey RBCs are shown in the bottom four wells. (B) Enzyme-linked lectin assay (ELLA). The mixture of a predetermined amount of virus and serially diluted reference sera is added to 96-well plates coated with fetuin, a liver protein with sialic acid and galactose at the glycan terminal, and then incubated overnight at 37°C. Peanut agglutinin conjugated to peroxidase (PNA-HRP) is then added, and the PNA-HRP binds to the galactose exposed by the removal of sialic acid through neuraminidase activity. If the neuraminidase is instead inhibited by the reference antibody, the PNA-HRP does not bind to the fetuin. Finally, the signal is detected by adding o-phenylenediamine dihydrochloride (OPD) substrate. (C) Micro-neutralization (MN) assay. Reference sera are serially diluted twofold and then mixed with viruses at a titer of 100 median tissue culture infectious doses (TCID50) per well. The mixtures are used to infect cells (e.g., MDCK cells or MDCK-SIAT cells). After 1 day of incubation at 37°C, an ELISA can be performed on the fixed cells using anti-NP antibodies [222]. (D) Focus reduction neutralization test (FRNT). Reference sera are serially diluted twofold and added to plates pre-seeded with MDCK or MDCK-SIAT cells, and viruses at a dose that causes a 20–85% infected cell population (ICP) are then added. After 3 h incubation at 37°C (influenza A virus) or 34°C (influenza B virus), the inoculum is removed, and the monolayers are overlaid with culture medium containing 1.2% (w/v) Avicel (FMC BioPolymer) and 2 μg/mL TPCK-trypsin. After 22-h incubation at 37°C (influenza A) or 28-h incubation at 34°C (influenza B), immunostaining is performed on the fixed cells using NP-specific antibodies, a peroxidase-conjugated secondary antibody, and TrueBlue substrate. The infected cell population is imaged with a flatbed scanner. The neutralization titer is expressed as the reciprocal of the antiserum dilution that reduces ICP by 80% [223]

The success of the HI assay depends on the agglutination ability of erythrocytes and thus is affected by the influenza virus–associated glycan receptor distribution on the erythrocytes used in the assay and the receptor binding properties of the test virus. Previous studies showed that turkey and chicken erythrocytes express α2,3- and α2,6-linked sialic acids, horse erythrocytes almost exclusively express α2,3-linked sialic acids [224], and guinea pig erythrocytes disproportionately express more α2,6- than α2,3-linked sialic acids [225]. Of these, turkey erythrocytes are commonly used and are still effective in antigenic analyses of human seasonal influenza viruses, namely IBVs and A(H1N1) viruses (i.e., both A(H1N1)season1977 and A(H1N1)pdm09). However, for antigenic analyses of seasonal A(H3N2) viruses, chicken erythrocytes were used until the mid-1990s [226]; turkey erythrocytes then replaced chicken erythrocytes until the 2004–2005 influenza season [34, 227, 228]; guinea pig erythrocytes were used from the 2004–2005 through the 2013–2014 influenza seasons [229]; and, after the 2010–2011 influenza season, the addition of an NA inhibitor has been suggested when using guinea pig erythrocytes [230, 231]. Multiple amino acid substitutions of the HA RBS [40] caused A(H1N1)season1977 [232, 233] and seasonal H3N2 IAVs [234] to lose their ability to bind to different species of erythrocytes that are typically used in influenza surveillance. For example, the amino acid substitutions at residues 193, 196, 197, and 225 in A(H1N1)season1977 in 1988 or later resulted in the loss of their ability to agglutinate chicken erythrocytes [235]. For H3N2 viruses, the amino acid substitutions at positions 190, 226, and 194 of HA were associated with the loss of the ability of the A(H3N2) viruses to agglutinate chicken erythrocytes [234, 236,237,238,239].
Since the 2004–2005 influenza season, A(H3N2) viruses acquired Asp225Asn in HA, which caused the loss of their binding abilities to turkey erythrocytes [227, 230]; this may be attributed to the inability of viral binding to short oligosaccharides terminated with sialic acids (i.e., those having only one or two N-acetyl-lactosamine repeating units), which are mainly expressed by chicken or turkey erythrocytes [240]. As an extreme example, a large portion of A(H3N2) viruses, especially clade 3C.2a viruses, have lost the capacity to agglutinate chicken, turkey, and guinea pig erythrocytes [229, 241]; thus, they cannot be antigenically characterized by HI assays. In addition, the 3C.2a A(H3N2) viruses showed different adaptation patterns when using different types of MDCK cell lines: when passaged in conventional MDCK cell lines (CCL-34; ATCC), the 3C.2a viruses may acquire amino acid changes on HA and/or NA and regain the ability to agglutinate turkey and/or guinea pig erythrocytes, which likely affects antigenic analyses, but such changes were not observed with the MDCK-SIAT1 cell line [229]. The MDCK-SIAT1 cells, which are stably transfected with human CMP-N-acetylneuraminate:β-galactoside α-2,6-sialyltransferase, with increased SA2,6Gal but decreased SA2,3Gal expression [242], have been recommended to be used for virus isolation and propagation of current H3N2 viruses.

Of interest, NA activities were then found to affect viral erythrocyte agglutination in H3N2 viruses (e.g., those between 2005 and 2009) and can affect HI results [231]. Amino acid substitutions T148I and D151G were found to be artifacts of virus isolation and propagation in MDCK cells and associated with the NA-mediated viral erythrocyte-binding [231]. Subsequent studies reported that substitutions H150R [243] and G147R [244] on the NA protein were also associated with the phenomenon. Another study suggested that the HI assay measures the effect of antibodies against NA rather than HA for A(H3N2) viruses [243]. Thus, over time, to reduce the non-antigenic effects of virus variations, the protocols for HI assays have been updated by using neuraminidase inhibitors (e.g., oseltamivir) in HI assays using guinea pig erythrocytes [231].

In addition to the animal species, the sources and even batches of erythrocytes may impact the results of HI assays in practice, causing such data to be notoriously noisy and difficult to interpret, integrate, and reproduce between laboratories [245,246,247]. In addition, variation in how the HI protocol is implemented can generate additional noise in HI analyses. In the past decades, various instruments have been used to improve the accuracy and throughput of HI assays, such as Cypher One developed by Wilson et al. in 2017 [248] and High-Throughput Imaging and Visualization Equipment (HIVE) T670 by Nguyen et al. in 2016 [249], which automate image analyses for interpretation of HI assays and remove inconsistencies caused by plate tilting. Sanchez-Cano et al. [250] recently developed synthetic erythrocytes, called synthrocytes, which can be used for HI assays instead of animal erythrocytes. They created these synthetic erythrocytes by first identifying commercial beads that show distinct sedimentation patterns based on agglutination (Sicastar-blue beads), and then, based on the finding by Suzuki et al. [221] that influenza viruses differ in their recognition of sialic acid–galactose linkages, developing sialylated beads as bioreceptors to bind influenza viruses. Tested against conventional assays, synthrocytes offered improved speed and stability for A(H1N1) and IBVs and can easily be mass-produced [250]. While the sensitivity of synthrocytes remains lower than that of fresh erythrocytes, the improved specificity, stability, and assay time suggest that synthetic erythrocytes offer great potential for improved serological analyses of influenza viruses. Nevertheless, these synthetic erythrocytes have not yet been used as a substitute for animal erythrocytes in the HI assays.

NI assay

While HA mediates receptor binding, NA cleaves N-acetyl neuraminic acid from the virus, which allows the virus to be released and infect additional cells [251]. Aminoff et al. [252] developed the original NI assay in 1961 as a macro-assay. This macro-assay was then modified by Van Deusen et al. in 1983 as a micro-neuraminidase-inhibition (micro-NI) assay performed on micro-titer plates to accommodate smaller quantities of reagents and allow for increased sensitivity compared to macro-NI and increased capacity to evaluate multiple isolates at a time [253]. NA can be subtyped using the antibody specificity of NA to influenza, and the susceptibility of influenza viruses to antiviral drugs can be determined using NI assays, which work by quantifying the amount of sialic acids cleaved by NA [222, 251]. Despite the development of the micro-assay, NI assays are still limited by the vast resource requirements including the need for individual glass tubes and use of toxic chemicals including arsenite and 2-thiobarbituric acid [222, 254]. Additionally, traditional NI assays may not adequately detect NA antigenicity when antigenic drift occurs at the lateral surface of the NA head and prevents the binding of anti-NA monoclonal antibodies [70]. These challenges are the reason that NI assays are not a commonly used method for assessing influenza antigenicity.

ELLA

ELLA is an alternative method for measuring NI titers that overcomes the resource limitations of the NI assays, allowing for increased scalability and safety [254]. Rather than individual glass tubes, ELLA is performed on 96-well plates that are coated with fetuin as the NA substrate [254, 255]. Heat treatment is first implemented to remove non-specific NA inhibitors, followed by careful virus titration to optimize assay sensitivity [254]. ELLA reactivity is measured at the cell population level and thus is more accurate when the cell population is more homogeneous, but it will only report the average surface α-D-galactopyranosyl groups in a heterogeneous cell population (Fig. 4B) [256].

Neutralization assays

Neutralization assays quantify the ability of a test antibody to neutralize viral infection. Typically, to evaluate antigenic properties of a test virus, serial dilutions of reference sera are first incubated with the virus for antibody-virus binding, and then the mixture is inoculated into cells or embryonated chicken eggs. Viruses with similar antigenic properties are expected to have similar results in neutralization assays against the same set of reference sera. Classical neutralization assays utilized embryonated chicken eggs or primary cells to detect the reduction of virus plaque formation or virus growth inhibition [257]. Many efforts have been made to further increase the throughput and optimize neutralization assays.

To increase the throughput and shorten the detection time, Okuno et al. [223] developed a focus reduction neutralization test (FRNT) based on micro-titration plates. In this assay, residual virus infectivity was obtained by counting foci stained by specific antibodies. Micro-neutralization (MN) assays increase the throughput in quantifying neutralizing antibodies. Multiple methods have been used in determining terminal titers, such as cytopathic effect (CPE) formation in cell culture, colorimetric assays to detect the cell viability or cytotoxicity, hemagglutination assays for detection of released virus, or usage of the enzyme-linked immunosorbent assay (ELISA) to identify infected cells [257, 258]. Lin et al. [259] established an optimized FRNT to mimic the plaque reduction of neutralization assays based on the infected cell population (ICP) in a 96-well-plate format by using an imaging method for more efficient quantification. Fig. 4C shows the ELISA-based MN assay described by the WHO manual for laboratory diagnosis and virological surveillance of influenza viruses [222], and Fig. 4D shows the optimized FRNT which has been used by WHOCCs in characterizing the antigenicity of seasonal H3N2 viruses to date. Based on FRNT, Baalen et al. [260] and Jorquera et al. [78] further improved the method parameters and developed the ViroSpot MN assay and high content imaging-based neutralization test (HINT), respectively. The HINT was performed at a single infection cycle by using a low multiplicity of infection (MOI) in order to reduce antigenic mischaracterization due to viral host-cell adaptation. To further minimize the impact of host-cell adaptation, the reference ferret antisera were generated directly against viruses in the human respiratory specimens instead of the propagated viruses. The cell immunostaining images are analyzed by a high-content imaging micro-plate reader to determine the HINT titer by calculating a 50% ICP.
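The titer readout shared by these imaging-based assays (the reciprocal of the serum dilution that reduces the infected cell population by a set fraction, 80% for the FRNT in Fig. 4D and 50% for HINT) can be illustrated with a short Python sketch. The function name `neutralization_titer`, the simple endpoint rule, and the example numbers are assumptions for illustration; published protocols may instead interpolate between dilutions or fit a dose-response curve.

```python
def neutralization_titer(dilutions, icp, control_icp, reduction=0.5):
    """Endpoint neutralization titer from infected cell population (ICP) data.

    dilutions: reciprocal serum dilutions in increasing order (e.g. 20, 40, ...).
    icp: measured ICP (any consistent unit) at each dilution.
    control_icp: ICP of the virus-only control wells.
    reduction: required fractional reduction (0.5 = 50%, 0.8 = 80%).
    Returns the highest reciprocal dilution that still meets the cutoff,
    or None if the first dilution already fails.
    """
    cutoff = control_icp * (1 - reduction)
    titer = None
    for dilution, value in zip(dilutions, icp):
        if value > cutoff:
            break  # neutralization lost at this dilution
        titer = dilution
    return titer


# Hypothetical plate: ICP rises as the serum is diluted out.
dilutions = [20, 40, 80, 160, 320]
icp = [2, 5, 20, 45, 60]
print(neutralization_titer(dilutions, icp, control_icp=60, reduction=0.5))  # 80
print(neutralization_titer(dilutions, icp, control_icp=60, reduction=0.8))  # 40
```

The same data give a lower titer under the stricter 80% criterion, which is one reason titers from assays with different cutoffs are not directly interchangeable.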

Unlike HI assays, neutralization assays determine neutralization ability by measuring effects on virus propagation and can reflect antigenic properties of not only the globular head subdomain of HA but also the stalk subdomain of HA and NA. Neutralization assays can overcome the challenges of HI assays, such as receptor-binding avidity variants [261]. However, neutralization assays require the virus to propagate efficiently in cells or embryonated chicken eggs. The poor growth ability of recent H3N2 viruses has generated challenges not only in viral isolation but also in the implementation of MN assays (see the section Challenges in conventional serological assays). Although MN assays generally showed a strong correlation with HI assays [262], there are often exceptions, which make it difficult to integrate HI and MN data [247]. Additionally, these assays are lower-throughput and much more elaborate and time-consuming than HI assays [261].

ELISA

Indirect ELISA has been routinely used in antigenic characterization. This assay typically requires virus or a recombinant protein (e.g., HA or NA) to be adsorbed to a micro-titer plate, followed by the addition of serum samples, an enzyme-conjugated secondary antibody, and lastly the substrate, which is used for colorimetric quantification [257]. The binding strengths of antibodies can be quantified and compared across antigens and sera. For example, Li et al. [263] coated micro-titer plates with 15 HA units of purified viruses and characterized the antigenicity of mutant viruses with the amino acid substitution N145K. Chambers et al. [79] coated micro-titer plates with virus-like particles expressing HA proteins and suggested that an antigenic change was caused by the F159S substitution. In both of these studies, monoclonal antibodies recognizing the conserved epitopes of HA proteins were used to calibrate the viruses before running the assays so that equal amounts of virus were used across samples. One major limitation of ELISA is that it can suffer from high background due to substrate contamination, poor washing steps, or cross-reactivity, and thus can produce false-positive or false-negative results. Nevertheless, ELISA can measure different isotypes and subclasses of immunoglobulins (Ig), such as IgM, IgA, and IgG [257], which are valuable for dissecting humoral immune responses. However, it has not been reported that the antigenicity of influenza viruses is associated with Igs other than IgG.

Mass spectrometry immunoassays

To overcome the lack of viral molecular detail in HI assays, a mass spectrometry (MS) immunoassay was developed to evaluate the antigenicity of influenza viruses utilizing monoclonal antibodies [264]. This assay is based on a comparison of matrix-assisted laser desorption ionization (MALDI) mass spectra obtained from proteolytic digestion of the whole virus [264] or a separated antigen [265, 266] either with or without antibody treatment. The mass maps provide not only the antigenicity characterization of a specific epitope but also the primary structure of the viral antigen. Because the assay is MS-based, neither the antigen nor the antibody needs to be immobilized, which avoids antigenic alteration during immobilization. The antigenic characterization of this assay showed high consistency with that of HI assays for A(H1N1)season1977 viruses [267]. The challenge of this assay is that antigenicity can be analyzed only for the epitopes targeted by the given monoclonal antibodies.

Selection of reference sera in serological analyses

The selection of reference sera is one of the keys to unbiased serological analyses. Ideally, homologous antibodies are included in antigenic analyses for all testing strains. However, this is not possible during seasonal influenza surveillance because it is not practical to generate reference sera against all testing viruses as hundreds to thousands of viruses are assessed each year. It is critical to include homologous sera for those with potential antigenic variants, especially those with amino acid substitutions at known ABSs. Thus, serological analyses in influenza surveillance involve a dynamic update of the reference sera panel.

Due to their sensitivities to both seasonal influenza A and B viruses, influenza seronegative ferrets are widely used to generate the ferret sera used in seasonal influenza vaccine strain selection [41, 268,269,270,271,272,273]. One of the key criteria for WHO to make vaccine strain update is that variant viruses are identified to have ≥8-fold reduction of cross-reactivity to the ferret sera as compared to that of the homologous vaccine virus [274]. Vaccines have also been evaluated with human post-vaccination sera for cross-reactivity with circulating variants; a reduction of ≥50% in geometric mean titers significantly indicates low vaccine efficiency [261].
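The two thresholds mentioned above (a ≥8-fold reduction in ferret antiserum reactivity and a ≥50% reduction in geometric mean titer with human post-vaccination sera) reduce to simple arithmetic on titers, sketched here in Python. The function names and example titers are hypothetical, and in practice these numbers are only one input to the strain-update decision alongside genetic and epidemiological data.

```python
import math


def fold_reduction(homologous_titer, variant_titer):
    """Fold drop in titer of a reference serum against a variant virus."""
    return homologous_titer / variant_titer


def is_candidate_variant(homologous_titer, variant_titer, threshold=8):
    """True when the variant shows at least `threshold`-fold reduced reactivity."""
    return fold_reduction(homologous_titer, variant_titer) >= threshold


def geometric_mean_titer(titers):
    """Geometric mean of titers, computed on the log2 scale."""
    return 2 ** (sum(math.log2(t) for t in titers) / len(titers))


def gmt_reduction(vaccine_titers, variant_titers):
    """Fractional drop in GMT against the variant relative to the vaccine virus."""
    return 1 - geometric_mean_titer(variant_titers) / geometric_mean_titer(vaccine_titers)


# Ferret antiserum: homologous titer 1280, variant titer 160 (8-fold drop).
print(is_candidate_variant(1280, 160))  # True
print(is_candidate_variant(1280, 320))  # False (only 4-fold)

# Human post-vaccination sera (hypothetical panel of two sera).
print(gmt_reduction([160, 160], [40, 80]) >= 0.5)  # True
```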

However, antigenic properties derived from ferret serological data may not reflect those in the human population, which contain varying degrees and compositions of pre-existing immunity. There are dramatic differences in specificity between human and ferret antibodies, and these differences are due in part to pre-exposure history [269]. This process, termed “original antigenic sin,” or immune imprinting, was observed in humans infected with the A(H1N1)pdm09 during the 2013–2014 season [275]. The human immune system has a preference for generating antibodies with cross-reactivities of previously exposed strains at the apparent expense of generating new antibodies that specifically recognize newer strains [276]. Although the circulating A(H1N1)pdm09 viruses showed antigenically neutral amino acid substitutions when evaluated using ferret sera, they acquired a genuine antigenic substitution located on HA that escaped antibody recognition and elicited infections in a large number of middle-aged humans [275]. Using representative A(H3N2) strains circulating during 2007–2014, we conducted head-to-head comparisons of the antigenic maps derived from human and ferret serologic data and illustrated their differences in antigenic characterizations [274, 277]. In general, data from ferret sera are more comparable with post-vaccination sera from children but not with those of adults with pre-existing immunity [277].

Antigenic cartography

Antigenic cartography, a computational approach to visualizing antigenic diversity, is often used in conjunction with serological assays. Antigenic cartography was initially proposed by Smith et al. [76] as a method for visualizing the antigenic diversity of IAVs based on antigenic distances. Antigenic distances in antigenic cartography are calculated using local pairwise distances between viruses that are in close temporal proximity; typically, each horizontal or vertical gridline on the map represents one antigenic unit distance corresponding to a 2-fold difference in serologic (e.g., HI) titers, and either the horizontal or the vertical axis can be used for this calculation.
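As a rough illustration of how titers become distances, the sketch below converts an HI table into target antigenic distances in the style commonly attributed to Smith et al.: for each antiserum, the distance to a virus is the log2 of the highest titer observed for that serum minus the log2 of the titer against that virus, so that each 2-fold drop equals one antigenic unit. The function name `target_distances`, the nested-dictionary layout, and the example titers are assumptions for this example; the map itself is then obtained by placing viruses and sera so that map distances approximate these targets, which is not shown here.

```python
import math


def target_distances(hi_table):
    """Convert HI titers to target antigenic distances (in antigenic units).

    hi_table: dict mapping virus -> dict mapping serum -> HI titer.
              Missing virus/serum pairs are simply absent.
    Returns a dict mapping (virus, serum) -> distance, where one unit
    corresponds to a 2-fold difference in titer.
    """
    sera = {serum for row in hi_table.values() for serum in row}
    # Highest log2 titer seen for each serum serves as its reference point.
    best = {
        serum: max(math.log2(row[serum]) for row in hi_table.values() if serum in row)
        for serum in sera
    }
    return {
        (virus, serum): best[serum] - math.log2(titer)
        for virus, row in hi_table.items()
        for serum, titer in row.items()
    }


# Hypothetical two-virus, two-serum HI table.
table = {
    "virus A": {"anti-A": 1280, "anti-B": 160},
    "virus B": {"anti-A": 320, "anti-B": 640},
}
for pair, distance in sorted(target_distances(table).items()):
    print(pair, round(distance, 2))
```

In this toy table each virus sits two antigenic units (a 4-fold titer drop) from the serum raised against the other virus and zero units from its own serum.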

Because challenges arise due to low reactors and potential biases from missing data, we developed AntigenMap, a publicly available, novel, low-rank matrix completion-based method (http://sysbio.missouri.edu/AntigenMap) [278, 279]. AntigenMap functions by first reconstructing serologic data (e.g., HI matrices with viruses and antibodies) by using low-rank matrix completion to account for missing data and, then, using multidimensional scaling, generates a 2- or 3-dimensional antigenic cartography [278, 279]. Antigenic cartography provides an intuitive visualization and interpretation of serological data and has been used routinely in antigenic analyses and vaccine strain selection for seasonal influenza viruses, pandemic preparedness, and basic research. Two-dimensional antigenic cartography may generate biases in antigenic distance during multidimensional scaling, especially in high dimensional serological data. However, such biases can be mitigated by using a three-dimensional antigenic map [278,279,280].
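The idea of filling in missing titers before mapping can be illustrated with a deliberately simple stand-in. The sketch below is not the AntigenMap algorithm: it fits an additive model on log2 titers (a virus effect plus a serum effect, which is a special case of low-rank structure) by alternating least squares and uses it to impute unmeasured entries. The function name `complete_additive`, the use of `None` for missing values, and the example matrix are all assumptions for illustration; a real low-rank completion method would capture richer antigenic structure than this two-term model.

```python
def complete_additive(log_titers, n_iter=200):
    """Impute missing log2 titers with a row-effect + column-effect model.

    log_titers: list of rows (viruses) by columns (sera); None marks a
                missing measurement. Every row and every column must hold
                at least one observed value.
    Returns a new matrix with observed values kept and gaps filled.
    """
    n_rows, n_cols = len(log_titers), len(log_titers[0])
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Update each virus effect given the current serum effects.
        for i in range(n_rows):
            resid = [log_titers[i][j] - col_eff[j]
                     for j in range(n_cols) if log_titers[i][j] is not None]
            row_eff[i] = sum(resid) / len(resid)
        # Update each serum effect given the current virus effects.
        for j in range(n_cols):
            resid = [log_titers[i][j] - row_eff[i]
                     for i in range(n_rows) if log_titers[i][j] is not None]
            col_eff[j] = sum(resid) / len(resid)
    return [
        [x if x is not None else row_eff[i] + col_eff[j] for j, x in enumerate(row)]
        for i, row in enumerate(log_titers)
    ]


# Hypothetical log2 titer matrix with two unmeasured virus/serum pairs.
observed = [
    [5, 6, None],
    [6, 7, 9],
    [None, 9, 11],
]
for row in complete_additive(observed):
    print([round(x, 3) for x in row])
```

The completed matrix could then be passed to a multidimensional scaling routine to produce 2- or 3-dimensional coordinates; that step is omitted here to keep the example short.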

Three methods have been proposed for calculating antigenic distances for influenza vaccine selection, which were evaluated by Cai et al. [280]. These methods include average antigenic distance (A-distance), which measures the average difference between antigen and antisera interaction effects of two antigens, mutual antigenic distance (M-distance), which measures the distance between two clusters of antisera, and largest antigenic distance (L-distance), which uses the maximum difference between antigen and antisera interaction effects between antigens. These measurements were combined with a metric multidimensional scaling (MDS) method, which indirectly calculates antigenic distance as a Euclidean distance to minimize error and projects the antigens onto a graph. Cai et al. [280] determined that the most robust method for measuring antigenic distances for influenza strain selection is the M-distance.

Because it is derived from serological data, antigenic cartography is dependent on the input data. Thus, selection of the reference sera is extremely important. Biases in the reference sera can skew the coordinates of antigens in the map and lead to misinterpretations of the antigenic relationships among the test antigens. In addition, as mentioned above, potential biases in antigenic cartography may arise when it is used with high dimensional serological data. Thus, although antigenic cartography is a very useful and convenient tool, antigenic maps should be interpreted cautiously, often alongside an examination of the original serological data.

Challenges in conventional serological assays

Overall, the conventional serological assays described above face multiple barriers, including limited facility resources, culture-adapted mutations, and poor virus viability, which hinder the timely detection of amino acid substitutions associated with antigenic drift [269]. As mentioned earlier, these assays require virus isolates from clinical samples. Compared to the seasonal surveillance of influenza viruses, antigenic characterization for pandemic preparedness of emerging pathogens presents even more challenges. Propagation of these viruses often requires biosafety level (BSL)-3 or BSL-4 facilities, which are difficult to access. Thus, most specimens must be shipped to laboratories with appropriate biosafety containment resources, and such shipments are often accompanied by administrative barriers, especially when shipping between countries.

Once the samples have reached an adequately equipped facility, the virus isolation process is not only time-consuming, but also vulnerable to emerging culture-adaptive mutations during viral propagation in either cells or embryonated eggs [281,282,283,284,285]. Since 1993, antigenic and biochemical differences in the HA proteins of influenza viruses have been noted after passage in both MDCK cells and eggs [232]. Viruses propagated in cell culture spiked with non-immune horse serum have shown altered agglutination patterns compared with the original isolate [232, 286,287,288,289,290,291,292,293,294]. Additionally, sequence analyses have shown that changes in the H3 HA occur at residues in the antigenic sites: residue 137 (antigenic site A), 156 (B), 186 (B), 248 (D), and 276 (C) [287]. Both monoclonal and polyclonal antisera can detect a difference in antigenicity between HAs before and after passage in eggs [232]. Such altered viruses undermine the goal of choosing vaccine viruses that match those likely to be circulating during the upcoming influenza seasons. For example, vaccine effectiveness in the 2012–2013 influenza season was only 49%, in part due to egg adaptations in the A/Victoria/361/2011 vaccine strain (IVR-165) [174, 270].

Recently, viruses have also evolved to alter their receptor binding properties towards the various sialic acid receptor types of erythrocytes [239, 295,296,297], including the loss of the ability of human viruses to bind to chicken erythrocytes [234, 236]. This has resulted in diminished effects of NA enzymatic activities in virus HI [235, 298, 299], making HI assays even less suitable for antigenic analyses. The limited ability of influenza viruses, particularly the H3N2 and H1N1 human seasonal IAVs, to grow without culture adaptations in MDCK cells [300] and/or embryonated chicken eggs [301], the two systems conventionally used in virus isolation, further challenges the usefulness of conventional serological assays. Thus, a clinical sample-based assay is needed. These barriers of high resource requirements, culture adaptations, and poor virus growth often prevent timely and accurate surveillance and response to influenza outbreaks [302].

Antigenic characterization using clinical samples

A typical human patient with an acute seasonal IAV infection will have a detectable viral load of ~10^6 TCID50/mL in their nasopharyngeal swab samples [303], while viral loads in the swabs from those with asymptomatic infections can be much lower [304]. Two potential approaches can be used to determine influenza antigenicity using clinical samples: (1) antibody-based assays, which are sensitive and require an amount of influenza virus equal to or less than the amount collected in the clinical specimen; and (2) sequence-based assays. Since the 1990s, sequencing of influenza virus genes, particularly HA, NA, and MP, has been routinely used in influenza surveillance. Recent advances in sequencing technology allow us to rapidly capture the genetic changes in influenza genomes [207, 268]. Genomic sequencing of influenza viruses requires as little as 49,350 copies of viruses in the sample (with a 95% probability of whole-genome recovery) [305] and is well documented to succeed for clinical specimens from both acute and asymptomatic infections [306].

Polyclonal serum–based proximity ligation assay (polyPLA)

We recently developed a novel quantitative PCR (qPCR)–based antigenic characterization method using a polyclonal antibody-based proximity ligation assay (polyPLA) [302, 307]. PolyPLA was developed based on a proximity ligation assay (PLA), which detects antigen–antibody interactions using monoclonal antibodies [308]. Specifically, the PLA incubates oligonucleotide-linked monoclonal antibodies with the analyte in question, and if the oligonucleotides are in close proximity, they can be ligated; the presence of analyte is then shown by the amplification of ligated products with qRT-PCR. The assay reporter signal depends on the proximity and dual recognition of each target analyte, which allows for high specificity [309].

Building on the PLA, polyPLA was developed to quantify the antibody-antigen binding avidity by using the qPCR amplification signals from the pairs of oligonucleotides attached to a reference polyclonal antiserum (Fig. 5). Since the NP protein is antigenically conserved among influenza viruses, the ΔCt value based on the anti-NP monoclonal antibody was used to normalize the amount of intra-type viruses across the testing samples as well as the reference viruses used in the analyses. Therefore, the polyPLA units derived from polyclonal serum after normalization (i.e., by the NP monoclonal-based ΔCt value) represent the antigenic properties of equal amounts of viruses and thus are comparable across the testing samples.
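The normalization idea can be sketched in a few lines. The formula below is an assumption for illustration only and is not taken from the polyPLA publications: ΔCt is assumed to be the Ct of a no-antigen background control minus the Ct of the sample, and the normalized unit is assumed to be the polyclonal ΔCt minus the anti-NP ΔCt. All function names and Ct values are hypothetical.

```python
def delta_ct(ct_background, ct_sample):
    """Signal above background in PCR cycles (assumed definition)."""
    return ct_background - ct_sample

def normalized_polypla_unit(ct_poly, ct_poly_background, ct_np, ct_np_background):
    """Polyclonal-serum signal corrected for virus amount via the anti-NP signal.

    Assumption: subtracting the NP-based delta Ct removes the contribution of
    virus quantity, leaving a value that reflects antigenic reactivity.
    """
    poly_signal = delta_ct(ct_poly_background, ct_poly)
    np_signal = delta_ct(ct_np_background, ct_np)
    return poly_signal - np_signal

# Hypothetical example: sample B holds 8-fold less virus than sample A,
# so both of its Ct values are shifted about 3 cycles later.
sample_a = normalized_polypla_unit(ct_poly=24.0, ct_poly_background=32.0,
                                   ct_np=22.0, ct_np_background=33.0)
sample_b = normalized_polypla_unit(ct_poly=27.0, ct_poly_background=32.0,
                                   ct_np=25.0, ct_np_background=33.0)
```

Under these assumptions the two samples receive the same normalized value despite their different viral loads, which is the property the NP-based normalization is meant to provide.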

Fig. 5

A diagram illustrating polyPLA. PolyPLA quantifies antibody-antigen binding avidity. Reference polyclonal antiserum (for antigenic analyses) or anti-NP monoclonal antibody (mAb) (for normalization) is biotinylated and then labeled using sodium azide-linked oligonucleotide probes. The labeled polyclonal antiserum or monoclonal antibody is incubated with a reference (virus) or testing antigen, and the two oligonucleotides linked to the antibodies are ligated, followed by qPCR, which is used to determine the amplification signals and quantify antibody-antigen binding avidity. The resulting cycle threshold (Ct) values of the polyclonal antisera and antigens are normalized by those of the anti-NP monoclonal antibody and antigens and then analyzed for antigenic differences; the normalization ensures that equal amounts of antigens are compared in antigenic analyses. This figure was adapted from Martin et al. [302].

PolyPLA can detect a low virus titer of <1000 median tissue culture infectious dose (TCID50)/mL, distinguish between different IAV HA subtypes, and effectively identify antigenic variations within the same IAV HA subtype [302, 307]. Antigenic profiles determined by polyPLA have been validated to be consistent with those from HI and neutralization assays [302, 307]. In addition, antigenic maps derived from HI assays and polyPLA using the same set of viruses and sera were concordant [307]. Fig. 6 shows the correlation between the polyPLA and HI data obtained for 19 A(H3N2) clinical samples and 3 reference viruses by using 3 ferret reference sera. Linear regression between polyPLA units and log2(HI) titers gave a correlation coefficient of R = 0.8196 (p<0.0001). None of the 5 testing sera reacted with A/California/04/2009(H1N1) in an HI assay or a polyPLA. An eightfold increment in HI titer is correlated with a 3.26-fold increment in polyPLA units [307].

Fig. 6

Correlation between polyPLA and HI titers. The Pearson correlation coefficient analysis was performed on paired polyPLA and HI titers for 19 H3N2 viruses against three reference sera, which were adapted from Martin et al. [302]
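For readers who want to reproduce this kind of comparison on their own paired measurements, the sketch below computes a Pearson correlation coefficient between polyPLA units and log2-transformed HI titers. The paired values are hypothetical placeholders, not the data behind Fig. 6.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical paired measurements for the same virus-serum combinations.
hi_titers = [40, 80, 160, 320, 640]
polypla_units = [1.0, 1.9, 3.2, 3.9, 5.1]

# HI titers come from twofold dilution series, so compare on the log2 scale.
log2_hi = [math.log2(t) for t in hi_titers]
r = pearson_r(log2_hi, polypla_units)
```

The log2 transformation matters here: HI titers are geometric by construction, so correlating raw titers against a linear-scale readout would be dominated by the highest titers.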

The polyPLA can address the aforementioned challenges in HI assays (dependence on types of erythrocytes and high noise) and in neutralization assays (labor intensiveness). Unlike HI assays, the polyPLA detects antigenic variations in both HA and NA proteins. The polyPLA requires only a small volume of clinical sample (e.g., 2 µL), which makes it feasible to include replicates and to test multiple reference sera for the same clinical samples. Because it is based on a common qPCR platform, polyPLA can be implemented in large-scale analyses with the same biosafety requirements as those typically used in diagnosis, e.g., BSL-2. Similar to most conventional serological assays, the polyPLA works best after the virus subtype has been narrowed down, which can be achieved by determining the virus subtype(s) in the clinical samples through genomic sequencing.

Sequence-based antigenic analyses

In the past three decades, sequence-based analyses, particularly those on the HA, NA, and MP genes, have been used routinely as an important component of influenza vaccine strain selection (Fig. 3). The sites affecting influenza antigenicity are primarily located in the head structures of the HA protein (Fig. 1) [45, 310, 311], and one or several of these antigenicity-associated sites frequently change during antigenic drift events [76, 271, 312,313,314]. Thus, these analyses have primarily focused on amino acid substitutions at the reported epitopes in the HA protein and drug resistance markers in these three proteins. These substitutions are typically mapped into phylogenetic trees to understand their evolutionary relationships and prevalence trends in the viral population. However, a comparison of antigenic and genetic maps showed that the antigenic impact of genetic changes varies. Thus, not all amino acid substitutions in HA sequences, including those at ABSs, equally affect antigenic properties [315, 316]. An ideal sequence-based strategy for vaccine strain selection is to quantify antigenic distances directly using protein sequences (Fig. 7).

Fig. 7

Evolving analytic platform for vaccine strain selection and vaccine development. The conventional platform involves isolating viruses from clinical samples, and the viruses are used in antigenic analyses. Due to the labor intensiveness in laboratory efforts, typically only a small set of samples can be analyzed. In the past decade, the advances in genomic sequencing can allow us to quickly sequence viruses using clinical samples. To determine viral antigenicity, virus isolates are still needed. These sequences can be used to guide selection of samples in virus isolation and virological analyses. As a next-generation platform, we would sequence protein and glycans from clinical samples, and antigenicity would be determined by these sequences. The next-generation platform is expected to be higher throughput and can minimize sampling biases. The computational tools correlating antigenicity (and other vaccine strain required phenotypes) and these sequences would be available. Ideally, in this platform, the big data and artificial intelligence-based tool would be able to forecast antigenic evolution

A few attempts at simple machine learning methods have been implemented to identify influenza antigenicity–associated sites using a small set of serological data. For instance, Lee and Chen [317] developed a simple correlation method between HI titers and the number of amino acid changes between test viral HA and reference viral HA. Liao et al. [318] applied multiple logistic regressions between amino acid substitutions and HI values. Most recently, Huang et al. [319] developed a decision tree algorithm to predict variant drift by deriving association rules from HI data based on information theory. However, these earlier attempts did not consider the challenges of missing values and low reactors, which are commonly seen in serological data [278].

To overcome these challenges, we developed a set of sparse learning methods to identify antigenicity-associated residues by using serologic data and a quantitative function to identify antigenic distances using HA and NA proteins. Sparse learning methods have the advantages of efficiency and generalizability when using a small number of non-zero elements [320]. When data is limited, promoting sparsity has been shown to produce robust models that generalize well to extrapolated data [321]. Thus, sparse learning is suitable for this problem, which suffers from a relatively small data size and high noise levels. We formulated this as an optimization problem that measures the correlation between the antigenic distance changes in HI and NI data and the antigenic profiling by using a scoring function to characterize the number of amino acid substitutions in protein sequences. Structural and physiochemical features were also integrated into the scoring systems. This sparse learning algorithm effectively identifies antigenicity-associated residues in H5N1 [322] and H3N2 [323, 324] viruses. We then integrated antigenic mapping and machine learning via bootstrapped ridge selection using Antigen-Bridges [80]. This model identified 39 antigenicity-associated amino acid positions and, as a first, quantified antigenic distances on the basis of genetic distance of influenza A(H3N2) HA1 sequences from 1968 to 2009 [80].
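To make the idea of sparsity-based residue selection concrete, here is a minimal sketch that fits an ℓ1-penalized (LASSO) linear model by coordinate descent. It is a generic illustration, not the published algorithms cited above: the features are simple per-position substitution indicators for each virus pair, the strains are hypothetical 5-residue sequences, and the "antigenic distances" are simulated so that only positions 0 and 2 contribute.

```python
from itertools import combinations

def substitution_features(seq_a, seq_b):
    """1.0 at each position where two aligned sequences differ, else 0.0."""
    return [1.0 if a != b else 0.0 for a, b in zip(seq_a, seq_b)]

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exact zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=500):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # position never varies, so its weight stays at zero
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
            w[j] = soft_threshold(rho / n, lam) / col_sq[j]
    return w

# Hypothetical aligned sequences and simulated per-position antigenic effects.
strains = ["AAAAA", "KAAAA", "AAKAA", "KAKAK", "AKAAA"]
true_effect = [2.0, 0.0, 1.0, 0.0, 0.0]

X, y = [], []
for seq_a, seq_b in combinations(strains, 2):
    feats = substitution_features(seq_a, seq_b)
    X.append(feats)
    y.append(sum(f * e for f, e in zip(feats, true_effect)))

weights = lasso_coordinate_descent(X, y, lam=0.05)
selected_positions = [j for j, wj in enumerate(weights) if abs(wj) > 1e-9]
```

The ℓ1 penalty drives the weights of uninformative positions to exactly zero, so the positions with non-zero weights form the candidate antigenicity-associated set. Real serological data would additionally need handling of missing values and low reactors, which this sketch ignores.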

After noticing the contribution of co-evolution in antigenic drift, we developed AntigenCo to identify and quantify both single and co-evolutionary amino acid substitutions driving antigenic drift with higher accuracy [324]. To identify the synergistic effects of multiple amino acid substitutions on antigenic changes, we then developed a generalized hierarchical square model (GHSM) [325]. GHSM enforces sparsity under hereditary structures in the interactions among the covariates and effectively incorporates the feature space up to the 5th-order interactions. We also developed a multi-task sparse learning algorithm and showed that a multi-task formulation can help solve challenges in integrating data from different platforms [71]. Based on this multi-task platform, we developed another algorithm, MTL-SGL [72], which can integrate different groups of features (e.g., amino acids and N-glycosylation). MTL-SGL identifies key intra- and intergroup features by using an ℓ1 regularization term on the feature weights to enforce sparsity among intragroup features and a group-structured ℓ2 regularization term to enforce sparsity across groups of feature weights. Antigenic maps were developed for seasonal H1 and H3 viruses (Fig. 2B and C). Using sequence and N-glycosylation feature types, genetic determinants that affect influenza antigenicity were identified based on the weights of each feature. Compared with two conventional single-task models (LASSO and SGL) and two multi-task models (ℓ1,2 MTL and ℓ1,∞ MTL), MTL-SGL achieved the best performance, with the lowest root mean squared error, highest accuracy, and highest sensitivity.
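The combination of within-group and group-level sparsity can be illustrated with a generic sparse group lasso penalty. The sketch below is a single-task simplification under stated assumptions and is not the MTL-SGL objective itself: the group names, weights, and regularization strengths are hypothetical, and group-size weighting is omitted.

```python
import math

def sparse_group_lasso_penalty(weights_by_group, lam_l1, lam_group):
    """lam_l1 * sum(|w|) + lam_group * sum over groups of the group's L2 norm.

    The L1 term pushes individual weights to zero within a group; the
    unsquared group L2 norm pushes entire groups to zero together.
    """
    l1_term = sum(abs(w) for group in weights_by_group.values() for w in group)
    group_term = sum(
        math.sqrt(sum(w * w for w in group))
        for group in weights_by_group.values()
    )
    return lam_l1 * l1_term + lam_group * group_term

# Hypothetical feature weights split into two feature groups.
example_weights = {
    "amino_acid": [3.0, -4.0, 0.0],   # one active group with a sparse member
    "glycosylation": [0.0, 0.0],      # a group switched off entirely
}

penalty = sparse_group_lasso_penalty(example_weights, lam_l1=0.1, lam_group=1.0)
```

A group whose weights are all zero contributes nothing to either term, which is how this kind of penalty allows an entire feature type to be dropped while still selecting individual features inside the groups that remain.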

A few other attempts have been made to integrate antigenic analyses with phylogenetic analyses. Neher et al. [326] described sparse learning models based on the phylogenetic tree structure or amino acid substitutions to infer antigenic properties; these models also consider changes in avidity and serum potency. The authors demonstrated that the two models showed similar prediction accuracy and could be applied to all circulating subtypes. In addition to sparse learning algorithms, regression and Bayesian models have been introduced by treating the amino acid substitutions as features and the serological data or antigenic distances between sequences as responses [86, 316, 318, 327,328,329,330,331,332]. Steinbrück and McHardy [316] described a computational “antigenic tree” method that uses nonnegative least-squares optimization to map pairwise antigenic distances onto the branches of a phylogenetic tree. Bedford et al. [332] implemented a Bayesian approach to combine antigenic cartography and phylogenetic information about the seasonal influenza viruses.

Harvey et al. [86] integrated sequences on the non-conserved and exposed HA surface and corresponding HI data for seasonal A(H1N1) viruses before 2009 using regression models. These models identified and quantified the impact of 18 determining substitutions and allowed for the prediction of antigenicity by HA sequence data with improved accuracy [86]. Cui et al. [328] combined multiple linear regression and physicochemical changes of 18 key amino acid positions to infer antigenic variants of A(H3N2) viruses with a low false-positive rate. By comparison, Rahman et al. [329] showed that combining a non-linear regression model with a scoring method based on the biochemical properties of amino acids for the specific antigenic areas on HA resulted in the best prediction.

To extend prediction to multiple subtypes, Zhou et al. [333] proposed a Context-Free Encoding Scheme method that predicts cross-subtype antigenic patterns by integrating the encoding with a random forest classifier and predicts the antigenicity of different subtypes by using transfer learning. Yao et al. [334] emphasized that substitution metrics reflecting variant properties of amino acids are critical for improving the prediction performance of models. They combined the top substitution metrics in a random forest algorithm, which they named Joint Random Forest Regression (JRFR), and this method showed an improved prediction of antigenic variants of A(H3N2) HA1. Yin et al. [335] constructed a stacking model combined with residue-based, regional band-based, and epitope region–based feature extraction methods to predict the antigenic variants of both pandemic and epidemic A(H1N1) viruses. This model showed the capability of determining antigenic variants with an accuracy of 0.908 [335].

To improve predictions of H3N2 antigenic evolution, Suzuki et al. [327] evaluated de novo mutations by integrating HI titers with volume, isoelectric point, solvent accessibility, and distances from the RBS and N-linked glycosylation sites to quantify antigenic distances. Although the predictions improved, accuracy was still low. Du et al. [336] built the PREDict Antigenic Cluster (PREDAC) machine learning model integrated with network clustering to infer antigenic clusters of strains from HA sequences at the population level with an accuracy of 89.24–89.70%. HA antigenic similarity was predicted with a naïve Bayes classifier from 12 structural and physicochemical features of HA sequences, and 17 dominant A(H3N2) antigenic clusters were identified in mainland China between 1968 and 2010 [336]. This method was subsequently expanded to A(H1N1), A(H5N1), and all influenza subtype viruses [337,338,339]. Qiu et al. [340] designed a structure-based antigenicity scoring model by integrating a position-specific scoring matrix (PSSM) profile with local environmental change, which fully incorporated the structural context of HA proteins to calculate the antigenicity. This model showed an accuracy of 0.875 for the antigenic inference of A(H3N2) [340]. To further predict the antigenicity of different pathogens, they established a new model, Conformational Epitope (CE)-BLAST, which compares conformational epitopes directly to suggest the relative antigenic distance between antigens. This model was validated using influenza and dengue experimental data [341].

In summary, antigenic characterization with direct utilization of clinical samples can overcome the challenges of culture adaptations during virus isolation and propagation. Both polyPLA and sequence-based analyses are high throughput but complementary. Genomic sequences can at least provide initial subtypes of viruses in the samples and help narrow down the targets for the polyPLA assays, and the polyPLA assays can help develop and refine those sequence-based computational models.

Predicting antigenic evolution of influenza viruses

Predicting antigenic evolution of influenza viruses using neutralization assays

Because of the quasi-species nature of influenza viruses, mutants that escape immune pressure can become predominant in the population after viruses are passaged in the presence of immune or convalescent sera in vitro or in previously immunized animals in vivo. A body of research has utilized this classical approach to select virus escape variants and to aid the prediction of virus evolution [297, 342,343,344,345]. The classical approach is low throughput and restricted to individual mutants, and thus new high-throughput methods have been applied to this field.

Pioneering work done by Li et al. [346] involved the selection of antigenic variants from HA globular head random mutation libraries of A(H1N1)pdm and A(H3N2) viruses either in vitro under the pressure of human or ferret convalescent sera or in previously immunized mice. This proof-of-concept study identified the antigenic escape variants that showed similar antigenicity with variants that were observed in nature for A(H1N1)pdm and past A(H3N2) viruses. Furthermore, this study identified escape variants before they caused the epidemic in 2014-2015 [346]. On the other hand, the non-antigenic effects of amino acid substitutions, such as viral growth, also play a critical role in virus evolution.

Hence, Lee and colleagues [347] quantitatively characterized the effects on viral growth of all single amino acid substitutions to the HA of A(H3N2) by using deep mutational scanning. After deep sequencing, the amino acid preferences at each site were revealed and the results showed that beneficial amino acid substitutions generally tended to occur at higher frequencies in nature [347].

Predicting antigenic evolution in silico

Not all antigenic variants can succeed in fixing and spreading throughout a population (e.g., K158R) [348]. In many years, clades that are antigenically more distant from previously circulating viruses have died out [326]. Therefore, besides antigenicity, it is critical for vaccine strain selection to include virus variants that are predicted, from the surveillance and genetic data of the current season, to become predominant in future influenza seasons [349]. In other words, accurate forecasting would aid vaccine strain recommendations that must be determined one season in advance.

Prediction of influenza evolution requires the identification of viral clades/lineages coexisting in influenza population genetics and derivation of phylogenies from sequence data [208]. Then, the destiny of each clade can be predicted by estimating its fitness. In an earlier study, 18 HA1 codons under positive selection were identified based on the ratio of non-synonymous-to-synonymous mutations (dN/dS) on the trunk of a phylogenetic tree. The strain with the greatest number of substitutions in the positive selection sites determined the future dominant lineages of A(H3N2) [350]. However, the dN/dS ratio lacked sensitivity and was uninformative when applied to individual sites and for detecting selection pressures within a population [351].

Rather than using the dN/dS ratio, Steinbrück and McHardy described allele dynamics plots (AD plots) for visualizing the evolutionary dynamics of the different alleles of a gene within the population over time and identifying the alleles that might be associated with a selective advantage. With the application of this model in the HA of A(H3N2) isolates between 1998 and 2008, AD plots allowed for the correct identification of the alleles and their associated viral strains that subsequently became predominant in the viral population in four out of five test seasons [351]. To take the antigenic impact of the selected allele into account, they further combined AD plots with an antigenic tree [316] to estimate whether antigenically distinct HA alleles and the associated viral strains would become predominant within one season. This method predicted the predominant HA allele over nine influenza seasons with 78% accuracy and is currently used for recommending candidate vaccine viruses [352]. Klingen et al. [353] described Sweep Dynamics (SD) plots, an extension of AD plots, which combines phylogenetic algorithms with statistical techniques to detect the statistical significance of allele dynamics and can better dissect the influence of individual changes. The SD plots identified sweep-related changes in antigenic sites of A(H3N2) HAs that allowed for the timely prediction of antigenic variants [353].
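The core quantity behind plots of this kind, the frequency of each allele over time, can be sketched simply. The example below is a deliberately reduced illustration and not the published AD or SD plot method, which defines alleles on a phylogenetic tree and adds statistical testing. The season labels, allele names, and counts are hypothetical.

```python
from collections import Counter, defaultdict

def allele_frequencies(observations):
    """Map each season to {allele: frequency} from (season, allele) records."""
    counts = defaultdict(Counter)
    for season, allele in observations:
        counts[season][allele] += 1
    freqs = {}
    for season, counter in counts.items():
        total = sum(counter.values())
        freqs[season] = {allele: c / total for allele, c in counter.items()}
    return freqs

def rising_alleles(freqs, seasons):
    """Alleles whose frequency strictly increases across consecutive seasons."""
    alleles = {allele for season in seasons for allele in freqs[season]}
    rising = []
    for allele in sorted(alleles):
        series = [freqs[season].get(allele, 0.0) for season in seasons]
        if all(later > earlier for earlier, later in zip(series, series[1:])):
            rising.append(allele)
    return rising

# Hypothetical surveillance records: one (season, allele) pair per isolate.
observations = (
    [("2006", "allele_A")] * 8 + [("2006", "allele_B")] * 2
    + [("2007", "allele_A")] * 5 + [("2007", "allele_B")] * 5
    + [("2008", "allele_A")] * 2 + [("2008", "allele_B")] * 8
)

freqs = allele_frequencies(observations)
candidates = rising_alleles(freqs, ["2006", "2007", "2008"])
```

An allele that keeps gaining frequency season after season is a candidate for being under positive selection, although raw frequencies are sensitive to sampling bias in surveillance data.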

Luksza and Lassig [349] described a fitness method by combining a susceptible-infectious-recovered (SIR) model with a mutational load model based on epidemiological data and HA sequences, respectively. Strain fitness is determined by similarity to past and presently circulating strains in epitope and non-epitope sites. This method can successfully anticipate the frequency of existing clades of A(H3N2) and other subtype viruses in the next year and can estimate how vaccination affects the course of influenza evolution [349]. In contrast, Neher et al. [354] developed a lineage fitness model based on local tree shape without using molecular data. This model used the simple assumption that an internal node with high fitness will be the root of a descendent lineage with high fitness in the genealogical tree. The fitness of different lineages was estimated with a growth rate measure derived from the branching patterns of the HA genealogy. This model was validated to predict the progenitor lineage of subsequent seasons using simulated and historical A(H3N2) data [354].
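The way a fitness estimate translates into a frequency forecast can be sketched as follows. This is a minimal illustration assuming exponential growth of each clade at a rate given by its fitness, followed by renormalization. The clade names and fitness values are hypothetical, and the estimation of fitness itself (from epitope and non-epitope sites, or from tree shape) is not shown.

```python
import math

def project_clade_frequencies(current_freqs, fitness):
    """Project clade frequencies one step ahead.

    Each clade grows by exp(fitness); the results are renormalized so the
    projected frequencies sum to 1.
    """
    grown = {
        clade: freq * math.exp(fitness[clade])
        for clade, freq in current_freqs.items()
    }
    total = sum(grown.values())
    return {clade: value / total for clade, value in grown.items()}

# Hypothetical current frequencies and fitness estimates for three clades.
current = {"clade_1": 0.6, "clade_2": 0.3, "clade_3": 0.1}
fitness = {"clade_1": -0.2, "clade_2": 0.6, "clade_3": 0.0}

projected = project_clade_frequencies(current, fitness)
```

Because of the renormalization, only fitness differences between clades matter: adding the same constant to every fitness value leaves the projection unchanged.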

Castro et al. [355] identified early indicators of predicting cluster evolution and quantified fundamental trade-offs in prediction ability by using a phylodynamic model of influenza transmission. Their statistical logistic regression models can predict whether the emerging cluster at low frequencies will eventually rise to dominance using simulated data. However, the method achieved only 56% sensitivity when applied to 12 years of empirical influenza surveillance data [355].

In summary, both the experimental and computational prediction approaches provide insights into the evolutionary patterns of influenza viruses. However, these approaches are challenged by our incomplete understanding of the complexity in influenza virus-host interactions at the individual, community, and population levels, particularly with the diversity and complexity of human immune histories and immune responses. Nevertheless, these prediction models have the potential to inform or improve vaccine strain selection, although their accuracy and prediction range remain to be improved.

Continued challenges in influenza antigenic variant analyses and vaccine strain selection

Worldwide epidemiological surveillance is necessary to monitor the continuing evolution of influenza viruses. The emergence and spread of antigenic variants will impair influenza vaccine effectiveness. Early detection of influenza antigenic variants is a key to a successful influenza vaccination program. Remarkable advancements have been made for influenza antigenic characterization techniques in the past decades, and these techniques were summarized in prior sections. However, the ongoing challenges discussed for influenza antigenic variant analyses and vaccine strain selection remain to be resolved.

These continued challenges can be summarized as follows: (1) The time required for the current process of vaccine production creates a window of opportunity for new virus variants to emerge and occasionally cause decreased vaccine effectiveness. The use of computationally predictive models, genomic and proteomic sequencing, and new vaccine platforms such as the mRNA vaccine (see later section) may help decrease the time from strain selection to vaccine production. (2) Culture-adapted mutations during virus isolation and propagation continue to present a challenge in antigenic characterization, seed preparation, and manufacturing. Methods that directly use clinical samples, including both assay- and sequence-based methods, are expected to play more important roles. (3) Selection of reference sera in antigenic characterization continues to be a challenge, because the sera need to reflect the heterologous and dynamic pre-existing immunity in the human population. (4) The majority of methods for antigenic characterization have been limited to protein sequences but not glycan sequences, and the conventional vaccine design does not consider site-specific glycosylation occupancy and glycan heterogeneity [356], which can affect antigenic properties of influenza viruses [83]. Thus, to robustly analyze antigenicity, integrating changes in N-glycosylation sites of influenza glycoproteins is necessary. Recent advances in mass spectrometry technologies allow us to use glycoproteomics and bioinformatics approaches to determine the glycosylation profiles of HA proteins, including site-specific glycosylation occupancy and glycan heterogeneity at each site [357,358,359,360,361,362,363], but such analytic methods need to be refined to be more sensitive and quantitative [364]. In addition, a robust tool is needed to predict influenza-specific N-glycosylation sites, associated glycan heterogeneity, and their impacts on influenza antigenicity for vaccine strain selection and vaccine preparation. (5) We are still unable to predict antigenic evolution or antigenic variants ahead of influenza seasons. Robust computational models, such as those using Big Data, AI, and machine learning, remain to be developed. An ideal model will be able to predict the emergence and spread of an antigenic variant and its potential to cause an epidemic in human populations at least one influenza season ahead.

It is important to highlight that, despite these continued challenges, the advancements in influenza antigenic characterization techniques are effective and have proven to be invaluable in the rapid characterization of viruses and vaccine development during the coronavirus disease 2019 (COVID-19) pandemic, which is discussed in the following section.

SARS-CoV-2

Introduction to coronaviruses and SARS-CoV-2

In contrast to influenza viruses, coronaviruses belong to the order Nidovirales, family Coronaviridae, and subfamily Coronavirinae, which is further subdivided into four genera: alpha, beta, gamma, and delta coronaviruses (CoVs) [365, 366]. Alpha and betacoronaviruses often infect mammalian species, while gamma and deltacoronaviruses primarily infect avian species [367]. Coronaviruses are non-segmented, enveloped, positive-sense, single-stranded RNA viruses that contain the largest genomes of RNA viruses, spanning between 28 and 32 kb [365, 366]. The coronavirus genome is flanked by a 5’ cap and 3’ poly-A tail and 5’ and 3’ untranslated regions (UTRs). The 5’ region is comprised of a replicase gene, which encodes nonstructural proteins (NSPs) within open reading frame (ORF)1ab [368], and the 3’ region is comprised of genes that encode four main structural proteins. The remainder of the genome is comprised of genes encoding an additional six accessory proteins [369]. The most abundant structural protein is the membrane (M) protein, which is made of three transmembrane domains, gives the virus its shape, and plays a role in the assembly of viral particles [366, 370]. The trimeric spike (S) protein is a fusion protein that mediates viral attachment and fusion to the host receptor, making it the primary target of therapeutics and neutralizing antibodies [369]. Finally, the envelope (E) protein facilitates viral assembly and release and is highly divergent among coronaviruses, and the nucleocapsid (N) protein allows for binding of viral RNA in vitro [366]. Lineage A viruses of the betacoronavirus genus also contain hemagglutinin-esterase (HE), a structural protein that acts as a hemagglutinin, binds to surface glycoproteins, and may enhance S protein-mediated cell entry [366].

The majority of coronaviruses infect animals (e.g., livestock, birds, and other mammals), which become intermediate host reservoirs. Animal coronaviruses often infect livestock and poultry and create disruptions in both industries [371]. In animals, coronaviruses generally result in enteritis or upper respiratory disease [366]. During antigenic shift events, coronaviruses can gain the ability to infect humans. Seven human coronaviruses (HCoVs) have been identified, four of which are endemic. The HCoV-NL63 and HCoV-229E alphacoronaviruses likely originated from bat reservoirs, while the HCoV-OC43 and HCoV-HKU1 betacoronaviruses likely emerged from rodent-associated viruses [367]. Recently, HCoV-229E has been shown to be more likely transferred from dromedary camels rather than bats [367]. These four endemic coronaviruses typically cause only mild upper or lower respiratory or gastrointestinal symptoms and are responsible for the seasonal common cold [372].

In addition to the four seasonal coronaviruses, three betacoronaviruses have caused severe human disease. The first outbreak, severe acute respiratory syndrome (SARS), occurred during the 2002–2003 season in China and was caused by SARS-CoV (29.7 kb); it had a 9% mortality rate, with deaths occurring particularly in the elderly, and the virus is widely accepted to have emerged from bats in China [366, 373]. However, SARS-CoV was primarily spread through direct contact with infected individuals, and, thus, transmission was relatively limited and could be controlled by quarantining infected individuals [366]. During the early stages of the epidemic, the SARS-CoV genome had greater diversity and higher rates of nonsynonymous mutations than the later isolates, suggesting adaptation to the human host towards the beginning of the epidemic [373]. The functional receptor for SARS-CoV is the angiotensin converting enzyme 2 (ACE2) receptor, which binds to the S1 domain of the S protein and allows for viral replication [374]. Downregulation of ACE2 has been associated with acute lung injury [375, 376]. The next epidemic emerged in the Middle East in 2012 due to the Middle East respiratory syndrome-CoV (MERS-CoV, 30.1 kb) and had a mortality rate of 36% [366, 377]. MERS-CoV is believed to have originated from dromedary camels, to function through the dipeptidyl peptidase 4 (DPP4) receptor [366], and to transmit through nosocomial contact, limiting its transmission potential [376]. Finally, SARS-CoV-2 (29.9 kb) [378] emerged in China in 2019 and caused the coronavirus disease 2019 (COVID-19) global pandemic and over 4.3 million deaths worldwide as of August 13, 2021 [379]. Similar to SARS-CoV, SARS-CoV-2 is suspected to have originated from bats [380, 381] and functions through ACE2 receptors, which allows for efficient human-to-human spread [382, 383]; unlike SARS-CoV, however, it is transmitted by respiratory secretions, contributing to its rapid transmission [384, 385].

SARS-CoV-2 and antigenic variations

Antigenic relationships and known antigenic changes among CoVs

The genome order for coronaviruses is commonly 5’ replicase, S, E, M, N, accessory genes, and 3’ poly-A sequence [386]. While structural and nonstructural genes remain relatively conserved between coronaviruses, the accessory genes vary [368]. Among the three pandemic HCoVs, SARS-CoV and SARS-CoV-2 are genetically more similar to each other than either is to MERS-CoV. SARS-CoV is 29,727 nucleotides long with five NSPs and eight accessory proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8a, ORF8b, ORF9b), SARS-CoV-2 contains 29,903 nucleotides and 15 NSPs with six accessory proteins (ORF3, ORF6, ORF7a, ORF7b, ORF8, and ORF9), and MERS-CoV contains 30,119 nucleotides and 16 NSPs with five accessory proteins (ORF3, ORF4a, ORF4b, ORF5, and ORF8) [368]. These variations in the accessory genes have been suggested to cause the differences in pathogenicity between coronaviruses. For example, in SARS-CoV, the NSP1, papain-like protease (PLpro), NSP7, NSP15, ORF3b, M, ORF6, and N proteins have been shown to antagonize the interferon (IFN) response [387].

Likewise, several proteins in SARS-CoV-2 have been shown to antagonize various sites in the IFN-I production and signaling pathways. ORF6, ORF8, and N have been implicated in SARS-CoV-2 in inhibiting type I interferon (IFN-α and -β) expression, the NF-κB-responsive promoter, and the activation of IFN-stimulated genes [388]. NSP6 and NSP13 delay IFN-I responses by targeting IFN regulatory factor 3 (IRF3) and antagonizing IFN-β production. They also bind TANK binding kinase 1 (TBK1) and suppress STAT1/STAT2 phosphorylation, leading to decreased IFN-stimulated gene expression. Additionally, Xia et al. [387] showed that NSP1 and NSP6 of SARS-CoV-2 can suppress IFN-I signaling more efficiently than those of SARS-CoV and MERS-CoV.

SARS-CoV-2 shares 83.9% similarity in the receptor binding domain (RBD) and 87.2% similarity in the spike glycoprotein with SARS-CoV [389]. Kumar et al. [389] also identified novel glycosylation sites (NGTK, NFTI, NLTT, NTSN) on the spike glycoprotein of SARS-CoV-2 and multiple novel cytotoxic T lymphocyte epitopes compared with SARS-CoV, changing its binding capacity to host receptors and infectivity into the host cell. These variations may have led to the elevated transmission rate and severity of COVID-19.

Antigenic epitopes of spike

The RBD of the spike protein is of particular interest due to its role in mediating virus attachment to host receptors [368] and its role in binding with the human ACE2 receptors in SARS-CoV. Attachment to host receptor allows the virus to fuse with the cell membrane and enter the host cell. Lan et al. [390] showed that the RBD in SARS-CoV-2 is almost identical to that of SARS-CoV but has improved receptor binding affinity [390]. Additionally, anti-RBD antibodies have been shown to be the primary neutralizing polyclonal antibody response [391] and compete with the viral RBD binding to ACE2 receptors [392]. Amino acid changes (i.e., substitutions, insertions, and deletions) of the RBD in SARS-CoV-2 can lead to immune escape. Greaney et al. [393] performed complete mapping of these functional amino acid substitutions to the RBD of SARS-CoV-2 for ten human monoclonal antibodies and found that of the 36 RBD sites, 14 of those sites contained at least five mutants that could escape at least one of the ten tested antibodies. Greaney et al. [393] noted that the frequency of these substitutions was low, and the escape mutations generally had little impact on ACE2 binding and RBD folding. In contrast, Wang et al. [394] demonstrated that residue substitutions on the SARS-CoV-2 C-terminal domain on S1 interact with the human ACE2 receptor rather than the RBD, which also strengthens its receptor binding affinity compared to SARS-CoV. Regardless, the substitutions causing increased binding affinity of SARS-CoV-2 to human ACE2 receptors have contributed towards the increased pathogenesis of COVID-19.

Emerging SARS-CoV-2 variants

In addition to immune escape mutations, whole-genome sequencing (WGS) has allowed for the quick and comprehensive study of the rapid SARS-CoV-2 evolution and its genetic variants. In particular, amino acid substitutions in the NSP2, NSP3, and S proteins contribute towards SARS-CoV-2 virulence and transmission [395]. As more variants become predominant worldwide, there are increasing concerns regarding the potential for variants to escape the effects of vaccinations or natural immunity, worsen pathogenesis, and increase transmission. Weisblum et al. [396] demonstrated in vitro that amino acid substitutions in the RBD and N-terminal domain of the S protein have the potential for antibody evasion, although the frequency of natural occurrence for these substitutions was low. However, increasing prevalence of antibody-resistant strains may limit currently available antibody and convalescent plasma therapies.

SARS-CoV-2 variants have been rapidly emerging over the past year. By March 2020, the D614G variant, located on the spike protein, had already become predominant globally [397, 398]. D614G has been shown to be associated with increased viral loads [397,398,399] and potentially increased transmission [400], but no association with increased infectivity [397] or worse clinical outcomes in humans [398, 399]. Shortly after the discovery of the D614G variant, a new, rapidly spreading variant from the UK was identified as the WHO α (B.1.1.7) lineage containing the N501Y and 69-70del in the S gene [401]. This substitution N501Y was estimated to have an increased infectivity of 52% [402] and transmission of 75% [403]. Similar to the α lineage, the β (B.1.351) lineage contains the N501Y in addition to two other substitutions at the RBD (E484K, K417N) and became the dominant lineage in South Africa by November 2020 [404]. Additionally, the frequency of the γ (P.1) lineage rose to predominance in Brazil between September 2020 and February 2021 containing E484K, N501Y, and K417T [405]. Chen et al. [406] showed that strains containing the E484K had greatly reduced the neutralizing potential of multiple class I antibodies and even worse reductions on antibody binding when combined with the N501Y in vitro. The fourth and most recent variant of concern is the δ (B.1.617.2) lineage that has been predominant in India and contains G142D and E154K (N-terminal domain), L452R and E484Q (RBD), and P681R (furin cleavage site) [407, 408], and its clinical impacts are still being studied. While not yet fully elucidated, these emerging variants also raise concerns for possibilities for reinfections, vaccine evasion, atypical symptoms, and increased potency in younger adults and children. Continued meticulous surveillance of SARS-CoV-2 variants will be essential for containing the pandemic.
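
The substitution labels used above (e.g., D614G, N501Y) encode the reference residue, the spike position, and the substituted residue. As a minimal sketch, the following Python snippet parses such labels and reports which spike positions are altered in more than one of the lineages named in this section. The table contains only the substitutions listed in the text (deletions such as 69-70del are omitted) and is not a complete lineage definition; the function and variable names are illustrative assumptions.

```python
import re

# Spike substitutions per lineage, restricted to those named in the text above.
# Deletions (e.g., 69-70del in the alpha lineage) are intentionally left out.
LINEAGE_SUBSTITUTIONS = {
    "alpha (B.1.1.7)": {"N501Y"},
    "beta (B.1.351)": {"N501Y", "E484K", "K417N"},
    "gamma (P.1)": {"E484K", "N501Y", "K417T"},
    "delta (B.1.617.2)": {"G142D", "E154K", "L452R", "E484Q", "P681R"},
}

# One-letter reference residue, numeric position, one-letter substituted residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'D614G' into ('D', 614, 'G')."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    reference, position, substituted = match.groups()
    return reference, int(position), substituted


def shared_positions(table):
    """Return {position: {lineage: substituted residue}} for positions
    that are substituted in at least two lineages."""
    by_position = {}
    for lineage, labels in table.items():
        for label in labels:
            _, position, substituted = parse_substitution(label)
            by_position.setdefault(position, {})[lineage] = substituted
    return {pos: hits for pos, hits in by_position.items() if len(hits) > 1}


if __name__ == "__main__":
    for position, hits in sorted(shared_positions(LINEAGE_SUBSTITUTIONS).items()):
        print(position, hits)
```

Grouping by position rather than by full label keeps E484K and E484Q together, which reflects that different lineages can alter the same RBD site with different residues.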

Antigenic characterization of SARS-CoV-2

The analytic methods used in antigenic characterization of SARS-CoV-2 are similar to those used in influenza viruses. The rapid mutation rate of SARS-CoV-2 requires constant monitoring of vaccine efficacy, changes in epitopes, and modifications of virus function and infectivity. Although genetic analyses can quickly identify the genetic variants, their antigenic properties must be assessed using conventional laboratory assays mentioned above. As with influenza viruses, culture-adaptations may occur in SARS-CoV-2 [409]. More data are needed to evaluate these culture-adapted amino acid changes in SARS-CoV-2 and their impacts on antigenic analyses. Similar to that for influenza viruses, an ideal method will be to assess the antigenic properties of the viruses using clinical samples. These barriers will need to be addressed as SARS-CoV-2 research expands and variants continue to emerge.

Enzyme-linked immunosorbent assay (ELISA)

ELISAs are used to detect IgM and IgG in COVID-19-positive individuals, which allows for the determination of disease state, detection of antibody titers, and identification of isotype. Antibodies can also be tracked for seroconversion in immune responses to SARS-CoV-2 over time [410]. In addition to Igs, ELISAs can detect the presence of certain peptides and proteins, including the viral S protein and human ACE2 receptor [411]. The limitations of ELISA are discussed in the ELISA assays section for influenza viruses.

Neutralization assays

Several types of neutralization assays including authentic viruses and pseudoviruses have been utilized for SARS-CoV-2 [412,413,414]. Authentic virus-based plaque reduction neutralization tests (PRNTs) and micro-neutralization (MN) assays have been well-adapted to quantitatively assess the extent that antibodies inhibit SARS-CoV-2 virus entry and replication [412, 413]. The limitation of these methods is the requirement of BSL3 facilities and suitably trained staff for authentic viruses. Limitations of pseudoviruses are discussed separately below. Although it is a common assay used in many viruses, the SARS-CoV-2 plaque formation using PRNT is time-consuming and low-throughput due to the requirements of agarose overlay and 24-well plates for the plaque forming stage as well as the process of plaque counting and data analysis [412, 413].

Micro-neutralization assays

The micro-neutralization assays, considered as the gold-standard [414], offer advantages over the PRNT by increasing throughput and reducing operation time and operator workload by the use of 96-well plates and the increased automation of washing, staining, and reading stages [412]. Manenti et al. [414] presented two detection methods of micro-neutralization assays that include the classical readout which entails measuring the CPE and a colorimetric readout using a spectrophotometer. The subjective CPE-based readout is carried out by checking the cell monolayers under an optical microscope, whereas the colorimetric readout is obtained by staining the healthy cells with neutral red dye followed by measuring absorbance at 540 nm. The comparison between the two methods showed that a well-trained operator is able to achieve a CPE-based readout in consistency with a colorimetric readout [414]. To avoid the subjective visual inspection of CPE, Amanat et al. [413] adapted the well-established ELISA-based micro-neutralization assay for other viruses such as influenza viruses to SARS-CoV-2 virus. In this assay, infected cells are immuno-stained depending on the viral NP protein, and the absorbance of the colorimetric product obtained by the reaction between substrate and peroxidase is measured by an ELISA reader. Bewley and colleagues [412] adapted the FRNT to SARS-CoV-2 viruses, which is also based on the immunostaining of SARS-CoV-2-specific proteins. The FRNT uses primary antibodies targeting the RBD on the S protein, and its readout is the count of immuno-stained foci (spots) rather than absorbance as seen in ELISA-based micro-neutralization assays. The advantage of foci counting is that it is easier to catch mishandling such as microbial contamination or other errors causing cell monolayer damage by the automatic foci counter (e.g., ImmunoSpot analyzer). Of note, a strong correlation was observed between PRNT and FRNT [412].
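
To illustrate how a colorimetric readout can replace visual CPE scoring, the following Python sketch derives an endpoint titer from absorbance values of a serum dilution series. It assumes that neutral red stains healthy cells (so higher absorbance means more protection) and that the cutoff lies halfway between the virus control and cell control means; this cutoff rule, the function name, and the example values are assumptions for illustration and are not taken from the cited protocols.

```python
def colorimetric_mn_titer(dilutions, absorbances, cell_control, virus_control,
                          fraction=0.5):
    """Return the highest reciprocal serum dilution whose well is still protected.

    dilutions     -- reciprocal dilutions (e.g., 10, 20, 40, ...)
    absorbances   -- absorbance of each well, same order as dilutions
    cell_control  -- mean absorbance of uninfected cells (fully healthy)
    virus_control -- mean absorbance of infected cells without serum
    fraction      -- assumed position of the cutoff between the two controls
    """
    if len(dilutions) != len(absorbances):
        raise ValueError("dilutions and absorbances must have the same length")
    cutoff = virus_control + fraction * (cell_control - virus_control)

    titer = None
    # Walk from the most concentrated serum to the most dilute and stop at
    # the first well that falls below the cutoff.
    for dilution, absorbance in sorted(zip(dilutions, absorbances)):
        if absorbance >= cutoff:
            titer = dilution
        else:
            break
    return titer  # None means no protection at the lowest dilution tested


if __name__ == "__main__":
    # Hypothetical plate row, for illustration only.
    dilutions = [10, 20, 40, 80, 160, 320]
    absorbances = [1.10, 1.05, 0.98, 0.71, 0.30, 0.22]
    print(colorimetric_mn_titer(dilutions, absorbances,
                                cell_control=1.10, virus_control=0.20))  # 80
```

Stopping at the first unprotected well, rather than taking the maximum protected dilution anywhere on the plate, avoids reporting a titer from an isolated aberrant well at high dilution.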

Pseudovirus neutralization assays

Pseudovirus neutralization assays have been developed to safely study viral entry inhibitors and neutralizing antibodies against SARS-CoV-2 and virus-host interactions without risk of SARS-CoV-2 infection. Pseudoviruses also help with accessibility for laboratories that do not have access to BSL-3 or BSL-4 facilities [415]. Three packaging systems have been used as the backbone for SARS-CoV-2 pseudoviruses: HIV-based lentiviral particles, murine leukemia virus (MLV)-based particles [416], and vesicular stomatitis virus (VSV) [417]. Ou et al. [418] and Hu et al. [419] generated a luciferase (Luc)-expressing pseudovirus that contained the S protein within an HIV-1 system. They also established a pseudovirus inhibition assay for testing two protease inhibitors that target virus entry after transfection with ACE2 receptors into the HEK293T cell line. Similarly, Nie et al. [415] and Hoffmann et al. [420] developed a pseudotyped virus expressing the S protein on a VSV backbone to quantify SARS-CoV-2 neutralizing antibodies. Case et al. [421] compared the neutralization activity between VSV-SARS-CoV-2 (spike) and SARS-CoV-2 isolates grown on Vero E6 cells, and found a strong correlation between the two tests as well as concordance between their EC50 values. Various commercial SARS-CoV-2 pseudoviruses are also available for purchase from mybiosource.com, Takara, eEnzyme, and multiple other companies. Limitations of this platform include the need to create a separate assay to test different aspects of virus mechanisms or other proteins and potential variability between the protein functions of pseudotyped virus and wild type virus [422]. While some studies, such as Case et al. [421], have found strong correlations between pseudovirus and wild type virus results, few studies using pseudoviruses confirm their findings with wild type virus. Thus, more data are needed to evaluate how accurately pseudoviruses reflect the antigenic properties of wild type SARS-CoV-2 viruses.
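
As a rough sketch of how a luciferase-based pseudovirus assay can be turned into a 50% neutralization titer, the following Python snippet converts relative light units (RLU) into percent neutralization and interpolates the reciprocal dilution at 50% on a log scale. The normalization against virus-only and cells-only controls and the log-linear interpolation are common conventions assumed here, not the specific analysis of the cited studies (which may fit a full dose-response curve instead); all names and values are hypothetical.

```python
import math


def percent_neutralization(rlu, virus_only, cells_only):
    """Percent reduction in luciferase signal relative to the virus-only control."""
    return 100.0 * (1.0 - (rlu - cells_only) / (virus_only - cells_only))


def nt50(dilutions, rlus, virus_only, cells_only):
    """Reciprocal serum dilution giving 50% neutralization.

    Uses linear interpolation on log10(dilution) between the two adjacent
    dilutions that bracket 50%. Returns None if 50% is never crossed.
    """
    if len(dilutions) != len(rlus):
        raise ValueError("dilutions and rlus must have the same length")
    curve = [
        (dilution, percent_neutralization(rlu, virus_only, cells_only))
        for dilution, rlu in sorted(zip(dilutions, rlus))
    ]
    for (d_low, n_low), (d_high, n_high) in zip(curve, curve[1:]):
        if n_low >= 50.0 > n_high:
            fraction = (n_low - 50.0) / (n_low - n_high)
            log_dilution = math.log10(d_low) + fraction * (
                math.log10(d_high) - math.log10(d_low)
            )
            return 10 ** log_dilution
    return None


if __name__ == "__main__":
    # Hypothetical four-point dilution series, for illustration only.
    dilutions = [20, 80, 320, 1280]
    rlus = [6000, 26000, 76000, 96000]
    print(nt50(dilutions, rlus, virus_only=101000, cells_only=1000))
```

In this hypothetical series the signal crosses 50% midway (on the log scale) between the 1:80 and 1:320 dilutions, so the interpolated titer is close to 160.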

High-throughput neutralization assays

Several high-throughput neutralization assays have been developed to meet the detection demands for the tremendous quantities of COVID-19 specimens. Muruato et al. [423] developed a fluorescence-based high-throughput neutralization assay. In this assay, the reporter virus was generated by introducing the mNeonGreen gene into ORF7 of the SARS-CoV-2 virus genome. After being incubated with the testing sera, the reporter virus is inoculated onto Vero cells and the neutralizing titers are determined from the fluorescence signals quantified by high-content imaging. A strong correlation between this assay and the PRNT was observed (R² of 0.85), and the reporter virus was shown to have a replication ability similar to that of wild type virus [423]. Compared with other neutralization assays, this reporter virus-based assay can shorten the assay turnaround time by eliminating the immunostaining step and can be scaled to 384- or even 1536-well plates. Instead of a reporter virus, most other high-throughput neutralization assays use a surrogate virus and thus are called surrogate virus neutralization tests (sVNTs) [424,425,426,427]. Similar to conventional pseudovirus neutralization assays, sVNTs detect only antibodies that are specific to the spike protein or the RBD [428]. For example, the sVNT by Fenwick et al. [429] is based on the competitive inhibition of ACE2 binding to the spike protein trimer coupled with Luminex beads, and the quantification is dependent on the measurement of ACE2 recruited by the beads, which are fluorescently tagged in recognition of the ACE2-Fc fusion protein on a Bio-Plex 200 system. Compared with those assays with live viruses, the pseudovirus neutralization assays, especially those based on fluorescent pseudoviruses, are highly scalable [430, 431]. Nevertheless, compared with the conventional VN assays, one common challenge for these high-throughput neutralization assays is that the virus component, whether a reporter virus or a surrogate virus, must be updated to match the epidemic strains of SARS-CoV-2, which, as an RNA virus, evolves rapidly.
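
Agreement between a high-throughput assay and a reference assay such as the PRNT is often summarized by a coefficient of determination, as with the R² reported above. The following Python sketch computes R² as the squared Pearson correlation of log2-transformed titers from paired samples. The log2 transformation and the function name are assumptions made for illustration; the cited study may have used a different transformation or regression model.

```python
import math


def r_squared_log_titers(titers_a, titers_b):
    """Squared Pearson correlation between log2 titers of two assays.

    titers_a, titers_b -- paired reciprocal titers for the same sera.
    """
    if len(titers_a) != len(titers_b) or len(titers_a) < 2:
        raise ValueError("need at least two paired titers")
    xs = [math.log2(t) for t in titers_a]
    ys = [math.log2(t) for t in titers_b]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("titers from one assay do not vary")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired titers, for illustration only.
    reference_assay = [20, 40, 80, 160]
    candidate_assay = [40, 40, 160, 160]
    print(r_squared_log_titers(reference_assay, candidate_assay))
```

Because R² measures linear association only, two assays can correlate strongly while still differing by a constant fold in titer, so a systematic offset should be checked separately.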

mRNA vaccine and antigenic characterization of influenza viruses and SARS-CoV-2

A number of challenges were described for influenza antigenic variant analyses and vaccine strain selection, which are associated with the conventional vaccine platforms involving virus culture. Hence, rapid and flexible virus culture-independent vaccine production platforms, such as recombinant HA vaccines and nucleic acid vaccines (DNA vaccines and mRNA vaccines), are important and can potentially overcome those challenges.

Recombinant HA vaccines against seasonal influenza viruses (Flublok) have been licensed by the US Food and Drug Administration since 2013 [432]. However, they require threefold higher HA content than inactivated vaccines to provide a similar level of protection [433]. As for DNA vaccines, the main challenges are poor immunogenicity in humans and larger animal models compared to mouse models [434], elicitation of unintended antibody production that may lead to autoimmune diseases [435, 436], low specificity for target cells, poor anti-vector immunity, concerns for genome integration [437], and the requirement of high doses and devices (e.g., electroporation, aerosol, polymer condensing agents) [434, 438,439,440].

In recent decades, mRNA vaccines have emerged as a new vaccine platform. This platform began with the successful translation of mRNA injected into mice by Wolff et al. in 1990 [441], and multiple mRNA vaccine platforms have since been shown to elicit immunity against the influenza virus [442, 443], Zika virus [444, 445], rabies virus [446], and even some cancer types [447,448,449] in animal models [450]. The many advantages of current mRNA vaccines include a safety profile that removes the risks of potential infection or genome integration, efficient delivery to the cytoplasm rather than transfection to the nucleus as in DNA vaccines [451], stability and a highly translatable nature after delivery [452], and potential for inexpensive, rapid, and scalable production [453]. The benefits of mRNA vaccines (e.g., BNT162b2 [Pfizer-BioNTech] and mRNA-1273 [Moderna]) for human use were demonstrated during the COVID-19 pandemic: they were the fastest vaccines developed, with the aid of the Emergency Use Authorization by the United States Food and Drug Administration, and reached an astonishing 95% efficacy with minimal adverse effects [417]. The success of these vaccines is attributed to the use of nucleoside-modified mRNA to enhance translation capacity and reduce the innate immune response and to the use of lipid nanoparticles (LNPs) to improve mRNA stability and cellular penetration [454, 455]. As with all vaccine platforms, mRNA vaccines also present challenges that will need to be addressed moving forward, which include thermal instability at elevated temperatures, allergic reactions to LNPs, and the unknowns of long-term properties in humans, including the effects of repeated administration [417, 456]. However, the unprecedented speed and efficacy of these mRNA vaccines for COVID-19 open the world of nanoparticle delivery systems to other infectious diseases, and mRNA vaccines may become a viable option for future influenza vaccinations.

The mRNA vaccine platform can help overcome challenges in vaccine seed preparation and in vaccine manufacturing. The remaining challenges for the mRNA vaccine platforms lie in vaccine strain selection and antigenic characterization. A rapid and robust tool for sequence-based antigenicity analyses is urgently needed for influenza and SARS-CoV-2 viruses as well as other potential diseases, and such tools will enable the assessment of virus antigenic properties using clinical samples directly, including swabs or genomic sequences as described earlier [71, 72].

Conclusions

In summary, the importance of antigenic characterization for outbreak response and prevention, prediction of viral evolution, and vaccine strain selection cannot be overstated. The necessity of these continuously improving techniques for characterizing antigenic properties of infectious diseases has been prominently displayed throughout the history of influenza epidemic outbreaks and pandemics. These existing methods have been modified and have become invaluable in identifying the mechanisms of infection, differences in lineages, and the unprecedented rapidity of response against the deadly COVID-19 pandemic, allowing for >90% effective vaccines to be developed within 1 year.

Currently available analytic methods for antigenic characterization include hemagglutination inhibition, neuraminidase inhibition, neutralization, and micro-neutralization assays as well as enzyme-linked lectin assays and antigenic cartography for influenza. Likewise, multiple existing methods have been modified to characterize SARS-CoV-2. Enzyme-linked immunosorbent assay, pseudoviruses, and micro-neutralization assays have been instrumental towards the rapid characterization of SARS-CoV-2 and the pandemic response. These methods are routinely used to assess the antigenic properties of emerging and re-emerging viruses and to identify antigenic variants of these viruses.

One major barrier of these current influenza characterization methods is the need for a larger quantity of virus than is typically available from the original clinical samples, requiring isolation and amplification in cells or embryonated chicken eggs. This presents opportunities for culture-adapted mutations of the testing strain, which may generate modifications of the virus phenotype leading to antigenic mismatch during vaccine development. Additionally, vaccine composition must be determined almost 6 months prior to peak seasonal influenza activity, allowing time for antigenic drift to occur. This is another potential cause of vaccine mismatch with the circulating strain. SARS-CoV-2 characterization faces similar challenges. In addition to the requirement of high virus quantity and potential laboratory-acquired mutations, testing of each aspect of virus function requires the generation of a separate pseudovirus. The high mutation rates of both influenza and SARS-CoV-2 also present challenges for the relevance of each test to the circulating strains.

Ideally, antigenic characterization would include comprehensive virus surveillance and sampling, use of original clinical specimens rather than virus isolates, genome-based methods, and increased efficiency of vaccine production from time of strain selection. Current methods in development to improve the techniques for antigenic characterization include mechanical and automated hemagglutination inhibition assays, synthetic erythrocytes, the enzyme-linked lectin assay, the combination of assays with imaging to increase quantification efficiency and throughput, 3D antigen cartography techniques for visualization and for mitigating bias introduced during multi-dimensional scaling, the novel polyclonal antibody-based proximity ligation assay which reduces the sample volume requirement while improving specificity, and advanced machine learning algorithms. These improved methods will help optimize sensitivity, specificity, and assay time. While antigenic characterization of influenza is complex, there are continuous advancements and innovation in the field, which have already served to rapidly address the COVID-19 pandemic. These advancements for both fields will continue to improve the accuracy, precision, and speed for characterizing and selecting vaccine strains in influenza and SARS-CoV-2.