Introduction

Coronaviruses are a large family of viruses that are widespread in nature, exhibiting high mutation rates and significant pathogenic potential, enabling them to cause a range of human diseases1,2,3. Bats serve as natural reservoirs for SARS-like coronaviruses4, and cross-species transmission of these viruses is a critical pathway leading to human infections. This transmission often involves intermediate hosts, such as pangolins, civets, or other wild animals, where the virus may undergo adaptive mutations that enhance its ability to infect humans5,6,7. Notably, SARS-CoV-2 shares 96% whole-genome identity with a bat coronavirus and is the seventh known coronavirus capable of infecting humans8,9. Other human-infecting coronaviruses include HCoV-229E (Human coronavirus 229E), HCoV-OC43 (Human coronavirus OC43), HCoV-NL63 (Human coronavirus NL63), HCoV-HKU1 (Human coronavirus HKU1), SARS-CoV (Severe Acute Respiratory Syndrome) and MERS-CoV (Middle East Respiratory Syndrome)10, which have posed unpredictable public health threats, frequently leading to emerging pandemics11,12,13.

Beyond human-infecting coronaviruses, several coronaviruses, such as Bovine coronavirus (BCoV) and Murine coronavirus (MuCoV), play a critical role in spreading infections within the animal kingdom. Murine coronavirus (MuCoV), which primarily infects rodents, provides critical insights into broader viral pathogenesis and interspecies transmission. It is responsible for a variety of symptoms, ranging from respiratory to neurological and gastrointestinal conditions in affected animals. Mice, particularly, are commonly used as infection models for studying MuCoV pathogenesis due to their susceptibility and utility in biomedical research14,15.

Similarly, Bovine coronavirus (BCoV), also classified as a betacoronavirus, predominantly affects cattle, causing conditions such as neonatal calf diarrhea and respiratory diseases, including winter dysentery in adult cattle. BCoV and closely related bovine-like coronaviruses have been detected in various domestic and wild ruminants, such as water buffalo, sheep, goats, dromedary camels, llamas, alpacas, deer, wild cattle, antelopes, giraffes, as well as in dogs and humans. These infections contribute significantly to both economic and health impacts in livestock and wildlife populations16. The study of these non-human coronaviruses not only enhances understanding of viral evolution but also aids in mitigating zoonotic risks. In combating these viral diseases, virus-specific vaccines and antiviral drugs are among the most effective tools, playing a crucial role in mitigating their global health impact17. The SARS-CoV-2 epidemic, known for its exponential case growth, shares pathological features with its predecessors, SARS-CoV and MERS-CoV18,19,20. Common symptoms of CoV infections include respiratory distress, fever, cough, shortness of breath and dyspnea21.

The replication strategy of coronaviruses is a highly coordinated, multi-step series of molecular events that begins with viral entry, proceeds through genome replication and assembly within host cells, and culminates in virion release. The cycle begins when the spike (S) protein of SARS-CoV-2 binds to the angiotensin-converting enzyme 2 (ACE2) receptor on the surface of human cells, facilitating viral entry. Host cell proteases cleave the S protein, allowing the viral envelope to fuse with the host cell membrane, thereby releasing viral RNA into the cytoplasm. The viral RNA is then translated by host ribosomes, and the viral RNA-dependent RNA polymerase (RdRp) synthesizes a complementary negative-sense RNA strand, which serves as a template for the synthesis of positive-sense RNA22,24. During this process, the virally encoded proteases, chymotrypsin-like protease (3CLpro) and papain-like protease (PLpro), cleave the viral polyproteins into the nonstructural proteins (nsp1-16) required for replication. Host cell machinery is also used to synthesize the structural proteins, including the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins. These components assemble within the host cell, forming new virions that are subsequently released to propagate the infection. This highly coordinated replication strategy underpins the pathogenicity and spread of coronaviruses25,26. Understanding these intricate mechanisms is crucial for developing effective therapeutics and vaccines.

Several public databases have been established to support CoV research, including CoVDB27, ViPR28, 2019nCoVR29, COVID19db30, CoV-AbDab31, the SARS-CoV-2 3D database32, COVIEdb33, DockCoV234, COVID-ONE-hi35, SCoV2-MD36, and SCovid37. These resources offer valuable data, including sequences, metadata, structural information, biological pathways, drug screening results, and omics datasets, with tools for data retrieval and analysis. However, while these databases provide extensive information on various aspects of CoV research, they generally lack comprehensive data on the animal models used to study CoV infections. Animal models play an indispensable role in biomedical research, serving as critical tools for studying human diseases. They not only facilitate a deeper understanding of disease mechanisms but also provide essential support for drug development and the optimization of therapeutic strategies38,39. Despite their significance, the use of animal models often represents a ‘technical bottleneck’ in certain areas of research. The complexity of CoV infections, in particular, necessitates the use of a variety of animal models, as no single model can fully replicate the multifaceted nature of human diseases. Given the extensive research available on CoV infections, selecting appropriate models is both crucial and challenging. This diversity in models is critical for advancing our understanding of the pathogenesis of coronaviruses and for developing effective treatments40.

Reliable animal models remain essential for understanding CoV transmission, pathophysiology, and immune mechanisms, even as the global pandemic subsides. These models play a critical role in developing vaccines and therapeutics, allowing researchers to assess the safety and efficacy of treatments. Animal models have long been indispensable in infectious disease research, particularly in the context of coronavirus studies. They provide a controlled system in which to explore viral pathogenesis, immune responses, and therapeutic efficacy, especially in cases where human studies are not feasible or ethical. By enabling detailed investigations into viral replication, immune modulation, and potential therapeutic interventions, these models are key to the development of effective vaccines and treatments41. Despite their utility, the aggregation and analysis of animal model data have been highly fragmented. Thus, the Coronavirus Disease Animal Model Disease (COVID-AMD) database was designed to address this fragmentation by providing a comprehensive platform that centralizes and integrates these data, significantly improving accessibility and enabling more efficient analysis42.

The transmissibility of coronaviruses underscores the importance of developing animal models that accurately replicate human disease. These models are crucial for assessing antiviral therapies and vaccines, as well as for advancing our understanding of the pathogenesis of infections41,43. Animal models used in COVID-19 research include non-human primates, such as Rhesus macaques (Macaca mulatta), Macaca fascicularis, common marmosets, African green monkeys and Cynomolgus monkeys, as well as transgenic hACE2 mice, SARS-CoV-2 MA infected BALB/c mice, golden Syrian hamsters and ferrets. Notably, Rhesus macaques have shown higher susceptibility to SARS-CoV-2, while African green monkeys develop more severe respiratory illness44,45,46,47,48,49. The animal models used for SARS-CoV research have mainly involved Old and New World primates, including Cynomolgus, Rhesus, and African green monkeys, which exhibit high viral replication. Additionally, other species such as BALB/c mice, 129SvEv/STAT1-/- mice, C57BL/6 mice, transgenic hACE2 mice, golden Syrian hamsters, ferrets, and domestic cats have been used to study SARS-CoV50,51,52,53,54. For MERS-CoV, animal models have focused on dromedary camels, alpacas, Rhesus macaques, common marmosets, Yorkshire-Landrace pigs, New Zealand white rabbits, and various transgenic mouse models, including Ad5-hDPP4 and hDPP4 C57BL/6J mice. Naturally permissive species, such as rabbits, pigs, and camels, have also contributed to MERS-CoV research55,56,57,58,59,60,61,62.

Given the physiological and immunological differences between species, no single animal model can fully replicate human disease63. This underscores the necessity for a centralized and standardized platform to facilitate cross-species comparisons and support robust analyses of CoV pathogenesis and therapeutic responses. The COVID-AMD database serves as a comprehensive repository that systematically aggregates animal model data from diverse studies. By offering a unified platform for accessing and analyzing this data, COVID-AMD resolves the issue of data fragmentation, enabling researchers to conduct more efficient and comparative investigations. The inclusion of diverse species in COVID-AMD enables cross-species comparisons, offering invaluable insights into the pathogenesis of coronaviruses and accelerating the development of antiviral treatments. The incorporation of the above-mentioned coronaviruses provides key insights into the study of cross-species transmission, viral evolution, and host immune responses. Their integration into comparative analyses enhances our understanding of coronavirus diversity and zoonotic potential, contributing to research beyond human-pathogenic coronaviruses.

This platform systematically integrates animal model data and analytical tools, facilitating efficient data mining and analysis to support CoV-related research across human and animal systems. It currently includes more than 869 animal models from more than 500 publications, representing 312 virus strains across 29 species. By focusing on phenotype data, the database enables comparative medicine, allowing researchers to better understand CoV pathogenesis, develop vaccines, and advance antiviral treatments.

Results

Database structure and content

The COVID-AMD web interface consisted of seven primary sections: Home, Advanced search, Models browse, Analysis tools, Download, Submission and Help. The platform offered three analytical utilities: model recommendation, comparative analysis, and omics data analysis. The homepage presented various types of SARS-CoV-2 information, including sequences from public repositories such as GISAID and NCBI statistics (Fig. 1a). It also featured a distribution map of COVID-19 literature categorized by research areas like animal model construction, vaccine development, drugs and pharmaceutical treatments, diagnostic technologies, and etiology, as compiled from PubMed between 2021 and 2024 (Fig. 1b). The site provided access to COVID-19 related animal models, updated literature reviews, species statistics relevant to model applications for vaccines or drugs, and the statistical distribution of infection routes and species documented within COVID-AMD (Fig. 1c-e).

Fig. 1

Overview of data landscape and statistical analysis in COVID-AMD. (A) Trend of Sequence Reporting: SARS-CoV-2 sequence submissions to public repositories (GISAID and NCBI), illustrated as a curve chart spanning January 25, 2020 to February 17, 2024. (B) Literature Volume by Category: Quantitative analysis of COVID-19 literature in NCBI categorized by topic, such as animal model construction, vaccine development, drugs and clinical treatment, testing technology and products, etiology and epidemiology, and other related areas. (C) Literature Distribution by Species and Application: Statistics of COVID-19-related literature by species affected and by the application of models in drug or vaccine research, as recorded in NCBI. (D) Infection Routes Distribution: Proportional representation of infection routes in COVID-AMD, with each color signifying a distinct infection route. (E) Species Affected by Coronavirus: Distribution of coronavirus infection among species within COVID-AMD, involving 20 species and 5 viruses, with each color denoting a different species. (F–H) Co-occurrence Network Analysis: Textual co-occurrence networks derived from the abstracts of articles addressing SARS-CoV-2, SARS-CoV and MERS-CoV, visualizing the interrelation of terms and concepts.

Moreover, a text co-occurrence network analysis focusing on CoVs (SARS-CoV-2, SARS-CoV, MERS-CoV) was derived from a bibliometric study of the collected literature. Utilizing VOSviewer (v1.6.14), three distinct networks were constructed, emphasizing the ‘link strength’ between phenotypic and infection-related keywords in abstracts64,65. Terms meeting a minimum occurrence threshold (three for SARS-CoV-2 and two for SARS-CoV and MERS-CoV) were curated and manually refined to remove non-relevant terms such as ‘DPI’, ‘PFU’, and ‘efficacy’, leaving 126 terms for SARS-CoV-2, 8 for SARS-CoV and 21 for MERS-CoV (Fig. 1f-h). The keyword co-occurrence network showed that research on CoV-infected animal models focused on ‘mouse’, ‘vaccine’, ‘S protein’, ‘lung’, ‘antibody’ and ‘immunization’, which were the most frequent terms (Supplementary Information Table S1). These frequent keywords indicate that, in indexed publications, mouse models were the main vehicle for mechanism research and application experiments.

Search methodology for identifying specific CoVs-infected animal models

COVID-AMD provides a search facility that allows for the rapid identification of animal models infected with various CoVs, based on multiple parameters such as species, viral strain, disease, model application, infection route, and model classification (Fig. 2a-b). The advanced search functionality enables the integration of multiple criteria to refine the search results (Fig. 2c). The search results redirect users to a browsing page that lists all relevant models or experiments ranked according to their relevance to the search terms, so that users can efficiently locate models or experiments pertinent to their research needs (Fig. 2d). For example, a researcher seeking a mouse model infected with a coronavirus can select “mouse” as the species, and COVID-AMD will provide a list of mouse models infected with various coronaviruses, including details on infection routes and strain variations, allowing for more informed decision-making.
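
The multi-criteria search described above can be pictured as a conjunction of case-insensitive field filters. The following is only a minimal sketch under stated assumptions: the `ModelRecord` fields, the `search_models` helper and the three example records are hypothetical illustrations, not the COVID-AMD schema, API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRecord:
    # Hypothetical fields chosen to mirror the search parameters named in the text.
    name: str
    species: str
    virus: str
    strain: str
    infection_route: str


def search_models(records, **criteria):
    """Return records in which every criterion is a case-insensitive
    substring of the corresponding field (all criteria must match)."""
    def matches(record):
        return all(
            str(value).lower() in getattr(record, field).lower()
            for field, value in criteria.items()
        )
    return [record for record in records if matches(record)]


# Illustrative records only; they are not entries taken from the database.
RECORDS = [
    ModelRecord("K18-hACE2 mouse", "mouse", "SARS-CoV-2", "C57BL/6J", "intranasal"),
    ModelRecord("BALB/c mouse", "mouse", "SARS-CoV-2", "BALB/c", "intranasal"),
    ModelRecord("Golden Syrian hamster", "hamster", "SARS-CoV-2", "outbred", "intranasal"),
]

hits = search_models(RECORDS, species="mouse", strain="C57")
for hit in hits:
    print(hit.name, hit.virus, hit.infection_route)
```

Each additional keyword narrows the result set, which is the behavior the advanced search is described as offering; relevance ranking is not modeled here.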

Fig. 2

Searching for and visualizing animal models infected with coronaviruses. (A) Global Search Interface: Search functionality for CoV-infected animal models. (B) Advanced Search Interface: Filtered search by species, virus name, infection route, or classification with corresponding input terms. (C) Multi-field Search Example: An example of a combined search using ‘mouse’ for animal species, ‘BetaCoV’ for virus strain, and ‘C57’ for mouse strain to locate specific models. (D) Search Results Page: Displays detailed information for the models returned by the search using ‘BetaCoV’ for virus strain and ‘C57’ for mouse strain, including species, disease, virus strain, application, infection route, classification and references. (E) Model Detail Page: Shows comprehensive data for a SARS-CoV-2 intratracheally infected African Green Monkey (SARS-CoV-2/human/USA/WA-CDC-02982586–001/2020). (F) Pathogen Detection Visualization: Web interface showcasing pathogen detection results. (G) Mortality Rate Visualization: Web interface presenting mortality rates. (H) Immune Response Visualization: Web interface displaying cytokine data.

Browsing and visualization of CoVs infected animal models

COVID-AMD systematically classified and displayed animal models of CoVs infection, including those for SARS-CoV-2, SARS-CoV, MERS-CoV, and other coronaviruses. It also encompassed models used in evaluative experiments, such as vaccine efficacy studies, pharmacological trials, mechanistic investigations of CoVs, and other related research. The browsing interface of COVID-AMD presents detailed entries for each model and experiment, listing the name, species, viral strain, route of infection, classification, intended application, and cited references (Fig. 2e).

The animal model details page within COVID-AMD is divided into four sections: model and experiment summary, strain information, experimental methodologies, and phenotype data. The summary section contains critical details, including disease and virus strain, routes of infection, species involved, categorization of the model/experiment, application, names of vaccines/drugs, the research institution, and bibliographic references. Interactive links provide direct access to the Taxonomy ID for species, the Nucleotide database for virus strains, and PubMed for references. Strain specifics encompass strain name, generation, gene, microbial status, cultivation period, reproductive data, and strain supply unit. The experimental methodology section includes group name, inoculation route, animal count, infection dosages, timing, substances used, and a description of the methods, enhancing users’ comprehension of the experimental framework. Phenotypic data are presented both visually and textually, using a variety of graphical representations, such as bar graphs and line charts, to show data trends in pathogen detection, mortality rate and immunological changes, among other variables, across experimental and control cohorts (Fig. 2f-h). These results provide crucial insights into the physical characteristics influenced by genetic factors and virus infection, which are vital in assessing symptoms, disease progression, and immune response. Researchers can gain an intuitive understanding of how the disease manifests in different species, how it interacts with the immune system, and how it might be affected by potential therapeutic interventions. This information is crucial for developing effective treatment strategies and vaccines against CoVs. The visualizations, complemented by detailed textual descriptions, can be viewed in detail or downloaded in formats such as PNG, JPEG, PDF, or SVG.
Furthermore, users can download comprehensive datasets on CoVs infection models or related experiments and access supplementary information for in-depth analysis. Further, the evaluation of drugs and vaccines for animal models of coronavirus infection is incorporated into the established animal model drug screening database (https://www.uc-med.net/DrugScreen/). This section includes detailed assessments of pharmacodynamics, pharmacokinetics, and toxicological evaluation, providing data to support the rapid development of drugs and vaccines66.

Comparative analysis of CoVs-infected animal models

The comparative analysis module enables multi-level examinations, assessing different species with identical phenotypes, diverse indicators for similar phenotypes, and comparisons of various species against a single indicator. Initially, users select a phenotype of interest for analysis, such as viral load, mortality rate, cytokine levels, or antibody presence. Subsequently, a specific virus (SARS-CoV-2, SARS-CoV, MERS-CoV) is chosen (Fig. 3a). Upon submission, COVID-AMD presents interactive comparative analysis graphs and the corresponding datasets, inclusive of p-values. The mortality rates are depicted through bar charts (Fig. 3b), illustrating the variance across species infected with the chosen virus. The analysis of the data indicated a significant interspecies variability in mortality rates following SARS-CoV-2 infection. Specifically, the data revealed that mice exhibited a notably higher mortality rate compared to the other species under evaluation. Quantitatively, the mortality rates among humans, mice, hamsters, and macaques were 9%, 52.12%, 41%, and 46%, respectively. These findings underscore the differential susceptibility and response to SARS-CoV-2 infection across species, with mice demonstrating a particularly heightened vulnerability. To statistically validate these observations, a t-test was used, revealing a significant difference in mortality rates between the infected groups and a control group with a 0% mortality rate. This analysis, quantified by the calculated p-value, supports the significant interspecies variability in response to the virus. Box plots (Fig. 3c) illustrate cytokine level fluctuations post-infection. For the comparison between the uninfected control group and the infected group, p-values were computed with Welch’s t-test, which is suitable for datasets with unequal sample sizes or variances67.
The data, organized by cytokine type, facilitate an exploration of each cytokine’s role in physiological and pathological contexts. Notably, CXCL9, CXCL10, CCL2, and CCL4 levels were significantly elevated in mice, whereas IFN-α and IL-1β showed increased levels in hamsters post SARS-CoV-2 infection. The dynamic evolution of viral load in species infected with CoVs was illustrated using line charts (Fig. 3d). The analysis showed that humans and minks generally exhibited higher viral loads following infection with SARS-CoV-2 or SARS-CoV. Across all species, infection peaked around the seventh day and declined significantly by the 14th day. Furthermore, for a precise assessment of the immune response, we categorized the antibody-related data into two datasets: one for IC50 and another for ELISA. IC50 values shed light on the neutralizing capabilities of antibodies, whereas ELISA data provided quantitative insights into antibody concentrations, measured in OD450nm, over various infection periods68. Antibody-related data were presented via scatter plots with trend lines, outlining variations in antibody levels (NAb, IgG, IgA, IgM, etc.) across different infection periods. This approach aids in selecting the appropriate species and timing for antibody production, which is critical for evaluating experiments involving different antibodies. For instance, following SARS-CoV-2 infection, the ELISA results showed that IgG levels, measured in OD450nm units, were notably higher in mice compared to other species, as illustrated in Fig. 3e.
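
To make the statistical step concrete, the sketch below computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom for an infected group versus a control group using only the Python standard library. It is an illustration under stated assumptions rather than the platform's implementation: the function name `welch_t` and the two sample lists are hypothetical, and the values are made up for demonstration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Made-up cytokine levels (pg/mL) for illustration only.
infected = [12.0, 15.0, 14.0, 19.0, 16.0]
control = [5.0, 6.0, 4.0]

t_stat, dof = welch_t(infected, control)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The two-sided p-value follows from the Student t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(infected, control, equal_var=False)` performs the same test and returns the p-value directly.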

Fig. 3

Comparative analysis and model recommendation. (A) Comparative Analysis Selection Tool: Interface for selecting parameters for comparative analysis. (B) Mortality Rate Bar Chart: Comparison of SARS-CoV-2 infection mortality rate (%) across different species. The chart reflects selections for the virus ‘SARS-CoV-2’ and phenotype ‘mortality rate’. (C) Cytokine Box Plot: Pre- and post-infection cytokine levels (pg/mL) in lung tissue samples. The chart reflects selections for virus ‘SARS-CoV-2’, phenotype ‘Cytokine’ and species ‘hamster’. (D) Antibody Scatter Plot: ELISA-derived dynamics of IgG levels in the lungs across various species after SARS-CoV-2 infection. The trend line represents the change in IgG concentration. The chart reflects selections for virus ‘SARS-CoV-2’, phenotype ‘antibody’ and, specifically, the antibody ‘IgG’. (E) Viral Load Line Plot: Trends in viral load across different species post-infection. The chart reflects selections for the virus ‘SARS-CoV-2’ and phenotype ‘viral load’. (F) Model Recommendation Selection Tool: Interface for selecting parameters for model recommendations. (G) Model Recommendation Pie Chart: Visualization displaying the proportion of models occurring within COVID-AMD for SARS-CoV-2 in mice, pertinent to human disease research and drug screening. The chart reflects selections for the virus ‘SARS-CoV-2’, species ‘mouse’, and research objective ‘mechanism research’.

Model recommendation of specific CoVs-infected animal models

The model recommendation function in COVID-AMD is designed to suggest the top 10 models by analyzing ‘model application’ and ‘frequency of models in the database’. Users can specify a virus (SARS-CoV-2, SARS-CoV, MERS-CoV, HCoV-229E, HCoV-OC43), the corresponding species, and their research purpose (such as mechanism research, drug evaluation, research on human diseases, vaccine assessment, or phenotypic and symptomatic comparison) (Fig. 3f). Upon selection, COVID-AMD generates a list displaying each model’s name along with its frequency in the database (Fig. 3g). For instance, selecting ‘SARS-CoV-2’ dynamically updates the species field to display the options associated with SARS-CoV-2 in the database, including mice, monkeys, hamsters, etc. After a user selects ‘mice’ and opts for ‘mechanism research’, the platform recommends the most frequently utilized SARS-CoV-2 mouse model for this research focus, the SARS-CoV-2/human/USA-WA1/2020 intranasally infected K18-hACE2 mouse model, indicating that this model is the one most commonly used in articles related to mechanism research69,70. Each model is hyperlinked, allowing users to navigate directly to detailed pages with a single click.
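
One way to read the recommendation logic is as a frequency ranking over database entries that match the selected virus, species and research purpose. The sketch below illustrates that reading only; the `recommend_models` function, the dictionary keys and the example entries are assumptions made for illustration and do not reproduce COVID-AMD's code or data.

```python
from collections import Counter


def recommend_models(entries, virus, species, application, top_n=10):
    """Rank model names by how often they appear among entries that match
    the selected virus, species and application; return the top_n."""
    counts = Counter(
        entry["model"]
        for entry in entries
        if entry["virus"] == virus
        and entry["species"] == species
        and entry["application"] == application
    )
    return counts.most_common(top_n)


# Illustrative entries only (one per hypothetical publication record).
ENTRIES = [
    {"model": "K18-hACE2 mouse (intranasal)", "virus": "SARS-CoV-2",
     "species": "mouse", "application": "mechanism research"},
    {"model": "K18-hACE2 mouse (intranasal)", "virus": "SARS-CoV-2",
     "species": "mouse", "application": "mechanism research"},
    {"model": "BALB/c mouse (intranasal)", "virus": "SARS-CoV-2",
     "species": "mouse", "application": "mechanism research"},
    {"model": "K18-hACE2 mouse (intranasal)", "virus": "SARS-CoV-2",
     "species": "mouse", "application": "mechanism research"},
    {"model": "Golden Syrian hamster (intranasal)", "virus": "SARS-CoV-2",
     "species": "hamster", "application": "vaccine assessment"},
]

ranking = recommend_models(ENTRIES, "SARS-CoV-2", "mouse", "mechanism research")
for model, frequency in ranking:
    print(model, frequency)
```

Ranking purely by frequency favors well-established models, which matches the stated aim of surfacing the models most commonly used for a given research purpose.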

Correlation of animal models of CoVs infection and omic data

High-throughput datasets from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) were collected for PCA (Principal Component Analysis), differential gene analysis, and GO and KEGG enrichment analysis. These datasets were integrated into the omics data analysis section of the corresponding model details page71. The General Details section contains metadata for each dataset, providing essential information such as infection time, assay, series, sample number, nucleic acid type, platform and description. PCA visualized group clustering in a plot, showing the distinct grouping of data points based on their principal components (Fig. 4a). Differential expression analysis identified differentially expressed genes, which were illustrated using a volcano plot. This analysis was complemented by a corresponding table displaying key gene expression metrics, including symbol, logFC (log fold change), logCPM (log counts per million), P-Value, FDR (False Discovery Rate), case_mean and control_mean (Fig. 4b). GO term and KEGG enrichment analyses presented the top 20 enriched pathways, showcasing the biological processes and pathways associated with the experimental data. These findings were depicted through bubble and bar plots for an intuitive understanding of pathway enrichment. The corresponding table further detailed these pathways with ID, description, gene, p-value, p.adjust, and count, offering an in-depth exploration of the enriched pathways (Fig. 4c-d). To enhance data analysis capabilities, a customized gene expression profile tool covering 5 species and 45 datasets was added to the COVID-AMD tools, offering advanced functionalities for exploring gene expression profiles (Fig. 4e). Users can generate a single-gene expression box plot online based on the selected species and dataset, displaying gene expression profiles after infection with different CoVs, along with expression values and P-values between different groups.
All images generated through online analysis are interactive, and all tables can be downloaded in full. By supplementing gene-level data association, this comprehensive genomic analysis enables researchers to identify critical genes and pathways implicated in viral pathogenesis and host response. Furthermore, the COVID Comparative Expression Database (COVID-CED, https://covid.com-med.org.cn) was incorporated into the COVID-AMD tools to implement more in-depth omics data analysis. COVID-CED focuses on comparative gene expression variations across species, virus strains, infection times, titers, and cells/tissues, and offers four bioinformatics analysis tools (single gene expression profile, multi-gene expression profile, within-species comparative, cross-species comparative). These tools analyze differentially expressed genes and predict enrichment of molecular functions, pathways, and potential effects in cells72. Insights derived from this analysis could pave the way for identifying novel therapeutic targets and refining existing intervention strategies, ultimately contributing to the development of more effective treatments and preventive measures against CoV outbreaks. Although the current version of COVID-AMD primarily focuses on gene expression and enrichment analyses, future updates will incorporate proteomics data to provide a more complete view of virus-host interactions and enhance our understanding of infection mechanisms.

Fig. 4

Omics data analysis. (A) PCA of human bronchial epithelial cells infected with SARS-CoV at 12 h, 24 h, and 48 h, showing sample correlation using scatter plots (GSE17400). (B) Volcano plot showing DE genes in human bronchial epithelial cells infected with SARS-CoV at 48 h. The cutoffs for differentially expressed genes are FC (fold change) 1.5 (log2FC 0.585) and P-value 0.05. The log-transformed fold change is plotted on the x-axis, and the negative log10 p-value is plotted on the y-axis. Colored points represent DE genes, and the number of up/down genes, along with the corresponding table, is shown. (C) Gene ontology terms significantly associated with candidate genes (Top 20). Each row represents one functional term, and neg_log10 p-value represents the enrichment significance for each GO term. Count represents the number of DE genes or transcripts enriched in each GO term. The bubble plot shows the biological processes for differentially expressed genes in human bronchial epithelial cells infected with SARS-CoV at 48 h. (D) KEGG terms significantly associated with candidate genes (Top 20). Each row represents one pathway, and neg_log10 p-value represents the enrichment significance for each pathway. Count represents the number of DE genes or transcripts enriched in each pathway. The bar plot shows the enriched pathways for differentially expressed genes in human bronchial epithelial cells infected with SARS-CoV at 48 h. (E) Dynamic gene expression profile of ACE2 in human bronchial epithelial cells infected with SARS-CoV. The box plot shows ACE2 expression levels at 12 h vs mock 12 h. The table shows the expression levels and p-values for the test/control groups.
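
The volcano plot cutoffs quoted in the caption are related by a simple conversion: a fold change of 1.5 corresponds to log2(1.5) ≈ 0.585. The short sketch below shows that arithmetic and one plausible way to label genes with these cutoffs. The function name `classify_gene`, the strict comparison on the p-value and the inclusive comparison on logFC are assumptions, since the caption does not specify them.

```python
from math import log2

FC_CUTOFF = 1.5                  # fold-change cutoff from the caption
P_CUTOFF = 0.05                  # p-value cutoff from the caption
LOG2FC_CUTOFF = log2(FC_CUTOFF)  # roughly 0.585, matching the caption


def classify_gene(log_fc, p_value):
    """Label a gene 'up', 'down' or 'not significant' from its log2 fold
    change and p-value. Boundary handling here is an assumption."""
    if p_value < P_CUTOFF and log_fc >= LOG2FC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log_fc <= -LOG2FC_CUTOFF:
        return "down"
    return "not significant"


print(round(LOG2FC_CUTOFF, 3))
print(classify_gene(1.2, 0.01))
print(classify_gene(-0.9, 0.001))
print(classify_gene(0.3, 0.001))
```

Genes labeled "up" or "down" would be the colored points in the volcano plot, while the rest fall below one or both cutoffs.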

Discussion

CoV infections, prevalent in humans and other mammals, have raised significant concerns due to species-crossing strains such as SARS-CoV, SARS-CoV-2 and MERS-CoV, which lead to severe disease73,74,75. The COVID-19 pandemic has inflicted unparalleled economic strains and mortality, accentuating the virus’s persistent yet unseen presence76,77. The constant mutation of SARS-CoV-2 underscores the complexity of managing public health and sustaining vaccination efforts78. Future CoVs are likely to continue posing global health threats, thus necessitating the development of diverse animal models for preparedness, vaccine development, and therapeutic evaluation79.

To bolster public health preparedness and assess the efficacy of vaccines and therapeutics, the construction of heterologous disease animal models is imperative. Animal models for CoVs infections have become indispensable in life sciences research, fulfilling two principal functions: characterizing viral pathogenesis and evaluating antiviral agents and vaccines. They are particularly crucial when clinical trials are infeasible or unethical for infectious diseases, supplementing with vital insights. The optimal animal model should be susceptible to infection and emulate the clinical manifestations and pathology seen in human cases41. COVID-AMD has been developed to serve as an exhaustive online repository, as it continuously collects new models from a wide range of species and viruses, enabling the dissemination of findings related to CoVs infected animal models and offering analytical tools for comparative studies across species or various infection conditions. Furthermore, the database has addressed the challenge of disparate phenotypic data terminology by undertaking extensive manual curation and standardization. Curation is a systematic process that involves collecting, organizing, cleaning, validating, standardizing and documenting data, to improve data availability, reliability and comprehensibility80. The effective utilization of publicly available papers necessitates expertise in data acquisition, processing, normalization, and filtration81.

Given the rapid spread of SARS-CoV-2, China quickly identified the virus and shared its sequence, enabling other countries to diagnose infections and protect themselves quickly and leading to the rapid development of diagnostic tools82. There is a critical need for animal models to assess interventions against novel viruses41. It is our aspiration that the scientific community will exchange data on CoV animal models internationally and in an open-access manner, with a particular emphasis on the transparency of phenotypic data. This exchange aims to expedite the development of virus-specific vaccines and antiviral treatments, while mitigating redundant efforts and adhering to the 3Rs (replacement, reduction, and refinement) in animal experimentation83,84.

It has been recognized that a comprehensive repository of animal model data for CoVs infections could significantly enhance researchers’ understanding of phenotypic variations across species and the efficacy of drugs or vaccines in different models, given that various viruses exploit similar replication pathways and host factors85. By utilizing the detailed experimental methods and indicators available in COVID-AMD, scientists can fine-tune their study designs, selecting optimal animal species, strains, age groups, genders, and quality standards, as well as determining appropriate grouping methods and reliable detection indicators. This facilitates the refinement of experimental protocols and the identification of suitable animal models. While acknowledging the limitations of the models currently in our database, their contribution is invaluable in bridging knowledge gaps, thereby positioning COVID-AMD as a crucial resource in combating both emerging and reemerging viral diseases.

In this paper, we developed COVID-AMD to meet the need for a resource that encompasses all animal models of CoV infection. We are committed to continuous updates, particularly with respect to COVID-19 models, incorporating diverse routes of infection, genetic modifications, and phenotypic variations. Future enhancements to COVID-AMD will incorporate more CoV infection models and additional omics data. To maintain the currency and accuracy of the COVID-AMD database, we have implemented a semi-annual update regimen. These updates will include the integration of new SARS-CoV-2 sequence data, a comprehensive review of updated literature, and the inclusion of updated CoV-infected animal models. This methodical approach to data management ensures that COVID-AMD remains an essential and evolving resource for researchers addressing the challenges of CoV infections. Through ongoing refinement, we aim to bolster the scientific community’s efforts in advancing our understanding and response to COVID-19 and related CoV diseases.

Methods

Data collection and processing

An initial data acquisition sheet was developed by reviewing literature related to animal models infected with CoVs. This sheet captured information such as the route of infection, clinical manifestations, pathological changes in the lungs or other organs, methods of virus detection, histological lesions, and immunological responses to CoVs. To ensure data consistency across multiple groups, enumerated values were assigned. Searches were conducted on NCBI (National Center for Biotechnology Information), PubMed (https://pubmed.ncbi.nlm.nih.gov/), bioRxiv (https://www.biorxiv.org/) and Elsevier (https://www.sciencedirect.com/) using specific terms related to the virus (e.g., ‘SARS-CoV’, ‘MERS-CoV’, ‘SARS-CoV-2’, ‘BCoV’) and species models (e.g., ‘Human’, ‘Mouse’, ‘Macaque’) from 2003 to 2024 to gather candidate articles. These articles were then manually curated to identify appropriate models. Scripts were employed to facilitate the retrieval of publication information, followed by an evaluation of abstracts to select studies encompassing a wide range of species, viral strains, infection routes, and methodological approaches. The inclusion criteria prioritized studies based on laboratory animals that simulate human diseases, providing a comprehensive view of disease occurrence and progression. The data collected include detailed phenotypic information such as clinical symptoms (e.g., changes in body temperature, weight loss, dyspnea), pathogen detection (e.g., viral replication in the lungs, histopathological changes), various transmission routes (e.g., nasal cavity, trachea, aerosol), histopathological changes due to viral infection (e.g., lung inflammation, cell infiltration), immune profiling (e.g., production of specific antibodies, release of cytokines), and biochemical markers. Studies from journals with a high impact factor (IF) were given preference (Fig. 5a).

Fig. 5

The framework of data integration and database construction. (A) Data acquisition and collection: Literature on CoVs-infected animal models and related data were downloaded and collected from public databases, including PubMed, bioRxiv, Elsevier, NCBI Taxonomy, Nucleotide and JSTOR. (B) ETL method: Quality control was executed using stringent inclusion criteria and the PRISMA method for systematic reviews86. The ETL involved normalization with OMOP Common Data Model for extraction87, Python scripts for misinformation, and SQL procedures for loading, culminating in an integrated dataset. (C) Database design and web interface: The filtered and organized data were imported into an SQL database, data deduplication algorithms were employed for cleaning, and the CDISC standard was applied for data harmonization88. Front-end and back-end pages were established, and models and related tools were deployed to support search, browse, analysis, statistics and download.

To enhance the database’s external link functionality, a suite of publicly available identifiers was integrated, which includes Digital Object Identifier (DOI), PubMed Unique Identifier (PMID), NCBI Taxonomy browser IDs (https://www.ncbi.nlm.nih.gov/) and NCBI nucleotide records (https://www.ncbi.nlm.nih.gov/nuccore). Each full-text article was rigorously reviewed, including a thorough examination of supplementary tables, to extract multidimensional and crucial data from our predefined templates. When a single publication described multiple models, each variant—such as identical diseases in the same species but different strains, or the same diseases and strains with different viral strains—was cataloged separately. Each model received a unique and permanent ID starting with 'M' followed by numbers and letters (e.g., M0522A, M0522B) to differentiate models within the same publication. The nomenclature for models was as follows: virus name + virus strain name + infection route + infected + strain name + species + model. A team of four biological researchers conducted detailed reviews of all papers, ensuring that each manuscript was assessed by at least two researchers. Detailed sample information is provided in Supplementary Information Table S2.
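The ID and naming rules described above can be illustrated with a minimal sketch. It assumes that the numeric part of the ID is a zero-padded, four-digit number shared by all models from one publication and that the trailing letter distinguishes those models; the text only states that IDs start with 'M' followed by numbers and letters, so this layout is an assumption. The function names and the example values are hypothetical and are not taken from the database.

```python
from string import ascii_uppercase
from typing import List


def model_ids(publication_number: int, n_models: int) -> List[str]:
    """Build IDs such as M0522A, M0522B for the models of one publication.

    Assumption: four zero-padded digits per publication, one letter per model.
    """
    if not 1 <= n_models <= len(ascii_uppercase):
        raise ValueError("this sketch supports 1 to 26 models per publication")
    return [f"M{publication_number:04d}{ascii_uppercase[i]}" for i in range(n_models)]


def model_name(virus: str, virus_strain: str, route: str,
               animal_strain: str, species: str) -> str:
    """Join the name parts in the order given in the text:
    virus name + virus strain name + infection route + infected
    + strain name + species + model. Empty parts are skipped."""
    parts = [virus, virus_strain, route, "infected", animal_strain, species, "model"]
    return " ".join(p.strip() for p in parts if p and p.strip())


# Hypothetical example values, for illustration only.
print(model_ids(522, 2))
print(model_name("SARS-CoV-2", "WA1", "intranasal", "K18-hACE2", "mouse"))
```

Keeping the name parts in a fixed order means that virus and species can later be recovered from the name itself, which the model recommendation system relies on.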

Data modelling and exploratory data analysis

The data were refined and integrated using an ETL (Extract, Transform, Load) methodology to ensure the quality and consistency necessary for rigorous analysis (Fig. 5b). Extract: Data were retrieved from various sources, including web interfaces, literature, and public databases, using the Data Thief tool (https://datathief.org/). Transform: This step entailed preliminary data processing such as format standardization, retaining key columns, filtering records, and unifying measurement units for compatibility with SQL databases. Load: SQL commands were used to import the cleansed data into the target database. Data conversion and cleaning: In this phase, SQL commands were applied to promote data integrity and uniformity. Column names were standardized using ‘ALTER TABLE’ statements, data types were normalized with the ‘CAST’ and ‘CONVERT’ functions, and records were filtered selectively using ‘WHERE’ clauses. Duplicate entries were managed with the ‘DISTINCT’ keyword, and missing values were handled with the ‘IS NULL’ and ‘IS NOT NULL’ conditions, which identified and excluded incomplete records. Additionally, the ‘GROUP BY’ clause, combined with aggregation functions such as ‘AVG’ and ‘COUNT’, was employed to consolidate and summarize data within the SQL framework.
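The SQL cleaning steps named above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for COVID-AMD: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, the table and column names are hypothetical, and the rows are invented. SQLite offers ‘CAST’ but not the MySQL ‘CONVERT’ function, so only ‘CAST’ appears here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical staging table; measurements arrive as text after extraction.
cur.execute("CREATE TABLE staging (ModelID TEXT, species TEXT, weight_loss_pct TEXT)")
cur.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        ("M0001A", "mouse", "12.5"),
        ("M0001A", "mouse", "12.5"),   # exact duplicate record
        ("M0002A", "mouse", "7.5"),
        ("M0003A", "hamster", None),   # missing measurement
        ("M0004A", "hamster", "10.0"),
    ],
)

# Standardize a column name (RENAME COLUMN needs SQLite 3.25 or newer).
cur.execute("ALTER TABLE staging RENAME COLUMN ModelID TO model_id")

# Cast text to a numeric type, exclude incomplete rows, remove duplicates.
cur.execute(
    """
    CREATE TABLE clean AS
    SELECT DISTINCT model_id,
                    species,
                    CAST(weight_loss_pct AS REAL) AS weight_loss_pct
    FROM staging
    WHERE weight_loss_pct IS NOT NULL
    """
)

# Summarize per species with aggregate functions.
summary = cur.execute(
    """
    SELECT species, COUNT(*) AS n_models, AVG(weight_loss_pct) AS mean_weight_loss
    FROM clean
    GROUP BY species
    ORDER BY species
    """
).fetchall()
conn.close()

print(summary)  # [('hamster', 1, 10.0), ('mouse', 2, 10.0)]
```

Writing the cleaned rows to a separate table leaves the staging data untouched, so each filter can be audited against the original extract.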

Secondary cleaning of data

In the subsequent phase of data quality enhancement, the Python Pandas library played a pivotal role. Initially, the dropna() function removed rows or columns with missing entries, while the fillna() method filled gaps in the dataset with predetermined data. This was followed by the removal of duplicate records through the drop_duplicates() method to ensure data uniqueness within the DataFrame. Subsequently, the astype() function converted DataFrame columns to standardized types, establishing data type uniformity. Specifically for temporal data, the to_datetime() function was crucial in accurately converting columns to date-time formats. Lastly, Boolean indexing was employed as a strategic means to filter DataFrame rows, effectively applying logical conditions to refine the datasets.
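The sequence of Pandas operations described above can be sketched on a small, invented table. The column names, the fill value "unknown" and the titer cutoff are assumptions made for illustration; the source does not specify them.

```python
import pandas as pd

# Hypothetical raw records with a duplicate row and missing entries.
raw = pd.DataFrame(
    {
        "model_id": ["M0001A", "M0001A", "M0002A", "M0003A", None],
        "species": ["mouse", "mouse", "mouse", None, "hamster"],
        "viral_titer": ["4.5", "4.5", "3.0", "5.2", "2.1"],
        "infection_date": ["2020-03-01", "2020-03-01", "2020-04-15",
                           "2021-01-10", "2021-02-02"],
    }
)

clean = (
    raw.dropna(subset=["model_id"])          # drop rows lacking an identifier
    .fillna({"species": "unknown"})          # fill remaining gaps with a preset value
    .drop_duplicates()                       # keep one copy of repeated records
    .astype({"viral_titer": "float64"})      # enforce a numeric column type
)

# Convert the text dates to a date-time column.
clean["infection_date"] = pd.to_datetime(clean["infection_date"])

# Boolean indexing: keep rows that satisfy a logical condition.
high_titer = clean[clean["viral_titer"] >= 4.0]
print(high_titer[["model_id", "species", "viral_titer"]])
```

Dropping rows without an identifier before filling other gaps avoids creating placeholder records that cannot be traced back to a model.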

Comparative analysis

COVID-AMD conducted comparative analysis at the phenotype level, enabling data visualization for multi-perspective and multi-level analysis. The analysis was stratified by sample size, with smaller samples (n < 30)—including cytokine profiles, antibody titers and protein—analyzed using Welch’s t-test to assess inter-group differences. For medium-sized samples (30 ≤ n < 100), such as mortality rates, independent t-tests or ANOVA were utilized depending on the data distribution. Python’s SciPy library was employed for executing these statistical tests and accurately calculating p-values. Subsequently, the polyfit function within the same library was applied to fit a quadratic polynomial for trend line extrapolation. Comprehensive summary statistics, trend analyses, and correlation assessments were conducted, shedding light on patterns and relationships within the data. Interactive online visualization of phenotypic data was achieved via Plotly (v 1.5.10) and integrated into a meticulously crafted HTML page, with JavaScript and CSS providing structural and stylistic enhancement before server deployment.
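A minimal sketch of the two numerical steps, Welch's t-test for a small-sample comparison and a quadratic trend fit, is given below. The measurements are invented. The test uses SciPy's ttest_ind with equal_var=False, which is the Welch variant; the polynomial fit calls NumPy's polyfit, which is an assumption about the function the text refers to.

```python
import numpy as np
from scipy import stats

# Hypothetical cytokine levels in two small groups (n < 30).
control = np.array([1.1, 0.9, 1.3, 1.0, 1.2])
infected = np.array([2.4, 3.1, 1.9, 2.8, 3.5, 2.2])

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(infected, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Hypothetical time course that follows an exact quadratic, so the fit
# should recover the coefficients -0.5, 1.0 and 0.0.
days = np.arange(7, dtype=float)
weight_change = -0.5 * days**2 + 1.0 * days

coeffs = np.polyfit(days, weight_change, deg=2)  # highest power first
trend = np.poly1d(coeffs)

# Evaluate the fitted curve one day beyond the observed range.
print(trend(7.0))
```

Extrapolating a quadratic far beyond the observed time points can diverge quickly, so a trend line of this kind is best read close to the measured range.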

Model recommendation system

The COVID-AMD model recommendation system adheres to established nomenclature protocols: each model name encapsulates virus and species information that was extracted and standardized from the relevant literature. Model applications were classified into categories including phenotype and symptom comparison, drug screening, drug evaluation research, research on human diseases, mechanistic investigation, vaccine evaluation, and vaccine prevention research. After users select a disease, species, and research purpose (model application) online, a list of corresponding model names is retrieved based on the ‘disease’ and ‘species’ fields in the name of each model. The retrieved list indicates which models have frequently been used by researchers and provides guidance for users selecting models when designing subsequent experiments.
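The retrieval step can be sketched as a filter over model records. This is an illustration under stated assumptions, not the COVID-AMD implementation: the record layout, the function name and the catalog entries are hypothetical. Names are matched on whole whitespace-separated tokens instead of substrings, so that a query for SARS-CoV does not also return SARS-CoV-2 models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Model:
    model_id: str
    name: str                      # follows the naming rule, ends in "model"
    applications: Tuple[str, ...]  # curated model application categories


def recommend(models: List[Model], disease: str, species: str,
              application: str) -> List[Model]:
    """Return models whose name contains the disease and species tokens
    and whose applications include the requested research purpose."""
    disease, species = disease.lower(), species.lower()
    hits = []
    for m in models:
        tokens = m.name.lower().split()
        if disease in tokens and species in tokens and application in m.applications:
            hits.append(m)
    return hits


# Hypothetical catalog entries, for illustration only.
catalog = [
    Model("M0001A", "SARS-CoV-2 WA1 intranasal infected K18-hACE2 mouse model",
          ("vaccine evaluation", "drug screening")),
    Model("M0002A", "SARS-CoV Urbani intranasal infected BALB/c mouse model",
          ("mechanistic investigation",)),
    Model("M0003A", "SARS-CoV-2 WA1 intranasal infected Syrian hamster model",
          ("vaccine evaluation",)),
]

for m in recommend(catalog, "SARS-CoV-2", "mouse", "vaccine evaluation"):
    print(m.model_id, m.name)
```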

Omics data analysis

High-throughput gene expression datasets were sourced from the Gene Expression Omnibus (GEO) database to perform gene-level analyses of CoV infections, facilitating the identification of key genes and pathways involved in the viral infection process. Both microarray and RNA-seq datasets related to CoV infections in human or animal models were acquired from the GEO (https://www.ncbi.nlm.nih.gov/geo/). For microarray datasets, preprocessing procedures, including quality control, background adjustment, calibration, and probe summarization, were performed using Affy (Affymetrix package, v1.78.2, https://bioconductor.org/packages/release/bioc/html/affy.html). For RNA-seq datasets, quality control was implemented using FastQC (v0.12.1) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), alignment was performed using STAR (Spliced Transcripts Alignment to a Reference, v2.7.11a, https://github.com/alexdobin/STAR), and count generation and gene quantification were performed using Salmon (v1.2.1). Both raw read counts and TPM data were retained. Gene annotation used the latest human (GRCh38, Ensembl release-99), mouse (GRCm38), monkey (Mmul_10) and ferret (MusPutFur1.0, Ensembl) gene annotations. Principal Component Analysis (PCA) and analysis of Differentially Expressed Genes (DEGs) were conducted based on sample grouping. These analyses were performed using the limma package (v3.56.2, https://bioconductor.org/packages/release/bioc/html/limma.html) or edgeR (v4.0.16, https://bioinf.wehi.edu.au/edgeR/, https://bioconductor.org/packages/edgeR) with a threshold of |log2FC| > 0.585 and p < 0.05 (equivalent to a fold change > 1.5 in either direction, p < 0.05). Based on the obtained differentially expressed genes, clusterProfiler (v4.10.0, https://bioconductor.org/packages/clusterProfiler/) and DAVID (https://david.ncifcrf.gov/) were used to conduct GO (biological process, cellular component and molecular function) and KEGG functional enrichment analysis. Significantly enriched functional items with p < 0.05 were selected, and the top 20 were displayed. Various visualizations, including differential gene box plots, volcano plots, GO enrichment bubble plots, and KEGG enrichment bar plots, were generated using plotly (version 1.5.10). Metadata extraction was performed based on key parameters such as summary, species, strain, virus, virus strain, infection time, assay, series, sample number, nucleic acid type, platform, PubMed ID, and description.
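The differential expression cutoff can be made concrete with a short sketch. It assumes the log fold change is on a log2 scale, as limma and edgeR report it, so that 0.585 is the rounded value of log2(1.5). The gene labels and statistics are placeholders, not results from the database.

```python
import math

LOGFC_CUTOFF = 0.585   # rounded log2(1.5), the threshold stated in the text
P_CUTOFF = 0.05

# The cutoff on the log2 scale corresponds to a 1.5-fold change.
print(round(math.log2(1.5), 3))  # 0.585


def classify(log2_fc: float, p_value: float) -> str:
    """Label a gene as 'up', 'down' or 'ns' (not significant)."""
    if p_value < P_CUTOFF and log2_fc > LOGFC_CUTOFF:
        return "up"
    if p_value < P_CUTOFF and log2_fc < -LOGFC_CUTOFF:
        return "down"
    return "ns"


# Placeholder statistics: (gene label, log2 fold change, p-value).
results = [
    ("gene_a", 1.20, 0.001),    # large increase, significant
    ("gene_b", -0.90, 0.010),   # large decrease, significant
    ("gene_c", 0.40, 0.001),    # significant, but below the fold-change cutoff
    ("gene_d", 2.00, 0.200),    # large change, but not significant
]

labels = {gene: classify(fc, p) for gene, fc, p in results}
print(labels)
```

Requiring both conditions, as in a volcano plot, excludes genes with a small but statistically significant shift as well as genes with a large but unreliable change.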

Database architecture and accessibility

The database interface utilized HTML5 and CSS for layout and styling, with jQuery (version 1.3) specifically enhancing user interaction. For backend architecture, the Enterprise Java Beans framework was combined with Node.js for efficient server-side processing. Python (version 3.9) served as the primary programming language, complemented by JBoss (version 6.0) as middleware, and MySQL (version 5.7) as the database engine. The system was designed for scalability, utilizing standard SQL queries. Key features of the system included its open access, user-friendly and intuitive interface, operational simplicity, and robust security measures, ensuring a reliable experience for users (Fig. 5c). COVID-AMD is compatible with major browsers including Google Chrome (latest version), Safari (v12.0 and above) and Firefox (latest version). Moreover, the platform supported bilingual functionality, offering an English-Chinese interface, and was freely accessible to the public at https://www.uc-med.net/CoV-AMD/.

Conclusion

COVID-AMD is a valuable, open-access platform that integrates and standardizes data on animal models infected with various coronaviruses (CoVs). It provides researchers with a wealth of resources, including animal models, phenotypic data, model applications, and gene expression data, effectively bridging the gap between preclinical animal experiments and clinical research. This integration facilitates a comprehensive understanding of the molecular mechanisms underlying CoV infections. The platform’s advanced analytical tools enable cross-species comparisons, revealing phenotypic and genetic variations in response to different CoV infections. Additionally, the model recommendation tool offers critical guidance for the selection and establishment of appropriate animal models. At the genetic level, COVID-AMD allows for online single-gene expression analysis, highlighting similarities and differences between animals and humans by comparing gene expression changes following CoV infections. This capability enhances the accuracy of disease prediction and progression modeling. These tools significantly streamline research processes, improve efficiency, and play a pivotal role in advancing CoV-related studies. By promoting drug and vaccine development, COVID-AMD strengthens global preparedness and response to future epidemics.