Abstract

This systematic review (PROSPERO registration number: CRD42021282476) aims to collect and analyse current evidence on the real-world clinical accuracy of instrument-read rapid antigen diagnostic tests (Ag-IRRDTs) for SARS-CoV-2 identification. We followed the PRISMA checklist and searched PubMed, Web of Science Core Collection and FIND for publications evaluating the accuracy of SARS-CoV-2 Ag-IRRDTs as of 30 September 2021, and included 40 independent clinical studies yielding 48 Ag-IRRDT datasets with 137,770 samples. Across all datasets, pooled Ag-IRRDT sensitivity was 67.1% (95% CI: 65.9%–68.3%) and specificity was 99.4% (95% CI: 99.4%–99.4%). Pooled sensitivity and specificity of SARS-CoV-2 Ag-IRRDTs did not demonstrate a significant superiority over SARS-CoV-2 rapid antigen tests which do not require a reader instrument, even when surveillance and screening datasets were excluded from the analysis. Nevertheless, they provide connectivity advantages and remove operator interface (results-reading) issues. The lower sensitivity of certain brands of Ag-IRRDTs can be overcome in high prevalence areas with high frequency of testing. New SARS-CoV-2 variants are a major concern for the current and future diagnostic performance of these tests.

1. Introduction

Since the World Health Organization declared COVID-19 a pandemic in March 2020, rapid and accurate testing for SARS-CoV-2 has become essential for clinical management and effective isolation of COVID-19 patients. While qRT-PCR instruments detect viral nucleic acid in a few hours and are considered the gold standard for detecting COVID-19, they have high purchasing and running costs and require dedicated staff to operate.

Both antigen rapid diagnostic tests (Ag-RDTs) and instrument-read antigen rapid diagnostic tests (Ag-IRRDTs) are rapid, low-cost, portable and simple-to-operate devices that can be used at the point of care (POC) as well as in hospitals, schools and sports communities. Fluorescence immunoassays (FIAs) constitute a subset of Ag-IRRDTs. An Ag-IRRDT device provides user-independent test results. Because the Ag-IRRDT has an electro-optical reader, it can be connected to a hospital laboratory information system, which eases documentation and archiving, while the device remains amenable to point-of-care testing. Ag-IRRDT assays consist of lateral flow cartridges into which specimens are loaded manually; results are read on a small portable electronic reader. These devices are considered easy to use with little training. However, they are not suitable for batch testing because, in most cases, only a single sample can be analyzed at a time and the device requires a 3–20 min processing period in addition to an approximately 5 min disinfection and drying procedure [1]. It should be noted that, although FIAs currently dominate the Ag-IRRDT world market, the definition of Ag-IRRDT is not restricted to tests based on fluorescence, since a reader instrument may also sense the lines on the sensor cassette by visible reflectance. A comparison of reported SARS-CoV-2 probes can be found elsewhere [2, 3]; however, only a few of them are currently employed in commercially available instruments.

Earlier review articles on Ag-RDTs in the literature either do not include Ag-IRRDTs [4, 5] or have limited coverage [6–9]. Another study [10] presents an extended review of Ag-RDTs but mixes Ag-IRRDTs with chemiluminescent immunoassays (CLEIAs).

Reports and guidelines of regulatory agencies [11–13] and healthcare authorities [14, 15] also take the performance of Ag-IRRDTs into consideration, in varying depth.

This systematic review (SR) attempts to give a current overview of manufacturer-independent studies for an objective assessment of Ag-IRRDTs, applying specific inclusion criteria, as of 30 September 2021. To the best of our knowledge, the present study is the only systematic review in the literature that concentrates specifically on Ag-IRRDTs. Such independent reviews can help to differentiate test devices eligible for reimbursement in healthcare systems worldwide.

2. Methods

2.1. Survey Methodology

The PRISMA flow-diagram [16] and standard guidelines for systematic reviews were followed as shown in Figure 1. Additionally, the systematic review was registered on PROSPERO (Registration number: CRD42021282476).

2.2. Search Strategy

The databases PubMed and Web of Science Core Collection, as well as the Foundation for Innovative New Diagnostics (FIND) website, were searched using the terms SARS-CoV-2, COVID-19, coronavirus, evaluation, accuracy, point of care testing, POC tests, fluorescence immunoassay, fluorescence, FIA and rapid antigen test. Two authors (A.E. and A.U.K.) carried out the search. Disagreements were resolved by discussion in a session with all authors until a unanimous decision was reached, with the third author (P.C.) acting as referee.

2.3. Inclusion Criteria

Only peer-reviewed publications and reports were included (preprints were not included in the analysis). No language restrictions were applied. Publications with a tested sample size of fewer than 30 were excluded.

Studies based on saliva samples were excluded because the evidence on the use of saliva as an Ag-RDT specimen type is conflicting [12].

As new brands of rapid antigen test devices for SARS-CoV-2 entered the market, their performances were reported to be markedly lower than the manufacturers’ specifications [17–19]. Hence, independent analyses are essential for accurate judgement of device performance. Independence from manufacturers was assessed on the basis of whether a study received financial support from a test manufacturer or whether any study author was affiliated with a test manufacturer. Only independent (non-manufacturer-sponsored) Ag-IRRDT-based studies were included. Donations of reagents, devices and other consumable materials were not grounds for exclusion.

Only those studies which clearly report sample size, sensitivity and specificity of their measurements were included in this analysis with qRT-PCR as the reference standard.

This SR takes the assumption that qRT-PCR testing is the most appropriate measure of comparison for the diagnosis of COVID-19. While viral culture might provide better measurements, it suffers from other implementation issues. Nevertheless, some studies reporting their results in reference to viral culture were also included in the SR.

Descriptive analyses of all studies were performed to estimate pooled sensitivity and specificity in comparison to qRT-PCR testing.

2.4. Data Extraction and Analysis

Studies were screened and their characteristics were extracted independently by each reviewer. Each reviewer worked blind during this process. Two reviewers (A.U.K. and A.E., A.U.K. and P.C., or A.E. and P.C.) independently reviewed the titles and abstracts of all publications, followed by a full-text review of eligible publications, to select the articles for inclusion in this study. Any disagreements were resolved in joint discussions with the participation of the third reviewer.

For each dataset, the following characteristics were recorded: the last name of the first author, the country where testing took place, the manufacturer and model names of the Ag-IRRDT kits, total number of subjects, sample condition (fresh or un-fresh), sample types (NP, MT, OP, AN), compliance with manufacturer instructions for use (IFU), the number of qRT-PCR-positive samples, reported sensitivities and specificities, and ranges of Ct values of the reference standard. The results were tabulated using a reference number for each dataset. Pooled results were also given.

Sensitivity and specificity for each test were presented with 95% confidence intervals (CIs). Data extraction was independently performed using 2-by-2 contingency tables of the number of true positives, false positives, false negatives and true negatives, and data according to viral load (high or low, according to Ct cut-offs defined within studies) were separately extracted.
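The extraction step above can be illustrated with a minimal sketch of how sensitivity and specificity follow from a 2-by-2 contingency table. The function name and the counts are hypothetical and are not taken from any included study.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2-by-2 counts.

    tp/fn are antigen-positive/negative results among reference-positive
    samples; tn/fp are antigen-negative/positive results among
    reference-negative samples.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = accuracy_from_2x2(tp=80, fp=5, fn=20, tn=895)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```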

The results were presented as forest plots of sensitivity and specificity. Pooled sensitivities and specificities were computed according to test manufacturer.

2.5. Statistical Analysis and Data Synthesis

Raw data were extracted from the studies and performance estimates were recalculated. Forest plots showing sensitivity and specificity with their CIs for each test, as well as pooled sensitivity and specificity with their CIs, were plotted. The heterogeneity between studies was then evaluated visually. Accuracy parameters and their CIs were recalculated. In order to assess the uncertainty introduced by sample size, the 95% CIs were calculated using Wilson’s method.
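As a minimal sketch of the Wilson score interval named above (not the authors' R code), the following computes a 95% CI for a proportion. The count of 3,976 true positives is an assumption, back-calculated from the reported pooled sensitivity of 67.1% among 5,925 qRT-PCR-positive samples; the exact count is not stated in the text.

```python
import math


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed count: 3,976 of 5,925 reference-positive samples detected (about 67.1%).
low, high = wilson_ci(3976, 5925)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Under this assumed count the interval comes out at roughly 65.9% to 68.3%.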

A group-analysis was performed for a test group if three or more datasets were available under its title, otherwise only a descriptive analysis was performed, and sensitivity-specificity ranges were reported.
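The rule above can be sketched as follows. This is a simplified illustration that pools by summing counts, which is an assumption; it is not the bivariate model used in the analysis. The brand names and counts are hypothetical.

```python
from collections import defaultdict

MIN_DATASETS = 3  # group analysis needs three or more datasets, as stated in the text


def summarise_by_brand(datasets):
    """datasets: iterable of (brand, true_positives, false_negatives)."""
    grouped = defaultdict(list)
    for brand, tp, fn in datasets:
        grouped[brand].append((tp, fn))

    summary = {}
    for brand, rows in grouped.items():
        if len(rows) >= MIN_DATASETS:
            # Naive pooling (assumption): total detected / total reference-positive.
            total_tp = sum(tp for tp, _ in rows)
            total_pos = sum(tp + fn for tp, fn in rows)
            summary[brand] = {"analysis": "pooled", "sensitivity": total_tp / total_pos}
        else:
            # Too few datasets: report only the observed range.
            values = [tp / (tp + fn) for tp, fn in rows]
            summary[brand] = {"analysis": "descriptive", "range": (min(values), max(values))}
    return summary


# Hypothetical data, for illustration only.
toy = [("BrandA", 40, 10), ("BrandA", 30, 20), ("BrandA", 20, 30),
       ("BrandB", 45, 5), ("BrandB", 35, 15)]
print(summarise_by_brand(toy))
```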

Point estimates of accuracy parameters for SARS-CoV-2 detection were reported relative to their qRT-PCR results with 95% confidence intervals (CIs). The meta-analyses and relevant plots were constructed by using “metafor” package and a bivariate model package “mada” in R 4.0.1 software (R Foundation for Statistical Computing, Vienna, Austria) and RStudio (RStudio, Inc., Boston, MA, USA) (version 1.3).

Sample type assessment was accomplished using nasopharyngeal (NP) alone against combined oropharyngeal (OP), anterior nasal (AN) or mid-turbinate (MT) specimens, grouped as “others.”

2.6. Methodological Quality Assessment and Publication Bias

The quality of the included studies was assessed independently by two authors (A.E. and P.C.) using the diagnostic test accuracy quality assessment tool of the Joanna Briggs Institute (https://jbi.global/critical-appraisal-tools). Discrepancies were resolved in a discussion session with the participation of all authors. Quality (risk of bias) was graded on a 100-point scale as follows: total score ≤49, low quality (high risk of bias); total score 50–69, moderate quality (moderate risk of bias); total score ≥70, high quality (low risk of bias). Funnel plots were constructed to detect publication bias.
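The grading thresholds can be written as a small function. This is a minimal sketch; the function name is hypothetical, and treating any score below 50 as low quality is an assumption, since the text does not say how scores between 49 and 50 are handled.

```python
def quality_grade(score):
    """Map a quality score on a 0-100 scale to the risk-of-bias grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    if score >= 70:
        return "high quality (low risk of bias)"
    if score >= 50:
        return "moderate quality (moderate risk of bias)"
    # Assumption: everything below 50 is graded low quality.
    return "low quality (high risk of bias)"


for score in (88.9, 55.6, 49):
    print(score, "->", quality_grade(score))
```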

2.7. Sensitivity Analysis

A sensitivity analysis was planned in which pooled sensitivity and specificity were re-estimated after excluding surveillance and screening studies. The results of each sensitivity analysis were compared against the overall results to assess the potential bias introduced by including surveillance and screening studies.

2.8. Analytical Comparisons

This study design was confined to clinical diagnostic studies, therefore a comparison with analytical studies was beyond the scope of this SR.

2.9. Comparing Performances of SARS-CoV-2 Ag-RDTs against SARS-CoV-2 Ag-IRRDTs

We searched earlier papers that present an overview of commercial SARS-CoV-2 Ag-RDTs not requiring a reading instrument. We then compared the performance results of studies dealing with SARS-CoV-2 Ag-IRRDTs against earlier SRs which report performances of commercial SARS-CoV-2 Ag-RDTs not requiring a reading instrument.

2.10. Comparing Performances of SARS-CoV-2 Ag-IRRDTs against Combination of SARS-CoV-2 Ag-RDTs and Ag-IRRDTs

As another benchmarking, the overall sensitivity measure reported in other SRs which include both SARS-CoV-2 Ag-RDTs and Ag-IRRDTs was compared with the overall sensitivity measure reported in this SR (which includes only SARS-CoV-2 Ag-IRRDTs).

3. Results

3.1. Summary of Studies

This SR included 48 clinical accuracy datasets reported in 40 sources with a total number of 137,770 samples and 5,925 samples with confirmed SARS-CoV-2 by qRT-PCR.

3.2. Overall Performance of Ag-IRRDTs

Across all analysed samples, the pooled Ag-IRRDT sensitivity and specificity were 67.1% (95% CI 65.9% to 68.3%) and 99.4% (95% CI 99.4% to 99.4%), respectively.

Table 1 displays all 48 datasets gathered on the Ag-IRRDT based studies that were eligible in this SR. Figure 2 shows forest plots of these 48 tests included in this SR, as well as their pooled result (Accuracy estimates with 95% confidence interval were calculated using the Wilson score method).

For all 48 tests combined, the diagnostic odds ratio was DOR = 336.54 (95% CI: 308.09–367.61), the positive likelihood ratio was LR+ = 111.318 (95% CI: 103.63–119.58) and the negative likelihood ratio was LR− = 0.331 (95% CI: 0.319–0.343). Together with the test for equality of sensitivities (χ2 = 875.98, df = 47) and the test for equality of specificities (χ2 = 1776.08, df = 47), these results indicated that overall heterogeneity of the tests was high.
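The likelihood ratios and diagnostic odds ratio are related to sensitivity and specificity by standard formulas, sketched below with the rounded pooled values of 67.1% and 99.4% as inputs. Because the inputs are rounded, the outputs only approximate the reported figures, which were presumably computed from raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Rounded pooled estimates from the text.
lr_pos, lr_neg, dor = likelihood_ratios(0.671, 0.994)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}, DOR = {dor:.0f}")
```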

3.3. Methodological Quality Assessment

The diagnostic test accuracy quality assessment tool of the Joanna Briggs Institute was used (with 480 entries and 48 resulting scores) to examine the quality of each study included in this SR. The highest quality score of the included studies was 88.9/100 (12 studies). The lowest quality score was 55.6/100 (4 studies). Overall, 62.5% of the studies were of high quality and 37.5% were of moderate quality; there were no low-quality studies.

3.4. Publication Bias

For the publication bias assessment, a funnel plot including all datasets of this SR is shown in Figure 3, along with the results of Egger’s test. In this plot, the effect size was taken as the logarithm of the odds ratio.

Regression Test for funnel plot asymmetry (using weighted regression with multiplicative dispersion model and standard error as predictor) yields t = 5.7391, and limit estimate value of intercept is b = −5.3701 (CI: −6.4775, −4.2626). Inspection of the size of intercept shows that it differs significantly from zero, indicating funnel plot asymmetry, hence (possible) publication bias.

3.5. Prevalence of SARS-CoV-2

Prevalence rate (the number of qRT-PCR positive samples within the study population) varied between 0.4% and 78.7%. Pooled prevalence rate was 4.3%. However, it was noted that the prevalence of SARS-CoV-2 in most of these studies did not reflect the prevalence in the local populations, hence introducing a bias in the studies.

3.6. Symptomatic and Asymptomatic COVID-19 Population

Although most of the datasets reported in studies included in this SR were related to symptomatic COVID-19 cases, the majority of samples were collected from asymptomatic individuals. This is because three datasets alone [1, 40, 48] were surveillance studies with a total of 107,514 Ag-IRRDT samples, comprising 78% of the overall sample count and including 638 qRT-PCR-verified positive cases.
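The proportions quoted in this part of the Results can be checked with simple arithmetic. The sketch below uses only counts stated in the text; the split into surveillance and non-surveillance prevalence is derived here for illustration and is not a figure reported by the review.

```python
def proportion(part, whole):
    """Return part/whole as a fraction, guarding against an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


TOTAL_SAMPLES = 137_770          # all Ag-IRRDT samples
PCR_POSITIVE = 5_925             # qRT-PCR-confirmed positives
SURVEILLANCE_SAMPLES = 107_514   # samples in the three surveillance datasets
SURVEILLANCE_POSITIVE = 638      # qRT-PCR positives in those datasets

print(f"pooled prevalence:  {proportion(PCR_POSITIVE, TOTAL_SAMPLES):.1%}")
print(f"surveillance share: {proportion(SURVEILLANCE_SAMPLES, TOTAL_SAMPLES):.1%}")
# Derived for illustration (not reported in the text):
print(f"prevalence, surveillance only:   {proportion(SURVEILLANCE_POSITIVE, SURVEILLANCE_SAMPLES):.2%}")
print(f"prevalence, other datasets only: "
      f"{proportion(PCR_POSITIVE - SURVEILLANCE_POSITIVE, TOTAL_SAMPLES - SURVEILLANCE_SAMPLES):.1%}")
```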

3.7. Conformity with Manufacturers’ Instructions for Use

Conformity with the manufacturers’ instructions for use of Ag-IRRDTs was reported for 22 of the 48 datasets (45.8%). In this sub-group, pooled sensitivity was 0.703 (95% CI: 0.686–0.720) and pooled specificity was 0.995 (95% CI: 0.994–0.995). Although surveillance and screening studies were present, these sub-group accuracy values are slightly higher than the overall accuracies of Ag-IRRDTs. The diagnostic odds ratio and likelihood ratios of the pooled tests were as follows: DOR = 439.44 (95% CI: 391.38–493.40), LR+ = 131.15 (95% CI: 120.60–142.62) and LR− = 0.30 (95% CI: 0.28–0.32).

The test for equality of sensitivities yields χ2 = 325.36 and the test for equality of specificities yields χ2 = 925.33, both with df = 21, indicating heterogeneity in this sub-group of Ag-IRRDT studies that reported conformity to manufacturers’ instructions. In this sub-group, the correlation between sensitivities and false positive rates was weak (ρ = 0.183, 95% CI: −0.259 to 0.561).
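As a hedged illustration of what a test for equality of sensitivities involves, the sketch below implements the standard Pearson chi-squared test for homogeneity of proportions across k datasets, with k − 1 degrees of freedom (21 for 22 datasets). Whether this matches the exact procedure used in the analysis is an assumption, and the counts are hypothetical.

```python
def chi2_equal_proportions(successes, totals):
    """Pearson chi-squared statistic and df for H0: all proportions are equal."""
    k = len(successes)
    if k != len(totals) or k < 2:
        raise ValueError("need matching lists with at least two groups")
    pooled = sum(successes) / sum(totals)
    if pooled in (0.0, 1.0):
        return 0.0, k - 1  # no variation to test
    stat = sum((x - n * pooled) ** 2 / (n * pooled * (1 - pooled))
               for x, n in zip(successes, totals))
    return stat, k - 1


# Hypothetical example: 80/100 versus 60/100 detected.
stat, df = chi2_equal_proportions([80, 60], [100, 100])
print(f"chi-squared = {stat:.2f}, df = {df}")
```

A large statistic relative to its degrees of freedom suggests that the datasets do not share a common sensitivity.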

Pooled accuracies of the non-conforming sub-group, which included 26 datasets, were as follows: sensitivity = 0.645 (95% CI: 0.629–0.661) and specificity = 0.990 (95% CI: 0.989–0.992). The accuracy values of the non-conforming sub-group were lower than the overall accuracies of Ag-IRRDTs. In this sub-group, the test for equality of sensitivities yields χ2 = 526.4 and the test for equality of specificities yields χ2 = 570.9, both with df = 25, indicating substantial heterogeneity.

Figure 4(a) displays the forest plots related to conformity to manufacturers’ instructions for use of Ag-IRRDTs.

3.8. Analysis by Sample Type

To categorize tests by sample type, nasopharyngeal (NP) samples were compared with oropharyngeal (OP), anterior nasal (AN) or mid-turbinate (MT) swab samples, or their combinations. Note that saliva tests were excluded from this SR. The most common sample type evaluated was NP swabs (in 32 studies, 66.7%), followed by AN (in 7 studies, 14.6%). Hence, NP swab samples were analysed separately for their accuracy performance against other sample types. Figure 4(b) displays forest plots related to sample types. NP swab samples achieved a pooled sensitivity of 0.651 (95% CI: 0.635–0.665). The DOR of 300.9 (95% CI: 271.3–333.9) and the test for equality of sensitivities in pooled NP swabs (χ2 = 512.4) demonstrate heterogeneity in sensitivity values for tests done using NP swabs.

3.9. Analysis by Sample Condition

Pooled sensitivity values of Ag-IRRDTs for un-fresh and fresh samples were 66.9% (95% CI: 64%–70%) and 67.2% (95% CI: 66%–68%), respectively.

3.10. Ag-IRRDT Sensitivity by Ct Value

The Ct value is used as a surrogate for viral load to estimate the limit of detection of antigen tests. Rather than using multiple Ct values, a single threshold of Ct = 30 was selected and the sensitivities of the available datasets were investigated according to this threshold. As expected, all Ag-IRRDTs showed higher sensitivity in samples with high viral loads, and sensitivity dropped for samples with Ct > 30 (Table 2).
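Stratifying sensitivity at a single Ct threshold can be sketched as below. The sample records are hypothetical, and placing samples with Ct exactly equal to 30 in the high viral load stratum is an assumption.

```python
CT_THRESHOLD = 30  # single threshold used in the text


def sensitivity_by_ct(samples, threshold=CT_THRESHOLD):
    """samples: iterable of (ct_value, antigen_positive) for qRT-PCR-positive samples.

    Returns sensitivity for Ct <= threshold (high viral load) and
    Ct > threshold (low viral load); None if a stratum is empty.
    """
    counts = {"high_viral_load": [0, 0], "low_viral_load": [0, 0]}  # [detected, total]
    for ct, antigen_positive in samples:
        key = "high_viral_load" if ct <= threshold else "low_viral_load"
        counts[key][1] += 1
        if antigen_positive:
            counts[key][0] += 1
    return {key: (hit / total if total else None) for key, (hit, total) in counts.items()}


# Hypothetical records, for illustration only.
toy = [(18, True), (22, True), (25, True), (29, False),
       (31, True), (33, False), (35, False), (36, False)]
print(sensitivity_by_ct(toy))
```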

3.11. Sensitivity Analysis

When analysis was restricted to studies that exclude three surveillance and screening reports, overall pooled sensitivity increased from 67.1% to 69.3% (95% CI: 68% to 70.5%) and overall pooled specificity decreased from 99.4% to 98.7% (95% CI: 98.5% to 98.8%).

3.12. Meta-Regression

A meta-regression was not performed due to substantial heterogeneity in reporting subgroups.

3.13. Manufacturer Based Accuracies

Overall pooled sensitivity of the five Ag-IRRDT brands for which three or more studies were available, altogether comprising 40 clinical accuracy datasets with 135,624 samples, was 68.3%. Pooled specificity of the same sub-group was 99.4%. In this sub-group, the correlation coefficient between sensitivities and false positive rates is ρ = 0.843.

Figure 5 displays forest plots of these Ag-IRRDT brands. Visual inspection of the forest plots, the pooled diagnostic odds ratio DOR = 353.097 (95% CI: 322.423–386.688), the positive likelihood ratio LR+ = 112.514 (95% CI: 104.723–120.884) and the negative likelihood ratio LR− = 0.319 (95% CI: 0.306–0.332), as well as the test for equality of sensitivities (χ2 = 278.83) and the test for equality of specificities (χ2 = 867.72), show that heterogeneity in the datasets for the five major Ag-IRRDT manufacturers is high.

This SR highlights the top performance of the LumiraDX, with 10 studies, a pooled sensitivity of 81.8%, a sample size of 4,697 and relatively narrow CIs for both sensitivity and specificity. Although the Shenzhen Bioeasy FIA demonstrated the highest sensitivity value of 87.2%, its number of studies (3) and sample size (410) were low. Note that its 95% CIs are the widest for both sensitivity and specificity. The SD Biosensor Standard F group had the highest number of test samples (79,030). Removing surveillance studies from the SD Biosensor Standard F group did not change the pooled sensitivity value (54.4%) and reduced the pooled specificity from 99.4% to 98.5%. On the other hand, removing surveillance studies from the Quidel Sofia group increased the pooled sensitivity from 68.7% to 74.6% (and decreased the pooled specificity from 99.7% to 98.5%) for 11,500 samples, placing Quidel Sofia among the good performers.

3.14. Results of Comparing Performances of SARS-CoV-2 Ag-RDTs against SARS-CoV-2 Ag-IRRDTs

Hayer et al. [5] present an overview of commercial SARS-CoV-2 Ag-RDTs that do not require a reading instrument, covering 19 studies that investigated five different Ag-RDTs and presented detailed population characteristics and Ct values. Only three commercial Ag-RDTs were assessed in multiple studies, and of these, only two brands had adequate levels of performance; their sensitivity estimates were around 80%. These two Ag-RDTs, each with more than eight available studies, reported a specificity of 97% in the majority of the trials.

On the other hand, the present SR includes more than 12 times the number of samples, 2.5 times the number of studies (all peer-reviewed) and more than twice the number of brands compared with the earlier study [5], which did not include mass-surveillance reports, as shown in Table 3. The top performers of our SR include one brand with 10 datasets, a pooled sensitivity of 81.8%, a sample size of 4,697 and relatively narrow CIs for both sensitivity and specificity. Another good performer of our SR presents the highest sensitivity value of 87.2%, with 3 datasets and 410 samples.

3.15. Results of Comparing Performances of SARS-CoV-2 Ag-IRRDTs against Combination of SARS-CoV-2 Ag-RDTs and SARS-CoV-2 Ag-IRRDTs

Pooled sensitivity measure reported in another SR [10] was compared with the pooled sensitivity reported in this SR, when the datasets from preprints (about 37% of their dataset count) were excluded. In this case, the new sensitivity value was reported as 0.672 (95% CI: 0.629–0.713) which came close to the value of overall pooled sensitivity reported in our present study. However, it should be noted here that, this new sensitivity value is for the overall combination of Ag-RDTs and Ag-IRRDTs.

4. Discussion

The lower sensitivities of Ag-IRRDTs mean that some infected patients receive false-negative results. Therefore, any negative result for a symptomatic patient should be confirmed by a qRT-PCR test. This reduces the clinical utility of rapid antigen tests in low prevalence areas. Nevertheless, Ag-IRRDTs can be useful in areas where molecular testing is not available or is overloaded.

It should be noted here that it is currently unclear how test positivity (by any test) translates into clinical infectiousness and person-to-person spread [52].

Ag-IRRDTs may vary in analytical sensitivity. This is one reason for the differing clinical sensitivities of these tests. It was shown [23] that the relationship between Ct and viral load was poor for samples with Ct values >33. The large variation in clinical sensitivity between different brands of Ag-IRRDTs could also be due to individual study design, operator competence and the quality of the Ag-IRRDT itself. The lower sensitivity demonstrated by certain brands of Ag-IRRDTs can be overcome in high prevalence areas with a high frequency of testing, which may partly relieve some concerns around sensitivity [7, 57].

In reference to qRT-PCR validation, ideal Ag-IRRDT sensitivity as a function of Ct value would be a flat curve. However, this is not the case in practice, and sensitivity decreases as Ct value increases. The rate of decrease in sensitivity happens to be at a faster pace beyond a certain Ct level. Thus, the likelihood of false-negative antigen test results becomes higher at lower viral loads. While some studies detected no difference in the mean Ct values between symptomatic patients and asymptomatic patients [41], others reported that symptomatic patients displayed lower Ct values than asymptomatic COVID-19 patients, and a Ct value of 30 is the threshold for SARS-CoV-2 infectivity [22, 38]. Moreover, it was shown [37] that different sensitivity versus Ct value patterns prevail in symptomatic and asymptomatic patient groups.

It should be noted here that measurement conditions cannot be expected to be the same in every study. For example, measurement temperature may affect Ag-IRRDT sensitivity and specificity results [58], but only a few reports include their measuring temperature ranges. Similarly, a lack of evidence to guide optimal nasal swab testing can increase the risk of false-negative test results [59]. Whether SARS-CoV-2 antigen detection differs between a rapid test with a self-collected nasal swab and one with a professional-collected nasopharyngeal swab is another open issue [60]. Cross-reactivity with other analytes (such as dengue, syphilis, hepatitis B and rheumatoid factor) is usually not considered by most researchers. Currently, the most concerning factor is the emergence of new SARS-CoV-2 variants [61] that may adversely affect rapid antigen test performance [62]. It should be pointed out that the sensitivity of COVID-19 tests to new SARS-CoV-2 variants was not considered in the studies included in this review.

As research on specific problems [63–65] related to COVID-19 grows exponentially, reliable, cost-effective and fast means of diagnosing the disease become very valuable. To meet this need, numerous non-molecular tests such as SARS-CoV-2 Ag-RDTs and SARS-CoV-2 Ag-IRRDTs have been introduced by different manufacturers in the worldwide market. The SR presented in this paper has shown that, contrary to expectations, the overall pooled sensitivity and specificity of SARS-CoV-2 Ag-IRRDTs did not demonstrate a significant superiority over SARS-CoV-2 Ag-RDTs which do not require a reader instrument, even when surveillance and screening datasets were excluded from the analysis. Nevertheless, they provide connectivity advantages and reduce operator interface (reading) issues.

One possible limitation of the present SR design is the assumption (as in the previously published SRs) that qRT-PCR testing is the standard measure of reference. Viral culture might provide a better measure of comparison; however, it suffers considerable implementation problems in practice. In addition, the present SR did not assess the influences of age, gender, symptom duration and sample collector (a swab sample obtained by a trained professional or a self-collected swab) on the accuracy of Ag-IRRDTs.

5. Conclusions

Most manufacturers of Ag-IRRDTs can produce high-specificity tests, but sensitivities are low and differ significantly between tests (15%–99%). The lower sensitivity of certain brands of Ag-IRRDTs can be overcome in high prevalence areas with a high frequency of testing. Conformity to the manufacturers’ instructions for use in the testing procedure improves the accuracy of these tests. New SARS-CoV-2 variants are a major concern and should be evaluated in future studies.

Abbreviations

Ab:Antibody
Ag:Antigen
Ag-RDT:Antigen rapid diagnostic test
Ag-IRRDT:Instrument-read antigen rapid diagnostic test
AN:Anterior Nasal
b:Limit estimate value of intercept (An indicator of funnel plot asymmetry)
CI:Confidence interval (taken as 95% of indicated parameter value)
CLEIA:Chemiluminescent Immunoassay
COVID-19:Coronavirus disease 2019 caused by SARS-CoV-2
Ct:Cycle threshold
DOR:Diagnostic odds ratio
df:Degree of freedom
FIA:Fluorescence immunoassay
IFU:Instructions for use
JBI:Joanna Briggs Institute (University of Adelaide)
χ2:Chi-squared value
LR:Likelihood ratio
MT:Mid-turbinate
NP:Nasopharyngeal
OP:Oropharyngeal
POC:Point of Care
qRT-PCR:Quantitative Reverse-Transcriptase Polymerase Chain Reaction
ρ:Correlation coefficient
ROC:Receiver Operating Characteristic
SARS-CoV-2:Severe Acute Respiratory Syndrome Coronavirus-2
SE:Standard Error.

Data Availability

Data available from corresponding author upon reasonable request.

Ethical Approval

This study did not require an ethical approval because the systematic review was based on published research.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

AUK conceived the study and co-ordinated contributions from the co-authors. PC and AET drafted the work and prepared the figures and tables. All authors took part equally in designing the study, searching the databases, screening papers against eligibility criteria, assessing the methodological quality of included studies and analysing the data. All authors participated in the preparation of the final manuscript, gave final approval of the version to be published and agree to be accountable for all aspects of the work.

Acknowledgments

All product names and trademarks are the property of their respective owners. The authors received no funding for this work.