
Review Article

Assessing the Proficiency of Emergency and Critical Care Nurses in Electrocardiogram Interpretation and the Integration of Computerized Electrocardiogram Analysis—Benefits and Limitations: A Systematic Review

Published online: May 26, 2026

¹Graduate Student, PhD Candidate, Department of Clinical Nursing, The University of Jordan School of Nursing, Amman, Jordan

²Associate Professor, Department of Clinical Nursing, The University of Jordan School of Nursing, Amman, Jordan

Corresponding author: Amer Hussein Alwahsh, Department of Clinical Nursing, The University of Jordan School of Nursing, Queen Rania Street, Amman 11942, Jordan. Tel: +962-79-8559748; E-mail: ameralwahsh@hotmail.com
• Received: November 25, 2025   • Revised: April 18, 2026   • Accepted: May 4, 2026

© 2026 Korean Society of Adult Nursing

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Purpose
    This systematic review aimed to evaluate electrocardiogram interpretation competency among emergency and critical care nurses and to examine the diagnostic performance, benefits, and limitations of computerized and artificial intelligence–based electrocardiogram interpretation systems.
  • Methods
    This systematic review was conducted in accordance with PRISMA 2020 guidelines and registered in the International Prospective Register of Systematic Reviews under registration number CRD420251169307. Six electronic databases and additional sources were searched for studies published between January 2020 and October 2025, with the final search conducted in October 2025. Studies were included if they involved registered nurses interpreting electrocardiograms in acute care settings or evaluated computerized electrocardiogram interpretation systems using adult datasets. Methodological quality was assessed using validated tools appropriate to study design, including the Joanna Briggs Institute critical appraisal tools, ROBINS-I, and QUADAS-2.
  • Results
    Mean electrocardiogram interpretation scores among nurses ranged from 43% to 68%, with fewer than 40% of participants meeting predefined competency thresholds. Performance was strongest for asystole recognition and weakest for tachyarrhythmias, myocardial ischemia, and conduction abnormalities. Artificial intelligence–based systems demonstrated high diagnostic accuracy, with area under the curve values ranging from 0.91 to 0.97 and sensitivity exceeding 94% across major diagnostic tasks.
  • Conclusion
    Emergency and critical care nurses demonstrated insufficient electrocardiogram interpretation competency in several safety-critical domains. Computerized and artificial intelligence–based systems showed high diagnostic accuracy and may serve as effective complementary tools when integrated with ongoing nurse education and appropriate clinical oversight.
Introduction
The electrocardiogram (ECG) remains one of the most essential diagnostic tools in emergency and critical care settings, providing real-time information that guides life-saving interventions such as rapid defibrillation, reperfusion therapy, and cardiac pacing in patients with arrhythmias, acute coronary syndrome (ACS), or conduction abnormalities [1,2]. Nurses are often the first healthcare professionals to assess critically ill patients in high-acuity settings, including intensive care units (ICUs) and emergency departments (EDs). Consequently, they are expected to obtain ECGs rapidly and interpret them accurately before physician or cardiology review [3,4].
Nonetheless, growing evidence indicates that nurses’ ECG interpretation skills remain inconsistent and frequently inadequate, particularly in identifying life-threatening rhythms such as ventricular tachycardia, ventricular fibrillation, and atrial fibrillation, as well as myocardial ischemia and conduction block [1]. These deficiencies have been associated with limited access to structured training, inconsistent educational support, and the absence of standardized assessment instruments in critical care nursing practice [3,4]. Furthermore, ECG interpretation knowledge and diagnostic accuracy tend to decline without ongoing practice or refresher training, creating a “use-it-or-lose-it” phenomenon that may compromise patient safety during time-sensitive emergencies [5,6].
Simultaneously, computerized ECG interpretation systems, including conventional computerized interpretation engines (CIEs) and artificial intelligence (AI)–based algorithms, have emerged as potential tools to support frontline clinicians by providing rapid, standardized, and automated ECG analysis [2,6]. Kashou et al. [7] reported that incorporating computerized ECG interpretation increased overall ECG interpretation accuracy among healthcare professionals by 15.1%. The same study reported a 10.3% improvement in ventricular rate determination accuracy and a modest 1.9% increase in mean QRS axis interpretation accuracy.
AI has also performed well in arrhythmia detection. For example, AI-based models achieved an area under the receiver operating characteristic curve (AUC) of 0.87 for atrial fibrillation detection [8]. Similarly, in their review, Neupane et al. [9] reported that integrating deep learning improved ECG interpretation performance and facilitated earlier detection of cardiac abnormalities. Despite these advances, however, concerns remain regarding generalizability, algorithmic bias, interpretability, and excessive clinician reliance on AI systems when human ECG interpretation skills are not adequately maintained [1,6].
The central issue therefore lies at the intersection of human expertise and technological advancement. On one hand, nurses in EDs and ICUs demonstrate inconsistent ECG interpretation performance with potentially significant clinical consequences. On the other hand, despite their considerable potential, computerized and AI-based systems cannot yet be relied upon independently because of limitations related to bias, explainability, and generalizability [3,4].
Addressing this issue is important for two major reasons. First, improving nurse competency may help ensure that frontline clinicians retain the ability to identify and respond to critical ECG abnormalities in real time, even in the absence of technological support, thereby reducing the likelihood of delayed or inappropriate treatment [1,5]. Second, the integration of computerized and AI-based ECG interpretation systems as complementary, rather than substitutive, tools may improve diagnostic accuracy, enhance workflow efficiency, and provide an additional safety layer for both novice and experienced clinicians, ultimately improving patient outcomes in emergency and critical care settings [2,6].
Such a dual approach, combining sustained human competency development with responsible technological integration, represents an important strategy for addressing current competency gaps and improving the quality of acute cardiac care internationally [3,4].
For the purposes of this review, competency refers to the ability to correctly interpret ECG findings using standardized assessment tools or predefined performance thresholds. Accuracy refers to the proportion of correct ECG interpretations within a specific diagnostic domain, such as rhythm recognition or ischemia detection, whereas proficiency is used as a broader construct encompassing knowledge, interpretive skill, and applied clinical performance. These terms are used consistently throughout the manuscript to improve conceptual clarity.
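To make the distinction between domain-specific accuracy and threshold-based competency concrete, the Python sketch below computes both from item-level responses. The response data, the domain labels, the function names, and the 75% default threshold (chosen only because it falls within the 65% to 80% cutoffs used by the included studies) are illustrative assumptions, not data or methods from any included study.

```python
from collections import defaultdict

def domain_accuracy(responses):
    """Proportion of correct interpretations within each diagnostic domain."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for domain, is_correct in responses:
        totals[domain] += 1
        if is_correct:
            correct[domain] += 1
    return {domain: correct[domain] / totals[domain] for domain in totals}

def is_competent(responses, threshold=0.75):
    """Competency: overall proportion correct meets a predefined threshold (assumed 75%)."""
    if not responses:
        raise ValueError("no responses to score")
    overall = sum(1 for _, is_correct in responses if is_correct) / len(responses)
    return overall >= threshold

# Hypothetical item-level responses for one nurse: (domain, answered correctly?)
responses = [
    ("rhythm", True), ("rhythm", True), ("rhythm", False), ("rhythm", True),
    ("ischemia", False), ("ischemia", True), ("ischemia", False), ("ischemia", False),
]
print(domain_accuracy(responses))  # {'rhythm': 0.75, 'ischemia': 0.25}
print(is_competent(responses))     # False: 4/8 = 50% overall is below 75%
```

In this example the nurse reaches 75% accuracy in rhythm recognition yet is not classified as competent overall, which is why the two terms are kept separate.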
Delayed recognition of malignant arrhythmias, misinterpretation of ischemic ST segment and T wave changes, and inaccurate identification of conduction abnormalities have been associated with delayed escalation of care, inappropriate treatment decisions, and increased morbidity risk during acute cardiac emergencies [10]. Observational studies have suggested that missed or incorrectly interpreted ECG findings by frontline nurses may contribute to delayed reperfusion therapy, delayed defibrillation, and prolonged time to definitive cardiac intervention, particularly in resource-limited or high-workload settings [3].
Concurrently, advances in computerized and AI-based ECG interpretation systems have demonstrated high diagnostic accuracy for arrhythmias, ACSs, and conduction abnormalities under controlled validation conditions [11]. Although these systems provide rapid, standardized, and reproducible ECG analysis, concerns remain regarding their generalizability, potential algorithmic bias, limited explainability, and the risk of clinician over-reliance [9]. Importantly, the existing literature has largely evaluated nurse ECG interpretation competency and AI-based ECG interpretation systems as separate domains, without systematically examining how limitations in human performance align with the strengths and weaknesses of computerized interpretation systems.
Therefore, an important research gap exists in understanding how nurse ECG interpretation proficiency and AI-based ECG interpretation performance intersect within real-world emergency and critical care settings. To our knowledge, no previous systematic review has synthesized evidence from both domains to evaluate whether AI systems can complement, rather than replace, nurse-led ECG interpretation in safety-critical environments. Addressing this gap is essential for informing evidence-based educational strategies, clinical governance, and the responsible integration of AI-supported ECG interpretation into acute care practice.
Accordingly, the aims of this review were to (1) evaluate the level of ECG interpretation competency among emergency and critical care nurses and identify factors associated with performance variation; and (2) synthesize evidence regarding the diagnostic performance, benefits, and limitations of computerized and AI-based ECG interpretation systems. By integrating evidence from both domains, this review seeks to examine the potential role of a complementary human-AI model in improving diagnostic accuracy and patient safety.
Methods
1. Study Design
The protocol for this review was registered in the International Prospective Register of Systematic Reviews (PROSPERO) under registration number CRD420251169307. The protocol prespecified outcomes and an analysis plan to enhance methodological transparency and reduce bias. Eligible study designs included cross-sectional studies, cohort studies, before-and-after studies, randomized controlled trials, mixed-methods studies, and quality improvement (QI) projects. For studies of computerized ECG interpretation, clinical validation studies, external test-set analyses, and methodological studies with quantitative diagnostic performance benchmarking were also eligible. The search was limited to studies published between January 2020 and October 2025. This review is reported in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.
2. Eligibility Criteria
This systematic review used the Population, Interventions/Exposures, Comparators, Outcomes, and Study designs (PICOS) framework to define eligibility criteria across two analytical arms.
3. Inclusion Criteria
Arm A included registered nurses working in acute care environments, such as EDs, ICUs, coronary care units (CCUs), and telemetry units. These settings were selected because nurses in these contexts are routinely responsible for frontline ECG interpretation, consistent with previous competency studies conducted across multiple regions. Arm B included computerized ECG interpretation systems, including both traditional rule-based CIEs and machine learning or deep learning platforms evaluated using adult ECG datasets in clinical or realistic testing environments.
For interventions and exposures, Arm A focused on factors influencing nurse ECG interpretation competency, including formal ECG education, clinical experience, continuing professional development, and guideline-based or structured training programs. Arm B focused on exposure to computerized ECG interpretation systems designed to support rhythm recognition, ischemia detection, conduction abnormality analysis, and interval measurement. Comparators for Arm A included educational modalities, clinical units, experience levels, and pre-post training designs. Comparators for Arm B included expert cardiologist interpretation, traditional CIEs, and direct comparisons between human and AI-based interpretations. Table 1 summarizes the eligibility criteria according to PICOS.
4. Exclusion Criteria
For Arm A (nurse ECG interpretation competency), studies were excluded if they did not include registered nurses, did not involve nurses working in acute care environments such as EDs, ICUs, CCUs, or telemetry units, did not report ECG interpretation competency or accuracy, or were not published in English.
For Arm B (computerized and AI-based ECG interpretation systems), studies were excluded if they did not evaluate AI-based or computerized systems for ECG interpretation or were not published in English.
For both arms, studies were excluded if they focused exclusively on pediatric populations; were case reports, narrative reviews, or editorials; evaluated non-ECG cardiac monitoring tools, such as echocardiography; used simulation without measurable ECG interpretation outcomes; or reported insufficient or unavailable data [1,6].
5. Search Strategy and Selection Process
A comprehensive search strategy was developed to identify studies addressing (1) ECG interpretation competency among emergency and critical care nurses and (2) the diagnostic accuracy, benefits, and limitations of computerized or AI-based ECG interpretation systems. Key concepts were derived from the research question, and relevant keywords and synonyms were added for each concept. The strategy combined controlled vocabulary and free-text terms adapted to each database. For the human proficiency arm, the search combined the following terms: (“electrocardiogram” OR “ECG”) AND (nurs*) AND (“emergency” OR “critical care” OR “ICU” OR “CCU”) AND (“competenc*” OR “proficien*” OR “knowledge” OR “skill*”).
For the computerized interpretation arm, the strategy included the following terms: (“computer” OR “algorithm” OR “diagnosis” OR “AI” OR “deep learning” OR “CNN”) AND (“ECG” OR “electrocardiogram”) AND (“accuracy” OR “sensitivity” OR “validation”), based on previous reviews and validation studies of computerized ECG interpretation [12-14].
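The Boolean structure of both queries (synonyms joined with OR within each concept, concepts joined with AND) can be sketched as a small Python helper. This is a hypothetical illustration that only reproduces the free-text strings quoted above, with straight quotes; database-specific field tags, controlled vocabulary headings such as MeSH or Emtree, and truncation rules are not modeled.

```python
def boolean_query(concept_blocks):
    """Join synonyms with OR inside each concept block, then join the blocks with AND."""
    if not concept_blocks or any(not block for block in concept_blocks):
        raise ValueError("every concept block needs at least one term")
    return " AND ".join("(" + " OR ".join(block) + ")" for block in concept_blocks)

# Terms copied from the human proficiency arm; quotes are kept as part of each term.
human_arm = [
    ['"electrocardiogram"', '"ECG"'],
    ["nurs*"],
    ['"emergency"', '"critical care"', '"ICU"', '"CCU"'],
    ['"competenc*"', '"proficien*"', '"knowledge"', '"skill*"'],
]

# Terms copied from the computerized interpretation arm.
computerized_arm = [
    ['"computer"', '"algorithm"', '"diagnosis"', '"AI"', '"deep learning"', '"CNN"'],
    ['"ECG"', '"electrocardiogram"'],
    ['"accuracy"', '"sensitivity"', '"validation"'],
]

print(boolean_query(human_arm))
print(boolean_query(computerized_arm))
```

Keeping each concept as a separate list makes it straightforward to add a synonym to one concept without disturbing the AND structure of the whole query.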
The electronic databases searched were MEDLINE, Embase, CINAHL, Scopus, Cochrane CENTRAL, and IEEE Xplore; IEEE Xplore was included specifically to identify computational studies. Additional sources, including gray literature, doctoral theses, society guidelines, and clinical trial registries, were also searched to minimize publication bias. The detailed search strategy for each database is provided in the Supplementary Material. To ensure comprehensive coverage, the reference lists of included studies were hand-searched and forward citation searching was performed.
The literature search was limited to studies published between January 2020 and October 2025 to capture contemporary evidence reflecting current nursing education standards, evolving emergency and critical care practice, and rapid advances in AI-based ECG interpretation technologies. Earlier studies were excluded because of substantial changes in ECG training frameworks, digital ECG acquisition, and the emergence of deep learning–based interpretation systems in recent years.
All records identified through database and register searches and other sources were imported into a reference management system, where automatic and manual deduplication were performed. A total of 723 records were identified, including 670 records from databases and registers and 53 records from other sources, including reference lists, conference abstracts, and gray literature.
After removal of 350 duplicates, the remaining 320 database and register records underwent title and abstract screening by two independent reviewers using predefined eligibility criteria. At this stage, 258 records were excluded, and 62 reports were assessed for full-text eligibility. Of the 53 records identified from other sources, 47 were excluded after title and abstract screening, and six reports were assessed for full-text eligibility.
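The identification and screening counts can be checked arithmetically. The Python sketch below assumes, as the figures imply (670 − 350 = 320), that duplicates were removed only from the database and register records; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScreeningFlow:
    db_records: int           # records from databases and registers
    other_records: int        # records from reference lists, abstracts, gray literature
    duplicates_removed: int   # assumed removed from database/register records only
    db_excluded_tiab: int     # database records excluded at title/abstract screening
    other_excluded_tiab: int  # other-source records excluded at title/abstract screening

    def identified(self) -> int:
        return self.db_records + self.other_records

    def db_screened(self) -> int:
        return self.db_records - self.duplicates_removed

    def full_text_assessed(self) -> int:
        db_reports = self.db_screened() - self.db_excluded_tiab
        other_reports = self.other_records - self.other_excluded_tiab
        return db_reports + other_reports

flow = ScreeningFlow(db_records=670, other_records=53, duplicates_removed=350,
                     db_excluded_tiab=258, other_excluded_tiab=47)
print(flow.identified())          # 723
print(flow.db_screened())         # 320
print(flow.full_text_assessed())  # 68 (62 from databases + 6 from other sources)
```

Subtracting the 46 full-text exclusions (10 + 16 + 10 + 10) from these 68 reports yields the 22 included studies.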
Of the 68 reports assessed in full text (62 from databases and registers and six from other sources), 46 were excluded because of nonclinical validation (n=10), algorithm description only (n=16), lack of focus on ECG interpretation competency (n=10), or lack of relevance to review outcomes (n=10). Ultimately, 22 studies met the inclusion criteria and were included in the review. Based on the analytical framework of the review, 16 studies were assigned to Arm A and six to Arm B (Table 2). Inter-rater agreement during title/abstract and full-text screening was high (Cohen’s κ=0.82), indicating strong consistency between reviewers. The study selection process for each arm is presented in Figure 1 in accordance with PRISMA 2020 guidelines.
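Cohen’s κ adjusts observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the proportion of records on which both reviewers agree and p_e is the agreement expected from each reviewer’s marginal label frequencies. The Python sketch below is a minimal illustration using ten hypothetical include/exclude decisions; it is not the reviewers’ data and does not reproduce the reported κ of 0.82.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined, treated as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical screening decisions for ten records (not the review's data).
reviewer_1 = ["inc", "inc", "exc", "exc", "exc", "exc", "exc", "exc", "inc", "exc"]
reviewer_2 = ["inc", "exc", "exc", "exc", "exc", "exc", "exc", "exc", "inc", "exc"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.737
```

Here the reviewers agree on 9 of 10 records (p_o = 0.90), but because most records are excluded, chance agreement is already 0.62, so κ is noticeably lower than raw agreement.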
6. Outcomes of Interest

1) Primary outcomes

The primary outcomes of this systematic review differed by analytical arm.
For Arm A (nurse ECG interpretation competency), primary outcomes included the proportion of nurses classified as competent according to each study’s predefined criteria and mean ECG interpretation scores expressed as the percentage of correct responses. Performance was assessed across clinically relevant diagnostic domains, including rhythm recognition, ischemia detection, interval interpretation, and accurate lead placement.
For Arm B (computerized and AI-based ECG interpretation systems), primary outcomes focused on diagnostic performance metrics, including sensitivity, specificity, positive predictive value, negative predictive value, F1-score, and AUC for detection of atrial fibrillation, ventricular tachycardia or fibrillation, ACSs, and conduction abnormalities. Quantitative accuracy of ECG interval measurements was also assessed and reported as absolute measurement error.
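Most of these Arm B metrics derive from a 2 × 2 table of system output against the reference standard. The Python sketch below computes them from hypothetical counts; the class and function names are illustrative. PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of the condition in the test set, so they do not transfer directly between datasets. AUC cannot be derived from a single 2 × 2 table because it summarizes performance across all decision thresholds.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # condition present, system positive
    fp: int  # condition absent, system positive
    fn: int  # condition present, system negative
    tn: int  # condition absent, system negative

def _ratio(num: float, den: float) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")

def diagnostic_metrics(c: Confusion) -> dict:
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        # F1 is the harmonic mean of PPV and sensitivity: 2TP / (2TP + FP + FN)
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }

# Hypothetical counts for one diagnostic task (e.g., atrial fibrillation).
m = diagnostic_metrics(Confusion(tp=95, fp=2, fn=5, tn=898))
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.95, 'specificity': 0.998, 'ppv': 0.979, 'npv': 0.994, 'f1': 0.964}
```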

2) Secondary outcomes

Secondary outcomes for Arm A included factors associated with nurse ECG interpretation performance, such as educational level, years of clinical experience, workload, prior ECG training exposure, and professional certification. Evidence regarding knowledge retention, skill transfer to clinical practice, and durability of training effects over time was also examined.
For Arm B, secondary outcomes included workflow efficiency, turnaround time for ECG interpretation, influence on clinical decision-making, clinician confidence in system outputs, potential for misdiagnosis, cost-effectiveness considerations, and issues related to equity, bias, and fairness in algorithmic performance.
7. Risk-of-Bias/Quality Appraisal
Two reviewers independently conducted critical appraisal of each included study using validated instruments appropriate to study design. For cross-sectional and observational studies, the Joanna Briggs Institute critical appraisal checklists and the National Institutes of Health (NIH) Quality Assessment Tool were used. The Cochrane Risk of Bias 2 (RoB 2) tool and the ROBINS-I tool were used to assess randomized controlled trials and controlled before-and-after studies, respectively, where applicable. Mixed-methods and QI studies were assessed using the Mixed Methods Appraisal Tool (MMAT) or the Quality Improvement Minimum Quality Criteria Set (QI-MQCS). Studies assessing computerized or AI-based ECG interpretation were appraised using PROBAST-AI and QUADAS-2, with additional consideration of dataset shift, data leakage, ground-truth representativeness, and external validation.
All studies were rated and summarized qualitatively as having low, moderate, or high risk of bias, and disagreements were resolved by consensus. Overall, the appraisal summaries indicated that all 22 studies were of sufficient methodological quality for inclusion in the synthesis, although limitations such as small sample sizes, lack of blinding, and absence of external validation were noted in individual studies (Table 3) [1-4,6,7,12-27]. To ensure consistent and transparent risk-of-bias assessment across Arm A and Arm B, the recommended appraisal tool was used for each study design. Consequently, risk-of-bias results are reported according to the framework of each respective tool.
8. Data Extraction
Two reviewers independently extracted data using a pilot-tested, standardized data extraction form tailored to the two review domains: Arm A, nurse ECG interpretation proficiency; and Arm B, computerized ECG interpretation. For Arm A studies, extracted information included study country, healthcare facility, sample size, nursing cadre, years of experience, prior ECG training exposure, training type and duration, ECG exposure, assessment tools, scoring thresholds, domain-specific scores, predictors of performance, intervention characteristics, and evidence of knowledge or skill retention. Domain-specific scores included rhythm recognition, ischemia detection, interval interpretation, and lead placement.
For Arm B studies, extracted information included system type, categorized as either a traditional CIE or an AI/machine learning system; training and validation datasets; reference standards, such as expert panel review or angiographic confirmation; diagnostic tasks; performance metrics, including sensitivity, specificity, positive predictive value, negative predictive value, F1-score, and AUC; interval measurement errors in milliseconds; external validation; computational runtime; workflow or integration information; bias, fairness, and calibration analyses; and regulatory approval status, where reported. Disagreements between reviewers were resolved through discussion, and the cross-verified data were used for synthesis. Although extracted variables differed between Arm A and Arm B, results were synthesized using the most comparable performance measures available, such as accuracy and sensitivity.
9. Data Synthesis and Analysis
Because of substantial heterogeneity in study designs, assessment tools, outcome measures, and performance benchmarks, meta-analysis was not feasible. A structured narrative synthesis was therefore conducted in accordance with PRISMA 2020 recommendations.
For Arm A, results were synthesized by summarizing mean ECG interpretation scores, proportions of participants meeting predefined competency thresholds, domain-specific performance, such as arrhythmia recognition and ischemia detection, and reported predictors of competency. For Arm B, diagnostic performance metrics, including sensitivity, specificity, F1-score, and AUC, were summarized by diagnostic task. Findings were grouped thematically to allow within-arm comparison while avoiding direct quantitative comparison across fundamentally different outcome metrics.
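A narrative synthesis of this kind can be supported by a simple tabulation that groups reported values by diagnostic task and metric and reports their range and the number of contributing results, without computing a pooled estimate. The Python sketch below assumes a flat list of extracted (study, task, metric, value) rows; the study names and values are hypothetical placeholders.

```python
from collections import defaultdict

def summarize_by_task(rows):
    """Group (study, task, metric, value) rows and report ranges; no pooled estimate is computed."""
    groups = defaultdict(list)
    for study, task, metric, value in rows:
        groups[(task, metric)].append((study, value))
    summary = {}
    for key, entries in groups.items():
        values = [value for _, value in entries]
        summary[key] = {
            "min": min(values),
            "max": max(values),
            "k": len(entries),  # number of contributing results
            "studies": sorted({study for study, _ in entries}),
        }
    return summary

# Hypothetical extracted results, not values from the included studies.
rows = [
    ("Study X", "atrial fibrillation", "AUC", 0.91),
    ("Study Y", "atrial fibrillation", "AUC", 0.95),
    ("Study X", "ACS", "sensitivity", 0.93),
]
for (task, metric), s in summarize_by_task(rows).items():
    print(f"{task} {metric}: {s['min']}-{s['max']} (k={s['k']}; {', '.join(s['studies'])})")
```

Keying on both task and metric keeps AUC values separate from sensitivity values, mirroring the decision not to compare fundamentally different outcome metrics directly.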
Results
1. Characteristics of Included Studies
The characteristics of the included studies, including study design, setting, population or dataset characteristics, study arm classification, and primary outcomes, are summarized in Table 2 [1-4,6,7,12-27]. The 22 included articles provided evidence on both nurse ECG interpretation proficiency and computerized ECG interpretation systems.
Most nurse-focused studies in Arm A were conducted in the Middle East and Africa, including Jordan (n=2) [2,4], Saudi Arabia (n=1) [18], Palestine (n=2) [17,19], Iraq (n=1) [20], Iran (n=1) [3], Egypt (n=1) [21], and Ethiopia (n=2) [16,22]. Additional studies were conducted in the United States (n=1) [7], South Korea (n=1) [23], and Australia (n=1) [24]. One large systematic review included international data [1], Singh et al. synthesized regional evidence from India [6], and Dossel et al. reviewed computer modeling for ECG interpretation [25]. In contrast, Arm B studies of computerized ECG interpretation had broader geographic representation, including a multinational study from Italy, Belgium, Slovakia, and Israel (n=1) [12], studies led by investigators in the United States (n=3) [13,15,26], and a global AI consortium study (n=1) [14].
In terms of design, cross-sectional observational surveys were the most common study type (n=12), followed by methodological or validation studies evaluating AI and computerized interpretation systems (n=5), systematic or modeling reviews (n=4), and one quasi-experimental study [23]. Sample sizes varied widely, from 40 nurses in Egypt [21] to 932,711 ECGs used to train AI models in Belgium [12]. Among nurse-centered primary studies, sample sizes ranged from 100 nurses in Iraq [20] to 287 nurses in Jordan [4], with most studies including between 150 and 250 participants. Participants were predominantly ED, ICU, CCU, or telemetry-unit nurses, most of whom were bachelor’s-prepared and often early-career clinicians.
Taken together, these studies provide a broad overview of human limitations in ECG interpretation and the capabilities of computerized and AI-assisted systems across diverse healthcare settings.
2. Arm A: Nurse ECG Interpretation Proficiency (Levels and Domains)
Across the 14 nurse-focused studies and two evidence syntheses, overall ECG interpretation competency among emergency and critical care nurses was generally below study-defined competency thresholds. Study-level mean knowledge or skill scores typically ranged from 43% to 68%, and the proportion of nurses classified as competent was usually below 40% when competency was defined using thresholds of 65% to 80% correct responses [1,4,16,18].
The proportion of nurses meeting competency criteria varied by assessment tool and cutoff. For example, 23.5% to 31.0% of emergency nurses in Ethiopia met the excellence threshold of ≥65%, 17.1% of nurses in the West Bank were classified as competent using a cutoff of ≥7.5/10, and 17.5% of nurses in Iran were classified as competent using the same cutoff. These low-to-moderate competency estimates were consistent with absolute mean scores: emergency nurses in Ethiopia scored 6.82/20 (34%), registered nurses in Australia scored 55% on a 20-item test, nurses in Saudi Arabia achieved 68% knowledge scores despite practice gaps, emergency and intensive care nurses in South Korea scored 13/20 (65%), and nurses in Iraq were categorized as having good, fair, and poor competency in 32%, 44%, and 24% of cases, respectively.
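Because the included studies used instruments with different maximum scores, raw scores must first be expressed as percentages before they can be placed side by side. The Python sketch below applies this conversion to figures reported above (6.82/20, 13/20, and the 7.5/10 cutoff); the helper names and the 8/10 and 7/10 individual scores are hypothetical, and a common percentage scale does not make scores from different instruments strictly comparable.

```python
def percent_score(raw: float, max_score: float) -> float:
    """Express a raw test score as a percentage of the instrument's maximum score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100.0 * raw / max_score

def meets_cutoff(raw: float, max_score: float, cutoff_percent: float) -> bool:
    """True if the percentage score reaches a study-defined competency cutoff."""
    return percent_score(raw, max_score) >= cutoff_percent

print(round(percent_score(6.82, 20), 1))  # 34.1 (Ethiopian emergency nurses, mean on 20 items)
print(percent_score(13, 20))              # 65.0 (South Korean nurses, mean on 20 items)
print(percent_score(7.5, 10))             # 75.0 (the >=7.5/10 competency cutoff)
print(meets_cutoff(8, 10, 75))            # True for a hypothetical nurse scoring 8/10
```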
Domain-specific accuracy showed a similar pattern. Recognition of asystole and other readily apparent life-threatening rhythms was relatively strong, often reaching 70% to 90% accuracy. In contrast, recurrent areas of poor performance included tachyarrhythmias, such as atrial fibrillation, supraventricular tachycardia, ventricular tachycardia, and ventricular fibrillation, in some groups; atrioventricular block, particularly high-degree block; ischemia or myocardial infarction reflected by ST-T abnormalities; QT and QRS measurement or correction; axis determination; and accurate precordial lead placement. Many studies reported accuracy below 70% in these domains, with particularly low accuracy for advanced conduction blocks and myocardial infarction localization. Evidence syntheses showed similar variability: some ED cohorts answered more than 90% of selected rhythm-knowledge items correctly, whereas intensive care and emergency samples showed accuracy below 30% for malignant ventricular arrhythmias. Discrepancies between knowledge-based performance and practical skills, such as lead placement and interval interpretation, were also common [1,6].
Several factors were associated with higher ECG interpretation scores: higher educational level; previous ECG coursework, especially face-to-face training lasting more than 20 hours; basic life support or advanced cardiovascular life support certification; ICU or CCU placement; and greater daily ECG exposure. In contrast, total years of clinical experience showed inconsistent or no association with performance. Confidence and recent evidence-seeking behavior were also positively correlated with ECG interpretation performance [17-19,23]. Short-term knowledge gains were reported after interventions such as team-based learning, lecture-discussion sessions, structured modules, and unit-level guidelines; however, without periodic refresher training, knowledge retention declined after the initial post-test period [1,23]. Overall, the evidence addressing Review Question 1 indicates low-to-moderate baseline ECG interpretation proficiency among ED and ICU nurses, with clinically important deficits in safety-critical domains, including tachyarrhythmias, atrioventricular blocks, ischemia, QT and QRS interpretation, and lead placement. These deficits can be modestly improved through structured education but appear to require repeated training and sustained clinical exposure to be maintained [1-4,6,16-19,21-24].
3. Arm B: Computerized/AI ECG Interpretation Performance (Levels and Comparisons)
Across six computerized or AI-focused studies and technology reviews, AI-enabled 12-lead ECG interpretation demonstrated high diagnostic performance across rhythm, ACS, conduction abnormality, ectopy, chamber enlargement, and axis-determination tasks. These systems frequently outperformed traditional CIEs and, in some settings, approached or exceeded expert benchmarks for specific diagnoses [12-15,26,27].
In a large multi-institutional validation study, a deep learning system achieved F1-scores of approximately 0.96 for rhythm interpretation, 0.93 for ACS detection, 0.89 for conduction block detection, 0.97 for ectopy detection, 0.97 for chamber enlargement detection, and 0.90 for axis interpretation. The same system achieved atrial fibrillation sensitivity of approximately 0.95 and specificity of approximately 1.00, with positive and negative predictive values of approximately 0.99. For ST-segment elevation myocardial infarction, sensitivity was approximately 0.99 and the F1-score was approximately 0.95. Interval measurement differences, such as QRS duration +3 msec and QT interval −4 msec, remained within International Electrotechnical Commission tolerance thresholds, indicating robust quantitative agreement [12].
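Interval agreement of this kind can be summarized as the mean (bias) and standard deviation of paired algorithm-minus-reference differences, each compared against a tolerance. The Python sketch below assumes that structure; the measurements are hypothetical, and the 10 msec tolerances are placeholders rather than the limits specified by the International Electrotechnical Commission, which must be taken from the standard itself.

```python
from statistics import mean, stdev

def interval_agreement(device_ms, reference_ms, mean_tol_ms, sd_tol_ms):
    """Bias (mean) and SD of paired device-minus-reference differences vs. tolerances."""
    if len(device_ms) != len(reference_ms) or len(device_ms) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [d - r for d, r in zip(device_ms, reference_ms)]
    bias = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation
    return {
        "bias_ms": bias,
        "sd_ms": spread,
        "within_tolerance": abs(bias) <= mean_tol_ms and spread <= sd_tol_ms,
    }

# Hypothetical QRS durations (msec) from an algorithm and an expert reference.
algorithm_qrs = [92, 104, 88, 110]
reference_qrs = [91, 101, 86, 107]
# Placeholder tolerances; the actual IEC limits must be taken from the standard.
print(interval_agreement(algorithm_qrs, reference_qrs, mean_tol_ms=10, sd_tol_ms=10))
```

Reporting the SD alongside the bias matters because a small average error can hide large individual errors that cancel out.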
In head-to-head comparisons, the AI system substantially reduced false-negative results compared with a state-of-the-art CIE, including a 41.7% reduction for atrial fibrillation and elimination of false negatives for ST-segment elevation myocardial infarction in the tested subset. The AI system also outperformed CIEs in challenging conduction diagnoses, including left posterior fascicular block and high-degree atrioventricular block [12]. Narrative and quantitative syntheses reported AUCs of approximately 0.91 to 0.97 for arrhythmia and ACS tasks across multiple AI models and datasets. They also reported AUCs of approximately 0.90 to 0.93 for detection of left ventricular ejection fraction ≤35% and credible performance for structural diseases, such as hypertrophic cardiomyopathy and amyloidosis, as well as electrolyte abnormalities. Emerging evidence also supports the potential use of AI-based ECG interpretation for predicting future atrial fibrillation and heart failure, although concerns remain regarding generalizability and bias [13,14].
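The AUC values above can be read as the probability that a randomly chosen ECG with the condition receives a higher model score than a randomly chosen ECG without it, with ties counted as one half; this equals the Mann–Whitney U statistic divided by the number of positive-negative pairs. The Python sketch below computes AUC directly from that definition using hypothetical scores; its pairwise loop is quadratic, so for large datasets a rank-based implementation such as scikit-learn’s `roc_auc_score` would be more practical.

```python
def auc_pairwise(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for ECGs with and without atrial fibrillation.
af_present = [0.9, 0.8, 0.6]
af_absent = [0.7, 0.3, 0.2, 0.1]
print(round(auc_pairwise(af_present, af_absent), 3))  # 0.917, i.e., 11 of 12 pairs ranked correctly
```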
Foundational enablers of AI-based ECG interpretation include improved data pipelines, such as ECG image digitization methods that achieve high-fidelity signal recovery and support the incorporation of legacy or paper ECGs into modern AI workflows. Artifact-rich paired image-signal datasets also support training and testing for robustness under real-world conditions [15,26]. Systems-level reviews emphasize that advances in sensors, wearable devices, wireless transmission, and standardized data formats may facilitate broader deployment, while also introducing privacy and integration challenges [27].
From the perspective of Review Question 2, the quantitative evidence indicates that AI systems now provide high task-specific accuracy and interval measurements within accepted standards. These systems may therefore serve as reliable adjuncts to human interpretation, particularly in domains where nurse proficiency is weakest, including tachyarrhythmias, atrioventricular blocks, and ACS. However, important limitations remain, including potential dataset shift, algorithmic bias, explainability gaps, and the need for external validation across populations and care pathways before routine, unmonitored use [12-14,20]. Accordingly, the most appropriate implementation strategy is human-AI complementarity. AI should be deployed for high-sensitivity triage and second-reading support. Fail-safes should be embedded for critical alerts such as ST-segment elevation myocardial infarction, ventricular tachycardia, ventricular fibrillation, and atrioventricular block. Recurrent nurse education should be maintained to preserve human interpretive competency where AI may be uncertain or where contextual clinical synthesis is essential [1,3,6,12] (Table 4).
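The complementarity model can be read as a routing rule applied to each AI output. The sketch below is hypothetical. The label names, the 0.90 confidence floor, and the three routes are illustrative assumptions, not a protocol from the included studies. The rule encodes three elements:

- a fail-safe that always escalates critical findings;
- human over-reading when the AI is uncertain;
- second-reader support otherwise, with the nurse still interpreting every tracing.

```python
from dataclasses import dataclass

# Hypothetical label vocabulary for the critical alerts named in the text.
CRITICAL_LABELS = frozenset({"STEMI", "VT", "VF", "AV_BLOCK"})


@dataclass(frozen=True)
class AiEcgRead:
    label: str         # top AI finding, e.g. "AF" or "VT" (illustrative names)
    confidence: float  # model probability for that finding, 0.0 to 1.0


def route_read(read: AiEcgRead, confidence_floor: float = 0.90) -> str:
    """Return how an AI ECG read is handled in a hypothetical human-AI workflow."""
    if read.label in CRITICAL_LABELS:
        # Fail-safe: critical findings escalate regardless of confidence,
        # trading extra false alarms for fewer missed events.
        return "critical_alert"
    if read.confidence < confidence_floor:
        # Uncertain output goes to a human over-read rather than being trusted.
        return "human_over_read"
    # Confident, non-critical output is shown as a second reading; the nurse
    # still interprets the tracing in its clinical context.
    return "second_reader_support"


print(route_read(AiEcgRead("VT", 0.55)))      # critical_alert
print(route_read(AiEcgRead("AF", 0.72)))      # human_over_read
print(route_read(AiEcgRead("NORMAL", 0.97)))  # second_reader_support
```

Checking critical labels before confidence deliberately favors sensitivity. A low-confidence ventricular tachycardia call still triggers an alert rather than waiting for a routine over-read.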
4. Integrated Synthesis of the Findings
Integrating findings from both domains indicates that ECG interpretation competency among emergency and critical care nurses is often inadequate, particularly for identifying arrhythmias, atrioventricular block, and ischemic changes. In contrast, AI-based interpretation systems demonstrate high diagnostic accuracy in these domains and generally outperform traditional CIEs. Nevertheless, AI-based systems remain limited by concerns regarding generalizability, potential bias, and explainability.
Collectively, these findings suggest that AI tools may improve ECG interpretation when used as adjuncts to nurse-led assessment. A human-AI approach could enhance diagnostic accuracy in critical care settings while preserving the need for clinical judgment, ongoing education, and appropriate oversight.
The results of this review reveal a consistent pattern across the included studies. Nurses working in emergency and critical care settings generally demonstrated low-to-moderate ECG interpretation proficiency, whereas computerized and AI-driven systems showed high diagnostic accuracy for several critical cardiac conditions. The nurse-centered evidence indicates that, despite professional expectations for timely ECG interpretation, many nurses did not reach study-defined competency thresholds, with mean test scores ranging from approximately 43% to 68%. These findings are consistent with previous studies [28,29]. Only a small proportion of nurses exceeded the competency standards specified in the included studies [4,16,18,20,24].
Consistent with Buluba et al. [28], the included studies converge on a key finding: nurses demonstrated relatively high accuracy in detecting gross abnormalities, such as asystole, whereas performance declined substantially when they were required to identify tachyarrhythmias, ischemic changes, atrioventricular blocks, and interval abnormalities. This pattern suggests that although more obvious ECG abnormalities may be recognized with reasonable accuracy, subtle and time-sensitive findings remain insufficiently detected, with potentially important implications for patient safety [1,6].
However, findings from studies evaluating computerized and AI-driven ECG interpretation showed a markedly different pattern. Deep learning models demonstrated high sensitivity and specificity for arrhythmias, ACS, and conduction abnormalities, with F1-scores and AUCs generally in the high-performance range [12-14]. These findings are consistent with Ribeiro et al. [30], who reported that AI-driven ECG interpretation outperformed cardiology residents in identifying six types of abnormalities on 12-lead ECG recordings, achieving F1-scores above 80% and specificity greater than 99%.
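The AUC values cited in this review have a direct probabilistic reading. An AUC is the chance that a randomly chosen ECG with the condition receives a higher model score than a randomly chosen ECG without it, with ties counted as one half. A minimal Python sketch of that pairwise definition follows. The scores are hypothetical and the function name is illustrative. For large datasets, a rank-based method or a library routine such as scikit-learn's `roc_auc_score` is preferable to this all-pairs loop.

```python
from itertools import product
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a positive ECG > score of a negative ECG); ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for ECGs with and without a target abnormality.
with_condition = [0.95, 0.90, 0.80, 0.60]
without_condition = [0.70, 0.40, 0.30, 0.20, 0.10]
print(pairwise_auc(with_condition, without_condition))  # 0.95 (19 of 20 pairs ranked correctly)
```

AUC summarizes ranking performance across all possible thresholds. The sensitivity and specificity a deployed system actually delivers depend on the single operating threshold chosen for clinical use.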
Errors in interval measurements also remained within accepted international standards, further supporting the reliability of these systems. AI systems clearly outperformed conventional CIEs, particularly for conditions in which both human interpretation and CIE performance are often weakest, including atrioventricular block, left posterior fascicular block, and ST-segment elevation myocardial infarction detection [31,32]. Direct comparisons showed that AI systems reduced false-negative findings and improved detection of life-threatening rhythms [7].
Although AI-based ECG interpretation systems demonstrated high diagnostic accuracy across multiple domains, these findings should be interpreted with caution. Most AI studies used large, curated datasets from high-income healthcare systems, often with expert-annotated reference standards. Variability in ECG acquisition quality, patient demographics, and disease prevalence may affect algorithm performance when these systems are deployed in different clinical contexts. Furthermore, concerns regarding dataset shift, algorithmic bias, and limited explainability underscore the importance of maintaining human oversight and domain expertise [9]. AI systems should therefore be viewed as supportive tools that enhance, rather than replace, clinical judgment.
Van de Leur et al. [32] reported that AI-based ECG interpretation achieved high diagnostic accuracy and reduced cardiologists’ ECG-reading workload. Consistent with these findings, this review found that AI-based systems already outperform traditional software and may soon approach, or function as a supplement to, specialist-level interpretation in selected diagnostic tasks [15,25-27].
Comparing the two evidence domains reveals both similarities and differences. Both sets of studies identified tachyarrhythmias and conduction abnormalities as important diagnostic challenges: for nurses, these represented knowledge and skill gaps, whereas for traditional CIEs, they have historically represented areas of weaker performance [3,4,13,14,16]. The key distinction lies in the expected trajectory of improvement. Human performance can improve after educational interventions but may decline rapidly without continued practice or refresher training, whereas AI performance is expected to improve as datasets expand and algorithms advance [1,12,21]. Another similarity is that both domains are vulnerable to limitations in generalizability. Nurse performance varies substantially across countries, institutions, and levels of prior training, whereas AI systems are constrained by dataset representativeness and may perform less reliably in underrepresented populations [13,14,17,19]. Thus, both domains are context-dependent: nurses rely on adequate training systems, and AI systems rely on inclusive datasets and robust validation procedures.
The divergences between the two domains are equally important. Human interpretation is inherently variable and can be influenced by overconfidence, educational background, workload, and clinical exposure, whereas validated AI systems may provide more consistent output under similar input conditions [23,33]. Among nurses, rhythm recognition tends to be stronger than ischemia interpretation, whereas AI systems often maintain high performance across multiple diagnostic domains, including domains in which human performance is weakest [12,18,24]. In addition, although years of clinical experience were not consistently associated with nurse competency, the volume and heterogeneity of training data were important determinants of AI reliability [14,17]. These differences suggest that human interpreters may be more vulnerable to errors in complex pattern recognition, whereas AI systems may excel at pattern detection but remain limited in contextual clinical interpretation.
Taken together, the comparative findings suggest that neither nurse-led ECG interpretation nor AI-based interpretation alone is sufficient to ensure patient safety in emergency and critical care settings. Nurses remain central to clinical decision-making, but current competency gaps may place patients at risk, especially when rapid and accurate interpretation is required. AI-based systems offer a strong adjunctive solution; however, they cannot yet replace human oversight because of concerns regarding trust, explainability, and generalizability [13,25]. The logical implication of this synthesis is a complementary model: nurses should receive standardized ECG education and periodic reassessment to maintain safe competency, while AI systems should serve as high-sensitivity safety nets, particularly in domains where human performance is weakest. This balanced approach acknowledges the limitations of both evidence domains while leveraging their respective strengths to improve diagnostic accuracy, workflow efficiency, and ultimately patient outcomes [1,2,6]. This conclusion is consistent with Kashou et al. [34], who recommended human over-reading to improve diagnostic accuracy.
This review has several strengths. Its dual-arm design synthesized evidence on both nurse ECG interpretation proficiency and computerized or AI-based ECG interpretation, allowing the findings from both domains to be compared and interpreted as potentially complementary. The inclusion of 22 studies from diverse geographic regions strengthens the external relevance of the findings. The methodology was rigorous, with a predefined protocol, duplicate screening, standardized data extraction, and validated risk-of-bias tools selected according to study design. In addition, the inclusion of both conventional clinical studies and recent AI validation studies provides a distinctive synthesis that connects gaps in human performance with the potential role of technological support, offering insights for education, clinical practice, and future research.
This review also has several limitations. Nurse-focused studies were predominantly cross-sectional, used heterogeneous competency assessment tools and thresholds, and included relatively few interventional designs. The rapidly evolving nature of AI research may also limit the long-term generalizability of the findings. In addition, nurse-focused studies were primarily conducted in low- and middle-income countries, whereas AI-based studies largely originated from high-income settings with advanced digital infrastructure, limiting direct comparability between the two evidence domains. Substantial heterogeneity in study design, assessment instruments, outcome measures, and performance thresholds across both arms precluded quantitative pooling and requires cautious interpretation of the synthesized findings.
In conclusion, this systematic review found that ECG interpretation competency among emergency and critical care nurses remains variable, with clinically important gaps in safety-critical domains, whereas AI-based ECG interpretation systems demonstrate high diagnostic performance under validated conditions. However, methodological heterogeneity, contextual differences, and implementation challenges limit broad generalization. These findings support a complementary human-AI model that emphasizes ongoing nurse education, structured competency assessment, and cautious integration of AI systems as decision-support tools rather than autonomous diagnostic solutions.
For nursing education, this review highlights the importance of integrating ECG interpretation programs into nursing curricula and maintaining ongoing competency assessment to support patient safety. In clinical practice, AI should be used as an adjunct to, rather than a substitute for, clinical judgment in ECG interpretation.

CONFLICTS OF INTEREST

The authors declared no conflict of interest.

AUTHORSHIP

Study conception and design acquisition – AHA; data collection – AHA and AAH; analysis and interpretation of the data – AHA; discussion and conclusions, suggestions – AAH; English review – AHA; abstract, references, and final submission – AHA; drafting and critical revision of the manuscript – AHA.

FUNDING

None.

ACKNOWLEDGEMENT

None.

DATA AVAILABILITY STATEMENT

The data can be obtained from the corresponding author.

Supplementary materials can be found via https://doi.org/10.7475/kjan.2025.1125.
Figure 1.
PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) flow diagram. ECG=electrocardiogram.
Table 1.
Summary of Eligibility Criteria according to the PICOS Framework
PICOS elements | Arm A | Arm B
Population | Registered nurses working in emergency departments, intensive care units, coronary care units, and telemetry units | Computerized electrocardiogram interpretation systems evaluated using adult electrocardiogram datasets
Interventions/exposures | Formal electrocardiogram education, clinical experience, continuing education, guideline-based or structured training programs | Use of computerized interpretation engines or artificial intelligence platforms for electrocardiogram analysis
Comparators | Standard training versus structured programs; differences by unit, experience level, or pre-post intervention | Expert cardiologist interpretation; traditional computerized engines versus artificial intelligence–based systems
Outcomes | Electrocardiogram interpretation competency, accuracy by diagnostic domain, predictors of performance, training effects | Diagnostic accuracy metrics, interval measurement error, comparison with human interpretation
Study designs | Cross-sectional studies, cohort studies, before-after studies, randomized controlled trials, mixed-methods studies, quality improvement projects | Diagnostic validation studies, external test-set analyses, methodological and performance evaluation studies

Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; PICOS=Population, Interventions/Exposures, Comparators, Outcomes, and Study Designs.

Table 2.
Characteristics of the Included Studies (Arm A and Arm B)
Studies | Country/region | Study arm | Study design | Setting/data source | Population or dataset | Sample size/dataset size | Main outcomes assessed
Ayasreh et al. [4] (2024) | Jordan | Arm A | Cross-sectional | Emergency departments | Emergency nurses | 287 Nurses | ECG interpretation competency score; rhythm recognition
Belay et al. [16] (2024) | Ethiopia | Arm A | Cross-sectional | Adult emergency rooms | ER nurses | 252 Nurses | ECG interpretation knowledge and practice
Chen et al. [1] (2022) | International | Arm A | Systematic review | Acute care settings | Emergency and critical care nurses | 24 Studies | Nurse ECG competency, education effects
Dossel et al. [25] (2021) | Germany | Arm A | Modeling review | Computational ECG models | Simulated and clinical ECGs | N/A | Computer modeling for ECG interpretation
Hasanien et al. [2] (2023) | Jordan | Arm A | Cross-sectional | ED and ICU | Emergency and critical care nurses | 210 Nurses | 12-lead ECG proficiency and arrhythmia monitoring
Jalal [18] (2024) | Saudi Arabia | Arm A | Cross-sectional | Hospitals | Registered nurses | 204 Nurses | ECG monitoring and interpretation competency
Jassim et al. [20] (2023) | Iraq | Arm A | Cross-sectional | Hospitals | Nurses | 100 Nurses | ECG knowledge level
Kashou et al. [7] (2023) | USA | Arm A | Observational | Clinical settings | Medical professionals | 892 Clinicians | Factors influencing ECG interpretation proficiency
Kim and Yoo [23] (2025) | South Korea | Arm A | Cross-sectional | ER and ICU | Nurses | 230 Nurses | ECG confidence and educational needs
Mohammed Ali et al. [21] (2022) | Egypt | Arm A | Quasi-experimental | Hospital units | Nurses | 40 Nurses | Effect of ECG guideline on performance
Ng and Christensen [24] (2024) | Australia | Arm A | Cross-sectional | Acute care units | Registered nurses | 120 Nurses | ECG rhythm interpretation knowledge
Obied et al. [17] (2024) | Palestine | Arm A | Cross-sectional | Emergency departments | Emergency nurses | 266 Nurses | ECG interpretation competency
Qaddumi et al. [19] (2025) | Palestine | Arm A | Cross-sectional | Hospitals | Registered nurses | 198 Nurses | ECG interpretation and arrhythmia management
Rahimpour et al. [3] (2021) | Iran | Arm A | Comparative cross-sectional | ED and EMS | Nurses and EMS personnel | 180 Participants | ECG interpretation accuracy
Singh et al. [6] (2022) | India | Arm A | Systematic review | Acute care | Nurses | 18 Studies | ECG interpretation competency
T/Mariam et al. [22] (2024) | Ethiopia | Arm A | Cross-sectional | ED and ICU | Nurses | 225 Nurses | ECG knowledge and practice
Fortune et al. [26] (2022) | USA | Arm B | Methodological validation | ECG image digitization | ECG images/signals | 12,000 ECGs | Signal recovery accuracy
Herman et al. [12] (2024) | Europe (multi-country) | Arm B | Validation study | Clinical ECG databases | Adult ECGs | About 930,000 ECGs | AI diagnostic accuracy (AUC, F1, sensitivity)
Husain et al. [27] (2021) | International | Arm B | Technical review | ECG hardware/software | ECG sensor systems | N/A | ECG interoperability and digital integration
Muzammil et al. [14] (2024) | International | Arm B | AI validation study | Multicenter datasets | Adult ECG datasets | >500,000 ECGs | AI-based ECG diagnosis performance
Ose et al. [13] (2024) | USA | Arm B | AI review | Clinical ECG systems | AI ECG platforms | N/A | Benefits and limitations of AI ECG interpretation
Reyna et al. [15] (2024) | USA | Arm B | Dataset development | ECG image database | ECG images | 67,000 ECG images | Dataset quality and artifact robustness

AI=artificial intelligence; Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; AUC=area under the receiver operating characteristic curve; ECG=electrocardiogram; ED=emergency department; EMS=emergency medical services; ER=emergency room; ICU=intensive care unit; N/A=not applicable.

Table 3.
Risk of Bias and Quality Appraisal of Included Studies
Studies | Study design | Appraisal tool | Key appraisal findings | Risk-of-bias judgment | Final decision
Ayasreh et al. [4] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Clear sampling strategy; reliance on self-reported data; no longitudinal follow-up | Moderate | Included
Belay et al. [16] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Multicenter design strengthened validity; recall bias possible; no objective clinical outcomes | Moderate | Included
Chen et al. [1] (2022) | Systematic review | Joanna Briggs Institute systematic review checklist | Comprehensive search strategy; high heterogeneity; some reporting gaps | Low to moderate | Included
Dossel et al. [25] (2021) | Modeling review | QUADAS-2 (adapted) | Strong conceptual framework; no empirical dataset; limited external validation | High | Included (contextual evidence)
Fortune et al. [26] (2022) | Methodological validation study | QUADAS-2 | Robust digitization and signal recovery metrics; no direct diagnostic outcomes | Moderate | Included
Hasanien et al. [2] (2023) | Cross-sectional | Joanna Briggs Institute checklist | Standardized assessment tool; single-country setting; modest sample size | Low to moderate | Included
Herman et al. [12] (2024) | Artificial intelligence validation study | PROBAST-AI | Robust external validation; high diagnostic accuracy; potential dataset shift concerns | Low | Included
Husain et al. [27] (2021) | Technical review | QUADAS-2 (adapted) | Comprehensive hardware and software perspective; not focused on diagnostic accuracy | Moderate | Included (technical context)
Jalal [18] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Appropriate sample size; limited validation of competency tool; self-report bias | Moderate | Included
Jassim et al. [20] (2023) | Cross-sectional | Joanna Briggs Institute checklist | Small sample size; basic assessment instrument; no retention assessment | High | Included
Kashou et al. [7] (2023) | Observational study | National Institutes of Health quality assessment tool | Good survey methodology; voluntary participation; limited generalizability | Moderate | Included
Kim and Yoo [23] (2025) | Cross-sectional | Joanna Briggs Institute checklist | Adequate sample size; robust assessment tool; reliance on self-confidence measures | Low to moderate | Included
Mohammed Ali et al. [21] (2022) | Quasi-experimental study | ROBINS-I | Non-randomized design; modest intervention effects; risk of confounding | Moderate to high | Included
Muzammil et al. [14] (2024) | Artificial intelligence validation study | PROBAST-AI | Large and diverse dataset; external validation performed; equity and bias concerns | Low to moderate | Included
Ng and Christensen [24] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Validated test instrument; potential selection bias | Moderate | Included
T/Mariam et al. [22] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Hospital-based sample; moderate size; training exposure insufficiently described | Moderate | Included
Obied et al. [17] (2024) | Cross-sectional | Joanna Briggs Institute checklist | Large sample size; regional focus; limited competency assessment tool | Moderate | Included
Ose et al. [13] (2024) | Artificial intelligence review | QUADAS-2 | Comprehensive synthesis; variability in external validation across included studies | Moderate | Included
Qaddumi et al. [19] (2025) | Cross-sectional | Joanna Briggs Institute checklist | Adequate sample size; limited reporting on assessment validity | Moderate | Included
Rahimpour et al. [3] (2021) | Cross-sectional | Joanna Briggs Institute checklist | Nurse versus emergency medical services comparison; self-report bias | Moderate | Included
Reyna et al. [15] (2024) | Methodological dataset development study | QUADAS-2 | High-quality dataset; not designed to evaluate diagnostic performance | Moderate | Included
Singh et al. [6] (2022) | Systematic review | Joanna Briggs Institute systematic review checklist | Broad evidence synthesis; substantial methodological variation | Moderate | Included

PROBAST-AI=Prediction Model Risk of Bias Assessment Tool–Artificial Intelligence; QUADAS-2=Quality Assessment of Diagnostic Accuracy Studies 2; ROBINS-I=Risk of Bias in Non-randomized Studies - of Interventions.

Table 4.
Quantitative Synthesis of ECG Interpretation Proficiency (Arm A) and Computerized/AI Systems (Arm B)
Dimension | Arm A (14 primary studies+2 reviews) | Arm B (6 validation and review studies)
Overall performance | Mean interpretation scores ranged from 43% to 68%. Fewer than 40% of nurses met predefined competency thresholds (typically ≥65%–80% correct). Lowest performance: emergency department nurses in Jordan (about 15% competent; mean score 4.35/10). Highest performance: Iranian emergency departments (about 38% competent). | High diagnostic performance, with AUC 0.91–0.97 and F1-scores 0.93–0.97. Sensitivity often above 94%. Specificity approached 98%–100%, particularly for atrial fibrillation, acute coronary syndromes, and malignant ventricular arrhythmias.
Domain-specific performance | Strengths: recognition of asystole and gross lethal rhythms (typically ≥70%–90% accuracy). Weaknesses: tachyarrhythmias, atrioventricular blocks, ischemic ST–T changes, interval measurement, and accurate lead placement, often with ≤50% accuracy in multiple cohorts. | Consistent accuracy: arrhythmia detection, acute coronary syndromes, conduction abnormalities, ectopy, chamber enlargement, and axis determination. Small interval errors: interval measurement differences were small (for example, QRS approximately +3 msec and corrected QT approximately −4 msec), remaining within accepted technical standards.
Education interventions and retention | Structured educational interventions, including team-based learning, guideline-based training, and formal modules, resulted in short-term score improvements of approximately 20%–25%. Score improvement declined over time without refresher training. | No retraining is needed for clinicians. External validation across populations and settings is essential. Performance was accurate in independent test sets across studies.
Predictors/moderators of performance | Positive predictors: higher educational level, >20-hour ECG courses, professional certification, ICU or CCU placement, ECG exposure, and higher self-confidence. Years of general clinical experience showed inconsistent or no association with performance. | Model performance was strongly influenced by dataset size, diversity, and quality of reference standards (expert adjudication or angiographic confirmation). Bias and reduced generalizability were reported when training datasets lacked demographic or clinical diversity.
Human vs. computerized interpretation (head-to-head evidence) | In studies directly comparing performance, nurses demonstrated less than 50% accuracy in ischemia detection and advanced arrhythmia recognition, indicating clinically significant safety gaps in high-risk scenarios. | AI reduced false-negative rates compared with traditional computerized interpretation engines, including marked reductions for atrial fibrillation. No ST-segment elevation myocardial infarction cases were missed in the tested subset. Performance was superior for complex conduction abnormalities.
Strengths of evidence base | Multinational, real-world studies. Identification of specific, safety-critical competency gaps relevant to frontline practice. | High diagnostic accuracy across multiple ECG domains; reproducible quantitative performance; scalable; meets technical standards.
Key limitations | Varied methods, mostly cross-sectional studies; limited randomized or longitudinal intervention data. Knowledge retention was rarely assessed. | Risk of dataset shift, algorithmic bias, limited explainability, and uneven regulatory approval; most studies conducted in high-income settings, limiting generalizability to low-resource environments.
Implications for review questions 1 and 2 | Baseline ECG interpretation competency among emergency and critical care nurses is low to moderate, with persistent deficits in safety-critical domains. Competency improves with structured education but is not sustained without reinforcement. | AI provides a high-accuracy complementary tool, particularly in domains where human performance is weakest. Optimal practice supports a human-AI partnership with ongoing nurse training and appropriate oversight.

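Sensitivity=proportion of cases with the condition that were correctly identified (true positives); specificity=proportion of cases without the condition that were correctly identified (true negatives); F1-score=harmonic mean of precision (positive predictive value) and recall (sensitivity).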

AI=artificial intelligence; Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; AUC=area under the receiver operating characteristic curve; CCU=coronary care unit; ECG=electrocardiogram; F1-score=harmonic mean of precision and recall; ICU=intensive care unit; QRS=QRS complex; QT=QT interval; ST–T=ST segment and T wave.

      Husain et al. [27] (2021) Technical review QUADAS-2 (adapted) Comprehensive hardware and software perspective; not focused on diagnostic accuracy Moderate Included (technical context)
      Jalal [18] (2024) Cross-sectional Joanna Briggs Institute checklist Appropriate sample size; limited validation of competency tool; self-report bias Moderate Included
      Jassim et al. [20] (2023) Cross-sectional Joanna Briggs Institute checklist Small sample size; basic assessment instrument; no retention assessment High Included
      Kashou et al. [7] (2023) Observational study National Institutes of Health quality assessment tool Good survey methodology; voluntary participation; limited generalizability Moderate Included
      Kim and Yoo [23] (2025) Cross-sectional Joanna Briggs Institute checklist Adequate sample size; robust assessment tool; reliance on self-confidence measures Low to moderate Included
      Mohammed Ali et al. [21] (2022) Quasi-experimental study ROBINS-I Non-randomized design; modest intervention effects; risk of confounding Moderate to high Included
      Muzammil et al. [14] (2024) Artificial intelligence validation study PROBAST-AI Large and diverse dataset; external validation performed; equity and bias concerns Low to moderate Included
      Ng and Christensen [24] (2024) Cross-sectional Joanna Briggs Institute checklist Validated test instrument; potential selection bias Moderate Included
      T/Mariam et al. [22] (2024) Cross-sectional Joanna Briggs Institute checklist Hospital-based sample; moderate size; training exposure insufficiently described Moderate Included
      Obied et al. [17] (2024) Cross-sectional Joanna Briggs Institute checklist Large sample size; regional focus; limited competency assessment tool Moderate Included
      Ose et al. [13] (2024) Artificial intelligence review QUADAS-2 Comprehensive synthesis; variability in external validation across included studies Moderate Included
      Qaddumi et al. [19] (2025) Cross-sectional Joanna Briggs Institute checklist Adequate sample size; limited reporting on assessment validity Moderate Included
      Rahimpour et al. [3] (2021) Cross-sectional Joanna Briggs Institute checklist Nurse versus emergency medical services comparison; self-report bias Moderate Included
      Reyna et al. [15] (2024) Methodological dataset development study QUADAS-2 High-quality dataset; not designed to evaluate diagnostic performance Moderate Included
      Singh et al. [6] (2022) Systematic review Joanna Briggs Institute systematic review checklist Broad evidence synthesis; substantial methodological variation Moderate Included
      Dimension / Arm A (14 primary studies + 2 reviews) / Arm B (6 validation and review studies)
      Overall performance
      Arm A: Mean interpretation scores ranged from 43% to 68%. Fewer than 40% of nurses met predefined competency thresholds (typically ≥65%–80% correct). Lowest performance: emergency department nurses in Jordan (about 15% competent; mean score 4.35/10). Highest performance: Iranian emergency departments (about 38% competent).
      Arm B: High diagnostic performance, with AUC 0.91–0.97 and F1-scores 0.93–0.97. Sensitivity was often above 94%, and specificity approached 98%–100%, particularly for atrial fibrillation, acute coronary syndromes, and malignant ventricular arrhythmias.
      Domain-specific performance
      Arm A: Strengths: recognition of asystole and gross lethal rhythms (typically ≥70%–90% accuracy). Weaknesses: tachyarrhythmias, atrioventricular blocks, ischemic ST–T changes, interval measurement, and accurate lead placement, often with ≤50% accuracy in multiple cohorts.
      Arm B: Consistent accuracy in arrhythmia detection, acute coronary syndromes, conduction abnormalities, ectopy, chamber enlargement, and axis determination. Interval measurement errors were small (for example, QRS approximately +3 msec and corrected QT approximately −4 msec) and remained within accepted technical standards.
      Education interventions and retention
      Arm A: Structured educational interventions, including team-based learning, guideline-based training, and formal modules, produced short-term score improvements of approximately 20%–25%. These improvements declined over time without refresher training.
      Arm B: No clinician retraining is required. External validation across populations and settings is essential. Performance remained accurate on independent test sets across studies.
      Predictors/moderators of performance
      Arm A: Positive predictors: higher educational level, ECG courses of more than 20 hours, professional certification, ICU or CCU placement, ECG exposure, and higher self-confidence. Years of general clinical experience showed inconsistent or no association with performance.
      Arm B: Model performance was strongly influenced by dataset size, diversity, and quality of reference standards (expert adjudication or angiographic confirmation). Bias and reduced generalizability were reported when training datasets lacked demographic or clinical diversity.
      Human vs. computerized interpretation (head-to-head evidence)
      Arm A: In studies directly comparing performance, nurses demonstrated less than 50% accuracy in ischemia detection and advanced arrhythmia recognition, indicating clinically significant safety gaps in high-risk scenarios.
      Arm B: AI reduced false-negative rates compared with traditional computerized interpretation engines, including marked reductions for atrial fibrillation. No acute coronary syndrome cases were missed in the validation studies. Performance was superior for complex conduction abnormalities.
      Strengths of evidence base
      Arm A: Multinational, real-world studies; identification of specific, safety-critical competency gaps relevant to frontline practice.
      Arm B: High diagnostic accuracy across multiple ECG domains; reproducible quantitative performance; scalability; compliance with technical standards.
      Key limitations
      Arm A: Heterogeneous methods with mostly cross-sectional designs; limited randomized or longitudinal intervention data; knowledge retention rarely assessed.
      Arm B: Risk of dataset shift, algorithmic bias, limited explainability, and uneven regulatory approval; most studies were conducted in high-income settings, limiting generalizability to low-resource environments.
      Implications for review question 1
      Arm A: Baseline ECG interpretation competency among emergency and critical care nurses is low to moderate, with persistent deficits in safety-critical domains. Competency improves with structured education but is not sustained without reinforcement.
      Arm B: AI provides a high-accuracy complementary tool, particularly in domains where human performance is weakest. Optimal practice supports a human-AI partnership with ongoing nurse training and appropriate.
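      The two Arm A summary measures in Table 4, the mean interpretation score and the proportion of nurses meeting a competency cut-off, can diverge sharply, so a cohort may average above 40% while only a small fraction is rated competent. The Python sketch below illustrates this under stated assumptions: the scores are invented, the test is assumed to be scored out of 10, the 65% cut-off is simply the lower end of the thresholds reported in the reviewed studies, and the function name `summarize_competency` is hypothetical.

```python
from statistics import mean


def summarize_competency(scores, max_score=10, threshold=0.65):
    """Return (mean score as % of maximum, % of nurses at or above threshold).

    Hypothetical helper for illustration only. `threshold` is the fraction of
    items that must be correct to count as competent; 0.65 is an assumed value
    taken from the low end of the 65%-80% cut-offs used in the studies.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    mean_pct = 100 * mean(scores) / max_score
    cut_off = threshold * max_score            # e.g. 6.5 correct out of 10
    n_competent = sum(1 for s in scores if s >= cut_off)
    pct_competent = 100 * n_competent / len(scores)
    return round(mean_pct, 1), round(pct_competent, 1)


# Invented scores for ten nurses on a hypothetical 10-item ECG test.
scores = [2, 3, 4, 4, 4, 5, 5, 5, 7, 8]
print(summarize_competency(scores))                 # (47.0, 20.0)
print(summarize_competency(scores, threshold=0.8))  # (47.0, 10.0)
```

      In this invented cohort a mean score of 47% coexists with only 20% of nurses rated competent, and raising the cut-off to 80% halves that proportion, which is why the cut-off used matters when competency rates are compared across studies.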
      Table 1. Summary of Eligibility Criteria according to the PICOS Framework

      Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; PICOS=Population, Interventions/Exposures, Comparators, Outcomes, and Study Designs.

      Table 2. Characteristics of the Included Studies (Arm A and Arm B)

      AI=artificial intelligence; Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; AUC=area under the receiver operating characteristic curve; ECG=electrocardiogram; ED=emergency department; EMS=emergency medical services; ER=emergency room; ICU=intensive care unit; N/A=not applicable.

      Table 3. Risk of Bias and Quality Appraisal of Included Studies

      PROBAST-AI=Prediction Model Risk of Bias Assessment Tool–Artificial Intelligence; QUADAS-2=Quality Assessment of Diagnostic Accuracy Studies 2; ROBINS-I=Risk of Bias in Non-randomized Studies—of Interventions.

      Table 4. Quantitative Synthesis of ECG Interpretation Proficiency (Arm A) and Computerized/AI Systems (Arm B)

      Sensitivity: percentage of actual positive cases correctly identified; Specificity: percentage of actual negative cases correctly identified; F1-score: single summary of precision and recall.

      AI=artificial intelligence; Arm A=nurse electrocardiogram interpretation; Arm B=computerized/artificial intelligence electrocardiogram interpretation; AUC=area under the receiver operating characteristic curve; CCU=coronary care unit; ECG=electrocardiogram; F1-score=harmonic mean of precision and recall; ICU=intensive care unit; QRS=QRS complex; QT=QT interval; ST–T=ST segment and T wave.
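      The Arm B measures in Table 4 derive from a 2x2 comparison of system output against the reference standard (true positives TP, false positives FP, true negatives TN, false negatives FN), while AUC summarizes ranking performance across all decision thresholds. The Python sketch below computes them under stated assumptions: the counts and scores are invented, the helper names `diagnostic_metrics` and `auc_from_scores` are hypothetical and not taken from any reviewed system, and AUC is obtained with the pairwise (Mann-Whitney) formulation, one of several equivalent ways to compute it.

```python
# Hypothetical helpers for illustration; not taken from any reviewed system.

def diagnostic_metrics(tp, fp, tn, fn):
    """Standard 2x2 diagnostic accuracy measures, returned as proportions (0-1)."""
    sensitivity = tp / (tp + fn)   # share of reference-positive ECGs flagged
    specificity = tn / (tn + fp)   # share of reference-negative ECGs cleared
    precision = tp / (tp + fp)     # positive predictive value
    # F1 is the harmonic mean of precision and sensitivity (recall);
    # 2*TP / (2*TP + FP + FN) is the same quantity written in counts.
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case receives a
    higher model score than a randomly chosen negative case (ties count half).
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented example: 100 reference-positive and 900 reference-negative ECGs.
m = diagnostic_metrics(tp=95, fp=20, tn=880, fn=5)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.95, 'specificity': 0.978, 'precision': 0.826, 'f1': 0.884}

# Invented model scores for three positive and three negative ECGs.
print(round(auc_from_scores([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]), 3))  # 0.889
```

      With these invented counts, a sensitivity of 95% and a specificity of about 98% still yield an F1-score of about 0.88, because F1 ignores true negatives and therefore shifts with the proportion of positive cases in the test set; F1 values from datasets with different case mixes are not directly comparable.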
