
Background: Systematic reviews (SRs) are a cornerstone of evidence-based medicine and clinical guideline development, but they are increasingly resource-intensive. The review process typically involves protocol development, title and abstract screening, full-text assessment, data extraction, and meta-synthesis. Among these steps, title and abstract screening is one of the most labor-intensive, requiring large numbers of publications to be assessed against predefined eligibility criteria. Current methodological standards recommend that at least two independent reviewers perform screening to reduce bias and enhance robustness, which substantially increases time and personnel requirements. Tools that support the screening phase may therefore offer important efficiency gains. Recent advances in artificial intelligence (AI) enable systems that apply explicit, reviewer-defined eligibility criteria to assist with screening decisions, offering the potential to support reviewers while maintaining methodological rigor. One such system is Silvi®, a tool that supports multiple phases of the SR workflow. Silvi®’s basic non-AI functionalities have been used in multiple published reviews [1, 2].
Objectives: To assess the efficiency and accuracy of AI-assisted title and abstract screening in a large-scale rheumatology SR, comparing performance across differing levels of reviewer-defined input.
Methods: This study was designed to reflect realistic title and abstract screening workflows used by healthcare researchers without AI expertise. Screening data were obtained from an SR on the incidence and prevalence of venous thromboembolism in inflammatory rheumatic diseases (PROSPERO-ID: CRD42024543792), comprising 28,928 records identified through database searches in EMBASE, MEDLINE, Scopus, and CENTRAL. Title and abstract screening was performed independently by two human reviewers according to a predefined protocol, resulting in 738 eligible studies. Human screening decisions served as the reference standard. The AI system was evaluated across three predefined screening scenarios that differed in the amount of reviewer-defined input. Scenario 1 included brief inclusion and exclusion criteria corresponding to the direct application of the review protocol in Silvi®. Scenario 2 used more detailed and explicit eligibility descriptions, estimated to require approximately two hours of preparation. Scenario 3 used the same criteria as Scenario 2 and provided additional examples of previously screened studies labelled as included or excluded, with the option for corrective feedback. The AI system was used as an off-the-shelf tool. For each record, the system generated a binary recommendation (include or exclude) that was directly compared with the human screening outcome. Performance was assessed using overall agreement and confusion matrices, with particular emphasis on false-negative classifications, as these can affect review validity. To estimate screening efficiency, reviewers recorded screening speed during conventional screening, expressed as records per hour and adjusted to reflect independent double screening.
For AI-assisted scenarios, total screening time was estimated by combining preparation time, AI processing time (minutes), and the estimated human time required to review false-positive and false-negative classifications, based on the reference screening speed. Assuming that AI misclassifications are detectable through human oversight, facilitated by system-provided uncertainty indicators and transparent decision rationales, these estimates were used to quantify potential workload savings.
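The time estimate described above can be illustrated with a minimal sketch. It is not the authors' calculation: the function name and parameters are hypothetical, the reported screening speed is assumed to apply per reviewer, each misclassified record is assumed to be re-reviewed by both reviewers, and AI processing time (reported only as minutes) is left at zero.

```python
def estimate_hours_saved(n_records, records_per_hour, n_false_pos, n_false_neg,
                         prep_hours=0.0, ai_hours=0.0, n_reviewers=2):
    """Return (conventional, assisted, saved) screening hours.

    Assumptions (not confirmed by the source): records_per_hour is a
    per-reviewer speed, and misclassified records are re-reviewed by
    all n_reviewers at that same speed.
    """
    # Conventional workload: every record is screened by each reviewer.
    conventional = n_reviewers * n_records / records_per_hour
    # Human time needed to re-review AI false positives and false negatives.
    review = n_reviewers * (n_false_pos + n_false_neg) / records_per_hour
    # AI-assisted workload: preparation + AI run time + human re-review.
    assisted = prep_hours + ai_hours + review
    return conventional, assisted, conventional - assisted


# Illustration with the Scenario 2 counts reported in the Results
# (28,928 records, 529.8 records/hour, 1,382 FP, 73 FN, about 2 h preparation).
conventional, assisted, saved = estimate_hours_saved(
    28928, 529.8, n_false_pos=1382, n_false_neg=73, prep_hours=2.0
)
print(f"conventional: {conventional:.1f} h, assisted: {assisted:.1f} h, saved: {saved:.1f} h")
```

Under these assumptions the sketch gives roughly 101.7 hours saved for Scenario 2, which is close to the reported estimate; small differences for other scenarios may reflect the omitted AI processing time or rounding.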
Results: The reference SR comprised 28,928 unique records screened at the title and abstract level, of which 738 studies proceeded to full-text assessment. Human reviewers screened an average of 529.8 records per hour.
Across scenarios, AI-assisted screening performance improved as reviewer-defined input increased (see Table 1). In Scenario 1, using only brief eligibility descriptions, raw agreement with human screening was 93%, with 102 false negatives and 1,921 false positives. This scenario was associated with an estimated time saving of 101.6 hours compared with conventional screening. In Scenario 2, where more detailed eligibility descriptions were provided, raw agreement increased to 95%, with 73 false negatives and 1,382 false positives, corresponding to an estimated time saving of 101.7 hours. The highest performance was observed in Scenario 3, combining detailed eligibility criteria with a small number of previously screened example studies. Raw agreement reached 96.7%, with 55 false negatives and 912 false positives, and an estimated time saving of 103.6 hours.
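As a rough consistency check, the full confusion matrix for each scenario can be reconstructed from the reported false-negative and false-positive counts. The sketch below assumes that the 738 human-eligible records form the positive class of the reference standard; the function name is hypothetical, and sensitivity and specificity are derived here for illustration only (they are not reported in the abstract).

```python
def confusion_summary(n_records, n_eligible, n_false_neg, n_false_pos):
    """Rebuild a 2x2 confusion matrix from totals and error counts."""
    tp = n_eligible - n_false_neg                # eligible records the AI included
    tn = n_records - n_eligible - n_false_pos    # ineligible records the AI excluded
    return {
        "tp": tp,
        "fn": n_false_neg,
        "fp": n_false_pos,
        "tn": tn,
        "agreement": (tp + tn) / n_records,      # raw agreement with human decisions
        "sensitivity": tp / n_eligible,          # share of eligible records retained
        "specificity": tn / (n_records - n_eligible),
    }


# Reported (false negatives, false positives) per scenario.
scenarios = {1: (102, 1921), 2: (73, 1382), 3: (55, 912)}
for name, (fn, fp) in scenarios.items():
    s = confusion_summary(28928, 738, fn, fp)
    print(f"Scenario {name}: agreement {s['agreement']:.1%}, "
          f"sensitivity {s['sensitivity']:.1%}, specificity {s['specificity']:.1%}")
```

The agreement values computed this way match the reported 93%, 95%, and 96.7% after rounding.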
Conclusions: AI-assisted title and abstract screening reduced the estimated screening workload by 101.6–103.6 hours (93–95%), corresponding to a 14–19-fold reduction compared with independent double human screening. Agreement with human decisions increased with more detailed reviewer input, while false negatives decreased from 102 to 55 across scenarios. Scenario 3 achieved the highest agreement and the lowest number of false negatives, suggesting a favorable trade-off for large-scale SRs with an embedded human-in-the-loop workflow. When applied within a standard systematic review protocol, AI-assisted screening using Silvi® allows large volumes of records to be triaged quickly with high agreement, while experts retain complete control over uncertain or high-risk decisions.
REFERENCES: [1] Holm-Larsen T, Van den Ende M, Wendelboe HG, Verbakel I, Kheir GB, Hervé F, Everaert K. The cost of lifelong LUTS-A systematic literature review. Neurourol Urodyn. 2024 Jun;43(5):1058-1065. doi: 10.1002/nau.25389. Epub 2024 Jan 25. PMID: 38270351.
[2] Everaert K, Holm-Larsen T, Bou Kheir G, Rottey S, Weiss JP, Vande Walle J, Kabarriti AE, Dossche L, Hervé F, Spinoit AF, Nørgaard JP, Juul KV. Potential clinical applications of current and future oral forms of desmopressin (Review). Exp Ther Med. 2024 May 29;28(2):303. doi: 10.3892/etm.2024.12592. PMID: 38873038.
Acknowledgments: NIL.
Disclosure of Interests: Dina Husum: None declared; Caroline Moos: None declared; Karen Schreiber: None declared; Tove Holm-Larsen: Silvi®, CEO of Silvi®; Rasmus Hvingelby: Silvi®, CTO of Silvi®.