Extracting post-acute sequelae of SARS-CoV-2 infection symptoms from clinical notes via hybrid natural language processing

Bai, Z; Xu, Z; Sun, C; et al., npj Health Systems, August 2025

Short Summary

In this RECOVER study, researchers created a new computer tool to help doctors identify Long COVID more quickly and accurately. Patients’ electronic health records (EHRs) contain a lot of important information, but finding specific symptoms by hand across thousands of records takes a long time. The tool, called a hybrid natural language processing (NLP) pipeline, scans the clinical notes in EHRs to find descriptions of Long COVID symptoms and to determine whether each note says the patient has the symptom or does not have it. Researchers tested the tool on notes from 11 US health systems. Its average F1 score (a standard accuracy measure that ranges from 0 to 1) was 0.82 at the health system where it was developed and 0.76 across the other 10, and it processed each note in about 2.4 seconds on average. With this tool, researchers can review the health records of thousands of patients at once. This study is important because it gives researchers a faster, more accurate way to identify Long COVID symptoms.

This summary was prepared by the RECOVER Initiative.

Publication Details

DOI: 10.1038/s44401-025-00033-4

Abstract

Accurately and efficiently diagnosing Post-Acute Sequelae of COVID-19 (PASC) remains challenging due to its myriad symptoms that evolve over long and variable time intervals. To address this issue, we developed a hybrid natural language processing pipeline that integrates rule-based named entity recognition with BERT-based assertion detection modules for PASC-symptom extraction and assertion detection from clinical notes. We developed a comprehensive PASC lexicon with clinical specialists. From 11 health systems of the RECOVER initiative network across the U.S., we curated 160 intake progress notes for model development and evaluation, and collected 47,654 progress notes for a population-level prevalence study. We achieved an average F1 score of 0.82 in one-site internal validation and 0.76 in 10-site external validation for assertion detection. Our pipeline processed each note in 2.448 ± 0.812 seconds on average. Spearman correlation tests showed ρ > 0.83 for positive mentions and ρ > 0.72 for negative ones, both with P < 0.0001. These results demonstrate the effectiveness and efficiency of our models and their potential for improving PASC diagnosis.
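
The two-stage design described in the abstract (lexicon-driven symptom matching followed by assertion detection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The three-entry `LEXICON`, the negation cue list, and the function names are all hypothetical. The assertion step here is a simple rule-based negation check standing in for the BERT-based module, which is not reproduced.

```python
import re

# Hypothetical mini-lexicon: canonical symptom -> surface forms.
# The study's actual PASC lexicon was built with clinical specialists.
LEXICON = {
    "fatigue": ["fatigue", "tiredness"],
    "dyspnea": ["shortness of breath", "dyspnea"],
    "cough": ["cough"],
}

# Simple negation cues: a rule-based stand-in (assumption) for the
# BERT-based assertion detection module described in the abstract.
NEGATION_CUES = re.compile(
    r"\b(no|denies|denied|without|negative for)\b", re.IGNORECASE
)


def extract_mentions(note):
    """Rule-based NER: return (symptom, start, end) for each lexicon match."""
    mentions = []
    for symptom, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, note, re.IGNORECASE):
                mentions.append((symptom, m.start(), m.end()))
    # Order mentions by their position in the note.
    return sorted(mentions, key=lambda mention: mention[1])


def detect_assertion(note, start):
    """Return 'negative' if a negation cue precedes the mention in its sentence."""
    # Approximate the sentence start as the last period or newline before the mention.
    sentence_start = max(note.rfind(".", 0, start), note.rfind("\n", 0, start)) + 1
    context = note[sentence_start:start]
    return "negative" if NEGATION_CUES.search(context) else "positive"


def run_pipeline(note):
    """Chain both stages: find symptom mentions, then label each one."""
    return [
        (symptom, detect_assertion(note, start))
        for symptom, start, _ in extract_mentions(note)
    ]


print(run_pipeline("Patient reports fatigue and shortness of breath. Denies cough."))
# [('fatigue', 'positive'), ('dyspnea', 'positive'), ('cough', 'negative')]
```

A learned assertion classifier would replace `detect_assertion` in a setup like this. The cue-based check only handles negations that appear before the mention within the same sentence.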

Authors

Zilong Bai, Zihan Xu, Cong Sun, Chengxi Zang, H Timothy Bunnell, Catherine Sinfield, Jacqueline Rutter, Aaron Thomas Martinez, L Charles Bailey, Mark Weiner, Thomas R Campion, Thomas W Carton, Christopher B Forrest, Rainu Kaushal, Fei Wang, Yifan Peng
