A natural language processing pipeline for identifying pediatric long COVID symptoms and functional impacts in freeform clinical notes: A RECOVER study
Bunnell, HT; Reedy, C; Lorman, V; et al., JAMIA Open
Published
October 2025
Journal
JAMIA Open
Abstract
Objective: To develop a natural language processing (NLP) pipeline for unstructured electronic health record (EHR) data to identify symptoms and functional impacts associated with Long COVID in children.
Materials and methods: We analyzed 48,287 outpatient progress notes from 10,618 pediatric patients from 12 institutions. We evaluated notes obtained 28 to 179 days after a COVID-19 diagnosis or positive test. Two samples were examined: patients with evidence of Long COVID and patients with acute COVID but no evidence of Long COVID based on diagnostic codes. The pipeline identified clinical concepts associated with 21 symptoms and 4 functional impact categories. Subject matter experts (SMEs) screened a sample of 4,586 terms from the NLP output to assess pipeline accuracy. Prevalence and concordance of each of the 25 concepts were compared between the 2 patient samples.
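The note-selection window described above (28 to 179 days after a COVID-19 diagnosis or positive test) can be sketched as a simple date filter. This is an illustrative sketch only, not the study's code: the function names, the `note_date` field, and the choice to treat both bounds as inclusive are assumptions, since the abstract does not specify them.

```python
from datetime import date

# Window bounds taken from the abstract; inclusivity of both ends is an assumption.
WINDOW_START_DAYS = 28
WINDOW_END_DAYS = 179


def in_followup_window(index_date: date, note_date: date) -> bool:
    """Return True if the note falls 28 to 179 days after the index date.

    index_date is the COVID-19 diagnosis or positive-test date (hypothetical name).
    """
    days_since_index = (note_date - index_date).days
    return WINDOW_START_DAYS <= days_since_index <= WINDOW_END_DAYS


def select_notes(index_date: date, notes: list[dict]) -> list[dict]:
    """Keep only the notes whose 'note_date' lies inside the follow-up window."""
    return [n for n in notes if in_followup_window(index_date, n["note_date"])]
```

For example, with an index date of 1 January 2022, a note dated 29 January 2022 (day 28) would be kept under these assumptions, while a note dated 28 January 2022 (day 27) would not.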
Results: A binary assertion measure comparing SME and NLP assertions showed moderate accuracy (N = 4,133; F1 = .80) and improved substantially when only high-confidence SME assertions were considered (N = 2,043; F1 = .90). Overall, the 25 Long COVID concept categories were markedly more prevalent in the presumptive Long COVID cohort, and differences were noted between concepts identified in notes versus structured data.
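For readers unfamiliar with the F1 statistic reported above, the following minimal sketch shows how an F1 score can be computed from paired binary assertions. It assumes the SME assertion is treated as the reference standard and the NLP assertion as the prediction; the abstract does not spell out how the binary assertion measure was constructed, so the function name and input format here are hypothetical.

```python
def binary_f1(sme_assertions: list[bool], nlp_assertions: list[bool]) -> float:
    """F1 of NLP assertions against SME assertions (SME assumed as reference)."""
    if len(sme_assertions) != len(nlp_assertions):
        raise ValueError("Assertion lists must have the same length")

    pairs = list(zip(sme_assertions, nlp_assertions))
    true_pos = sum(1 for sme, nlp in pairs if sme and nlp)
    false_pos = sum(1 for sme, nlp in pairs if not sme and nlp)
    false_neg = sum(1 for sme, nlp in pairs if sme and not nlp)

    # Guard against empty denominators when a class is never asserted.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Restricting the input to terms with high-confidence SME assertions, as in the second reported figure, would amount to filtering both lists before calling the function.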
Discussion: This preliminary analysis illustrates the additional insight into a syndrome such as Long COVID that can be gained by incorporating notes data to characterize symptoms and functional impacts.
Conclusion: These data support the importance of incorporating NLP methodology, when possible, into the design of computable phenotypes and the accurate characterization of patients with Long COVID.
Authors
H Timothy Bunnell, Cara Reedy, Vitaly Lorman, Ravi Jhaveri, Andrea Rivera-Sepulveda, Katherine S Salamon, Payal B Patel, Keith E Morse, Mattina A Davenport, Lindsay G Cowell, Levon Utidjian, Dimitri A Christakis, Suchitra Rao, Marion R Sills, Abigail Case, Eneida A Mendonca, Bradley W Taylor, Jacqueline Rutter, Aaron Thomas Martinez, Rebecca Letts, L Charles Bailey, Christopher B Forrest, RECOVER Consortium
Keywords
NLP; PEDSnet; RECOVER; pediatrics