
EHR-based case identification of pediatric Long COVID: A report from the RECOVER EHR cohort

Botdorf, M; Dickinson, K; Lorman, V; et al., medRxiv

Caution: Preprints are preliminary reports of work that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.


May 2024




Objective: Long COVID, marked by persistent, recurring, or new symptoms after COVID-19 infection, impacts children's well-being yet lacks a unified clinical definition. This study evaluates the performance of an empirically derived Long COVID case identification algorithm, or computable phenotype, against manual chart review in a pediatric sample. This approach aims to facilitate large-scale research efforts to better understand the condition.

Methods: The algorithm, composed of diagnostic codes empirically associated with Long COVID, was applied to a cohort of pediatric patients with SARS-CoV-2 infection in the RECOVER PCORnet EHR database. The algorithm classified 31,781 patients as having conclusive, probable, or possible Long COVID and 307,686 patients as having no evidence of Long COVID. A chart review was performed on a subset of patients (n=651) to determine the overlap between the two methods. Instances of discordance were reviewed to understand the reasons for the differences.

Results: The sample comprised 651 pediatric patients (339 females, mean age = 10.10 years) across 16 hospital systems. Phenotype and chart review identification of Long COVID overlapped moderately (accuracy = 0.62, PPV = 0.49, NPV = 0.75), with numerous cases of disagreement. No notable differences were found when the analyses were stratified by age at infection or era of infection. Further examination of the discordant cases revealed that the most common cause of disagreement was the clinician reviewers' tendency to attribute Long COVID-like symptoms to prior medical conditions. The performance of the phenotype improved when prior medical conditions were considered (accuracy = 0.71, PPV = 0.65, NPV = 0.74).

Conclusions: Although there was moderate overlap between the two methods, the discrepancies between them are likely attributable to the lack of consensus on a clinical definition of Long COVID. It is essential to consider the strengths and limitations of each method when developing Long COVID classification algorithms.
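For readers less familiar with the agreement metrics reported above, the following is a minimal sketch of how accuracy, PPV, and NPV are conventionally computed when chart review is treated as the reference standard. The abstract does not report the underlying cell counts, so the counts below are purely hypothetical, and the names `ConfusionCounts` and `agreement_metrics` are illustrative rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2x2 table: phenotype result vs. chart review (reference)."""
    tp: int  # phenotype positive, chart review positive
    fp: int  # phenotype positive, chart review negative
    fn: int  # phenotype negative, chart review positive
    tn: int  # phenotype negative, chart review negative


def agreement_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, PPV, and NPV using their standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    flagged = c.tp + c.fp        # all patients the phenotype called positive
    not_flagged = c.tn + c.fn    # all patients the phenotype called negative
    if total == 0 or flagged == 0 or not_flagged == 0:
        raise ValueError("each metric needs a non-empty denominator")
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / flagged,
        "npv": c.tn / not_flagged,
    }


# Hypothetical counts for illustration only; NOT the study's data.
example = ConfusionCounts(tp=30, fp=20, fn=10, tn=40)
metrics = agreement_metrics(example)
print(metrics)
```

Under these definitions, a PPV of 0.49 means that roughly half of the patients flagged by the phenotype were also judged to have Long COVID on chart review, while an NPV of 0.75 means that about three quarters of the patients not flagged were also judged negative by reviewers.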


Morgan Botdorf, Kimberley Dickinson, Vitaly Lorman, Hanieh Razzaghi, Nicole Marchesani, Suchitra Rao, Colin Rogerson, Miranda Higginbotham, Asuncion Mejias, Daria Salyakina, Deepika Thacker, Dima Dandachi, Dimitri A Christakis, Emily Taylor, Hayden Schwenk, Hiroki Morizono, Jonathan Cogen, Nate M Pajor, Ravi Jhaveri, Christopher B Forrest, L Charles Bailey


Chart review; Chronic COVID-19 Syndrome; Electronic health records; Electronic phenotyping; Late sequelae of COVID-19; Long COVID; Long haul COVID; Long-term COVID-19; PEDSnet; Post COVID syndrome; Post-acute COVID-19; Post-acute sequelae SARS-CoV-2 infection; Rule-based phenotyping


