Generalisable Long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Reese, JT; Blau, H; Casiraghi, E; et al., eBioMedicine

Published

January 2023

Journal

eBioMedicine

Abstract

Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning.

Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.

Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
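
The two computational steps named in the abstract, building a matrix of pairwise patient similarity and assigning a new patient to an existing cluster by similarity, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes plain Jaccard set overlap as a stand-in for the ontology-based semantic similarity used in the study, takes the cluster labels as given rather than learning them, and uses mean similarity to cluster members as one possible assignment rule. The patient identifiers, phenotype labels, and function names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Set-overlap similarity in [0, 1].

    Assumption: a simple stand-in for semantic similarity; it ignores
    the ontology structure that a semantic measure would exploit.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(patients):
    """Map each ordered pair of patient ids to their similarity."""
    ids = sorted(patients)
    sim = {(p, p): 1.0 for p in ids}
    for p, q in combinations(ids, 2):
        sim[(p, q)] = sim[(q, p)] = jaccard(patients[p], patients[q])
    return sim


def assign_to_cluster(new_terms, patients, clusters):
    """Pick the cluster whose members are, on average, most similar.

    Assumption: mean similarity is a hypothetical assignment rule
    chosen for illustration.
    """
    scores = {
        label: sum(jaccard(new_terms, patients[m]) for m in members) / len(members)
        for label, members in clusters.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: each patient is a set of phenotype labels (all hypothetical).
patients = {
    "p1": {"dyspnea", "cough", "hypoxemia"},
    "p2": {"dyspnea", "cough"},
    "p3": {"palpitations", "chest pain", "tachycardia"},
    "p4": {"palpitations", "tachycardia"},
}
# Cluster labels are supplied by hand here, not produced by clustering.
clusters = {"pulmonary": ["p1", "p2"], "cardiovascular": ["p3", "p4"]}

sim = similarity_matrix(patients)
new_patient = {"cough", "dyspnea", "fatigue"}
label, scores = assign_to_cluster(new_patient, patients, clusters)
print(label, scores)
```

In this toy example the new patient shares terms only with the two pulmonary patients, so the pulmonary cluster receives the highest score. In practice the similarity matrix would be passed to an unsupervised clustering algorithm to obtain the clusters in the first place.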

Authors

Justin T Reese, Hannah Blau, Elena Casiraghi, Timothy Bergquist, Johanna J Loomba, Tiffany J Callahan, Bryan Laraway, Corneliu Antonescu, Ben Coleman, Michael Gargano, Kenneth J Wilkins, Luca Cappelletti, Tommaso Fontana, Nariman Ammar, Blessy Antony, T M Murali, J Harry Caufield, Guy Karlebach, Julie A McMurry, Andrew Williams, Richard Moffitt, Jineta Banerjee, Anthony E Solomonides, Hannah Davis, Kristin Kostka, Giorgio Valentini, David Sahner, Christopher G Chute, Charisse Madlock-Brown, Melissa A Haendel, Peter N Robinson; N3C Consortium; RECOVER Consortium

Keywords

COVID-19; Human Phenotype Ontology; Long COVID; Machine learning; Precision medicine; Semantic similarity

Short Summary

In this study, RECOVER researchers used a computer program to identify possible types of Long COVID based on electronic health records (EHRs). They used the computer program to review EHRs of people diagnosed with Long COVID and group them based on patterns in their symptoms and health conditions.

The computer program found 6 different types of Long COVID, which were related to 1) many symptoms and health conditions with unusual lab test results, 2) the lungs, 3) the brain, 4) the heart, 5) pain and feeling weak and tired (fatigue), and 6) many symptoms and conditions with pain. Each type of Long COVID also differed based on health conditions people had before COVID and how severe their COVID infection was. This research could help identify people with different types of Long COVID to better diagnose and treat them and invite them to join research studies.
