RECOVER Publication: Scientists Use Electronic Health Records and Machine Learning to Better Define Long COVID
In a study published in The Lancet Digital Health, Pfaff and colleagues used electronic health record (EHR) data to find more than 100,000 likely cases of Long COVID in an EHR database of more than 13 million people.
The authors examined demographics, use of health care services, medications, and diagnoses for nearly 98,000 COVID-19 patients in the National COVID Cohort Collaborative (N3C) database, a national, centralized public database led by NIH’s National Center for Advancing Translational Sciences (NCATS). They combined those data with information from nearly 600 patients seen at Long COVID clinics to build machine learning (ML) models that could identify potential Long COVID patients.
The ML models proved accurate and identified about 100,000 people in the database whose profiles matched those of people with Long COVID. The findings will help researchers understand the characteristics and risk factors linked to a Long COVID diagnosis and will also help identify potential Long COVID patients for clinical trials. As more data sources are identified, the models can be improved and adapted to the needs of each study.
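
For readers who want a concrete picture of the general approach, the sketch below trains a classifier on patients labeled by whether they were seen at a Long COVID clinic, then uses it to flag similar profiles in a wider cohort. It is an illustration only: this summary does not describe the study's algorithm or features, so the logistic regression model, the three binary features, the 0.5 threshold, and every data value here are assumptions made up for the example.

```python
import math

# Hypothetical binary features per patient (illustrative, not the study's):
#   [many_post_acute_visits, dyspnea_diagnosis, new_respiratory_medication]
# Label 1 = seen at a Long COVID clinic, 0 = not. All values are made up.
TRAIN_ROWS = [
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],  # label 0
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],  # label 1
]
TRAIN_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, row):
    """Return the model's estimated probability that a row has label 1."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


def train_logistic(rows, labels, epochs=5000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            err = score((weights, bias), row) - label
            for j, value in enumerate(row):
                grad_w[j] += err * value
            grad_b += err
        # Step against the mean gradient of the log loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def flag_candidates(model, cohort, threshold=0.5):
    """Return indices of cohort rows scoring at or above the threshold."""
    return [i for i, row in enumerate(cohort) if score(model, row) >= threshold]


model = train_logistic(TRAIN_ROWS, TRAIN_LABELS)

# A made-up cohort to screen with the trained model.
cohort = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
flagged = flag_candidates(model, cohort)
print(flagged)
```

In a real setting the threshold would be chosen by weighing missed cases against false alarms, and the model would be checked on patients held out from training before being applied to a full database.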