
Clinical encounter heterogeneity and methods for resolving in networked EHR data: A study from N3C and RECOVER programs

Leese, P; Anand, A; Girvin, A; et al., Journal of the American Medical Informatics Association


Published

May 2023

Journal

Journal of the American Medical Informatics Association

Abstract

Objective: Clinical encounter data are heterogeneous and vary greatly from institution to institution. These problems of variance affect interpretability and usability of clinical encounter data for analysis. These problems are magnified when multisite electronic health record (EHR) data are networked together. This article presents a novel, generalizable method for resolving encounter heterogeneity for analysis by combining related atomic encounters into composite "macrovisits."

Materials and methods: Encounters were composed of data from 75 partner sites harmonized to a common data model as part of the NIH Researching COVID to Enhance Recovery Initiative, a project of the National COVID Cohort Collaborative. Summary statistics were computed for overall and site-level data to assess issues and identify modifications. Two algorithms were developed to refine atomic encounters into cleaner, analyzable longitudinal clinical visits.

Results: Atomic inpatient encounter data were found to be widely disparate between sites in terms of length-of-stay (LOS) and numbers of OMOP CDM measurements per encounter. After aggregating encounters to macrovisits, LOS and measurement variance decreased. A subsequent algorithm to identify hospitalized macrovisits further reduced data variability.

Discussion: Encounters are a complex and heterogeneous component of EHR data, and native data issues are not addressed by existing methods. These types of complex and poorly studied issues contribute to the difficulty of deriving value from EHR data, and these types of foundational, large-scale explorations and developments are necessary to realize the full potential of modern real-world data.

Conclusion: This article presents method developments to manipulate and resolve EHR encounter data issues in a generalizable way as a foundation for future research and analysis.

Authors

Peter Leese, Adit Anand, Andrew Girvin, Amin Manna, Saaya Patel, Yun Jae Yoo, Rachel Wong, Melissa Haendel, Christopher G Chute, Tellen Bennett, Janos Hajagos, Emily Pfaff, Richard Moffitt

Keywords

database; electronic health records; informatics

Short Summary

Using computer processing to make data from medical records easier to use in research 

An electronic health record (EHR) is a digital medical chart that holds health data such as doctor visits, lab results, and other information. These data are useful for understanding trends in health, including how Long COVID affects people. However, healthcare settings record patient visits in different ways. Because of this, EHR data from different settings can be difficult to compare and use in research.

Researchers from the National COVID Cohort Collaborative (N3C) looked at over 15 million EHRs from 75 hospitals and clinics. Their goal was to make data from different healthcare settings more compatible. To do this, they first had to understand and describe how definitions of patient visits differ between healthcare settings.

The researchers focused on identifying patterns in EHR data. They hoped these patterns would help them gain a better understanding of a patient’s complete care experience, including:  

  • How long the patient received care 
  • The number and types of treatments and medical procedures the patient received 
  • The order in which the patient received these treatments and procedures 

To detect these patterns, researchers created two sets of rules for computer processing of EHR data. These rules are called algorithms. The first algorithm grouped related records of patient visits into larger combined visits. This grouping makes it easier to see how the information in an EHR is related, to analyze individual EHRs, and to compare EHRs from different sources.
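For readers who want a concrete picture of grouping visit records, here is a minimal sketch in Python. It is not the researchers' algorithm. It assumes, purely for illustration, that visit records for the same patient belong together when their dates overlap. The function name `build_macrovisits` and the example patients and dates are made up.

```python
from collections import defaultdict
from datetime import date


def build_macrovisits(encounters):
    """Merge each patient's overlapping visit records into combined visits.

    `encounters` is an iterable of (patient_id, start_date, end_date) tuples.
    Returns a list of (patient_id, start_date, end_date, n_records) tuples.
    Assumption for illustration only: records are "related" when their
    date ranges overlap or touch. The published method may use other rules.
    """
    by_patient = defaultdict(list)
    for patient_id, start, end in encounters:
        by_patient[patient_id].append((start, end))

    macrovisits = []
    for patient_id, spans in by_patient.items():
        spans.sort()  # earliest start date first
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start <= cur_end:
                # This record begins before the current group ends: absorb it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # A gap in time: close the current group and start a new one.
                macrovisits.append((patient_id, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        macrovisits.append((patient_id, cur_start, cur_end, count))
    return macrovisits


# Made-up example records, not real patient data.
example = [
    ("p1", date(2021, 3, 1), date(2021, 3, 4)),
    ("p1", date(2021, 3, 2), date(2021, 3, 2)),
    ("p1", date(2021, 3, 4), date(2021, 3, 6)),
    ("p1", date(2021, 5, 10), date(2021, 5, 10)),
    ("p2", date(2021, 3, 3), date(2021, 3, 5)),
]
macro = build_macrovisits(example)
for row in macro:
    print(row)
```

In this made-up example, the three March records for patient `p1` become one combined visit, while the separate May record stays on its own.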

The second algorithm allowed researchers to identify when EHRs indicated that a patient had been admitted to the hospital. Better data about hospitalizations will help future researchers study COVID and its long-term effects, including Long COVID.  

By using algorithms, N3C researchers are trying to make large amounts of EHR data more consistent, manageable, and understandable. Algorithms like the two tested in this study can help other researchers improve the quality of EHR data so that it can produce important insights about conditions like Long COVID.
