Data Quality In Electronic Medical Record Research

Precision rehabilitation, like other precision medicine initiatives, aims to provide effective and efficient care for patients through the understanding of patient variability. Large datasets that include detailed data about heterogeneous patient groups, like those generated by the electronic health record (EHR), are critical to precision rehabilitation. Generating large datasets that capture this type of data outside of clinical care is costly, creating a unique role for EHR data in precision rehabilitation. Despite the distinct advantages of using EHR data, there are also many barriers. Data quality is particularly challenging because the primary purpose of the EHR is not research; yet the quality of these data is essential to conducting rigorous research.

Realizing the potential power of the EHR for precision rehabilitation, the Department of Physical Medicine and Rehabilitation at Johns Hopkins Medical System created a Rehabilitation Data Repository, which contains EHR data from over 17,000 individuals with stroke. Researchers access subsets of these data to answer targeted research questions that relate to precision rehabilitation after stroke. Assessing the quality of the data is an essential step, both during the creation of the Rehabilitation Data Repository and for each study that uses these data.

In our work, we conduct data quality assessments to address three key areas of data quality: conformance, completeness, and plausibility. Each of these areas can be assessed in terms of verification (i.e., compared to an internal standard) and validation (i.e., compared to an external standard). Conformance refers to the data’s format. For example, in our data, sex is always documented as “Male”, “Female”, or “Unknown.” These are the categories expected within our healthcare system, indicating excellent conformance. Completeness examines the frequency of the data. In our data, we found fewer outpatient physical therapy visits than expected, suggesting an issue with completeness. Both of these are examples of verification. Plausibility assesses the believability of specific variables. For example, when examining gait speed, we found measurements as high as 40 m/s, which is well above the average healthy gait speed of about 1.2 m/s. This indicates a problem with plausibility related to validation.

The findings from these data quality assessments have resulted in modifications directly to our EHR system to improve the quality of our data. For example, based on examination of our gait speed data, we determined that the plausibility of our data was poor because some clinicians were documenting the time (i.e., seconds) required to walk a given distance rather than speed (i.e., meters per second). To correct this, the gait speed field in our EHR now explicitly states the expected units, removing the ambiguity about what should be entered.
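The three areas of data quality described above can be illustrated with a minimal Python sketch. This is not the code used for the Rehabilitation Data Repository; the function names, the sample values, and the 0 to 3 m/s plausibility range for gait speed are assumptions made for illustration only. Completeness is simplified here to the share of records with a documented value, which is only one of several ways it could be measured.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Categories expected for sex, as described in the text.
ALLOWED_SEX: Set[str] = {"Male", "Female", "Unknown"}

# Hypothetical plausibility bounds for gait speed in m/s (an assumption,
# chosen only so that a value such as 40 m/s is flagged).
GAIT_SPEED_RANGE: Tuple[float, float] = (0.0, 3.0)


def conformance_rate(values: Iterable[Optional[str]], allowed: Set[str]) -> Optional[float]:
    """Fraction of documented (non-missing) values that use an allowed category."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(v in allowed for v in present) / len(present)


def completeness_rate(values: Iterable[Optional[object]]) -> Optional[float]:
    """Fraction of records that have any documented value."""
    values = list(values)
    if not values:
        return None
    return sum(v is not None for v in values) / len(values)


def implausible_values(values: Iterable[Optional[float]], low: float, high: float) -> List[float]:
    """Documented values that fall outside the plausible range [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]


# Made-up example records; None marks a missing entry.
sex = ["Male", "Female", "Unknown", "F", None]
gait_speed = [1.1, 0.8, 40.0, None, 1.3]

sex_conformance = conformance_rate(sex, ALLOWED_SEX)          # 3 of 4 documented values conform
sex_completeness = completeness_rate(sex)                     # 4 of 5 records are documented
flagged_speeds = implausible_values(gait_speed, *GAIT_SPEED_RANGE)  # the 40.0 entry is flagged
```

A flagged value such as 40.0 does not say why the entry is implausible; in the gait speed example from the text, reviewing the flagged records is what revealed that some entries were times in seconds rather than speeds.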

Assessing data quality can be a time-consuming process. Thus, to streamline the process and to maximize the reproducibility of these assessments, we use version control and share code among our research team. This is important given the requirements of the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement to provide details related to data quality and data processing. Future steps to improve the reproducibility of data quality assessments include developing code to assess data quality that is publicly available to the rehabilitation research community.


About this Applied LeaRRning Case

Margaret “Maggie” French, DPT, PhD, NCS

Dr. French’s primary research goal is to understand variability in post-stroke recovery to improve the efficiency and efficacy of the healthcare system. This necessitates using large, heterogeneous real-world data sources, such as the electronic health record. In this Applied LeaRRning Case, Dr. French discusses approaches for assessing the quality of data from the electronic health record and techniques to maximize the repeatability and reproducibility of these quality checks.

“The electronic health record can help us understand variability in patient progress, but the quality of the data within the electronic health record must be thoroughly and systematically assessed prior to their use.”

Margaret “Maggie” French, DPT, PhD, NCS