Solving Poor Data Quality in Healthcare

Mike Wasserman


With the slowdown in EHR deployments aimed at meeting meaningful use compliance and single patient record objectives, many health organizations and EHR vendors are now driving towards improved analytics.

The challenge facing the big EHR vendors and most healthcare providers is the data quality issue associated with many different applications holding various bits and pieces of patient information.

Historically, health organizations would invest in large Business Intelligence initiatives and face frustration when the data didn’t easily conform to the dashboards and reports they saw during the software demos.

Providers can also rely on EHR vendors, who understand the data in the EHR but struggle to integrate external sources. Pre-built data models from specialty medical data warehouse vendors are emerging as well; they come at a high cost and claim to remove the complex ETL, data quality, and data warehouse work.

At the end of the day, the data quality work and data warehouse design aren’t going to disappear. However, there are steps you can take to test and solve these issues before committing significant dollars and resources.

Understanding and Planning for Data Quality

A solid Data Quality/Governance plan should be followed to implement and preserve Data Quality.

Much as anti-virus solutions combine an initial scan, real-time protection, and regular updates, these three Pillars of Data Quality keep data clean over time:

  • Initial Cleansing – Based on the Data Quality Assessment, data should be cleansed through a combination of automated and manual processes to ensure accuracy, completeness, relevance, consistency, and controlled accessibility. Address cleansing against postal directory data, together with highly refined data standardization dictionaries, should be used to automate the cleansing and duplicate detection as much as possible without harming the data. In some cases, high-impact data that requires deeper investigation must ultimately be reviewed by Business Analysts or Data Stewards.
  • Data Entry Cleansing – As new data is added, pre-cleansing must take place to ensure that accurate, complete data is provided. Typically, it’s best to build and support real-time interfaces with online entry systems so that data correction requirements can be applied by the user who is actually entering the data. In many cases, missing data is available to the user at entry time but is lost once the incomplete record has been loaded. This holds true not only for cleansing and standardizing data but also where new data potentially matches existing data (e.g., a pick list of potential “near” matches should be presented to the authorized user entering or reviewing the data to help catch duplicate entries or fraudulent accounts before they degrade master data or cause a loss in revenue).
  • Data Cleansing Maintenance – Over time, cleansed data may become “stale” or outdated, making it more difficult to find potential duplicates between newly entered data and existing “stale” data. Reasons for these evolving inconsistencies include governance changes and, more notoriously, postal address updates. Up to 14 percent of US addresses alone may have a postal code change within a given year.
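
To make the standardization and "near" match ideas above more concrete, here is a minimal sketch in Python. It is not taken from any particular data quality product: the tiny ABBREVIATIONS dictionary, the record fields (name, address), the function names, and the 0.85 threshold are all assumptions chosen for illustration, and a production tool would rely on postal directory data and much richer dictionaries.

```python
from difflib import SequenceMatcher

# Hypothetical standardization dictionary; a real one would be far larger
# and would be maintained alongside postal directory data.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def standardize(text):
    """Lowercase, replace punctuation with spaces, and map known words to one form."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


def near_matches(new_record, existing_records, threshold=0.85):
    """Return (score, record) pairs at or above the threshold, best match first."""
    new_key = standardize(f"{new_record['name']} {new_record['address']}")
    scored = []
    for record in existing_records:
        key = standardize(f"{record['name']} {record['address']}")
        # ratio() is a simple character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, new_key, key).ratio()
        if score >= threshold:
            scored.append((score, record))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample records, for illustration only.
existing = [
    {"name": "John Smith", "address": "12 Main Street, Suite 4"},
    {"name": "Mary Jones", "address": "99 Oak Avenue"},
]
incoming = {"name": "Jon Smith", "address": "12 Main St. Ste 4"}

for score, record in near_matches(incoming, existing):
    print(f"{score:.2f}  {record['name']}, {record['address']}")
```

In a real-time entry interface, the list returned by near_matches would feed the pick list shown to the user, who decides whether the new entry is a duplicate. The threshold is a tuning choice: lower values surface more candidates for review, while higher values reduce noise.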

Prototype Data Quality in the Cloud

Historically, it was very difficult to find a software manufacturer willing to engage in a comprehensive data warehouse, data quality, and Business Intelligence prototype with customer data. The level of effort and the return on investment do not usually work out well for anyone in these proofs of concept or prototypes. Furthermore, standing up the necessary solutions in a customer’s data center when their staff and resources are already stretched to the breaking point is not realistic.

Recently our team was engaged with a customer that had a mission to transform and geocode data to support improved case management and visualizations. We knew that solving the problem for the amount of data in the customer environment required significant horsepower and advanced data quality tools.

To solve this challenge, we turned to the cloud, specifically the Amazon Web Services marketplace. The customer provided us with sample data that we were quickly able to ingest into a cloud environment on Amazon Web Services, using in-memory computing and advanced data quality solutions that we consumed as a utility, effectively paying by the hour.

This allowed us to confirm that we could meet the customer’s requirements at a very reasonable price point, using our own resources and compute power plus software platforms that would normally cost hundreds of thousands of dollars to run on premises.