Data Quality Framework to measure, monitor and improve data health

for a Pharma Manufacturer

Pharma Manufacturer
8 Weeks
5,000+ Employees


  • The SBU of a top Pharma company had initiated processes to measure and improve data quality, but the processes were manual, reactive and static
  • As a result, significant time was spent on data queries, validation and rework, at the cost of analytics; more effort went into data preparation than into insight generation


  • DataZymes conducted a highly customized Blueprint Data Health Assessment covering the current data, data management processes, quality control steps and potential areas for improvement

  • The assessment included:
         - Source analysis for 6 data sources
         - ETL and data transformation steps
         - Analysis of automated and manual QC processes
         - Downstream impact of data quality
         - Review of data models, business rules, process documents and process maps

  • Historical data queries and issue logs were assessed to quantify the impact of data quality
  • A DQM Framework was designed to enable proactive measurement and improvement of data quality
  • The framework included over 60 automated quality checks at the entity and file levels. The key measures were automated to feed into a Tableau data quality dashboard, allowing tracking of individual data refresh cycles as well as historical data health metrics
  • The DZ Team improved data health by 30% through automated, proactive health checks. Proactive alerts made it easier to handle data issues before publishing to downstream applications
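The automated checks described above can be sketched in code. This is a minimal, hypothetical illustration (not the actual DataZymes implementation): all names, fields and thresholds are assumptions, showing how file-level checks (e.g. row count per refresh) and entity-level checks (e.g. completeness, key uniqueness) might produce results for a dashboard feed.

```python
# Hypothetical sketch of automated data quality checks of the kind the
# framework describes: file-level checks (row count per refresh) and
# entity-level checks (null rate, duplicate keys). All names are illustrative.

from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    level: str      # "file" or "entity"
    passed: bool
    detail: str

def run_checks(rows, key_field, required_fields, min_rows=1):
    """Run a small battery of quality checks over a list of record dicts."""
    results = []

    # File-level: did the refresh deliver enough rows?
    results.append(CheckResult(
        "row_count", "file", len(rows) >= min_rows,
        f"{len(rows)} rows (min {min_rows})"))

    # Entity-level: completeness of required fields
    for field in required_fields:
        nulls = sum(1 for r in rows if not r.get(field))
        results.append(CheckResult(
            f"null_rate:{field}", "entity", nulls == 0,
            f"{nulls} missing values"))

    # Entity-level: uniqueness of the business key
    keys = [r.get(key_field) for r in rows]
    dupes = len(keys) - len(set(keys))
    results.append(CheckResult(
        "unique_key", "entity", dupes == 0, f"{dupes} duplicate keys"))

    return results

# Example refresh with one duplicate key and one missing field
batch = [
    {"rx_id": "A1", "product": "Drug-X", "units": 10},
    {"rx_id": "A2", "product": "",       "units": 5},
    {"rx_id": "A2", "product": "Drug-Y", "units": 7},
]
report = run_checks(batch, key_field="rx_id",
                    required_fields=["product", "units"])
for res in report:
    status = "PASS" if res.passed else "FAIL"
    print(f"[{res.level}] {res.name}: {status} ({res.detail})")
```

In practice, the PASS/FAIL results per refresh cycle would be written to a table that a dashboard (such as Tableau, as in the framework above) reads to plot current and historical data health.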


    Data health improved by a minimum of 30% across data sources through automated, proactive health checks. Proactive alerts allowed IT teams to resolve data health issues before data was published to downstream applications, improving data quality for end users. Teams gained higher trust in the data, reducing time spent on data queries, validation and rework.