Guest Post: Data Quality as an Initiative
This article was authored by David Guimbellot and was originally published on Orasi’s “Eye on Quality” blog.
If your business captures or coordinates persistent data as a primary function, then it behooves you to create a data quality initiative that improves your Test Data Management (TDM). IT organizations commonly have opportunities for improvement in the way they manage, coordinate and execute their processes around test data:
- Real production data is not securely stored or obfuscated during the testing process
- Coverage gaps in the data used for testing are neither measured nor methodically improved
- The time spent creating or finding data for validation and test execution is not a focus of process optimization
- Data validation is not a central tenet of the solution, on par with the mechanisms for code, usability, security or other quality metrics
Because of these gaps in the solution definition, teams typically fall into sub-optimal patterns and practices. What follows is a high-level set of recommendations, based on our practices for improving TDM within a Data Quality Initiative.
Clear Test Data Management Vision
Management of TDM hardware, software and requirements should be centralized, which reduces software and hardware overhead. Common faults in this area are overuse of storage and server capacity, as well as poor license management for Oracle, SQL Server and the like. When the core team works from clear goals and responsibilities, it will improve the uptime and reliability of the TDM function. The overall QA pipeline is a significant opportunity for optimization in the SDLC timeline.
Test Data Security
A centralized process provides a single point of control and thus a reduced risk profile for the security of production data used in any testing process. The risks of loosely controlled PII or PCI data are not solved by complex contracts and signatories. Instead, the data can be secured by replacing sensitive values with masked values, generated by algorithms that preserve continuity across entity relationships. This masking is done before the data is moved onto the test bed. The net goal is to present a smaller attack surface to potential threats.
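A minimal sketch of relationship-preserving masking, assuming keyed hashing (HMAC-SHA256 from Python's standard library) as the masking algorithm; the post does not name a specific technique. Because the same input always produces the same pseudonym, a masked customer ID in one table still joins to the same masked ID in another. `MASKING_KEY`, `mask_value` and `mask_rows` are hypothetical names, and the sample rows are made up.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be managed outside the test
# bed so that masked values cannot be reproduced by someone with test access.
MASKING_KEY = b"replace-with-a-managed-secret"


def mask_value(value: str, prefix: str, key: bytes = MASKING_KEY) -> str:
    """Deterministically replace a sensitive value with a pseudonym.

    Same input and key -> same output, so foreign keys keep matching
    across tables after masking.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_rows(rows, sensitive_columns):
    """Return copies of rows with the listed columns masked.

    sensitive_columns maps a column name to the prefix used for its pseudonyms.
    """
    masked = []
    for row in rows:
        new_row = dict(row)
        for column, prefix in sensitive_columns.items():
            if new_row.get(column) is not None:
                new_row[column] = mask_value(str(new_row[column]), prefix)
        masked.append(new_row)
    return masked


# Hypothetical production extracts.
customers = [{"customer_id": "C1001", "name": "Ada Lovelace"}]
orders = [{"order_id": 1, "customer_id": "C1001", "amount": 42}]

masked_customers = mask_rows(customers, {"customer_id": "CUST", "name": "NAME"})
masked_orders = mask_rows(orders, {"customer_id": "CUST"})

# The relationship survives masking: the order still points at its customer.
print(masked_orders[0]["customer_id"] == masked_customers[0]["customer_id"])
```

Truncating the digest to 12 hex characters keeps values short but makes collisions possible on very large tables; format-preserving encryption or a managed lookup table are common alternatives when masked values must keep the original format.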
Data Coverage Analysis
Information Technology (IT) holds and manages the resources of a knowledge-based organization. Its most important digital assets are persisted in transactional databases.
As an aside, I have observed a shift in which semi-precious artifacts are stored in transitional storage, such as NoSQL databases and text-based log formats. This separation is good news for both performance and cost.
The accuracy of the code that accesses the RDBMS, and of the constraints and triggers within the RDBMS itself, is being stressed more than ever. For one thing, IT architects are connecting multiple systems of record to create additional value. The ease of integrating through web services, however, invites an anti-pattern: faults caused by inconsistent data across systems. As each new system connects, quality suffers unless a reconciliation process checks the data between them.
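One way to make that reconciliation concrete is a key-by-key comparison between two systems of record. This is a minimal sketch that assumes both systems can be exported as dictionaries keyed by a shared business key; `reconcile` and `ReconciliationReport` are hypothetical names, and `crm`/`billing` are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass
class ReconciliationReport:
    only_in_a: list = field(default_factory=list)   # keys missing from system B
    only_in_b: list = field(default_factory=list)   # keys missing from system A
    mismatched: dict = field(default_factory=dict)  # key -> {column: (a_value, b_value)}

    @property
    def is_consistent(self) -> bool:
        return not (self.only_in_a or self.only_in_b or self.mismatched)


def reconcile(system_a: dict, system_b: dict, columns: list) -> ReconciliationReport:
    """Compare two systems of record that share a business key."""
    report = ReconciliationReport()
    report.only_in_a = sorted(system_a.keys() - system_b.keys())
    report.only_in_b = sorted(system_b.keys() - system_a.keys())
    for key in sorted(system_a.keys() & system_b.keys()):
        a_row, b_row = system_a[key], system_b[key]
        # Record every compared column whose values disagree.
        diffs = {
            col: (a_row.get(col), b_row.get(col))
            for col in columns
            if a_row.get(col) != b_row.get(col)
        }
        if diffs:
            report.mismatched[key] = diffs
    return report


# Hypothetical extracts from two connected systems.
crm = {
    "C1": {"email": "a@example.com", "status": "active"},
    "C2": {"email": "b@example.com", "status": "active"},
}
billing = {
    "C1": {"email": "a@example.com", "status": "closed"},
    "C3": {"email": "c@example.com", "status": "active"},
}

report = reconcile(crm, billing, ["email", "status"])
print(report.only_in_a, report.only_in_b, report.mismatched)
```

Run against every pair of connected systems, a non-empty report is an early signal of the inconsistency faults described above, before they reach downstream consumers.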
Downstream Business Intelligence (BI) solutions reuse the data to build data warehouses and other data mining vectors. These solutions carry higher development costs and risk unless their data is robust and trustworthy. BI solutions must factor the risk of poor data quality into their results, but the math they use to estimate that quality breaks down quickly without a clear understanding of the existing error rates in the static tables.
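Measuring those existing error rates can start with simple column profiling. The sketch below assumes each column has a validation rule expressed as a predicate; the rules, placeholder names and sample rows are hypothetical, and real profiling would run against the static tables themselves.

```python
from typing import Callable, Dict, List

# Hypothetical placeholder values that indicate a missing name.
PLACEHOLDER_NAMES = {"UNKNOWN", "TEST", "N/A"}


def valid_name(value) -> bool:
    """A name must be a non-blank string that is not a placeholder."""
    return (
        isinstance(value, str)
        and value.strip() != ""
        and value.strip().upper() not in PLACEHOLDER_NAMES
    )


def valid_email(value) -> bool:
    """Deliberately loose check: the value must at least contain '@'."""
    return isinstance(value, str) and "@" in value


def column_error_rates(
    rows: List[dict], rules: Dict[str, Callable[[object], bool]]
) -> Dict[str, float]:
    """Fraction of rows whose value fails each column's validation rule."""
    if not rows:
        return {column: 0.0 for column in rules}
    failures = {column: 0 for column in rules}
    for row in rows:
        for column, is_valid in rules.items():
            if not is_valid(row.get(column)):
                failures[column] += 1
    return {column: count / len(rows) for column, count in failures.items()}


rules = {"name": valid_name, "email": valid_email}

# Hypothetical sample of a customer table.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "", "email": "grace@example.com"},
    {"name": "UNKNOWN", "email": "no-email"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

rates = column_error_rates(customers, rules)
print(rates)
```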
For example, a 5% error rate in customer names can quickly explode into a 50% error rate for a marketing campaign. I worked on one CRM solution that was deemed too error-prone to be of any use at all. The organization retired the $25MM system because of its inaccuracy and replaced it with an expensive external data hygiene process.
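The post does not say how the 5% grew to 50%. One common mechanism, offered here purely as an assumption, is compounding: if a campaign record depends on several fields or joins whose errors are independent, the chance that the whole record is wrong is 1 - (1 - p)^k. The function names below are hypothetical.

```python
def compounded_error(per_field_error: float, fields: int) -> float:
    """Chance that a record is wrong when it depends on `fields` independent
    fields, each wrong with probability `per_field_error`."""
    return 1 - (1 - per_field_error) ** fields


def fields_to_reach(per_field_error: float, target_error: float) -> int:
    """Smallest number of independent fields at which the record-level
    error rate reaches `target_error`."""
    if not 0 < per_field_error < 1:
        raise ValueError("per_field_error must be strictly between 0 and 1")
    if not 0 <= target_error < 1:
        raise ValueError("target_error must be in [0, 1)")
    fields = 0
    while compounded_error(per_field_error, fields) < target_error:
        fields += 1
    return fields


for n in (1, 5, 10, 14):
    print(n, round(compounded_error(0.05, n), 3))
print("fields needed to reach 50%:", fields_to_reach(0.05, 0.5))
```

Under these assumptions a 5% per-field error rate crosses 50% at 14 independent fields (1 - 0.95^14 is about 0.512). Real errors are rarely independent, so treat this as an illustration of how quickly small error rates compound, not as a model of the author's case.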
Test Data Work in Process
Testing is a process in which only a subset of the tasks adds value to the released software: for example, validation points confirmed, test cases executed, defects found and similar tasks. Tests run more often in Continuous Delivery shops. If you have read The Phoenix Project or The Principles of Product Development Flow, you are familiar with work-in-process (WIP) analysis as an optimization concept for software. A process analyst examines the team's work activities, quantifies and qualifies them, and then prioritizes areas for improvement based on how much each decreases throughput or adds latency.
In our experience, many QA organizations perform test data tasks that are not valuable as core testing deliverables. Many of these tasks take so long that they become the primary candidates for work-in-process optimization across the entire software delivery lifecycle.
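A rough first pass at this kind of analysis is to log the hours each activity takes, flag which ones directly produce a testing deliverable, and rank the rest. This is a minimal sketch: the `Activity` type, the function names and the hours are hypothetical, and the ratio is a simplified, effort-based stand-in for flow efficiency that ignores queue and wait time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    name: str
    hours: float
    adds_value: bool  # True if it directly produces a testing deliverable


def value_added_ratio(activities: List[Activity]) -> float:
    """Share of logged hours spent on value-adding work."""
    total = sum(a.hours for a in activities)
    value = sum(a.hours for a in activities if a.adds_value)
    return value / total if total else 0.0


def optimization_candidates(activities: List[Activity]) -> List[Activity]:
    """Non-value-adding activities, largest first."""
    waste = [a for a in activities if not a.adds_value]
    return sorted(waste, key=lambda a: a.hours, reverse=True)


# Hypothetical hours for one test cycle.
cycle = [
    Activity("Find or create test data", 16, False),
    Activity("Refresh and mask test database", 8, False),
    Activity("Execute test cases", 10, True),
    Activity("Confirm validation points", 4, True),
    Activity("Log defects", 2, True),
]

print(f"value-added ratio: {value_added_ratio(cycle):.0%}")
for activity in optimization_candidates(cycle):
    print(f"{activity.hours:>5} h  {activity.name}")
```

With numbers like these, the test data tasks surface at the top of the list, which is the pattern described above.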
Data Validation – Prioritization & Scoping
For better or worse, quality assurance has primarily focused on the code. As a best practice, you should perform root cause analysis on incidents in the solution. When we did this analysis on a major production release, the QA team asked, “How could we improve the testing to discover this?” We grouped the incidents by theme and scope. The themes that we enumerated included:
The scope categories were:
- Strategic change
- Improved test plan/focus
- Missing test case
When the first few data incidents were tallied, they looked like missing test cases. However, as the count grew, the team realized that test data preparation was essentially a manual process that could be done by any of 100 different people. We knew we had a strategy gap.
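The tallying step can be sketched with a simple counter over root-cause records. The scope labels come from the post; the incident IDs, themes and the threshold used to flag a possible strategy gap are hypothetical, chosen only to show how a cluster of individual "missing test case" findings stands out once counted together.

```python
from collections import Counter

# Hypothetical root-cause records: (incident_id, theme, scope).
# Scope labels are from the post; themes and IDs are illustrative.
incidents = [
    ("INC-101", "test data", "Missing test case"),
    ("INC-102", "test data", "Missing test case"),
    ("INC-103", "report layout", "Improved test plan/focus"),
    ("INC-104", "test data", "Missing test case"),
    ("INC-105", "test data", "Missing test case"),
]

by_theme_and_scope = Counter((theme, scope) for _, theme, scope in incidents)


def strategy_gap_candidates(records, threshold):
    """Themes with at least `threshold` incidents (a hypothetical cut-off),
    where repeated missing test cases may point to a strategy gap."""
    counts = Counter(theme for _, theme, _ in records)
    return sorted(theme for theme, n in counts.items() if n >= threshold)


for (theme, scope), n in by_theme_and_scope.most_common():
    print(f"{n}  {theme:<14} {scope}")
print("review strategy for:", strategy_gap_candidates(incidents, threshold=3))
```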
Data Validation – Reconciliation
Another major source of data quality risk was data reconciliation. In this example, a large portion of the reports within the solution were financial records governed by specific accounting rules that had to be accurate for Sarbanes-Oxley compliance. This gave the QA team an opportunity to work closely with the financial management organization on a new validation process. They provided a list of input rules, as well as the downstream reports that should ultimately calculate the correct results; these were the checks they had been doing manually. As part of our financial sign-off, we automated this validation within our test automation and TDM process, adding a stronger layer to the Data Quality Initiative (DQI).
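Automating this kind of check generally means recomputing the expected figures from the inputs and comparing them with what the downstream report shows. A minimal sketch, assuming a single hypothetical input rule (an account's balance is its debits minus its credits) and dictionary-shaped inputs; the actual SOX rules in the post are not specified, and `expected_balances` and `reconcile_report` are illustrative names.

```python
from collections import defaultdict
from decimal import Decimal


def expected_balances(ledger):
    """Hypothetical input rule: balance = debits - credits per account.
    Decimal avoids binary floating-point rounding on currency amounts."""
    balances = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for entry in ledger:
        amount = Decimal(entry["amount"])
        if entry["side"] == "debit":
            balances[entry["account"]] += amount
        elif entry["side"] == "credit":
            balances[entry["account"]] -= amount
        else:
            raise ValueError(f"unknown side: {entry['side']!r}")
    return dict(balances)


def reconcile_report(ledger, reported):
    """Return {account: (expected, reported)} for every account where the
    report disagrees with the balance recomputed from the inputs."""
    expected = expected_balances(ledger)
    mismatches = {}
    for account in sorted(expected.keys() | reported.keys()):
        want = expected.get(account, Decimal("0"))
        got = Decimal(reported.get(account, "0"))
        if want != got:
            mismatches[account] = (want, got)
    return mismatches


# Hypothetical inputs and a report containing one incorrect line.
ledger = [
    {"account": "1000-cash", "side": "debit", "amount": "500.00"},
    {"account": "1000-cash", "side": "credit", "amount": "120.50"},
    {"account": "4000-revenue", "side": "credit", "amount": "379.50"},
]
reported_balances = {"1000-cash": "379.50", "4000-revenue": "-379.00"}

print(reconcile_report(ledger, reported_balances))
```

Wired into a test suite, an empty result becomes a pass condition for sign-off, replacing the manual comparison.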
Since then we have worked with other business areas and customers to find these same opportunities!