
GDPR and testing: A few questions to ask yourself

Thomas Pryce, 28 February 2020 10:27:21 GMT



I’ve been harping on about GDPR and other recent developments in compliance for years now, and it’s good to see that QA organisations are seriously grappling with compliance as a pressing issue. With each new data breach, and each new study on consumer concern for data privacy, the need to consider data privacy is only reaffirmed. Yet practices that I consider higher risk remain common in testing: the latest World Quality Report finds, for example, that 60% of organisations still use raw production data in test environments.

Below, I’ve gathered together some research and news articles that have come out within the last year or so, each related to GDPR and compliance in some way. The intention is to use fresh data to reiterate a point already well made by others: the practice of using raw production data in less secure test environments should be examined seriously. It should be scrutinised in terms of security, data breach prevention, and compliance, and only then should it be judged to be “okay”.

I’m no legal expert, and the below represents only my personal interpretation of the importance of recent legislation for testing best practices. However, I hope these questions give some pause for thought. Please feel free to leave your comments on the impact of legislation for QA below, or drop me a direct message.

Questions to ask yourself:

Do you have informed and actively given consent, or another legitimate ground for using that data? Can you show that you have permission from the EU ‘data subject’ to use their information in the way it’s being applied in test environments? You might have a lot of test cases and a lot of data; what measures are in place to ensure that consent, or another legitimate ground for data processing, is satisfied in testing?
Are you abiding by the rules around Purpose Limitation and Data Minimisation? Do you know that the data is being used by only as many people as needed, and kept for only as long as needed, to fulfil the service for which that person consented to the use of their data? Can you prove it if audited, or do you have another legitimate purpose for processing the data? Can you be sure that your test teams are not holding on to data indefinitely, perhaps unaware they still have it, even after consent has expired or been withdrawn?
What about Purpose Limitation and the Right to Erasure? How reliably can you remove every instance of that person’s data in test environments if they request its deletion, or if you no longer need it to fulfil the service for which they provided it? Finding every instance of data quickly and reliably can be difficult with large IT estates, especially with a mixed bag of new and legacy components. Storing sensitive data in test environments can make this worse, and tools and techniques will be needed for performing rapid and reliable data profiling and lookups. What if testers and automation engineers are keeping useful data in a handy spreadsheet on their local machine?
What about citizens’ Right to Data Portability and Right to Erasure? Again, can you find every instance of data if someone asks for it to be deleted, or if they ask for a copy of it in a format readable by them? This must occur “without undue delay”. How good is your current infrastructure for finding every instance of data, copying it, and provisioning it in a readable format like an Excel spreadsheet?
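To make the Erasure and Portability questions concrete, here is a minimal sketch in Python of what "find every instance" means in practice. Everything in it is an assumption made for illustration: the in-memory TEST_DATA tables, the column names, and the idea that an email address identifies the data subject. A real estate would involve databases, files and those local spreadsheets, and would need far more robust matching.

```python
import json

# Hypothetical test data sources: table name -> list of row dicts.
TEST_DATA = {
    "customers": [
        {"id": 1, "email": "ana@example.com", "name": "Ana"},
        {"id": 2, "email": "bo@example.com", "name": "Bo"},
    ],
    "orders": [
        {"order_id": 10, "contact_email": "ana@example.com", "total": 42.0},
        {"order_id": 11, "contact_email": "bo@example.com", "total": 7.5},
    ],
}


def _row_mentions(row, needle):
    """True if any field in the row equals the identifier (case-insensitive)."""
    return any(str(value).lower() == needle for value in row.values())


def find_subject(sources, identifier):
    """Return {table: [rows]} for every row holding the identifier in any field."""
    needle = identifier.lower()
    hits = {}
    for table, rows in sources.items():
        matched = [row for row in rows if _row_mentions(row, needle)]
        if matched:
            hits[table] = matched
    return hits


def export_subject(sources, identifier):
    """Portability: serialise everything found about the subject as JSON."""
    return json.dumps(find_subject(sources, identifier), indent=2, sort_keys=True)


def erase_subject(sources, identifier):
    """Erasure: drop matching rows in place and return how many were removed."""
    needle = identifier.lower()
    removed = 0
    for table, rows in sources.items():
        kept = [row for row in rows if not _row_mentions(row, needle)]
        removed += len(rows) - len(kept)
        sources[table] = kept
    return removed
```

Calling erase_subject(TEST_DATA, "ana@example.com") removes one row from each table. The sketch only works because every source is registered in one place; data that lives outside that inventory is exactly what such a lookup would miss.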

The stakes are high:

You might answer ‘yes’ to some or all of the above questions; some of the most advanced tech organisations, for example, can evidently find and provision user data rapidly upon request. However, in my view, these questions deserve careful, honest, and ongoing consideration, because the stakes are high:

Since GDPR came into force in 2018, there have been a whopping 278 data breach notifications per day. In time, we will also learn of the impact of the California Consumer Privacy Act, which came into effect this past New Year’s Day.
The UK’s ICO and other national agencies are showing their willingness to serve unprecedented fines for data breaches. In July 2019, for instance, the ICO announced planned fines of £183 million and £99.2 million.
Consumers and the general public today care about data privacy, and are prepared to act on it. 97% of US adults are “somewhat or very concerned about protecting their personal data.” 32% globally are “privacy actives”, who have already acted by switching companies or providers over data or data-sharing policies.

In my experience, several organisations lack the infrastructure, or the understanding of their complex data, needed to guarantee that they have located every instance of sensitive information in test environments. Extracting and provisioning that data rapidly can likewise be tricky, especially when working with a mixed bag of homegrown techniques. If that sounds familiar, the above questions around Erasure, Portability and Data Minimisation might be particularly pertinent.
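As a rough illustration of what locating sensitive information can involve, the sketch below profiles rows of test data for values that look like PII. The two regular expressions, the column names and the sample rows are all assumptions chosen for brevity; they are nowhere near a complete PII detector, and a real profiling tool would need much broader patterns and context checks.

```python
import re

# Deliberately simple, illustrative patterns; not a complete PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_phone": re.compile(r"(?:\+44\s?|\b0)7\d{3}\s?\d{6}\b"),
}


def profile_rows(rows):
    """Count, per (column, PII type), how many values match a pattern."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    key = (column, label)
                    findings[key] = findings.get(key, 0) + 1
    return findings


# Hypothetical rows from a test environment.
SAMPLE_ROWS = [
    {"id": 1, "notes": "call 07700 900123", "contact": "ana@example.com"},
    {"id": 2, "notes": "no issues", "contact": "bo@example.com"},
]

report = profile_rows(SAMPLE_ROWS)
```

Note that the phone number turns up in a free-text "notes" column rather than a dedicated field. That is the kind of instance that is easy to overlook when you only check the columns you expect to be sensitive.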

If I decide that I cannot use production data, should I mask or generate? Or both?

The latest World Quality Report also finds that 65% of organisations anonymize at least some of the production data they use in testing, and over half generate synthetic test data. Masking can help to meet many compliance requirements when testing, and to mitigate the risk of a data breach. However, a few things should be considered when deciding how to create data to provision to test environments:

Test data environments are necessarily less secure and would ideally therefore contain no personally identifiable information (PII) from a security standpoint. Ask yourself: How sure are you that no sensitive information can be garnered from masked data? What about when the information left visible in masked data sets is combined with other sources, for instance readily available information online or in other data sources available at your organisation?
Masking is complex and can damage the integrity of data. This is particularly true when reckoning with complex data trends, for example temporal patterns in historical data. If you can mask and retain all the data relationships, you can most likely synthetically generate data from scratch using the same data model. While some great technologies exist for masking, you might consider generation in some cases as a way to create wholly fictitious data. This will also unlock the other benefits of synthetic data generation.
Masking existing data does nothing to improve the quality of the test data, or the speed with which it is allocated to tests. Why not turn compliance into an opportunity for faster, higher coverage, and potentially more accurate testing? Generating missing data needed to test applications rigorously offers a method for doing just this, especially when the data “Find and Makes” are performed automatically as a standard step within test execution.
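One way to probe the question of what can be garnered from masked data is to check how many rows share the same combination of values left visible. The sketch below is a simple k-anonymity style check, which is a technique I am swapping in for illustration rather than something drawn from the research above. The masked rows and the choice of "postcode" and "birth_year" as quasi-identifiers are assumptions.

```python
from collections import Counter


def _group_sizes(rows, quasi_identifiers):
    """Count rows per distinct combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group sharing the same visible values (0 if no rows)."""
    return min(_group_sizes(rows, quasi_identifiers).values(), default=0)


def risky_rows(rows, quasi_identifiers, k):
    """Rows whose visible values are shared by fewer than k rows."""
    sizes = _group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] < k
    ]


# Hypothetical masked data: names are hidden, but other fields remain visible.
MASKED = [
    {"name": "XXXX", "postcode": "AB1", "birth_year": 1980},
    {"name": "XXXX", "postcode": "AB1", "birth_year": 1980},
    {"name": "XXXX", "postcode": "CD2", "birth_year": 1975},
]

unique_enough = risky_rows(MASKED, ["postcode", "birth_year"], k=2)
```

Here the third row is the only one with its postcode and birth year, so anyone holding another data set with those two fields could plausibly link it back to a person, even though the name is masked. Passing a check like this does not prove masked data is safe; it only flags the most obvious cases.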

In other words, synthetic test data generation is a technology that can enable greater security, while also facilitating more rigorous, faster testing. The reality is that few organisations will be able to wholly replace their data with comprehensive synthetic data overnight. However, a hybrid approach is possible, gradually replacing production data sources with synthetic or virtualized data streams. This in turn feeds accurate and rigorous testing, often with less likelihood of sensitive data making it to test environments.
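For a sense of what generation from scratch can look like at its very simplest, here is a sketch using only Python’s standard library. The customer fields, the name lists and the edge cases are all hypothetical, and a real generator would work from your actual data model and its relationships rather than a handful of hard-coded values.

```python
import random

# Hypothetical building blocks; no value here is derived from real people.
FIRST_NAMES = ["Alex", "Sam", "Jo", "Kit"]
LAST_NAMES = ["Field", "Stone", "Brook", "Hale"]


def generate_customers(count, seed=0):
    """Build wholly fictitious customer rows; the seed keeps runs repeatable."""
    rng = random.Random(seed)
    customers = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "id": i,
            "name": f"{first} {last}",
            "email": f"{first}.{last}.{i}@example.com".lower(),
            "balance": round(rng.uniform(0, 5000), 2),
        })
    return customers


def with_edge_cases(customers):
    """Append boundary cases that production data may never contain."""
    next_id = len(customers) + 1
    edge = [
        {"id": next_id, "name": "", "email": "empty.name@example.com", "balance": 0.0},
        {"id": next_id + 1, "name": "N" * 255, "email": "long.name@example.com", "balance": -0.01},
    ]
    return customers + edge


test_data = with_edge_cases(generate_customers(5, seed=42))
```

The with_edge_cases step is where the coverage argument comes in: an empty name, a 255-character name and a negative balance may never appear in a production copy, but they are cheap to generate on demand.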

What do you think? Do these points align with your interpretation of current legislation and its relation to testing? And what are the main challenges we face as a community in responding to consumer concern about how we use their data? Please feel free to drop me an email with your thoughts.
