Q&A: Data Masking

Common Data Masking Questions:

Do you copy data out of production? Static data masking is a simple and effective way to protect that data and prevent a breach.

1. Why mask? Because we can’t protect the data outside of production. Imagine copying customer data for testing. How could you protect it after copying it? Without data masking, you would expose all names, addresses, phone numbers, emails, financial information, and more. Static masking replaces these values with realistic fakes, so you can test without jeopardizing your confidential information or that of the people who entrusted it to you.

2. Reverse the masking? Impossible. That’s the point: Unlike encryption, static masking is a one-way transformation. The masked data resembles the original but doesn’t reveal it. This irreversible process ensures your sensitive information isn’t exposed even if the masked data falls into the wrong hands.

3. Data integrity? It’s a must. Otherwise, the application won’t function properly in the test environment, or the tests will be ineffective. The masking process must preserve data validity, consistency, and referential integrity. It’s like an elaborate disguise: everything looks different but has to work the same way.
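Referential integrity is easiest to see with keys shared across tables. One common technique, deterministic masking with a keyed hash, derives each fake value from the original, so the same original always becomes the same fake and joins keep working. The minimal Python sketch below assumes a hypothetical `customers`/`orders` layout linked by email, a hypothetical `mask_email` helper, and a per-run `MASKING_KEY`; it illustrates the idea and is not how any particular product works. Note the tension with point 2: the mapping stays one-way only if the key is discarded after masking, because anyone holding the key could hash candidate emails and compare.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-run secret key. Discard it once masking finishes so the
# original-to-masked mapping cannot be recomputed from candidate inputs.
MASKING_KEY = secrets.token_bytes(32)


def mask_email(email: str, key: bytes = MASKING_KEY) -> str:
    """Deterministically replace an email address with a fake, well-formed one.

    For a given key, the same input always produces the same output, so a
    value used as a join key is masked identically in every table.
    """
    digest = hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()
    # 16 hex characters = 64 bits, so collisions are very unlikely at typical table sizes.
    return f"user_{digest[:16]}@example.com"


# Hypothetical parent and child tables linked by email.
customers = ["alice@corp.com", "bob@corp.com"]
orders = [(1001, "alice@corp.com"), (1002, "bob@corp.com"), (1003, "alice@corp.com")]

masked_customers = {mask_email(e) for e in customers}
masked_orders = [(order_id, mask_email(e)) for order_id, e in orders]

# Referential integrity survives: every masked order still references a masked customer.
assert all(e in masked_customers for _, e in masked_orders)
```

A random replacement would be stronger on its own, but it would break the link between an order and its customer unless you kept a lookup table, which would itself be sensitive.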

4. A single algorithm? Of course not. There are many ways to mask each type of data. Choosing the strategy that fits your requirements will ensure you achieve your security goals while getting the most out of your data.

For example, value manipulation preserves some aspects of the original data but may offer weaker security. Data generation provides perfect security but may impair test quality. Data profiling and custom profiles are two other strategies that balance security and test quality.
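To make the trade-off concrete, here is a minimal Python sketch for one column type, assuming 10-digit North American-style phone numbers. The helpers `manipulate_phone` and `generate_phone` are hypothetical. Value manipulation keeps the area code and the original formatting, which preserves something tests might rely on but also leaks that part of the original; data generation builds a new value and keeps nothing from the original.

```python
import random
import re

# Unseeded here; seed it (e.g. random.Random(42)) for reproducible output.
rng = random.Random()


def manipulate_phone(phone: str) -> str:
    """Value manipulation: keep the area code and formatting, randomize the other 7 digits."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit phone number, got {phone!r}")
    new_digits = iter(digits[:3] + "".join(rng.choice("0123456789") for _ in range(7)))
    # Write the new digits back into the original layout, one digit at a time.
    return re.sub(r"\d", lambda _match: next(new_digits), phone)


def generate_phone() -> str:
    """Data generation: build a value from scratch; nothing from the original survives."""
    return f"{rng.randint(200, 999)}-{rng.randint(200, 999)}-{rng.randint(0, 9999):04d}"


original = "(212) 555-0147"
print(manipulate_phone(original))  # keeps "(212)" and the layout; other digits are random
print(generate_phone())            # an unrelated NNN-NNN-NNNN value
```

Which strategy fits depends on whether your tests actually need the preserved part; if they don't, generation gives the stronger guarantee.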

5. Should I worry about performance? Yes and no: data masking performance isn’t an issue unless masking is so slow that it becomes impractical. Let’s explain:

There’s a common perception that static data masking is inherently slow and resource-intensive, but that this isn’t a big deal because it only has to run once, after copying the data.

The Truth

It doesn’t matter whether a masking process takes 30 seconds, 5 minutes, or half an hour. It doesn’t run very often, and it never runs on production systems, so it doesn’t slow down business-critical processes.

However, performance isn’t entirely irrelevant: masking becomes impractical if it takes days or weeks to run. Nor is it true that masking runs only once; it must run every time you refresh your test data. The faster and easier masking becomes, the more frequently you can refresh your test data and the more value you get from it.

Performance Culprits

Slow masking is usually due to one of these reasons:

  • Product selection: different solutions offer different performance capabilities, due to factors such as code quality, the database APIs they use, and transaction size. For example, chatty protocols combined with high latency can result in very slow masking.
  • Database performance: as with any database-driven product, masking performance depends on the performance of the underlying database, and most databases aren’t tuned for masking.
  • Triggers: these can be one of the most challenging problems, because these small pieces of code execute whenever data changes. When updating millions of rows, a row-level trigger will run millions of times, dramatically slowing down the masking process. However, triggers are often essential for data validity and integrity, so you shouldn’t automatically disable them.

Taming the “performance beast”

Addressing these issues will allow data masking to become an integral component of a dynamic data lifecycle rather than a slow, unusable burden everyone wants to avoid.

Here are some ideas to consider:

Product selection is always challenging. As with any IT purchase, you should test several data masking solutions in your environment, using your own network, database, and data volumes. While trials can be time-consuming, they are the only way to ensure a solution works well in your environment. Be careful not to rely on brand recognition, market analysts, or friendly advice, as they can backfire and result in a failed project.

Database performance can be improved with a little know-how. Data masking is a very write-intensive process, while most applications are read-intensive, so the database needs different tuning. To temporarily improve database performance during masking, you can, for example, drop indexes and constraints, stop archiving, or suspend replication. Pre- and post-masking actions can automate these changes before masking and restore the original configuration afterward. Work with your DBAs to maximize your database write speed.
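Here is a minimal sketch of pre- and post-masking actions in Python using the standard `sqlite3` module. The `customers` table, the index name `idx_customers_email`, and the `run_masking` helper are hypothetical. Archiving and replication are configured differently in each database, so the sketch only demonstrates the index step: it drops an index before the masking update and rebuilds it afterward, with a `finally` block so the post-masking action runs even if masking fails.

```python
import sqlite3
from typing import Iterable

# Hypothetical pre- and post-masking actions for a "customers" table.
PRE_MASKING = ["DROP INDEX IF EXISTS idx_customers_email"]
POST_MASKING = ["CREATE INDEX IF NOT EXISTS idx_customers_email ON customers(email)"]


def run_masking(conn: sqlite3.Connection, masking_sql: Iterable[str]) -> None:
    # Pre-masking: remove structures that slow down bulk writes.
    for stmt in PRE_MASKING:
        conn.execute(stmt)
    try:
        with conn:  # commit the masking writes together, or roll back on error
            for stmt in masking_sql:
                conn.execute(stmt)
    finally:
        # Post-masking: restore the original setup, even if masking failed.
        for stmt in POST_MASKING:
            conn.execute(stmt)
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("alice@corp.com",), ("bob@corp.com",)],
)
conn.execute("CREATE INDEX idx_customers_email ON customers(email)")
conn.commit()

run_masking(conn, ["UPDATE customers SET email = 'user' || id || '@example.com'"])
print(conn.execute("SELECT email FROM customers ORDER BY id").fetchall())
# [('user1@example.com',), ('user2@example.com',)]
```

In a real deployment, the two lists would hold whatever database-specific steps your DBAs choose.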

Finally, trigger performance issues can be challenging and require time and know-how. First, identify the triggers that run when you mask your data. Determine which are relevant to the data you are masking and disable the rest. Second, convert the necessary triggers into a vertical update procedure, that is, a single statement that applies the trigger’s logic to all rows at once, and use that procedure during masking instead of the triggers. This works because a single update of all the rows is much faster than millions of small updates. Core Audit can help speed up this process by identifying the SQL statements that need to be rewritten as vertical updates.
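The trigger-to-vertical-update idea can be sketched in Python with SQLite. Everything here is hypothetical: a `customers` table, a row-level trigger `trg_customers_full_name` that keeps a derived `full_name` column in sync, and a `mask_names` helper. During masking, the sketch drops the trigger, masks the name columns, applies the trigger’s logic to all rows with one set-based UPDATE, and recreates the trigger, all in one transaction. SQLite has no command to disable a trigger, so it is dropped and recreated; other databases offer a disable option (PostgreSQL, for instance, supports `ALTER TABLE ... DISABLE TRIGGER`). This illustrates the general technique, not how any particular product performs the conversion.

```python
import sqlite3

# Hypothetical row-level trigger that keeps full_name in sync with the name columns.
TRIGGER_SQL = """
CREATE TRIGGER trg_customers_full_name
AFTER UPDATE OF first_name, last_name ON customers
FOR EACH ROW
BEGIN
    UPDATE customers
    SET full_name = NEW.first_name || ' ' || NEW.last_name
    WHERE id = NEW.id;
END
"""

# The same logic rewritten as one set-based ("vertical") update over all rows.
VERTICAL_UPDATE_SQL = "UPDATE customers SET full_name = first_name || ' ' || last_name"


def mask_names(conn: sqlite3.Connection) -> None:
    # SQLite DDL is transactional, so the trigger is dropped and recreated atomically.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TRIGGER IF EXISTS trg_customers_full_name")
        # Mask the source columns without firing the trigger once per row.
        conn.execute("UPDATE customers SET first_name = 'First' || id, last_name = 'Last' || id")
        # Apply the trigger's logic to every row in a single statement.
        conn.execute(VERTICAL_UPDATE_SQL)
        conn.execute(TRIGGER_SQL)
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, full_name TEXT)"
)
conn.execute(TRIGGER_SQL)
conn.executemany(
    "INSERT INTO customers (first_name, last_name, full_name) VALUES (?, ?, ?)",
    [("Alice", "Smith", "Alice Smith"), ("Bob", "Jones", "Bob Jones")],
)
conn.commit()

mask_names(conn)
print(conn.execute("SELECT full_name FROM customers ORDER BY id").fetchall())
# [('First1 Last1',), ('First2 Last2',)]
```

After masking, the recreated trigger keeps working for ordinary row-by-row updates in the test environment.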

Masking is essential

When focusing on details such as performance and data integrity, it’s easy to lose sight of the big picture: why is data masking so important?

  • Reduces risk: eliminating sensitive data outside of production will dramatically reduce your exposure and risk profile.
  • Simplifies compliance: masking is essential to reducing the scope of compliance with data privacy and other regulations. Systems that contain only masked data typically aren’t subject to these requirements.
  • Improves development: masked data can safely feed development and test environments, improving product quality, shortening development cycles, and accelerating project timelines.

Final Thoughts

Data masking is a critical component in the data lifecycle, enabling us to use our data to drive and improve many aspects of our business. From product development to testing, data analytics, and more, securely using our data outside of production lets us multiply the value we derive from it.

Masking is simple and essential but not trivial. Many projects fail for a variety of reasons, such as an inappropriate solution, poorly defined masking policies, performance issues, and more.

In our experience, customers always succeed when they have the right solution and a support team committed to their success. Missing either component greatly decreases the chances of success, and lacking both guarantees failure.

Contact us today at info@bluecoreresearch.com to learn more about how we can help you mask and secure your data.