Significance
Data masking is not a daily task, so why is performance a vital subject to consider?
While it’s of minor significance whether a data masking process takes 5 seconds or 5 minutes, it’s critical if it takes five days or will never finish. Impossibly long run times are not unusual and render the product useless. Many masking projects fail for this reason.
Performance aspects
So, what makes a data masking solution run fast or slow? Three factors significantly influence masking performance:
- Implementation of the data masking solution
- Database Triggers
- Database tuning
Implementation
When a data masking solution runs, it updates millions of data values in the database. There are two critical aspects relating to the solution implementation that can make a job run very fast or painfully slow:
- Network round-trips
- SQL execution time
Network round-trips are the most critical aspect of the implementation. The masking job will be excruciatingly slow if every update for each of the millions of values executes separately. Each execution is a round-trip, and a good round-trip for LAN applications is about 2-5ms. That means that when sending updates for 24 hours in a loop, you could average between 17 and 43 million updates per day. That’s not as much as it might seem once you multiply the number of rows in the database by the number of columns that require masking.
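The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is ours, and it assumes one update per round-trip with no other overhead.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def updates_per_day(round_trip_ms: float) -> int:
    """Upper bound on sequential updates when each update costs one round-trip."""
    return int(MS_PER_DAY / round_trip_ms)


for latency_ms in (2, 5):
    print(f"{latency_ms} ms round-trip: {updates_per_day(latency_ms):,} updates/day")
```

As a hypothetical illustration, a table with 10 million rows and 5 masked columns holds 50 million values, which already exceeds a full day of single-value updates even at 2ms.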
However, if the data masking product uses array binding (with prepared statements), it can send thousands of updates in a single round-trip. By eliminating most round-trips, the execution time can shrink from 1 day to less than a minute. Note that once round-trip time becomes insignificant, other aspects become the bottleneck, so execution time rarely becomes that short in practice.
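The batching idea can be sketched with Python's standard DB-API. This is an illustration, not how any particular masking product is implemented: the `customers` table, the batch size, and the `mask_in_batches` helper are assumptions, and SQLite runs in-process, so it shows the shape of the API rather than real network savings. Whether `executemany` becomes a single round-trip depends on the database driver.

```python
import sqlite3


def mask_in_batches(conn, rows, batch_size=1000):
    """Send updates in batches through one prepared statement; return the batch count."""
    sql = "UPDATE customers SET name = ? WHERE id = ?"
    batches = 0
    for start in range(0, len(rows), batch_size):
        # One call per batch instead of one call per row.
        conn.executemany(sql, rows[start:start + batch_size])
        batches += 1
    conn.commit()
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"real-{i}") for i in range(1, 5001)],
)

# Hypothetical masked values, paired with the key of the row they replace.
masked = [(f"masked-{i}", i) for i in range(1, 5001)]
batch_count = mask_in_batches(conn, masked)
remaining = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE name LIKE 'real-%'"
).fetchone()[0]
print(batch_count, remaining)
```

Here 5,000 updates are issued through 5 calls instead of 5,000, which is the effect array binding aims for on a networked database.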
SQL execution time also depends on the implementation of the data masking solution. Every database has different capabilities, and implementations must be optimized for each database. Beyond the prepared statements and array binding mentioned above, some databases offer row addressing that is faster than indexes. Oracle, for example, has ROWIDs that allow direct row access. SQL Server can be driven through the ODBC SQLBulkOperations function. Similarly, every database requires a unique implementation to maximize speed.
Triggers
Triggers are bits of code in the database that run whenever you change data. They have many purposes, including validating data, adjusting values, synchronizing data between tables and columns, and more. Most database tables don’t have triggers, but when they exist, they are considered an integral part of the database, vital to correct database operation.
While a single trigger execution time is usually short, these intervals add up when updating millions of values. Since data masking performs millions of updates, triggers on tables that require masking often cause execution time to be impossibly long.
It isn’t simple to solve the trigger problem because disabling triggers can cause many problems. Solving the trigger problem, therefore, requires an alternative mechanism to replicate the trigger functionality without triggers. So how can we do that?
Triggers operate one row at a time. That makes sense for code that runs every time a single value changes. However, databases can perform vertical updates and change all the rows in the table with a single SQL statement. Converting triggers to a procedure with vertical updates allows you to temporarily disable the triggers during masking and maintain the trigger functionality by running that procedure afterward.
Recoding the triggers as vertical updates is a manual process. However, the main challenge is often to identify all the triggers that fire and the code they run. Core Audit, our database auditing solution, shows you any SQL that runs in the database, including SQL statements that run inside triggers. It is the perfect solution for identifying the triggers, and we often use it during masking installations for this purpose.
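A minimal sketch of this conversion, using SQLite as a stand-in database. The `customers` table and the `sync_full_name` trigger are hypothetical examples, and because SQLite has no way to disable a trigger, the sketch drops and recreates it instead. Other databases have their own syntax for disabling triggers.

```python
import sqlite3

# Hypothetical row-level trigger: keeps full_name in sync with the name parts.
TRIGGER_SQL = """
CREATE TRIGGER sync_full_name AFTER UPDATE OF first_name, last_name ON customers
BEGIN
    UPDATE customers SET full_name = NEW.first_name || ' ' || NEW.last_name
    WHERE id = NEW.id;
END
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, full_name TEXT)"
)
conn.execute(TRIGGER_SQL)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(i, f"First{i}", f"Last{i}", f"First{i} Last{i}") for i in range(1, 1001)],
)

# 1. Take the trigger out of the way (dropped here; see the note above).
conn.execute("DROP TRIGGER sync_full_name")

# 2. Mask the columns without firing the trigger once per row.
conn.executemany(
    "UPDATE customers SET first_name = ?, last_name = ? WHERE id = ?",
    [(f"F{i}", f"L{i}", i) for i in range(1, 1001)],
)

# 3. Replicate the trigger logic for all rows with one set-based statement.
conn.execute("UPDATE customers SET full_name = first_name || ' ' || last_name")

# 4. Restore the trigger for normal operation.
conn.execute(TRIGGER_SQL)
conn.commit()

out_of_sync = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE full_name <> first_name || ' ' || last_name"
).fetchone()[0]
print(out_of_sync)
```

Step 3 does in a single statement what the trigger would otherwise do a thousand times, once per updated row.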
Tuning
The final aspect of masking performance is database tuning. With well-implemented solutions like Core Masking, there’s usually no need. However, further improvement is possible.
Most databases are read-intensive and tuned for read performance. Querying data is, by far, the most common activity in databases. However, data masking is a write-intensive operation and will benefit from a tuning designed to improve write speed.
DBAs are well-equipped to tune a database for fast writes, but here are a couple of suggestions that might help:
- Dropping indexes on masked columns. Indexes aim to improve querying, but every change in an indexed column requires an index update. Dropping indexes before masking and recreating them afterward is significantly faster than maintaining them through millions of updates.
- Redo logs and archive logs. We only mask non-production databases, so there’s never a need to recover in case of a crash. Redo logs and archive logs are database facilities that work hard during write activity to ensure a database can recover. Disabling or limiting this database functionality during masking will significantly improve write performance.
Databases have many other tunable parameters that can help improve performance, but, as mentioned earlier, it’s rarely necessary to tune the database for masking.
Switching the database to and from a write-intensive configuration can be automated with pre- and post-masking actions. Those can drop indexes or change database configuration, reverting to read-intensive tuning once the masking process ends.
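One way to sketch such pre- and post-masking actions is a small wrapper that runs a list of SQL statements before masking and another list afterward, even if masking fails. The `write_intensive` helper, the table, and the index name are assumptions for illustration, with SQLite as a stand-in. Changes to logging configuration are database-specific and are not shown.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def write_intensive(conn, pre_actions, post_actions):
    """Run pre-masking actions, yield for masking, then always run post-masking actions."""
    for sql in pre_actions:
        conn.execute(sql)
    try:
        yield conn
    finally:
        for sql in post_actions:
            conn.execute(sql)
        conn.commit()


def index_names(conn):
    """List user-defined indexes (SQLite keeps them in sqlite_master)."""
    query = (
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name NOT LIKE 'sqlite_%'"
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"user{i}@real.example") for i in range(1, 1001)],
)

with write_intensive(
    conn,
    pre_actions=["DROP INDEX idx_customers_email"],
    post_actions=["CREATE INDEX idx_customers_email ON customers (email)"],
):
    indexes_during_masking = index_names(conn)
    conn.executemany(
        "UPDATE customers SET email = ? WHERE id = ?",
        [(f"user{i}@masked.example", i) for i in range(1, 1001)],
    )

indexes_after_masking = index_names(conn)
print(indexes_during_masking, indexes_after_masking)
```

Putting the post-masking actions in a `finally` block means the index is recreated even when the masking step raises an error, so the database does not stay in its write-intensive state by accident.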
Advice
So, how should customers approach a masking project?
If you already own a solution, try identifying the performance bottlenecks and address them. You can also consider professional services like the ones offered by our partners. Alternatively, consider purchasing a different solution like Core Masking.
If you haven’t purchased a solution yet, consider the suggestions below. However, the most critical aspect of buying a data masking solution is ensuring you have a vendor and partner who will stand by their solutions and ensure everything works. No evaluation can substitute for the capabilities vendors can wield to resolve problems if they care enough about you as a customer.
Theoretical evaluation
It’s good to evaluate the theoretical basis of a masking solution, for example, its consistency algorithms, supported methodologies, pros and cons for data security and usability, and more. Theoretical evaluation and comparison between solutions can reveal technological weaknesses that may be difficult to identify through testing. Such weaknesses are rarely resolved through upgrades or patching and will likely remain for the life of the product.
Practical evaluation
Many masking evaluations cut corners. They try to confirm that something works but are not adamant about going through all the use cases from beginning to end.
Due to time constraints, customers tend to postpone the trigger issue to the post-purchase implementation. There’s some logic in that since triggers are not a problem in the masking software. However, your ability to resolve it is a requirement for using the software. So whether you can solve it yourself or need the vendor or partner to assist you – it’s valuable to go through the exercise during evaluation.
Sometimes, evaluations don’t even mask tables entirely. That’s simply a matter of time because it might take too long. But that’s how performance ends up being an issue only after purchasing. Also, evaluations often test that the data was masked by comparing a few rows, without ever validating that every row in the table was masked. While it’s not a trivial exercise, it’s not that difficult.
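Validating every row can be sketched as a comparison against a snapshot taken before masking. The `count_unmasked` helper, the table, and the column names below are hypothetical, and SQLite stands in for the real database. The check assumes the masking method always changes the value. Methods such as shuffling can legitimately return the original value for some rows, so a non-zero count needs interpretation.

```python
import sqlite3


def count_unmasked(conn, table, key, column, snapshot):
    """Count rows whose value still equals the pre-masking snapshot.

    NULLs never compare equal, so rows with no value to mask are ignored.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    sql = (
        f"SELECT COUNT(*) FROM {table} AS t "
        f"JOIN {snapshot} AS s ON t.{key} = s.{key} "
        f"WHERE t.{column} = s.{column}"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"{i:09d}") for i in range(1, 1001)],
)

# Snapshot of the original values, taken before masking.
conn.execute("CREATE TABLE customers_before AS SELECT id, ssn FROM customers")

# Simulate a masking job that silently skips two rows.
conn.executemany(
    "UPDATE customers SET ssn = ? WHERE id = ?",
    [(f"X{i:08d}", i) for i in range(1, 1001) if i not in (10, 20)],
)

missed = count_unmasked(conn, "customers", "id", "ssn", "customers_before")
print(missed)
```

Comparing a handful of rows would almost certainly miss the two skipped rows here, while the full comparison reports them. The snapshot holds unmasked data, so it should be dropped once validation is complete.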
To sum up, practical evaluations are vital, and customers should be adamant about going through their use cases from beginning to end. Ensure masking finishes in an acceptable amount of time and that all the data is well masked.
Identify your requirements
Any product selection should start with identifying current and future requirements. That is particularly significant in data masking since the possibilities of the data you might want to mask and the possible applications for the masked data are endless. Every product has different strengths, but their weaknesses will be debilitating for particular use cases.
However, requirements are something most customers don’t have. That makes purchasing somewhat of a guessing game while following vendor recommendations. We recommend you identify a few challenging use cases and ask the vendors to meet them from beginning to end. Vendors can distinguish themselves through their ability to overcome these challenges and not give cookie-cutter solutions.
Include stakeholders
Many masking evaluations are conducted by DBAs. That makes sense since masking is a database product. However, it can be helpful to include representatives from two other teams. One person from the security team can ensure proper masking. More importantly, include at least one person from the target users of the masked data. In other words, include representatives from the intended QA or development teams who can validate that the masked data is useful for them.
The reason for these inclusions is that if the customers of the data haven’t validated its utility, they are likely to reject it and continue to use unmasked data.
Including multiple stakeholders from different teams can complicate the evaluation process. So try identifying the correct individuals who can work together, cooperate, and get things done quickly.
Final thoughts
Data masking is, in many ways, a simple process, and most customers view it as a simple purchase. While it’s simple, it’s not trivial, and skipping some details can result in wasting money on useless software.
Many data masking vendors view data masking in the same light as customers – a simple and minor product without significant technological barriers. Again, this is not entirely false, but this attitude results in solutions that only work in some use cases and are unusable in others. That happens in both large and small vendors. Large vendors neglect smaller products in the portfolio, and small vendors reduce costs by using cheap developers in 3rd world countries.
Our advice is to test the solution you aim to purchase thoroughly and ensure you are working with a partner and vendor that will resolve your post-purchasing issues and get everything working.
Many masking projects fail. Work with us, and we’ll ensure your success.