Introduction
SQL Injection is one of the most well-known attack vectors, and that’s largely because it poses a significant security challenge. The only way to understand the benefits and deficiencies of different solutions is by understanding the problem and the approach each solution has to solving it.
The Problem
The best way to understand the problem is to look at examples. There are many variations in SQL injection, and we’ll demonstrate most of them using this simple application example. So, let’s first understand the example.
Application Example
Account | 0 |
First name | John |
Last name | Doe |
Last 4 of social | 5555 |
Submit |
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 5555
Let’s assume our application is a simple web page that allows customer support representatives to see the credit card numbers of customers who call in.
For security reasons, the representative must enter the customer’s account number, first name, last name, and the last 4 digits of the customer’s social security number.
When all the information matches the customer’s data, the application will display the credit card numbers used by the customer.
It appears to be a secure design. Customer support representatives can only view the credit cards of real customers they talk to. How could they use this form to obtain information about customers they didn’t talk with?
The problem is not in the conceptual design but in the actual implementation of the code.
To make this page work, the application running inside the web server must run a SQL against the database. That SQL might look like the example above.
That SQL seems to be correct, with no apparent way to get credit card information without entering the correct customer’s information.
So, what is the problem with this implementation?
The Attack
Account | 0 |
First name | John |
Last name | Doe |
Last 4 of social | 0 OR 1=1 |
Submit |
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 0 OR 1=1
When you fill the form this way, you can see the resulting SQL next to it. Take a look at how the application placed each value in the correct location in the SQL.
When the database executes a SQL the database compares the conditions against every row in the table.
Let’s assume the first row is for customer 1001 named JOE SMITH, with SSN 9999. Here’s what that database evaluation of the conditions looks like:
- Is the customer number 0? No, it’s 1001
- Is the first name JOHN? No, it’s JOE
- Is the last name DOE? No, it’s SMITH
- Is the SSN 0? No, it’s 9999
- Is 1=1? Yes! 1 is always equal to 1.
In SQL, the order of operations evaluates AND before OR, resulting in something like this:
(FALSE and FALSE and FALSE and FALSE) or (TRUE)
The ultimate result of this comparison is TRUE! That means the database will return the credit card number for JOE SMITH, and the application will display it.
The database will evaluate this condition to be true for every row in the table. So, the application will display every credit card in the table.
We filled out the form in a way the programmer who wrote the application did not expect. Therefore, the code did not protect against this attack, and we were able to trick the application into displaying every credit card in the database.
The String Variation
Here’s another little insight about condition evaluation: the database will consider ssn = ‘5555’ to be a valid comparison. The database will automatically convert the string ‘5555’ into the number 5555. To protect against the above attack, let’s embed the ssn as a string. Maybe then things would look different. Let’s fix the application to run this SQL instead:
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ
AND last = โDoeโ
AND ssn = '5555'
Now, the database will return an error if we fill the form with “0 OR 1=1” because it cannot convert that to a number. Have we prevented SQL injection?
Unfortunately, no. That is why there are so many variations on SQL injection. The attacker just needs to modify the attack. Consider this:
Account | 0 |
First name | John |
Last name | Doe |
Last 4 of social | ‘ or ‘a’ = ‘a |
Submit |
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = '' or 'a' = 'a'
With this slight modification, the attack will still extract all the credit cards because ‘a’ is always equal to ‘a’. Take a look at how replacing 5555 with this new input results in the above SQL.
Truncating the SQL
Account | 0 OR 1=1 — |
First name | John |
Last name | Doe |
Last 4 of social | 0 OR 1=1 |
Submit |
SELECT card FROM cards WHERE
account = 0 OR 1=1 -- AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 5555
Another complication an attacker might face is if there’s another AND condition after the injected code. In the above example, we injected into the account number that has several AND conditions after it.
In SQL, double dash (- -) comments out the rest of the line. That way, the attacker can dispose of any additional conditions further down the line.
In MySQL, # is another way to comment out the rest of the line with identical effect to – –
These comment styles comment out the rest of the line. But what if a SQL has multiple lines?
Another way to add a comment to a SQL is by enclosing the comment in /* */. This comment style will span multiple lines.
If a comment starts (/*) but doesn’t end (*/), most databases will consider it to be a comment until the end of the SQL. So, adding /* instead of — will comment out the rest of the SQL on single or multi-line SQLs.
Another point to remember is that in MySQL /*! */ is not a comment, as the ! symbol disables the comment. That can help attackers bypass security measures that ignore comments.
Parameter Fragmentation (HPF)
Account | 0 OR/* |
First name | */1/* |
Last name | */=/* |
Last 4 of social | */1 |
Submit |
SELECT card FROM cards WHERE
account = 0 OR/* AND
first = โ*/1/*โ AND
last = โ*/=/*โ AND
ssn = */1
Another variation is to comment out entire sections of the SQL and split the attack across multiple parameters.
In SQL, any text between /* and */ is a comment and is ignored.
The nice thing about this attack is that in addition to disabling entire portions of the SQL, it also splits the attack across multiple parameters. That makes it almost impossible to detect using WAF technology. More on that later.
Splitting the SQL
Account | 0 |
First name | John |
Last name | Doe |
Last 4 of social | 1 UNION SELECT SSN FROM USERS |
Submit |
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 1 UNION SELECT SSN FROM USERS
What if the vulnerable form doesn’t access the table or columns we want to attack? Will that prevent us from doing a SQL injection?
The CARDS table only includes the last 4 digits of the social security number. Let’s assume we want to read the full SSN stored in the SSN column of the USERS table.
But we have a problem. This SQL doesn’t retrieve the USERS table and its SSN column. How can we exploit this form that reads the cards table to display information from a completely different table?
The answer is the UNION operator. It combines the output of two different SQLs into a single result set. The above example uses it to append all the social security numbers from the USERS table after the list of credit card numbers.
In this example, the first SELECT returns no data from the CARDS table because the condition doesn’t match. However, the second SELECT will dump the entire SSN column from the USERS table. The union operator combines both into a single result set. Therefore, the application will display all the social security numbers from the USERS table as if they were credit cards.
SQL Batch
Account | 0 |
First name | John |
Last name | Doe |
Last 4 of social | 0; CREATE LOGIN hacker WITH PASSWORD = โpasswordโ |
Submit |
SELECT card FROM cards WHERE
account = 0 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 0; CREATE LOGIN hacker WITH PASSWORD = 'password'
In Microsoft SQL Server and Sybase, SQLs are sent to the database in batches. A batch is a sequence of SQLs sent to the database in a single request. All requests in these databases are batches.
You can use a semi-column to separate SQLs in the batch:
SELECT c1 FROM t1; SELECT c2 FROM t2
However, the semi-column is optional and this batch will yield the same result:
SELECT c1 FROM t1 SELECT c2 FROM t2
Batches give SQL injection an entirely new level of malevolence. The injected fragment can execute a completely different SQL allowing it to change data, alter the database, and more.
The example above assumes the application is connected to SQL Server with a privileged account. Note that the semi-column was added for clarity but is optional.
When executed, this SQL will not return any credit cards but will create a new user in the database called ‘hacker’ with the password ‘password’.
This method lets attackers modify data, alter permissions, change the database, and more. Attacks are only limited by the privileges granted to the application. If the application has permission, the attacker can execute the shutdown command and shut down the database.
If complete control over the database isn’t enough, SQL Server also has the xp_cmdshell procedure that runs operating system commands. There’s also xp_regread that can read the registry. Using these, an attacker can gain complete control over the machine.
While it’s always true that applications should never use a privileged account, in SQL Server this rule is crucial. Applications that connect to SQL Server databases must NEVER use a privileged account. Unfortunately, many applications in SQL Server connect using the highly privileged SA account, and that exposes them to even greater risks.
Variations
So far, all the examples we looked at used OR 1=1. That is the classic example everyone uses. It can create the illusion that if a SQL does not contain 1=1, there is no SQL injection.
Unfortunately, there are endless variations on conditions that are always true. 1=1 is just one grain of sand in an infinite desert.
Let’s start with trivial examples that use simple operators:
Expression | Explanation |
---|---|
17 != 21 | 17 is always different than 21 |
99 < 100 | 99 is always smaller than 100 |
9-1 > 2+1 | 9-1 is always greater than 2+1 |
‘ABC’ = ‘ABC’ | The string ‘ABC’ is always identical to ‘ABC’ |
‘hello’ != ‘world’ | The string ‘hello’ is always different than the string ‘world’ |
We can also use some database functions to create slightly more complex expressions:
Expression | Explanation |
---|---|
‘XYZ’ like ‘%Y%’ | XYZ contains a Y |
‘B’ in (‘A’, ‘B’, ‘C’) | B is in the list |
7 in (select 7) | 7 is in the result set |
translate(‘AB’,’AB’,’XY’) = ‘XY’ | replacing A with X and B with Y in AB results in XY |
While there are endless examples like this, they all use literals (constants). If we had a tool that was smart enough to calculate every expression the database can perform, it would be able to determine that all these expressions are always true.
No security solution is that smart, but this theoretical ability to detect SQL injections only means one thing. It means we need to take another small step forward and examine expressions no solution can determine are always true.
The expressions below use column values, and therefore, no solution can calculate them. It is impossible to distinguish these expressions from actual application logic.
Expression | Explanation |
---|---|
last != ‘XYZ’ | last name is not XYZ |
ssn > 0 | SSN is greater than 0 |
id not in (0, 1, 2) | ID is not 0, 1, or 2 |
first != last | first and last names are different |
id is not null | The ID is not a null value |
While these expressions will probably always be true, it’s impossible to determine that without understanding or evaluating the data in each column. We’ll discuss this further in the section about static analysis.
Non-Trivial Input
All the examples so far entered data into fields in the form. That is the most trivial way to interact with the application, but not the only one.
The HTML form use used will usually send the query to the application web server using a GET or POST request method. A GET request might look like:
โฆ/cards?id=0&first=JOHN&last=DOE&ssn=5555
You’ve probably seen strings like that in the URL of your web browser. You can also send the same GET request using AJAX, and those are not visible in the URL. However, you can easily see them in the developer console (try hitting F12 in your web browser).
POST requests are not visible in the URL but usually contain the same information. You can also see them in the developer console of any web browser.
When attackers hack, they usually send GET and POST requests directly. That allows them to manipulate fields in ways the application form might not allow.
For example, a form can have hidden fields you cannot see, or the application may use JavaScript to add additional fields.
These non-trivial inputs are slightly more challenging to identify and modify. While harder to exploit, this lack of accessibility means developers and testers usually don’t protect or test them properly.
Here are a few examples of information you cannot edit in a form but can change in a GET or POST request:
- Hidden input fields – these are input fields that are not visible to the user. Applications use these to keep context information about the state of the application or the user’s flow through it. While the user cannot see these fields on the screen, it is not difficult to modify their content.
For example, a hidden field can store an ID, and the application could use it to query the database. - Select fields – These are single-choice combo boxes or multiple-choice selections. While the user can only select from the list of options, an attacker can send any value.
For example, if you can only choose between two options that send a value of 0 or 1, an attacker can send the value of ‘0 or 1=1’. - Links – We often click on links, buttons, trees, images, and other components in web applications. These links frequently contain embedded values attackers can manipulate.
For example, the “Next Page” link usually contains the next page number (e.g., page=2) or the number of the first item on the next page (e.g., start=11). Attackers can modify them to include a SQL injection. - Cookies – Cookies are a way to store state information in the web browser and send it to the application. For example, this is where most applications store the session ID.
These types of inputs are never tested by QA or used by regular users. Most automated vulnerability detection systems will also fail to test these. That means that flaws in validating these types of inputs can lie dormant for years, never noticed or addressed.
These non-trivial inputs tend to be more vulnerable and provide fertile ground for targeted attacks by experienced hackers.
Parameter Pollution (HPP)
Parameter Pollution is an attack that requires manipulation of the GET/POST requests. For example, what happens if the above application receives two SSN fields in the request instead of one? In other words, the GET request would look like:
?id=0&first=JOHN&last=DOE&ssn=5555&ssn=7777
The answer depends on the technology stack used by the application and the methods used to extract the parameters. Some technologies will return a comma-separated list of the duplicate parameter. So the above request will return:
5555,7777
That creates an opportunity for an attack similar to Parameter Fragmentation (HPF) but without requiring multiple parameters or being aware of their order in the SQL. For example, sending these 4 SSNs (broken into lines for clarity):
โฆ/cards?id=0&first=JOHN&last=DOE &ssn=0 OR /* &ssn=*/ 1 /* &ssn=*/ = /* &ssn=*/ 1
That will result in this SQL:
SELECT card FROM cards WHERE
account = 0 AND
first = โJOHNโ AND
last = โDOEโ AND
ssn = 0 OR /*,*/ 1 /*,*/ = /*,*/ 1
That is a fragmented SQL injection attack using a single parameter.
No Keywords
Account | ACCOUNT/* |
First name | John |
Last name | Doe |
Last 4 of social | 1 |
Submit |
SELECT card FROM cards WHERE
account = ACCOUNT /* AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 1
So far, every attack contained SQL keywords such as OR and UNION. Even though these are simple words in English that would appear in regular input, they make the input look suspicious.
But with a little investigation and effort, it’s possible to create a SQL injection that has no keywords or even spaces.
In this example, we check whether a column is equal to itself and comment out the rest of the SQL. Since the account is always equal to itself, this SQL will return all the cards in the table.
Truncating without Comments
Account | 1 OR 1=1 OR 1=1 |
First name | John |
Last name | Doe |
Last 4 of social | 1 |
Submit |
SELECT card FROM cards WHERE
account = 1 OR 1=1 OR 1=1 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 1
Another recurring motif in these examples is leveraging comments. Many times, we use comments to remove unwanted portions of the SQL. But there’s another way.
We highlighted the OR. In the order of precedence, ORs are last, so anything separated by an OR is evaluated separately. The red portion always evaluates to true, and so the entire WHERE clause is always true.
Misleading Numerical Inputs
Account | cast(1 as varchar(10)) OR 1=1 |
First name | John |
Last name | Doe |
Last 4 of social | 1 |
Submit |
SELECT card FROM cards WHERE
account = cast(1 as varchar(10)) OR 1=1 AND
first = โJohnโ AND
last = โDoeโ AND
ssn = 1
One approach for defeating some defense systems is to create confusion about the nature of the parameter.
In this example, the Account parameter doesn’t appear to be numerical input. An automatic defense that validates it as a string will consider it to be a safe string to execute. It’s safe because it has no tags (‘) that terminate the string.
cast() is a standard database operator that converts the number to a string. The database will consider this to be valid input. To compare the string value to the account, it will convert it back to a number.
Login Screens
Login screens are also vulnerable to SQL injection attacks. There are two ways the application code can verify the user. Both methods are susceptible to attacks.
Query Validates Password
Username | adminโ– |
Password | x |
Login |
SELECT id FROM users WHERE username = 'admin'--' and password = hash('x')
In this authentication method, the query validates the password. The select is looking for a row with the correct user and password. If the password matches, the query returns the row. Otherwise, the query returns no data. The SQL will look like this:
SELECT id FROM users where username = 'admin' and password = hash('x')
In the where clause, the SQL will contain the hash of the password or call to a hash function.
When filling out the login screen like in the above example, the password-validating portion of the SQL is disabled. The double dash (- -) comments out the rest of the SQL, and the query will return the row for admin. That means the password is correct.
Query Retrieves Password
Username | โ union select โ5f4dcc3b5aa765d61d8327deb882cf99 |
Password | password |
Login |
SELECT password FROM users WHERE
username = '' union
select '5f4dcc3b5aa765d61d8327deb882cf99'
This authentication method retrieves the hash from the database and validates it in the code. The SQL will look like this:
SELECT password FROM users WHERE username = 'admin'
When filling out the login screen like in the above example, the SQL will return the MD5 of the word ‘password.’ When the application compares this md5 to the password we provided, they will match. In this method, we return the password hash we want instead of the hash stored in the database.
Introduction to Solutions
It is vital to understand a SQL injection is fundamentally exploiting a flaw in the code. You cannot ensure the code has no SQL injection vulnerabilities, just like you cannot guarantee it has no bugs. However, like with any code vulnerability, the best solution is to fix the flaw in the code. The alternative is to try and prevent, detect, or make it difficult to exploit the vulnerabilities, but all these approaches have limitations.
The following sections review the different approaches and discuss the benefits and limitations of each. But before diving into the details, let’s have a high-level overview to get the lay of the land.
Solutions in the Code
As mentioned above, the best solution is to have high-quality code and proper coding techniques that avoid the possibility of a SQL injection bug.
The best way to avoid a SQL injection bug is to use bind variables. That is the perfect solution and it completely prevents SQL injections. Slightly less robust methods include escaping input and input validation. It is also recommended to reduce client error reporting.
More on all these techniques, their value, limitations, and how to implement them later.
Database Solutions
SQL injection is ultimately a database attack that takes advantage of the flexible nature of SQL. Therefore, the most obvious place to look for a solution is in the database.
There are several database technologies to detect or prevent SQL injections but most are not effective. An exception is the unique anomaly analysis technology in Core Audit which is extremely effective.
More on these database technologies, their differences, pros, and cons later.
Web Application Firewall (WAF)
The difficulties in identifying SQL injection at the database level gave rise to a new product category: the Web Application Firewall (WAF).
The idea was that it would be easier to detect the SQL injection attack when examining the application input rather than when looking at the SQL sent to the database.
It’s a nice theory, but it only works on specific examples. When looking at the full scope of SQL injection attacks described in this paper, the problem becomes insurmountable. More on that later.
In-Application Firewall
A new technology developed by Blue Core Research aims to take WAF technology to the next level and incorporate it into the application.
The Core Audit Java Agent provides detailed auditing about everything that happens in the application. One of its additional capabilities is to apply firewall-type blocking rules around the entire application or sensitive areas in the code.
An In-Application Firewall can benefit from context and information available only inside the application and can apply targeted protections to specific methods.
Detect
The wide range of approaches described above doesn’t only differ in their technology and effectiveness but also in the type of protection they deliver. While many security people tend to dismiss detective capabilities, that is a mistake.
Detective approaches aim to alert you of a possible SQL injection. The downside is that they don’t block the attack, and you’ll need to take protective action as well as fix the code.
It may appear as though the value of such a detective solution is lower. However, unlike prevention, you can tune detection to be more sensitive because false positives don’t prevent legitimate users from working. That means detection can provide an effective mechanism with little to no false negatives (undetected events).
A powerful detective engine such as the one found in Core Audit is likely to notify you of any SQL injection attempt.
Prevent
There are two primary categories for preventive approaches: blocking and curing. Depending on the type of solution, one or both options may be available. The Core Audit Java Agent offers both.
While blocking prevents an attack altogether, curing aims to remove only the offensive part of the request. The benefit of blocking is that any attack the solution recognizes will be stopped. Curing bears a potential risk that part of the attack will get through.
There are significant differences between various solutions, but none can claim 100% effectiveness. That is to block all attacks and pass all legitimate activity. That’s the benefit of detection, but more on that later.
Because preventive solutions can block legitimate activity, they let you disable some of their filters (thereby limiting their effectiveness even further). As the activity becomes more versatile and blocking more strict, the likelihood of blocking legitimate activity increases.
Sanitize
Finally, some solutions aim to cure the problem without blocking it. With these solutions, malicious injections are removed from the input and allowed to run safely. Core Audit also has this capability.
While this is a very nice advertisement, in many solutions it comes at a heavy price of introducing additional vulnerabilities. The problem is that an attacker can craft an attack that will leverage the sanitizing process as part of the attack.
This means that In addition to the limited effectiveness of SQL injection detection, using sanitizing can reduce the effectiveness of the solution by offering the attacker additional tools to bypass the security. More on that later.
Escaping the Parameter Cage
When looking at the various examples it becomes clear that SQL injection is about changing the SQL contract by escaping the parameter cage.
If the parameter is a string, then a SQL injection will need to terminate the string before it can modify the SQL. That usually means having a tag (‘) in the input.
Numerical parameters are terminated by whitespace (space, tab, newline, etc.) or other characters that are not part of the literal (for example, a /).
Detecting an escape from the parameter cage is one method for recognizing SQL injection in application firewalls but it relies on correctly identifying the parameter type.
Another common method is the identification of suspicious keywords or special characters. This method also has significant limitations.
Solutions in the Code & Best Practices
As mentioned above, the best solution is to have good code. Proper coding techniques can reduce and even eliminate the risk of SQL injection. This section will discuss these in detail.
Bind Variables
All SQL injection attacks involve escaping the previously mentioned parameter cage. Fortunately, all databases come with a full-proof API for preventing that. It’s called bind variables.
Bind variables are a way of sending the parameters separately from the SQL rather than embedding the constants into it. For example, the previous SQL used in the examples would look like this with bind variables:
SELECT card FROM cards WHERE account = :1 AND first = :2 AND last = :3 AND ssn = :4
When this SQL is sent to the database, the programmer needs to also send the values for the bind variables (:1, :2, :3, :4).
Writing SQLs this way avoids the database misinterpreting part of the value for being part of the SQL construct. The values can contain any character and the database will not be confused.
Using bind variables also has a performance side benefit. When the same SQL executes multiple times with different bind variables it only has to be parsed once. That is, actually, the original purpose for bind variables.
Unfortunately, rewriting the entire application to use bind variables might be expensive, difficult, or entirely impractical.
Escaping Input
The most common approach for fixing the application code is to escape all input fields. For example, if an input field contains a tag (‘), replace it with two tags (‘ ‘). The escaped quote can be embedded in a SQL without causing an injection attack.
Escaping the input John O’Brian in our example will yield this SQL:
SELECT card FROM cards WHERE account = 0 AND first = โJohnโ AND last = โO''Brianโ AND ssn = 5555
Escaping input is only half the solution because it only solves the problem for string input. Numerical input requires a different approach mentioned below.
While this is not the best method for preventing SQL injection, it is the most commonly used one. This is due to the relatively small impact on the code and, consequently, cheaper implementation costs.
Numeric Input
Fixing numerical input is, on the one hand, a very simple solution. However, this solution has to be individually implemented for every numerical input making it potentially time-consuming, difficult, and costly to implement.
Every programming language contains a function for converting strings to numbers and means of embedding those numbers into a string. By converting the input into a number before embedding it into the SQL string we ensure that it is an actual number.
For example, unsafe code in PHP might look like this:
$sql .= " id= ".$_GET["ssn"];
While the safe version looks like this:
$sql .= " id= ".intval($_GET["ssn"]);
$_GET[“ssn”] reads the GET parameter ssn and wrapping it in intval() will ensure that it’s a number.
In Java, the unsafe looks like this:
sql += " id = "+ssn;
While the safe version is this:
sql += " id = "+Integer.parseInt(ssn)
Similar examples can be made for any programming language from Python to C. The idea is always the same — convert to a number and then back into a string.
It should be noted that there are slight differences in the way this technique works in different languages and those depend on how the conversion function works.
For example, the intval() function in PHP would convert part of the string if possible, so the string “7 OR 1=1” would be converted to 7. If the input could not be parsed, the function would return the value 0.
In Java, however, the Integer.parseInt() function would attempt to convert the entire string to a number and any obstacle would cause it to throw an exception.
Regardless of the differences, all these implementations are safe and none would allow a SQL injection to pass through the numerical parameter.
Input Validation
Input validation is the concept of ensuring the input is acceptable and fits the operational parameters of the function. It is a much wider concept than stopping SQL injection aimed at preventing many types of bugs including other types of attacks.
Because of its wide nature, it’s hard to consider code that performs input validation as resilient to SQL injection. Input validation could include tests that ensure strings only have escaped tags (‘) and numbers contain valid numbers. However, these could just as easily not be included in the input validation tests.
For example, consider this little piece of PHP code:
if ($_GET['ssn'] > 0 && $_GET['ssn'] < 9999) return; $sql .= " id= ".$_GET["ssn"];
This code checks that the ssn is between 0 and 9999 and will reject any input that is not within range. While this seems like a safe piece of code, sending it “7 OR 1=1” will bypass the test and result in a SQL injection.
The reason is that to perform the comparison, PHP must convert the string input to a number. It is as if each of the $_GET[‘ssn’] expressions in the if() statement is wrapped by the intval() function. Therefore, “7 OR 1=1” will be converted to the number 7 which falls within the valid range.
Therefore, if input validation is used to help prevent SQL injection, it is important to specifically require and test that inputs are valid from a SQL injection standpoint. This would usually entail escaping the input as described above.
Error Messages & Blind Injection
Application developers often display to the user the errors they got from the database. While this is useless information to the user, it is valuable information to the developers that need to debug the problem.
However, this same information is also valuable to a hacker that wants to perform a SQL injection attack. Database error messages can contain fragments of the SQL and other indications that can help a hacker tune the attack to work.
Not having these database error messages requires a hacker to do what’s known as Blind Injection. This involves more complex and difficult techniques that attempt to obtain information without seeing the database error messages.
One such technique is to use if() conditions that call a sleep function (e.g. waitfor) if they are true. In other words, know whether something is true or not based on how long it takes the query to complete.
The bottom line is that Blind Injection is more difficult and there’s no good reason to display error messages to the users. When a developer needs to debug a problem, they can always enable error messages for a particular scenario while performing their investigation.
Errors & Vulnerability Detection
Most applications have many pages with even more input fields and most hackers are far too lazy to try them all. So hackers use scanning tools that go through every input on every page and try to enter the type of input that might cause vulnerable code to return an error.
If certain types of inputs result in the application showing an error, there’s good reason to further investigate the vulnerability of that page.
It is important to remember that this discussion about Errors and Vulnerabilities assumes that the hackers have to “experiment” on the live application. Hackers that have another type of access to the application software will use it to detect vulnerabilities and learn how to exploit them.
Other types of access include:
- Off-the-shelf software that can be downloaded and installed
- Access to non-production systems (Dev, QA, pre-prod, etc.)
- Access to the source code or free software
This leads to several interesting conclusions:
- Don’t show errors to the users. Users will complain when they don’t get data. Showing users cryptic error numbers or messages does not improve their experience.
Errors are useful for developers that need to debug the problem and for hackers that want to find places in the application that are vulnerable. Not showing errors at all is one way of hiding where the code is vulnerable. - Monitor for database errors. When operating properly, applications should not generate errors. However, when hackers try to locate vulnerable code and exploit it, you are likely to see database errors.
When concerned about SQL injection, you should especially look out for errors that involve invalid SQL. This is one of the many things Core Audit can help you achieve. - Run the scanning tool yourself. Vulnerability scanning tools used by hackers are freely available on the internet. Running such tools against your application will show you at least some of the places where your code is vulnerable.
Database Solutions
SQL injection is ultimately a database attack that takes advantage of the flexible nature of SQL. This section discusses various technologies that can help detect and/or prevent SQL injection in the database.
Static SQL Analysis
SQL analysis is the oldest technology that attempted to detect SQL injection. The severe limitations of this technology gave rise to WAF technology.
Static-analysis aims to analyze SQL to identify suspicious expressions in the where clause. The classical example of OR 1=1 will always be detected by these tools, but what about more complex ones?
The above section about Variations is a good place to start exploring the strength of a static SQL analyzer.
No static analyzer has support for such a wide array of operators, functions, and expressions. But even if one did, it would still be limited to expressions that use literals.
Expressions that take advantage of column names (such as first != last) or SQL Batch are outside the theoretical limits of static analyzers and they would be unable to detect those.
A shortlist of expressions that could be identified by static analyzers includes:
- Operators — a static SQL analyzer needs to have a complete list of all database operators including +, -, *, /, =, !=, <, <=, >, >=, <>, IN, NOT IN, ANY, SOME, ALL, BETWEEN, LIKE, etc.
- Functions — a static SQL analyzer needs to have a complete list of all database functions including string manipulation (Concat, Substr, Lower, Upper, Trim, Replace, Translate, etc), numerical functions (Ceil, Round, Floor, Trunc, Mod, etc), character functions, date functions, conversion functions, and more.
- Expressions — a static SQL analyzer needs to evaluate expressions the same way the database would, including if conditions, subqueries, and more.
AI & SQL Intention
More advanced solutions than static analyzers use machine learning and AI to learn the difference between a “good” and “bad” SQL. These rely on complex language analysis that aims to understand the intention of a SQL.
It’s difficult to determine the effectiveness of this technology because it relies on every customer teaching it what is good and bad in their particular environment.
As a result, any SQL that is misclassified can simply be attributed to the fact that the customer hasn’t taught it yet. Similarly, any SQL injection that is detected doesn’t mean that the next one will be detected as well. It’s just part of the unpredictable nature of AI.
However, it is difficult to imagine that even the best AI taught by the most diligent customer will be able to detect OR first != last as a SQL injection since it can be a legitimate part of the application logic.
Anomaly Analysis
Blue Core Research developed a unique technology for detecting SQL injection. Unlike static analyzers and AI, we don’t attempt to understand the SQL. As we’ve demonstrated above, the simple truth is that it’s impossible to do this well.
Our solution to SQL injection detection is to leverage our Full Capture technology and our repository technology to create a list of all the SQL constructs that run in the database.
Since applications repeat the same SQLs over and over, over time we can learn every SQL the application uses. All that’s left is to identify SQL constructs that we haven’t seen before and flag those as possible SQL injection attempts.
For example, if we’ve never seen the application run a particular SQL with OR first != last in the where clause, we’ll alert you of a possible SQL injection attempt when we see it.
Anomaly Analysis is also the only technology that will always be able to successfully detect the injection of an entire SQL using a batch. This type of injection is especially difficult to detect because there’s nothing suspicious about the injected SQL itself. The only thing suspicious is that this SQL is something the application never does. Anomaly Analysis will easily detect that.
The Core Audit Anomaly Analysis engine can do a lot more than identifying SQL injections, but that’s just one more benefit of this technology. Anomaly Analysis is the most effective technology for detecting SQL injection.
Application Solutions
Some solutions aim to protect the application from receiving input that will trigger a SQL injection attack. They aim to identify the type of input that can cause it and block it.
The challenge is that as soon above, SQL injection attacks are incredibly versatile and it’s next to impossible to detect all of them.
Web Application Firewall (WAF)
As mentioned above, Web Application Firewall is a technology that was born to deal with the threat of SQL injection. It started when the only other technology at the time was static SQL analysis which was quickly proven to be ineffective.
Some vendors go as far as claiming they can fix the SQL injection problem without fixing the application code. These claims are very appealing to customers, but they are entirely false. These vendors market their solutions as WAF which includes the word “Firewall” in the title and gives customers a false sense of security.
The idea behind WAF was that while it is impossible to detect SQL injection through static analysis of the SQL sent to the database, it should be possible to detect the injection when doing static analysis on the input sent to the application. That’s a nice theory, but the examination we’ll perform below will show it is incorrect.
WAF Concept
The concept behind WAF is to examine the user input before it’s incorporated into the SQL. The idea is that a SQL fragment will look suspicious in the user input but impossible to detect once it blends into the complete SQL.
For example, having OR first!=last in the where clause of a SQL could be a normal part of the application logic and impossible to distinguish from it. However, that phrase would be highly suspicious in regular user input.
Many hackers are lazy and use SQL Injection scanning tools. These tools go through every parameter on every page searching for potential vulnerabilities. Accordingly, many WAF implementations turned from something that protects against SQL injection to something that detects the type of unusual input these tools generate. This isn’t protection against SQL injection nor is it effective since there are many tools (including custom ones).
However, this brings up an interesting question – Why purchase a WAF that checks user input against known patterns in scanning tools when you can find all those vulnerabilities yourself? All you need to do is run a SQL injection scanning tool. These tools are available on the internet for free.
There are several fundamental problems with this concept:
- SQL Injection doesn’t have to look like a SQL fragment — A SQL Injection could be as simple as “id –” (see the section on Keywords). It doesn’t have to have a keyword like OR, UNION, etc. It doesn’t have to have special symbols like = or !=. It could be something that will trick even a human inspection of the input.
- Lack of Grammar — Unlike SQLs, user input doesn’t follow any predictable grammar. There’s no way to know what the input means or why the user entered it. This makes any type of static analysis extremely limited and entirely based on heuristics, signatures, and search patterns.
- Different parameters have different meanings — User input varies a lot across different parameters in the application and across all applications. Sometimes it’s a page number, sometimes a search request, and sometimes someone’s name. Therefore, it’s difficult to define rules that are always true for any input.
- Safety is code dependent — While some parameters may be vulnerable to SQL injections, others might be properly handled and safe. The input O’BRIAN might be suspicious because it has a tag (‘) and, therefore, is trying to search for code vulnerabilities. At the same time, it might be safe if the code handles this particular input properly.
WAF Methods & Challenges
All WAF solutions face the same basic challenges. The difference between the tools is in how they balance those challenges and how good is the implementation.
Parameter Type
As mentioned above, SQL injection is all about escaping the parameter cage. However, the parameter cage depends on the parameter type. To escape the cage of a textual parameter you need a tag (‘). To escape the case of a numerical parameter, you need a literal delimiter character such as space, slash, etc. Numerical parameters can also be replaced by expressions that change these rules.
Detecting the parameter type might not be as simple as it seems considering problems such as Misleading Numerical Inputs mentioned above.
If we prevented any character that is not a letter or a digit, we could completely prevent SQL injection. However, that means that, for example, no one can enter a space anywhere in the application input. That’s not a problem for numerical input, but it is a problem for textual one.
By guessing the parameter type, a WAF can apply different filters to different parameter types and allow a wider range of input increasing the risk of SQL injection.
Special Characters
The parameter type problem expands into various special characters that can be used in SQL injection as well as other types of injections (e.g. file name injections like LFI, RFI, etc).
There are many special characters and when relying solely on filtering special characters we will quickly prevent many types of valid application input.
Examples of special characters are
- Comments can be a double dash (- -), a /*, and in MySQL also a hashtag (#).
- Parenthesis can be a function call or a subquery.
- Strings are terminated by a tag (‘), but in MySQL, strings can also be enclosed by double quotes (“)
- Numeric literals end with whitespace (space, CR, LF, Tab), a slash (/), a dash (-), and many other characters.
- A semi-column (;) can terminate a SQL in a batch
Keywords
The other strategy is filtering keywords that appear in SQL like OR and UNION. However, in batch injection a semi-column (;) is optional and a new SQL can start at any point. That means that SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, and a few other keywords also need to be filtered.
But even without the influx of keywords, “UNION BANK OF CALIFORNIA” contains the word UNION. And it’s not difficult to think of phrases that contain the word OR.
Mixed Strategy
Ultimately, a good WAF solution needs to balance various strategies of looking at special characters, keywords, parameter types, and more.
Unfortunately, even with all the strategies combined, it is difficult to strike the right balance that will allow most of the legitimate inputs and prevent most of the malicious ones.
Automatic or Configurable
One of the challenges WAF solutions face originates in customer expectations. Customers want a one-click solution. Push one button and get protection from SQL injection. Since customers want, salespeople promise to deliver.
Unfortunately, as we’ve explained so far, it’s simply impossible. A solution that will work on the entire application and pass more legitimate input will also pass at least some types of SQL injection attacks. It only takes is one successful type of attack to compromise all the data in the database.
Instead of looking for the “One button” solution, customers that are willing to customize different strategies for different types of input can enjoy superior protection with less false positives.
The more configuration the WAF solution has and the more granular the control, the more likely it is that it can be tailored for your particular application and offer better protection.
Implementation
Just like any software solution, there are significant differences between good and bad WAF implementations. Here are a few examples that might seem funny but are entirely real:
- WAF might filter the word “union” in lower case, but fail to filter “UNION” in upper case or “UnIoN” in mixed case.
- WAF might filter the word “OR” but failed to see that %4F%52 is the same thing in hex.
There are many other potential flaws and weaknesses in WAFs and they ultimately become yet another piece of software whose bugs can be exploited.
Unfortunately, there is no simple and reliable way to measure the quality of a WAF solution and no unbiased publication. Testing it yourself with the many examples on this page can give you an idea of the quality of a solution.
The WAF Sanitizer
Some WAF solutions block malicious activity, others clean it, and some offer both options. The sanitizer is the portion of the WAF that cleans the malicious activity. Like any piece of software, this too can have weaknesses.
Sanitizers have a particular type of weakness that happens a lot – by removing one malicious attack they expose another malicious attack.
If the WAF solution has the option to block potentially malicious activity and avoid the sanitizer, it is safer to do so.
For example, these inputs:
1 O/**/R 1/**/=/**/1 un/**/ion SEselectLECT * F'RO'M
Could be sanitized by removing comments, symbols, and keywords to produce:
1 OR 1=1 union SELECT * FROM
In-Application Firewall
Blue Core Research developed a new type of technology that can supplement or replace a WAF. This technology runs inside the application instead of outside and offers control over the protection applied to different parts of the code without changes to the source.
The Core Audit Java Agent uses native JVM capabilities to add security at runtime to the regular JAR files you’re currently using.
The Core Audit Java Agent can add several types of security, but since this paper is about SQL Injection, we will only go into detail about the Basic Blocking capabilities that aim at addressing this threat.
The types of security the Core Audit Java Agent can add are:
- Auditing (L2-3) — Full Capture of the application sessions and the activity they perform. Core Audit native application auditing levels 2-3 include report generation, alerting, forensics, analysis of behavioral profiles, and more.
- Basic Blocking — Apply WAF-style firewall rules at various points in the application. More details below.
- Blocking Policies (L4) — Apply custom rules about who can do what in the application. For example, restrict which IPs or users can run what activity at what times.
- Reactive Blocking — Configure policies that automatically block IPs or users with too many failed logins or errors across multiple application servers.
Filter & Parameter Control
The Core Audit Agent comes with multiple filters that can be applied as needed to different parameters. You can apply tight SQL injection filters to some parameters and more lenient to others.
There are numerical filters for numerical data, String filters for textual data, general Safe filters that auto-detect the data type, Keyword filters, and more. Having multiple filters that can be applied in different ways gives you the flexibility to tailor the In-Application firewall to your application.
Depth Control
Filters can be applied to any method in the application. They can be applied to URL parameters that are received from the user or to a method that creates a SQL from them.
You can apply lenient filters to search parameters you believe to be safe, and strict filters when those parameters are added to the SQL to ensure they are as safe as you thought.
Areas in the application that have fragile code can be better protected, while well-tested portions of the code can be granted more latitude in their interaction with the user.
Since filters can be applied to any method within the application, they are not limited to filtering web-related activity. Filters can be applied to data read from files, from various network sources, from databases, internally generated activity, and more.
Context & Action Control
The In-Application firewall is part of the Core Audit generic application auditing technology. As such, all information reported about potential SQL injections is linked to the full user session context.
That means that every blocked activity will be related to the IP address, user, program, and every other piece of information reported by the application. You will be able to know what happened before and after even if those actions did not trigger an alert.
The In-Application firewall can block actions by throwing exceptions. These exceptions can be customized to fit the type of exception handled by each area of the code.
Additionally, the In-Application firewall has a high-quality sanitizer that can eliminate the identified threat without introducing another.
In-Application Firewall or WAF
One of the benefits of WAF is that it filters the activity before it reaches the application. The In-Application firewall cannot make this claim.
However, since WAF always runs outside the application, it’s missing the context in which the activity runs, and the ability to apply different filters at different depths in the application stack.
The In-Application firewall offers granular control over the types of filters so they can meet the specific needs of each portion of the application. There’s no need to compromise on a filter that is “generally okay” and doesn’t give the appropriate level of protection.
The In-Application Firewall runs within the application so there’s no need to deploy additional machines and filter the network traffic. It all runs on the same machine and as part of the application.
The In-Application Firewall comes with full application auditing delivered by Core Audit Levels 2-3 and can be upgraded to apply context-based blocking policies or reactive policies with Core Audit Level 4. All these capabilities are only possible with the Core Audit technology stack.
Summary
SQL injection is a difficult attack to combat, and rewriting the application using bind variables is the only true solution to the problem.
If converting the application to entirely use only bind variables is not a viable course of action, Core Audit technologies in the application and the database are the most effective in detecting and preventing possible SQL injection attacks.
Each independent approach has limitations, so the best solution is a mixed approach of improving the code, deploying effective detection in the database, and preventive measures in the application.
Beyond SQL injection, Core Audit technologies can help secure databases and applications to ensure your sensitive data is not compromised by external or internal actors.