SQL Injection

SQL Injection

SQL Injection is one of the most well-known attack vectors and it poses a significant security challenge. Learn more about how SQL injection works, and the different approaches to solving it.

Introduction

SQL Injection is one of the most well-known attack vectors and it poses a significant security challenge. The only way to understand the benefits and deficiencies of each solution is to understand the problem and the approach each solution uses to solve it.

The Problem

The best way to understand the problem is by looking at examples. To demonstrate how SQL injection works, here’s an example of a simple application.

Application Example

Account
0
First Name
John
Last Name
Doe
Last 4 of social
5555
Sumbit
SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 5555

Let’s assume our application is a simple web page that allows customer support representatives to see the credit card numbers of customers who call in.

For security reasons, the representative must enter the customer’s account number, first name, last name, and the last 4 digits of the customer’s social security number.

When all the information matches the customer’s data, the credit card numbers used by the customer will be shown.

Seems like a pretty secure design as the customer support representative has to get the information from a real customer to see their credit cards. How could a customer support representative use this form to get information about any other customer?

The problem is not in the conceptual design but in the actual implementation of the code.

To make this page work, the application running inside the web server needs to run a SQL against the database. That SQL might look like the above example.

However, this SQL seems like the correct SQL to run and there’s no apparent way to get credit card information without entering the customer’s information.

So what’s the problem with the implementation?

The Attack

Account
0
First Name
John
Last Name
Doe
Last 4 of social
0 OR 1=1
Sumbit
SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 0 OR 1=1

When the form is filled this way and the SQL is executed, the database compares the conditions against every row in the table.

Let’s assume the first row is for customer 1001 named JOE SMITH with SSN 5555:

  • Is the customer number 0? No, it’s 1001
  • Is the first name JOHN? No, it’s JOE
  • Is the last name DOE? No, it’s SMITH
  • Is the SSN 0? No, it’s 5555
  • Is 1=1? Yes, 1 is always equal to 1.

In SQL, the order of operations will evaluate the AND before the OR, resulting in something like this:

(FALSE and FALSE and FALSE and FALSE)
or
(TRUE)

The ultimate result of this comparison is TRUE! That means that the credit card number of JOE SMITH will be displayed.

Since this condition will be true for all the rows in the table, every credit card in the table will be shown.

Because we filled the form in a way the programmer did not expect and did not protect against, we were able to trick the application into showing us every credit card in the database.

The String Variation

The database will consider ssn = ‘5555’ as a valid comparison and automatically convert the string ‘5555’ into the number 5555. So maybe if we embed the value as a string things would look different. Let’s fix the application to run this SQL:

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’
  AND last = ’Doe’
  AND ssn = '5555'

In this case, the database will return an error if we fill the form with “0 OR 1=1” because it cannot convert this string into a number. Have we prevented SQL injection?

Account
0
First Name
John
Last Name
Doe
Last 4 of social
0' OR 'a'='a
Sumbit
SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = '0' OR 'a'='a'

With this slight modification, the attack will still extract all the credit cards because ‘a’ is always equal to ‘a’.

Truncating the SQL

Account
0 OR 1=1 --
First Name
John
Last Name
Doe
Last 4 of social
5555
Sumbit
SELECT card FROM cards WHERE
  account = $account2 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 5555

Another complication an attacker might face is if there’s another AND condition after the injected code. In the above example, we injected into the account number that has several AND conditions after it.

The double dash (- -) in SQL, comments out the rest of the line, so any additional conditions can be easily disposed of.

In MySQL, # is another way to comment out the rest of the line with identical effect to – –

Another way of adding a comment to a SQL is by enclosing it in /* */. If a comment starts, most databases will consider it to be a comment until the end of the SQL (with any number of lines). So adding /* instead of – – will also comment out the rest of the SQL.

Another consideration in MySQL is that /*! */ is not a comment as the ! symbol disables the comment. This can be a useful means of bypassing security measures that ignore comments.

Parameter Fragmentation (HPF)

Account
0 OR /*
First Name
*/ 1 /*
Last Name
*/ = /*
Last 4 of social
*/ 1
Sumbit
SELECT card FROM cards WHERE
  account = $account2 AND
  first = ’$first2’ AND
  last = ’$last2’ AND
  ssn = $ssn2

Another option is to comment out entire sections of the SQL and split the attack across multiple parameters.

In SQL, any text between /* and */ is a comment and is ignored.

The nice thing about this attack is that not only did it disable entire portions of the SQL, but it also split the attack across parameters making it more difficult to detect by WAF technology. More on that later.

Splitting the SQL

Account
0
First Name
John
Last Name
Doe
Last 4 of social
1 UNION SELECT SSN FROM USERS
Sumbit
SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1 UNION SELECT SSN FROM USERS

What if the vulnerable form doesn’t access the table or columns we want? Let’s assume we want to read the SSN column from the USERS table because that table contains the full social security number and not only the last 4 digits.

How can a form that reads the cards table show us information from a completely different table?

The above SQL uses the UNION operator to combine the output from two different SQLs into one result set:

In this case, the first SELECT returns no data, but the second will dump the entire SSN column from the USERS table. The union operator will combine both into a single result set and the application will display all the rows in the SSN column as if they were credit cards.

SQL Batch

Account
0
First Name
John
Last Name
Doe
Last 4 of social
0; CREATE LOGIN hacker WITH PASSWORD = 'password'
Sumbit
SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 0; CREATE LOGIN hacker WITH PASSWORD = 'password'

In Microsoft SQL Server and Sybase, SQLs are sent to the database in batches. A batch is a sequence of SQLs sent to the database in a single request, and all requests are batches.

You can use a semi-column to separate SQLs in the batch:

SELECT c1 FROM t1; SELECT c2 FROM t2

However, the semi-column is optional and this batch will yield the same result:

SELECT c1 FROM t1 SELECT c2 FROM t2

In these databases, SQL injection can take on an entirely new level of malevolence. The injected fragment can run an entirely different SQL allowing it to change data, alter the database, and more.

This example assumes the application is connected to SQL Server with a privileged account. Note that the semi-column is optional but used for clarity

When executed, no credit cards will be returned but a new user called ‘hacker’ will be created in the database with the password ‘password’.

This method can be used to modify data, permissions, and much more. Anything that the application account has privileges to do can be done via this type of attack. The attacker can even execute the shutdown command and shut down the database.

If complete control over the database isn’t enough, SQL Server also has the xp_cmdshell procedure that allows running operating system commands and xp_regread that can read the registry. This grants the attacker complete access to the machine.

While applications should never use a privileged account in any database, this rule becomes extremely important in SQL Server databases. Application using SQL Server databases must never use a privileged account. Unfortunately, many applications in SQL Server use the highly privileged SA account.

Variations

So far we’ve mostly discussed using OR 1=1. That is the classic example that is always used and it might create the impression that if a SQL does not contain this expression there is no SQL injection.

In reality, there are endless variations on conditions that are always true and this is just a little taste of those.

We’ll start with trivial examples that use simple operators:

ExpressionExplanation
17 != 2117 is always different than 21
99 < 10099 is always smaller than 100
9-1 > 39-1 is always greater than 3
‘ABC’ = ‘ABC’The string ‘ABC’ is always identical to ‘ABC’
‘hello’ != ‘world’The string ‘hello’ is always different than the string ‘world’

We can also use some database functions to create slightly more complex expressions:

ExpressionExplanation
‘XYZ’ like ‘%Y%’XYZ contains a Y
‘B’ in (‘A’, ‘B’, ‘C’)B is in the list
7 in (select 7)7 is in the result set
translate(‘AB’,’AB’,’XY’) = ‘XY’replacing A with X and B with Y in AB results in XY

However, all these examples use literals (constants). If we had a tool that was smart enough to calculate every expression the database can perform, it would be able to determine that these expressions are always true.

This means that in theory, it could be possible to detect that the above examples are SQL injections. But that just means we need to take another tiny step forward.

The expressions below use column values and are, therefore, no tool can calculate the expressions to determine they are always true. It is impossible to distinguish these expressions from proper application logic.

ExpressionExplanation
last != ‘XYZ’last name is not XYZ
ssn > 0SSN is greater than 0
id not in (0, 1, 2)ID is not 0, 1, or 2
first != lastfirst and last names are different

While these expressions will probably always be true, it’s impossible to determine that without understanding the data in each column. There will be additional discussion on this subject in the section on static analysis.

Non-Trivial Input

All the examples so far entered data into fields in the form. This is the most trivial way to interact with the application, but far from being the only one.

The above HTML form will usually send the query to the application web server using a GET or POST request method. A GET request might look like:

…/cards?id=0&first=JOHN&last=DOE&ssn=5555

You’ve probably seen strings like that in the URL of your web browser. The same GET request can also be sent using AJAX in a way that is not visible in the URL but can be seen just as easily in the developer console (try hitting F12 in your web browser).

POST requests are usually formatted the same as GET requests but are not sent in the URL. They can also be seen easily in the developer console of any web browser.

When attackers hack, they usually send GET and POST requests directly. This allows them to manipulate the fields in ways that the application form might not allow.

These non-trivial inputs are slightly more difficult to identify and modify, and as a result slightly more difficult to exploit. However, this lack of accessibility also means that developers are unlikely to properly protect or test them.

A few examples of information that cannot be edited in a form but can be changed in a GET or POST request (among other ways):

  • Hidden input fields – these are input fields that are not visible to the user. Applications use these to keep context information about the state of the application or the user’s flow through it. While the user cannot see these fields on the screen, it is not difficult to modify their content.
    For example, a session ID could be stored in a hidden field and the application could use it in a query to the database.
  • Select fields – These are single-choice combo boxes or multiple-choice selections. While the user can only select from the options displayed, an attacker can send any value.
    For example, if you can only choose between two options that send a value or 0 or 1, an attacker can send the value of 0 or 1=1.
  • Links – We often click on links, buttons, trees, images, and other components in web applications. These links often have values embedded in them, and those values can be easily manipulated by an attacker.
    For example, the “Next Page” link we often see would have embedded in it the first item number of the next page (e.g. start=10). That first item number could be modified to include a SQL injection.
  • Cookies – Cookies are another way of storing state information in the web browser and sending it to the web server. For example, this is the standard place applications use to keep session IDs.

These types of inputs are never tested by QA or by regular users and most automated vulnerability detection systems will also fail to test these. This means that flaws in these types of inputs can lie dormant for years without even being noticed or addressed.

As a result, these non-trivial inputs tend to be far more likely to be vulnerable and provide fertile ground for targeted attacks by experienced hackers.

Parameter Pollution (HPP)

One type of attack that can only be accomplished by manipulating the GET/POST requests is Parameter Pollution. For example, what happens if the above application receives two SSN fields in the request instead of one? In other words, the GET request would look like:

?id=0&first=JOHN&last=DOE&ssn=5555&ssn=7777

The answer depends on the technology stack used by the application and the methods used to extract the parameters. Some technologies will return a comma-separated list of the duplicate parameter. So the above request will return:

5555,7777

This creates an opportunity for an attack similar to Parameter Fragmentation (HPF) but without requiring multiple parameters or knowing their order in the SQL. For example, sending these 4 SSNs (broken into lines for clarity):

…/cards?id=0&first=JOHN&last=DOE
&ssn=0 OR /*
&ssn=*/ 1 /*
&ssn=*/ = /*
&ssn=*/ 1

Will result in this SQL:

SELECT card FROM cards WHERE
   account = 0 AND
   first = ’JOHN’ AND
   last = ’DOE’ AND
   ssn = 0 OR /*,*/ 1 /*,*/ = /*,*/ 1

That is a fragmented SQL injection attack using a single parameter.

No Keywords

Account
account/*
First Name
John
Last Name
Doe
Last 4 of social
1
Sumbit
SELECT card FROM cards WHERE
  account = $account2 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

So far all the attacks used SQL keywords like OR and UNION. Even though they are simple words in English that could appear in normal data they make the input look suspicious.

But with a little investigation and effort, it’s possible to create a SQL injection that has no keywords or even spaces.

In this example, we just check whether a column is equal to itself and comment out the rest of the SQL. Since the account is always equal to itself, this SQL will return all the cards in the table.

Truncating without Comments

Account
1 OR 1=1 OR 1=1
First Name
John
Last Name
Doe
Last 4 of social
1
Sumbit
SELECT card FROM cards WHERE
  account = $account2 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

Another repeating motif in these examples has been the usage of comments. We many times used comments to remove unwanted portions of the SQL. But there’s another way.

We highlighted the OR in blue. In the order of precedence, ORs are last so anything separated by an OR is evaluated separately. The red portion always evaluates to true, and so the entire WHERE clause is always true.

Misleading Numerical Inputs

Account
cast(1 as varchar(10)) OR 1=1
First Name
John
Last Name
Doe
Last 4 of social
1
Sumbit
SELECT card FROM cards WHERE
  account = cast(1 as varchar(10)) OR 1=1 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

One approach for defeating some defense systems is to create confusion about the nature of the parameter.

The Account parameter doesn’t look like numerical input and automatic defenses that aim to validate it as a string will consider this to be a safe string to execute as it has no tags (‘) that terminate the string.

cast() is a standard database operator that converts the number to a string. The database will consider this to be valid input and convert the string back to a number to compare it to the account.

Login Screens

Login screens are also vulnerable to SQL injection attacks. There are two ways a user can be verified by the application, and either can be breached.

Query Validates Password

Username
admin'--
Password
x
Login
SELECT username FROM users WHERE
  $w

One option is for the query to validate the password. If the password matches the row is returned, otherwise, nothing is returned.

The SQL will contain the hash or a hash function in the where clause. When filling out the login screen like in the above example, the password-validating portion of the SQL is disabled.

This double dash (- -) will comment out the rest of the SQL and the row for admin will be returned suggesting that the password has been authenticated.

Query Retrieves Password

Username
' union select '5f4dcc3b5aa765d61d8327deb882cf99
Password
password
Login
SELECT password FROM users WHERE
  username = '' union
select '5f4dcc3b5aa765d61d8327deb882cf99'

Another option for validating passwords is to retrieve the hash from the database and validate it in the code.

This SQL will return the MD5 of the word ‘password’. When the application compares this md5 to the password we provided they will match. We simply return the password hash we want instead of whatever is stored in the database.

Introduction to Solutions

It is important to understand that SQL injection is, fundamentally, an exploitation of a security flaw in the code. Just like there are no guaranteed ways to ensure the code has no bugs, it is difficult to ensure the code has no SQL injection vulnerabilities. However, just like with any other bug, the best solution is to fix the flaw in the code. The alternative is to try and prevent, detect, or make it more difficult to exploit the vulnerabilities, but all these approaches have limitations.

The next sections will review different approaches and discuss the benefits and limitations of each. But before diving into the details, let’s perform a high-level review to get the lay of the land.

Solutions in the Code

As mentioned above, the best solution is to have high-quality code and proper coding techniques that avoid the possibility of SQL injection.

The best way to avoid a SQL injection attack is to use bind variables. This is the perfect solution and it completely prevents SQL injections. Slightly less robust methods include escaping input and input validation. It is also recommended to reduce client error reporting.

More on all these techniques, their value, limitations, and how to implement them later.

Database Solutions

SQL injection is ultimately a database attack that takes advantage of the flexible nature of SQL. Therefore, the most obvious place to look for a solution is in the database.

Several technologies can help detect and/or prevent SQL injection in the database. The problem with most is that they are not effective. Unlike most other database solutions for detecting SQL injection, the unique technology in Core Audit is extremely effective.

More on these various database technologies and their differences later.

Web Application Firewall (WAF)

The difficulty in identifying SQL injection at the database level created a new product category — Web Application Firewalls (WAF).

The idea is that it would be easier to detect the SQL injection attack when examining the application input rather than when looking at the SQL that was sent to the database.

It’s a nice theory, but it only works on simple or specific examples. When looking to prevent all the different types of SQL injection attacks described in this paper the problem becomes insurmountable. More on that later.

In-Application Firewall

A new technology developed by Blue Core Research aims to take WAF technology to the next level and incorporate it into the application.

The Core Audit Java Agent provides detailed auditing data about everything that happens in the application. It can also apply firewall-type blocking rules around the entire application or sensitive areas in the code.

An In-Application Firewall can benefit from context and information that are only available inside the application to protect the right methods correctly.

Detect

This wide range of approaches don’t only differ in their technology and effectiveness, but also in the value they deliver. However, there is also an inverse correlation between value and effectiveness.

Detective approaches aim to alert you of a possible SQL injection. The downside is that they don’t block it and you’ll need to take action against the attacker as well as fix the code.

The perceived value seems to be lower but the benefit is that the detection can be tuned to be sensitive and provide a fairly effective mechanism. A powerful detective engine such as the one found in Core Audit is likely to notify you of any SQL injection attempt.

Prevent

Preventive approaches are divided into two primary categories: blocking and curing. Depending on the type of solution both options might be available, like in the Core Audit Java Agent.

The benefit of blocking is that SQL injection attempts that can be recognized by the blocking engine will be prevented.

While there are significant differences between various products, none can claim to automatically pass all legitimate activity and block all attacks. More on that later.

In addition to their limited effectiveness, preventive solutions run the risk of blocking legitimate activity. The more versatile the activity and the more strict the blocking, the more likely it is that some valid activity will be blocked.

The Core Audit Java Agent, for example, offers calibration between lenient and strict methods so that you can apply the appropriate level of filtering to the relevant activity.

Sanitize

Finally, some solutions aim to cure the problem without blocking it. With these solutions, malicious injections are removed from the input and allowed to run safely. Core Audit also has this capability.

While this is a very nice advertisement, in many solutions it comes at a heavy price of introducing additional vulnerabilities. The problem is that an attacker can craft an attack that will leverage the sanitizing process as part of the attack.

This means that In addition to the limited effectiveness of SQL injection detection, using sanitizing can reduce the effectiveness of the solution by offering the attacker additional tools to bypass the security. More on that later.

Escaping the Parameter Cage

When looking at the various examples it becomes clear that SQL injection is about changing the SQL contract by escaping the parameter cage.

If the parameter is a string, then a SQL injection will need to terminate the string before it can modify the SQL. That usually means having a tag (‘) in the input.

Numerical parameters are terminated by whitespace (space, tab, newline, etc.) or other characters that are not part of the literal (for example, a /).

Detecting an escape from the parameter cage is one method for recognizing SQL injection in application firewalls but it relies on correctly identifying the parameter type.

Another common method is the identification of suspicious keywords or special characters. This method also has significant limitations.

Solutions in the Code & Best Practices

As mentioned above, the best solution is to have good code. Proper coding techniques can reduce and even eliminate the risk of SQL injection. This section will discuss these in detail.

Bind Variables

All SQL injection attacks involve escaping the previously mentioned parameter cage. Fortunately, all databases come with a full-proof API for preventing that. It’s called bind variables.

Bind variables are a way of sending the parameters separately from the SQL rather than embedding the constants into it. For example, the previous SQL used in the examples would look like this with bind variables:

SELECT card FROM cards WHERE
  account = :1 AND
  first = :2 AND
  last = :3 AND
  ssn = :4

When this SQL is sent to the database, the programmer needs to also send the values for the bind variables (:1, :2, :3, :4).

Writing SQLs this way avoids the database misinterpreting part of the value for being part of the SQL construct. The values can contain any character and the database will not be confused.

Using bind variables also has a performance side benefit. When the same SQL executes multiple times with different bind variables it only has to be parsed once. That is, actually, the original purpose for bind variables.

Unfortunately, rewriting the entire application to use bind variables might be expensive, difficult, or entirely impractical.

Escaping Input

The most common approach for fixing the application code is to escape all input fields. For example, if an input field contains a tag (‘), replace it with two tags (‘ ‘). The escaped quote can be embedded in a SQL without causing an injection attack.

Escaping the input John O’Brian in our example will yield this SQL:

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’O''Brian’ AND
  ssn = 5555

Escaping input is only half the solution because it only solves the problem for string input. Numerical input requires a different approach mentioned below.

While this is not the best method for preventing SQL injection, it is the most commonly used one. This is due to the relatively small impact on the code and, consequently, cheaper implementation costs.

Numeric Input

Fixing numerical input is, on the one hand, a very simple solution. However, this solution has to be individually implemented for every numerical input making it potentially time-consuming, difficult, and costly to implement.

Every programming language contains a function for converting strings to numbers and means of embedding those numbers into a string. By converting the input into a number before embedding it into the SQL string we ensure that it is an actual number.

For example, unsafe code in PHP might look like this:

$sql .= " id= ".$_GET["ssn"];

While the safe version looks like this:

$sql .= " id= ".intval($_GET["ssn"]);

$_GET[“ssn”] reads the GET parameter ssn and wrapping it in intval() will ensure that it’s a number.

In Java, the unsafe looks like this:

sql += " id = "+ssn;

While the safe version is this:

sql += " id = "+Integer.parseInt(ssn)

Similar examples can be made for any programming language from Python to C. The idea is always the same — convert to a number and then back into a string.

It should be noted that there are slight differences in the way this technique works in different languages and those depend on how the conversion function works.

For example, the intval() function in PHP would convert part of the string if possible, so the string “7 OR 1=1” would be converted to 7. If the input could not be parsed, the function would return the value 0.

In Java, however, the Integer.parseInt() function would attempt to convert the entire string to a number and any obstacle would cause it to throw an exception.

Regardless of the differences, all these implementations are safe and none would allow a SQL injection to pass through the numerical parameter.

Input Validation

Input validation is the concept of ensuring the input is acceptable and fits the operational parameters of the function. It is a much wider concept than stopping SQL injection aimed at preventing many types of bugs including other types of attacks.

Because of its wide nature, it’s hard to consider code that performs input validation as resilient to SQL injection. Input validation could include tests that ensure strings only have escaped tags (‘) and numbers contain valid numbers. However, these could just as easily not be included in the input validation tests.

For example, consider this little piece of PHP code:

if ($_GET['ssn'] > 0 &&
$_GET['ssn'] < 9999) return;

$sql .= " id= ".$_GET["ssn"];

This code checks that the ssn is between 0 and 9999 and will reject any input that is not within range. While this seems like a safe piece of code, sending it “7 OR 1=1” will bypass the test and result in a SQL injection.

The reason is that to perform the comparison, PHP must convert the string input to a number. It is as if each of the $_GET[‘ssn’] expressions in the if() statement is wrapped by the intval() function. Therefore, “7 OR 1=1” will be converted to the number 7 which falls within the valid range.

Therefore, if input validation is used to help prevent SQL injection, it is important to specifically require and test that inputs are valid from a SQL injection standpoint. This would usually entail escaping the input as described above.

Error Messages & Blind Injection

Application developers often display to the user the errors they got from the database. While this is useless information to the user, it is valuable information to the developers that need to debug the problem.

However, this same information is also valuable to a hacker that wants to perform a SQL injection attack. Database error messages can contain fragments of the SQL and other indications that can help a hacker tune the attack to work.

Not having these database error messages requires a hacker to do what’s known as Blind Injection. This involves more complex and difficult techniques that attempt to obtain information without seeing the database error messages.

One such technique is to use if() conditions that call a sleep function (e.g. waitfor) if they are true. In other words, know whether something is true or not based on how long it takes the query to complete.

The bottom line is that Blind Injection is more difficult and there’s no good reason to display error messages to the users. When a developer needs to debug a problem, they can always enable error messages for a particular scenario while performing their investigation.

Errors & Vulnerability Detection

Most applications have many pages with even more input fields and most hackers are far too lazy to try them all. So hackers use scanning tools that go through every input on every page and try to enter the type of input that might cause vulnerable code to return an error.

If certain types of inputs result in the application showing an error, there’s good reason to further investigate the vulnerability of that page.

It is important to remember that this discussion about Errors and Vulnerabilities assumes that the hackers have to “experiment” on the live application. Hackers that have another type of access to the application software will use it to detect vulnerabilities and learn how to exploit them.

Other types of access include:

  • Off-the-shelf software that can be downloaded and installed
  • Access to non-production systems (Dev, QA, pre-prod, etc.)
  • Access to the source code or free software

This leads to several interesting conclusions:

  • Don’t show errors to the users. Users will complain when they don’t get data. Showing users cryptic error numbers or messages does not improve their experience.
    Errors are useful for developers that need to debug the problem and for hackers that want to find places in the application that are vulnerable. Not showing errors at all is one way of hiding where the code is vulnerable.
  • Monitor for database errors. When operating properly, applications should not generate errors. However, when hackers try to locate vulnerable code and exploit it, you are likely to see database errors.
    When concerned about SQL injection, you should especially look out for errors that involve invalid SQL. This is one of the many things Core Audit can help you achieve.
  • Run the scanning tool yourself. Vulnerability scanning tools used by hackers are freely available on the internet. Running such tools against your application will show you at least some of the places where your code is vulnerable.

Database Solutions

SQL injection is ultimately a database attack that takes advantage of the flexible nature of SQL. This section will discuss various technologies that can help detect and/or prevent SQL injection in the database.

Static SQL Analysis

SQL analysis is the oldest technology that attempted to detect SQL injection. The severe limitations of this technology gave rise to WAF technology.

Static-analysis aims to analyze SQL to identify suspicious expressions in the where clause. The classical example of OR 1=1 will always be detected by these tools, but what about more complex ones?

The above section about Variations is a good place to start exploring the strength of a static SQL analyzer.

No static analyzer has support for such a wide array of operators, functions, and expressions. But even if one did, it would still be limited to expressions that use literals.

Expressions that take advantage of column names (such as first != last) or SQL Batch are outside the theoretical limits of static analyzers and they would be unable to detect those.

A shortlist of expressions that could be identified by static analyzers includes:

  • Operators — a static SQL analyzer needs to have a complete list of all database operators including +, -, *, /, =, !=, <, <=, >, >=, <>, IN, NOT IN, ANY, SOME, ALL, BETWEEN, LIKE, etc.
  • Functions — a static SQL analyzer needs to have a complete list of all database functions including string manipulation (Concat, Substr, Lower, Upper, Trim, Replace, Translate, etc), numerical functions (Ceil, Round, Floor, Trunc, Mod, etc), character functions, date functions, conversion functions, and more.
  • Expressions — a static SQL analyzer needs to evaluate expressions the same way the database would, including if conditions, subqueries, and more.

AI & SQL Intention

More advanced solutions than static analyzers use machine learning and AI to learn the difference between a “good” and “bad” SQL. These rely on complex language analysis that aims to understand the intention of a SQL.

It’s difficult to determine the effectiveness of this technology because it relies on every customer teaching it what is good and bad in their particular environment.

As a result, any SQL that is misclassified can simply be attributed to the fact that the customer hasn’t taught it yet. Similarly, any SQL injection that is detected doesn’t mean that the next one will be detected as well. It’s just part of the unpredictable nature of AI.

However, it is difficult to imagine that even the best AI taught by the most diligent customer will be able to detect OR first != last as a SQL injection since it can be a legitimate part of the application logic.

Anomaly Analysis

Blue Core Research developed a unique technology for detecting SQL injection. Unlike static analyzers and AI, we don’t attempt to understand the SQL. As we’ve demonstrated above, the simple truth is that it’s impossible to do this well.

Our solution to SQL injection detection is to leverage our Full Capture technology and our repository technology to create a list of all the SQL constructs that run in the database.

Since applications repeat the same SQLs over and over, over time we can learn every SQL the application uses. All that’s left is to identify SQL constructs that we haven’t seen before and flag those as possible SQL injection attempts.

For example, if we’ve never seen the application run a particular SQL with OR first != last in the where clause, we’ll alert you of a possible SQL injection attempt when we see it.

Anomaly Analysis is also the only technology that will always be able to successfully detect the injection of an entire SQL using a batch. This type of injection is especially difficult to detect because there’s nothing suspicious about the injected SQL itself. The only thing suspicious is that this SQL is something the application never does. Anomaly Analysis will easily detect that.

The Core Audit Anomaly Analysis engine can do a lot more than identifying SQL injections, but that’s just one more benefit of this technology. Anomaly Analysis is the most effective technology for detecting SQL injection.

Web Application Firewall (WAF)

As mentioned above, Web Application Firewall is a technology that was born to deal with the threat of SQL injection. It started when the only other technology at the time was static SQL analysis which was quickly proven to be ineffective.

Some vendors go as far as claiming they can fix the SQL injection problem without fixing the application code. These claims are very appealing to customers, but they are entirely false. These vendors market their solutions as WAF which includes the word “Firewall” in the title and gives customers a false sense of security.

The idea behind WAF was that while it is impossible to detect SQL injection through static analysis of the SQL sent to the database, it should be possible to detect the injection when doing static analysis on the input sent to the application. That’s a nice theory, but the examination we’ll perform below will show it is incorrect.

WAF Concept

The concept behind WAF is to examine the user input before it’s incorporated into the SQL. The idea is that a SQL fragment will look suspicious in the user input but impossible to detect once it blends into the complete SQL.

For example, having OR first!=last in the where clause of a SQL could be a normal part of the application logic and impossible to distinguish from it. However, that phrase would be highly suspicious in regular user input.

Many hackers are lazy and use SQL Injection scanning tools. These tools go through every parameter on every page searching for potential vulnerabilities. Accordingly, many WAF implementations turned from something that protects against SQL injection to something that detects the type of unusual input these tools generate. This isn’t protection against SQL injection nor is it effective since there are many tools (including custom ones).

However, this brings up an interesting question – Why purchase a WAF that checks user input against known patterns in scanning tools when you can find all those vulnerabilities yourself? All you need to do is run a SQL injection scanning tool. These tools are available on the internet for free.

There are several fundamental problems with this concept:

  • SQL Injection doesn’t have to look like a SQL fragment — A SQL Injection could be as simple as “id –” (see the section on Keywords). It doesn’t have to have a keyword like OR, UNION, etc. It doesn’t have to have special symbols like = or !=. It could be something that will trick even a human inspection of the input.
  • Lack of Grammar — Unlike SQLs, user input doesn’t follow any predictable grammar. There’s no way to know what the input means or why the user entered it. This makes any type of static analysis extremely limited and entirely based on heuristics, signatures, and search patterns.
  • Different parameters have different meanings — User input varies a lot across different parameters in the application and across all applications. Sometimes it’s a page number, sometimes a search request, and sometimes someone’s name. Therefore, it’s difficult to define rules that are always true for any input.
  • Safety is code dependent — While some parameters may be vulnerable to SQL injections, others might be properly handled and safe. The input O’BRIAN might be suspicious because it has a tag (‘) and, therefore, is trying to search for code vulnerabilities. At the same time, it might be safe if the code handles this particular input properly.

WAF Methods & Challenges

All WAF solutions face the same basic challenges. The difference between the tools is in how they balance those challenges and how good is the implementation.

Parameter Type

As mentioned above, SQL injection is all about escaping the parameter cage. However, the parameter cage depends on the parameter type. To escape the cage of a textual parameter you need a tag (‘). To escape the case of a numerical parameter, you need a literal delimiter character such as space, slash, etc. Numerical parameters can also be replaced by expressions that change these rules.

Detecting the parameter type might not be as simple as it seems considering problems such as Misleading Numerical Inputs mentioned above.

If we prevented any character that is not a letter or a digit, we could completely prevent SQL injection. However, that means that, for example, no one can enter a space anywhere in the application input. That’s not a problem for numerical input, but it is a problem for textual one.

By guessing the parameter type, a WAF can apply different filters to different parameter types and allow a wider range of input increasing the risk of SQL injection.

Special Characters

The parameter type problem expands into various special characters that can be used in SQL injection as well as other types of injections (e.g. file name injections like LFI, RFI, etc).

There are many special characters and when relying solely on filtering special characters we will quickly prevent many types of valid application input.

Examples of special characters are

  • Comments can be a double dash (- -), a /*, and in MySQL also a hashtag (#).
  • Parenthesis can be a function call or a subquery.
  • Strings are terminated by a tag (‘), but in MySQL, strings can also be enclosed by double quotes (“)
  • Numeric literals end with whitespace (space, CR, LF, Tab), a slash (/), a dash (-), and many other characters.
  • A semi-column (;) can terminate a SQL in a batch

Keywords

The other strategy is filtering keywords that appear in SQL like OR and UNION. However, in batch injection a semi-column (;) is optional and a new SQL can start at any point. That means that SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, and a few other keywords also need to be filtered.

But even without the influx of keywords, “UNION BANK OF CALIFORNIA” contains the word UNION. And it’s not difficult to think of phrases that contain the word OR.

Mixed Strategy

Ultimately, a good WAF solution needs to balance various strategies of looking at special characters, keywords, parameter types, and more.

Unfortunately, even with all the strategies combined, it is difficult to strike the right balance that will allow most of the legitimate inputs and prevent most of the malicious ones.

Automatic or Configurable

One of the challenges WAF solutions face originates in customer expectations. Customers want a one-click solution. Push one button and get protection from SQL injection. Since customers want, salespeople promise to deliver.

Unfortunately, as we’ve explained so far, it’s simply impossible. A solution that will work on the entire application and pass more legitimate input will also pass at least some types of SQL injection attacks. It only takes is one successful type of attack to compromise all the data in the database.

Instead of looking for the “One button” solution, customers that are willing to customize different strategies for different types of input can enjoy superior protection with less false positives.

The more configuration the WAF solution has and the more granular the control, the more likely it is that it can be tailored for your particular application and offer better protection.

Implementation

Just like any software solution, there are significant differences between good and bad WAF implementations. Here are a few examples that might seem funny but are entirely real:

  • WAF might filter the word “union” in lower case, but fail to filter “UNION” in upper case or “UnIoN” in mixed case.
  • WAF might filter the word “OR” but failed to see that %4F%52 is the same thing in hex.

There are many other potential flaws and weaknesses in WAFs and they ultimately become yet another piece of software whose bugs can be exploited.

Unfortunately, there is no simple and reliable way to measure the quality of a WAF solution and no unbiased publication. Testing it yourself with the many examples on this page can give you an idea of the quality of a solution.

The WAF Sanitizer

Some WAF solutions block malicious activity, others clean it, and some offer both options. The sanitizer is the portion of the WAF that cleans the malicious activity. Like any piece of software, this too can have weaknesses.

Sanitizers have a particular type of weakness that happens a lot – by removing one malicious attack they expose another malicious attack.

If the WAF solution has the option to block potentially malicious activity and avoid the sanitizer, it is safer to do so.

For example, these inputs:

1 O/**/R 1/**/=/**/1
un/**/ion SEselectLECT * F'RO'M

Could be sanitized by removing comments, symbols, and keywords to produce:

1 OR 1=1
union SELECT * FROM

In-Application Firewall

Blue Core Research developed a new type of technology that can supplement or replace a WAF. This technology runs inside the application instead of outside and offers control over the protection applied to different parts of the code without changes to the source.

The Core Audit Java Agent uses native JVM capabilities to add security at runtime to the regular JAR files you’re currently using.

The Core Audit Java Agent can add several types of security, but since this paper is about SQL Injection, we will only go into detail about the Basic Blocking capabilities that aim at addressing this threat.

The types of security the Core Audit Java Agent can add are:

  • Auditing (L2-3) — Full Capture of the application sessions and the activity they perform. Core Audit native application auditing levels 2-3 include report generation, alerting, forensics, analysis of behavioral profiles, and more.
  • Basic Blocking — Apply WAF-style firewall rules at various points in the application. More details below.
  • Blocking Policies (L4) — Apply custom rules about who can do what in the application. For example, restrict which IPs or users can run what activity at what times.
  • Reactive Blocking — Configure policies that automatically block IPs or users with too many failed logins or errors across multiple application servers.

Filter & Parameter Control

The Core Audit Agent comes with multiple filters that can be applied as needed to different parameters. You can apply tight SQL injection filters to some parameters and more lenient to others.

There are numerical filters for numerical data, String filters for textual data, general Safe filters that auto-detect the data type, Keyword filters, and more. Having multiple filters that can be applied in different ways gives you the flexibility to tailor the In-Application firewall to your application.

Depth Control

Filters can be applied to any method in the application. They can be applied to URL parameters that are received from the user or to a method that creates a SQL from them.

You can apply lenient filters to search parameters you believe to be safe, and strict filters when those parameters are added to the SQL to ensure they are as safe as you thought.

Areas in the application that have fragile code can be better protected, while well-tested portions of the code can be granted more latitude in their interaction with the user.

Since filters can be applied to any method within the application, they are not limited to filtering web-related activity. Filters can be applied to data read from files, from various network sources, from databases, internally generated activity, and more.

Context & Action Control

The In-Application firewall is part of the Core Audit generic application auditing technology. As such, all information reported about potential SQL injections is linked to the full user session context.

That means that every blocked activity will be related to the IP address, user, program, and every other piece of information reported by the application. You will be able to know what happened before and after even if those actions did not trigger an alert.

The In-Application firewall can block actions by throwing exceptions. These exceptions can be customized to fit the type of exception handled by each area of the code.

Additionally, the In-Application firewall has a high-quality sanitizer that can eliminate the identified threat without introducing another.

In-Application Firewall or WAF

One of the benefits of WAF is that it filters the activity before it reaches the application. The In-Application firewall cannot make this claim.

However, since WAF always runs outside the application, it’s missing the context in which the activity runs, and the ability to apply different filters at different depths in the application stack.

The In-Application firewall offers granular control over the types of filters so they can meet the specific needs of each portion of the application. There’s no need to compromise on a filter that is “generally okay” and doesn’t give the appropriate level of protection.

The In-Application Firewall runs within the application so there’s no need to deploy additional machines and filter the network traffic. It all runs on the same machine and as part of the application.

The In-Application Firewall comes with full application auditing delivered by Core Audit Levels 2-3 and can be upgraded to apply context-based blocking policies or reactive policies with Core Audit Level 4. All these capabilities are only possible with the Core Audit technology stack.

Summary

SQL injection is a difficult attack to combat, and rewriting the application using bind variables is the only true solution to the problem.

If converting the application to entirely use only bind variables is not a viable course of action, Core Audit technologies in the application and the database are the most effective in detecting and preventing possible SQL injection attacks.

Each independent approach has limitations, so the best solution is a mixed approach of improving the code, deploying effective detection in the database, and preventive measures in the application.

Beyond SQL injection, Core Audit technologies can help secure databases and applications to ensure your sensitive data is not compromised by external or internal actors.