Understanding SQL Injection - Blue Core Research

Introduction

SQL Injection is one of the most well-known attack vectors, and that’s largely because it poses a significant security challenge. The only way to understand the benefits and deficiencies of different solutions is by understanding the problem and the approach each solution has to solving it.

The Problem

The best way to understand the problem is to look at examples. There are many variations in SQL injection, and we’ll demonstrate most of them using this simple application example. So, let’s first understand the example.

Application Example

Account	0
First name	John
Last name	Doe
Last 4 of social	5555
	Submit

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 5555

Let’s assume our application is a simple web page that allows customer support representatives to see the credit card numbers of customers who call in.

For security reasons, the representative must enter the customer’s account number, first name, last name, and the last 4 digits of the customer’s social security number.

When all the information matches the customer’s data, the application will display the credit card numbers used by the customer.

It appears to be a secure design. Customer support representatives can only view the credit cards of real customers they talk to. How could they use this form to obtain information about customers they didn’t talk with?

The problem is not in the conceptual design but in the actual implementation of the code.
To make this page work, the application running inside the web server must run a SQL against the database. That SQL might look like the example above.
That SQL seems to be correct, with no apparent way to get credit card information without entering the correct customer’s information.
So, what is the problem with this implementation?

The Attack

Account	0
First name	John
Last name	Doe
Last 4 of social	0 OR 1=1
	Submit

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 0 OR 1=1

When you fill the form this way, you can see the resulting SQL next to it. Take a look at how the application placed each value in the correct location in the SQL.

When the database executes a SQL the database compares the conditions against every row in the table.

Let’s assume the first row is for customer 1001 named JOE SMITH, with SSN 9999. Here’s what that database evaluation of the conditions looks like:

Is the customer number 0? No, it’s 1001
Is the first name JOHN? No, it’s JOE
Is the last name DOE? No, it’s SMITH
Is the SSN 0? No, it’s 9999
Is 1=1? Yes! 1 is always equal to 1.

In SQL, the order of operations evaluates AND before OR, resulting in something like this:

(FALSE and FALSE and FALSE and FALSE)
or
(TRUE)

The ultimate result of this comparison is TRUE! That means the database will return the credit card number for JOE SMITH, and the application will display it.

The database will evaluate this condition to be true for every row in the table. So, the application will display every credit card in the table.

We filled out the form in a way the programmer who wrote the application did not expect. Therefore, the code did not protect against this attack, and we were able to trick the application into displaying every credit card in the database.

The String Variation

Here’s another little insight about condition evaluation: the database will consider ssn = ‘5555’ to be a valid comparison. The database will automatically convert the string ‘5555’ into the number 5555. To protect against the above attack, let’s embed the ssn as a string. Maybe then things would look different. Let’s fix the application to run this SQL instead:

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’
  AND last = ’Doe’
  AND ssn = '5555'

Now, the database will return an error if we fill the form with “0 OR 1=1” because it cannot convert that to a number. Have we prevented SQL injection?

Unfortunately, no. That is why there are so many variations on SQL injection. The attacker just needs to modify the attack. Consider this:

Account	0
First name	John
Last name	Doe
Last 4 of social	‘ or ‘a’ = ‘a
	Submit

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = '' or 'a' = 'a'

With this slight modification, the attack will still extract all the credit cards because ‘a’ is always equal to ‘a’. Take a look at how replacing 5555 with this new input results in the above SQL.

Truncating the SQL

Account	0 OR 1=1 —
First name	John
Last name	Doe
Last 4 of social	0 OR 1=1
	Submit

SELECT card FROM cards WHERE
  account = 0 OR 1=1 -- AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 5555

Another complication an attacker might face is if there’s another AND condition after the injected code. In the above example, we injected into the account number that has several AND conditions after it.

In SQL, double dash (- -) comments out the rest of the line. That way, the attacker can dispose of any additional conditions further down the line.

In MySQL, # is another way to comment out the rest of the line with identical effect to – –

These comment styles comment out the rest of the line. But what if a SQL has multiple lines?

Another way to add a comment to a SQL is by enclosing the comment in /* */. This comment style will span multiple lines.

If a comment starts (/*) but doesn’t end (*/), most databases will consider it to be a comment until the end of the SQL. So, adding /* instead of — will comment out the rest of the SQL on single or multi-line SQLs.

Another point to remember is that in MySQL /*! */ is not a comment, as the ! symbol disables the comment. That can help attackers bypass security measures that ignore comments.

Parameter Fragmentation (HPF)

Account	0 OR/*
First name	/1/
Last name	/=/
Last 4 of social	*/1
	Submit

SELECT card FROM cards WHERE
  account = 0 OR/* AND
  first = ’*/1/*’ AND
  last = ’*/=/*’ AND
  ssn = */1

Another variation is to comment out entire sections of the SQL and split the attack across multiple parameters.

In SQL, any text between /* and */ is a comment and is ignored.

The nice thing about this attack is that in addition to disabling entire portions of the SQL, it also splits the attack across multiple parameters. That makes it almost impossible to detect using WAF technology. More on that later.

Splitting the SQL

Account	0
First name	John
Last name	Doe
Last 4 of social	1 UNION SELECT SSN FROM USERS
	Submit

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1 UNION SELECT SSN FROM USERS

What if the vulnerable form doesn’t access the table or columns we want to attack? Will that prevent us from doing a SQL injection?

The CARDS table only includes the last 4 digits of the social security number. Let’s assume we want to read the full SSN stored in the SSN column of the USERS table.

But we have a problem. This SQL doesn’t retrieve the USERS table and its SSN column. How can we exploit this form that reads the cards table to display information from a completely different table?

The answer is the UNION operator. It combines the output of two different SQLs into a single result set. The above example uses it to append all the social security numbers from the USERS table after the list of credit card numbers.

In this example, the first SELECT returns no data from the CARDS table because the condition doesn’t match. However, the second SELECT will dump the entire SSN column from the USERS table. The union operator combines both into a single result set. Therefore, the application will display all the social security numbers from the USERS table as if they were credit cards.

SQL Batch

Account	0
First name	John
Last name	Doe
Last 4 of social	0; CREATE LOGIN hacker WITH PASSWORD = ‘password’
	Submit

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 0; CREATE LOGIN hacker WITH PASSWORD = 'password'

In Microsoft SQL Server and Sybase, SQLs are sent to the database in batches. A batch is a sequence of SQLs sent to the database in a single request. All requests in these databases are batches.

You can use a semi-column to separate SQLs in the batch:

SELECT c1 FROM t1; SELECT c2 FROM t2

However, the semi-column is optional and this batch will yield the same result:

SELECT c1 FROM t1 SELECT c2 FROM t2

Batches give SQL injection an entirely new level of malevolence. The injected fragment can execute a completely different SQL allowing it to change data, alter the database, and more.

The example above assumes the application is connected to SQL Server with a privileged account. Note that the semi-column was added for clarity but is optional.

When executed, this SQL will not return any credit cards but will create a new user in the database called ‘hacker’ with the password ‘password’.

This method lets attackers modify data, alter permissions, change the database, and more. Attacks are only limited by the privileges granted to the application. If the application has permission, the attacker can execute the shutdown command and shut down the database.

If complete control over the database isn’t enough, SQL Server also has the xp_cmdshell procedure that runs operating system commands. There’s also xp_regread that can read the registry. Using these, an attacker can gain complete control over the machine.

While it’s always true that applications should never use a privileged account, in SQL Server this rule is crucial. Applications that connect to SQL Server databases must NEVER use a privileged account. Unfortunately, many applications in SQL Server connect using the highly privileged SA account, and that exposes them to even greater risks.

Variations

So far, all the examples we looked at used OR 1=1. That is the classic example everyone uses. It can create the illusion that if a SQL does not contain 1=1, there is no SQL injection.

Unfortunately, there are endless variations on conditions that are always true. 1=1 is just one grain of sand in an infinite desert.

Let’s start with trivial examples that use simple operators:

Expression	Explanation
17 != 21	17 is always different than 21
99 < 100	99 is always smaller than 100
9-1 > 2+1	9-1 is always greater than 2+1
‘ABC’ = ‘ABC’	The string ‘ABC’ is always identical to ‘ABC’
‘hello’ != ‘world’	The string ‘hello’ is always different than the string ‘world’

We can also use some database functions to create slightly more complex expressions:

Expression	Explanation
‘XYZ’ like ‘%Y%’	XYZ contains a Y
‘B’ in (‘A’, ‘B’, ‘C’)	B is in the list
7 in (select 7)	7 is in the result set
translate(‘AB’,’AB’,’XY’) = ‘XY’	replacing A with X and B with Y in AB results in XY

While there are endless examples like this, they all use literals (constants). If we had a tool that was smart enough to calculate every expression the database can perform, it would be able to determine that all these expressions are always true.

No security solution is that smart, but this theoretical ability to detect SQL injections only means one thing. It means we need to take another small step forward and examine expressions no solution can determine are always true.

The expressions below use column values, and therefore, no solution can calculate them. It is impossible to distinguish these expressions from actual application logic.

Expression	Explanation
last != ‘XYZ’	last name is not XYZ
ssn > 0	SSN is greater than 0
id not in (0, 1, 2)	ID is not 0, 1, or 2
first != last	first and last names are different
id is not null	The ID is not a null value

While these expressions will probably always be true, it’s impossible to determine that without understanding or evaluating the data in each column. We’ll discuss this further in the section about static analysis.

Non-Trivial Input

All the examples so far entered data into fields in the form. That is the most trivial way to interact with the application, but not the only one.

The HTML form we used usually sends the query to the application web server using a GET or POST request method. A GET request might look like:

.../cards?id=0&first=JOHN&last=DOE&ssn=5555

You’ve probably seen strings like that in the URL of your web browser.

You can also send the same GET request using AJAX, and those are not visible in the URL. However, you can easily see them in the developer console (try hitting F12 in your web browser).

POST requests are not visible in the URL but usually contain the same information. You can also see those in the developer console of any web browser.

When attackers hack, they usually send GET and POST requests directly. That allows them to manipulate fields in ways the application form might not allow.

For example, a form can have hidden fields you cannot see, or the application may use JavaScript to add additional parameters to the request.

These non-trivial inputs are slightly more challenging to identify and modify. While harder to exploit, this lack of accessibility means developers usually do not protect them, and testers rarely test them.

Here are a few examples of information you cannot edit in a form but can change in a GET or POST request:

Hidden input fields – these are input fields that are not visible to the user. Applications use these to keep context information about the state of the application or the user’s flow through it. While the user cannot see these fields on the screen, it is not difficult to modify their content.
For example, a hidden field can store some ID the application needs to query the database.
Select fields – These are single-choice combo boxes or multi-choice checkboxes. While the user can only select from the list of options, an attacker can send any value.
For example, if you can only choose between two options that send a value of 0 or 1, an attacker can send the value of ‘0 or 1=1’.
Links – We often click on links, buttons, trees, images, and other components in web applications. These links frequently contain embedded values attackers can manipulate.
For example, the “Next Page” link usually contains the number of the next page (e.g., page=2) or the number of the first item on the next page (e.g., start=11). Attackers can modify them to include a SQL injection.
Cookies – Cookies are a way to store state information in the web browser and send it to the application. For example, this is where most applications store the session ID and various user preferences.

Everyone overlooks these types of inputs. QA never tests them, and regular users only use them in the way you’d expect. Most automated vulnerability detection systems will also fail to test these. That means that flaws in validating these types of inputs can lie dormant for years, never noticed or addressed.

These non-trivial inputs tend to be more vulnerable and provide a fertile ground for targeted attacks by experienced hackers.

Parameter Pollution (HPP)

Parameter Pollution is an attack that requires manipulation of the GET/POST requests. For example, what happens if the above application receives two SSN fields in the request instead of one? In other words, the GET request looks like this:

?id=0&first=JOHN&last=DOE&ssn=5555&ssn=7777

The answer depends on the technology stack the application uses and the methods it uses to extract the parameters. Some technologies will return a comma-separated list of the duplicate parameter. So the above request will return:

5555,7777

That creates an opportunity for similar attack to Parameter Fragmentation (HPF) but without the need for multiple parameters or knowing their order in the SQL. For example, sending these 4 SSNs (broken into lines for clarity):

…/cards?id=0&first=JOHN&last=DOE
&ssn=0 OR /*
&ssn=*/ 1 /*
&ssn=*/ = /*
&ssn=*/ 1

That will result in this SQL:

SELECT card FROM cards WHERE
   account = 0 AND
   first = ’JOHN’ AND
   last = ’DOE’ AND
   ssn = 0 OR /*,*/ 1 /*,*/ = /*,*/ 1

That is a fragmented SQL injection attack using a single parameter.

No Keywords

Account	ACCOUNT/*
First name	John
Last name	Doe
Last 4 of social	1
	Submit

SELECT card FROM cards WHERE
  account = ACCOUNT /* AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

So far, every attack contained SQL keywords such as OR and UNION. Even though these are simple words in English that would appear in regular input, they make the input look suspicious.

But with a little investigation and effort, it’s possible to create a SQL injection that has no keywords or even spaces.

In this example, we check whether a column is equal to itself and comment out the rest of the SQL. Since the account is always equal to itself, this SQL will return all the cards in the table.

Truncating without Comments

Account	1 OR 1=1 OR 1=1
First name	John
Last name	Doe
Last 4 of social	1
	Submit

SELECT card FROM cards WHERE
  account = 1 OR 1=1 OR 1=1 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

Another recurring motif in these examples is leveraging comments. Many times, we use comments to remove unwanted portions of the SQL. But there’s another way.

We highlighted the OR. In the order of precedence, ORs are last, so anything separated by an OR is evaluated separately. The red portion always evaluates to true, so the entire WHERE clause is always true.

Misleading Numerical Inputs

Account	cast(1 as varchar(10)) OR 1=1
First name	John
Last name	Doe
Last 4 of social	1
	Submit

SELECT card FROM cards WHERE
  account = cast(1 as varchar(10)) OR 1=1 AND
  first = ’John’ AND
  last = ’Doe’ AND
  ssn = 1

This approach aims to defeat particular defense systems. The idea is to create confusion about the nature of the parameter.

In this example, the Account parameter doesn’t appear to be numerical input. Therefore, an automatic defense will try to validate it as a string. As a string, it is safe to execute because it has no tags (‘) that terminate the string.

CAST() is a standard database operator that converts a number to a string. The database will consider this to be valid input. To compare the string value to the account, it will convert it back to a number.

Login screens are also vulnerable to SQL injection attacks. There are two ways the application code can verify the user. Both methods are susceptible to attacks.

Query Validates Password

Username	admin’–
Password	x
	Login

SELECT id FROM users WHERE username = 'admin'--' and password = hash('x')

In this authentication method, the query validates the password and will look like this:

SELECT id FROM users where username = 'admin' and password = hash('x')

The select is looking for a row with the correct user and password. If the password matches, the query returns the row. Otherwise, the query returns no data.

In the where clause, the SQL will contain a call to a hash function or a literal with the calculated hash.

In the above example, the password-validating portion of the SQL is disabled. The double dash (- -) comments out the rest of the SQL, and the query will return the row for admin. That means the password is correct.

Query Retrieves Password

Username	‘ union select ‘5f4dcc3b5aa765d61d8327deb882cf99
Password	password
	Login

SELECT password FROM users WHERE
  username = '' union
  select '5f4dcc3b5aa765d61d8327deb882cf99'

This authentication method retrieves the hash from the database and validates it in the code. The SQL will look like this:

SELECT password FROM users WHERE username = 'admin'

In the above example, the SQL will return the MD5 of the word ‘password.’ When the application compares this md5 to the password we provided, it will match.

In this method, instead of the hash stored in the database, we return the password hash we want to return.

Introduction to Solutions

It is vital to understand a SQL injection is fundamentally exploiting a flaw in the code. You cannot ensure the code has no SQL injection vulnerabilities, just like you cannot guarantee it has no bugs. However, like with any code vulnerability, the best solution is to fix the flaw in the code. The alternatives are to try and prevent, detect, or make it difficult to exploit vulnerabilities, but all these approaches have limitations.

The following sections review the different approaches and discuss the benefits and limitations of each. Before diving into the details, let’s take a high-level overview and get the lay of the land.

Solutions in the Code

As mentioned above, the best solution is to have high-quality code and proper coding techniques that avoid the possibility of a SQL injection bug.

The best way to avoid a SQL injection bug is to use bind variables. That is the perfect solution and it completely prevents SQL injections. Slightly less robust methods include escaping input and input validation. It is also recommended to reduce client error reporting.

Later, we will go into these techniques in more detail, exploring their value and limitations, and discuss how to implement them.

Database Solutions

SQL injection is ultimately a database attack that takes advantage of the flexible nature of SQL. Therefore, the most obvious place to look for a solution is in the database.

There are several database technologies to detect or prevent SQL injections, but most are ineffective. An exception is the unique anomaly analysis technology in Core Audit, which is highly effective.

We will later discuss these database technologies in more detail, including their differences, pros, and cons.

Web Application Firewall (WAF)

The difficulties in identifying SQL injection at the database level gave rise to a new product category: the Web Application Firewall (WAF).

The idea was that it would be easier to detect the SQL injection attack when examining the application input rather than when looking at the SQL sent to the database.

It’s a nice theory, but it only works on specific examples. When looking at the full scope of SQL injection attacks described in this paper, the problem becomes insurmountable. More on that later.

In-Application Firewall

A new technology developed by Blue Core Research aims to take WAF technology to the next level and incorporate it into the application.

The Core Audit Java Agent provides detailed audit data on all the application activity and everything that happens inside it. One of its additional capabilities is to apply firewall-type blocking rules around the entire application or sensitive areas in the code.

An In-Application Firewall can benefit from context and information available only inside the application as well as apply targeted protections to specific areas.

Detect

The wide range of approaches described above doesn’t only differ in their technology and effectiveness but also in the type of protection they deliver. While many security people tend to dismiss detective capabilities, that is a mistake.

Detective approaches aim to alert you of a possible SQL injection. The downside is that they don’t block the attack, and you’ll need to take protective action as well as fix the code.

It may appear as though the value of such detective solutions is lower. However, unlike prevention, you can tune detection to be more sensitive. That is because false positives are alerts that don’t prevent legitimate users from working.

That means detection is more sensitive and will alert you of most attacks. It provides effective security with little to no false negatives (undetected events).

A powerful detective engine such as the one found in Core Audit is likely to notify you of any SQL injection attempt.

Prevent

There are two primary categories for preventive approaches: blocking and curing. Depending on the type of solution, one or both options may be available. The Core Audit Java Agent offers both.

While blocking prevents an attack altogether, curing aims to remove only the offensive part of the request. The benefit of blocking is that if the solution recognizes the attack, the prevention will be complete. Curing bears the potential risk that part of the attack will get through.

There are significant differences between various solutions, but none can claim 100% effectiveness. That is to block all attacks and pass all legitimate activity. That’s the benefit of detection, but more on that later.

Because preventive solutions can block legitimate activity, they let you disable some of their filters. That limits their effectiveness even further.

As the activity becomes more versatile, blocking must be less strict (with higher false negatives). Otherwise, the likelihood of blocking legitimate activity increases (more false positives).

Sanitize

Finally, some solutions aim to cure the problem without blocking it. With these solutions, malicious injections are removed from the input and allowed to run safely. Core Audit also has this capability.

While this is a very nice advertisement, in many solutions it comes at a heavy price of introducing additional vulnerabilities. The problem is that an attacker can craft an attack that will leverage the sanitizing process as part of the attack.

This means that In addition to the limited effectiveness of SQL injection detection, using sanitizing can reduce the effectiveness of the solution by offering the attacker additional tools to bypass the security. More on that later.

Escaping the Parameter Cage

When looking at the various examples it becomes clear that SQL injection is about changing the SQL contract by escaping the parameter cage.

If the parameter is a string, then a SQL injection will need to terminate the string before it can modify the SQL. That usually means having a tag (‘) in the input.

Numerical parameters are terminated by whitespace (space, tab, newline, etc.) or other characters that are not part of the literal (for example, a /).

Detecting an escape from the parameter cage is one method for recognizing SQL injection in application firewalls but it relies on correctly identifying the parameter type.

Another common method is the identification of suspicious keywords or special characters. This method also has significant limitations.

Solutions in the Code & Best Practices

As mentioned above, the best solution is to have good code. Proper coding techniques can reduce and even eliminate the risk of SQL injection. This section will discuss these in detail.

Bind Variables

All SQL injection attacks involve escaping the previously mentioned parameter cage. Fortunately, all databases come with a full-proof API for preventing that. It’s called bind variables.

Bind variables are a way of sending the parameters separately from the SQL rather than embedding the constants into it. For example, the previous SQL used in the examples would look like this with bind variables:

SELECT card FROM cards WHERE
  account = :1 AND
  first = :2 AND
  last = :3 AND
  ssn = :4

When this SQL is sent to the database, the programmer needs to also send the values for the bind variables (:1, :2, :3, :4).

Writing SQLs this way avoids the database misinterpreting part of the value for being part of the SQL construct. The values can contain any character and the database will not be confused.

Using bind variables also has a performance benefit. That is, actually, their primary purpose. When the same SQL executes multiple times with different bind variables, the database parses it only once, making those executions faster.

Unfortunately, rewriting the entire application to use bind variables might be expensive, difficult, or entirely impractical.

Escaping Input

The most common approach for fixing the application code is to escape all input fields. For example, if an input field contains a tag (‘), replace it with two tags (‘ ‘). The escaped quote can be embedded in a SQL without causing an injection attack.

Escaping the input John O’Brian in our example will yield this SQL:

SELECT card FROM cards WHERE
  account = 0 AND
  first = ’John’ AND
  last = ’O''Brian’ AND
  ssn = 5555

Escaping input is only half the solution because it only solves the problem for string input. Numerical input requires a different approach mentioned below.

While this is not the best method for preventing SQL injection, it is the most commonly used one. This is due to the relatively small impact on the code and, consequently, cheaper implementation costs.

Numeric Input

Fixing numerical input is, on the one hand, a very simple solution. However, this solution has to be individually implemented for every numerical input making it potentially time-consuming, difficult, and costly to implement.

Every programming language contains a function for converting strings to numbers and means of embedding those numbers into a string. By converting the input into a number before embedding it into the SQL string we ensure that it is an actual number.

For example, unsafe code in PHP might look like this:

$sql .= " id= ".$_GET["ssn"];

While the safe version looks like this:

$sql .= " id= ".intval($_GET["ssn"]);

$_GET[“ssn”] reads the GET parameter ssn and wrapping it in intval() will ensure that it’s a number.

In Java, the unsafe looks like this:

sql += " id = "+ssn;

While the safe version is this:

sql += " id = "+Integer.parseInt(ssn)

Similar examples can be made for any programming language from Python to C. The idea is always the same — convert to a number and then back into a string.

It should be noted that there are slight differences in the way this technique works in different languages and those depend on how the conversion function works.

For example, the intval() function in PHP would convert part of the string if possible, so the string “7 OR 1=1” would be converted to 7. If the input could not be parsed, the function would return the value 0.

In Java, however, the Integer.parseInt() function would attempt to convert the entire string to a number and any obstacle would cause it to throw an exception.

Regardless of the differences, all these implementations are safe and none would allow a SQL injection to pass through the numerical parameter.

Input Validation

Input validation is the concept of ensuring the input is acceptable and fits the operational parameters of the function. It is a concept that goes far beyond stopping SQL injection. It aims to prevent several types of bugs and attacks.

Input validation is a much broader subject than SQL injection. Depending on interpretation, it may focus on other details while ignoring issues related to SQL injection. For example, it could include tests that ensure strings only have escaped tags (‘) and numbers are always valid. However, input validation tests could easily ignore these particular tests.

For example, consider this piece of PHP code:

if ($_GET['ssn'] > 0 &&
$_GET['ssn'] < 9999) return;

$sql .= " id= ".$_GET["ssn"];

This code checks that the ssn is between 0 and 9999 and will reject any input that is not within range. While it seems like a safe implementation that tests the input, sending it “7 OR 1=1” will bypass the test and result in a SQL injection.

The reason is that to perform the comparison, PHP must convert the string input to a number. It is as if we wrapped each of the $_GET[‘ssn’] expressions inside the if() statement with the intval() function. Therefore, “7 OR 1=1” will be converted to the number 7, which falls within the valid range.

If you plan to use input validation to prevent SQL injection attacks, you must specifically test inputs are valid from an SQL injection standpoint. That would usually entail escaping the input as previously described.

When application developers display error messages to the user, they often include the errors they got from the database. While that additional information is useless to the user, it is valuable to the developer who needs to debug the problem.

However, this same information is also valuable to a hacker who wants to perform a SQL injection attack. Database error messages can contain SQL fragments and other indicators that help hackers craft the attack.

Without these database error messages, hackers must do what’s known as Blind Injection. That involves more complex techniques that attempt to obtain information without seeing the database error messages.

One such technique is to use if() conditions that call a sleep function (e.g., waitfor). In other words, the hacker will know whether something is true based on how long the query takes to complete.

The bottom line is that Blind Injection is much more tricky, and there is no good reason to display these error messages to the users. When a developer needs to debug a problem, they can always enable error messages for a particular scenario while performing their investigation.

Errors & Vulnerability Detection

Most applications have many pages with many input fields. Hackers are far too lazy to try them all. So hackers use scanning tools that find every input on every page and enter tricky values. The type of input that would cause vulnerable code to return an error.

If the scanner finds an application error using these inputs, there is good reason for further investigation. They can probably exploit a vulnerability on that page.

It is important to remember that this discussion about Errors and Vulnerabilities assumes that the hackers have to “experiment” on the live application. Hackers that have another type of access to the application software will use it to detect vulnerabilities and learn how to exploit them.

Other types of access include:

Off-the-shelf software that can be downloaded and installed
Access to non-production systems (Dev, QA, pre-prod, etc.)
Access to the source code or free software

That leads to several interesting conclusions:

Don’t show errors to the users. Users will complain when they don’t get data. Displaying cryptic error numbers or messages does not improve user experience.
Errors are helpful for developers who need to debug a problem and for hackers who want to find vulnerabilities in the application. Not displaying any errors can help hide where the code is vulnerable.
Monitor for database errors. Applications that operate properly should not generate errors. However, you are likely to see database errors when hackers try to locate vulnerable code and exploit it.
SQL injection scanning would likely generate errors involving Invalid SQL. Be on the lookout for such errors. That is one of the many things Core Audit can help you achieve.
Run the scanning tool yourself. Vulnerability scanning tools used by hackers are freely available on the internet. Running such tools against your application will show you at least some of the places where your code is vulnerable.

Database Solutions

SQL injection is ultimately a database attack that leverages the flexible nature of SQL. This section reviews technologies that can help detect or prevent SQL injection in the database.

Static SQL Analysis

SQL analysis is the oldest technology that tried to detect SQL injection. Its successor, WAF technology was born as a result of the severe limitations of this technology.

Static analysis aims to identify suspicious expressions in the WHERE clause of the SQL. These tools will always detect the classic example of OR 1=1, but more complex expressions can be challenging.

The section about Variations is a good place to start exploring the boundaries of a static SQL analyzer.

No static analyzer supports such a wide array of operators, functions, and expressions. But even if one did, it would still be limited to expressions that use literals.

Expressions that take advantage of column names (such as first != last) or SQL Batch are outside the theoretical limits of static analyzers and they would be unable to detect those.

Static analyzers must perfectly immitate the database expression evaluation logic. To explain the magnitude and complexity of that, here is a sample shortlist of expressions that static analyzers could identify:

Operators. Static analyzers must understand every database operator including +, -, *, /, =, !=, <, <=, >, >=, <>, IN, NOT IN, ANY, SOME, ALL, BETWEEN, LIKE, etc.
Functions. Static analyzers must know all the database functions including string manipulation (Concat, Substr, Lower, Upper, Trim, Replace, Translate, etc), numerical functions (Ceil, Round, Floor, Trunc, Mod, etc), character functions, date functions, conversion functions, and more.
Expressions. Static analyzers must evaluate expressions the same way the database would. That includes, for example, IF conditions, subqueries, WITH clauses, and more.

AI & SQL Intention

This solution category leverages machine learning and AI to differentiate between a good and bad SQL. They rely on complex language analysis to understand the intention of a SQL.

It’s difficult to tell if this technology is effective. That is because it relies on each customer teaching the solution the good and bad SQLs in their particular environment.

Any misclassified SQL can be attributed to the customer not manually classifying that example. Similarly, detecting one SQL injection doesn’t mean the next one will be detected as well. It’s just part of the unpredictable nature of AI.

However, it is difficult to imagine that even the best AI, taught by the most diligent customer, will be able to detect OR first != last. It’s just a condition that can be a legitimate part of the application logic.

Anomaly Analysis

Blue Core Research developed a unique technology for detecting SQL injection. Unlike static analyzers and AI, we don’t attempt to understand each SQL. As we’ve demonstrated above, the simple truth is that it is impossible to do this well.

Our solution to SQL injection detection leverages two of our other technologies: Full Capture and the Security Repository. These technologies maintain a list of every SQL construct that runs in the database.

Since application SQLs are repetitive, over time, we learn every possible SQL the application can generate. We flag SQL constructs we have not seen before as possible SQL injection attempts.

For example, if we’ve never seen the application run a particular SQL with OR first != last in the where clause, we’ll alert you of a possible SQL injection attempt when we see it.

Anomaly Analysis is also the only technology that can detect the injection of an entire SQL using a batch. That type of injection is difficult to detect because there’s nothing suspicious about the injected SQL itself. The only thing suspicious is that the SQL is something the application never does. Anomaly Analysis will easily detect that.

The Core Audit Anomaly Analysis engine can do a lot more than identify SQL injections. It is just one more benefit of this technology. Anomaly Analysis is the most effective technology for detecting SQL injection.

Application Solutions

Some solutions aim to protect the application from receiving input that will trigger a SQL injection attack. They aim to identify the type of input that can cause it and block it.

The challenge, as demonstrated above, is that SQL injection attacks are incredibly versatile. It is, therefore, next to impossible to detect them all.

Web Application Firewall (WAF)

Web Application Firewall was invented to address the threat of SQL injection. At the time, the only other technology was the ineffective Static SQL Analysis. So the bar was not very high.

Some vendors even claim to fix the SQL injection problem without fixing the application code. These claims are appealing to customers. However, they are dangerously misleading, if not utterly false. The term Firewall in WAF gives customers a false sense of security.

The idea behind WAF is to perform static analysis on the application input instead of the SQL the application sends to the database. The expectation was that the injection attempt would be easier to identify in the input rather than when embedded in the SQL. That is a nice theory, but as you’ll show below, it is incorrect.

WAF Concept

The concept behind WAF is to examine the user input before it’s incorporated into the SQL. The idea is that a SQL fragment will look suspicious in the user input but impossible to detect once it blends into the complete SQL.

For example, having OR first!=last in the where clause of a SQL could be a normal part of the application logic and impossible to distinguish from it. However, that phrase would be highly suspicious in regular user input.

Many hackers are lazy and use SQL Injection scanning tools. These tools go through every parameter on every page searching for potential vulnerabilities. Accordingly, many WAF implementations turned from something that protects against SQL injection to something that detects the type of unusual input these tools generate. This isn’t protection against SQL injection nor is it effective since there are many tools (including custom ones).

However, this brings up an interesting question – Why purchase a WAF that checks user input against known patterns in scanning tools when you can find all those vulnerabilities yourself? All you need to do is run a SQL injection scanning tool. These tools are available on the internet for free.

There are several fundamental problems with this concept:

SQL Injection doesn’t have to look like a SQL fragment — A SQL Injection could be as simple as “id –” (see the section on Keywords). It doesn’t have to have a keyword like OR, UNION, etc. It doesn’t have to have special symbols like = or !=. It could be something that will trick even a human inspection of the input.
Lack of Grammar — Unlike SQLs, user input doesn’t follow any predictable grammar. There’s no way to know what the input means or why the user entered it. This makes any type of static analysis extremely limited and entirely based on heuristics, signatures, and search patterns.
Different parameters have different meanings — User input varies a lot across different parameters in the application and across all applications. Sometimes it’s a page number, sometimes a search request, and sometimes someone’s name. Therefore, it’s difficult to define rules that are always true for any input.
Safety is code dependent — While some parameters may be vulnerable to SQL injections, others might be properly handled and safe. The input O’BRIAN might be suspicious because it has a tag (‘) and, therefore, is trying to search for code vulnerabilities. At the same time, it might be safe if the code handles this particular input properly.

WAF Methods & Challenges

All WAF solutions face the same basic challenges. The difference between the tools is in how they balance those challenges and how good is the implementation.

Parameter Type

As mentioned above, SQL injection is all about escaping the parameter cage. However, the parameter cage depends on the parameter type. To escape the cage of a textual parameter you need a tag (‘). To escape the case of a numerical parameter, you need a literal delimiter character such as space, slash, etc. Numerical parameters can also be replaced by expressions that change these rules.

Detecting the parameter type might not be as simple as it seems considering problems such as Misleading Numerical Inputs mentioned above.

If we prevented any character that is not a letter or a digit, we could completely prevent SQL injection. However, that means that, for example, no one can enter a space anywhere in the application input. That’s not a problem for numerical input, but it is a problem for textual one.

By guessing the parameter type, a WAF can apply different filters to different parameter types and allow a wider range of input increasing the risk of SQL injection.

Special Characters

The parameter type problem expands into various special characters that can be used in SQL injection as well as other types of injections (e.g. file name injections like LFI, RFI, etc).

There are many special characters and when relying solely on filtering special characters we will quickly prevent many types of valid application input.

Examples of special characters are

Comments can be a double dash (- -), a /*, and in MySQL also a hashtag (#).
Parenthesis can be a function call or a subquery.
Strings are terminated by a tag (‘), but in MySQL, strings can also be enclosed by double quotes (“)
Numeric literals end with whitespace (space, CR, LF, Tab), a slash (/), a dash (-), and many other characters.
A semi-column (;) can terminate a SQL in a batch

Keywords

The other strategy is filtering keywords that appear in SQL like OR and UNION. However, in batch injection a semi-column (;) is optional and a new SQL can start at any point. That means that SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, and a few other keywords also need to be filtered.

But even without the influx of keywords, “UNION BANK OF CALIFORNIA” contains the word UNION. And it’s not difficult to think of phrases that contain the word OR.

Mixed Strategy

Ultimately, a good WAF solution needs to balance various strategies of looking at special characters, keywords, parameter types, and more.

Unfortunately, even with all the strategies combined, it is difficult to strike the right balance that will allow most of the legitimate inputs and prevent most of the malicious ones.

Automatic or Configurable

One of the challenges WAF solutions face originates in customer expectations. Customers want a one-click solution. Push one button and get protection from SQL injection. Since customers want, salespeople promise to deliver.

Unfortunately, as we’ve explained so far, it’s simply impossible. A solution that will work on the entire application and pass more legitimate input will also pass at least some types of SQL injection attacks. It only takes is one successful type of attack to compromise all the data in the database.

Instead of looking for the “One button” solution, customers that are willing to customize different strategies for different types of input can enjoy superior protection with less false positives.

The more configuration the WAF solution has and the more granular the control, the more likely it is that it can be tailored for your particular application and offer better protection.

Implementation

Just like any software solution, there are significant differences between good and bad WAF implementations. Here are a few examples that might seem funny but are entirely real:

WAF might filter the word “union” in lower case, but fail to filter “UNION” in upper case or “UnIoN” in mixed case.
WAF might filter the word “OR” but failed to see that %4F%52 is the same thing in hex.

There are many other potential flaws and weaknesses in WAFs and they ultimately become yet another piece of software whose bugs can be exploited.

Unfortunately, there is no simple and reliable way to measure the quality of a WAF solution and no unbiased publication. Testing it yourself with the many examples on this page can give you an idea of the quality of a solution.

The WAF Sanitizer

Some WAF solutions block malicious activity, others clean it, and some offer both options. The sanitizer is the portion of the WAF that cleans the malicious activity. Like any piece of software, this too can have weaknesses.

Sanitizers have a particular type of weakness that happens a lot – by removing one malicious attack they expose another malicious attack.

If the WAF solution has the option to block potentially malicious activity and avoid the sanitizer, it is safer to do so.

For example, these inputs:

1 O/**/R 1/**/=/**/1
un/**/ion SEselectLECT * F'RO'M

Could be sanitized by removing comments, symbols, and keywords to produce:

1 OR 1=1
union SELECT * FROM

In-Application Firewall

Blue Core Research developed a new type of technology that can supplement or replace a WAF. This technology runs inside the application instead of outside and offers control over the protection applied to different parts of the code without changes to the source.

The Core Audit Java Agent uses native JVM capabilities to add security at runtime to the regular JAR files you’re currently using.

The Core Audit Java Agent can add several types of security, but since this paper is about SQL Injection, we will only go into detail about the Basic Blocking capabilities that aim at addressing this threat.

The types of security the Core Audit Java Agent can add are:

Auditing (L2-3) — Full Capture of the application sessions and the activity they perform. Core Audit native application auditing levels 2-3 include report generation, alerting, forensics, analysis of behavioral profiles, and more.
Basic Blocking — Apply WAF-style firewall rules at various points in the application. More details below.
Blocking Policies (L4) — Apply custom rules about who can do what in the application. For example, restrict which IPs or users can run what activity at what times.
Reactive Blocking — Configure policies that automatically block IPs or users with too many failed logins or errors across multiple application servers.

Filter & Parameter Control

The Core Audit Agent comes with multiple filters that can be applied as needed to different parameters. You can apply tight SQL injection filters to some parameters and more lenient to others.

There are numerical filters for numerical data, String filters for textual data, general Safe filters that auto-detect the data type, Keyword filters, and more. Having multiple filters that can be applied in different ways gives you the flexibility to tailor the In-Application firewall to your application.

Depth Control

Filters can be applied to any method in the application. They can be applied to URL parameters that are received from the user or to a method that creates a SQL from them.

You can apply lenient filters to search parameters you believe to be safe, and strict filters when those parameters are added to the SQL to ensure they are as safe as you thought.

Areas in the application that have fragile code can be better protected, while well-tested portions of the code can be granted more latitude in their interaction with the user.

Since filters can be applied to any method within the application, they are not limited to filtering web-related activity. Filters can be applied to data read from files, from various network sources, from databases, internally generated activity, and more.

Context & Action Control

The In-Application firewall is part of the Core Audit generic application auditing technology. As such, all information reported about potential SQL injections is linked to the full user session context.

That means that every blocked activity will be related to the IP address, user, program, and every other piece of information reported by the application. You will be able to know what happened before and after even if those actions did not trigger an alert.

The In-Application firewall can block actions by throwing exceptions. These exceptions can be customized to fit the type of exception handled by each area of the code.

Additionally, the In-Application firewall has a high-quality sanitizer that can eliminate the identified threat without introducing another.

In-Application Firewall or WAF

One of the benefits of WAF is that it filters the activity before it reaches the application. The In-Application firewall cannot make this claim.

However, since WAF always runs outside the application, it’s missing the context in which the activity runs, and the ability to apply different filters at different depths in the application stack.

The In-Application firewall offers granular control over the types of filters so they can meet the specific needs of each portion of the application. There’s no need to compromise on a filter that is “generally okay” and doesn’t give the appropriate level of protection.

The In-Application Firewall runs within the application so there’s no need to deploy additional machines and filter the network traffic. It all runs on the same machine and as part of the application.

The In-Application Firewall comes with full application auditing delivered by Core Audit Levels 2-3 and can be upgraded to apply context-based blocking policies or reactive policies with Core Audit Level 4. All these capabilities are only possible with the Core Audit technology stack.

Summary

SQL injection is a difficult attack to combat, and rewriting the application using bind variables is the only true solution to the problem.

If converting the application to entirely use only bind variables is not a viable course of action, Core Audit technologies in the application and the database are the most effective in detecting and preventing possible SQL injection attacks.

Each independent approach has limitations, so the best solution is a mixed approach of improving the code, deploying effective detection in the database, and preventive measures in the application.

Beyond SQL injection, Core Audit technologies can help secure databases and applications to ensure your sensitive data is not compromised by external or internal actors.

Introduction

The Problem

Application Example

The Attack

The String Variation

Truncating the SQL

Parameter Fragmentation (HPF)

Splitting the SQL

SQL Batch

Variations

Non-Trivial Input

Parameter Pollution (HPP)

No Keywords

Truncating without Comments

Misleading Numerical Inputs

Login Screens

Query Validates Password

Query Retrieves Password

Introduction to Solutions

Solutions in the Code

Database Solutions

Web Application Firewall (WAF)

In-Application Firewall

Detect

Prevent

Sanitize

Escaping the Parameter Cage

Solutions in the Code & Best Practices

Bind Variables

Escaping Input

Numeric Input

Input Validation

Error Messages & Blind Injection

Errors & Vulnerability Detection

Database Solutions

Static SQL Analysis

AI & SQL Intention

Anomaly Analysis

Application Solutions

Web Application Firewall (WAF)

WAF Concept

WAF Methods & Challenges

Parameter Type

Special Characters

Keywords

Mixed Strategy

Automatic or Configurable

Implementation

The WAF Sanitizer

In-Application Firewall

Filter & Parameter Control

Depth Control

Context & Action Control

In-Application Firewall or WAF

Summary

Comments

AI Recommended

Ask a Question

You might also like

Why the “Not That Sensitive” Argument Crumbles Under Scrutiny

The Silent Majority: Why Database Threats Must Become Center Stage

The Illusion of the Wall: Why Your Data Fortress is a Sandcastle