Most organizations use the OWASP Top 10 as the standard against which they test for security vulnerabilities in their web applications. The OWASP Top 10 includes the most critical web application vulnerabilities, and this list is revised on a regular basis. The problem with testing for vulnerabilities listed in OWASP Top 10 is that there are a large (and growing) number of web vulnerabilities that don’t fit into those categories.
Furthermore, the categories are becoming broader and therefore harder to test against, because they lack the specificity needed to know exactly what to test for.
In this article, you can learn about race conditions, or manipulating the timing of actions to produce anomalous results.
Hackers can use race conditions in numerous ways to create adverse effects, ranging from crashing an application to stealing money from a business.
What are Race Conditions?
So what are race conditions, and why are they worth talking about alongside the common vulnerabilities published by OWASP?
As per the OWASP testing guide, “A race condition is a flaw that produces an unexpected result when the timing of actions impact other actions. An example may be seen on a multithreaded application where actions are being performed on the same data. Race conditions, by their very nature, are difficult to test for.”
Another way of putting it is: when the timing of actions affects other actions, events may happen out of sequence, resulting in anomalous behavior. This anomalous behavior is a race condition, which can result in a serious security vulnerability.
Exploiting Race Conditions
One instance of a race condition found by a security researcher (Egor Homakov) resulted in essentially unlimited money on Starbucks gift cards. How did it happen?
The exploit took advantage of the way funds were transferred between gift cards. The server-side logic for transferring funds between accounts went something like this:
- Check balance on card 1.
- If sufficient funds, increment funds in card 2.
- Decrement funds in card 1.
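The above steps were written as if they would happen synchronously (one step at a time, with nothing else touching the cards in between). However, these actions were actually performed in a multi-threaded, asynchronous environment, so several transfers could interleave.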
To be clear, most modern computing environments are multi-threaded and asynchronous (even when using PHP — more on that later). So how does one exploit this race condition?
Simple: send a ton of requests for this function at the same time, thus creating a high probability that events would occur simultaneously. For example, it may result in the following execution logic:
- 5 threads check the balance on card 1 (let’s say it’s $5).
- 5 threads find that the balance is sufficient, and so each increments card 2 by $5, adding a total of $25 to card 2.
- 5 threads each try to decrement the funds in card 1 by $5. However, the card can’t go below $0, so 4 of the threads fail.
Card 2 has now gained $25 and card 1 has lost only $5.
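The interleaving above can be reproduced in a few lines. The following is a minimal sketch in Python, not the code from the actual exploit: the `balances` dictionary, the `run_race` function, and the use of `threading.Barrier` to force every thread to finish its balance check before any thread writes are all assumptions made for illustration.

```python
import threading

def run_race(n_threads=5, amount=5):
    """Simulate the unsafe check-then-act transfer with a forced interleaving."""
    balances = {"card1": 5, "card2": 0}   # hypothetical in-memory "cards"
    all_checked = threading.Barrier(n_threads)
    write_guard = threading.Lock()        # guards single writes, NOT the whole transfer

    def transfer():
        # Step 1: check the balance on card 1.
        sufficient = balances["card1"] >= amount
        # Force the worst case: nobody writes until everyone has checked.
        all_checked.wait()
        if sufficient:
            # Step 2: increment card 2.
            with write_guard:
                balances["card2"] += amount
            # Step 3: decrement card 1, which cannot go below zero.
            with write_guard:
                if balances["card1"] >= amount:
                    balances["card1"] -= amount

    threads = [threading.Thread(target=transfer) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balances
```

Calling `run_race()` leaves card 1 at $0 and card 2 at $25, matching the walk-through. In a real application the interleaving is not forced by a barrier; it happens only some of the time, which is why the attacker sends many requests at once.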
For the complete write-up of this exploit, see Egor Homakov’s blog post.
Testing for Race Conditions
The best way to test for race condition vulnerabilities is to have access to source code, in what is known as a “white box” assessment. If you have access to source code, then it is much easier to look through all of the functions in the code and identify logic that is assuming synchronous actions, without the proper defensive programming techniques applied (more on this below).
Once you find such a function, you can validate the vulnerability by calling that function a large number of times simultaneously, increasing the likelihood that a collision will occur.
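One way to call a function many times at once is a small harness like the sketch below. The `hammer` helper and its parameters are hypothetical names chosen for this example. It starts one thread per call and uses a barrier so that all calls are released at roughly the same moment.

```python
import threading

def hammer(func, n_calls=50):
    """Call func() n_calls times, releasing all callers together."""
    start_line = threading.Barrier(n_calls)
    results = [None] * n_calls

    def worker(index):
        start_line.wait()              # block until every worker is ready
        try:
            results[index] = func()
        except Exception as exc:       # keep failures so they can be inspected
            results[index] = exc

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_calls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

After a run, compare the shared state with what a strictly sequential execution would have produced. Any difference, such as more successful transfers than the balance allows, points to a race condition.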
Race conditions have the greatest impact on shared resources (e.g. databases, files, objects in code). When a shared resource is accessed and modified simultaneously, it can be corrupted or behave unpredictably, and its integrity can no longer be trusted.
In the case of “black box” testing (when you do not have access to the source code), testing for race conditions can be a little trickier, but not impossible. The same principles described above still apply. However, finding vulnerable functions may be more difficult, and testing them even more so.
One tool built for this kind of testing, dubbed “Race-The-Web,” is available on GitHub.
If you are using Burp Suite, it is still possible to test for race conditions in a similar manner using Intruder. The benefit of using Race-The-Web is that it is faster (and thus more likely to trigger a race condition on more performant web applications), as well as free and open-source. Of course, use whichever tooling is best for you and your environment.
Defense and Mitigation
The key to preventing a race condition is to find a way to synchronize or otherwise strictly control the order of operations in potentially vulnerable functions and actions. The best way to do this is through locks.
Most programming languages have a built-in locking functionality for data; for example, Python has “threading.Lock”, and Go has “sync.Mutex”.
Please refer to the documentation for your programming language of choice for more information. If the language has multi-threaded or asynchronous capabilities built-in, it should have some form of locking mechanism available to you.
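As a minimal sketch of the locking approach in Python, the class below wraps the balance check and both updates in a single `threading.Lock`. The `GiftCards` class and its method names are hypothetical and exist only for this example.

```python
import threading

class GiftCards:
    """Hypothetical in-memory card store with a lock around each transfer."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # The check and both updates happen while holding the lock,
        # so no other thread can act on a stale balance.
        with self._lock:
            if self._balances[src] < amount:
                return False
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def balance(self, card):
        with self._lock:
            return self._balances[card]
```

With this version, five simultaneous transfers of $5 from a card holding $5 result in exactly one success. Note that an in-process lock only protects threads inside a single process. If the application runs as several processes or on several servers, the synchronization has to happen in a shared layer such as the database.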
You can also force synchronization in ACID-compliant databases. ACID stands for atomicity, consistency, isolation, and durability. Basically, to have the most reliable and effective general-purpose database, you will want to ensure that it implements all four components into its design.
The key component for preventing race conditions is isolation, and the highest level of isolation is “serializable”. Setting your database’s level of isolation to serializable effectively forces concurrent transactions to produce the same result as if they had occurred sequentially.
While this will guarantee safety from race conditions in the database, the downside is that operations can be seriously slowed down. Not only that, it will typically involve using exclusive locks, which can cause deadlocks. In practice, you can still achieve a reasonable level of safety from race conditions with a lower level of isolation, for example “repeatable read” in MySQL (see the MySQL documentation on transaction isolation levels for more information). However, doing so will not guarantee full protection.
As with all security mitigations, you must balance business needs with security requirements, so you will likely have to find the balance that works for your organization.
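To make the database approach concrete, here is a hedged sketch using SQLite, chosen only because it ships with Python. SQLite transactions are serializable, and `BEGIN IMMEDIATE` takes the write lock before the balance is read. The `cards` table, its columns, and the function names are assumptions for this example. Other databases, such as MySQL, set the isolation level with their own statements.

```python
import sqlite3

def open_db(path=":memory:"):
    # isolation_level=None disables the module's implicit transactions,
    # so the explicit BEGIN/COMMIT statements below are the only ones.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cards "
        "(id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    return conn

def transfer(conn, src, dst, amount):
    """Move funds inside a single transaction; return True on success."""
    conn.execute("BEGIN IMMEDIATE")   # acquire the write lock before reading
    try:
        (balance,) = conn.execute(
            "SELECT balance FROM cards WHERE id = ?", (src,)
        ).fetchone()
        if balance < amount:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE cards SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE cards SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")      # undo any partial work, then re-raise
        raise
```

Because the check and both updates are part of one transaction, a second transfer cannot read the old balance while the first is still in progress.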
Another point related to databases: if you have the option, prefer inserts over updates in your SQL queries. Inserts have more error protection in most configurations. For example, inserting a row whose primary or unique key already exists fails with an error, which prevents two simultaneous requests from silently modifying a single database entry.
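The sketch below illustrates the insert-based approach, again using SQLite for convenience. The `ledger` table and the idea of giving each transfer a unique `transfer_id` are assumptions made for this example. The point is that the database rejects a second insert with the same key instead of letting both requests through.

```python
import sqlite3

def open_ledger(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ledger ("
        "transfer_id TEXT PRIMARY KEY, "
        "src TEXT NOT NULL, dst TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    return conn

def record_transfer(conn, transfer_id, src, dst, amount):
    """Record a transfer once; a repeated transfer_id is rejected."""
    try:
        conn.execute(
            "INSERT INTO ledger (transfer_id, src, dst, amount) VALUES (?, ?, ?, ?)",
            (transfer_id, src, dst, amount),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejected a duplicate transfer_id.
        conn.rollback()
        return False
```

The uniqueness check is performed by the database itself, so it holds no matter how many application threads or processes send the same request at once.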
You can also enforce locking at the file level. To do so, you can usually use a system call (“LockFile” in Windows, and “flock” or “lockf” in Unix), although this is likely abstracted well enough in your programming language of choice. Another method used by many modern applications like Microsoft Word is to create a temporary file (e.g. ~myfile.lck), which exists while a file needs to be locked from concurrent access. The program then checks for the existence of the lock file before granting write access to the true source file.
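A lock file can be sketched in Python as shown below. The `lock_file` helper is a hypothetical name for this example. One detail matters: checking for the lock file and then creating it as two separate steps would itself be a race, so the sketch creates the file with `O_CREAT | O_EXCL`, which fails atomically if the file already exists.

```python
import os
from contextlib import contextmanager

@contextmanager
def lock_file(path):
    """Hold an exclusive lock represented by the existence of `path`."""
    # O_CREAT | O_EXCL makes creation atomic: os.open raises FileExistsError
    # if the lock file is already there, so check and acquire are one step.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(path)                          # release the lock
```

A caller wraps its write in `with lock_file("myfile.lck"):` and treats `FileExistsError` as "someone else holds the lock". A production version would also need a policy for stale lock files left behind by a crashed process, which this sketch does not address.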
There are other ways to mitigate race conditions, such as implementing CSRF tokens in your web application, which makes it more difficult to automate the large number of requests needed to trigger a race condition. However, the most effective way to mitigate or completely remove race conditions is through locking, as described above.
Race Conditions in Synchronous Environments?
Did you know that PHP is also vulnerable to race conditions, even though it has no native concept of asynchronous events or multi-threading in the language?
Because PHP is running on an asynchronous, multi-threaded platform (usually Nginx or Apache), that underlying platform can make a number of requests to the same PHP functions simultaneously.
PHP itself is just a language that executes a set of functions. It doesn’t have any functionality for multi-threaded processing built into the language; however, if it attempts to access a shared resource outside of the scope of the code (a database entry, for example), it will do so as many times as it is called by the underlying platform.