Database Concurrency Conflicts in the Real World
A lot of articles have been written about detecting database concurrency conflicts and the various ways of handling them. Unfortunately, most of these articles, and their accompanying solutions, share one major flaw: they focus on the technical issues and the database implementation instead of on real-world data and how people actually use that data. In this article, I will show the difference between focusing on the database implementation and focusing on the real-world data, and I will present some possible approaches to solving these concurrency issues.
What Is a Database Concurrency Conflict?
Let’s start with a quick recap of what database concurrency conflicts are and why you need to solve them in the first place.
Most database applications in this world are multi-user applications. This means that, at any given point in time, you can expect multiple people and/or processes to be reading from and writing to a database. Given that multiple people/processes are updating data in a database, it is only a matter of time before two of them try to update the same piece of data. A typical update cycle consists of:
- Read into memory
- Update in memory, and
- Write back to the database
so there will be occasions where two users both read the same data into memory. User 1 updates the data and writes the changes back to the database before user 2 does the same thing. Now you have a concurrency conflict, because user 2 read the data before user 1 wrote his changes back to the database. Why is this a conflict? Because when user 2 writes his data back to the database, he overwrites the changes made by user 1, causing them to be lost. Basically, whoever saves their changes second wins, overwriting the changes made by whoever saved first.
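The lost-update cycle above can be sketched in a few lines. This is an illustrative example only, using an in-memory SQLite database; the `customer` table and the phone numbers are made up for the demonstration:

```python
import sqlite3

# A shared database holding one row that both users will edit.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, phone TEXT)")
db.execute("INSERT INTO customer VALUES (1, '555-0000')")
db.commit()

# Step 1: both users read the same row into memory.
user1_phone = db.execute("SELECT phone FROM customer WHERE id = 1").fetchone()[0]
user2_phone = db.execute("SELECT phone FROM customer WHERE id = 1").fetchone()[0]

# Steps 2 and 3: user 1 updates his copy and writes it back first...
db.execute("UPDATE customer SET phone = ? WHERE id = 1", ("555-1111",))
db.commit()

# ...then user 2 writes his copy back, silently overwriting user 1's change.
db.execute("UPDATE customer SET phone = ? WHERE id = 1", ("555-2222",))
db.commit()

final = db.execute("SELECT phone FROM customer WHERE id = 1").fetchone()[0]
print(final)  # '555-2222' -- user 1's update is lost without any error
```

Note that the database itself raises no error at any point; detecting that user 1's change was lost is entirely up to the application.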
This kind of database concurrency issue can occur with human users, with automated processes, or with a combination of the two. A concurrency issue is more likely to occur with human users because their read/update/write cycle typically takes much longer. That said, the same concurrency issue can occur between automated processes, and there it is harder to solve: with a human user you can ask what the user wants (do they want to overwrite the changes made by another user?) and respond accordingly, while a process needs to have all of its actions fully automated.
The Current State
Let’s first take a look at what others generally say and do about solving database concurrency issues. Typically, solutions are divided into two basic approaches:
- Pessimistic concurrency control
- Optimistic concurrency control
Below I will give a brief description of the two approaches and the different options for handling them. My goal here is just to clarify the problem; I will in no way offer complete coverage of concurrency handling and all of its aspects.
Pessimistic Concurrency Control
Pessimistic concurrency control prevents a collision by requiring the user/process to take some action before the application can change the data. This action can be a number of things, but it typically involves locking the data in the database, thereby preventing another user from acquiring the same lock and changing the data.
- Simple to implement. Because database servers support and enforce locking mechanisms, they make pessimistic concurrency control very easy to implement. And because a user must place the lock before making any changes, the database server informs the user of a conflict before any change is made.
- Very secure. The database server implements the locking very reliably, so as a developer you can be assured that nothing will be able to ignore the lock and change the data anyway.
- Not very scalable. Locking data in a database requires an open connection to a database. This means every user must have at least one open connection to the database, which means more resources and licenses. When using older database servers, locking might also prevent other users from reading the data.
- Prone to deadlocks. Suppose two users both want to change two pieces of data, A and B. User 1 first locks A and user 2 first locks B. Now both users want to lock the second piece of data but cannot, because the other user has already locked it. If both users decide to wait until the data becomes available, you have a deadlock.
- Locks can last a long time. If a user starts changing some data, he or she holds a lock on that data until the changes are saved. If the user is distracted somehow or goes to a meeting without saving, the data remains locked and no one else can make any changes until the first changes have been committed.
You can also implement pessimistic locking with soft locks instead of real database locks. This means that you update a field to indicate that a user is busy with the data and has it “locked.” This avoids the scalability problem because you don’t need to keep the connection open. However, this approach has several drawbacks: the database doesn’t enforce a true lock, so another piece of code can simply decide to ignore it, and your code must remove the lock manually; if that doesn’t happen, the data remains locked forever.
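The soft-lock idea can be sketched with a nullable "locked by" column and a conditional update. The `try_soft_lock` helper and the `locked_by` column are hypothetical names invented for this illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, phone TEXT, locked_by TEXT)""")
db.execute("INSERT INTO customer VALUES (1, '555-0000', NULL)")
db.commit()

def try_soft_lock(db, row_id, user):
    # Claim the row only if nobody else currently holds the soft lock.
    cur = db.execute(
        "UPDATE customer SET locked_by = ? WHERE id = ? AND locked_by IS NULL",
        (user, row_id))
    db.commit()
    return cur.rowcount == 1  # exactly one row updated means we got the lock

got_lock_1 = try_soft_lock(db, 1, "alice")  # alice claims the row
got_lock_2 = try_soft_lock(db, 1, "bob")    # bob is refused: alice holds it

# The lock must be removed explicitly; nothing in the database enforces it,
# and a forgotten cleanup leaves the row locked forever.
db.execute("UPDATE customer SET locked_by = NULL WHERE id = 1")
db.commit()
got_lock_3 = try_soft_lock(db, 1, "bob")    # now bob can claim the row

print(got_lock_1, got_lock_2, got_lock_3)  # True False True
```

Note that the safety of this scheme rests entirely on convention: any code that updates the row without checking `locked_by` bypasses the "lock" completely, which is exactly the drawback described above.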
Even though pessimistic locking has its place, it has a number of drawbacks and doesn’t combine very well with the disconnected nature of data in a .NET application.
By: Maurice De Beijer
Maurice de Beijer is an independent software developer, beta tester, and a recipient of the MVP award. He specializes in .NET, object-orientation, Visual FoxPro, and solving technically challenging problems. Maurice is The Problem Solver and you can reach him via e-mail or at www.TheProblemSolver.nl.
Database concurrency conflicts are somewhat of a plague in software development. Like all kinds of concurrency conflicts they are hard to predict and handle. Unfortunately, unlike most other kinds of concurrency conflicts, they are hard to prevent due to the inherent multi-user nature of most database applications.