A Beginner’s Guide to Addressing Concurrency Issues
Inserts, updates and deletes. Every framework tutorial starts with these and they are seen as the most basic functionality that just works.
But what if two concurrent requests try to modify the same data? Or try to insert the same data that should be unique? Or the inserts and updates have side-effects that have to be stored in other tables (e.g. audit log).
“Transactions” you may say. Well, yes, and no. A transaction allows a group of queries to be executed together – either pass together of fail together. What happens with concurrent transactions depends on a specific property of transactions – their isolation level. And you can read here a very detailed explanation of how all of that works.
If you select the safest isolation level – serializable (and repeatable read), your system may become too slow. And depending on the database, transactions that happen at the same time may have to be retried by specific application code. And that’s messy. With other isolation levels you can have lost updates, phantom reads, etc.
Even if you get your isolation right, and you properly handle failed transactions, isolation doesn’t solve all concurrency problems. It doesn’t solve the problem of having an application-imposed data constraint (e.g. a uniqueness complex logic that can’t be expressed as a database unique constraint), it doesn’t solve the problem of inserting exact duplicates, it doesn’t solve other application-level concurrency issues, and it doesn’t perfectly solve the data modification issues. You may have to get into database locking, and locking is tedious. What is a write lock, a read-lock, what is an exclusive lock, and how not to end-up in a deadlock (or a livelock)? I’m sure that even developers with a lot of experience are not fluent with database locks, because you either don’t need them, or you have a bigger problem that you should solve first.
The duplicate submission problem is a bit offtopic, but it illustrates that not all concurrent request problems can be solved by the database alone. As many people suggest, is solved by a token that gets generated for each request and stored in the database using a unique constraint. That way two identical inserts (a result of a double-submission) cannot both go in the database. This gets a little more complicated with APIs, because you should rely on the user of the API to provide the proper token (and not generate it on the fly in their back-end). As for uniqueness – every article that I’ve read on the matter concludes that the only proper way to guarantee uniqueness is at the database level, using a unique constraint. But when there are complicated rules for that constraint, you are inclined to check in the application. And in this case concurrent requests will eventually allow for two records with the same values to be inserted.
Most of the problems are easy if the application runs on a single machine. You can utilize your language concurrency features (e.g. Java locks, concurrent collections) to make sure everything is properly serialized, that duplicates do not happen, etc. However, when you deploy to more than one machine (which you should), that becomes a lot harder problem.
So what are the approaches to address concurrency issues, apart from transactions? There are many, and here are a few of them (in no meaningful order).
- There is Hazelcast, which lets you use distributed locks – the whole cluster follows the Lock semantics as if it was a single machine. That is language specific and setting up a hazelcast cluster of just a few usecases (because not all of your requests will need that) may be too much
- You can use a message queue – push all requests to a message queue that is processed by a single (async) worker. That may be useful in some cases, and impractical in others (if you have to return some immediate response to the user, for example)
- You can use Akka and its clustering capabilities – it guarantees that an actor (think “service”) is processing only one message at a time. But using akka for everything may not be a good idea, because it completely changes the paradigm, it is harder to read and trace, harder to debug, and is platform-specific (only JVM languages can make use of it).
- You can use database-specific application level locks. That’s something quite useful, even though it is entirely RDBMS-dependent. Postgre has advisory locks, MySQL has get_lock, others probably have something similar. The idea here is that you use the database as your distributed lock mechanism. The locks are managed by the application, and don’t even need to have anything to do with your tables – you just ask for a lock for, say (entityType, entityId), and then no other application thread can enter a given piece of code, unless it successfully obtains that database lock. It is kind of like the hazelcast approach, but you get it “for free” with the database. Then you can have, for example, a @Before (spring) aspect that attaches to service methods and does the locking appropriate for the current application use-case, without using table locks.
- You can use a CRDT. It’s a data structure that is idempotent – no matter what the order of the operation applied is, it ends up in the same state. It’s explained in more details in this presentation. How does a CRDT map to a relational database is an interesting question I don’t have an answer to, but the point is that if your operations are idempotent, you will probably have fewer issues.
- Using the “insert-only” model. Databases like Datomic are using it internally, but you can use it with any database. You have no deletes, no updates – just inserts. Updating a record is inserting a new record with the “version” increased. That again relies on database features to make sure you don’t end up with two records with the same version, but you never lose data (concurrent updates will make it so that one is “lost”, because it’s not the latest version, but it’s stored and can be reverted to). And you get an audit log for free.
The overall problems is how to serialize requests without losing performance. And all the various lock mechanisms and queues, including non-blocking IO, address that. But what makes the task easier is having a data model that does not care about concurrency. If the latter is applicable, always go for it.
Whole books have been written on concurrency, and I realize such a blog post is rather shallow by definition, but I hope I’ve at least given a few pointers.
Inserts, updates and deletes. Every framework tutorial starts with these and they are seen as the most basic functionality that just works.
But what if two concurrent requests try to modify the same data? Or try to insert the same data that should be unique? Or the inserts and updates have side-effects that have to be stored in other tables (e.g. audit log).
“Transactions” you may say. Well, yes, and no. A transaction allows a group of queries to be executed together – either pass together of fail together. What happens with concurrent transactions depends on a specific property of transactions – their isolation level. And you can read here a very detailed explanation of how all of that works.
If you select the safest isolation level – serializable (and repeatable read), your system may become too slow. And depending on the database, transactions that happen at the same time may have to be retried by specific application code. And that’s messy. With other isolation levels you can have lost updates, phantom reads, etc.
Even if you get your isolation right, and you properly handle failed transactions, isolation doesn’t solve all concurrency problems. It doesn’t solve the problem of having an application-imposed data constraint (e.g. a uniqueness complex logic that can’t be expressed as a database unique constraint), it doesn’t solve the problem of inserting exact duplicates, it doesn’t solve other application-level concurrency issues, and it doesn’t perfectly solve the data modification issues. You may have to get into database locking, and locking is tedious. What is a write lock, a read-lock, what is an exclusive lock, and how not to end-up in a deadlock (or a livelock)? I’m sure that even developers with a lot of experience are not fluent with database locks, because you either don’t need them, or you have a bigger problem that you should solve first.
The duplicate submission problem is a bit offtopic, but it illustrates that not all concurrent request problems can be solved by the database alone. As many people suggest, is solved by a token that gets generated for each request and stored in the database using a unique constraint. That way two identical inserts (a result of a double-submission) cannot both go in the database. This gets a little more complicated with APIs, because you should rely on the user of the API to provide the proper token (and not generate it on the fly in their back-end). As for uniqueness – every article that I’ve read on the matter concludes that the only proper way to guarantee uniqueness is at the database level, using a unique constraint. But when there are complicated rules for that constraint, you are inclined to check in the application. And in this case concurrent requests will eventually allow for two records with the same values to be inserted.
Most of the problems are easy if the application runs on a single machine. You can utilize your language concurrency features (e.g. Java locks, concurrent collections) to make sure everything is properly serialized, that duplicates do not happen, etc. However, when you deploy to more than one machine (which you should), that becomes a lot harder problem.
So what are the approaches to address concurrency issues, apart from transactions? There are many, and here are a few of them (in no meaningful order).
- There is Hazelcast, which lets you use distributed locks – the whole cluster follows the Lock semantics as if it was a single machine. That is language specific and setting up a hazelcast cluster of just a few usecases (because not all of your requests will need that) may be too much
- You can use a message queue – push all requests to a message queue that is processed by a single (async) worker. That may be useful in some cases, and impractical in others (if you have to return some immediate response to the user, for example)
- You can use Akka and its clustering capabilities – it guarantees that an actor (think “service”) is processing only one message at a time. But using akka for everything may not be a good idea, because it completely changes the paradigm, it is harder to read and trace, harder to debug, and is platform-specific (only JVM languages can make use of it).
- You can use database-specific application level locks. That’s something quite useful, even though it is entirely RDBMS-dependent. Postgre has advisory locks, MySQL has get_lock, others probably have something similar. The idea here is that you use the database as your distributed lock mechanism. The locks are managed by the application, and don’t even need to have anything to do with your tables – you just ask for a lock for, say (entityType, entityId), and then no other application thread can enter a given piece of code, unless it successfully obtains that database lock. It is kind of like the hazelcast approach, but you get it “for free” with the database. Then you can have, for example, a @Before (spring) aspect that attaches to service methods and does the locking appropriate for the current application use-case, without using table locks.
- You can use a CRDT. It’s a data structure that is idempotent – no matter what the order of the operation applied is, it ends up in the same state. It’s explained in more details in this presentation. How does a CRDT map to a relational database is an interesting question I don’t have an answer to, but the point is that if your operations are idempotent, you will probably have fewer issues.
- Using the “insert-only” model. Databases like Datomic are using it internally, but you can use it with any database. You have no deletes, no updates – just inserts. Updating a record is inserting a new record with the “version” increased. That again relies on database features to make sure you don’t end up with two records with the same version, but you never lose data (concurrent updates will make it so that one is “lost”, because it’s not the latest version, but it’s stored and can be reverted to). And you get an audit log for free.
The overall problems is how to serialize requests without losing performance. And all the various lock mechanisms and queues, including non-blocking IO, address that. But what makes the task easier is having a data model that does not care about concurrency. If the latter is applicable, always go for it.
Whole books have been written on concurrency, and I realize such a blog post is rather shallow by definition, but I hope I’ve at least given a few pointers.
Hazelcast is only an example of a distributed in-memory data store. There are similar ones like Redis.
And there’s akka for .net too: http://getakka.net/
Nice writeup, Bozho. 🙂
It’s an interesting topic which has also lots of academia attention.
Indeed, data storage alone cannot enforce all application invariants.
For example, in the excellent ‘Coordination avoidance in distributed systems’ paper (covered in The Morning Paper blog: https://blog.acolyer.org/2015/03/19/coordination-avoidance-in-database-systems/) it’s shown that it’s preferrable to use a ‘hybrid’ consistency model, i.e. apply different levels of consistency to operations depending on App-level constraints requirements.
Also I would mention another general technique – apologies mechanism. It’s when your app retroactively compensates for anomalies that do occur. (Some techniques are mentioned in this series: https://blog.acolyer.org/2015/09/09/out-of-the-fire-swamp-part-ii-peering-into-the-mist/).
It’s a good article, but the title is misleading. I think that “A beginner’s guide to addressing concurrency issues in enterprise applications” would be a much better alternative.
@Lester – true, but Hazelcast gives you locking as-if-you-run-on-a-single-machine. But that’s java-specific
@Dmitry – ah, yes, that’s what I was missing. And given that I’ve been writing such compensatory mechanisms last week 😀
@Vlad thanks, fixed 🙂
I would go for the ‘token’ approach. I guess this is much simpler solution at application level. Hibernate gives version fields to implement such token, I guess other frameworks would provide similar feature. Even if not, it’s much easier to implement on application code.
I liked this entry because I’ve been working on things like this recently.
We got a web application where some of its controllers used a Concurrent Map for taking account of the long tasks submitted by the users and also limiting how many tasks a single guy can run. But the number of users grew and the project needed to be deployed to a Tomcat cluster.
We chose to use the database to replicate the semantics provided by the concurrent hashmap here. The worst part was the clean-up process: on a single server fashion, whenever the application crashed or was stopped, the pending jobs died and the map was empty at the next startup… correct. Now we had to take care of emptying the table with all the hassle it implies (i.e. when only a single node failed but the others stayed alive, etc.)
Thanks, nice tips