Avoid Lists in Cassandra

Apache Cassandra is a fast and scalable database that over the years has become almost as easy to use as a traditional SQL database. At least on the surface.

You can use SQL-like queries, but they have a lot of limitations; you have a schema, but it is not as easy to modify as in a SQL database; you have the same tabular structure with a primary key, but it is more complicated because of the distinction between the partition key and the clustering key, which determines sort order within a partition. And there are a lot of underlying details that seem irrelevant at first but are crucial for performance and data consistency, like tombstones, SSTable compaction and so on.

But I want to discuss the “list” column type, as recently we’ve had a very elusive issue with it. We are in the business of guaranteeing data integrity, and that’s why our records are not updated, ever. This is a good fit for Cassandra, as updates are tricky to get right. But on one of our deployments we noticed something strange: very rarely, for one entry out of millions, the hash of the stored data would not match the hash of the indexed data. Upon investigation, we noticed that a column of type “list” had duplicate values. It was not an issue with the code, because in this particular case the code was always using Collections.singletonList(..).

It appears that Cassandra is trying to be clever: when it sees identical entries in a batch insert, instead of overwriting one with the other, it tries to merge them, resulting in a list with duplicate entries. Accounts of the issue are reported here and here.

Now, batches are a difficult topic and one of those things that look straightforward but aren’t. In most cases, batches are an anti-pattern. There are cases where batches are useful, but they are rarer than you might expect. That’s because of the distributed nature of Cassandra. Another complication comes from whether you are using a token-aware or token-unaware client policy, i.e. whether your client knows which node each record belongs to, so that it can send the request directly to that node. I won’t go into details about batches, as they are well explained in the two linked articles.

Back to lists. Since in our case we don’t have identical records in a batch, the issue was probably caused by a network timeout: the client didn’t receive confirmation of the write and sent the same statement again. I can’t be sure whether being in a batch affects this, but it’s probably safer to assume that it can happen with or without a batch. In other words, lists can be merged in unexpected situations.
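To make the retry scenario concrete, here is a toy in-memory sketch in Java. It is not Cassandra's storage code: the class RetryMergeSketch and its methods are hypothetical, and it simply assumes that a repeated write is merged into the stored collection rather than replacing it, as the behaviour described above suggests. Under that assumption a list is not idempotent under retries, while a set is.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RetryMergeSketch {

    // Hypothetical model of a list column whose writes are merged:
    // every write adds its elements again, even if they are already stored.
    static List<String> mergeIntoList(List<String> stored, List<String> written) {
        List<String> merged = new ArrayList<>(stored);
        merged.addAll(written);
        return merged;
    }

    // Hypothetical model of a set column: elements are identified by their
    // value, so merging the same write twice changes nothing.
    static Set<String> mergeIntoSet(Set<String> stored, List<String> written) {
        Set<String> merged = new LinkedHashSet<>(stored);
        merged.addAll(written);
        return merged;
    }

    public static void main(String[] args) {
        List<String> write = List.of("value");

        // The client times out after the first attempt and sends the same write again.
        List<String> listColumn = mergeIntoList(mergeIntoList(List.of(), write), write);
        Set<String> setColumn = mergeIntoSet(mergeIntoSet(Set.of(), write), write);

        System.out.println(listColumn); // [value, value]
        System.out.println(setColumn);  // [value]
    }
}
```

The point of the sketch is only the difference in behaviour under a repeated write; which exact conditions trigger the merge in a real cluster is something I can't state with certainty.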

This is a serious reason for not using lists at all. Additional arguments are given by Walmart:

“Sets should be preferred to Lists as Sets (and Maps) avoid read-before-write pattern for updates and deletes”

And this is just for a small number of items. Using collections for a large number of items (e.g. thousands) is another issue, as you can’t load the items in portions – they are all read at once.

In a Java application, for example, you can easily substitute a Set for the List even if the underlying column is of type List, and that would help temporarily avoid the issues: data may still be duplicated in the database, but at least the application will work with unique values. Keep in mind, though, that a Java Set does not guarantee ordering in general, so if ordering matters for your logic, make sure you sort by some well-defined comparison criteria.
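As a minimal sketch of that workaround, assuming the column has already been read into a java.util.List (the class and method names here are made up for illustration), you could deduplicate in one of two ways depending on whether you want to keep the order in which values were read or impose an explicit one:

```java
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ListDeduplication {

    // Keeps the first occurrence of each value, in the order the values were read.
    static <T> Set<T> uniquePreservingOrder(List<T> columnValues) {
        return new LinkedHashSet<>(columnValues);
    }

    // Orders values by an explicit comparator. Values that the comparator
    // considers equal are treated as duplicates and collapsed into one.
    static <T> Set<T> uniqueSorted(List<T> columnValues, Comparator<? super T> order) {
        Set<T> sorted = new TreeSet<>(order);
        sorted.addAll(columnValues);
        return sorted;
    }

    public static void main(String[] args) {
        // Simulates a list column that ended up with a duplicate value.
        List<String> fromColumn = List.of("b", "a", "b");

        System.out.println(uniquePreservingOrder(fromColumn)); // [b, a]
        System.out.println(uniqueSorted(fromColumn, Comparator.naturalOrder())); // [a, b]
    }
}
```

LinkedHashSet is the simpler choice when the read order is the order you care about; TreeSet makes the ordering explicit, at the cost of relying on the comparator to decide what counts as a duplicate.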

The general advice of “avoid lists” (and “avoid batches”) paints an accurate picture of Cassandra. It looks straightforward to use, but once you get to production, you may realize there were some suboptimal design decisions.
