You Probably Don’t Need a Message Queue

July 3, 2014

I’m a minimalist, and I don’t like to complicate software too early or unnecessarily. Adding components to a software system is one of the things that adds a significant amount of complexity. So let’s talk about message queues.

Message queues are systems that let you build a fault-tolerant, distributed, decoupled, etc., etc. architecture. That sounds good on paper.

Message queues may fit several use-cases in your application. You can check this nice article about the benefits of MQs to see what some of those use-cases might be. But don’t be hasty in picking an MQ because “decoupling is good”, for example. Take email: you want your email sending to be decoupled from your order processing. So you post a message to a message queue, then the email processing system picks it up and sends the emails. How would you do that in a monolithic, single-classpath application? Just make your order processing service depend on an email service, and call sendEmail(..) rather than sendToMQ(emailMessage). If you use an MQ, you define a message format to be recognized by the two systems; if you don’t use an MQ, you define a method signature. What is the practical difference? Not much, if any.
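To make the comparison concrete, here is a minimal Java sketch of the non-MQ version. `EmailService`, `OrderService` and `RecordingEmailService` are hypothetical names chosen for illustration, and the email service only records what it would send instead of talking to an SMTP server. In a Spring application the dependency would typically be injected; here it is simply passed through the constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names (EmailService, OrderService) used for illustration only.
public class DirectCallExample {

    // The method signature plays the role that a message format plays with an MQ.
    interface EmailService {
        void sendEmail(String to, String subject, String body);
    }

    // Stand-in implementation that records emails instead of talking to SMTP.
    static class RecordingEmailService implements EmailService {
        final List<String> sent = new ArrayList<>();

        @Override
        public void sendEmail(String to, String subject, String body) {
            sent.add(to + ": " + subject);
        }
    }

    static class OrderService {
        private final EmailService emailService; // a plain dependency, not a queue

        OrderService(EmailService emailService) {
            this.emailService = emailService;
        }

        void processOrder(long orderId, String customerEmail) {
            // ... order processing logic ...
            // Instead of sendToMQ(emailMessage), call the email service directly.
            emailService.sendEmail(customerEmail,
                    "Order " + orderId + " confirmed",
                    "Thank you for your order.");
        }
    }

    public static void main(String[] args) {
        RecordingEmailService emails = new RecordingEmailService();
        new OrderService(emails).processOrder(42, "alice@example.com");
        System.out.println(emails.sent); // [alice@example.com: Order 42 confirmed]
    }
}
```

If you later need another “consumer” for the same event, it becomes one more dependency of `OrderService` and one more method call in `processOrder`.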

But what if you later want to add another consumer that does something additional with a given message? That might indeed happen; it’s just not typical of the regular project out there. And even when it does happen, an MQ isn’t worth it compared to simply adding another method call. Coupled? Yes. But not inconveniently coupled.

What if you want to handle spikes? Message queues let you put requests in a persistent queue and process all of them eventually. That is a very useful feature, but again, how much it helps depends on several factors. Are your requests processed in the UI background, or do they require an immediate response? The servlet container thread pool can be used as a sort of queue: the response will be served eventually, but the user will have to wait (though if the thread acquisition timeout is too small, requests will be dropped). Or you can use an in-memory queue for the heavier requests (the ones handled in the UI background). And note that by default your MQ might not be highly available. For example, if an MQ node dies, you lose messages. So in that setup, the MQ offers no benefit over an in-memory queue in your application node.
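For the in-memory option, the JDK’s `ThreadPoolExecutor` backed by a bounded `ArrayBlockingQueue` is one way to buffer heavier work. This is a sketch with assumed pool and queue sizes, not a tuned configuration: bursts are buffered up to the queue capacity, and beyond that tasks are rejected, much like the servlet container dropping requests when no thread becomes available in time.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of an in-memory queue that absorbs spikes of heavier work.
// Pool size (2) and queue capacity (100) are illustrative assumptions.
public class InMemorySpikeQueue {

    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            2, 2,                               // fixed number of worker threads
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100));     // bounded buffer for bursts

    /** Returns false if the task was rejected (buffer full or executor shut down). */
    public boolean submit(Runnable heavyTask) {
        try {
            executor.execute(heavyTask);
            return true;
        } catch (RejectedExecutionException e) {
            // Default AbortPolicy: the spike exceeded capacity; the caller decides what to do.
            return false;
        }
    }

    /** Stops accepting new work and waits for queued tasks to drain. */
    public boolean shutdownAndWait(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

As with a non-HA MQ node, anything still waiting in this queue is lost if the JVM dies.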

That leads us to asynchronous processing, which is indeed a useful feature. You don’t want to do some heavy computation while the user is waiting. But you can use an in-memory queue, or simply start a new thread (à la Spring’s @Async annotation). This raises another question: does it matter if a message is lost? If your application node dies while processing the request, can you recover? You’d be surprised how often it doesn’t actually matter, and you can function properly without guaranteeing that every message is processed. So just handling heavier invocations asynchronously might work well.
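Spring’s `@Async` needs a Spring container, so here is a plain-JDK sketch of the same idea using `CompletableFuture.runAsync` with a dedicated executor. `heavyComputation` is a hypothetical placeholder, and the `processed` set exists only to make the example observable.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-JDK sketch of "just handle it asynchronously", similar in spirit
// to Spring's @Async. Names and the pool size are assumptions.
public class AsyncHandling {

    private final ExecutorService background = Executors.newFixedThreadPool(4);

    // Only here to make the sketch observable.
    final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Returns immediately; the heavy work runs on a background thread. */
    public CompletableFuture<Void> handleHeavyRequest(String requestId) {
        return CompletableFuture
                .runAsync(() -> heavyComputation(requestId), background)
                .exceptionally(ex -> {
                    // Best effort: log and move on. If the node dies mid-task,
                    // the work is simply lost, which is often acceptable.
                    System.err.println("Failed to process " + requestId + ": " + ex);
                    return null;
                });
    }

    private void heavyComputation(String requestId) {
        // ... e.g. generate a report or call a slow external API ...
        processed.add(requestId);
    }

    public void shutdown() {
        background.shutdown();
    }
}
```

In a web request you would not wait for the returned future; the user gets a response right away while the work continues in the background.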

Even if you can’t afford to lose messages, there’s still a simple solution for the use-case where a message is put into a queue so that another component can process it: the database. You insert a row with a processed=false flag. A scheduled job runs, picks up all unprocessed rows and processes them asynchronously. When processing is finished, it sets the flag to true. I’ve used this approach a number of times, including in large production systems, and it works pretty well.
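A minimal sketch of the processed-flag approach follows. To keep it self-contained, in-memory maps stand in for a hypothetical `outbox` table, with the equivalent SQL in comments; the table, class and method names are assumptions. The scheduled job runs on its own thread, so processing is asynchronous with respect to the request that inserted the row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of the "processed flag" pattern. In-memory maps stand in for a
// hypothetical table:
//   CREATE TABLE outbox (id BIGINT PRIMARY KEY, payload TEXT, processed BOOLEAN DEFAULT FALSE);
public class ProcessedFlagJob {

    final Map<Long, Boolean> processedFlags = new ConcurrentHashMap<>();
    final Map<Long, String> payloads = new ConcurrentHashMap<>();
    private final Consumer<String> handler;

    ProcessedFlagJob(Consumer<String> handler) {
        this.handler = handler;
    }

    // INSERT INTO outbox (id, payload, processed) VALUES (?, ?, FALSE)
    void enqueue(long id, String payload) {
        payloads.put(id, payload);
        processedFlags.put(id, false);
    }

    // SELECT id FROM outbox WHERE processed = FALSE
    List<Long> findUnprocessed() {
        List<Long> ids = new ArrayList<>();
        processedFlags.forEach((id, done) -> {
            if (!done) ids.add(id);
        });
        return ids;
    }

    // One run of the scheduled job.
    void runOnce() {
        for (long id : findUnprocessed()) {
            try {
                handler.accept(payloads.get(id));
                processedFlags.put(id, true); // UPDATE outbox SET processed = TRUE WHERE id = ?
            } catch (RuntimeException e) {
                // Leave processed = FALSE so the next run retries this row.
            }
        }
    }

    // scheduleWithFixedDelay starts the next run only after the previous one
    // has finished, so runs never overlap on this node.
    ScheduledExecutorService start(long delaySeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runOnce, 0, delaySeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

If the job runs on several nodes, each would need to claim rows before processing them (for example with `SELECT ... FOR UPDATE SKIP LOCKED`, where the database supports it). Because a node can die between processing a row and setting its flag, processing should also be idempotent.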

And you can still scale your application nodes horizontally, as long as they don’t hold any persistent state, regardless of whether you are using an MQ or not. (Temporary in-memory processing queues are not persistent state.)

Why am I offering alternatives to common usages of message queues? Because if chosen for the wrong reason, an MQ can be a burden. MQs are not as easy to use as they sound. First, there’s a learning curve. Generally, the more separate integrated components you have, the more problems may arise.

Then there’s setup and configuration. For example, when the MQ has to run in a cluster spanning multiple data centers (for HA), things get complex. High availability itself is not trivial, and it’s usually not turned on by default. And how does your application node connect to the MQ? Via a refreshing connection pool, a short-lived DNS record, a load balancer? Then your queues have tons of configuration options: what’s their size, and what’s their behaviour (should consumers explicitly acknowledge receipt, should they explicitly acknowledge failure to process messages, should multiple consumers get the same message or not, should messages have a TTL, etc.)?

Then there’s the network and message transfer overhead, especially given that people often choose JSON or XML for transferring messages. If you overuse your MQ, it adds latency to your system. And last, but not least, it’s harder to track the program flow when analyzing problems. You can’t just look at the “call hierarchy” in your IDE, because once you send a message to the MQ, you need to go and find where it is handled. And that’s not always as trivial as it sounds. As you can see, an MQ adds a lot of complexity and a lot of things to take care of.

Certainly, MQs are very useful in some contexts. I’ve used them in projects where they were a really good fit; for example, we couldn’t afford to lose messages and we needed fast processing (so polling the database wasn’t an option). I’ve also seen them used in non-trivial scenarios, where we used an MQ to have messages consumed on a single application node, regardless of which node posted the message (pub/sub). You can also check this Stack Overflow question. And maybe you really need multiple languages to communicate (but don’t want an ESB), or maybe your flow is getting so complex that adding a new method call is more trouble than adding a new message consumer.

So all I’m trying to say here is the trite truism “you should use the right tool for the job”. Don’t pick a message queue unless you’ve identified a real use for it that can’t easily be handled in a different manner that is easier to set up and maintain. And don’t start with an MQ “just in case”; add it when you realize you actually need it. Because in the regular project out there, you probably don’t need a message queue.


7 Responses to “You Probably Don’t Need a Message Queue”

  1. These problems are there with any eventually consistent system. Business processes will have to be designed around the critical sections where the system will be inconsistent.

  2. Hmm, I’ve had excellent experiences with Oracle AQ or JTA-compliant JMS queues. I don’t remember ever losing a message that participated in a transaction that way. What’s your take on those?

  3. I’m a die-hard minimalist; however, as always, it’s about context.

    Most of the examples you have cited are the “simple cases”, and you are correct :) in those instances an MQ is over the top and over-engineered.

    However, when you need to build big, complex, interconnected service-based systems, MQs really shine, as they help you better manage complexity (kind of ironic: you need a little bit of extra complexity to handle bigger complexity!)

    :)

  4. @Lukas: I don’t know about Oracle AQ. RabbitMQ also doesn’t lose messages, unless a node dies (and if you run it in AWS, for example, nodes die fairly often).

    @Zen.Master: indeed. If complexity is so high that (1) you cannot reduce it, because it’s inherent to the business case, and (2) using a “simple solution” complicates things unnecessarily, then sure, use a dedicated MQ.

  5. MQs are useful when you have to transfer data across different technologies. For example, to send data (XML) from BizTalk (.NET) to WebSphere (Java), there is no alternative to an MQ for asynchronous messaging. Microsoft doesn’t provide adapters for open-source products like MQ etc.

  6. Slight correction: ActiveMQ (open source).

  7. Sending emails is the perfect example of a case where a queue is the only decent solution.

    1. If you perform a synchronous emailing operation and the email fails to send, that email is lost. As you point out, it is debatable whether a lost email is acceptable. My opinion is that since the MQ allows the retry scenario, it’s just not worth the risk of taking the synchronous route. What happens if your entire email operation is offline for an hour or longer? Suddenly *all* of your emails are lost, rather than simply delayed until the problem with the emailing is resolved. Even worse, if something is wrong and the email-sending operation takes more than a second to time out, your user experience takes a hit.

    2. A specific scenario where synchronous emails do not work is bulk delivery on a single page load, such as an “invite a friend” feature. This isn’t a common feature these days due to concerns over spam, but if you allow a member to provide a list of emails to invite to a site or service (i.e. importing Gmail/Yahoo/Hotmail contacts), you *must* use an asynchronous solution. If you import 100 or 1000 contacts, your script will take forever iterating to send all those emails, if it doesn’t time out first. This was a real scenario for me at a previous job: the previous devs hadn’t used a queue for “invite a friend”.

    3. Using a MySQL table along with a cron job is in fact riskier and more complicated to manage than an MQ. This solution typically involves a single cron instance running every 1-5 minutes. If you select all unsent emails in a query and there is a spike or problem, your script may take longer than the interval between cron runs to finish sending everything. If a second cron is allowed to run before the first has completed, severe problems can occur. In the worst case, you didn’t anticipate multiple instances of the cron running simultaneously, and now each instance is sending an additional duplicate of each email. Even if you have the cron ensure it’s the only instance running, a long-running cron now means that new emails will have an additional delay before delivery.

    Point number 3 is extremely important. Many, many devs do not tackle crons properly. At every job I have had, there is always at least one cron that doesn’t behave as expected if it takes longer than anticipated to complete. The worst I have seen is a cron scheduled to run every minute taking 20+ minutes to complete – with 20+ instances of the cron running concurrently. Properly handling a “queue” via cron requires tracking each job manually with semaphore information (process PID, timestamp of job start, retry count, etc). This is not easy to get right.

    A queue slices through all these problems. The benefits:

    1. You no longer have an issue if a service (like an SMTP service) goes offline for a while. Once it comes back online, your queue picks up where it left off with automatic retries built in (this is software-specific, not all queues have this).

    2. You don’t have to worry about making sure your cron’s behaviour is consistent no matter the circumstances. A queue worker is designed to complete a single job at a time, so you don’t run into cron timing issues if you have high load.

    3. If the queue has no backlog, a new job is picked up immediately. When sending an email, it’s as fast as the synchronous solution: your email is delivered immediately, not 1-5 minutes later when the next cron runs.

    4. You gain concurrency: you can run multiple queue workers to send, say, 5 emails at a time (one per worker) rather than a simple loop that can only send one email at a time.

    5. You don’t have a haphazard MySQL table + cron solution. I’m telling you, this just doesn’t work most of the time. Trying to replicate the functionality of a queue system using a cron is bound to cause more headaches in the long run than just spending the time to learn how to use a queuing system.

    tl;dr: Queue systems are not complicated to learn, and their benefits over the alternative solutions (which should really be using a queue anyway) are plentiful.
