Don’t Use JSON And XML As Internal Transfer Formats

November 2, 2012

You have a system with multiple components that have to communicate, either via internal web services or using a message queue. Normally, you would want to send (data transfer) objects from one component to another. Three typical examples:

  • a user has registered; you send a message to a message queue, and whenever the message is consumed, an email is sent to the user. The message needs at least the user’s email and names.
  • if your layers communicate via web services for some reason (rather than living within one JVM), on registration the web layer needs to invoke a back-end service and pass a User object.
  • you store objects in some (distributed) in-memory cache in order to reduce redundant calls to the database (assuming you map your database results to objects in some way, either via an ORM or some mapper, as is done in the majority of cases). So when a request arrives asking for a user profile, you check whether it’s present in the cache, and if it is – you get it from there, rather than hitting the database.

In order to achieve these things you need to serialize the objects to some format that will then be deserialized on the other end. Many frameworks include XML and JSON serializers and they are used in many examples online. Therefore people are inclined to use JSON or XML for these purposes. And that’s not a good idea. Using these formats internally has no benefit – you don’t actually need the serialized objects to be human-readable, and if you need to read the message contents, then you have the facilities to deserialize it and print it to a log file.
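To make the round trip concrete, here is a minimal sketch using the JDK’s built-in serialization purely to show the serialize-on-one-side, deserialize-on-the-other mechanics (the User DTO and its fields are made up for illustration; the choice of wire format – the subject of this post – is what you would swap out):

```java
import java.io.*;

// Hypothetical User DTO; the fields are illustrative, not from the article.
class User implements Serializable {
    private static final long serialVersionUID = 1L;
    final String email;
    final String name;
    User(String email, String name) { this.email = email; this.name = name; }
}

class SerializationDemo {
    // Serialize the DTO to bytes on one side of the wire...
    static byte[] serialize(User u) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(u);
            out.flush(); // drain the stream's internal buffer into bos
            return bos.toByteArray();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    // ...and reconstruct it on the other side.
    static User deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (User) in.readObject();
        } catch (IOException | ClassNotFoundException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        User u = deserialize(serialize(new User("john@example.com", "John Doe")));
        System.out.println(u.email + " / " + u.name); // prints "john@example.com / John Doe"
    }
}
```

Any serialization mechanism – JSON, XML, MessagePack – plugs into the same two-method shape; only the byte layout differs.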

But there are major drawbacks – speed and size. Both formats are text-based (so that they can be human-readable), which means they are unnecessarily verbose. Yes, JSON is less verbose than XML, but it’s still a text format that you don’t need. Instead, in most cases you are better off using binary serialization – almost any binary serialization. I have evaluated a couple, and the ease of use plus the speed and size benefits made me choose MessagePack. But you can also use protobuf, BSON, Avro or whatever fits your project.
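A quick way to see where the size difference comes from, using nothing but the JDK (a hand-rolled two-field encoding stands in here for MessagePack, which of course encodes arbitrary objects far more generally): in JSON, field names, quotes and numbers-as-digits travel with every single message, while a binary encoding sends raw bytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class SizeComparison {
    // Hand-rolled binary encoding of (id, email) -- a stand-in for what
    // MessagePack/protobuf do generically for any object.
    static byte[] encodeBinary(long id, String email) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            DataOutputStream out = new DataOutputStream(bos);
            out.writeLong(id);   // 8 raw bytes, no field name, no quoting
            out.writeUTF(email); // 2-byte length prefix + UTF-8 bytes
            return bos.toByteArray();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    static byte[] encodeJson(long id, String email) {
        // The field names and punctuation are repeated in every message.
        String json = "{\"id\":" + id + ",\"email\":\"" + email + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] binary = encodeBinary(123456789L, "john@example.com");
        byte[] json = encodeJson(123456789L, "john@example.com");
        // prints "binary: 26 bytes, json: 43 bytes"
        System.out.println("binary: " + binary.length + " bytes, json: " + json.length + " bytes");
    }
}
```

The gap only widens for nested objects and lists, where JSON repeats every key per element.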

Yes, I know, I also said “this is probably a micro-optimization”. And then I ran some benchmarks on our messages to see the time and size saved. I don’t remember the exact figures, but MessagePack was a lot faster and produced much smaller messages, and seeing the results made me go straight into coding a MessagePackConverter to replace the JSONConverter. For some figures you can check here. It is a pretty small change for the huge impact it has on the whole system. And given the high volume of messages that our system needs to serialize and deserialize, spending one day on integrating MessagePack is totally worth it – after all, it allows you to process or store (say) twice as many messages with the same hardware (compared to JSON).

But note one important thing – if we need to debug messages, we can easily switch back to JSON with one line of configuration. And we can always deserialize the binary messages and print them. So debugging is not hampered.
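The “one line of configuration” idea can be sketched like this (the Converter interface, class names and config key are all hypothetical – your framework’s converter abstraction will look different, but the shape is the same):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical converter abstraction: the wire format becomes a single
// configuration choice, so switching back to JSON for debugging is trivial.
interface Converter {
    byte[] write(String body);
    String read(byte[] bytes);
}

// Human-readable and verbose: handy when you need to eyeball messages.
class JsonConverter implements Converter {
    public byte[] write(String body) {
        return ("{\"body\":\"" + body + "\"}").getBytes(StandardCharsets.UTF_8);
    }
    public String read(byte[] bytes) {
        String json = new String(bytes, StandardCharsets.UTF_8);
        // Toy parser for the fixed shape above; real code would use a JSON library.
        return json.substring("{\"body\":\"".length(), json.length() - 2);
    }
}

// Compact binary: length-prefixed UTF-8, standing in for MessagePack.
class BinaryConverter implements Converter {
    public byte[] write(String body) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            new DataOutputStream(bos).writeUTF(body);
            return bos.toByteArray();
        } catch (IOException e) { throw new RuntimeException(e); }
    }
    public String read(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readUTF();
        } catch (IOException e) { throw new RuntimeException(e); }
    }
}

class Converters {
    // The one-line switch: e.g. transport.format=json in a properties file.
    static Converter fromConfig(String format) {
        return "json".equals(format) ? new JsonConverter() : new BinaryConverter();
    }

    public static void main(String[] args) {
        Converter c = fromConfig(System.getProperty("transport.format", "binary"));
        System.out.println(c.read(c.write("user registered"))); // prints "user registered"
    }
}
```

Because both converters satisfy the same interface, no call site changes when you flip the format.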

There are, of course, some things to consider, like versioning of the objects (if you add a field, does the deserialization of old messages break? In MessagePack it does if the field is primitive, so you need a custom template to handle that) or, if you are in a multi-language environment, whether the serialization library is supported by all the languages involved. Also, you usually have to let the serializer know the structure of your objects in advance, so there is some additional code/annotations to populate the serializer context. But all of these are included in the “one day” mentioned above that I spent integrating MessagePack.
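The versioning pitfall is easy to reproduce with plain streams (the v1/v2 layouts below are hypothetical; this is the same idea as writing a custom template in MessagePack, not MessagePack’s actual wire format):

```java
import java.io.*;

// Sketch of the versioning pitfall: a reader that unconditionally expects a
// primitive field added in v2 would run out of bytes on an old v1 message.
class VersioningDemo {
    // v1 message: just the email.
    static byte[] writeV1(String email) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            new DataOutputStream(bos).writeUTF(email);
            return bos.toByteArray();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    // v2 message: email plus a newly added primitive field.
    static byte[] writeV2(String email, int age) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(email);
            out.writeInt(age); // the field added in v2
            return bos.toByteArray();
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    // Tolerant v2 reader: treat the new field as optional and default it
    // when the bytes run out, instead of failing on old messages.
    static int readAge(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            in.readUTF(); // skip the email
            return in.available() >= 4 ? in.readInt() : -1; // -1 = "not present"
        } catch (IOException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println(readAge(writeV1("john@example.com")));     // -1: old message
        System.out.println(readAge(writeV2("john@example.com", 30))); // 30
    }
}
```

A boxed/optional field with a sensible default is usually the cheapest way to keep old messages readable.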

And it is probably a good idea to mention that if you are exposing an API to 3rd parties, you can’t rely on these serializers – your public API should be JSON/XML, because there it needs to be human-readable and supported in every language.

But unless you totally don’t care about your resources (probably because it’s a system with little usage), seriously consider a binary serialization mechanism for your internal messaging, APIs, caching, etc.


8 Responses to “Don’t Use JSON And XML As Internal Transfer Formats”

  1. Tried to ‘impose’ these kinds of alternatives, vs SOAP, on some projects I work on – for the use cases you presented SOAP works but is too heavyweight – but I’ve had no luck yet :)
    Will point to your article from now on. I was thinking of extending your article with some metrics to convince those who do not believe…
    Cheers

  2. To augment your post on internal communication… I would say don’t use REST internally.

    I have seen way too many organizations (through contract work) use REST or Web Services for their internal communications when they should be using something more asynchronous like a message bus (RabbitMQ, ZeroMQ, etc.) or asynchronous RPC (Finagle, Thrift, etc.).

    It’s not that you can’t get REST to be more asynchronous (non-blocking, i.e. NIO); it’s just that most Java REST servers and clients are not.

  3. @Cristian: actually, SOAP is backed by XML, so I don’t think it wins any performance comparison against JSON/XML.

  4. @Cristian oops! sorry, I misread your post! You are advocating to use the binary protocols instead of SOAP, my bad!

  5. Agree 100%. JSON and XML are great for public and cross-platform APIs, but between Java processes Java serialization is generally the better alternative.

  6. One thing to consider for large messages is the memory usage during serialization. I found that many serializers will duplicate the object during serialization (I’m looking at you, BSON for Jackson, CORBA, et al). Jackson’s JSON serialization can stream the JSON, meaning a tiny overhead. Did you notice MessagePack’s memory usage? I didn’t get to test it.

  7. I would suggest you turn this on its head: don’t convert XML/JSON to an object unless absolutely necessary. The cost of XML-to-object conversion, and the maintenance of brittle code that breaks on XML format changes, is a large overhead.

    For an XML-in/XML-out system, XPath can query, route and transform, and keeping the data in XML allows human-readable logging and support.

  8. @Greg: “between Java processes Java serialization is generally the better alternative”

    I totally, totally disagree. Java’s built-in serialization and deserialization are extremely inefficient in both CPU time and serialized size. That’s mostly because the format includes loads of repetitive type information (which makes it not just bloated but also awfully brittle) and the deserializer needs to do a lot of security-related checking.

    In fact, I tend to agree with Richard. I’ll only use a non-human-readable format if it really turns out to be a performance issue. Jackson is extremely easy to use and fast (and supports a “binary JSON” format if you need an additional 10-20% of performance with half a line of code change), so why bother with Java serialization?
