State Does Not Belong In The Code

What is “state” in your web application? It’s the data that gets stored, regardless of the destination (memory, database, file system). The application code itself must not store any state. This means your classes should only have fields holding objects that are themselves stateless. In other words, you should not store anything in your services, DAOs or controllers during the program flow. For your service layer this is an absolute must. Why?

Your application needs to be scalable. This means it must be able to run in a cluster, and state is the hardest thing to distribute. If you minimize the places where state is stored, you minimize the complexity of clustering. But state has to live somewhere, and these are the places where it is fine to keep it:

  • the database – be it SQL, NoSQL or even a search engine, it’s the main thing that stores state. It is either built to support clustering itself, or it runs on a huge dedicated machine that handles requests from multiple “code” servers. The code communicates with the database, but the code itself does not keep anything for longer than a single client request;
  • cache – caching is relatively easy to distribute (it’s basically key-value). There are many ready-to-use solutions like EhCache and memcached. So instead of computing a result, or getting it from the DB on each request, you can configure caching and store the result in memory. But again – code does not store anything – it just populates and queries the cache;
  • HTTP session – in web components (controllers, managed beans, whatever you call them). It is very similar to caching, though it has a different purpose: to allow identifying subsequent actions by the same user (HTTP itself is stateless). But as your code runs on multiple machines, the load-balancer may not always send subsequent requests to the same server, so the session should be replicated across all servers. Fortunately, most containers have that option built-in, so you just add one configuration line. Alternatively, you can instruct the load-balancer to use a “sticky session” (pick the target server based on the session cookie), but that moves some state management to the load-balancer as well. Regardless of the option you choose, do not put too much data in the session;
  • the file system – when you store files, you need them to be accessible to all machines. There are multiple options here, including a SAN, or a cloud storage service like Amazon S3 that is accessible through an API.
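The cache bullet above (“code does not store anything – it just populates and queries the cache”) is usually called the cache-aside pattern. Below is a minimal sketch of it, assuming a hypothetical `Cache` interface and a hypothetical `loadPriceFromDb` lookup. The `InMemoryCache` class is only a local stand-in so the example runs; it is not the EhCache or memcached API, and a real deployment would plug a distributed cache in behind the same interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class PriceService {

    // Hypothetical cache abstraction; in production this would wrap a distributed cache.
    public interface Cache<K, V> {
        Optional<V> get(K key);
        void put(K key, V value);
    }

    // Local stand-in used only to make this sketch runnable.
    public static class InMemoryCache<K, V> implements Cache<K, V> {
        private final Map<K, V> entries = new ConcurrentHashMap<>();

        public Optional<V> get(K key) { return Optional.ofNullable(entries.get(key)); }

        public void put(K key, V value) { entries.put(key, value); }
    }

    // The only fields are dependencies; no per-request data is kept in the service.
    private final Cache<Long, Double> cache;
    private final Function<Long, Double> loadPriceFromDb; // hypothetical DB lookup

    public PriceService(Cache<Long, Double> cache, Function<Long, Double> loadPriceFromDb) {
        this.cache = cache;
        this.loadPriceFromDb = loadPriceFromDb;
    }

    // Cache-aside: query the cache first; on a miss, load from the DB and populate the cache.
    public double getPrice(long productId) {
        return cache.get(productId).orElseGet(() -> {
            double price = loadPriceFromDb.apply(productId);
            cache.put(productId, price);
            return price;
        });
    }
}
```

Because `PriceService` keeps nothing of its own, any number of instances on any number of servers can share one cache and behave identically.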

All these are managed outside the code. Your code simply consumes them through an API (the Session API, the cache API, JDBC, S3/file system API). If the code contained any of that state (as instance variables of your objects), the application would be hard to support (you’d have to manage state yourself) and less scalable. Of course, there are some rare cases where you can’t avoid storing state in the code. Document these and make sure they do not rely on working in a cluster.

But what can go wrong if you store state in the objects that perform the business logic? You have two options then:

  • synchronize access to the fields – this will kill performance, because all users making requests will have to wait in a queue for the service to manage its fields;
  • create a new instance of your class for each HTTP request, and manage the instances somehow. Managing these instances is the hard part. People may be inclined to use the session for it, which means the session grows very large and gets harder to replicate (sharing a lot of data across multiple machines is slower, and session replication must be fast). Not to mention the unnecessarily increased memory footprint.

Here’s a trivial example of what not to do. You should pass these kinds of values as method arguments, rather than storing them in the instance:

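class OrderService {
    // Anti-pattern: this field is shared by every request handled by this
    // (typically singleton) instance, and it is never reset between calls.
    double orderPrice;

    void processOrder(OrderDto order) {
        for (Entry entry : order.getEntries()) {
            orderPrice += entry.getPrice();
        }
        // hasDiscounts() silently depends on the field modified above
        boolean discounts = hasDiscounts(order);
    }

    boolean hasDiscounts(OrderDto order) {
        return order.getEntries().length > 5 && orderPrice > 200;
    }
}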
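For contrast, here is a minimal stateless version of the same service, where the total is a local variable and is passed to `hasDiscounts` as an argument. The `Entry` and `OrderDto` records are hypothetical stand-ins (they hold a `List`, so `size()` replaces `length`), records need Java 16 or later, and `processOrder` returns the discount flag only so the result is observable.

```java
import java.util.List;

public class OrderService {

    // Hypothetical minimal DTOs standing in for the types used in the example above.
    public record Entry(double price) {
        public double getPrice() { return price; }
    }

    public record OrderDto(List<Entry> entries) {
        public List<Entry> getEntries() { return entries; }
    }

    // No fields: the running total is a local variable, so concurrent requests
    // cannot see each other's data and nothing carries over into the next call.
    public boolean processOrder(OrderDto order) {
        double orderPrice = 0;
        for (Entry entry : order.getEntries()) {
            orderPrice += entry.getPrice();
        }
        return hasDiscounts(order, orderPrice);
    }

    // The computed value arrives as a method argument instead of being read from a field.
    boolean hasDiscounts(OrderDto order, double orderPrice) {
        return order.getEntries().size() > 5 && orderPrice > 200;
    }
}
```

A single instance of this class can safely serve every request on a server without any synchronization, which is exactly what a DI container's singleton scope assumes.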

So, make all your code stateless – this will ensure at least some level of scalability.

21 thoughts on “State Does Not Belong In The Code”

  1. Good post. Thanks.

    It is interesting that stateless doesn’t necessarily mean sessionless in the web components. What about the case of a RIA where the client app manages most state and accesses a RESTful API? In that case perhaps the web/app server can also be stateless?

  2. “So, make all your code stateless – this will ensure at least some level of scalability.”

    This is called “Golden Hammer Syndrome.”

  3. What is the relation between the golden hammer and my statement? First, I say “at least some level”, and then I don’t say you should use any particular tool or language – just a broad general principle.

  4. If you use a session mechanism that keeps all state in cookies, with no server-side session state, it allows you to use session state without the drawbacks you mention.

    Rails by default keeps all session state in cookies (using some secret tokens and crypto-like functions to keep this secure from spoofing session state).

    Of course, this requires you only to keep things in session that are easily serializable to cookies, and the total size of all serialized session state small.
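The cookie-based approach described in the comment above can be sketched in Java as an HMAC-SHA256 signature appended to the cookie value, so the server can detect tampering without storing anything. This is a generic illustration, not Rails's actual cookie format; the `SignedCookie` class and the secret are hypothetical. Note that signing protects integrity only: the payload is Base64-encoded, not encrypted, so anyone holding the cookie can read it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignedCookie {
    private final byte[] secret; // hypothetical server-side secret, never sent to the client

    public SignedCookie(byte[] secret) {
        this.secret = secret.clone();
    }

    // Produces "base64url(payload).base64url(hmac)"; the URL-safe alphabet never contains '.'.
    public String sign(String payload) {
        String data = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return data + "." + mac(data);
    }

    // Returns the payload only if the signature matches; tampered or malformed cookies yield empty.
    public Optional<String> verify(String cookie) {
        int dot = cookie.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String data = cookie.substring(0, dot);
        String signature = cookie.substring(dot + 1);
        // Constant-time comparison so the check does not leak timing information.
        boolean valid = MessageDigest.isEqual(
                mac(data).getBytes(StandardCharsets.US_ASCII),
                signature.getBytes(StandardCharsets.US_ASCII));
        if (!valid) {
            return Optional.empty();
        }
        return Optional.of(new String(Base64.getUrlDecoder().decode(data), StandardCharsets.UTF_8));
    }

    private String mac(String data) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] digest = hmac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every request now carries the cookie and pays for one HMAC computation, which is the size and CPU trade-off mentioned in the reply below.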

  5. @vlion – true, that’s why my first sentence makes it clear I’m talking about the web. But not the cloud – this rule applies in all cases, even if you won’t need to scale out in the near future.

    @Jonathan – true. Storing the state in cookies is a bit tricky though, because it makes requests way bigger, and also puts some extra CPU load on every request, in order to run the cryptography. But it’s a viable option.

    @Orkan – no, but the variables should only hold dependencies (probably injected by a DI framework), not state.

  6. I think it should be made clear that by not storing state in your code, in the case of web applications, you mean state that lives across requests. Using HttpSession doesn’t really count, its name gives it away.. it’s session.. so naturally you can store state in it across requests. However, it’s a bad design to do so these days.

    But in your post, as you say, in a single request, much like rest apis, the code should pull any data that lives longer than a single request from some sort of store, be it file, db, whatever.

    I learned a neat trick with regards to using HttpSession and scalability. Often, you want to store some sort of login token that lets the system know a user is logged in.. usually the user id. You *can* store more info, like the user name, email, profile data, etc. if it’s used often in many pages of the site. However, what you should NOT do is replicate that session object across servers.. the only data you replicate is EXACTLY what you need to recreate the data that is stored in the session for convenience. For example, by storing the user id, I can reload the user on any server to get all the user profile data.

    To handle this, you create a serializable class.. something like:

    class SessionState implements Serializable {
        // allow this object to replicate
        private Long userId;

        // This is the full user object stored in HttpSession
        // for convenience so we don’t have to keep doing a DB
        // hit to load the data on every request, but does NOT
        // replicate because it’s stored in the database and we
        // can recreate it using the Long userId.
        @transient
        private User user;
    }

    The above gives you a bit of a performance boost in case the user object is used on many of the requests to the server.. such as to get/display the user name on several pages, other user info, etc. By not having to use the Long userId on each request to load the user data from the database (or even cache), you have the most often used objects right at your disposal, but at the same time, you do NOT take a hit when session replication occurs, because you’re only replicating the very minimal info you need, in this case the Long userId object.

    The trick is that every access to user goes through a method like this:

    User getUser(){
        if (null != userId && null == user){
            // load user from database
            user = loadUser(userId);
        }
        return user;
    }

    Basically, if the session userId is not null, then a user is logged in.. but if the user object IS null, then either it’s the very first access to the user object from the session (for the server it resides on), or a possible failure happened, and the user (unbeknownst to them) was routed to another server where they are still logged in (userId is not null), but being a new server, the user object hasn’t been loaded yet, so we need to now load the data from the database.. thus recreating the info we need without replicating all the data across servers.

  7. @Kevin, if I understand your example correctly, both the UserId and the User are stored in the HttpSession. However, only the UserId is replicated across servers. I understand your reasoning about WHY, but I’m curious as to HOW you can configure a servlet container (e.g. Tomcat) to selectively replicate only certain objects across servers. Thanks.

  8. @Jenson: It’s because the user field is transient. Kevin used an annotation in the example, but I guess he meant to use it as a keyword.

    Transient fields are not serialized; that’s how you can prevent them from being replicated.

  9. “Caching is relatively easy to distribute” – I beg to differ; apparently you haven’t done many jobs that had to do with caching.

    Caching can be really hard and usually can be an evil must. We all hate to do it, but we all have to do it, and it’s never easy, especially on a distributed system.

  10. Distributing itself is easy, because this is supported out-of-the-box by memcached and EhCache – that’s what they do. Deciding what to cache and for how long is the hard part.

  11. @Jenson,

    Yes.. sorry.. I used an annotation, which was incorrect; it’s just the transient keyword, as Pabrantes pointed out. It prevents fields of a Serializable object from being serialized. Because HttpSession replication relies on the stored objects being Serializable, you simply mark anything you don’t want replicated as transient to control what gets replicated.

    You *can* store non-serializable objects in HttpSession as well (it’s backed by a Map), and those won’t be serialized either. However, in the case of a server shutdown, where the server may serialize the HttpSession to disk (rather than replicate it), anything non-serializable is gone upon restore. An example of when this may happen: if you auto-redeploy during development, the container may serialize the HttpSession out, restart the app, then load it back in. I am not sure if any container still does that, but I recall years ago something like that being done to keep user state working across server restarts.
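Kevin’s `SessionState` idea with the `transient` keyword, as corrected above, can be made runnable as follows. The `User` record and the `loadUser` function are hypothetical stand-ins for the real entity and database lookup, and `roundTrip` uses plain Java serialization to approximate what a container does when it replicates a session attribute; actual replication mechanics vary by container.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.LongFunction;

public class SessionState implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical user type; deliberately NOT Serializable, which is safe
    // only because the field holding it is transient.
    public record User(long id, String name) {}

    // Replicated: the minimal data needed to rebuild everything else.
    private final Long userId;

    // Skipped by serialization, so it is never replicated; on another node
    // it starts out null and is reloaded on first access.
    private transient User user;

    public SessionState(Long userId) {
        this.userId = userId;
    }

    // Lazily (re)loads the user; loadUser stands in for a DB or cache lookup.
    public User getUser(LongFunction<User> loadUser) {
        if (userId != null && user == null) {
            user = loadUser.apply(userId);
        }
        return user;
    }

    // Serializes and deserializes the object, approximating session replication.
    public static SessionState roundTrip(SessionState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SessionState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the `Long` travels between servers; the replica reloads the user once and then serves it from memory, which is the trade-off Kevin describes.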

  12. eh bozho,
    you are confusing me: state of your web application, scalability, service layer – you are pretty good at mixing concepts as you like.
    Of course, a pure service layer should be stateless, but I would claim every non-trivial web application has some state to manage. And storing state variables in an HTTP session or on the application server does not mean the web application does not scale. It just means you have to calculate or test how well it scales, how many sessions, etc. And finally, of course I can use database caching instead of DAOs and state variables, but why should I do this when my application server can do session handling, failover, etc. for me? Why should I change one infrastructure component for another?
    regards

  13. Bozho, isn’t being stateless in general against DDD? It’s impossible to use this technique without keeping any state in your code – e.g. if object was persisted or not..
