Avoid Lazy JPA Collections

Hibernate (and JPA in general) offers collection mappings: @OneToMany, @ManyToMany, @ElementCollection. All of these are lazy by default. This means the collections are special implementations of the List or Set interface that hold a reference to the persistence session, and the values are loaded from the database only when the collection is accessed. That saves unnecessary database queries if you only occasionally use the collection.

However, there’s a problem with that. It manifests itself through what, in my observation, is the second most commonly asked-about exception (after NullPointerException): the LazyInitializationException. The session is usually open for your service layer and is closed as soon as you return the entity to the view layer. When you then try to iterate the uninitialized collection in your view (a JSP, for example), the collection throws LazyInitializationException, because the session it holds a reference to is already closed and it can’t fetch the items.
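
To make the failure mode concrete, here is a minimal plain-Java sketch of the mechanism. It does not use Hibernate: FakeSession and LazyList are hypothetical stand-ins for the session and for a lazy persistent collection, and an ordinary IllegalStateException stands in for LazyInitializationException.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a persistence session; it only tracks open/closed state.
class FakeSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Hypothetical stand-in for a lazy persistent collection: it keeps a reference
// to the session and loads its elements on first access only.
class LazyList<T> extends AbstractList<T> {
    private final FakeSession session;
    private final Supplier<List<T>> loader; // plays the role of the SELECT
    private List<T> loaded;                 // null until initialized

    LazyList(FakeSession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    private List<T> data() {
        if (loaded == null) {
            if (!session.isOpen()) {
                // Hibernate would throw LazyInitializationException at this point.
                throw new IllegalStateException("could not initialize collection: session is closed");
            }
            loaded = loader.get();
        }
        return loaded;
    }

    @Override public T get(int index) { return data().get(index); }
    @Override public int size() { return data().size(); }
}

class LazyListDemo {
    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        List<String> categories = new LazyList<>(session, () -> List.of("java", "jpa"));

        session.close(); // the service layer returns, the "view" starts rendering
        try {
            categories.size(); // first access happens after the session is closed
        } catch (IllegalStateException e) {
            System.out.println("view failed: " + e.getMessage());
        }
    }
}
```

Note that a collection initialized while the session was still open stays readable afterwards; only the first access after closing fails.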

How is this solved? With the so-called OpenSessionInView / OpenEntityManagerInView “patterns”. In short: you add a filter that opens the session when the request starts and closes it after the view has been rendered (rather than when the service layer finishes). Some people call that an anti-pattern, because it leaks persistence handling into the view layer and complicates the setup. I wouldn’t say it’s that bad: generally it solves the problem without introducing other problems. But in all the recent projects I’ve been involved in, we haven’t used OpenSessionInView, and it works fine.

It works fine because we aren’t using lazy collections. But then, you’ll rightly point out, you will be fetching “the whole world” when you load a single entity. Well, no. There are two types of *ToMany mappings:

  • value-type mappings, where the collection logically does not hold more than a dozen or two elements. In most cases this is @ElementCollection, and also @*ToMany with items like “Category” or “Price” that are just more complex value objects and do not hold any other mappings themselves. Another common feature of these collections is that they are usually displayed in the UI together with their owning entity: you most likely want to display the categories of an article, for example. For this type of collection EAGER is the better option. You’ll have to fetch them anyway, so why not let Hibernate (or any JPA implementation) think of some clever join? As I said, the collections are logically not bigger than a dozen or two elements, so fetching them won’t be a performance hit. And, logically, they won’t fetch a big object graph with them.
  • mappings across the big, core entities. This can be “all orders made by the user”, “all users for the organization”, “all items of the supplier”, etc. You certainly don’t want to fetch them eagerly: if you fetch 2000 users for an organization, who in turn have 1000 orders each, and an order has 3 items on average, each of which has a collection of all the people who have purchased it, you’ll end up with your entire database in memory. Obviously you need lazy collections, right? Well, no. In that case you should not be using collection mappings at all. These types of relations are, in 99% of the cases, displayed in paged lists in the UI, or in search results. They are never (and should never be) displayed all on one screen, and should rarely be returned in one API call if your application provides something like a REST API. You have to write queries for them, and use query.setMaxResults() and query.setFirstResult() (or limit them with some restrictive criteria). Furthermore, having the collections mapped means someone will try to use them at some point, which may fail. And if the object is serialized (XML, JSON, etc.), the collection contents will be fetched, which is something you almost certainly don’t want to happen. (A draft idea here: JPA could have a PagedList collection that would allow paged lazy fetching, thus eliminating the need for a query.)
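
The paged-query alternative from the second point can be sketched as follows. This is a plain-Java illustration, not JPA code: Page and slice are hypothetical helpers that show how a page index and page size translate into the two values a JPA query takes, the offset for setFirstResult and the limit for setMaxResults. The in-memory list stands in for the table of orders.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: converts a zero-based page index and a page size
// into the offset/limit pair used for paging.
final class Page {
    private final int index;
    private final int size;

    Page(int index, int size) {
        if (index < 0 || size <= 0) {
            throw new IllegalArgumentException("index must be >= 0 and size must be > 0");
        }
        this.index = index;
        this.size = size;
    }

    int firstResult() { return index * size; } // value for query.setFirstResult(..)
    int maxResults() { return size; }          // value for query.setMaxResults(..)
}

class PagingDemo {
    // In-memory equivalent of running a query with the offset and limit applied.
    static <T> List<T> slice(List<T> rows, Page page) {
        return rows.stream()
                .skip(page.firstResult())
                .limit(page.maxResults())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 25 order ids standing in for "all orders made by the user".
        List<Integer> orderIds = IntStream.rangeClosed(1, 25).boxed().collect(Collectors.toList());

        Page third = new Page(2, 10);
        System.out.println(slice(orderIds, third)); // the last, partial page

        // With a real JPA query the same values would be applied like this:
        // query.setFirstResult(third.firstResult());
        // query.setMaxResults(third.maxResults());
    }
}
```

Only one page of rows is ever materialized, which is the property that a mapped collection, lazy or eager, cannot give you.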

So what did I just say? That you should never use lazy collections. Use eager collections for very simple, shallow mappings, and use paged queries for the bigger ones.

Well, not exactly. Lazy collections are there and they have their uses, though these are rather limited. Or at least they are far less applicable than their actual usage suggests. Here’s an example scenario where I found one applicable. In my side project I have a Message entity, and it holds a collection of Picture entities. When a user uploads a picture, it is stored in that collection. A message can have no more than 10 pictures, so the collection could very well be eager. But Message is the most commonly used entity: it’s fetched on virtually every request, and only some messages have pictures (how many of the tweets in your stream have a picture upload?). So I don’t want Hibernate to make queries just to find out there are no pictures for a given message. Hence I store the number of pictures in a separate field, make the pictures collection lazy, and Hibernate.initialize(..) it manually only if the number of pictures is > 0.
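
The counter-guarded initialization described above can be sketched like this. Again this is plain Java rather than Hibernate: the Message class shown here, its pictureCount field, and the Supplier that stands in for the Hibernate.initialize(..) call are illustrative assumptions, and the AtomicInteger only counts how many simulated queries were issued.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical Message: pictureCount is an ordinary column loaded with the row,
// while the pictures themselves are fetched only on demand.
class Message {
    private final int pictureCount;
    private final Supplier<List<String>> pictureLoader; // stands in for the lazy collection
    private List<String> pictures = List.of();

    Message(int pictureCount, Supplier<List<String>> pictureLoader) {
        this.pictureCount = pictureCount;
        this.pictureLoader = pictureLoader;
    }

    // Plays the role of "call Hibernate.initialize(..) only if the count is > 0".
    void initializePicturesIfAny() {
        if (pictureCount > 0) {
            pictures = pictureLoader.get();
        }
    }

    List<String> getPictures() { return pictures; }
}

class PictureGuardDemo {
    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        Supplier<List<String>> load = () -> {
            queries.incrementAndGet(); // one simulated SELECT per load
            return List.of("a.png", "b.png");
        };

        Message plain = new Message(0, load);
        Message withPictures = new Message(2, load);
        plain.initializePicturesIfAny();        // skipped, no query
        withPictures.initializePicturesIfAny(); // runs the single query

        System.out.println("queries issued: " + queries.get());
    }
}
```

The trade-off is that the counter has to be kept in sync with the collection whenever pictures are added or removed.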

So there are scenarios where the entity has optional collections that fall into the first category above (“small, shallow collections”). If a collection is small, shallow and optional (say, used in less than 20% of the cases), then you should go with lazy to save unnecessary queries.

For everything else – having lazy collections will make your life harder.

24 thoughts on “Avoid Lazy JPA Collections”

  1. I think you’re missing the big picture. The filter is used because people use business objects in views. So you have two options:

    1- use the filter knowing that you are using business objects in your views (and that IS the anti-pattern, not the use of the filter)

    2- leave your business objects in their layer, so you don’t need any filter and you must use DTOs. What you need for the UI is bound into UI beans (the DTOs). The drawback here is that you are binding one bean to another that looks exactly the same (in 95% of the cases)

    Now, lazy collections are totally useful. Here’s a very simple example: I have a user list page. This page displays the most important info about users in a table. Let’s say a user has a collection of Books. On this page I don’t want to display the books, so I want to load the user info, but not the books. Now, let’s say I click on a user and go to a page with all the user’s info and his list of books. This time I fetch the same User object, but I also load the books because I’m actually displaying them! In the code, this results in loading a User object (or a list of them), but the User manipulation is the same.

    To summarise: lazy collections, besides performance, give us consistency and transparency, and ALL collections should be lazy. Only the always-needed ones should be eager.

  2. I also said they are useful. But way more rarely than they are actually used. In your example, however, I wouldn’t have the collection of books as a collection. A User can have 200 books, and I wouldn’t like to show them all on one screen. That’s what the 2nd point above is about. In the cases when you would use a lazy collection, you shouldn’t actually use a collection at all.

    About the anti-pattern: in the usual scenario the objects passed down to the view don’t have any business logic – they are POJOs (that being considered sometimes an anti-pattern as well, but that’s another discussion). The anti-pattern is that you leak persistence handling (the session) into the view.

  3. If you are using Hibernate you can use BatchSize to manage how many collections are loaded from the DB at the same time. This can alleviate the situation that you have pointed out of loading a whole bunch of lazy collections.
    I think the “fetch” strategy depends (among other things like the cardinality of the association) on whether the entity is a root entity or not. Root entities are the ones looked up directly from the DB to navigate its graph.

  4. Yup, batch-size is a good spot. That way you will need to have the “batch” value both in the mapping and in the view, but apart from that it’s nice. My idea of PagedList for JPA was to standardize something like that.

  5. @Bozho

    “Furthermore having the collections mapped means someone will try to use them at some point …”

    Right on. In most cases @*ToMany properties are unnecessary and even dangerous in the hands of careless developers. Collections should better be accessed from the service layer with proper queries and careful joins wherever appropriate.

    @waddle

    DTOs are THE anti-pattern. They bring a small percent of convenience at the price of a boat load of bloat, duplication and maintenance costs.

  6. Not sure what you guys think of this, this is my current take on this problem, but no definite views:

    The problem can also be approached by having the entity implement several interfaces, for example User implements SummaryUser and DetailedUser. SummaryUser contains the name of the user and some basic properties and eager collections.

    SummaryUser is returned by a query used to display a list of users on the screen.

    When we click on a user we issue another query that returns a single DetailedUser with a getBooks() method, which in this case we have eagerly fetched at query level (using setFetchMode for Criteria or a fetch join for JPQL).

    But the user to books relation is defined as lazy by default (as the strength of the relation is weak), and made eager depending on the query.

    The idea is to define the relation as eager/lazy depending on the strength of the relation. For order/order lines the relation is composition and the relation is strong, so eager fetch always.

    For books/user the relation is aggregation (weak) so the relation is defined as lazy. A user makes sense on its own without its books, but an order does not make sense without its order details.

    This is all very pretty, but I have seen an implementation where users started casting away the interfaces SummaryUser and DetailedUser that are aimed at preventing the lazy initialization exception, and we are back at it again 🙂

    Just doing some daydreaming here: if in some new language we could make these interfaces traits or something and have them serialized to the view layer, we would get rid of the DTOs! Would this be possible in Scala, for example? No idea, just wondering…

    Love your blog, keep them coming.

  7. @Bozho

    In your second point you explain the case where “you should not be using collection mappings at all”. I understand your point when it comes to using queries to select info about your entities.

    However, I’m confused about how you persist your entities when you are not using annotations to define the relationships in your POJO’s.

    The example in your second point is “orders for user”. When you persist your “order” how is the relationship to the “user” maintained when you haven’t defined it in the POJO? What about many-to-many relationships?

    Many thanks for any info you can provide. And thanks for the great article on a common problem many struggle with.

  8. You map the @ManyToOne side. So you save an order and call order.setUser(user). But you don’t have User#orders.

    As for many-to-many – it really depends on the case. You usually can (and need to) go with an intermediate entity anyway (if the mapping has more than just the foreign key columns).

    And still, note that in some cases it is OK to use collection mappings. It’s just way rarer than the way people actually use them.

  9. How is it relevant to lazy collections? I like it as a general principle and try to follow it, but it is not always possible.

  10. @Bozho
    “How is it relevant to lazy collections?” – Bozho

    Well, in CQRS you never ever let your PDOs (Persistent Domain Objects) be passed to the Presentation Layer (Spring Controller). You must always use a direct query (yes, plain old SQL) to show the results to the user.

    “The problem is that the session is usually open for your service layer and is closed as soon as you return the entity to the view layer. ” – Bozho

    You’re of course using the PDO in the Presentation Layer. So, if you separate your queries from your domain model, you will never be in such a situation. You cannot have a truly rich domain model if you’re using it to show data to the user.

    There’s a lot of information (I can give you the source if you are interested) on how bad an ORM (Hibernate) is at retrieving data: lots of joins, and with or without lazy loading you will end up with a very large SQL statement.

    That’s the reason why CQRS is absolutely related to “lazy collections”.

    Cheers. Enrique Illán

  11. I don’t think CQRS is about using sql queries to return results. It’s just that a method should do only one thing – either return a result, or execute a command. In our case it can return a DTO based on the entity, but it will still be in accordance to CQRS.

  12. @Bozho
    “I don’t think CQRS is about using sql queries to return results. It’s just that a method should do only one thing – either return a result, or execute a command.”

    Yeah, you are right, but in this particular case we are only talking about the query side.

    “In our case it can return a DTO based on the entity, but it will still be in accordance to CQRS.”

    That’s fine, and technically possible.

    “Bozho’s tech blog » ORM Haters Don’t Get It”
    OK, I get it, but I don’t think we should avoid using an ORM; I myself do what you have already mentioned: mapping a PDO to a DTO. Whether we use an ORM or native SQL is just a matter of taste.

    I mentioned CQRS because I’m an advocate of using DTOs, and CQRS uses this idea as such. But let’s put CQRS behind us for a moment.

    I really think that using DTOs is the best way of avoiding the problems you have mentioned. I know from a previous post what you think about DTOs. You are in the “middle way”, using DTOs when “DTOs are only created when their structure significantly differs from that of the entity”. I’m in the 1st option you already mentioned: “every entity has at least one corresponding DTO.”.

    Nevertheless, there is a misconception in this. You mention that every entity has a DTO, and I think we should have DTOs depending on the use case requirements and not only on a per-entity basis. According to Adam Bien, the use of DTOs can help you with:

    1. Realization of additional views to a domain model.
    2. Abstracting from legacy data stores.
    3. Enforcement of eager loading.
    4. Even transport of client-specific metadata.

    So, why not start using DTOs for this use case and get rid of the whole issue?

  13. Hi, only two kinds of programmers use the OSIV filter:

    1) programmers having poor SQL skills
    2) lazy programmers

    In each and every case you should “dose” your queries to return only the portion of data you need. The point is that you should always be aware of what you need and how to get it.

    Collections should be lazy in most cases.

    Regards.

  14. It’s disappointing to see such misleading advice. Lazily loaded collections are a great strategy and you shouldn’t worry about exceptions if only you’d follow the primary rule of Hibernate: don’t mess with your entities without a Session managing them.
    There are many good reasons for that, not least to provide a consistent state with a transaction.

    If you get a LIE exception, you should consider it a warning flag that you need to re-evaluate your architecture: embrace the fail-fast rather than hiding the problem.

  15. Could you elaborate on which scenario you think I’m wrong about? The “load everything” case, where you would normally use paging? Or the preference for eager fetch whenever it’s certain that the collection will be loaded anyway?

  16. Good post. I’m facing the same issue: the 3-level lists could lead to a very big data set (100 employees * 100 projects * 365 days of timesheet entries). This is where I found *ToMany to be a useless feature of JPA. Why did no one think to put some kind of criteria feature in @*ToMany or JPQL?

  17. This is a good, honest and brave post.
    I think that Hibernate-managed collections are overrated and easily misused. Even when there is a case for them, you must be careful about what Hibernate does or you might end up with many useless trips to the database.
    In order to use collections properly, you need to gain a level of competency that not many people realize is needed, or are willing to gain, when adopting an ORM that is supposed to make things easier. This might also be a reason why so many young programmers (in my experience) hate Java.

    I like to have my schema generated by JPA annotations and have very simple queries handled by the ORM, but when it comes to working with potentially big lists, I prefer my own simple and efficient native queries.

  18. Your article confirmed what I think about lazy collections in JPA.

    So it leads me to further considerations about JPA/ORM itself:

    1. big sized @ElementCollection

    Consider an @Entity named Component, which holds an @ElementCollection List<Part>.

    Conceptually a Component is composed of many Parts, and a single Part makes no sense existing on its own.
    That’s the reason for @ElementCollection.

    But this list can hold thousands of elements.

    Well, JPA cannot return this list, nor a partition of it, with a query. Even if ‘SELECT p FROM Component c, c.partList p where c = :component’ works fine in EclipseLink, returning collection-valued selections is forbidden by the JPA/JPQL spec.

    So pagination is not allowed and there is no escape; two choices remain.
    Either use Component.getPartList for CRUD operations, with performance-killing consequences, or break the conceptual model by promoting Part to an @Entity and mapping the @OneToMany or @ManyToOne to enable queries.

    As a side note, converting the mapping to unidirectional @ManyToOne on Part makes the controller, for MVC, insanely complicated to deal with C(R)UD: no straight em.merge(component) for bulk relation changes anymore.

    2. big sized bidirectional @ManyToMany

    Consider an @Entity Item with a @ManyToMany List<Document>, where @ManyToMany is mapped on both sides.
    These lists can hold thousands of elements too.

    While Item.getDocumentList and Document.getItemList can be partition-queried for pagination, the problem arises on write operations.

    Again, you can break the conceptual model by introducing an intermediate @Entity ItemDocumentRelation with @ManyToOne mappings to both Item and Document.

    Again, the controller will be insanely complicated, as the intermediate entity has to be transparent to the View layer.

    I’d like to hear your opinion about these topics, and I take the opportunity to thank you: your articles and your answers on Stack Overflow taught me a lot.

  19. One of the best articles regarding this topic is this one from 2006 by Christian Bauer.

    You don’t always need collections. In reality, most collections can be turned into queries, which benefit from pagination, and are much more flexible anyway.

    Collections are useful when the number of entries is rather small, in which case they should always be LAZY. EAGER fetching is a code smell, and when used for collections, it can easily lead to Cartesian Products. That being said, the best collections are the bidirectional @OneToMany associations.

    @ManyToMany collections perform very badly, so it is better to emulate the relationship with two bidirectional one-to-many associations.

    In both cases, this should not complicate the controller layer.
