Saturday, July 19, 2008

Batch Insertion in Hibernate

Batch insertion is a powerful feature of Hibernate, particularly useful when you are importing data from other systems in bulk. If you do not use it, your application's performance may drop dramatically when inserting many records.
There are two approaches to batch insertion in Hibernate. Each one is explained below:

  • The first approach uses the Session class. Let's discuss it with an example. Suppose there is a library application which uses Hibernate as its ORM layer, and you want to insert 1 million books into the library. For simplicity, the domain model contains two classes: Book and Publisher. Book holds a reference to Publisher, and I have enabled cascade insert and cascade update, so if I insert a book whose publisher is not yet in the system, the publisher is saved along with the book. The code to insert the books is given below:

    Session session = HibernateUtil.getSessionFactory().openSession();
    session.beginTransaction();
    for (int index = 0; index < 1000000; index++) {
        Book book = new Book();
        book.setAuthor("amer");
        book.setIsbn("34343");
        book.setName("Hibernate " + index);
        Publisher pub = new Publisher();
        pub.setName("Publisher " + index);
        book.setPublisher(pub);
        book.setPublishDate(new Date());
        session.save(book); // the publisher is saved too, via cascade
    }
    session.getTransaction().commit();
    session.close();


The above code puts a heavy burden on memory, because whenever an object is saved, Hibernate keeps that object in a cache called the "session cache" or "first-level cache", and with this many objects you will probably run into an OutOfMemoryError. To avoid this you have to clear the session, but the question is when: after each insertion, or after some interval? If you clear the session after each insertion, your application's performance will drop dramatically, because before calling clear on the session you have to call flush, which synchronizes the in-memory state of the objects with the database. This is where batch insertion comes in. For batch insertion, first of all you have to add the "hibernate.jdbc.batch_size" property to your hibernate.cfg.xml file with a value of 50.

With this property set, Hibernate uses JDBC batching to insert 50 records at a time when you flush the session. A batch size of 50 is my recommendation because I have always got the best application performance with this number. You also have to change the code to flush and clear the session at the same interval, as below:

    session.save(book);
    if (index % 50 == 0) { // same value as hibernate.jdbc.batch_size
        session.flush();
        session.clear();
    }
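
To see why the periodic flush and clear keeps memory flat, here is a minimal sketch that imitates the loop above without Hibernate. The names BatchFlushSketch, Stats and simulate are made up for illustration and are not part of any Hibernate API; a plain list stands in for the session cache, so the sketch only models how many objects are held at once, not real database traffic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models the size of the session cache,
// it does not talk to Hibernate or to a database.
class BatchFlushSketch {

    /** Result of one simulated import run. */
    static final class Stats {
        final int peakCached;      // largest number of objects held at once
        final int explicitFlushes; // flush+clear calls made inside the loop

        Stats(int peakCached, int explicitFlushes) {
            this.peakCached = peakCached;
            this.explicitFlushes = explicitFlushes;
        }
    }

    /**
     * Simulates saving `total` objects. If flushEvery is greater than 0,
     * the cache is flushed and cleared whenever index % flushEvery == 0,
     * exactly like the loop in the post. If it is 0, the cache is never cleared.
     */
    static Stats simulate(int total, int flushEvery) {
        List<Integer> cache = new ArrayList<>(); // stand-in for the session cache
        int peak = 0;
        int flushes = 0;
        for (int index = 0; index < total; index++) {
            cache.add(index);                    // stand-in for session.save(book)
            peak = Math.max(peak, cache.size());
            if (flushEvery > 0 && index % flushEvery == 0) {
                flushes++;                       // stand-in for session.flush()
                cache.clear();                   // stand-in for session.clear()
            }
        }
        // Whatever is still cached here would be flushed by the final commit.
        return new Stats(peak, flushes);
    }

    public static void main(String[] args) {
        Stats never = simulate(1_000_000, 0);
        Stats every50 = simulate(1_000_000, 50);
        System.out.println("never cleared:    peak cached = " + never.peakCached);
        System.out.println("cleared every 50: peak cached = " + every50.peakCached
                + ", flushes = " + every50.explicitFlushes);
    }
}
```

In this model the cache never holds more than 50 objects when it is cleared every 50 saves, whereas without clearing it grows to the full 1 million. Note that because index starts at 0, the very first flush happens after a single save; the last partial batch is left for the commit to write.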

  • The second way of doing batch insertion is through the "StatelessSession" class. StatelessSession differs from Session in that it does not cache objects, does not call interceptors, does not keep any persistence context for the objects, does not cascade to composed objects, does not take care of collections, and passes each object directly to a JDBC insert statement. In other words, it is much closer to plain JDBC. With StatelessSession you have to save composed objects separately, e.g. in the above example you have to insert the publisher and the book separately. The above code with StatelessSession becomes:


    StatelessSession session = HibernateUtil.getSessionFactory().openStatelessSession();
    session.beginTransaction();
    for (int index = 0; index < 1000000; index++) {
        Book book = new Book();
        book.setAuthor("amer");
        book.setIsbn("34343");
        book.setName("Hibernate " + index);
        Publisher pub = new Publisher();
        pub.setName("Publisher " + index);
        book.setPublisher(pub);
        book.setPublishDate(new Date());
        session.insert(pub);  // no cascade: insert the publisher first
        session.insert(book);
    }
    session.getTransaction().commit();
    session.close();

Well, it's time to conclude. I have used both ways and I prefer the first approach over the second for the following reasons:
  • StatelessSession does not provide any performance gain over Session when you have to save composed instances too.
  • It does not call any interceptors or events, which may complicate the application design if we depend heavily on interceptors or events for logging, data-level security, etc.
  • Even for a plain object, StatelessSession did not give me much of a performance gain over Session. It only helped when an object contains one or two attributes, e.g. if I insert only publishers through StatelessSession, I gain 20 to 30 seconds.

8 comments:

Arunkumar said...

Awesome piece of comparison...
Thanks for sharing the knowledge which is encouraging for beginners like me....

Harshal said...

Thanks Amer. Seriously informative piece of information. Very very helpful. Thank you so much.

Imran said...

Thanks for this short and very intuitive knowledge sharing on Hibernate. I have 4 lakh records which I have to insert. I used a for loop and it was expensive. This will help me.

I have to insert 4 lakh records, 4 times in my application flow.
I am thinking of going with StatelessSession as I have only two columns in my table and a lot of records.


Imran

Renaud said...

Thanks!

Prabhat Jha said...

I have created a complete maven project which has all the needed configs to do batch insert with mysql and some other tips as well. Pls see the details at http://sensiblerationalization.blogspot.com/2011/03/quick-tip-on-hibernate-batch-operation.html

Kalukuri said...

Thanks... It was much helpful

Rama

Dinesh said...

Has anybody used a rollback on the session when a batch insert fails on one of the inserts with a duplicate constraint violation? I want to roll back when there is even 1 error during the batch insert.

bedrin said...

Is it possible to do the same using JPA?