Getting Started with Hibernate

Revision History
Revision 1.4, 23 January 2006, aps

Table of Contents

1. Introduction to Hibernate
2. Example Hibernate Application
3. Database Interaction Pattern
4. The Hibernate Object Life Cycle
5. Hibernate Objects
6. The Session
7. Querying
8. Cascading Persistence
9. Transactions
9.1. Versioning
10. Mapping Classes to the Database
10.1. Mapping Simple Entity Classes without Relationships
10.2. Mapping Value Objects within Entities (Component Mapping)
10.3. Mapping Entities with Inheritance
10.4. Many-to-One, Unidirectional Associations
10.5. One-to-Many, Unidirectional Associations
10.6. Many-to-one, bidirectional Associations
11. Patterns
12. Going Further
References

1. Introduction to Hibernate

Hibernate is an Object Relational Mapping (ORM) tool. It manages the persistence of Java objects in a relational database. The idea is that a programmer should be able to design his business objects as standard Java objects with very little interference from the problems of making these objects persist in a database. Together with a little help from the programmer, Hibernate saves the objects into the database, retrieves them when needed and supports queries on the database written in a form similar to SQL but which refers to objects and object properties instead of tables and column names. The end result is that the code that needs to be written to interact with the database is considerably shorter and simpler.

This document covers only the basics of Hibernate: do things the way they are described here until you have outgrown them. Hence many features are only referred to in passing or not mentioned at all. The full reference documentation, and a great deal of other important information, is available online in the Hibernate On Line Documentation (http://www.hibernate.org/5.html). However, for serious users of Hibernate, a thorough study of [BK05] is highly recommended.

Important

The books currently in print all discuss a version of Hibernate earlier than 3.0. However, Hibernate has moved on, and the latest version as of 23 January 2006 is 3.1.1. There were incompatible changes between versions 2 and 3, so when reading the textbooks you should have the Hibernate Migration Guide (http://www.hibernate.org/250.html) beside you.

The differences are generally small, but enough to stop even simple Hibernate examples from working. Critically, all the Hibernate classes (and log4j logger names) are now org.hibernate... instead of the old net.sf.hibernate.... Some of the Session methods for query execution (find(), iterate(), filter(), and delete()) have not just been deprecated but moved into a different package (presumably to encourage programmers to abandon them). To replace them, use createQuery for all queries, and HQL DELETE queries for bulk deletions; session.delete(object) is still fine for deleting a single object. Similarly, saveOrUpdateCopy() has gone the same way (use merge() instead). However, if you just want to get an example from a book working quickly, you can still use the deprecated methods simply by replacing your org.hibernate.Session objects with org.hibernate.classic.Session ones. Finally, in the hbm.xml files, the DTD has changed from http://hibernate.sourceforge.net/hibernate-mapping-2.0.dtd to http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd.

The set of cascade options has changed to bring Hibernate into alignment with the EJB3 standard.

2. Example Hibernate Application

Here we provide a fully working, but very simple, example application using Hibernate. The example is based on some fragments that appear in chapter 2 of [BK05]. The files necessary are:

  • Main.java: the main class that manipulates Message objects, storing them into and retrieving them from the database. Note how only Commons Logging classes are used, as a thin wrapper around log4j: this is common practice to provide independence from specific logging systems when an application uses several libraries, each of which might otherwise make a different choice of logging system.

  • Message.java: the class defining the objects that will be persisted to the database.

  • Message.hbm.xml: the mapping file that describes how the properties of a Message object should be mapped to columns of tables in the database (along with other necessary information, such as how keys are generated and what database constraints and indexes should be maintained).

  • hibernate.properties: the Hibernate configuration file that specifies the database that is to be used, the database connection pooling system (if any) and other configuration parameters for the system.

  • log4j.properties: the log4j configuration file that sets many parameters of the logging system.

  • ehcache.xml: the ehcache configuration file that sets parameters for the second-level cache in Hibernate. In fact, the cache won't really be used without further work, but without this file you will get warning messages when you start your application.

3. Database Interaction Pattern

The basic pattern of database interactions via Hibernate is visible in the Main.java file above.

  1. Create a Configuration object, load the configuration parameters from the hibernate.properties file and adjust them as required.

  2. Create a SessionFactory object from the Configuration object. The SessionFactory object is a heavyweight, thread-safe object. You would normally share one such object between all the threads in a web application.

  3. For each unit of work (normally one use case), use the SessionFactory object to obtain a Session object. This is an extremely lightweight, non-thread-safe object. It will be associated with a database connection, but it obtains that connection lazily, i.e., only when (and if) it is required. Session objects must not be shared between different threads. Recall that every request to the web server usually runs in its own thread.

  4. Inside a try block, get a Transaction object by calling beginTransaction() on the Session object.

  5. Interact with the database:

    • explicitly by calling methods of Session to associate objects with the database (i.e., map them to the database), execute queries, and load, save or delete mapped objects.

    • implicitly by calling property mutators on mapped objects that will lead to the database being updated.

    • implicitly by referencing non-mapped objects from mapped objects which (in certain circumstances) can cause the non-mapped objects to be added to the database.

    • implicitly by unreferencing mapped objects from other mapped objects which (in certain circumstances) can cause the unreferenced objects to be deleted from the database.

  6. Call commit() on the Transaction object and close the try block, handling exceptions and closing the Session object in the usual way.
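The six steps above can be sketched in Java. This is a minimal sketch, not the code of Main.java: so that it compiles without the Hibernate jars, it uses small hypothetical interfaces (SessionSource, UnitOfWorkSession, Tx) whose methods mirror SessionFactory.openSession(), Session.beginTransaction(), Session.close(), Transaction.commit() and Transaction.rollback(). With Hibernate 3 on the classpath you would use the org.hibernate types directly. RecordingSession is a hypothetical fake that only logs the calls made on it.

```java
import java.util.List;

// Hypothetical stand-ins mirroring org.hibernate.Transaction,
// org.hibernate.Session and org.hibernate.SessionFactory.
interface Tx {
    void commit();
    void rollback();
}

interface UnitOfWorkSession {
    Tx beginTransaction();
    void close();
}

interface SessionSource {
    UnitOfWorkSession openSession();
}

// The database interaction of step 5, supplied by the caller.
interface Work {
    void execute(UnitOfWorkSession session);
}

final class UnitOfWork {
    private UnitOfWork() {}

    // Steps 3 to 6: one session and one transaction per unit of work.
    static void run(SessionSource factory, Work work) {
        UnitOfWorkSession session = factory.openSession();
        Tx tx = null;
        try {
            tx = session.beginTransaction();
            work.execute(session);
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();       // undo the partial work, then report the failure
            }
            throw e;
        } finally {
            session.close();         // always release the session
        }
    }
}

// Hypothetical fake session that records the order of calls made on it.
final class RecordingSession implements UnitOfWorkSession {
    private final List<String> log;

    RecordingSession(List<String> log) {
        this.log = log;
    }

    @Override
    public Tx beginTransaction() {
        log.add("begin");
        return new Tx() {
            @Override
            public void commit() { log.add("commit"); }

            @Override
            public void rollback() { log.add("rollback"); }
        };
    }

    @Override
    public void close() {
        log.add("close");
    }
}
```

Run with the recording fake, a successful unit of work logs begin, work, commit, close, while one whose work throws logs begin, rollback, close and then rethrows the exception. Catching RuntimeException is enough here because HibernateException is unchecked in Hibernate 3.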

4. The Hibernate Object Life Cycle

  • Transient objects do not (yet) have any association with the database. They act like any normal Java object and are not saved to the database. When the last reference to a transient object is lost, the object itself is lost and is (eventually) garbage collected. There is no connection between transactions and such objects: commits and rollbacks have no effect on them. They can be turned into persistent objects via one of the save method calls of the Session object, or by adding a reference from a persistent object to this object.

  • Persistent objects do have an association with the database. They are always associated with a persistence manager, i.e., a Session object, and they always participate in a transaction. Actual updates of the database from the persistent object may occur at any time between the moment the object is updated and the end of the transaction: they do not necessarily happen immediately. However, this feature, which allows important optimizations in database interactions, is essentially invisible to the programmer. For example, one place where one might expect to notice the difference between the in-memory persistent object and the database version is when executing a query. In such a case, Hibernate will, if necessary, synchronise any dirty objects with the database (i.e., save them) to ensure that the query returns the correct results.

    A persistent object has a primary key value set, whether or not it has been actually saved to the database yet.

    Calling the delete method of the Session object on a persistent object will cause its removal from the database and will make it transient (it will still be available as a normal, non-persistent Java object).

    Aside from making a persistent object out of a transient one as described above, one can also create a new persistent object, with its values obtained from the database, by executing the load or get methods of Session if you know the object's database identifier. You can also get persistent objects by creating a query (createQuery) and extracting the results from it.

  • Detached objects are objects that were persistent but no longer have a connection to a Session object (usually because you have closed the session). Such an object contains data that was synchronised with the database at the time the session was closed, but the database may have changed since then, in which case the object is now stale.

    Important

    A detached object may be re-attached later to another Session object to become persistent again. Thus, in essence, these objects can happily exist, and be used, without concern for being inside a transaction. This mechanism is, in fact, the basis for letting business objects that are stored persistently in the database escape to higher levels of the system without the need for extra value beans (also known as Data Transfer Objects, or DTOs), which exist only to copy the data of objects tied to one layer of the system into objects tied to another layer. Without this mechanism, one typically has to create several classes for each business object, whose instance variables are all basically the same but which differ in layer-specific details. This mechanism can also help systems efficiently handle long operations that involve multiple separate database queries or updates separated by user interactions.

Figure: The Hibernate Object Life Cycle
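The transitions described in this section can be summarised as a small state model. This is an illustrative sketch only: the enum and class names (LifeCycleState, SessionCall, LifeCycle) are hypothetical, it covers only the Session calls mentioned in this section, and Hibernate does not expose the states in this form.

```java
// A toy model of the life-cycle transitions described above; it only
// tracks states and is not how Hibernate represents them internally.
enum LifeCycleState { TRANSIENT, PERSISTENT, DETACHED }

enum SessionCall { SAVE, DELETE, CLOSE_SESSION, UPDATE, LOCK }

final class LifeCycle {
    private LifeCycle() {}

    static LifeCycleState next(LifeCycleState state, SessionCall call) {
        switch (call) {
            case SAVE:            // a transient object becomes persistent
                if (state == LifeCycleState.TRANSIENT) return LifeCycleState.PERSISTENT;
                break;
            case DELETE:          // a persistent (or even detached) object becomes transient
                if (state != LifeCycleState.TRANSIENT) return LifeCycleState.TRANSIENT;
                break;
            case CLOSE_SESSION:   // closing the session detaches its objects
                if (state == LifeCycleState.PERSISTENT) return LifeCycleState.DETACHED;
                break;
            case UPDATE:          // update() or lock() re-attaches a detached object
            case LOCK:
                if (state == LifeCycleState.DETACHED) return LifeCycleState.PERSISTENT;
                break;
        }
        throw new IllegalStateException(call + " is not modelled for a " + state + " object");
    }
}
```

Following one object through save, closing the session, update and delete takes it from transient to persistent, detached, persistent again and finally transient.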

Given a pair of (persistent) objects a and b of the same class, we now have three concepts of identity to consider:

  • a == b: Java Identity
  • a.equals(b): Java Equality
  • a.getId().equals(b.getId()): Database Identity

The rule for Hibernate is that if, within a single Session, you request two objects that have the same database identifier, you will get references to the same actual object. Hibernate accomplishes this by using a cache for persistent objects. Whenever you obtain a persistent object, in any way, Hibernate first checks the cache for an object with that identifier: for get() and load() it only queries the database if the object is not found there, and even when a query does return the row, Hibernate hands back the cached instance rather than creating a new one. Incidentally, this means that within a session you can use Java Identity (i.e., ==) to test persistent objects for identity, even where you would normally need to use equals().
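The per-session cache that makes == work can be pictured as an identity map keyed by database identifier. The sketch below is a toy under stated assumptions, not Hibernate's implementation: IdentityMap is a hypothetical class, its loader function stands in for the SQL query, and each IdentityMap instance stands in for one Session.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A toy identity map: within one "session", loading the same id twice
// returns the same instance, and the loader runs only on a cache miss.
final class IdentityMap<T> {
    private final Map<Long, T> cache = new HashMap<>();
    private final Function<Long, T> loader;   // stands in for the database query
    private int loads = 0;

    IdentityMap(Function<Long, T> loader) {
        this.loader = loader;
    }

    T get(Long id) {
        T cached = cache.get(id);
        if (cached == null) {                 // not yet in this session's cache
            loads++;
            cached = loader.apply(id);
            cache.put(id, cached);
        }
        return cached;
    }

    int loadCount() {
        return loads;
    }
}
```

Two get(42L) calls on the same map return the same instance and run the loader once; a second map (a second session) returns a different instance for the same id, which is why == is only reliable within one session.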

Since the programmer can define equals(), it is important not to use the id field in that definition if the id field is a surrogate key. If the object uses a generated identifier value for its id, or if at least part of its id is a reference to another object (i.e., a foreign key), then Hibernate only sets the field the first time it saves the object to the database. Hence, for example, if you add a new object to a set (or use it as a key in a map) and then save it, its identity changes while it is in the collection, and part of the contract for using the set collection classes is that a contained object's identity must not change while it is in the collection.
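The effect of an id-based hashCode can be reproduced in plain Java. IdKeyedItem is a hypothetical class whose equals and hashCode (wrongly, for this purpose) use a generated id, and simulateSave stands in for Hibernate assigning the id on save; HashSetPitfall is likewise a hypothetical name.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity whose equals/hashCode use the generated surrogate id.
final class IdKeyedItem {
    private Long id;                         // null until the object is "saved"

    Long getId() { return id; }

    // Stands in for Hibernate assigning a generated id on save.
    void simulateSave(long generatedId) { this.id = generatedId; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof IdKeyedItem)) return false;
        return Objects.equals(id, ((IdKeyedItem) other).getId());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);         // changes when the id is assigned
    }
}

final class HashSetPitfall {
    private HashSetPitfall() {}

    // Returns whether the set can still find the item after its id is assigned.
    static boolean foundAfterSave() {
        Set<IdKeyedItem> items = new HashSet<>();
        IdKeyedItem item = new IdKeyedItem();
        items.add(item);                     // stored under the hash code of a null id
        item.simulateSave(7L);               // hashCode() now returns a different value
        return items.contains(item);         // looked up under the new hash code
    }
}
```

foundAfterSave() returns false: the set still holds the object but can no longer find it, because HashSet looks it up under its new hash code.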

In fact, this situation is almost certain to occur because of the frequent use of collection classes to represent the many side of one-to-many or many-to-many relationships. Therefore we use the Java Equality concept to define when two objects should really be the same database object.

However, there are other problems with using all the non-id values of an object in the equality test: you really want the test to return true if the objects map to the same row of the same table in the database (i.e., they represent the same real-world concept). But two objects may represent the same real-world object and still have some different values. For example, two Customer objects may differ in the value of a password property (because the two objects date from different points in time, between which the customer has changed her password). But they still refer to the same real-world concept: the same customer.

The solution is to decide on a Business Key for the class. This is like a database key, but involves no generated surrogate keys. Instead, it consists of those "real-world" properties of the class that the programmer considers to uniquely identify a particular record. It is not a requirement that the business key never changes, merely that it does not change during the period in which the object might be stored in memory in a collection class. For the Customer class in a web application, an appropriate business key might be the customer's email address. This, of course, can change, in which case the customer will be treated as a new, different customer. However, this is rarely a significant problem, and if it is, one can always provide a mechanism to reconnect the old data about the customer to the new customer record. More importantly, from our point of view, a change of customer email address is extremely unlikely to affect any reattachment of a detached Customer object to a new session.

Note that Hibernate does not know or care anything about your business keys. As far as it is concerned, reattaching an object works by checking the id property of the object. If it is null, then the object is a new one that could be added to the database but certainly cannot be reattached. Otherwise, the object can be matched up with a record in the database and, on reattachment, the contents of the object are used to update the contents of the corresponding database record(s).

In writing an equals method, there are two important considerations to bear in mind:

  • If you write an equals method, you must write a hashCode method which always returns the same value for two objects which equals decides are equal.

  • When referring to instance variables of the argument object, always use the accessor method rather than referring directly to the raw instance variable: you may actually be dealing with a proxy object rather than the actual object you expect. Most commonly this is one of Hibernate's own lazy-loading proxies (see section 6), whose own fields are not populated; in an environment such as a web application or service, proxies may also be introduced for reasons such as distributed load balancing or scalability to very large service loads.

Given that, the equals and hashCode methods should be written as follows:

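public class Customer
{
    …
    // Equality is based on the business key (emailAddress), not on the id.
    public boolean equals(Object other)
    {
        if (this==other)
            return true;
        if (other==null)
            return false;
        if (!(other instanceof Customer))
            return false;
        final Customer o = (Customer) other;
        // Use the accessor on the argument: it may be an uninitialised proxy.
        return this.emailAddress.equals(o.getEmailAddress());
    }

    // Must agree with equals(): equal customers have equal email addresses.
    public int hashCode()
    {
        return emailAddress.hashCode();
    }
}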

5. Hibernate Objects

A Hibernate object, suitable for mapping into a database, is a normal JavaBean with a number of extra requirements.

  • There must be a default constructor for the class.

  • There must be accessors and mutators for all the instance variables of the class. Actually this is overstating the requirement but is a good base rule: read the Hibernate documentation for the full details.

  • The class should implement Serializable. Strictly speaking, this is not a requirement. However, in practice you will normally want your Hibernate objects to be serializable so that they can be (potentially) migrated around a multiprocessor cluster or saved and restored across a web server reboot etc.

  • The class should have an id instance variable, usually of type Long. Again this is not a true requirement but it is recommended to use automatically generated surrogate keys and, if so, to use an instance variable called id to hold it. Certainly, alternatives are possible.

  • The mutator for the id property should be private, not public. Again not a requirement but good practice. You should never update the id property directly but rather rely on Hibernate updating it for you. In practice, it is the value of this field that Hibernate uses to decide if an object has been mapped to a database record or not. Change the property yourself and you could seriously confuse Hibernate.

  • You should decide on a business key for the object and implement the equals and hashCode methods for it.

  • You can add any extra type specific constructors (which should leave the id field null) and business rule methods you like.

  • You should not make the class final if you want to be able to use lazy loading for objects of the class (which you normally do).
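Putting these rules together, a mappable class might look like the following sketch. The field names, and the choice of emailAddress as the business key, are illustrative assumptions; the matching mapping file is not shown.

```java
import java.io.Serializable;

// A sketch of a class following the rules above. Not final, so that
// Hibernate can create lazy-loading proxies for it.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;

    private Long id;                 // surrogate key, assigned by Hibernate
    private String emailAddress;     // business key
    private String name;

    public Customer() {}             // no-argument constructor required by Hibernate

    // Convenience constructor: leaves id null, so the object starts transient.
    public Customer(String emailAddress, String name) {
        this.emailAddress = emailAddress;
        this.name = name;
    }

    public Long getId() { return id; }
    private void setId(Long id) { this.id = id; }   // only Hibernate should call this

    public String getEmailAddress() { return emailAddress; }
    public void setEmailAddress(String emailAddress) { this.emailAddress = emailAddress; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // Equality on the business key only; the accessor is used on the argument.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Customer)) return false;
        Customer o = (Customer) other;
        return emailAddress != null && emailAddress.equals(o.getEmailAddress());
    }

    @Override
    public int hashCode() {
        return emailAddress == null ? 0 : emailAddress.hashCode();
    }
}
```

Two Customer objects with the same email address but different names compare equal and share a hash code, while a freshly constructed one still has a null id.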

6. The Session

Once you have a Session object, and are executing inside a Transaction, there are a number of ways you can interact with the database.

  • session.get is used to create a new persistent object by id from the database. It returns null if there was no such object in the database. session.load is similar except that if there was no such object in the database it throws an exception.

    Important

    Conceptually, these methods do not just get the object requested but also all objects that it refers to through its properties, and, transitively, all objects that they refer to as well, and so on. If the database is large, and there is a path of associations from every object to every other object, then fetching one object could try to load in the entire database. There are two issues to consider:

    1. Controlling what really gets loaded, while still making everything work as if everything referred to had been loaded. This is done by lazy loading using proxies. A proxy is an object that substitutes for the target object: it has all the same member methods as the target object and can be used anywhere the target object can be used, but it does not contain the target object's data and so does not have to be read from the database. When one of the member methods of a proxy is invoked, the proxy triggers loading of the real target object from the database and passes the original call on to the corresponding method of the real object. The use of proxies for a class is controlled by the attribute lazy="true" on that class in the mapping file; in Hibernate 3 this is the default, whereas in Hibernate 2 it had to be specified explicitly.

      Using lazy fetching, however, means that a lazy association can only be initialised (i.e., actually loaded) within a session. Within a session, simply accessing a non-id property of a persistent (possibly proxied) object initialises it. But once the object is detached, any request for a lazy, uninitialised property throws an exception (a LazyInitializationException). The static method Hibernate.initialize() can be used to ensure that a lazy or proxy object is materialised before the session is closed, and Hibernate.isInitialized() tests its initialisation state. However, these methods do not work recursively over the whole object graph. Therefore, if you need to access methods of a detached object that uses lazy loading, you must do one of the following:

      • Before detaching the object from the session, recursively walk over the object graph, initialising along the way all objects that will be accessed while the object is detached.

      • Re-attach the object to a session before de-referencing potentially uninitialised objects.

      • Load the objects in a query in which the fetch strategy has been changed to an eager one. This runtime mechanism can override the lazy setting for the objects in the mapping file.

      • Keep the session open until any possible chain of dereferences of the properties of the object has been completed.

      The standard advice ([BK05]) is to make all associations lazy by default in the mapping files and override this at runtime, where necessary, with queries that force eager fetching.

    2. Loading parts of the object graph efficiently. The naïve approach would be for Hibernate to load the requested object, get the ids of the objects it refers to, load them, and so on, with each load being a separate SQL query. Hibernate provides that strategy as an option, but a potentially much more efficient strategy is also available: executing an outer join to get the first object and the objects it refers to in a single query. This is specified in the mapping file with the attribute outer-join="true" on the association elements, i.e., one-to-one, many-to-one, one-to-many and many-to-many (in Hibernate 3 the preferred form is fetch="join"). For the detailed semantics and other parameters of this feature, see the Hibernate On Line Documentation (http://www.hibernate.org/5.html).

  • session.delete will cause the database row corresponding to a persistent (or even a detached!) object to be deleted and the object will become transient. What happens to the objects it refers to depends on the cascade properties of the mapping configuration for that reference.

  • session.save on a transient object will assign it an id and make the object persistent: i.e., ensure that it, and any other objects it refers to, get saved to the database. This operation essentially causes an SQL INSERT to be executed. Any further calls to the mutators of the object within the transaction will cause an SQL UPDATE to be executed.

  • session.lock and session.update are both intended for reattaching a detached object. Normally you should use session.update which triggers an SQL UPDATE to the database row with id equal to that of the object. Thus if the database and the object disagreed on the values contained, then the object overrides the database. session.lock simply reattaches the object (to the session) without checking or updating the database on the assumption that the database is still fully in synch with the object. Generally, do not use this method unless you are absolutely sure that nothing has changed the database state of the object since it was detached.

  • session.saveOrUpdate is a convenience method that checks whether the object is transient, in which case it acts like session.save, or detached, in which case it acts like session.update.

  • session.merge checks for a persistent object with the same identifier in the session. If it finds one, it copies the data from the detached object onto the persistent one. Otherwise it loads the object with that identifier from the database (or, if there is none, creates a new persistent object) and copies the detached object's data onto it. Either way, the detached object stays unchanged and detached, and you are now guaranteed that there is a persistent object in the session (which is returned by the method) that exactly matches the detached one.
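The proxy technique described under session.get and session.load can be illustrated with a JDK dynamic proxy. This is a toy sketch, not Hibernate's mechanism: Product and LazyProxies are hypothetical names, a Supplier stands in for the database read, and java.lang.reflect.Proxy can only proxy interfaces, whereas Hibernate by default generates proxy subclasses of the mapped class itself (one reason such a class must not be final).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical interface standing in for a mapped class.
interface Product {
    String getName();
}

final class LazyProxies {
    private LazyProxies() {}

    // Returns a proxy implementing the given interface; the loader (standing in
    // for the database read) runs only when the first method is called on it.
    static <T> T lazy(Class<T> type, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target;   // null until the proxy is initialised

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get();              // load on first use
                }
                try {
                    return method.invoke(target, args); // forward the original call
                } catch (InvocationTargetException e) {
                    throw e.getCause();                 // rethrow the target's own exception
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }
}
```

Creating the proxy does not call the supplier; the first getName() call does, and later calls reuse the loaded object. Unlike this toy, Hibernate's proxies are not initialised by accessing the id property, as noted above.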

7. Querying

So far, our discussions above show how to fetch an object from the database if we know its id. Obviously we need more powerful querying facilities.

As described above, one of the simplest ways of querying is simply by invoking a chain of accessor methods on a persistent object:

X x = z.getY().getX();

The most critical type of access not covered above is finding an object (or collection of objects) when you have some information about it but not its identifier, and you do not have a persistent object available that refers to it: for example, finding the customers whose email address matches a given one. Here we can use the session.createQuery method, which returns a Query object. The Query object can be executed by invoking list() or iterate() to return, respectively, a list of the results or an iterator over them. Each result (each element of the list, or each value obtained from the iterator) is either an object of one of your persistent classes or an object array containing several such objects, depending on whether your query asked for one object or for several objects on each row of the result.

The query language is not SQL but HQL. HQL is very similar to SQL, except that instead of the names of tables and columns, queries use the names of Java classes and properties. Of course there is a great deal more to it than that, and full details can be found in the Hibernate On Line Documentation (http://www.hibernate.org/5.html) or [BK05]. The following paragraphs give a little taster of HQL.

The statement

session.createQuery("from Customer cust where cust.city = :cityName")
    .setString("cityName", "Birmingham")
    .list();

would return a list (java.util.List) of Customer objects whose city property was "Birmingham". (HQL is case-sensitive for the names of Java classes and properties, so the query must name the mapped class Customer, not customer.) Naturally, you need not chain the method calls: you can introduce variables and execute the whole operation in separate steps if you wish. Note that instead of the "?" mechanism of JDBC's PreparedStatement, HQL provides a named parameter mechanism, which is somewhat more readable and less error-prone. HQL does also provide a "?" positional parameter mechanism, but HQL's numbering starts at 0, whereas JDBC's starts at 1.

In the above query, there was no select clause. The select clause is optional in HQL, but if used, it should contain a list of one or more objects, rather than one or more column names:

Query query = session.createQuery("select cust, sa " +
                                  "from Customer cust, SalesAgent sa " +
                                  "where cust.city = sa.city");

This query would find pairs of customers and sales agents in the same city. Thus one might print the results by executing the following:

// iterate() returns a java.util.Iterator; each row is an Object[]
// because the select clause names two objects
Iterator results = query.iterate();
while ( results.hasNext() )
{
    Object[] row = (Object[]) results.next();
    Customer cust = (Customer) row[0];
    SalesAgent sa = (SalesAgent) row[1];
    System.out.format("Customer: %20s, Sales Agent: %20s%n", cust.getName(), sa.getName());
}

Often, a web request may cause a query to be executed that would return a large number of rows. In such a case, one usually limits the number of rows returned to some small number (say 10 or 20) and allows the user to see them and then request the next block of rows: for example, search engines like Google return only one page of matches at a time. This is called pagination, and Hibernate's queries support it in a very simple way. Given a Query object, which should contain a query with a specific ordering, you can use the (chainable) methods setFirstResult() and setMaxResults(), each of which takes a single integer argument, to choose, respectively, which row to return first (counting starts at 0) and how many rows to return from there on. The code to return the third page of results (i.e., page 2, counting from 0), where each page holds 10 rows, might thus look like:

int pageSize = 10;
int pageNo = 2;    // pages are counted from 0, so this is the third page
Query query = session.createQuery("select cust, sa " +
                                  "from Customer cust, SalesAgent sa " +
                                  "where cust.city = sa.city " +
                                  "order by cust.name asc, sa.name asc");
query.setFirstResult(pageNo * pageSize);   // skip the first 20 rows
query.setMaxResults(pageSize);             // return at most 10 rows
List customerSalesagentList = query.list();
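The row arithmetic behind setFirstResult() and setMaxResults() can be checked on an in-memory list. Pages is a hypothetical helper that only illustrates which rows a page contains; Hibernate normally translates these calls into the database's own row-limiting SQL where the dialect supports it, so only the requested rows are fetched.

```java
import java.util.Collections;
import java.util.List;

// Mimics which rows setFirstResult(pageNo * pageSize) and
// setMaxResults(pageSize) select from an already-ordered result.
final class Pages {
    private Pages() {}

    // Index of the first row of a page; pages and rows count from 0.
    static int firstResult(int pageNo, int pageSize) {
        return pageNo * pageSize;
    }

    static <T> List<T> page(List<T> orderedRows, int pageNo, int pageSize) {
        int from = firstResult(pageNo, pageSize);
        if (from >= orderedRows.size()) {
            return Collections.emptyList();   // past the last page
        }
        int to = Math.min(from + pageSize, orderedRows.size());
        return orderedRows.subList(from, to); // at most pageSize rows
    }
}
```

With 25 ordered rows and 10 rows per page, page 2 starts at row 20 and holds only the last five rows (20 to 24), and page 3 is empty.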

8. Cascading Persistence

We have said a number of times that when an object is made persistent, the objects it refers to are also made persistent. This was an oversimplification. In the mapping files for the classes, the various association mapping elements (e.g., one-to-one, one-to-many etc.) have an attribute, cascade, that lets us control how much, or how little, of a reference graph gets automatically persisted, deleted, updated etc. For a full discussion, see the section on transitive persistence in the Hibernate reference manual. The values it can be set to, and their meanings when specified on a relationship from a referencing object to one or more referenced objects, are given in the following list. Note that you can obtain the union of several cascade behaviours by writing them as a comma-separated list.

  • none: no automatic action on the referenced object takes place. This is the default if no cascade behaviour is set.

  • persist: Cascade any persist() operation across this relationship. Note that there is an error in the reference manual where this is called create.

  • merge: Cascade any merge() operation across this relationship.

  • lock: Cascade any lock() operation across this relationship.

  • evict: Cascade any evict() operation across this relationship.

  • replicate: Cascade any replicate() operation across this relationship.

  • refresh: Cascade any refresh() operation across this relationship.

  • save-update: If save(), update() or saveOrUpdate() is called on the referencing object, automatically call saveOrUpdate() on all referenced objects.

  • delete: automatically delete the referenced object(s) when delete() is called on the referencing object. Note that if the referencing object is not deleted but merely removes its reference to the referenced object, this option does nothing and, potentially, a garbage (or orphan) object will be left in the database.

  • delete-orphan: automatically delete any object whose reference from the referencing object has been removed. This option is only available for one-to-many and one-to-one relationships.

  • all: cascade all operations, but do not take the action of delete-orphan.

  • all-delete-orphan: cascade all operations, and take the action of delete-orphan as well.

It is not normally appropriate to specify any cascade behaviour on a many-to-one or a many-to-many relationship.

In the situation where you have a one-to-one, or a one-to-many relationship, where the referencing object owns the referenced object(s), the appropriate cascade behaviour is all,delete-orphan.

In any other case, if you want some cascade behaviour but there is no ownership relationship involved (so that, for example, if the referencing object is deleted, the objects it refers to can continue to exist in the database), then the appropriate behaviour is persist,merge,save-update.
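The union semantics of a comma-separated cascade value, and the two recommended combinations, can be checked with a toy model. This is a sketch under stated assumptions, not Hibernate's parser: the Cascade enum and its parse method are hypothetical, and in a real mapping you simply write the string, e.g. cascade="all,delete-orphan", on the association element.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// A toy model of the cascade attribute: the listed behaviours are unioned.
enum Cascade {
    PERSIST, MERGE, LOCK, EVICT, REPLICATE, REFRESH, SAVE_UPDATE, DELETE, DELETE_ORPHAN;

    static Set<Cascade> parse(String attribute) {
        Set<Cascade> result = EnumSet.noneOf(Cascade.class);
        for (String raw : attribute.split(",")) {
            String name = raw.trim();
            switch (name) {
                case "none":
                    break;
                case "all":                  // every operation except delete-orphan
                    result.addAll(EnumSet.range(PERSIST, DELETE));
                    break;
                case "all-delete-orphan":    // every operation, plus delete-orphan
                    result.addAll(EnumSet.allOf(Cascade.class));
                    break;
                default:                     // e.g. "save-update" becomes SAVE_UPDATE
                    result.add(Cascade.valueOf(name.toUpperCase(Locale.ROOT).replace('-', '_')));
            }
        }
        return result;
    }
}
```

In this model, all,delete-orphan yields the same set as all-delete-orphan, and persist,merge,save-update yields exactly those three behaviours.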

9. Transactions

As we have seen, the standard pattern for executing a use case is to get a Session from the SessionFactory, get a Transaction from the Session, interact with the database via the Session, commit the Transaction and close the Session. Details of the exception handling have been given above. This is fine for normal, fully serialised database transactions but there are two situations when it is not fine:

  1. When using an isolation level other than fully serializable. There are 4 standard transaction isolation levels: Read Uncommitted, Read Committed, Repeatable Read and Serializable. They indicate different levels of locking strategies used within transactions and effect just how isolated one transaction really is from other transactions running simultaneously. Thus Serializable means that two transactions run as if one had completely finished (committed or rolled back) before the other had started. In practice, providing this level of isolation requires considerable resources and causes problems with scalability of applications. For this reason, most applications use a weaker form of isolation and use other strategies to overcome the consequent problems that can arise.

    The most common isolation level used, and the default obtained with a PostgreSQL JDBC connection, is Read Committed. This ensures that one transaction can not see any value which has been written by another transaction if that other transaction has not yet committed. Therefore we don't have to worry about the other transaction rolling back with the result that we would have to roll back this transaction. With this level, the isolation problems that can occur are:

    • Lost Updates: transaction (tx) 1 reads a row, tx 2 reads the same row, tx 1 writes the row and commits, tx 2 writes the row and commits; the value written by tx 1 is lost. This can be dealt with using versioning (see below).

    • Unrepeatable Reads: tx 1 reads a row, tx 2 writes the row and commits, tx 1 reads the same row again and gets a different result from last time. Hibernate's session cache (loading the same object again within a session returns the copy already in memory rather than re-reading the row), together with versioning, handles this situation.

    • Phantom Reads: tx 1 executes a select query, tx 2 inserts new rows or deletes existing ones and commits, and tx 1 executes the same query again and finds a different set of rows from last time. There is very little you can do about this except to be aware of the issue when you design your transactions. The most common place where it arises is pagination: when the results are too large to fit on a screen, you return only a page of results to the user and let him or her ask for the next page when ready; you then execute the same query again but return the next page of records. A phantom read might mean that the first time you return records 0–19, another transaction then deletes a record in the range of that first page, say record 5, and when you are asked for the second page you dutifully return records 20–39. However, record 20 after the delete is actually record 21 from before the delete, so the user never sees the old record 20 at all. If this matters, you have to check explicitly that the new record 20 is still the old record 20 and fix your offsets if it is not.

  2. When long transactions are required. Here the problem is usually that some user interaction, which could take a considerable period of time, is required in a use case with database accesses taking place both before and after the user interaction. The issue is that we should not keep a database transaction open for a long period of time (i.e., for more than a fraction of a second), whereas a user interaction could take from minutes to hours. The solution is to break the long transaction up into two (or more) database transactions, and to use detached objects from the first transaction to carry the necessary information to the presentation layer. These objects get modified, outside any transaction, as part of the user interaction and then reattached to the second transaction to cause the necessary updates.

    If we left it at that, all the isolation problems described above could occur, as well as the nastier Dirty Read. Consider a long transaction and a normal transaction. The long one contains two database transactions, tx1a and tx1b (perhaps the first books a flight, the second a hotel). The other, tx2, might order meals for the passengers on the flight (not very realistic, but you get the idea). Now tx1a updates a row and commits, and tx2 reads it. Then tx1b decides to roll back (perhaps there were no hotels available). This only rolls back tx1b itself, because tx1a is committed and cannot be rolled back; but since both are part of a long transaction, the programmer has written explicit code to undo the effects of tx1a when tx1b rolls back. (Such an undo operation is normally called a compensating transaction: it may not really undo the original transaction, but only make up for it in some way, such as authorizing a reimbursement or a voucher if a promised booking cannot be honoured.) Meanwhile tx2 has read data from a long transaction that has since been undone, and has committed. To be correct, the effects of tx2 should be undone as well, but nothing tells the application that this is needed.

    The safe solution here is to only read from the database in the early transactions, collecting the values that must be written, to allow database writes only in the last database transaction of a long transaction, and to use versioning there to ensure that no other transaction has modified the parts of the database that must not have changed for your writes to still make sense. Under certain application-specific circumstances more relaxed strategies can be taken, but this rule of thumb is safe and simple and should only be ignored after very careful analysis.
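The pagination problem from the Phantom Reads item, and one way of making the suggested check, can be reproduced in plain Java. This is a sketch, not Hibernate code: an in-memory List stands in for re-running the query with an offset and a page size of 20, and the class and method names (PhantomPagination, page, nextPageStart) are hypothetical. Rather than trusting the old offset, the fix-up step locates the last record shown on the previous page and starts the next page just after it.

```java
import java.util.ArrayList;
import java.util.List;

class PhantomPagination {
    static final int PAGE_SIZE = 20;

    // Rows [start, start + PAGE_SIZE) of the current result, standing in for
    // re-running the query with an offset and a limit.
    static List<String> page(List<String> rows, int start) {
        int from = Math.min(start, rows.size());
        int to = Math.min(start + PAGE_SIZE, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    // Fix-up step: find where the last record shown now sits and start just
    // after it; fall back to the naive offset if that record has disappeared.
    static int nextPageStart(List<String> rows, String lastShown, int naiveStart) {
        int idx = rows.indexOf(lastShown);
        return idx >= 0 ? idx + 1 : naiveStart;
    }

    static List<String> records(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) rows.add("record " + i);
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = records(60);
        List<String> first = page(rows, 0);               // records 0-19
        String lastShown = first.get(first.size() - 1);   // "record 19"

        rows.remove("record 5");                          // another transaction deletes record 5

        List<String> naive = page(rows, PAGE_SIZE);
        List<String> fixed = page(rows, nextPageStart(rows, lastShown, PAGE_SIZE));

        System.out.println(naive.get(0));  // record 21: the old record 20 is skipped
        System.out.println(fixed.get(0));  // record 20
    }
}
```

If the last record shown has itself been deleted, this sketch falls back to the naive offset; a real application would need a rule for that case too.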

9.1. Versioning

The idea of versioning is very simple and particularly easy to handle with the support Hibernate provides:

  • Add a version instance variable, and corresponding accessor and mutator methods, to your objects. An int or long is recommended, although some people prefer a Timestamp or Calendar. The latter two have slightly worse performance and are not absolutely guaranteed to work correctly (two updates might be made so close together in time that they get the same timestamp value, although some operating systems ensure that this cannot be the case), but they have the advantage that you can easily see exactly when the update was made.

  • Declare the version instance variable in your mapping file for the class. This requires the following element to be added to the mapping file immediately after your id element, or after the discriminator element if the class has one (assuming your instance variable is called "version" and you want it in a database column called "version"):

    <version name="version" column="version"/>

Now, whenever a dirty object is flushed to the database (e.g., at the end of a transaction, including after you have called session.update or session.saveOrUpdate to reattach a detached object), Hibernate increments its version number and makes the SQL UPDATE conditional on the version the object had when it was loaded. If the version in the database is no longer that one, the update matches no row and Hibernate throws a StaleObjectStateException. By catching that exception, the programmer can then decide what to do about the conflict (e.g., report back to the user that the choice he or she has just made is, in fact, no longer available and ask for another one).
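The version check can be imitated in plain Java to show why it stops the lost update described under Transactions. This is a simulation, not Hibernate's implementation: a HashMap plays the table, and OptimisticLockDemo, Row and StaleVersionException are hypothetical names (the last standing in for StaleObjectStateException). The update method mimics an SQL UPDATE whose WHERE clause includes the version the caller originally read.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockDemo {
    // Hypothetical stand-in for Hibernate's StaleObjectStateException.
    static class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) { super(message); }
    }

    // An immutable "row": the stored value plus its version number.
    static final class Row {
        final String value;
        final int version;
        Row(String value, int version) { this.value = value; this.version = version; }
        @Override public String toString() { return value + " (version " + version + ")"; }
    }

    private final Map<Long, Row> table = new HashMap<>();

    void insert(long id, String value) { table.put(id, new Row(value, 0)); }

    // Rows are immutable, so the caller's copy is a snapshot of what it read.
    Row load(long id) { return table.get(id); }

    // Mimics UPDATE ... SET value = ?, version = version + 1 WHERE id = ? AND version = ?:
    // if the row no longer has the version the caller loaded, report a conflict instead of overwriting.
    void update(long id, String newValue, int loadedVersion) {
        Row current = table.get(id);
        if (current == null || current.version != loadedVersion) {
            throw new StaleVersionException("row " + id + " was changed by another transaction");
        }
        table.put(id, new Row(newValue, loadedVersion + 1));
    }

    public static void main(String[] args) {
        OptimisticLockDemo db = new OptimisticLockDemo();
        db.insert(1L, "seat 12A free");

        Row readByTx1 = db.load(1L);  // both transactions read version 0
        Row readByTx2 = db.load(1L);

        db.update(1L, "seat 12A booked by tx 1", readByTx1.version);  // row is now version 1
        try {
            db.update(1L, "seat 12A booked by tx 2", readByTx2.version);  // stale: still 0
        } catch (StaleVersionException e) {
            // The application would now tell the user the seat is no longer available.
            System.out.println("conflict: " + e.getMessage());
        }
        System.out.println(db.load(1L));  // seat 12A booked by tx 1 (version 1)
    }
}
```

Because the first update moved the row to version 1, the second writer is told about the conflict instead of silently overwriting the first booking.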

10. Mapping Classes to the Database

Hibernate objects, i.e., objects whose persistence Hibernate will manage, can be divided into two types.

  1. Entity beans are objects which have a persistent identity: i.e., usually an identifier field which is managed by Hibernate. These are typically the central business objects in an application such as User, Customer, Order etc.

  2. Value beans are objects which only exist in relationship to an entity bean. These are typically support objects for the entity objects, such as Address, CreditCardDetails etc.

The connection between entity beans and database tables and columns is described in a mapping file: usually named X.hbm.xml for class X and stored in the same directory as the compiled file X.class.

The connection between value beans and the database is usually described in the mapping file for the corresponding entity bean.
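This naming convention is the one Hibernate's Configuration.addClass(Class) relies on when it loads a mapping as a classpath resource: a class a.b.c.User is mapped by a/b/c/User.hbm.xml. The helper below (MappingFiles, mappingResourceFor and hasMappingFile are hypothetical names, not Hibernate methods) simply computes that path and checks whether the file can be found, which may help when a mapping file has not been copied next to the compiled classes.

```java
import java.net.URL;

class MappingFiles {
    // Resource path of a class's mapping file: a.b.c.User -> a/b/c/User.hbm.xml
    static String mappingResourceFor(Class<?> cls) {
        return cls.getName().replace('.', '/') + ".hbm.xml";
    }

    // True if the mapping file is visible on the classpath.
    static boolean hasMappingFile(Class<?> cls) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) loader = MappingFiles.class.getClassLoader();
        URL url = loader.getResource(mappingResourceFor(cls));
        return url != null;
    }

    public static void main(String[] args) {
        System.out.println(mappingResourceFor(MappingFiles.class));
        System.out.println(hasMappingFile(MappingFiles.class));
    }
}
```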

In the following sections, we look at how to specify mappings for different kinds of classes and associations. We only cover the basic situations: many variations are possible, there is great flexibility in all the options, and for more complex situations you should consult the Hibernate reference documentation or [BK05].

10.1.  Mapping Simple Entity Classes without Relationships

A basic mapping file is as follows:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
        "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
        "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="a.b.c">                                    (1)
    <class name="User" table="users" lazy="true">                      (2)
        <id name="id" column="id" type="long">                         (3)
            <generator class="sequence"/>
        </id>
        <version  name="version"     column="version"/>                (4)
        <property name="dateOfBirth" column="dob"    type="date"/>     (5)
        <property name="username"    not-null="true" unique="true"/>   (6)
        <property name="gender"/>                                      (7)
    </class>
</hibernate-mapping>
(1) The package attribute is optional, but using it defines a default package prefix for all classes mentioned in this mapping file.

(2) The class element specifies the class that will be persisted. The name attribute is required. The table attribute is optional (it defaults to a suitable SQL name based on the class name). The lazy attribute is optional and specifies that lazy loading of this class via proxies should take place, as described previously. In Hibernate 3 it defaults to true (in Hibernate 2 it defaulted to false), so here it merely makes the default explicit.

(3) The id element defines the primary key of the table (again, the column attribute is optional). There are a number of options for the type and the identifier generator algorithm, but leaving them as shown here (i.e., having the database generate keys from a sequence) is a safe option. On a database without sequences, generator class="native" lets Hibernate use the database's own key generation mechanism instead.

(4) The use of the optional version element for optimistic concurrency control was discussed above. Without this element, no check for lost updates or unrepeatable reads is made, so such problems may occur when reattaching detached objects (irrespective of the transaction isolation level) or when updating the database from a modified persistent object (when the isolation level is Read Committed or weaker).

(5) Types for columns do not need to be specified if they are simple and can be deduced by inspecting the class properties. However, sometimes you want to override the default or specify something a bit more sophisticated: here type="date" stores only the date part of the value, as an SQL DATE, rather than a full timestamp.

(6) We can specify whether a column should be unique and/or whether it may be null. When Hibernate generates the database schema, these become database constraints.

(7) Finally, the simplest case is where we specify nothing but the property name: everything else is taken care of by the defaults.
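A persistent class matching this mapping might look like the following sketch. The mapping fixes only the property names, so the Java types chosen here (Long id, int version, java.util.Date dateOfBirth, String username and gender) are assumptions, and the package declaration for a.b.c is left out so the class compiles on its own. Hibernate needs a no-argument constructor (at least package-visible, so that lazy-loading proxies can subclass the class) and, with property access, a getter/setter pair per mapped property; the id and version setters can be private because only Hibernate should assign them.

```java
import java.util.Date;

class User {
    private Long id;           // assigned by the sequence generator on save
    private int version;       // maintained by Hibernate for optimistic locking
    private Date dateOfBirth;  // column "dob", type="date"
    private String username;   // not-null, unique
    private String gender;     // mapped entirely by defaults

    User() {}  // no-argument constructor required by Hibernate

    User(String username) { this.username = username; }

    public Long getId() { return id; }
    private void setId(Long id) { this.id = id; }

    public int getVersion() { return version; }
    private void setVersion(int version) { this.version = version; }

    public Date getDateOfBirth() { return dateOfBirth; }
    public void setDateOfBirth(Date dateOfBirth) { this.dateOfBirth = dateOfBirth; }

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }
}
```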

10.2.  Mapping Value Objects within Entities (Component Mapping)

Sometimes one has properties of entity beans which are more complex than simple base types but which are totally owned by the entity. The standard example is that of an Address object stored as a property of a Person object. In this case there is only one Address object for the Person object.

[Figure: A Value Component]

There are a number of ways this can be handled, but the simplest, which Hibernate calls components, is to store both the parent object (Person) and the child (Address) in the same database row, to construct and connect the two objects when this row is read from the database, and to write them together when the parent is saved or updated. To specify this, you use a component sub-element for the child object, inside the class element for the parent object, instead of the usual property element. Thus, instead of something like

<property name="address" type="string"/>

one would enter:

<component name="address" class="Address">
    <property name="street"   column="user_street"/>
    <property name="postcode" column="user_postcode"/>
</component>

Some details to be aware of are:

  1. These child objects are wholly owned by their parents: you cannot have two different parents.

  2. A null child property is represented in the database by setting all the columns corresponding to the child object to null. Consequently, loading such a row results in a parent object with a null child property, not a parent object with a child object whose properties are all null.

  3. A class can have several components, including several of the same (child) class: simply make sure that the component names are different and that the database column names (the column attributes) are different for the different components.

  4. Normally, the child object has no way to reference its parent object. However, if you want a property of the child object to refer to its parent, add an element of the following form inside the component element

    <parent name="person"/>

    to make a person property of the child object refer to its parent object.
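On the Java side the component is an ordinary class without an identifier of its own. The sketch below assumes String street and postcode properties (matching the component mapping above) and a person property corresponding to the optional <parent name="person"/> element; the constructors and the way setAddress keeps the back-reference in step are illustrative choices. Because an Address has no identity, equals and hashCode compare its values.

```java
import java.util.Objects;

class Person {
    private Long id;
    private String name;
    private Address address;  // component: stored in the person's own row

    Person() {}
    Person(String name) { this.name = name; }

    public String getName() { return name; }
    public Address getAddress() { return address; }

    // Keeps the child's back-reference consistent when application code sets the address.
    public void setAddress(Address address) {
        this.address = address;
        if (address != null) address.setPerson(this);
    }
}

class Address {
    private String street;
    private String postcode;
    private Person person;  // corresponds to <parent name="person"/>

    Address() {}
    Address(String street, String postcode) {
        this.street = street;
        this.postcode = postcode;
    }

    public String getStreet() { return street; }
    public String getPostcode() { return postcode; }
    public Person getPerson() { return person; }
    void setPerson(Person person) { this.person = person; }

    // Value semantics: two addresses are equal when their fields are equal.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address other = (Address) o;
        return Objects.equals(street, other.street) && Objects.equals(postcode, other.postcode);
    }

    @Override public int hashCode() { return Objects.hash(street, postcode); }
}
```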

10.3.  Mapping Entities with Inheritance

Again there are a number of ways Hibernate can handle inheritance. These are based on the standard techniques for reducing generalisation hierarchies in entity-relationship diagrams [BCN91].

The simplest is to use one table for the whole hierarchy. With this design, each row of the table can hold an object of any type from the hierarchy. There is one column for each of the properties in the union of the sets of properties of all the classes in the hierarchy and there is one discriminator column which contains a value (usually of type string, character or integer) used to tell which actual type of object is stored in this particular row. One normally does not make this discriminator a property of the class: it is used only by Hibernate to record and detect the type of the object that a row represents.

[Figure: An Inheritance Class Hierarchy]

<class name="Person" table="people" discriminator-value="P">
    <id name="id" column="id" type="long">
        <generator class="sequence"/>
    </id>
    <discriminator column="subclass" type="character"/>
    <version  name="version"         column="version"/>
    <property name="dateOfBirth"     column="dob" type="date"/>
    <property name="name"/>
    <property name="gender"/>
    <subclass name="Lecturer" discriminator-value="L">
        <property name="office" type="string"/>
        <property name="telephone" type="string"/>
    </subclass>
    <subclass name="Student" discriminator-value="D">
        <property name="studentID" type="integer"/>
    </subclass>
</class>

With this strategy you cannot declare any subclass property as not-null, because the corresponding column is null in every row that holds an object of a class in the hierarchy that does not have that property.
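The Java side of this mapping is an ordinary class hierarchy. The instantiate method below is only an illustration of the discriminator's role, not Hibernate code: it mimics how the value in the subclass column ('P', 'L' or 'D' in the mapping above) selects the class to create when a row is read. Field types follow the mapping where it gives them and are otherwise assumptions; DiscriminatorDemo is a hypothetical name.

```java
import java.util.Date;

class Person {
    private Long id;
    private int version;
    private Date dateOfBirth;
    private String name;
    private String gender;

    Person() {}
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class Lecturer extends Person {
    private String office;     // column is null in rows holding Persons or Students
    private String telephone;

    Lecturer() {}
    public String getOffice() { return office; }
    public void setOffice(String office) { this.office = office; }
}

class Student extends Person {
    private Integer studentID;  // column is null in rows holding Persons or Lecturers

    Student() {}
    public Integer getStudentID() { return studentID; }
    public void setStudentID(Integer studentID) { this.studentID = studentID; }
}

class DiscriminatorDemo {
    // Illustration only: maps a discriminator column value to the class to instantiate.
    static Person instantiate(char discriminator) {
        switch (discriminator) {
            case 'P': return new Person();
            case 'L': return new Lecturer();
            case 'D': return new Student();
            default: throw new IllegalArgumentException("unknown discriminator: " + discriminator);
        }
    }
}
```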

10.4.  Many-to-One, Unidirectional Associations

This corresponds to the standard Java reference to one object from another.

[Figure: A Many-to-One, Unidirectional Association]

In the diagram above, we represent the relationship between students and their thesis supervisors. In this design, a student can have at most one supervisor but may not (yet) have any, while a lecturer may supervise any number of students, including zero. Furthermore, we only allow a one-directional link: Student has a property (say getSupervisor/setSupervisor), but there is no direct way, starting from a Lecturer object, to find the students that the lecturer supervises.

If we start with the simple, non-related, base entity mapping files for Student and Lecturer, we add this association by adding the following, as a sub-element of the class element, to the mapping file for Student:

<many-to-one name="supervisor" column="supervisor"/>

This element acts very much like a normal property element in that it defines the mapping between the supervisor property of Student and the supervisor column in the students table. However, it also sets up the relationship so that, after getting a Student object from the database, calling its supervisor accessor returns the corresponding Lecturer object (or a proxy for it, if lazy loading of Lecturer objects is enabled). Finally, when Hibernate generates the database schema, it declares the supervisor column as a foreign key into the lecturers table.

As things stand, there is now the question of what cascade behaviour you want for the relationship (see the section on cascading above). Without the optional cascade attribute on the many-to-one element, the Lecturer object at the other end of the association is ignored when the Student object is saved, updated or deleted, or when its supervisor property is changed to refer elsewhere. Certainly we would not want the lecturer to be removed from the database when the student is deleted or no longer has that lecturer as supervisor, so none of the delete or all options is appropriate. But what about save-update? There are two scenarios in which it might have an effect:

  1. You create a new (transient) lecturer and make a persistent student refer to it. For this particular object design, one would never do this: the obvious semantics of the situation dictate that you cannot just invent new lecturers on demand, so the lecturer would always already exist in the database before a student's supervisor property was set to it. Since this scenario never arises, it is a vote neither for nor against the save-update option.

  2. The Student and associated Lecturer objects were detached, and you now reattach the Student object. You need the save-update option if you want the Lecturer object to be reattached automatically. Without it, you have to reattach the Lecturer yourself, an easy task to overlook and therefore a source of bugs. This, therefore, is a vote for setting cascade="save-update".

Note that you can specify unique="true" as an attribute of the many-to-one element. This has the effect of disallowing the possibility of having two student rows with the same supervisor values, i.e., turning the "*" on the Student side of the class diagram into a "0..1" or limiting each lecturer to having at most one supervisee. Similarly, specifying not-null="true" adds the requirement that every student must have a valid supervisor, i.e., it changes the "0..1" on the Lecturer side of the diagram to a "1".
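To make the effect of these two attributes concrete, the following plain-Java check lists the rows that the corresponding database constraints would reject. It is only an in-memory illustration with minimal, hypothetical Student, Lecturer and SupervisorConstraints classes; in a real application the database enforces the constraints (Hibernate adds them to the schema it generates from the mapping). Like an SQL unique constraint, the check lets several students have a null supervisor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Lecturer {
    final String name;
    Lecturer(String name) { this.name = name; }
}

class Student {
    final String name;
    Lecturer supervisor;  // the many-to-one reference; Lecturer has no link back
    Student(String name, Lecturer supervisor) { this.name = name; this.supervisor = supervisor; }
}

class SupervisorConstraints {
    // Violations that unique="true" and/or not-null="true" on the
    // many-to-one element would cause the database to reject.
    static List<String> violations(List<Student> students, boolean unique, boolean notNull) {
        List<String> problems = new ArrayList<>();
        Set<Lecturer> seen = new HashSet<>();  // identity-based: Lecturer does not override equals
        for (Student s : students) {
            if (s.supervisor == null) {
                if (notNull) problems.add(s.name + " has no supervisor");
            } else if (unique && !seen.add(s.supervisor)) {
                problems.add(s.name + " shares a supervisor with another student");
            }
        }
        return problems;
    }
}
```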

10.5.  One-to-Many, Unidirectional Associations

This relationship is essentially the same as the last one, but now we navigate the connection in the opposite direction. Our Lecturer object has a property which is a collection of Student objects, while the Student objects have no property referring to their supervising Lecturer.

[Warning]Warning

For reasons described below, we would (almost) never use such an association: it is inefficient, and it takes almost no extra work to convert it into a much more efficient bidirectional association. Nonetheless, it is useful to discuss this case as a first step towards the bidirectional version of the association.

[Figure: A One-to-Many, Unidirectional Association]

As far as the database is concerned, there is no difference between this and the unidirectional many-to-one association: there will still be a single column in the table holding the Student objects that contains a foreign key into the table holding the Lecturer objects.

Now that entities are being stored in collections, it becomes critical that you implement equals and hashCode appropriately for those entities. In particular, these methods should not depend on the generated surrogate keys (which are assigned only when an object is first saved), and trivial changes to an object must not affect them while the object is in a collection.
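A common way to meet these requirements is to base equals and hashCode on a business key rather than on the generated id. The sketch below assumes that a student's registration number (the regNo property that appears in the Student mapping in Section 10.6) is unique and never changes once set; if your classes have no such key, a different choice is needed. With an id-based hashCode, by contrast, a new Student added to a HashSet could no longer be found there once save() assigned it an id, because its hash code would change.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Student {
    private Long id;       // surrogate key: assigned by Hibernate on save, not used in equals
    private String regNo;  // business key: assumed unique and immutable
    private String name;

    Student() {}
    Student(String regNo, String name) { this.regNo = regNo; this.name = name; }

    public Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    public String getRegNo() { return regNo; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }  // does not affect equals/hashCode

    // instanceof and a getter (rather than getClass() and a field) so that
    // lazy-loading proxies, which are subclasses, still compare equal.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;
        return Objects.equals(regNo, ((Student) o).getRegNo());
    }

    @Override public int hashCode() { return Objects.hashCode(regNo); }

    public static void main(String[] args) {
        Set<Student> advisees = new HashSet<>();
        Student s = new Student("S123", "Tony Blair");
        advisees.add(s);
        s.setId(42L);                  // simulates the id assigned on save
        s.setName("Anthony Blair");    // a trivial change
        System.out.println(advisees.contains(s));  // true
    }
}
```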

The simplest collection for our purposes is a Set. To create the association, we add a Set-valued property to Lecturer and map it as follows:

<set name="advisees" cascade="save-update" lazy="true">
    <key column="lecturer_id"/>
    <one-to-many class="Student"/>
</set>

Here we define a Set-valued property of Lecturer named advisees. It captures a one-to-many association to the Student class, implemented in the database by the column lecturer_id in the table holding Student objects, which is a foreign key into the table holding Lecturer objects.

There are a number of constraints imposed by this one-to-many association, which arise from the fact that it is represented by a "reverse" link from the contained-object side of the association:

  1. From a Java point of view, we could have two different Lecturer objects, both of which have the same Student object in their collections. This is not possible for a one-to-many association because, in the database, each Student row refers to the single Lecturer row whose collection contains it. If you want the true Java semantics, you have to represent the association as a many-to-many one.

  2. You cannot have the same object multiple times in the same collection. This is obvious when the collection property is of type Set, but one could use other collection types, such as List, where the implementation of the association by a reverse foreign key makes it impossible. Again, a many-to-many association can provide the appropriate semantics.

Finally, there is the question of why unidirectional one-to-many associations between entities cause problems. Consider the following code:

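Transaction tx = session.beginTransaction();
Lecturer lect = new Lecturer("Gordon Brown");
lect.getAdvisees().add(new Student("Tony Blair"));
lect.getAdvisees().add(new Student("Michael Howard"));
session.save(lect);  // cascade="save-update" also saves both students
tx.commit();         // flush: INSERT lecturer, INSERT 2 students, then 2 UPDATEs of lecturer_id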

Note that the association belongs to the Lecturer class (as it is defined in Lecturer's mapping file). This means that adding a student to a lecturer's advisees is considered an operation on the lecturer, not on the student. Thus the SQL generated for the above code would include an insert for the lecturer object, together with an insert for each of the two connected student objects (because the students are referred to by the lecturer and we have put cascade="save-update" in Lecturer's mapping file). But because the association does not belong to the students, saving the students does not set their foreign key to the advising lecturer. Thus there would be two extra update statements to write the lecturer's foreign key value into the student records. These extra updates are not just an efficiency problem: if every student should have a supervisor, we would like to add the not-null="true" attribute to the key element in the mapping for the association. However, this would cause errors, because the above sequence of inserts and updates does insert nulls (if only to update them immediately) where they should never occur.

The solution is to only create such one-to-many associations as the inverse end of a bidirectional many-to-one association. This gives ownership of the association to the Student end and, as we see below, leads to the foreign key being created as part of the initial insert of the Student record instead of after it as a consequence of the Lecturer insert.

10.6.  Many-to-one, bidirectional Associations

In this case we allow navigation in both directions between the two classes. On the many side it is a standard Java reference; on the one side it is a collection. However, the two associations are not independent of each other: one is the inverse of the other, and the same foreign key column is used for both. Thus our Lecturer object has a property which is a collection of Student objects, while each Student object has a property which refers to the Student's supervising Lecturer.

[Figure: A Many-to-One, Bidirectional Association]

To achieve this, we use the many-to-one element as before in the mapping file for the Student class, and the set element as before in the mapping file for the Lecturer class, ensuring that both associations use the same column in Student's table to encode the association. Then we add a new attribute, inverse="true", to the set element in Lecturer's mapping file. Without it, adding a new Student as an advisee to a Lecturer would make Hibernate set the foreign key column of the Student table twice: once for each association that has been changed. The inverse attribute tells Hibernate that Student owns the association and that it should not update the foreign key column when the association changes on the Lecturer side.

Thus the mapping file for Lecturer looks like this:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
          "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class name="Lecturer" table="lecturers">
        <id name="id" column="lecturer_id">
            <generator class="sequence"/>
        </id>
        <version name="version" column="version"/>
        <property name="name" column="name"/>
        <set name="advisees" inverse="true" cascade="save-update" lazy="true">
            <key column="lecturer_id"/>
            <one-to-many class="Student"/>
        </set>
    </class>
</hibernate-mapping>

The mapping file for Student is as follows:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
          "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class name="Student" table="students">
        <id name="id" column="student_id">
            <generator class="sequence"/>
        </id>
        <version name="version" column="version"/> 
        <property name="name" column="name"/>
        <property name="regNo" column="reg_no"/>
        <many-to-one name="advisor" column="lecturer_id" cascade="save-update"/>
    </class>
</hibernate-mapping>

Now all the programmer has to do is to ensure that, whenever a Lecturer's advisees property is changed, the corresponding change is made to the appropriate Student's advisor property. So long as both are done together, the Java object graph will be correct and the correct update will be made in the database as well. Furthermore, since the association belongs to Student, a Student record that has an advisor is never inserted with a null Lecturer foreign key, so a not-null constraint is never violated. To ensure that these updates are made together, it is usual to add some convenience methods: in Lecturer, make the getAdvisees() and setAdvisees() methods private and add a convenience method that updates the object graph correctly when a new Student advisee is added to a Lecturer:

    public void addAdvisee(Student st)
    {
        Lecturer oldAdvisor = st.getAdvisor() ;
        if (oldAdvisor != this)
        {
            if (oldAdvisor != null)
                oldAdvisor.getAdvisees().remove(st) ;
            st.setAdvisor(this);
            advisees.add(st) ;
        }
    }

Note how we are careful to correctly handle removal of a Student from a previous advising Lecturer before adding it to this one. Whether you need to do something similar for your code will depend on your detailed design.
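Put together, the two classes might look like the following plain-Java sketch, which compiles without any Hibernate types. The removeAdvisee method and the read-only viewAdvisees accessor are hypothetical additions in the same spirit as addAdvisee, and the name properties and constructors are assumptions. Student keeps the default identity-based equals here to stay short; in real code you would give it a business-key equals and hashCode as discussed in Section 10.5.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Lecturer {
    private String name;
    private Set<Student> advisees = new HashSet<>();

    Lecturer() {}
    Lecturer(String name) { this.name = name; }

    // Private accessors: Hibernate can still use them for the inverse set.
    private Set<Student> getAdvisees() { return advisees; }
    private void setAdvisees(Set<Student> advisees) { this.advisees = advisees; }

    // Read-only view for application code, so changes go through the helpers below.
    public Set<Student> viewAdvisees() { return Collections.unmodifiableSet(advisees); }

    public void addAdvisee(Student st) {
        Lecturer oldAdvisor = st.getAdvisor();
        if (oldAdvisor != this) {
            if (oldAdvisor != null)
                oldAdvisor.getAdvisees().remove(st);  // private access is allowed within Lecturer
            st.setAdvisor(this);
            advisees.add(st);
        }
    }

    public void removeAdvisee(Student st) {
        if (advisees.remove(st))
            st.setAdvisor(null);  // Student owns the foreign key, so this is the change that gets written
    }
}

class Student {
    private String name;
    private Lecturer advisor;

    Student() {}
    Student(String name) { this.name = name; }

    public Lecturer getAdvisor() { return advisor; }
    public void setAdvisor(Lecturer advisor) { this.advisor = advisor; }
}
```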

If we have a true composition relationship, i.e., a parent-child relationship where if the parent gets deleted then the child should also be deleted etc., then we should change the cascade attribute on the set element in the Lecturer mapping file to be all-delete-orphan.

11. Patterns

12. Going Further

There is still plenty more to learn about Hibernate. There are one-to-one and many-to-many associations, value (as opposed to entity) collections, outer-join and batch fetching, Iterate queries, Criteria queries and the whole of the HQL query language, not to mention explicit SQL queries. There are alternative strategies for inheritance hierarchy mapping and polymorphism handling. There are user-defined data types and mappings, Interceptors, caching and all the Hibernate-related tools. All of this and more is discussed on the Hibernate web site and in [BK05].

References

[BCN91] Carlo Batini, Stefano Ceri, and Shamkant Navathe. Conceptual Database Design. Benjamin Cummings, 1991. ISBN 0805302441.

[BK05] Christian Bauer and Gavin King. Hibernate in Action. Manning Publications Co., 2005. ISBN 193239415X.