Using Coherence and Hibernate

 

Hibernate and Coherence can be used together in several combinations. This document discusses the various options, including when each one is appropriate, along with usage instructions. These options include using Coherence as a Hibernate plug-in, using Hibernate as a Coherence plug-in via the CacheStore interface, and bulk-loading Coherence caches from a Hibernate query. Most applications that use Coherence and Hibernate use a mixture of these approaches. The Hibernate API provides powerful management of entities and relationships, while the Coherence API delivers maximum performance and scalability.

Conventions

This document refers to the following Java classes and interfaces:

  • com.tangosol.coherence.hibernate.CoherenceCache
  • com.tangosol.coherence.hibernate.CoherenceCacheProvider
  • com.tangosol.coherence.hibernate.HibernateCacheLoader
  • com.tangosol.coherence.hibernate.HibernateCacheStore
  • com.tangosol.net.NamedCache (extends java.util.Map)
  • com.tangosol.net.cache.CacheLoader
  • com.tangosol.net.cache.CacheStore
  • org.hibernate.Query
  • org.hibernate.Session
  • org.hibernate.SessionFactory

As the CacheStore interface extends CacheLoader, the term "CacheStore" will be used generically to refer to both interfaces (the appropriate interface being determined by whether read-only or read-write support is required). Similarly, "HibernateCacheStore" will refer to both implementations.
The Coherence cache configuration file is referred to as coherence-cache-config.xml (the default name) and the Hibernate root configuration file is referred to as hibernate.cfg.xml (the default name).

Selecting a Caching Strategy

Overview

Generally, the Hibernate API is the optimal choice for accessing data held in a relational database where performance is not the dominant factor. For application state (or any type of data that fits naturally into the Map interface) use the Coherence API. For performance-sensitive operations, specifically those that may benefit from Coherence-specific features like write-behind caching or cache queries, use the Coherence API.

Hibernate API

The Hibernate API provides flexible queries and relational management features including referential integrity, cascading deletes and child object fetching. While these features may be implemented using Coherence, this involves development effort which may not be worthwhile in cases where performance is not an issue.

Coherence NamedCache API

There are many Coherence features that require direct access to the Coherence NamedCache API, including:

  • Write-Behind Caching (low-latency, high-throughput database updates)
  • Distributed Queries (low-latency, high-throughput search queries)
  • Cache Transactions (application-tier transactions)
  • InvocableMap (stored procedures, aggregations)
  • Invocation Service (messaging and remote invocation)
  • Cache Listeners (event-based processing)

Direct access to these features may be critical for achieving the highest levels of scalable performance.

Coherence CacheStore Integration

CacheStore modules are useful for transparently keeping cache and database synchronized. They are also more efficient than independently updating the cache and database as updates are routed through Coherence's partitioning facilities, minimizing locking.
CacheStore modules deliver very high performance for caching data that can be expressed via the Map interface, that is, as key-value pairs. The NamedCache interface is a much simpler, and by extension much lower-overhead, API than the Hibernate query API. Additionally, in some cases (where complex queries can be mapped onto a key-based access pattern), very complex queries can be answered by a simple cache retrieval.
One final reason for using CacheStore is that it provides a means of coordinating all database (or other backend) access through a single API (NamedCache) and through a controlled set of JVMs (server machines). This is because the nodes which are responsible for managing cache partitions are the same machines responsible for synchronizing with the database server.

Using Coherence as the Hibernate L2 Cache

Introduction

Hibernate supports three primary forms of caching:

  • Session cache
  • L2 cache
  • Query cache

The Session cache is responsible for caching records within a Session (a Hibernate transaction, potentially spanning multiple database transactions, and typically scoped on a per-thread basis). As a non-clustered cache (by definition), the Session cache is managed entirely by Hibernate. The L2 and Query caches span multiple transactions, and support the use of Coherence as a cache provider. The L2 cache is responsible for caching records across multiple sessions (for primary key lookups). The query cache caches the result sets generated by Hibernate queries. Hibernate manages data in an internal representation in the L2 and Query caches, meaning that these caches are usable only by Hibernate. For more details, see the Hibernate Reference Documentation (shipped with Hibernate), specifically the section on the Second Level Cache.

Configuration and Tuning

To use the Coherence Caching Provider for Hibernate, specify the Coherence provider class in the "hibernate.cache.provider_class" property. Typically this is configured in the default Hibernate configuration file, hibernate.cfg.xml.

<property name="hibernate.cache.provider_class">com.tangosol.coherence.hibernate.CoherenceCacheProvider</property> 

The file coherence-hibernate.jar (found in the lib/ subdirectory) must be added to the application classpath.
Hibernate provides the configuration property hibernate.cache.use_minimal_puts, which optimizes cache access for clustered caches by increasing cache reads and decreasing cache updates. This is enabled by default by the Coherence Cache Provider. Setting this property to false may increase overhead for cache management and also increase the number of transaction rollbacks.
The Coherence Caching Provider includes a setting for how long a lock acquisition should be attempted before timing out. This may be specified by the Java property tangosol.coherence.hibernate.lockattemptmillis. The default is one minute.
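Taken together, the Hibernate-side settings above might appear in hibernate.cfg.xml as in the following illustrative fragment (the lock timeout is a JVM system property rather than a Hibernate property, and would instead be passed on the command line, e.g. -Dtangosol.coherence.hibernate.lockattemptmillis=30000):

```xml
<hibernate-configuration>
  <session-factory>
    <!-- Use Coherence as the L2 cache provider -->
    <property name="hibernate.cache.provider_class">com.tangosol.coherence.hibernate.CoherenceCacheProvider</property>
    <!-- Enabled by default by the Coherence Cache Provider; shown here for clarity -->
    <property name="hibernate.cache.use_minimal_puts">true</property>
  </session-factory>
</hibernate-configuration>
```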

Specifying a Coherence Cache Topology

By default, the Coherence Caching Provider uses a custom cache configuration located in coherence-hibernate.jar named config/hibernate-cache-config.xml to define cache mappings for Hibernate L2 caches. If desired, an alternative cache configuration resource may be specified for Hibernate L2 caches via the tangosol.coherence.hibernate.cacheconfig Java property. It is possible to configure this property to point to the application's main coherence-cache-config.xml file if mappings are properly configured. It may be beneficial to use dedicated cache service(s) to manage Hibernate-specific caches to ensure that any CacheStore modules don't cause re-entrant calls back into Coherence-managed Hibernate L2 caches.
In conjunction with the scheme mapping section of the Coherence cache configuration file, the hibernate.cache.region_prefix property may be used to specify a cache topology. For example, if the cache configuration file includes a wildcard mapping for "near-*", and the Hibernate region prefix property is set to "near-", then all Hibernate caches will be named using the "near-" prefix, and will use the cache scheme mapping specified for the "near-*" cache name pattern.
It is possible to specify a cache topology per entity by creating a cache mapping based on the combined prefix and qualified entity name (e.g. "near-com.company.EntityName"); or equivalently, by providing an empty prefix and specifying a cache mapping for each qualified entity name.
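For example, the wildcard and per-entity mappings described above might appear in the cache configuration file as follows (the scheme names are illustrative, and the referenced near-scheme definitions are assumed to exist in the <caching-schemes> section of the same file):

```xml
<caching-scheme-mapping>
  <!-- A per-entity topology, based on the combined prefix and qualified entity name -->
  <cache-mapping>
    <cache-name>near-com.company.EntityName</cache-name>
    <scheme-name>entity-near</scheme-name>
  </cache-mapping>
  <!-- All other Hibernate caches created with region prefix "near-" match this mapping -->
  <cache-mapping>
    <cache-name>near-*</cache-name>
    <scheme-name>hibernate-near</scheme-name>
  </cache-mapping>
</caching-scheme-mapping>
```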
Also, L2 caches should be size-limited to avoid excessive memory usage. Query caches in particular must be size-limited as the Hibernate API does not provide any means of controlling the query cache other than a complete eviction.

Cache Concurrency Strategies

Hibernate generally emphasizes the use of optimistic concurrency for both cache and database. With optimistic concurrency in particular, transaction processing depends on having accurate data available to the application at the beginning of the transaction. If the data is inaccurate, the commit processing will detect that the transaction was dependent on incorrect data, and the transaction will fail to commit. While most optimistic transactions must cope with changes to underlying data by other processes, the use of caching adds the possibility of the cache itself being stale. Hibernate provides a number of cache concurrency strategies to control updates to the L2 cache. While this is less of an issue for Coherence due to support for cluster-wide coherent caches, appropriate selection of cache concurrency strategy will aid application efficiency.
Note that cache concurrency strategies may be specified at the entity (table) level. Generally, the strategy should be specified in the mapping file for the class.
For mixed read-write activity, the read-write strategy is recommended. The transactional strategy is implemented similarly to the nonstrict-read-write strategy, and relies on the optimistic concurrency features of Hibernate. Note that nonstrict-read-write may deliver better performance if its impact on optimistic concurrency is acceptable.
For read-only caching, use the nonstrict-read-write strategy if the underlying database data may change, but slightly stale data is acceptable. If the underlying database data never changes, use the read-only strategy.
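A cache concurrency strategy is typically declared in the entity's Hibernate mapping file. A minimal illustrative sketch (the class, table, and property names are hypothetical):

```xml
<hibernate-mapping>
  <class name="com.company.TableA" table="TABLE_A">
    <!-- Recommended strategy for mixed read-write activity -->
    <cache usage="read-write"/>
    <id name="id" column="ID"/>
    <property name="name" column="NAME"/>
  </class>
</hibernate-mapping>
```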

Query Cache

To cache query results, set the hibernate.cache.use_query_cache property to "true". Then, whenever issuing a cacheable query, call Query.setCacheable(true) to enable caching of the query results. As org.hibernate.cache.QueryKey instances in Hibernate may not be binary-comparable (due to non-deterministic serialization of unordered data members), use a size-limited Local or Replicated cache to store query results (which forces the use of hashCode()/equals() to compare keys). The default query cache name is "org.hibernate.cache.StandardQueryCache" (unless a default region prefix is provided, in which case "[prefix]." will be prepended to the cache name). Use the cache configuration file to map this cache name to a Local/Replicated topology, or explicitly provide an appropriately-mapped region name when querying.
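In application code, issuing a cacheable query might look like the following sketch (the entity name, cache region name, and query are illustrative; it assumes a SessionFactory configured with the Coherence cache provider and hibernate.cache.use_query_cache enabled):

```java
import org.hibernate.Query;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class QueryCacheExample
    {
    // A sketch of a cacheable Hibernate query; the region name should be
    // mapped to a Local/Replicated topology in coherence-cache-config.xml
    public static java.util.List findRecent(SessionFactory factory)
        {
        Session session = factory.openSession();
        try
            {
            Query query = session.createQuery(
                "from com.company.TableA a where a.date > :cutoff");
            query.setParameter("cutoff",
                new java.util.Date(System.currentTimeMillis() - 86400000L));
            query.setCacheable(true);                  // enable query result caching
            query.setCacheRegion("local-QueryCache");  // explicit, appropriately-mapped region
            return query.list();
            }
        finally
            {
            session.close();
            }
        }
    }
```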

Fault-Tolerance

The Hibernate L2 cache protocol supports full fault-tolerance during client or server failure. With the read-write cache concurrency strategy, Hibernate will lock items out of the cache at the start of an update transaction, meaning that client-side failures will simply result in uncached entities and an uncommitted transaction. Server-side failures are handled transparently by Coherence (dependent on the specified data backup count).

Deployment

When used with application servers that do not have a unified class loader, the Coherence Cache Provider must be deployed as part of the application so that it can use the application-specific class loader (required to serialize and deserialize application objects).

Using the Coherence HibernateCacheStore

Overview

Coherence includes a default entity-based CacheStore implementation, HibernateCacheStore (and a corresponding CacheLoader implementation, HibernateCacheLoader). More detailed technical information may be found in the JavaDoc for the implementing classes.

Configuration

The examples below show the simple HibernateCacheStore constructor, which accepts only an entity name. This configures Hibernate using the default configuration path, which looks for a hibernate.cfg.xml file in the classpath. A resource name or file specification for the hibernate.cfg.xml file may also be passed in as a second <init-param> (set the <param-type> element to java.lang.String for a resource name or java.io.File for a file specification). See the class JavaDoc for more details.
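For example, the <init-params> section within the <class-scheme> element could be extended with a second parameter naming an alternate Hibernate configuration resource (the resource name here is illustrative):

```xml
<init-params>
  <init-param>
    <param-type>java.lang.String</param-type>
    <param-value>{entityname}</param-value>
  </init-param>
  <!-- Second constructor parameter: resource name of the Hibernate configuration -->
  <init-param>
    <param-type>java.lang.String</param-type>
    <param-value>cachestore-hibernate.cfg.xml</param-value>
  </init-param>
</init-params>
```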
The following is a simple coherence-cache-config.xml file used to define a NamedCache called "TableA" which caches instances of a Hibernate entity (com.company.TableA). To add additional entity caches, add additional <cache-mapping> elements.

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>TableA</cache-name>
      <scheme-name>distributed-hibernate</scheme-name>
      <init-params>
        <init-param>
          <param-name>entityname</param-name>
          <param-value>com.company.TableA</param-value>
        </init-param>
      </init-params>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>distributed-hibernate</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>com.tangosol.coherence.hibernate.HibernateCacheStore</class-name>
              <init-params>
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <param-value>{entityname}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
    </distributed-scheme>
  </caching-schemes>
</cache-config>

It is also possible to use the pre-defined {cache-name} macro to eliminate the need for the <init-params> portion of the cache mapping:

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>TableA</cache-name>
      <scheme-name>distributed-hibernate</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>distributed-hibernate</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>com.tangosol.coherence.hibernate.HibernateCacheStore</class-name>
              <init-params>
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <param-value>com.company.{cache-name}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
    </distributed-scheme>
  </caching-schemes>
</cache-config>

And, if naming conventions allow, the mapping may be completely generalized to allow a cache mapping for any qualified class name (entity name):

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>com.company.*</cache-name>
      <scheme-name>distributed-hibernate</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>distributed-hibernate</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>com.tangosol.coherence.hibernate.HibernateCacheStore</class-name>
              <init-params>
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <param-value>{cache-name}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
    </distributed-scheme>
  </caching-schemes>
</cache-config>

Configuration Requirements

Hibernate entities accessed via the HibernateCacheStore module must use the "assigned" ID generator and also have a defined ID property.
Be sure to disable the "hibernate.hbm2ddl.auto" property in the hibernate.cfg.xml used by the HibernateCacheStore, as this may cause excessive schema updates (and possible lockups).

JDBC Isolation Level

In cases where all access to a database is through Coherence, CacheStore modules will naturally enforce ANSI-style Repeatable Read isolation as reads and writes are executed serially on a per-key basis (via the Partitioned Cache Service). Increasing database isolation above Repeatable Read will not yield increased isolation as CacheStore operations may span multiple Partitioned Cache nodes (and thus multiple database transactions). Using database isolation levels below Repeatable Read will not result in unexpected anomalies, and may reduce processing load on the database server.

Fault-Tolerance

For single-cache-entry updates, CacheStore operations are fully fault-tolerant in that the cache and database are guaranteed to be consistent during any server failure (including failures during partial updates). While the mechanisms for fault-tolerance vary, this is true for both write-through and write-behind caches.
Coherence does not support two-phase CacheStore operations across multiple CacheStore instances. In other words, if two cache entries are updated, triggering calls to CacheStore modules sitting on separate servers, it is possible for one database update to succeed and for the other to fail. In this case, it may be preferable to use a cache-aside architecture (updating the cache and database as two separate components of a single transaction) in conjunction with the application server transaction manager. In many cases it is possible to design the database schema to prevent logical commit failures (but obviously not server failures). Write-behind caching avoids this issue as "puts" are not affected by database behavior (and the underlying issues will have been addressed earlier in the design process).

Extending HibernateCacheStore

In some cases, it may be desirable to extend HibernateCacheStore with application-specific functionality. The most obvious reason for this is to leverage a pre-existing, programmatically-configured SessionFactory instance.
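A minimal sketch of such an extension is shown below. It assumes the base class exposes a protected setSessionFactory method, and the SessionFactoryRegistry class is a hypothetical application-level holder for the pre-built factory; consult the HibernateCacheStore JavaDoc for the actual extension points.

```java
import com.tangosol.coherence.hibernate.HibernateCacheStore;

// A sketch of an application-specific CacheStore that reuses a
// programmatically-configured SessionFactory rather than building one
// from hibernate.cfg.xml
public class ApplicationCacheStore
        extends HibernateCacheStore
    {
    public ApplicationCacheStore(String sEntityName)
        {
        super(sEntityName);

        // Replace the default SessionFactory with the application's shared
        // instance (SessionFactoryRegistry is a hypothetical application class)
        setSessionFactory(SessionFactoryRegistry.getInstance().getSessionFactory());
        }
    }
```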

Creating a Hibernate CacheStore

Introduction

While the provided HibernateCacheStore module provides a solution for most entity-based caches, there may be cases where an application-specific CacheStore module is necessary, for example, to provide parameterized queries, or to pre- or post-process query results.

Re-entrant Calls

In a CacheStore-backed cache, when an application thread accesses cached data, the cache operations may trigger a call to the associated CacheStore implementation via the managing CacheService. It is important that the CacheStore implementation does not call back into the hosting cache service. In addition to avoiding calls to NamedCache methods, this implies that Hibernate itself must not attempt to access Coherence-managed cache data. Therefore, every CacheLoader/CacheStore method that uses a Session should call Session.setCacheMode(CacheMode.IGNORE) to disable cache access. Alternatively, the Hibernate configuration may be cloned (either programmatically or via a separate hibernate.cfg.xml), with CacheStore implementations using the version with caching disabled.
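A load method in a custom CacheLoader-style class might apply this as in the following sketch (the entity name and key handling are illustrative):

```java
import org.hibernate.CacheMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

// A sketch of a CacheLoader-style load method that prevents re-entrant
// cache access by forcing Hibernate to bypass its second-level cache
public class OrderCacheLoaderSketch
    {
    private final SessionFactory factory;

    public OrderCacheLoaderSketch(SessionFactory factory)
        {
        this.factory = factory;
        }

    public Object load(Object oKey)
        {
        Session session = factory.openSession();
        try
            {
            // Disable L2/query cache access for this session to avoid
            // re-entrant calls into the hosting cache service
            session.setCacheMode(CacheMode.IGNORE);
            return session.get("com.company.Order", (java.io.Serializable) oKey);
            }
        finally
            {
            session.close();
            }
        }
    }
```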

Fully Cached DataSets

Distributed Queries

Distributed queries offer the potential for lower latency, higher throughput and less database server load compared to executing queries on the database server. For set-oriented queries, the dataset must be entirely cached to produce correct query results. More precisely, for a query issued against the cache to produce correct results, the query must not depend on any uncached data.
Note that hybrid caches are possible. For example, two uses of a NamedCache may be combined: a fully cached, size-limited dataset for querying (e.g. the data for the most recent week), and a partially cached historical dataset used for singleton reads. This approach avoids data duplication and minimizes memory usage.
While fully cached datasets are usually bulk-loaded during application startup (or on a periodic basis), CacheStore integration may be used to ensure that both cache and database are kept fully synchronized.
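Bulk-loading a fully cached dataset from a Hibernate query during application startup might look like the following sketch (the entity name, cache name, and HQL are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

import org.hibernate.CacheMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// A sketch of bulk-loading a Coherence cache from a Hibernate query,
// e.g. during application startup
public class BulkLoadSketch
    {
    public static void loadOrders(SessionFactory factory)
        {
        NamedCache cache = CacheFactory.getCache("com.company.Order");
        Session session = factory.openSession();
        try
            {
            // Bypass the L2/query cache while loading
            session.setCacheMode(CacheMode.IGNORE);

            // Select each entity along with its identifier so the cache key
            // can be built without referencing the entity class directly
            Map mapBuffer = new HashMap();
            for (Iterator iter = session.createQuery(
                    "select o.id, o from com.company.Order o").list().iterator();
                 iter.hasNext(); )
                {
                Object[] aoRow = (Object[]) iter.next();
                mapBuffer.put(aoRow[0], aoRow[1]);
                }

            // A single putAll is far more efficient than individual puts
            cache.putAll(mapBuffer);
            }
        finally
            {
            session.close();
            }
        }
    }
```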

Detached Processing

Another reason for using fully-cached datasets is to provide the ability to continue application processing even if the underlying database goes down. Using write-behind caching extends this mode of operation to support full read-write applications. With write-behind, the cache becomes (in effect) the temporary system of record. Should the database fail, updates will be queued in Coherence until the connection is restored, at which point all cache changes will be sent to the database.