Sunday 11 October 2015

Vastly improved index build time on start-up of a Gemfire-based application

Here, I’d like to discuss a specific scenario in which a Gemfire server can be slow to start because of the following conditions in an application:
  • Building many indexes across a large number of objects.
  • Building indexes on a value that takes some time (per object) to derive, e.g. a value derived from a JSON structure.

One way to improve index build time, discussed below, is to build the indexes in parallel programmatically. In my tests this cut start-up time from ~45 minutes to ~8 minutes.

I hope this helps anyone facing a similar situation/constraint.

My Application

  • It runs two clusters across 6 servers. Each server runs a JVM with a 150g heap and replicated regions.
  • The objects stored are NOT simple Java beans but an encapsulation of a data structure that lets us store and retrieve data in a generic manner, irrespective of its structure. This also means that storing a new type / structure / class of data needs few or no changes.
  • It has ~40g of compacted pdx-serialized data in the disk persistence store. (This is generally a fair approximation of the amount of data held in memory.)
  • It has ~40 indexes across 6 regions. Two of our biggest regions taken together contain ~4.6 million entries.

The Problem (time taken to build indexes on start-up)

  • When a Gemfire server starts, it locks the entire cluster for modification, until the start-up is complete. This includes building all the indexes on data.
  • All indexes need to be built on start-up, before any queries start hitting the server.
  • The server has a start-up time of ~45 mins. If a server needs a restart we have to wait for a low-activity period, otherwise the cluster gets locked for any modification.

To find a solution, I tested different strategies for building indexes.

Different Strategies to build indexes:

  • Approach 1 :: Gemfire simple indexing (the current strategy)


Simple indexing is the standard way of building indexes on objects. Gemfire picks up the index definitions from its configuration file and builds them serially. For each index, it iterates over all the objects in the region.

Outcome: The server takes ~45 mins to start. The CPU utilization remains low, ~5% on average.

I then decided to try defined indexing, which is available in Gemfire 7+. It is a programmatic way to build indexes: we define all the indexes to be built for all the regions using the Gemfire API, and Gemfire then iterates through each region once and builds all of that region's indexes.

  • Approach 2 :: Gemfire 7+ defined indexing


I used the Java API approach described above.

Outcome: This time the server took ~35 mins to start. The CPU utilization increased to an average of ~9%.

Although each region's objects are iterated only once to build all of its indexes, indexing is still serial across regions: the indexes for a region start building only after all the indexes for the previous region have been built.
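To make the single-pass idea concrete, here is a toy model in plain Java. It does not use any Gemfire classes: the region is just a Map, and `ValueExtractor` and `buildAllInOnePass` are names made up for this sketch, so treat it as an illustration of the idea rather than a description of how Gemfire implements it. With the real API, the corresponding steps should be one `QueryService.defineIndex(...)` call per index followed by a single `QueryService.createDefinedIndexes()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a "region" is a plain Map and an "index" maps a derived value to entry keys. */
final class SinglePassIndexing {

    /** Hypothetical stand-in for an indexed expression. */
    interface ValueExtractor<V> {
        Object extract(V value);
    }

    /** Builds every defined index for one region in a single pass over its entries. */
    static <K, V> Map<String, Map<Object, List<K>>> buildAllInOnePass(
            Map<K, V> region, Map<String, ValueExtractor<V>> definitions) {
        Map<String, Map<Object, List<K>>> indexes =
                new LinkedHashMap<String, Map<Object, List<K>>>();
        for (String name : definitions.keySet()) {
            indexes.put(name, new HashMap<Object, List<K>>());
        }
        // The region is iterated once, however many indexes are defined on it.
        for (Map.Entry<K, V> entry : region.entrySet()) {
            for (Map.Entry<String, ValueExtractor<V>> def : definitions.entrySet()) {
                Object indexedValue = def.getValue().extract(entry.getValue());
                Map<Object, List<K>> index = indexes.get(def.getKey());
                List<K> keys = index.get(indexedValue);
                if (keys == null) {
                    keys = new ArrayList<K>();
                    index.put(indexedValue, keys);
                }
                keys.add(entry.getKey());
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        Map<Integer, String> region = new LinkedHashMap<Integer, String>();
        region.put(1, "alpha");
        region.put(2, "beta");
        region.put(3, "avocado");

        Map<String, ValueExtractor<String>> definitions =
                new LinkedHashMap<String, ValueExtractor<String>>();
        definitions.put("byFirstLetter", new ValueExtractor<String>() {
            @Override
            public Object extract(String value) {
                return value.charAt(0);
            }
        });
        definitions.put("byLength", new ValueExtractor<String>() {
            @Override
            public Object extract(String value) {
                return value.length();
            }
        });

        System.out.println(buildAllInOnePass(region, definitions));
    }
}
```

Because the outer loop over the region runs only once, any per-object cost of reading an entry is paid once per object rather than once per index. The regions themselves would still be processed one after another, which is the limitation described above.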

I wanted an even faster start-up time so that a server could be restarted during business hours. At this time, Gemfire has no plans to build all indexes in parallel, even in a future release, so I decided to write some custom code around it.

  • Approach 3 :: Custom concurrent indexes

The approach I took is very simple. I created a cache listener on the last region that gets built (regions are built serially). On the “region created” event of the cache listener, I submitted a Runnable task for each index to be built to a Java thread pool. The indexes were now being built concurrently.

Outcome: This time the server took ~8 mins to start. This approach used more CPU: utilization averaged ~40% and peaked at 91% for a minute.
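A minimal sketch of the thread-pool part of this approach, in plain Java, might look like the following. It is not the code from the application and it does not touch Gemfire: `IndexTask` and `buildAll` are hypothetical names, and the Runnable passed to each task stands in for the real index creation (for example a call to `QueryService.createIndex(name, expression, regionPath)`). In a real server, something like `buildAll` would be triggered from the cache listener's `afterRegionCreate` callback. Sizing the pool to the smaller of the task count and the processor count is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch only: IndexTask and buildAll are hypothetical names, not Gemfire API. */
final class ConcurrentIndexBuilder {

    /** One unit of work: build a single index and report its name when done. */
    static final class IndexTask implements Callable<String> {
        private final String indexName;
        private final Runnable buildWork;

        IndexTask(String indexName, Runnable buildWork) {
            this.indexName = indexName;
            this.buildWork = buildWork;
        }

        @Override
        public String call() {
            // Placeholder for the real index creation on the server.
            buildWork.run();
            return indexName;
        }
    }

    /** Runs all tasks on a fixed thread pool and returns the names of the built indexes. */
    static List<String> buildAll(List<IndexTask> tasks) {
        if (tasks.isEmpty()) {
            return new ArrayList<String>();
        }
        int threads = Math.min(tasks.size(), Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> built = new ArrayList<String>();
            // invokeAll blocks until every task has finished.
            for (Future<String> result : pool.invokeAll(tasks)) {
                built.add(result.get()); // a failed task surfaces here as ExecutionException
            }
            return built;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while building indexes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Index build failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<IndexTask> tasks = new ArrayList<IndexTask>();
        for (final String name : Arrays.asList("idxA", "idxB", "idxC")) {
            tasks.add(new IndexTask(name, new Runnable() {
                @Override
                public void run() {
                    System.out.println(Thread.currentThread().getName() + " building " + name);
                }
            }));
        }
        System.out.println("Built: " + buildAll(tasks));
    }
}
```

This sketch waits for every task before returning, so the caller knows when all indexes are ready. Whether to block like this or simply submit the tasks and return is a design choice that depends on when queries are allowed to reach the server.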

My Solution

Running a custom concurrent index builder (Approach 3) proved to be the fastest and most effective. This approach works on Gemfire 6+ and makes better use of the system resources.

Note:
  • Most applications would avoid calling third-party APIs while building indexes, but it is still worth mentioning:
    • APIs like Apache Commons Lang3’s EqualsBuilder and HashCodeBuilder execute a synchronized block and cause high contention in scenarios like this one, so they should be avoided.
    • I've observed similar issues with Spring's SpEL expression parser.
  • More available processors should (ideally) further improve these timings.
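One way to keep equals and hashCode free of shared locks is to write them by hand. The following is a minimal sketch, assuming a hypothetical key class `IndexedKey` with two made-up fields; neither the class nor its fields come from the application described here.

```java
/** Hypothetical key class with hand-written equals/hashCode (no shared state, no locks). */
final class IndexedKey {
    private final String account; // made-up field for illustration
    private final long sequence;  // made-up field for illustration

    IndexedKey(String account, long sequence) {
        if (account == null) {
            throw new NullPointerException("account");
        }
        this.account = account;
        this.sequence = sequence;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof IndexedKey)) {
            return false;
        }
        IndexedKey that = (IndexedKey) other;
        return sequence == that.sequence && account.equals(that.account);
    }

    @Override
    public int hashCode() {
        // Combines both fields using only local computation.
        return 31 * account.hashCode() + (int) (sequence ^ (sequence >>> 32));
    }

    public static void main(String[] args) {
        IndexedKey a = new IndexedKey("ACC-1", 42L);
        IndexedKey b = new IndexedKey("ACC-1", 42L);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Both methods read only the object's own final fields, so many index-building threads can call them at the same time without waiting on each other.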

Performance comparison (based on my application)

  • Approach 1 (simple indexing): ~45 mins start-up, ~5% average CPU utilization.
  • Approach 2 (defined indexing): ~35 mins start-up, ~9% average CPU utilization.
  • Approach 3 (custom concurrent indexes): ~8 mins start-up, ~40% average CPU utilization, peaking at 91%.


Test Environment

  • Java: version "1.7.0_51", Java(TM) SE Runtime Environment (build 1.7.0_51-b13), Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
  • Host configuration: OS = GNU/Linux 2.6.32-504.3.3.el6.x86_64, Processors = 24
  • Cluster configuration: Single standalone node on Gemfire v 8.1 and spring-data-gemfire 1.6.2.
  • Server’s JVM params: -d64 -server -Djava.net.preferIPv4Stack=true -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+UseCompressedOops -XX:+OptimizeStringConcat -XX:+UseStringCache -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled -Xms150g -Xmx150g -Xss256k -XX:NewSize=24g -XX:MaxNewSize=24g -XX:PermSize=1g -XX:MaxPermSize=1g -XX:CMSInitiatingOccupancyFraction=75