Riak – from Bitcask to LevelDB

Posted in Data & Business Intelligence

Riak is a distributed key-value NoSQL database by Basho.

Riak supports two persistent storage backends: Bitcask and LevelDB.

Bitcask vs. LevelDB

Bitcask is the default Riak key-value backend. It is an append-only data store: when you update an existing key, Bitcask writes a new entry to the data files and marks the old entry as a dead key. You can define a threshold of dead keys per file; once a file crosses it, Bitcask merges the files and deletes the dead keys to free storage space. A merge can be a heavy operation, so Riak lets you decide when merges are executed (at the node level).

Since Bitcask is append-only, writes are really fast. Reads are also fast: Bitcask keeps all of the keys in RAM, so reading a value takes just a single disk seek.
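To make the append-only model concrete, here is a minimal Python sketch of a Bitcask-style store. It is a toy illustration, not Bitcask's actual implementation: the class name, the in-memory "file", and the 30% dead-key threshold are all assumptions made for the example.

```python
import io


class AppendOnlyStore:
    """Toy Bitcask-style store: an append-only log plus an in-memory key directory."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the data file on disk
        self.keydir = {}         # key -> (offset, size) of the latest value, kept in RAM
        self.dead = 0            # number of superseded (dead) entries still in the log

    def put(self, key, value):
        if key in self.keydir:
            self.dead += 1       # the old entry stays in the log as a dead key
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        self.log.write(value)    # updates never overwrite, they only append
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        offset, size = self.keydir[key]  # lookup in RAM
        self.log.seek(offset)            # a single seek into the log
        return self.log.read(size)

    def dead_ratio(self):
        total = len(self.keydir) + self.dead
        return self.dead / total if total else 0.0

    def merge(self):
        """Copy only live entries into a fresh log, dropping dead keys."""
        new_log = io.BytesIO()
        new_keydir = {}
        for key in self.keydir:
            value = self.get(key)
            new_keydir[key] = (new_log.tell(), len(value))
            new_log.write(value)
        self.log, self.keydir, self.dead = new_log, new_keydir, 0


store = AppendOnlyStore()
store.put("user:1", b"alice")
store.put("user:1", b"alice-v2")  # the first entry becomes a dead key
store.put("user:2", b"bob")
size_before = len(store.log.getvalue())

if store.dead_ratio() >= 0.3:     # hypothetical merge threshold
    store.merge()
size_after = len(store.log.getvalue())
print(size_before, size_after)
```

The sketch shows both sides of the trade-off: reads need only the in-memory lookup plus one seek, but dead entries keep occupying space until a merge rewrites the log.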

You can also define a TTL (Time To Live or object expiration) for keys.

LevelDB stores keys and values as arbitrary byte arrays, with the data sorted by key. It supports batched writes, forward and backward iteration, and compression.
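The features listed above can be illustrated with a small Python sketch. This is only a model of the idea, not LevelDB's API or on-disk format: the `SortedStore` class and its method names are hypothetical, and zlib stands in for LevelDB's own compression.

```python
import bisect
import zlib


class SortedStore:
    """Toy store with sorted byte-string keys, batched writes and compressed values."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> compressed value

    def write_batch(self, items):
        """Apply several (key, value) writes in one call."""
        for key, value in items:
            if key not in self._values:
                bisect.insort(self._keys, key)  # keep the key list sorted
            self._values[key] = zlib.compress(value)

    def get(self, key):
        return zlib.decompress(self._values[key])

    def iterate(self, start=b"", reverse=False):
        """Yield (key, value) for keys >= start, in ascending or descending order."""
        first = bisect.bisect_left(self._keys, start)
        keys = self._keys[first:]
        for key in (reversed(keys) if reverse else keys):
            yield key, self.get(key)


db = SortedStore()
db.write_batch([(b"b", b"2"), (b"a", b"1"), (b"c", b"3")])
forward = [key for key, _ in db.iterate()]
backward = [key for key, _ in db.iterate(reverse=True)]
from_b = [key for key, _ in db.iterate(start=b"b")]
```

Because the keys are ordered, range scans come almost for free, which is what an append-only log with a hash-style key directory cannot offer.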

Bitcask limitations from real production scenarios

We also started with Bitcask as the backend, but after 2.5 years we decided to move to LevelDB.

These are the limitations we saw when using Bitcask:

  • All keys must be stored in RAM – you get a fast response for a get request, but it means that you’ll have to add more memory to the cluster (by adding more memory to the servers or more servers to the cluster).
  • Merge windows – merge windows were pretty intensive in our case (30% writing, 70% reading). Even after 6 hours not all merges were done, so the dead keys kept taking up a lot of storage.
  • Startup time – Riak must load all keys into memory before the service becomes ready. In our case it took almost 60 minutes for one node to restart. Note that you can do a rolling restart so the service is not affected.

Before we decided to move to LevelDB, we were concerned about performance: in Bitcask a read is only one disk seek, but in LevelDB it depends on which level the key is in (and the data might need to be decompressed).

 

How to change the backend from Bitcask to LevelDB in a production environment

This is the plan we used to change all of our production servers from Bitcask to LevelDB. Note that our Riak version was 1.4.2 (the config file changed in version 2.0.0 and above) and that we backed up the cluster before starting, just in case.

Step  Description

1  Verify that riak-01 is disabled on the load balancer (LB).

2  Stop riak-01:

   riak stop

3  Wait for the process to complete. Monitor /var/log/riak/console.log.

4  Change Riak’s storage backend on riak-01. Edit the config file:

   vi /etc/riak/app.config

   or (Riak 2.0.0 or above)

   vi /etc/riak/riak.conf

   Change the storage backend property. Your LevelDB-related settings in riak.conf should be:

   storage_backend = leveldb
   leveldb.maximum_memory.percent = 10

   The equivalent settings in app.config are:

   · In the riak_kv section: {storage_backend, riak_kv_eleveldb_backend}
   · In the eleveldb section: {total_leveldb_mem_percent, 10}

5  Make sure the syntax is OK:

   riak chkconfig

6  Start the riak-01 node:

   riak start

7  Wait for the node to start up. Verify that all is OK via the logs:

   tail -f /var/log/riak/console.log

8  Initiate force repairs:

   riak attach

   {ok, Ring} = riak_core_ring_manager:get_my_ring().
   Partitions = [P || {P, 'riak@riak01.local'} <- riak_core_ring:all_owners(Ring)].
   [riak_kv_vnode:repair(P) || P <- Partitions].

   Close the console using Ctrl+G -> q

9  Monitor the transfers and wait for all repairs to complete:

   riak-admin handoff summary

   or

   riak-admin transfers

10 After verifying that the node and the cluster are running OK with LevelDB, continue to the next node (repeat steps 1-9).

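The point of the plan above is that nodes are converted strictly one at a time, so the rest of the cluster keeps serving requests. As a rough sketch, the ordering can be written down as a dry-run planner in Python. It only builds and prints the checklist and does not run anything against a cluster; the function name and the second node name (riak-02) are made up for the example.

```python
def migration_plan(nodes):
    """Return the ordered (node, action) checklist for a rolling backend change."""
    per_node = [
        "verify the node is disabled on the load balancer",
        "riak stop",
        "edit the config file: set the storage backend to LevelDB",
        "riak chkconfig",
        "riak start",
        "riak attach, then run the repair for this node's partitions",
        "riak-admin transfers (wait until all repairs complete)",
    ]
    plan = []
    for node in nodes:
        # All actions for one node are finished before the next node starts.
        plan.extend((node, action) for action in per_node)
    return plan


plan = migration_plan(["riak-01", "riak-02"])
for node, action in plan:
    print(f"{node}: {action}")
```

Keeping the checklist as data makes it easy to review before touching production and to resume from a known point if a step fails.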
Cluster behavior – before and after

Node startup time – With Bitcask it was 60 minutes (the node needs to load all the keys into memory). With LevelDB it’s 9 seconds.

Storage – The total Bitcask directory size was 1.4TB per node. The LevelDB directory is now 430GB per node.
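To put the two numbers above in relative terms, here is the arithmetic as a short Python snippet. The only assumption is that 1.4TB is read as 1,400GB (decimal units); the variable names are ours.

```python
startup_before_s = 60 * 60       # 60 minutes with Bitcask
startup_after_s = 9              # 9 seconds with LevelDB
storage_before_gb = 1.4 * 1000   # 1.4TB per node, assuming decimal units
storage_after_gb = 430           # 430GB per node

startup_speedup = startup_before_s / startup_after_s
storage_saved = 1 - storage_after_gb / storage_before_gb

print(f"startup: {startup_speedup:.0f}x faster")  # startup: 400x faster
print(f"storage: {storage_saved:.0%} smaller")    # storage: 69% smaller
```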

This graph shows the total vnode get requests per node per minute. The average is ~100K requests per node.

This graph shows the total vnode put requests per node per minute. The average is ~7K requests per node.

Performance:

Disk I/O – Write IOPS – Before

Disk I/O – Write IOPS – After

No significant change in the write IOPS graph – this is expected.

 

Disk I/O – Read IOPS – Before

Disk I/O – Read IOPS – After

Significantly fewer reads – this is mainly because there are no more merge operations on Bitcask files. It means that while the backend was Bitcask, the cluster was busy mainly with merging Bitcask data files.

Total Get FSM Mean Time – Before

This graph shows the total get FSM mean time: the mean time between receiving a client GET request and sending the response to the client.

Total Get FSM Mean Time – After

Decrease in get FSM mean time – this was one of our major concerns before we changed the backend. Bitcask needs only 1 seek to get the value from disk (since all the keys are in RAM), so we had some concerns about the execution time once we switched to LevelDB. In our case we saw better performance.

Total Put FSM Mean Time – Before

This graph shows the total put FSM mean time: the mean time between receiving a client PUT request and sending the response to the client.

Total Put FSM Mean Time – After

To sum things up, we only benefited from changing the backend from Bitcask to LevelDB:

  • Node startup time – from 60 minutes to 9 seconds.
  • Storage – from 1.4TB per node to 430GB.
  • Execution times – faster responses to get requests.
  • Overall cluster improvement:
    • No need for a merge window (which is I/O intensive).
    • We can restart a node at any time (no need to worry about merge window or long startup time).
    • No need to worry about adding more RAM to the cluster (we don’t need to store all keys in RAM).

One last comment: 2i (secondary indices) are available only in the LevelDB and memory backends (not Bitcask). Also, the Bitcask backend supports TTL on keys but LevelDB doesn’t. We use neither 2i nor TTL, so this wasn’t a factor in our decision.
