Neo4j – Causal cluster backup and restore

Posted in DevOps & Networking

It's pretty obvious why you would want to back up and restore your database.

Neo4j Enterprise Edition provides a utility, part of the neo4j-admin tool, that lets us back up the database while it is online.

 

Backup

In a Causal Cluster, the server you back up keeps serving requests during the online backup, so you want the backup to be as short as possible (especially with larger databases).

I want to demonstrate one approach: run the backup locally on the Neo4j server, then copy the backup to a remote media server.

This setup has one disadvantage: it requires additional disk space on the Neo4j server (enough to hold a copy of the graph.db folder). In this example the local directory that holds the backups is /mnt/backup.

 

Neo4j configuration 

Make sure dbms.backup.enabled is set to true in /etc/neo4j/neo4j.conf (That’s the default value)

dbms.backup.enabled=true

There is another setting that allows remote backups, but we won’t cover that setting in this post.

 

Backup script

I created a very basic shell script that uses the neo4j-admin backup tool to do the backup.

The script copies the backup over SSH, so the SSH public key of the Neo4j server that we want to back up has to be copied to the remote media server (run this on the Neo4j server).
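The key setup can be sketched as follows. This is a minimal sketch, not the exact commands from my servers: the user `backup`, the host `media.example.com` and the key path are hypothetical placeholders, and the helper names `ensure_ssh_key` and `install_ssh_key` are made up for illustration.

```shell
# Hypothetical values: replace with your own backup user, media server and key path.
REMOTE_USER="backup"
REMOTE_HOST="media.example.com"
KEY_FILE="${HOME}/.ssh/id_rsa"

# Create a key pair only if none exists yet.
# The passphrase is empty so that a cron job can use the key unattended.
ensure_ssh_key() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        mkdir -p "$(dirname "$key_file")"
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_file"
    fi
}

# Append the public key to the remote account's authorized_keys file.
install_ssh_key() {
    ssh-copy-id -i "$1.pub" "$2@$3"
}
```

Running `ensure_ssh_key "$KEY_FILE" && install_ssh_key "$KEY_FILE" "$REMOTE_USER" "$REMOTE_HOST"` as the user that will run the backup asks for the remote password once; after that, `ssh backup@media.example.com` should log in without a prompt.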

The script accepts one parameter for the backup type (incremental or full)

  • Full – deletes previous local copy and starts a full backup
  • Incremental – doesn’t delete the previous local copy

Note: we disable the consistency check during the backup by passing --check-consistency=false. This shortens the backup, but the downside is that you have to run the consistency check manually on the offline copy (ideally on the backup media server).

Put the script on the Neo4j server that you want to back up (e.g. in /usr/local/sbin/neo4j_backup.sh).
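A minimal sketch of such a script is shown here, under a few assumptions: the backup name `graph.db-backup`, the remote target `backup@media.example.com:/backups/neo4j` and the use of rsync for the copy step are hypothetical choices, and `localhost:6362` is assumed to be the backup listen address. The function name `run_backup` is only for illustration.

```shell
#!/bin/sh
# Sketch of /usr/local/sbin/neo4j_backup.sh. Adjust the configuration to your setup.
BACKUP_DIR="${BACKUP_DIR:-/mnt/backup}"          # local backup directory used in this post
BACKUP_NAME="${BACKUP_NAME:-graph.db-backup}"    # hypothetical name of the local copy
BACKUP_FROM="${BACKUP_FROM:-localhost:6362}"     # assumed backup listen address
REMOTE_TARGET="${REMOTE_TARGET:-backup@media.example.com:/backups/neo4j}"  # hypothetical

run_backup() {
    backup_type="$1"
    case "$backup_type" in
        full)
            # Full: delete the previous local copy so neo4j-admin starts from scratch.
            # ${BACKUP_DIR:?} aborts instead of deleting from / if the variable is empty.
            rm -rf "${BACKUP_DIR:?}/${BACKUP_NAME}"
            ;;
        incremental)
            # Incremental: keep the previous local copy and let neo4j-admin update it.
            ;;
        *)
            echo "Usage: $0 {full|incremental}" >&2
            return 1
            ;;
    esac

    # Online backup with the consistency check disabled to keep the backup short.
    neo4j-admin backup \
        --backup-dir="$BACKUP_DIR" \
        --name="$BACKUP_NAME" \
        --from="$BACKUP_FROM" \
        --check-consistency=false || return 1

    # Copy the result to the media server, one dated folder per day.
    rsync -a "$BACKUP_DIR/$BACKUP_NAME/" "$REMOTE_TARGET/$(date +%F)/"
}
```

Ending the file with `run_backup "$1"` wires the function to the script's single parameter, so `neo4j_backup.sh full` and `neo4j_backup.sh incremental` behave as described above.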

Change the backup configuration in the script to match your setup.

Cron schedule

Schedule the script via cron (or any other scheduler you like). In this example a full backup runs on Sunday and incremental backups run on the remaining days of the week.
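One way to express that schedule is a cron.d-style file, sketched here as a small helper that prints it. The 02:00 start time, the `root` user and the helper name `print_backup_schedule` are assumptions; pick whatever fits your environment.

```shell
# Print a cron.d-style schedule for the backup script.
# Assumptions: backups start at 02:00 and run as root.
print_backup_schedule() {
    script="${1:-/usr/local/sbin/neo4j_backup.sh}"
    cat <<EOF
# m h dom mon dow user command
0 2 * * 0   root $script full
0 2 * * 1-6 root $script incremental
EOF
}
```

For example, `print_backup_schedule > /etc/cron.d/neo4j-backup` installs the schedule on systems that read /etc/cron.d. In a per-user crontab (`crontab -e`) the same two lines work without the user column.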

Consistency check

The consistency check can be run on the offline backup folder; I also created a very simple shell script for that.

Install Neo4j on the backup server, then run the script there. Its main parts are:

  • -Xms6g -Xmx6g – This is the Java heap size (initial and maximum)
  • -cp – This is the class path, the consistency check .jar is located in /usr/share/neo4j/lib by default
  • NEO4J_BACKUP_DIR is the directory that holds your backups – this script only looks for folders created yesterday in that folder, but you can change it to whatever you want
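Putting those pieces together, a sketch of the check script could look like this. It assumes that each folder directly under `NEO4J_BACKUP_DIR` is one backup, that `/backups/neo4j` is where they live (a hypothetical path), and that the `org.neo4j.consistency.ConsistencyCheckTool` class shipped with your Neo4j version accepts the store directory as its argument. "Created yesterday" is approximated with find's `-mtime`, which looks at the modification time; the function name `check_recent_backups` is only for illustration.

```shell
NEO4J_BACKUP_DIR="${NEO4J_BACKUP_DIR:-/backups/neo4j}"   # hypothetical backup location
NEO4J_LIB_DIR="${NEO4J_LIB_DIR:-/usr/share/neo4j/lib}"   # default location of the Neo4j jars

# Run the consistency check on every backup folder of the given age.
# $1 = age in days for find's -mtime (default 1: modified 24 to 48 hours ago).
check_recent_backups() {
    age_days="${1:-1}"
    find "$NEO4J_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime "$age_days" |
    while read -r backup; do
        echo "Checking $backup"
        # The classpath wildcard stays quoted on purpose: java expands it itself.
        java -Xms6g -Xmx6g -cp "$NEO4J_LIB_DIR/*" \
            org.neo4j.consistency.ConsistencyCheckTool "$backup"
    done
}
```

Calling `check_recent_backups > /var/log/neo4j_consistency.log 2>&1` (the log path is again just an example) keeps the output in one place so you can review it before relying on a backup.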

Restore

Now that we have covered the backup method, I want to discuss how to restore this backup to the cluster.

  1. Make sure the backup you took passed the consistency check: look at the log and see if the backup is valid.
  2. Copy the backup folder to all of the Causal Cluster servers.
  3. Restore the backup on each instance and change the folder permissions (make sure you have enough space in the Neo4j data folder).
  4. Shut down all database instances in the cluster and swap the current graph.db folder with the restored restore.db folder.
  5. Start the database instances.
  6. Make sure the cluster is up (look at debug.log and neo4j.log). After you have verified that everything is up, you can remove the previous database folder.
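Steps 3 and 4 can be sketched with two small helpers, to be run on every instance. The databases directory `/var/lib/neo4j/data/databases`, the `neo4j:neo4j` owner and the helper names `restore_backup` and `swap_database` are assumptions based on a default package install. Depending on your Neo4j version, the operations manual may also ask you to run `neo4j-admin unbind` while the instance is stopped, so check it for your release.

```shell
# Assumed location of the database folders on a package install.
DATABASES_DIR="${DATABASES_DIR:-/var/lib/neo4j/data/databases}"

# Step 3: restore the backup into restore.db and hand the folder to the neo4j user.
# $1 = path of the backup folder that was copied to this server.
restore_backup() {
    neo4j-admin restore --from="$1" --database=restore.db &&
    chown -R neo4j:neo4j "$DATABASES_DIR/restore.db"
}

# Step 4 (instance stopped): keep the current store aside and swap in the restored one.
swap_database() {
    mv "$DATABASES_DIR/graph.db" "$DATABASES_DIR/graph.db.old" &&
    mv "$DATABASES_DIR/restore.db" "$DATABASES_DIR/graph.db"
}
```

A typical sequence on each server would be `restore_backup /mnt/backup/graph.db-backup`, then stopping Neo4j, `swap_database`, and starting Neo4j again. The previous store stays in graph.db.old until you have verified the cluster and delete it in step 6.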
