Using s3cmd with DB2 Backup
Following on from a previous blog about using the s3cmd utility with export and import one of the most asked questions was ‘can I use this with backup’, to which my first thought was… no – the backup file name is generated by DB2 including a timestamp so it wouldn’t be practical to backup to a named pipe. A bit more investigation however revealed my thinking was out of date – DB2 developed the ability to backup to a named pipe in version 11.1. With that feature it should be possible to use s3cmd to backup the database without using a storage alias and the required staging area for the backup file, so let’s try it out.
Step One – Install s3cmd
In the last blog we used an Ubuntu VM running on AWS, but this time we’ll be using a Red Hat VM running on the Triton R&D server. Download links for the s3cmd utility can be found at the s3tools.org website, I downloaded the latest version, unzipped & untarred then ran the setup Python script:
curl "https://deac-ams.dl.sourceforge.net/project/s3tools/s3cmd/2.1.0/s3cmd-2.1.0.tar.gz" -o s3cmd-2.1.0.tar.gz
gunzip s3cmd-2.1.0.tar.gz && tar -xvf s3cmd-2.1.0.tar
python setup.py install
Step Two – Create an S3 Bucket
The s3cmd utility is compatible with storage from several cloud providers that utilise the S3 protocol. In the last blog we used AWS, but IBM Cloud currently provide a ‘Lite’ tier of their Cloud Object Storage for no charge which should suffice for this testing:
Once you have activated the service there is a handy ‘Predefined bucket’ quick start, just enter the name and click next until the bucket is created:
Step Three – Create Access Keys
For s3cmd to use our bucket it will need some credentials – these can be created in the Cloud Object Storage resource page in the portal, when creating them remember to choose ‘Include HMAC Credential’ under Advanced options as this will generate the secret_access_key.
Step Four – Configure s3cmd
With the bucket and access credentials set up we can configure s3cmd. As in the last blog we’ll do this as the db2inst1 user as that is the account that will run the backup commands. Running the utility with the –configure option provides a set of prompts to input the required information (or accept defaults) and test the configuration – in my case I’ve inputted the keys, UK for the Default region, an Encryption password, and used the endpoints listed in the portal. If all goes well the tests will give success messages and the option to save the configuration:
su – db2inst1
Step Five – Running a Backup
Configuration complete, you should be able to use ‘s3cmd ls’ to list the contents of the configured bucket with no errors. We can now run the backup to the cloud storage. First set up a named pipe, in this case called ‘mypipe’, then create a background job to feed output from this pipe into the s3cmd utility:
cat /tmp/mypipe | s3cmd -q put - s3://db2-backups-bkt/backup_161220_1429.db2 &
Note that this time we are not piping the data through gzip – that’s because DB2 can compress the backup for us. You can check the pipe and background job exist using the ls and jobs commands. Once they are set up, run the backup specifying the pipe as the destination:
db2 backup db sampledb online to /tmp/mypipe compress
Refreshing the portal shows the backup file is indeed in our cloud storage:
Step Six – Running a Restore
Backing up the database is great but not much use unless we can restore it. There are a couple of ways to go about this – we could pull the file back onto disk and restore from there, but the point of this exercise is to avoid requiring the disk space that would take. Luckily, we can now restore from a named pipe as well. First step is to create a pipe (note the previous one may still exist if not removed after the backup was taken) and feed it with the backup data via s3cmd in a background job, then we can reference it in the restore command:
s3cmd get -q s3://db2-backups-bkt/backup_161220_1429.db2 - > /tmp/mypipe &
db2 restore db sampledb from /tmp/mypipe
My backup was taken online, so a quick rollforward later and we are up and running.
Why Use s3cmd Instead of a Remote Storage Alias?
At this point you may be thinking to yourself that’s great, but if we need DB2 v11.1 to backup and restore using named pipes then why not use the remote storage alias features that are built in? The answer is that storage aliases need disk space to stage the backup file, whereas s3cmd will read from the named pipe and transmit the data in small chunks – here’s a graph demonstrating the disk usage while a 1GB database is backed up using a storage alias, versus using named pipes with s3cmd:
If you would like to do your own comparison testing, take a look at the blog by my colleague Damir covering set up of storage aliases.
Storing DB2 backups in cloud-based object storage gives us the same benefits we saw for export files – offsite storage, immutable data for audit requirements and the ability to distribute the backup files if needed. Being able to backup and restore a database without requiring the staging space for a remote storage alias is however particularly useful for backups, as larger databases could need a lot of disk space which has to be left unused.
Here are some links to resources used in this blog:-Previous | Next »