Feb 17 2022 – Posted by Damir Wilder

AWS S3 Recovery for DB2 LUW

Introduction

With the arrival of DB2 version 11.5 Fixpack 07, it has finally become possible to configure the DB2 recovery process to revolve entirely around the AWS S3 service (Simple Storage Service).

In this blog article we will explore how this can be done and go through the full cycle – from the initial database setup, through a regular transactional workload that drives log archiving, to a (simulated) database crash and, finally, a full database recovery using exclusively the resources in the S3 archive.

Setup

For the purposes of this exercise, we will create and use a completely new and empty database called TESTDB. (I haven’t recorded the statement, mea culpa, but the simplest form will do just fine: “db2 create db TESTDB”.)

We will also need an S3 resource to work with – an AWS Bucket – where the database backups and the transactional logs will be archived. I already have a suitable test bucket called dwrnd.triton.co.uk (created beforehand in the AWS Region us-east-2) that can be used in this exercise (for details on how to obtain an AWS Bucket, please consult the AWS S3 documentation).
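
Incidentally, if you don’t have a bucket yet, the AWS CLI can create one with a single command (shown here with our bucket name and Region – substitute your own):

aws s3 mb s3://dwrnd.triton.co.uk --region us-east-2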

Next, we will configure access to the S3 Bucket in our DB2 instance. This involves installing the prerequisite Linux libraries, installing and configuring the AWS CLI, creating a local keystore and configuring DB2 to use it, and lastly defining a storage access alias (here called S3DWRND) that points to the S3 Bucket dwrnd.triton.co.uk (you can find the details on how all this can be done here).
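
For orientation, the key steps look roughly like this (a sketch only, with placeholder paths and credentials – refer to the linked article for the full procedure):

# Create a local PKCS#12 keystore that will hold the AWS credentials
gsk8capicmd_64 -keydb -create -db /home/db2dwtst/keystore/db2ks.p12 -pw "********" -type pkcs12 -stash

# Tell the DB2 instance where the keystore is (an instance restart is needed to pick this up)
db2 update dbm cfg using KEYSTORE_TYPE PKCS12 KEYSTORE_LOCATION /home/db2dwtst/keystore/db2ks.p12

# Define the storage access alias that points to the S3 bucket
db2 "catalog storage access alias S3DWRND vendor S3 server s3.us-east-2.amazonaws.com user <access_key_id> password <secret_access_key> container dwrnd.triton.co.uk"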

The last thing to do is configure the database to archive its transactional logs in the provided S3 Bucket, via the storage access alias (again, the details are here):


First log archive method  (LOGARCHMETH1) = DB2REMOTE://S3DWRND//TESTDB_LOGS
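
For reference, that value is set with a single database configuration update (note that switching a database from circular to archive logging puts it into backup pending state, so an initial full backup must follow):

db2 update db cfg for TESTDB using LOGARCHMETH1 DB2REMOTE://S3DWRND//TESTDB_LOGS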

Having done all this, we should be good to go!


Initial Backup

The first thing we want to do is take a full DB backup in order to create a baseline for any later recovery scenarios (which, as we all know, involve restore and rollforward operations).

So, let’s take a backup to S3, into the predefined S3 Bucket, via the remote storage alias S3DWRND:

db2 "backup db TESTDB online to DB2REMOTE://S3DWRND//TESTDB/ include logs"
  Backup successful. The timestamp for this backup image is : 20220209095835

The backup image goes into the subdirectory TESTDB to keep it separate from the archived log files (which are stored in the subdirectory TESTDB_LOGS, as mentioned above).
We can verify the backup image is stored in S3 with the following command:

aws s3 ls s3://dwrnd.triton.co.uk/TESTDB/
2022-02-09 09:58:43  436346880 TESTDB.0.db2dwtst.DBPART000.20220209095835.001


Transactional Activity

At this point we will start transactional activity on the database, in order to fill up the transactional log files and trigger log archiving to S3.

Since TESTDB is a completely new and empty database, we will create one test table and then start inserting random data into it as a simulation of real-world transactional activity (this will do the job of archiving the log files quite nicely, as we will shortly see).

The test table:

create table TEST.TEST_TABLE(
    ID integer not null primary key
             generated always as identity (start with 1 increment by 1 no maxvalue), 
    TEXT varchar(50))

The random-string UDF:

create function TEST.RND_STRING(in LENGTH integer)
  returns VARCHAR(100)
  RETURN
  with TTB(str, depth) as
   ( select varchar(chr(48+int(78*rand())),100), 1 from sysibm.sysdummy1
    union all
     select varchar(str || chr(48+int(78*rand())),100), depth+1
     from TTB where depth <= LENGTH )
   select str from TTB where depth = LENGTH
;
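
A quick sanity check that the UDF does what we expect – it should return one random string of the requested length (just an illustration):

db2 -x "values test.rnd_string(20)"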

The insert loop (executed from a BASH shell):

while [[ 1 == 1 ]]; do 
  db2 -x "insert into test.test_table (text) values (test.rnd_string(1+int(50*rand())))";
done
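
One detail worth noting: each db2 call in the loop reuses the connection held by the shell’s CLP back-end process, so a single connect before starting the loop is all that is needed:

db2 connect to TESTDB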

“Keep it simple”, they said 😊
Checking the row count in the test table confirms the transactions are going through:

db2 "select current timestamp current_time, count(*) total_rows from test.test_table"
    CURRENT_TIME               TOTAL_ROWS
    -------------------------- -----------
    2022-02-10-11.27.19.255196      441167  [the count keeps growing!]


Check the Log Retention

Inspecting the DB2 diagnostic log shows that the transaction logs are indeed being archived to S3:


2022-02-10-11.09.07.022714+000 E11825573E422         LEVEL: Info
PID     : 10218                TID : 139686391572224 PROC : db2sysc 0
INSTANCE: db2dwtst             NODE : 000            DB   : TESTDB
HOSTNAME: itchy
EDUID   : 511                  EDUNAME: db2logmgr (TESTDB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3109
MESSAGE : ADM1844I  Started archive for log file "S0001559.LOG"

2022-02-10-11.09.09.984758+000 I11825996E575         LEVEL: Info
PID     : 10218                TID : 139686391572224 PROC : db2sysc 0
INSTANCE: db2dwtst             NODE : 000            DB   : TESTDB
HOSTNAME: itchy
EDUID   : 511                  EDUNAME: db2logmgr (TESTDB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3180
DATA #1 : 
Completed archive for log file S0001559.LOG to
   DB2REMOTE://S3DWRND//TESTDB_LOGS/db2dwtst/TESTDB/NODE0000/LOGSTREAM0000/C0000000/
from /home/db2dwtst/db2dwtst/NODE0000/SQL00001/LOGSTREAM0000/.
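
Incidentally, there is no need to wait for an active log file to fill up when testing the setup – the ARCHIVE LOG command closes and archives the current log file on demand:

db2 archive log for db TESTDB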

With the full DB backup image already stored in S3 (//S3DWRND/TESTDB/) and the transactional log files being automatically archived to S3 (//S3DWRND/TESTDB_LOGS/) as they are closed, we should be in a position where a full database recovery can be done using exclusively the S3 archive.
Let’s test this!


Crash/Drop the Database

At this point we will simulate a disaster event by dropping the whole TESTDB database (but you can use your imagination here to come up with a more realistic real-life scenario: fire, flood, bomb, earthquake, the attack of the Teenage Mutant Ninja DBAs, you name it…).

Just for later reference, let’s record the total number of rows in the test table:

db2 "select current timestamp as current_time, count(*) as total_rows from test.test_table"
    CURRENT_TIME               TOTAL_ROWS
    -------------------------- -----------
    2022-02-10-12.41.05.771728      582450
  1 record(s) selected.

The time has come… the database goes down with fanfare:

db2 terminate
DB20000I  The TERMINATE command completed successfully.
db2 deactivate db TESTDB
DB20000I  The DEACTIVATE DATABASE command completed successfully.
db2 drop db TESTDB
DB20000I  The DROP DATABASE command completed successfully.

Shortly afterwards, the entries in the DB2 diagnostic log confirm the disaster has struck:

2022-02-10-15.39.00.684068+000 E12147847E569         LEVEL: Warning
PID     : 10240                TID : 140560080656128 PROC : db2acd 0
INSTANCE: db2dwtst             NODE : 000
HOSTNAME: itchy
FUNCTION: DB2 UDB, Administrative Task Scheduler, AtsTask::run, probe:200
MESSAGE : ZRC=0xFFFF8A93=-30061
          SQL30061N  The database alias or database name "" was not found at the remote node.
DATA #1 : 
[IBM][CLI Driver] SQL30061N  The database alias or database name "TESTDB" was not found at the remote node.  SQLSTATE=08004

With the database AWOL, the alarm bells start ringing – it’s time for a recovery action!


Recover the Database

In the real world, if you’ve lost your whole data centre (or at least the whole server) to a bunch of TMNDs, you would have to start the recovery from the very beginning (see “Setup” above).
But in this little experiment we will use the same server and the same DB2 instance, which has already been configured for S3 access, so we don’t need to do that again. Therefore, we simply need to initiate a database restore from the latest backup image (stored in the S3 archive), followed by a rollforward operation to the end of logs – which will hopefully recover all the data inserted into the TEST table!
First, find the latest available backup image:

aws s3 ls s3://dwrnd.triton.co.uk/TESTDB/
...
2022-02-09 09:58:43  436346880 TESTDB.0.db2dwtst.DBPART000.20220209095835.001

Let’s run the database restore from the above backup image, straight from the S3 archive:

db2 "restore db TESTDB from DB2REMOTE://S3DWRND//TESTDB/ taken at 20220209095835"
DB20000I  The RESTORE DATABASE command completed successfully.

The restore has completed successfully and the database is back from the dead (well, almost)!
The DB2 diagnostic log shows the following, which confirms the restore was indeed done directly from the remote S3 archive:

2022-02-10-16.13.32.379690+000 E12186877E527         LEVEL: Info
PID     : 10218                TID : 139677038274304 PROC : db2sysc 0
INSTANCE: db2dwtst             NODE : 000            DB   : TESTDB
APPHDL  : 0-22843              APPID: *LOCAL.db2dwtst.220210161332
AUTHID  : DB2DWTST             HOSTNAME: itchy
EDUID   : 491                  EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetDegreeParallelism, probe:558
DATA #1 : 
Autonomic backup/restore - using parallelism = 2.
...
DATA #2 : Generic Remote Storage Handle, PD_TYPE_REMSTG_HANDLE, 19728 bytes
  Vendor Type      = S3 (0)
  Role             = Master
  Server Name      = s3.us-east-2.amazonaws.com
  Container Name   = dwrnd.triton.co.uk
  Authentication 1 = 
  Authentication 2 = 
  Prev. Operation  = GetObjectInfo (4)
  Original Input   = S3::s3.us-east-2.amazonaws.com: :::::dwrnd.triton.co.uk::TESTDB/TESTDB.0.db2dwtst.DBPART000.20220209095835.001
  Source Name      = TESTDB/TESTDB.0.db2dwtst.DBPART000.20220209095835.001
  Target Name      = /home/db2dwtst/sqllib/tmp/RemoteStorage.0000/p10218_t887/TESTDB_TESTDB.0.db2dwtst.DBPART000.20220209095835.001
  Staging Location = /home/db2dwtst/sqllib/tmp/RemoteStorage.0000/p10218_t887
  Session Handle   = Initialized

Next, check the database state (should be “rollforward pending”):

db2 rollforward db TESTDB query status
                                 Rollforward Status
 Input database alias                   = TESTDB
 Number of members have returned status = 1
 Member ID                              = 0
 Rollforward status                     = DB  pending
 Next log file to be read               = S0001453.LOG
 Log files processed                    =  -
 Last committed transaction             = 2022-02-09-09.58.39.000000 UTC

Indeed, the status is as expected (and the current point in time is very much obsolete!).
Let’s check that the required logs are still in the S3 archive (notice we need to look into the expanded path within the bucket – TESTDB_LOGS/db2dwtst/TESTDB/NODE0000/LOGSTREAM0000/C0000000 – the same subdirectory levels DB2 would create in a local log archive):

aws s3 ls s3://dwrnd.triton.co.uk/TESTDB_LOGS/db2dwtst/TESTDB/NODE0000/LOGSTREAM0000/C0000000/
2022-02-09 09:58:41    3321320 S0001453.LOG
2022-02-09 10:12:23    3119275 S0001454.LOG
...
2022-02-10 15:33:23    4139135 S0001596.LOG
2022-02-10 15:35:04    1114521 S0001597.LOG

All logs are present, so we can execute the ROLLFORWARD operation:

db2 "rollforward db TESTDB to end of logs and stop"
                                 Rollforward Status
 Input database alias                   = TESTDB
 Number of members have returned status = 1
 Member ID                              = 0
 Rollforward status                     = not pending
 Next log file to be read               =
 Log files processed                    = S0001453.LOG - S0001596.LOG
 Last committed transaction             = 2022-02-10-15.34.11.000000 UTC
DB20000I  The ROLLFORWARD command completed successfully.

Bingo, all nicely done!
Here’s what the DB2 diagnostic log has to say about fetching the log files from the S3 archive (here showing messages just for the last log file):

...
2022-02-10-16.47.59.639589+000 E2137079E613          LEVEL: Info
PID     : 10218                TID : 139677029885696 PROC : db2sysc 0
INSTANCE: db2dwtst             NODE : 000            DB   : TESTDB
HOSTNAME: itchy
EDUID   : 1018                 EDUNAME: db2logmgr (TESTDB) 0
FUNCTION: DB2 UDB, data protection services, sqpLogMgrEdu::sqlpgRetrieveLogFile, probe:4130
DATA #1 : 
Started retrieve for log file S0001596.LOG for log stream 0 from log chain 0.
   Retrieval destination: /home/db2dwtst/db2dwtst/NODE0000/SQL00001/LOGSTREAM0000/LOGSTREAM0000/
   Actual retrieval method: USE_METHOD1
... 
MESSAGE : ADM1845I  Completed retrieve for log file "S0001596.LOG" on chain "0"
          from
          "DB2REMOTE://S3DWRND//TESTDB_LOGS/db2dwtst/TESTDB/NODE0000/LOGSTREAM0
          000/C0000000/" to
          "/home/db2dwtst/db2dwtst/NODE0000/SQL00001/LOGSTREAM0000/LOGSTREAM0000/".

Finally, let’s check whether we got all the data back into the TEST table:

db2 connect to TESTDB
   Database Connection Information
 Database server        = DB2/LINUXX8664 11.5.7.0
 SQL authorization ID   = DB2DWTST
 Local database alias   = TESTDB

db2 "select count(*) from test.test_table"
    1
    -----------
         582450
  1 record(s) selected.

The row count matches that from before the “disaster” event.
This confirms the DB2 database recovery has completed successfully!


Compare Timings: S3 vs. Local

Just to look at this exercise from a different perspective, let’s compare the time required to take a full database backup to remote (S3) storage vs. a local disk:
Take a backup to S3:

time db2 "backup db TESTDB online to DB2REMOTE://S3DWRND//TESTDB/ include logs"
  Backup successful. The timestamp for this backup image is : 20220209095835
  real    3m23.469s
aws s3 ls s3://dwrnd.triton.co.uk/TESTDB/
2022-02-09 09:58:43  436346880 TESTDB.0.db2dwtst.DBPART000.20220209095835.001

Take a backup to a local disk:

time db2 "backup db TESTDB online to /tmp include logs"
  Backup successful. The timestamp for this backup image is : 20220209101216
  real    0m5.810s
ls -l /tmp/TESTDB*
  -rw------- 1 db2dwtst db2dwtst 436346880 Feb  9 10:12
  TESTDB.0.db2dwtst.DBPART000.20220209101216.001

The local backup is considerably faster – 6 seconds vs. 3 minutes 23 seconds for the S3 backup!
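
The gap can be narrowed somewhat with the standard BACKUP tuning options – a sketch, as the right numbers depend on your system (and note that with remote storage the network upload tends to remain the bottleneck):

db2 "backup db TESTDB online to DB2REMOTE://S3DWRND//TESTDB/ with 4 buffers buffer 4096 parallelism 4 include logs"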
Let’s also test the speed of copying the backup image from S3 to a local disk:

time aws s3 cp s3://dwrnd.triton.co.uk/TESTDB/TESTDB.0.db2dwtst.DBPART000.20220209095835.001 .
...Completed 306.0 MiB/416.1 MiB (5.9 MiB/s) with 1 file(s) remaining...
download: s3://dwrnd.triton.co.uk/TESTDB/TESTDB.0.db2dwtst.DBPART000.20220209095835.001 to ./TESTDB.0.db2dwtst.DBPART000.20220209095835.001
real    1m13.482s

The average speed of 5.9 MiB/s is obviously not too impressive – but note that this was done via a standard public Internet connection, from a UK-based server to the rather remote AWS us-east-2 region.
Shorter distances (using a nearby AWS Region) and better Internet connections will certainly provide better speeds.
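
On the client side, the AWS CLI itself can be tuned to use more concurrent connections for its multipart transfers, which often helps on high-latency links (a sketch – sensible values depend on your bandwidth):

aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 32MB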

For even better speeds there are (paid) services such as AWS Direct Connect.
Just something to keep in mind…


Conclusion

Even though speed (of both backups and restores) can be an issue with S3 storage – it is almost certainly slower than local storage and will vary depending on multiple factors – the indisputable benefit of using such a remote storage service is that it provides a very reliable disaster recovery location for your data.

According to the AWS documentation, data stored in S3 has a durability of 99.999999999% (eleven 9s) and can grow practically without limit. And you can easily access your data from practically anywhere in the world. Try to beat that in your local data centre!

As for the good ol’ DB2, configuring it to access S3 is a simple matter, and the subsequent usage of the S3 resources appears to work seamlessly. Even the remote-storage command options are not too cryptic or difficult to remember 😊

All in all, this is an excellent and very welcome update of the DB2 functionality, which integrates DB2 with the AWS cloud services better than ever before.
