Ransomware and the DB2 Database – Part 1
What Would an Attack Look Like?
By now I’m sure everyone has heard of the malicious practice known as ransomware attacks, where miscreants break into a corporate network and encrypt data before demanding huge sums of money to provide a method to decrypt that data and make it accessible again. The attacks tend to be insidious – sometimes the attacker is in the network for months before they gain access to the systems they are interested in, and they are known to target backup servers as well as the primary systems to cause maximum inconvenience to the target organisation.
The damage to an organisation’s reputation from such an attack can be far reaching, and the cost of downtime while data is recovered, if it can be recovered, can add up to staggering sums even before considering the cost if the ransom had to be paid. There is also the ethical consideration of whether paying the criminals behind these schemes is just encouraging further attacks.
Thankfully the profile of information security practices is pretty high these days, but it only takes one person to click on a malicious link to let the bad guys in. As database administrators and data engineers we are particularly high-value targets due to our level of access and our normal work patterns of handling large amounts of data.
Given the value of the data stored in our databases, it makes sense to protect them as much as possible, and good security should be at the forefront of every data professional’s mind. But what would an attack on a DB2 database look like – how would we recognise the failure patterns if our server was attacked?
That was the question we were asked recently, so to be able to give a realistic answer I played some war games on a test setup and observed the behaviour and error messages.
The Test Setup
For this testing I was using a DB2 v11.1 HADR primary database running on Red Hat Linux. I’m not aware of any malware specifically targeting DB2 itself, and (sadly to some extent) it would be a rather niche target to develop malicious software for, so our target will be the database files on disk.
As the order in which files would be encrypted isn’t known, and the failure mode is likely to differ depending on what is attacked first, we will target four groups of files separately:
- The DB2 binaries
- The instance owner home directory
- The tablespace containers
- The active log files
Several database connections were started which inserted and read data from the database to simulate read and write workload while the encryption was occurring.
Despite what some users of code I’ve written may claim, I’m not a malicious software author. So, for this test I’ll use a basic Linux command to encrypt the files one by one – in a real attack scenario the encryption software is likely to be a lot more sophisticated and may alter only certain portions of files, even while they are in use, making the corruption harder to detect.
I did consider putting the command used in this blog, but didn’t want to risk someone running it accidentally… suffice to say it involved find, openssl and dd.
Results from the Tests
Test 1 – DB2 Binaries
This targeted the DB2 binaries in /opt/IBM/db2, and resulted in a sudden and complete failure of the DB2 processes.
Time to notice failure
DB2 engine failure
From the command line:
db2: error while loading shared libraries: /home/db2inst1/sqllib/lib64/libdb2.so.1: invalid ELF header
In the syslog:
Aug 5 14:14:54 wargam1 kernel: traps: db2 general protection ip:7efed498edb2 sp:7ffd4e06ea98 error:0 in ld-2.17.so[7efed4975000+22000]
Aug 5 14:14:54 wargam1 kernel: traps: db2fm trap invalid opcode ip:7fbd0e73c871 sp:7fffda5fe680 error:0 in libdb2.so.1[7fbd0d516000+2cac000]
Aug 5 14:14:54 wargam1 kernel: db2fmcd: segfault at 0 ip 00007faa3b2df9a3 sp 00007ffc32282e00 error 4 in libgcffmcmd.so.1[7faa3b2d5000+1c000]
Aug 5 14:14:54 wargam1 systemd: db2fmcd.service: main process exited, code=killed, status=11/SEGV
Aug 5 14:14:54 wargam1 systemd: Unit db2fmcd.service entered failed state.
Aug 5 14:14:54 wargam1 systemd: db2fmcd.service failed.
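The "invalid ELF header" message appears because the loader expects every shared library to begin with the four ELF magic bytes, and an encrypted file no longer does. As a minimal sketch of how that condition could be checked proactively, the Python below walks a library directory and reports files that have lost their header. The function names and the `*.so*` pattern are my own assumptions for illustration, not part of any DB2 tooling, and a directory containing text linker scripts with a `.so` suffix would be reported as false positives.

```python
from pathlib import Path

# First four bytes of every valid ELF executable or shared library
ELF_MAGIC = b"\x7fELF"


def has_elf_header(path):
    """Return True if the file at `path` starts with the ELF magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(ELF_MAGIC)) == ELF_MAGIC


def find_damaged_libraries(lib_dir, pattern="*.so*"):
    """List files under lib_dir matching pattern that lack an ELF header.

    `pattern` is an assumption: adjust it to match the libraries you care about.
    """
    damaged = []
    for candidate in sorted(Path(lib_dir).rglob(pattern)):
        # is_file() follows symlinks, so links to real libraries are checked too
        if candidate.is_file() and not has_elf_header(candidate):
            damaged.append(str(candidate))
    return damaged
```

Calling `find_damaged_libraries()` with the instance's `sqllib/lib64` directory from a scheduled job would flag this failure mode without waiting for a DB2 command to fail, although in this test the engine crash made the problem obvious almost immediately.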
Test 2 – Instance owner home directory
Here the home directory of the instance owner, db2inst1, was targeted. Error messages took longer to appear in this test. I suspect this is because the directory mostly contains configuration files, which are accessed less frequently, so the DB2 process itself can continue for some time.
Time to notice failure
Commands hang or return system errors and errors accessing control files
MESSAGE : ZRC=0xFFFFEC41=-5055
SQL5055C The content of the local or global database configuration file is not valid.
SQL10003C There are not enough system resources to process the request. The request cannot be processed.
/home/db2inst1/sqllib/db2profile: line 4: syntax error
Test 3 – Tablespace containers
This targeted the tablespace containers for user tables and indexes. Various types of table were being used in the database, including range-partitioned tables with separate tablespace containers for the partitions. The time taken before an error is apparent appears to depend on how soon DB2 needs to read data in from the disk – anything cached in the bufferpool, for example, will not need a physical IO and therefore won’t see an error. As might be expected, the database engine itself stayed online even if the user data was not accessible.
Time to notice failure
Database remains online, pages in memory continue to be used, reading or writing pages on disk results in error
DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL1655C The operation could not be completed due to an error accessing data on disk. SQLSTATE=58030
Test 4 – Active logs
This targeted the log files in the active log path. Similarly to the tablespace container test, it was when DB2 needed to read the data that errors were reported. There was a particular danger however with the log files, in that the data could be encrypted before the log was archived – and the log archive process would complete successfully. It was only later when attempting to restore the database that the corrupted log file caused an error.
Time to notice failure
Errors seen when log read is attempted
Database remains online; an attempt to read the log results in an error. If the log is needed for rollback, the database shuts down.
MESSAGE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
DIA8414C Logging can not continue due to an error.
MESSAGE : ZRC=0x87100048=-2028994488=SQLP_BADLSO "Invalid LSO value."
DIA8546C An invalid log sequence offset (LSO), the value was "".
SQL1034C The database was damaged, so all applications processing the database were stopped. SQLSTATE=58031
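Because an encrypted log file can be archived without complaint, some check on the file before or after archiving would be needed to catch this case early. One generic heuristic, which is my own suggestion and was not part of these tests, is that encrypted data looks statistically random, so its byte entropy sits very close to 8 bits per byte. The sketch below, with hypothetical function names, sample size and threshold, estimates the Shannon entropy of the start of a file. It is a rough indicator only: compressed or already-encrypted legitimate data will also score high, so the threshold would need calibrating against known-good log files.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(path, sample_size=65536, threshold=7.9):
    """Heuristic: True if the first sample_size bytes of the file look random.

    sample_size and threshold are illustrative assumptions, not tuned values.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    return shannon_entropy(sample) >= threshold
```

A file that suddenly scores far higher than its neighbours in the same log path is worth investigating before it is relied on for a restore.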
Thoughts on the Results
Whilst there is no guarantee that the behaviour seen in these tests would be replicated in a real-life attack, the results are probably in line with expectations for the behaviour of DB2 when disk corruption is encountered. The possibility of affected log files being archived was the one thing that we had not considered prior to the test. Ensuring monitoring is in place for serious errors at the application, database and server level is vital, and the messages captured during this testing demonstrate the sort of errors that could be seen if the DB2 filesystems are attacked.
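As a rough illustration of that kind of monitoring, the Python sketch below scans lines of log text for the message codes captured during these tests. The list of codes comes straight from the output above and is not a complete catalogue of serious DB2 errors; the function name and the choice to return line numbers are assumptions made for the example. In practice this job would more likely belong to an existing monitoring tool than to a standalone script.

```python
import re

# Message identifiers seen in the four tests in this post (illustrative, not exhaustive)
WATCHED_CODES = (
    "SQL5055C",     # database configuration file is not valid
    "SQL10003C",    # not enough system resources
    "SQL1655C",     # error accessing data on disk
    "SQL1034C",     # database was damaged
    "DIA8414C",     # logging can not continue
    "DIA8546C",     # invalid log sequence offset
    "SQLP_BADLOG",  # log file cannot be used
    "SQLP_BADLSO",  # invalid LSO value
)

PATTERN = re.compile(r"\b(" + "|".join(re.escape(c) for c in WATCHED_CODES) + r")\b")


def scan_lines(lines):
    """Return (line_number, code, text) for every watched code found in lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(0), line.rstrip("\n")))
    return hits
```

Passing an open file object for a diagnostic log to `scan_lines()` returns a list that could be forwarded to an alerting system. An empty list means none of the watched codes were found, not that the database is healthy.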
The particularly good news in our case is that in all the scenarios tested the HADR standby maintained data integrity and was available to take over the workload. Keep an eye out for the second part of this blog where we’ll go through some best practices to secure your databases against a ransomware attack.