DB2 for z/OS Locking for Application Developers Part 10
This is the last article in the series on locking for developers, and is a wrap up of miscellaneous points arising from previous articles.
Row level locking
The first item under discussion is row-level locking. I mentioned previously that the design default lock size should be page-level, with row-level locking only being used where justified. This is to reduce the CPU and elapsed time overhead typically incurred by row-level locking, and especially to avoid the data sharing overhead involved. The DB2 for z/OS documentation has further information about locking in a data sharing environment, but for the purposes of this article it’s important to stress that row level locking in a data sharing environment can and typically does introduce significant overheads.
Row-level locking is sometimes beneficial for update applications or for applications which mix reads and updates, but in most cases there isn’t much – if any – benefit to be gained from using row-level locking instead of page-level. This is for several reasons. For example, in OLTP environment characterised by random access, the rows being retrieved or updated by concurrent transactions might be distributed across the data pages in such a way that there is rarely any page-level contention. Or consider a batch program reading and updating data sequentially: if there is little or no contention with other batch jobs or online transactions, then performance will be worse because more locks will be taken, with a higher CPU cost and longer elapsed times.
However, there are some cases which mandate row-level locking. Amongst the sort of applications in this category are:
- Applications where the rows being accessed by concurrent transactions are located on the same page or set of pages, resulting in unacceptable page-level lock contention.
- For tables which are subject to high-volume concurrent inserts and which are defined with APPEND and MEMBER CLUSTER, to prevent exponential growth in the size of the table combined with inefficient space use.
Using ROWID and RID functions
You may remember that I suggested that you should be cautious when considering using these two functions, to guard against the case where the ROWID or RID values change between retrieval of a value and its subsequent reuse to access the row again, this time directly. A typical direct row access use case is when using optimistic locking to select and update a row. The reason you need to be careful is that the values for these functions can be changed by a REORG, and in many cases probably will be. However, if you select a ROWID or RID value for a given row and then re-use it for direct row access in the same commit scope, then the use of these functions is safe.
Materialised result sets
To return to materialised result sets, another consideration for applications reading but not updating rows via a read-only cursor on a materialised result set is the fact that the application doesn’t know which version of the row it’s reading: a ‘hot’ row that is frequently updated by other transactions could be updated one or more times since the result set was materialised, depending on the elapsed time between result set materialisation and the row fetch operation. In effect, the application doesn’t know if the row data values are current or have been updated since the result set was materialised. An application reading and then updating the row using safe optimistic locking techniques is protected, but this is a more difficult problem for the read-only application where data currency is important. In an OLTP environment, the number of affected applications is probably quite small, but in some cases, this could be an important consideration. Having cautioned against access-path dependent cursors, one way to protect against this is to make sure there is at least one index that satisfies the predicates used. As always, the need for the index has to be balanced against the cost of index maintenance, if you’ve decided that the benefit of relying on the index offsets the risk of access path changes. Remember, the consideration here is data currency and not performance – this is not a recommendation to avoid index usage.
RR/RS and non-materialised result sets
There some considerations for RR/RS applications and non-materialised result sets. If the result set is materialised, then the application knows all the data row values were current as of cursor open time, and will not change while the application is running because the share locks acquired at cursor open time will be retained until the next commit. However, for a non-materialised result set which takes a long time to consume, it is quite possible that rows are updated between cursor open time and row fetch time, when the rows are evaluated and the locks are acquired. Again, this probably affects a very small set of applications.
I’ve hinted at lock avoidance in several previous articles, without going into any detail about the lock avoidance mechanism or the beneficial effects it can have on performance, especially in data sharing. Here I want to provide just enough information to help application developers understand the role they play in ensuring effective lock avoidance. Lock avoidance is a technique used by DB2 to avoid taking locks for read-only requests where possible. This does not imply an uncommitted or dirty read, but it does mean that DB2 will avoid taking a lock for a read-only request if it can guarantee that the pages where the target rows are located are logically and physically consistent. This is important for two reasons: it can improve concurrency and reduce contention by avoiding unnecessary S locks which can delay updaters; and it avoids the CPU overhead of lock acquisition and release, and more importantly, of contention management, especially in a data sharing environment.
The discussion in this article is limited to just one part of the lock avoidance mechanism, known as the commit log sequence number or CLSN. In a non-data sharing environment, DB2 tracks the time of the latest update to each page in a partition or tablespace (for segmented and simple tablespaces). It also tracks the start time of the oldest uncommitted unit of work that has updated the partition or tablespace – this is the CLSN for this page set. This updating unit of work is known as a Unit of Recovery, or UOR (sometimes called a UR). When these two values are compared, if the time of the update to the data page is earlier than the start time of the oldest UOR updating the partition or tablespace, then DB2 knows all the data on the page has been committed and can therefore employ lock avoidance1. Otherwise, DB2 is unable to use lock avoidance for rows on this page.
The CLSN checking part of the lock avoidance mechanism operates differently in a data sharing environment, but the behaviour changes somewhat in DB2 12. To start off with the behaviour prior to DB2 12, when a partition or tablespace is group buffer pool-dependent (at least two members accessing the object, with at least one of these updating the object), then DB2 uses a global commit log sequence number (global CLSN) value, which is in fact the earliest start time of all uncommitted URs, across all members, across all page sets. This means that a single long-running UR can make lock avoidance less effective for all applications running in a data sharing group: the earlier the global CLSN value, the more likely it is that the most recent update to a page is later than the global CLSN, meaning that lock avoidance is less likely to be successful, even with the additional checking DB2 does.
DB2 12 makes a big step forward in reducing the impact of long-running URs, with each member of the data sharing group maintaining a list of approximately the 500 oldest partition/tablespace CLSN values. When DB2 12 performs CLSN lock avoidance checking this list is checked first. If the partition being accessed is in the list, DB2 uses the associated CLSN. Otherwise, the highest CLSN value in the list is used as the global CLSN value for lock avoidance checking.
Nevertheless, the lesson from this, even in DB2 12, is to keep the CLSN values as recent as possible, to maximise the opportunity for lock avoidance and to minimise contention. It’s hard to overestimate how important this is, as locking is an expensive business, especially in data sharing. There are two best practices to follow to maximise lock avoidance:
- Commit as frequently as practical, to avoid long-running URs pushing the CLSN value far into the past.
- Defer any updates to as late as possible in the transaction, to keep the CLSN value for the transaction as recent as possible. This is because the CLSN value of a UR is determined by the time of the first update statement, not the start of the transaction itself. In any event, this is a best practice for minimising contention by holding onto X locks for as short a time as possible.
And that concludes this series on DB2 for z/OS locking for application developers. If you have any questions or comments, please feel free to do so. Likewise, if you want any additional locking topics covered, please ask. I can’t guarantee any answers, but I’ll do my best.
Email your questions to me here
1 In fact, DB2 doesn’t give up on lock avoidance if the latest update on the page is more recent than the start time of the oldest UOR against the partition – it uses additional techniques to check whether or not the row being accessed consists of committed data.