DB2 for z/OS Locking for Application Developers Part 10

This is the last article in the series on locking for developers, and is a wrap up of miscellaneous points arising from previous articles.

Row level locking

The first item under discussion is row-level locking. I mentioned previously that the design default lock size should be page-level, with row-level locking only being used where justified. This is to reduce the CPU and elapsed time overhead typically incurred by row-level locking, and especially to avoid the data sharing overhead involved. The DB2 for z/OS documentation has further information about locking in a data sharing environment, but for the purposes of this article it’s important to stress that row level locking in a data sharing environment can and typically does introduce significant overheads.

Row-level locking is sometimes beneficial for update applications or for applications which mix reads and updates, but in most cases there isn’t much – if any – benefit to be gained from using row-level locking instead of page-level. This is for several reasons. For example, in OLTP environment characterised by random access, the rows being retrieved or updated by concurrent transactions might be distributed across the data pages in such a way that there is rarely any page-level contention. Or consider a batch program reading and updating data sequentially: if there is little or no contention with other batch jobs or online transactions, then performance will be worse because more locks will be taken, with a higher CPU cost and longer elapsed times.

However, there are some cases which mandate row-level locking. Amongst the sort of applications in this category are:

Applications where the rows being accessed by concurrent transactions are located on the same page or set of pages, resulting in unacceptable page-level lock contention.
For tables which are subject to high-volume concurrent inserts and which are defined with APPEND and MEMBER CLUSTER, to prevent exponential growth in the size of the table combined with inefficient space use.

Using ROWID and RID functions

You may remember that I suggested that you should be cautious when considering using these two functions, to guard against the case where the ROWID or RID values change between retrieval of a value and its subsequent reuse to access the row again, this time directly. A typical direct row access use case is when using optimistic locking to select and update a row. The reason you need to be careful is that the values for these functions can be changed by a REORG, and in many cases probably will be. However, if you select a ROWID or RID value for a given row and then re-use it for direct row access in the same commit scope, then the use of these functions is safe.

Materialised result sets

To return to materialised result sets, another consideration for applications reading but not updating rows via a read-only cursor on a materialised result set is the fact that the application doesn’t know which version of the row it’s reading: a ‘hot’ row that is frequently updated by other transactions could be updated one or more times since the result set was materialised, depending on the elapsed time between result set materialisation and the row fetch operation. In effect, the application doesn’t know if the row data values are current or have been updated since the result set was materialised. An application reading and then updating the row using safe optimistic locking techniques is protected, but this is a more difficult problem for the read-only application where data currency is important. In an OLTP environment, the number of affected applications is probably quite small, but in some cases, this could be an important consideration. Having cautioned against access-path dependent cursors, one way to protect against this is to make sure there is at least one index that satisfies the predicates used. As always, the need for the index has to be balanced against the cost of index maintenance, if you’ve decided that the benefit of relying on the index offsets the risk of access path changes. Remember, the consideration here is data currency and not performance – this is not a recommendation to avoid index usage.

RR/RS and non-materialised result sets

There some considerations for RR/RS applications and non-materialised result sets. If the result set is materialised, then the application knows all the data row values were current as of cursor open time, and will not change while the application is running because the share locks acquired at cursor open time will be retained until the next commit. However, for a non-materialised result set which takes a long time to consume, it is quite possible that rows are updated between cursor open time and row fetch time, when the rows are evaluated and the locks are acquired. Again, this probably affects a very small set of applications.

Lock avoidance

I’ve hinted at lock avoidance in several previous articles, without going into any detail about the lock avoidance mechanism or the beneficial effects it can have on performance, especially in data sharing. Here I want to provide just enough information to help application developers understand the role they play in ensuring effective lock avoidance. Lock avoidance is a technique used by DB2 to avoid taking locks for read-only requests where possible. This does not imply an uncommitted or dirty read, but it does mean that DB2 will avoid taking a lock for a read-only request if it can guarantee that the pages where the target rows are located are logically and physically consistent. This is important for two reasons: it can improve concurrency and reduce contention by avoiding unnecessary S locks which can delay updaters; and it avoids the CPU overhead of lock acquisition and release, and more importantly, of contention management, especially in a data sharing environment.

The discussion in this article is limited to just one part of the lock avoidance mechanism, known as the commit log sequence number or CLSN. In a non-data sharing environment, DB2 tracks the time of the latest update to each page in a partition or tablespace (for segmented and simple tablespaces). It also tracks the start time of the oldest uncommitted unit of work that has updated the partition or tablespace – this is the CLSN for this page set. This updating unit of work is known as a Unit of Recovery, or UOR (sometimes called a UR). When these two values are compared, if the time of the update to the data page is earlier than the start time of the oldest UOR updating the partition or tablespace, then DB2 knows all the data on the page has been committed and can therefore employ lock avoidance¹. Otherwise, DB2 is unable to use lock avoidance for rows on this page.

The CLSN checking part of the lock avoidance mechanism operates differently in a data sharing environment, but the behaviour changes somewhat in DB2 12. To start off with the behaviour prior to DB2 12, when a partition or tablespace is group buffer pool-dependent (at least two members accessing the object, with at least one of these updating the object), then DB2 uses a global commit log sequence number (global CLSN) value, which is in fact the earliest start time of all uncommitted URs, across all members, across all page sets. This means that a single long-running UR can make lock avoidance less effective for all applications running in a data sharing group: the earlier the global CLSN value, the more likely it is that the most recent update to a page is later than the global CLSN, meaning that lock avoidance is less likely to be successful, even with the additional checking DB2 does.

DB2 12 makes a big step forward in reducing the impact of long-running URs, with each member of the data sharing group maintaining a list of approximately the 500 oldest partition/tablespace CLSN values. When DB2 12 performs CLSN lock avoidance checking this list is checked first. If the partition being accessed is in the list, DB2 uses the associated CLSN. Otherwise, the highest CLSN value in the list is used as the global CLSN value for lock avoidance checking.

Nevertheless, the lesson from this, even in DB2 12, is to keep the CLSN values as recent as possible, to maximise the opportunity for lock avoidance and to minimise contention. It’s hard to overestimate how important this is, as locking is an expensive business, especially in data sharing. There are two best practices to follow to maximise lock avoidance:

Commit as frequently as practical, to avoid long-running URs pushing the CLSN value far into the past.
Defer any updates to as late as possible in the transaction, to keep the CLSN value for the transaction as recent as possible. This is because the CLSN value of a UR is determined by the time of the first update statement, not the start of the transaction itself. In any event, this is a best practice for minimising contention by holding onto X locks for as short a time as possible.

And that concludes this series on DB2 for z/OS locking for application developers. If you have any questions or comments, please feel free to do so. Likewise, if you want any additional locking topics covered, please ask. I can’t guarantee any answers, but I’ll do my best.

Email your questions to me here

Footnotes

¹ ^{In fact, DB2 doesn’t give up on lock avoidance if the latest update on the page is more recent than the start time of the oldest UOR against the partition – it uses additional techniques to check whether or not the row being accessed consists of committed data.}

4 thoughts on “DB2 for z/OS Locking for Application Developers Part 10”

Amit Maheshwari says:

February 22, 2022 at 11:31 am

hi,

Is there a way I can create views which will lock the table only for read and not blocking any of the insert/update/delete operations on the table.

Regards,

Amit Maheshwari

Reply
Gareth Copplestone-Jones says:

February 23, 2022 at 2:39 pm

Hi Amit,

There are no fool-proof ways to do this, as even if you define the view as read-only, the key factor here is the way the application referencing the view is coded. Exactly the same rules apply for the result tables returned when selecting from a view as from a base table. To avoid any contention, use ISOLATION(UR) for the application selecting from the view, but only if your application can tolerate uncommitted updates. To minimise contention but guarantee that you’re application only sees committed data, make sure the base table has the LOCKSIZE ANY/PAGE/ROW attribute, and bind your application with ISOLATION(CS) and CURRENTDATA(NO). Please read all the caveats in the blog about the implications of CURRENTDATA(NO) and result set materialisation time.

Gareth

Reply
STALLONE says:

June 17, 2022 at 7:27 pm

Does -904 occur on a table from TRANSACTION-B while trying to insert into that table which TRANSACTION-A has locked it.

We observe a stored procedure keeps erroring with -904 during insert into a table with reason code# 00C90090 & TYPE# 00000000.

When we sought DBA’s help for diagnosis, DBA informed us that sqlcode -904 occurred due to a transaction which ran for a long time and locked the table with little or no commits

Below was the log info shared by DBA
12.27.03 STC21902 DSNI031I – DSNILKES – LOCK ESCALATION HAS OCCURRED 469

469 FOR

469 RESOURCE NAME = EXCGDB01.EXCGTS13

469 LOCK STATE = X

469 PLAN NAME : PACKAGE NAME = DISTSERV : SYSSH200

469 COLLECTION-ID = NULLID

469 STATEMENT NUMBER = 00000001

469 CORRELATION-ID = DataFlowEngi

469 CONNECTION-ID = SERVER

469 LUW-ID = AC1100B1.P322.220514162430

469 THREAD-INFO =

469 MBEXGP1:liibp362:MBEXGP1:DataFlowEngine:DYNAMIC:96237854:*:<172.17.0.

469 177.37666.220514162430

469 PARTITION-INFO = PART 1 AND 1 OTHER PARTS ESCALATED

12.27.09 STC21902 DSNJ031I – DSNJW011 WARNING – UNCOMMITTED UR 471

471 HAS WRITTEN 1000000 LOG RECORDS –

471 CORRELATION NAME = DataFlowEngi

471 CONNECTION ID = SERVER

471 LUWID = AC1100B1.P322.220514162430 = 165929

471 PLAN NAME = DISTSERV

471 AUTHID = MBEXGP1

471 END USER ID = MBEXGP1

471 TRANSACTION NAME = DataFlowEngine

471 WORKSTATION NAME = liibp362

Any thoughts on how to overcome this -904 error

Reply
1. Gareth Copplestone-Jones says:
  
  June 21, 2022 at 10:15 am
  
  Firstly, it looks like you are running IBM Integration Bus or WebSphere Message Broker to connect applications to each other, and one of the applications connecting to Db2 via WMB seems to be the cause of the problem. As you can see from the messages, this distributed transaction is running under the MBEXGP1 authorisation ID with the workstation name of libp362. Be careful with this workstation name – it might have been assigned by the application, so might not be a real workstation name. Nevertheless, this information might help you identify the application. That is the first step to resolving this.
  Secondly, we can see that this transaction has written 1,000,000 log records without committing – that means, 1,000,000 updates without committing. This doesn’t necessarily mean 1,000,000 locks, if page-level locking is used. Nevertheless, with or without lock escalation, it’s almost inevitable that this transaction will cause locking suspensions and probably timeouts for other concurrent transactions. Even worse, if TRANSACTION-A, the long-running transaction, is timed out, then it will hve to back out all its updates, causing even more disruption. It is correct that after lock escalation, the tablespace EXCGDB01.EXCGTS13 is locked by TRANSACTION-A with an exclusive lock (a tablespace X lock, or a number of partition X locks, which is equivalent if all partitions are locked). In this case, inserts into the table will fail because the inserting transaction is unable to obtain an intent lock.
  The only reliable way to avoid these -904 unavailable resource conditions, and stored procedure failures, is to modify the long-running transaction (TRANSACTION-A) so that it commits more frequently. If the only other access to the table is insert processing, then an alternative is to alter the tablespace to set LOCKMAX 0. This is a high-risk option and not recommended, as it means long-running transactions can acquire very large numbers of locks, leading to a other problems, including putting the system under stress.
  
  Reply