Five Days in the labs – Part 3. DB2 pureScale Insert Load & Replication

By Clair Ross

Following on from James Gill’s recent blog post about the Coupling Facility today I’m sharing with you our experiences of working with Insert Load & Replication in DB2 pureScale. 

In order to test the load capacity of the IBM DB2 pureScale machine we needed some test data and lots of it!  Being somewhat familiar with the capabilities of the CF on z/OS I knew that a few thousand records would not be enough.

After some checking about, I decided to “stand on the shoulders of giants” and use the EMPLOYEE table from the DB2 SAMPLE database.  Simple, elegant and to the point.  The only problem was that with the current “out of the box” table definition, there would only be 999,999 records that could be produced.

Column name Schema  Data type name Length Scale Nulls
EMPNO   SYSIBM CHARACTER  6  0 No
FIRSTNME                                                 SYSIBM VARCHAR      12   0 No
MIDINIT                                                 SYSIBM CHARACTER  1 0 YES
LASTNAME                                            SYSIBM  VARCHAR  15 0 NO
WORKDEPT              SYSIBM CHARACTER           3 0 YES
PHONENO                      SYSIBM     CHARACTER    4 0 YES
HIREDATE                                               SYSIBM DATE  4 0 YES
JOB                                              SYSIBM   CHARACTER   8 0 YES
GENDER SYSIBM  CHARACTER     1 0 YES
BIRTHDATE                                       SYSIBM DATE    4 0 YES
SALARY                                           SYSIBM DECIMAL    9 2 YES
BONUS                                      SYSIBM DECIMAL  9 2 YES
COMM                                             SYSIBM     DECIMAL    9 2 YES

14 record(s) selected.

With a little transformation, we changed the EMPNO to be CHAR(12).  This would allow for the increase of data.  (a possible 999,999,999,999 records)

Column name Schema  Data type name Length Scale Nulls
EMPNO                                SYSIBM  CHARACTER     12 0 NO
FIRSTNME                                      SYSIBM VARCHAR    12 0 NO
MIDINIT                           SYSIBM    CHARACTER        1 0 YES
LASTNAME   SYSIBM  VARCHAR  15 0 NO
WORKDEPT           SYSIBM   CHARACTER    3 0 YES
PHONENO  SYSIBM  CHARACTER   4 0 YES
HIREDATE       SYSIBM   DATE       4 0 YES
JOB      SYSIBM  CHARACTER     8 0 YES
EDLEVEL   SYSIBM   SMALLINT 2 0 NO
GENDER SYSIBM  CHARACTER     1 0 YES
BIRTHDATE  SYSIBM   DATE       4 0 YES
SALARY     SYSIBM  DECIMAL         9 2 YES
BONUS     SYSIBM   DECIMAL    9 2 YES
COMM    SYSIBM  DECIMAL 9 2 YES

14 record(s) selected. 

We exported the data into a text delimited file (all 41 records), created a table “LIKE” the employee table, then altered the definition.

I wrote a Q&D Perl script to iterate through the delimited data file and reassign the new EMPNO value.  After a little testing, I just started the program and then turned my attention to items.

After a period of time, I checked the progress and we had approximately 8.5 million records.  I then set up a db2batch script to load the data from the single file.

Other loading strategies attempted were made by splitting the 8.5 million records into multiple files and loading from a single member, then from multiple members.   Autocommit was initially on for the first run, but was switched off and commits were performed at every 10K records.  For the multiple member loads the commit count was at every 500 records.

After the initial load, James suggested that the data file be partitioned three ways for the three data members and to test for contention on the single table.

This was set up quickly and we were good to go.

  • Share/Bookmark

No Comments | Filed under Availability & disaster recovery, Clair Ross, DB2, DB2 LUW, Five days in the labs, db2 pureScale, pureScale

Five Days in the labs – Part 2. DB2 pureScale Coupling Facility

By James Gill

We were lucky enough to get an opportunity to spend a week in the IBM Boeblingen Labs in Germany, to get some hands on experience with DB2 9.8 – pureScale – or Data Sharing on mid-range. 

The platform that we were working with was a five node configuration – two coupling facilities (CFs) and three DB2 member nodes. This was all implemented in three partitioned pSeries p550 servers, with 1TB of disk supporting it. 

On z/OS, the performance and behaviour of the CFs in a Data Sharing Group (DSG) can have an enormous impact on the overall viability of the solution – both in terms of availability and performance. 

In DB2 for z/OS, the CF configuration works as a client-server model. Members in the DSG request actions from the CF, which manages the structures, as well as the client requests.

These requests are delivered to the CF by scheduling them through XES, which queues them for delivery to the CF over XCF. The CF receives the request and can potentially queue them depending on its current workload. The request is dispatched, the answer is resolved and the response is returned to the requester through the XCF transport and XES interface. 

The model is different with pureScale, where the CF is used as a remote data cache – the intelligence being retained in the group members. 

On DB2 pureScale, the CF structures are accessed using the Infiniband (IB) remote direct memory access (RDMA) protocol. This allows a requester on one box to directly interact with a preconfigured data area on another box. Further, the protocol and data access is all managed by the IB cards, without having to interrupt the CPUs to complete any of the operations. This is extremely efficient, especially when coupled with the low latency of IB. So whilst the CFs are presented as simple data areas, the performance of the member interactions is limited only by the capacity in the IB network and horsepower in the IB cards required to access the data areas.

CF implementation in DB2 pureScale has been achieved with three processes:

  • ca-server
  • ca-mgmnt-lwd
  • ca-wdog

These are not currently covered in the DB2 pureScale documentation, but it seems reasonable that one owns the CF structures, one provides restart monitoring and the other provides operational information and management capabilities to the member nodes.

Assessing the performance of the CF is tricky, as the IB network performance is difficult to directly diagnose. As the network stack is not directly involved in the RDMA conversation, tools like netstat do not provide any insight. Further, it does not currently seem possible to detect queuing depths for RDMA requests in the CF IB card.

Having said that, there is a wealth of information available relating to the CF structures themselves and usage levels through enhancements to the db2pd tool. The following are example commands that we used to understand the impact of our workload on the CFs:

db2pd –cfinfo

db2pd –db sample –cfinfo gbp

db2pd –db sample –cfinfo lock

db2pd –db sample –cfinfo lockv

db2pd –db sample –cfinfo list

db2pd –db sample –cfinfo sca

db2pd –db sample –cfinfo 128 perf

We’re looking forwards to the documentation update so that some of the information produced will make more sense!

Note that a lot of the basic information returned by these commands is also available in the SYSIBMADM views implemented in DB2 pureScale. More on these in the following blogs.

We were very impressed by the performance and resilience of DB2 pureScale whilst working in the Labs. These are the main focus of the next two blogs.

  • Share/Bookmark

1 Comment | Filed under Availability & disaster recovery, DB2, Five days in the labs, James Gill, capacity planning, db2 pureScale, pureScale

Five Days in the Labs – Part 1

Last month we were delighted to be given the opportunity to visit IBM’s research and development centre in Boeblingen, Germany.  Before I get on to what exactly we were doing there I’d like to share a little information about the centre itself as it’s quite a place!  Opened in 1953, Boeblingen is host to a whopping 2000 IT specialists, electrical engineers and physicists working on over 40 projects!  Their focus is on the development of next generation microprocessors, mainframes and supercomputers and enterprise software to control business processes.  In addition, Boeblingen is one of the largest LINUX and SAP integration centres within IBM.  So we were rightly excited about the visit. 

The reason for the trip was to carry out our own research and development work on IBM’s newest technology release – IBM DB2 pureScale.  Ever since hearing the first announcement last year, our team of DB2 experts have been itching to get their hands on the software and see what it can do.  The most exciting thing about this trip was that we were to be the first IBM Business Partners in Europe to get a look at DB2 pureScale in action – what a fantastic opportunity!

So, three of our DB2 experts set off for five days in the labs.  After a warm welcome from the pureScale team in Boeblingen, it was soon down to business.  As an R&D team the brief was to gain practical experience in the installation, operation and use of a DB2 pureScale environment, as well as testing DB2 pureScale for scalability and resilience.

Meet the team

Iqbal Goralwalla – Head of DB2 Midrange Solutions
Specialist DB2 pureScale subject – DB2 Self-Tuning Memory Manager (STMM), Bufferpools and Workload Balancing

James Gill – DB2 for z/OS and data sharing expert
Specialist DB2 pureScale subject – Coupling Facility

Clair Ross – DB2 Midrange expert
Specialist DB2 pureScale subject – Insert Load & Replication

Over the next few weeks the team will be blogging about their specialist subjects and what they have learned during their five days in the labs.

  • Share/Bookmark

2 Comments | Filed under Availability & disaster recovery, Clair Ross, DB2, Five days in the labs, Iqbal Goralwalla, James Gill, db2 pureScale, pureScale