Triton Db2 Geek

Confessions of a DB2 geek


Confession of the Month

JCB vs Mainframe – Seconds out, round 1!

September 14th, 2010

A few years ago, when water-cooled mainframes were still the done thing, a JCB working on a nearby building site accidentally hit the water main serving the main IT centre of a large DB2 for z/OS customer. With no cooling, the mainframe rapidly shut itself down, taking out the production DB2 system in the process. Unfortunately, at the moment DB2 was forced down, a long-running batch process was executing in the Production environment. The process had been written by a very inexperienced programmer, who was attempting to convert an IMS database to DB2 by reading each IMS database segment and inserting the data into some heavily-indexed DB2 tables. The IMS database was BIG, he’d coded no commits or checkpoints, and the batch process had been running for just over 24 hours…


When the local water company had fixed the burst pipe and the mainframe was restarted, DB2 immediately began to back out the 24 hours’ worth of uncommitted changes. Unfortunately, a junior operator decided that DB2 had “hung” because it wasn’t starting properly, so after an hour or so he cancelled the IRLM address space to force DB2 down. Of course, when DB2 was restarted, the first thing it had to do was back out the previous backout attempt, before it could begin backing out the original 24-hour batch process again!


This was repeated several times before we were pulled in to assist. We advised the operators to leave DB2 alone until it had backed out fully, and sat and watched as tape after tape of archive logs was requested. After another 30 hours of backout processing, DB2 reached the oldest archive log recorded in the BSDS (which only held 1,000 logs back then) and was unable to continue with the backout, as it didn’t know which logs to request next! This was somewhat frustrating, as the original batch process had “only” used about 800 logs, so a normal backout would have been possible if that junior operator hadn’t cancelled the backout half a dozen times before calling us…


At that point, we were left with no option but to perform a conditional restart of the production DB2 system to truncate the logs and “cold start” it with no backout processing. Several databases (including the one being populated by the original 24-hour batch process) had to be dropped and recovered to the last known consistent image copies, with a lot of associated business pain.


The moral of this story is never to skimp on proper training for programmers, operators or JCB drivers!
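
For the programmers among you, the missing ingredient in that conversion job was a regular commit. Here is a minimal sketch of the idea in Python, using the standard-library sqlite3 module purely as a stand-in for DB2 so that it runs anywhere. The table name `target`, the function `load_segments` and the commit interval of 1,000 rows are all hypothetical, chosen for illustration, and have nothing to do with the original program.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical interval: the right value depends on the workload and log sizing.
COMMIT_INTERVAL = 1000


def load_segments(
    conn: sqlite3.Connection,
    segments: Iterable[Tuple[int, str]],
    commit_interval: int = COMMIT_INTERVAL,
) -> int:
    """Insert every segment, committing after each `commit_interval` rows.

    Returns the number of commits issued, so the caller can see how the
    work was split into units of recovery.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0  # rows inserted since the last commit
    for seg_id, payload in segments:
        cur.execute(
            "INSERT INTO target (seg_id, payload) VALUES (?, ?)",
            (seg_id, payload),
        )
        pending += 1
        if pending >= commit_interval:
            conn.commit()  # a failure from here on rolls back at most one interval
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial batch
        commits += 1
    return commits


def demo() -> Tuple[int, int]:
    """Load 2,500 made-up rows into an in-memory table; return (commits, rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (seg_id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    segments = ((i, f"segment-{i}") for i in range(2500))
    commits = load_segments(conn, segments, commit_interval=1000)
    rows = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return commits, rows


if __name__ == "__main__":
    print(demo())
```

With an interval like this, a failure leaves at most one interval’s worth of inserts to back out rather than 24 hours’ worth. A genuinely restartable job would also need to record how far it had got at each commit, which this sketch does not attempt.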

