Triton Db2 Geek

Confessions of a DB2 geek

IBM Gold Consultant Program and IBM Premier business Partner

RUNSTATS versus Actuals

June 10th, 2015 - by

Now and again we get a slightly panicky call from one of our clients as they attempt to execute some part of their application and it takes far longer to run than it should, or than it did the last time it was executed. Our first question is usually “have you run REORGs and RUNSTATS recently ?” and the answer is often “yes, of course we have”. The trouble is, occasionally, that there has been considerable activity since these utilities were executed. We’re often told that it’s only been 24 hours (or 12 or even 2) since they were run but, on closer examination, it turns out that a lot has happened in that elapsed time.

Now the optimizer is a very clever piece of software (in fact, I reckon it’s approaching a level of complexity that could be interpreted as Artificial Intelligence) but it can only work with what it’s given. If it’s asked to get data from a table and the values stored in the system catalogs indicate a very low volume of data, it’ll define an access path accordingly. If you’ve loaded hundreds of thousands, or millions of rows of data since the last RUNSTATS, it won’t have any visibility of that and will end up choosing a sub-optimal access path.

It’s almost better if RUNSTATS hasn’t been run at all; if the optimizer is presented with catalog data where the values are all -1 and the STATS_TIME is still NULL, it’ll know that no RUNSTATS have been completed and it’ll make an educated guess about the access path that’s needed; and often a pretty good guess. But if it’s presented with a list of values and it’ll trust these values to be reasonably accurate.

So, rather than asking the user if they’ve run RUNSTATS, maybe give them the option of comparing the actual figures with what RUNSTATS found when it last ran.

The SQL to do this is basically in 2 parts

1. You have some SQL executing against SYSCAT.TABLES to write a new script, which will run against each table in the schema you specify, but which has the existing stats embedded in it

 

RUNSTATS verus Actuals Image 1

 

That looks a bit horrendous but what you actually get out of it is a line for each table in the schema that looks like this (just showing the rows for the first couple of tables here):

 

RUNSTATS versus Actuals Image 2

 

2. You then run this script and it will return the embedded RUNSTATS data plus the counts from the tables you nominated

 

RUNSTATS versus Actuals Image 3

 

In this case the numbers match, but you can see how this would highlight any discrepancy. You could even suck the output data into an Excel spreadsheet or similar and do something fancy to determine by what percentage the RUNSTATS values are ‘out’ and highlight anything that was + or – 10% or so.

It’s not rocket science but it can be a handy way of quickly getting the data to show to the users when this sort of query arises.

And it might keep the T-800s off your back when the Optimizer becomes self-aware and turns on humanity.

If you would like a copy of a Stored Procedure that will provide this output, then please contact Mark Gillis directly.

 

Comments

  • Raanon says:

    Nice post! I was a little confused, though, because “Actuals” in DB2 usually refer to something else (see Explain section actuals).
    Interesting use of the export utility, although I would have just used “db2 -x” on the command line to remove the output headers. Also, you have a dangling ‘UNION all’ line.
    This is how I would do it… (also ugly, but it’s done all in one step)
    db2 -x “VALUES
    ‘with tmp (stats_date, card, count, table) AS
    (values (repeat( ” ”, 8), 0, 0, repeat( ” ”, 20))’
    UNION ALL
    SELECT ‘union all
    select ”’ || VALUE( VARCHAR( DATE( stats_time), iso), CHAR(‘ ‘,8)) || ”’, ‘
    || card || ‘, count, ”’
    || RTRIM( tabschema) || ‘.’ || RTRIM( tabname)
    || ”’ from ‘ || RTRIM( tabschema) || ‘.’ || RTRIM( tabname)
    FROM
    syscat.tables
    WHERE
    type = ‘T’
    AND
    tabschema IN (‘REUTLING’)
    UNION ALL
    VALUES ‘) select * from tmp order by table;’
    ORDER BY 1 DESC” | db2 +p -t

    • Mark Gillis Mark Gillis says:

      Thanks for the replies Raanon. Yours is an AIX solution I think; mine was running as a Windows script. But yours does have the merit of being just one execution, which is nice. I got round that issue for my environment by encapsulating the whole thing in a Stored Proc so it can be done with a single call.
      Sorry about the confusion in the title; “RUNSTATS: Current figures versus catalog stats” might have been clearer. And the hanging UNION ALL is only because I just wanted to show the first few rows of the generated SQL. If I printed the whole thing, it would blow our blog limit.
      Cheers
      Mark

  • Raanon says:

    darn! I see that the quotes got messed up after posting… there should be no back-quotes (single or double) and the double-quotes should be replaces with two single-quotes, except for the first and last ones.

« »

Tag Archives