January 31st, 2017 - by Mark Gillis
Ever heard the expression "If it ain't broke, don't fix it"? One of my least favourites. Flip it over and express it the other way around:
Wait until it breaks and then try and fix it
I don’t think that is something you seriously want to consider in a Production environment, and yet it is something we come across as database support people all the time. A database will perform perfectly well without much intervention, especially in these days of Self-Tuning Memory Management and with (if you like that sort of thing) automated utilities like backups, RUNSTATS and REORGs chugging away in the background. Automatic storage tablespaces grow as required, and little intervention is needed to recover fragmented space.
However, that isn’t a good reason to suppose that your critical database will continue to run without any care and attention at all. There are occasions when a critical error will stop the database dead, but there are also plenty of insidious performance issues that will gradually degrade throughput until the database, to all intents and purposes, stops responding altogether. For example:
- Row locks will accumulate until the database manager decides it has run out of lock-list space, escalates to a table lock, and prevents anyone else from using that table.
- Index fragmentation will mean that, one day, the optimizer decides the index is no longer worth using and reverts to a table scan.
- Some dynamic SQL or even a rebound package might get an access path that degrades it from lightning fast to glacially slow.
You can spot these things coming with some proactive monitoring and, even without commercial software that allows detailed real-time and historical analysis, a few scripts of your own can show a database trending towards trouble.
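As a sketch of what such a script might watch, Db2's own monitoring interfaces expose the symptoms above. The queries below use real Db2 objects (the SYSIBMADM.SNAPDB administrative view and the MON_GET_PKG_CACHE_STMT table function); the thresholds and the idea of scheduling them for trending are illustrative, not a prescription, and they need a live database to run against:

```sql
-- Hedged sketch: two numbers worth trending over time.

-- Lock escalations since the database was last activated;
-- a steadily climbing figure suggests the lock list is under pressure.
SELECT db_name, lock_escals
FROM   SYSIBMADM.SNAPDB;

-- Top five cached statements by total activity time; a statement
-- that suddenly jumps up this list may have picked up a bad access path.
SELECT SUBSTR(stmt_text, 1, 60) AS stmt,
       num_executions,
       total_act_time
FROM   TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS t
ORDER  BY total_act_time DESC
FETCH  FIRST 5 ROWS ONLY;
```

Run on a schedule and logged to a table, even these two queries will show lock escalations creeping up or a statement's elapsed time drifting, long before users start complaining.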
I love an analogy and the one I often use for database work is your own family car. The engine sounds OK; why put any of that expensive oil in? It seems to be hanging in there on the corners; do I really need to fork out for new tyres? Servicing costs are horrendous; it’s running OK today, why shouldn’t it be tomorrow? Or the next day? Or in 6 weeks’ time, when I’m driving the family across Europe?
I admit I’m a pessimist; but if I see a database which hasn’t had a serious outage for a long period of time, I tend to assume we’re due one. I’d do a bit of investigation to see if there’s anything that might break; do a bit of a health check; give it the once-over. Because if it ain’t broke now, you don’t want to be trying to sort it out at 4 a.m. in the middle of a crucial time for the business, with everyone from the CEO down looking over your shoulder.
If it ain’t broke, do a little proactive work to make sure it stays that way. A straightforward health check, or annual service to go back to the car analogy, is a simple way to make sure you pick up any minor issues before they turn into major ones.