Aug 06, 2013 | Posted by Iqbal Goralwalla

Less Caffeine, More Compression

DB2 10.5 for Linux, UNIX and Windows features exciting and groundbreaking data management innovation for analytic processing. The addition of IBM’s BLU Acceleration technology makes it easier, cheaper and faster to analyse massive amounts of data.


The seven big ideas of DB2 with BLU Acceleration aim to provide extreme performance, lower operating costs and hardware optimisation. Let’s take a look at each in more detail:

1. Simple to Implement and Use
With BLU Acceleration, the complexity of analytic database design and tuning, with its tedious and mind-boggling iterative tasks, disappears! You simply create your BLU tables (existing row-organized tables can easily be converted to BLU column-organized tables), load the data and you are all set to go. There are no partitioning and compression strategies to design and implement, no performance-enhancing objects like indexes and aggregate tables to worry about, and no tuning of memory and I/O. It’s more about what you cannot do than what you can do with BLU Acceleration, in a positive sense of course.

Furthermore, BLU tables, with their columnar data store, co-exist with non-BLU tables in the core DB2 engine, and the optimiser is aware of both. Both types of table can be accessed using the same SQL and language interfaces, process model, storage, memory and utilities. You decide which type of table best suits your application workload. This makes it very easy to create a reporting database from a transactional database within a single instance of DB2 to enable operational reporting: no ETL process is needed, no additional software or system is required, and there are no transformations, integration or quality cleansing to perform. Similarly, it becomes quite easy to create and load a BLU in-memory data mart for reporting within a single instance of DB2.
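
As a concrete illustration, here’s a minimal sketch using the ibm_db Python driver; the connection string, table name and column names below are my own placeholders, not anything from the product documentation.

    import ibm_db

    # Placeholder connection details; adjust for a real system.
    conn = ibm_db.connect(
        "DATABASE=MYDB;HOSTNAME=localhost;PORT=50000;"
        "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret", "", "")

    # A column-organized (BLU) table: the only change from a
    # conventional CREATE TABLE is the ORGANIZE BY COLUMN clause.
    ibm_db.exec_immediate(conn, """
        CREATE TABLE sales_fact (
            sale_date DATE,
            store_id  INTEGER,
            amount    DECIMAL(12, 2)
        ) ORGANIZE BY COLUMN
    """)

    ibm_db.close(conn)

Setting the DB2_WORKLOAD registry variable to ANALYTICS before creating the database makes column-organized the default for new tables, and the db2convert utility that ships with DB2 10.5 converts existing row-organized tables.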


2. Compression and Computer-Friendly Encoding
Multiple compression techniques are combined in DB2 10.5 with BLU Acceleration to create a near-optimal compression strategy. This results in a significant reduction in disk space, even compared to the Adaptive Compression introduced in DB2 10.1. Register-friendly encoding has also been introduced, which dramatically improves efficiency: the encoded values are packed into bits matching the register width of the CPU, and they do not need to be decompressed during evaluation, since predicates and joins work directly on the encoded values. All of this results in fewer I/Os, better memory utilisation and fewer CPU cycles to process the data, and of course that translates into performance gains.
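
To make “predicates work directly on encoded values” concrete, here’s a toy, self-contained sketch of order-preserving dictionary encoding; it illustrates the general technique, not DB2’s actual internals.

    # Order-preserving dictionary encoding: an equality (or range)
    # predicate can be evaluated on the compact codes alone, without
    # ever decompressing back to the original strings.
    values = ["CLOSED", "OPEN", "PENDING", "OPEN", "CLOSED", "PENDING"]

    # Sorted distinct values -> small integer codes (order-preserving).
    dictionary = {v: code for code, v in enumerate(sorted(set(values)))}
    encoded = [dictionary[v] for v in values]      # [0, 1, 2, 1, 0, 2]

    # WHERE status = 'OPEN': encode the literal once, compare codes only.
    target = dictionary["OPEN"]
    matching_rows = [i for i, c in enumerate(encoded) if c == target]
    print(matching_rows)                           # [1, 3]

Each code here needs only two bits, so dozens of them can be packed into a single CPU register, which is exactly what register-friendly encoding exploits.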


3. Multiply the Power of the CPU
DB2 with BLU Acceleration uses hardware optimisation in the form of Parallel Vector Processing, a combination of multi-core and SIMD (Single Instruction, Multiple Data) parallelism, to boost performance for all the data types found in Big Data. More specifically, using hardware instructions, DB2 with BLU Acceleration can apply a single instruction to many data elements simultaneously by leveraging the SIMD parallelism found on the latest chips. It also pays careful attention to the physical cores of the server, so that queries on BLU tables are automatically parallelised across cores.
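
As a loose analogy (not DB2 code), NumPy’s vectorized operations have the same shape of computation: one operation applied across many data elements in a single call.

    import numpy as np

    # One comparison applied to the whole column at once, analogous to
    # SIMD applying one CPU instruction to many packed data elements.
    amounts = np.array([12.5, 99.0, 47.3, 150.2, 8.9])
    mask = amounts > 50.0
    print(mask)        # [False  True False  True False]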


4. Core-Friendly Parallelism
In DB2 10.5 with BLU Acceleration, very close attention has been given to multi-core parallelism, in recognition that the number of CPU cores on a single server continues to increase. DB2 with BLU Acceleration is designed from the ground up to take advantage of all the cores on your server and to always drive multi-core parallelism for your queries. Access plans on column-organized tables leverage all of the cores on the server simultaneously, in shared memory, to deliver better analytic query performance. Central to this idea is shifting the inflection point at which you need to leverage MPP technologies, so that smaller warehouses and marts holding around 10 TB of raw data can fit on a single system. This lowers costs and makes it easier to manage data marts on a single server, without the overhead of managing a cluster or logical partitions (continuing the simplicity theme of Idea 1).
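
A rough analogy in Python (again, illustrative rather than DB2 internals): split a column scan into one chunk per core and combine the partial results.

    from multiprocessing import Pool, cpu_count

    def partial_sum(chunk):
        """Each worker scans and aggregates its own slice of the data."""
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        cores = cpu_count()
        size = len(data) // cores
        chunks = [data[i * size:(i + 1) * size] for i in range(cores)]
        chunks[-1].extend(data[cores * size:])    # remainder, if any
        with Pool(cores) as pool:
            print(sum(pool.map(partial_sum, chunks)))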

In DB2 10.5 with BLU Acceleration, memory latency and memory access patterns have also been carefully analysed. For BLU Acceleration, even main memory access is too slow; it is something to be avoided. BLU Acceleration is designed to access data on disk very rarely, access RAM only occasionally, and do the overwhelming bulk of its processing on data and instructions that reside in a CPU cache.


5. Column Store
DB2 with BLU Acceleration adds columnar capabilities (column-organized tables) to DB2 databases. Table data is stored column-organized rather than row-organized. With a columnar format, a single page stores the values of just a single column, which means that when the database engine performs I/O to retrieve data, it performs I/O only for the columns the query actually references. As a query progresses through the pipeline, the working set of pages is reduced. This can save a lot of resources when processing certain kinds of queries, especially analytic ones. By using a column store with encoding, DB2 gains an additional level of compression that leads to even more I/O minimisation. In addition, recall from the previous ideas that DB2 can perform predicate evaluation against compressed data using its late-materialisation capabilities, which further reduces the amount of I/O and processing that needs to be performed. As the saying goes, the best I/O is no I/O!
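
A toy illustration of why this matters (table and column names are made up): with each column stored separately, a two-column aggregate never touches the other columns’ pages at all.

    # Columnar layout modelled as a dict of column arrays (toy example).
    table = {
        "sale_date": ["2013-08-01", "2013-08-01", "2013-08-02"],
        "store_id":  [1, 2, 1],
        "amount":    [120.00, 80.00, 45.50],
        "comments":  ["...", "...", "..."],   # wide column, never read below
    }

    # SELECT store_id, SUM(amount) FROM sales GROUP BY store_id
    totals = {}
    for store, amount in zip(table["store_id"], table["amount"]):
        totals[store] = totals.get(store, 0.0) + amount
    print(totals)      # {1: 165.5, 2: 80.0}
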
6. Scan-Friendly Memory Caching
Effectively caching data in memory has historically been difficult for systems that hold more data than memory, especially for analytic workloads. DB2 10.5 with BLU Acceleration has new algorithms that cache interesting data in RAM effectively; the data can be larger than RAM, and there is no need to ensure it all fits in memory. When DB2 accesses column-organized data, it automatically uses its scan-friendly memory-caching algorithm to decide which pages should stay in memory in order to minimise I/O, rather than an LRU-based algorithm, which is good for OLTP but not as well optimised for analytics. What makes this big idea so unique is that DB2 automatically adapts the way it operates based on the organization of the table (row-organized or column-organized) being accessed. Again, the simplicity theme from Idea 1 is repeated here: the DBA doesn’t have to do anything; there are no optimisation hints to give and no configuration parameters to set; it just happens automatically.
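
To see why plain LRU is a poor fit for analytics, here’s a self-contained sketch (illustrative, not DB2’s actual algorithm): a single large scan flushes every hot page out of an LRU cache, which is exactly the behaviour a scan-friendly policy avoids.

    from collections import OrderedDict

    class LRUCache:
        """Minimal LRU page cache (illustrative only)."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()

        def access(self, page):
            if page in self.pages:
                self.pages.move_to_end(page)      # hit: refresh recency
                return
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)    # evict least recently used
            self.pages[page] = True

    cache = LRUCache(capacity=4)
    for page in ["hot1", "hot2", "hot3", "hot4"]:
        cache.access(page)
    for i in range(100):              # one large table scan...
        cache.access(f"scan{i}")
    print(list(cache.pages))          # ...and every hot page is gone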


7. Data Skipping
With data-skipping technology, DB2 10.5 can automatically skip over non-qualifying data, because it keeps metadata describing the minimum and maximum data values in large sections of data. This enables DB2 to automatically detect large sections of data that don’t qualify for a query and to effectively ignore them. Data skipping can deliver an order-of-magnitude saving across compute resources (CPU, RAM and I/O). Again, in keeping with the simplicity theme, there is no DBA action needed to define the minimum and maximum values for the sections of data: it is truly invisible.
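
The mechanism is easy to sketch (illustrative code, not DB2’s synopsis implementation): keep min/max metadata for each section of rows, and skip any section whose range cannot contain a qualifying value.

    # Each section of rows carries min/max metadata for a column.
    sections = [
        {"min": 1,   "max": 90,  "rows": [5, 42, 90, 17]},
        {"min": 100, "max": 250, "rows": [100, 175, 250]},
        {"min": 300, "max": 420, "rows": [301, 420, 333]},
    ]

    def scan(sections, lo, hi):
        """Return values in [lo, hi], skipping whole sections via metadata."""
        out = []
        for s in sections:
            if s["max"] < lo or s["min"] > hi:
                continue                  # whole section skipped: no I/O
            out.extend(v for v in s["rows"] if lo <= v <= hi)
        return out

    print(scan(sections, 150, 320))   # [175, 250, 301]; first section skipped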


So, what are the downsides of DB2 10.5? Well, with my analytic query workload running 45x faster with BLU Acceleration in DB2 10.5, I no longer have an excuse for my usual coffee run!


That aside, I believe that DB2 10.5 with BLU Acceleration is a real leap forward for DB2 LUW, and I’m excited about working with customers to help them get the best out of these great new features.


We have recorded a podcast series with IBM’s George Baklarz covering the 7 Big Ideas of DB2 10.5 with BLU Acceleration. Listen to the full series on our website.
