April 22nd, 2013 - by Iqbal Goralwalla
The IBM BLU Acceleration technology, announced on April 3, 2013, features innovations from the IBM Research and Development Labs that address the common Big Data issues of size, development time, and response time. This exciting, groundbreaking data management innovation for analytic processing will first be available in the new DB2 10.5 release for LUW. It introduces columnar capabilities to DB2 databases, where table data is stored column-organised rather than row-organised. It's fast, simple to implement and use, low in operating costs, and above all, it's embedded in the core DB2 engine. Let me expound on this:
1. Extreme Compression – this addresses the Big Data issue of size.
- Multiple compression techniques are combined in DB2 10.5 with BLU Acceleration to create a near-optimal compression strategy. This results in a significant reduction in disk space, even compared to the Adaptive Compression that was introduced in DB2 10.1.
- Register-friendly encoding has been introduced which dramatically improves efficiency. The encoded values are packed into bits matching the register width of the CPU. These encoded values do not need to be decompressed during evaluation. Predicates and joins work directly on encoded values. All this results in fewer I/Os, better memory utilization, and fewer CPU cycles to process. And of course this results in performance gains.
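To make the idea of working directly on encoded values concrete, here is a minimal sketch in Python. It is not IBM's implementation: the order-preserving dictionary, the 64-bit word width, the function names and the sample data are all assumptions chosen for illustration. It shows a range predicate being evaluated against packed integer codes without ever turning them back into strings.

```python
from bisect import bisect_left, bisect_right

WORD_BITS = 64  # assumed register width, for illustration only


def build_dictionary(values):
    """Sorted distinct values; a value's code is its position in this list."""
    return sorted(set(values))


def encode(values, dictionary):
    """Map each value to its order-preserving integer code."""
    return [bisect_left(dictionary, v) for v in values]


def pack(codes, bits_per_code):
    """Pack fixed-width codes into integers of at most WORD_BITS bits."""
    per_word = WORD_BITS // bits_per_code
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for slot, code in enumerate(codes[start:start + per_word]):
            word |= code << (slot * bits_per_code)
        words.append(word)
    return words


def count_less_equal(words, n_codes, bits_per_code, limit_code):
    """Count codes <= limit_code by reading packed words, never decoding."""
    per_word = WORD_BITS // bits_per_code
    mask = (1 << bits_per_code) - 1
    matches = 0
    for i in range(n_codes):
        code = (words[i // per_word] >> ((i % per_word) * bits_per_code)) & mask
        if code <= limit_code:
            matches += 1
    return matches


# Hypothetical column of state codes.
states = ["NY", "CA", "TX", "CA", "NY", "WA", "CA", "TX"]
dictionary = build_dictionary(states)
bits = max(1, (len(dictionary) - 1).bit_length())
words = pack(encode(states, dictionary), bits)

# Translate the predicate state <= 'NY' into code space once, then scan codes.
limit_code = bisect_right(dictionary, "NY") - 1
matches = count_less_equal(words, len(states), bits, limit_code)
print(matches)
```

Because the dictionary is sorted, comparing codes gives the same answer as comparing the original values, which is what lets the scan stay in the compressed domain. The loop here inspects one code at a time for readability; a real engine would compare many codes per register at once.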
2. Fast reporting – this addresses the Big Data issue of response times. DB2 with BLU Acceleration comes with:
- Dynamic in-memory columnar data store. This results in:
- Minimal I/O being performed only on the columns and values that match the query. As the query progresses, the working set of pages is reduced. As the saying goes, the best I/O is no I/O!
- Work is performed directly on columns. Rows are not materialized until absolutely necessary to build the result set.
- Columnar data is kept compressed in memory which means more data can fit in memory.
- Data can be larger than the available RAM. There is no requirement (unlike the competition) to ensure all data fits in memory. The technology intelligently moves data from storage to memory as needed, thereby delivering in-memory performance without the limitations of an in-memory only system.
- Smart data skipping that eliminates unnecessary processing of irrelevant data. DB2 with BLU Acceleration automatically detects large sections of data that do not qualify for a query and can be safely ignored. This results in order-of-magnitude savings in I/O, RAM, and CPU.
- Hardware optimisation using Parallel Vector Processing (a combination of multi-core parallelism and SIMD, or Single Instruction Multiple Data, parallelism) to boost performance for all data types found in Big Data. More specifically:
- Using hardware instructions, DB2 with BLU Acceleration can apply a single instruction to many data elements simultaneously by leveraging SIMD parallelism found on the latest chips.
- DB2 with BLU Acceleration pays careful attention to physical cores of the server so that queries on BLU tables are automatically core-parallelised.
With all of the above, you are on course to get super-fast response times for your analytic queries. Yes, you no longer need to go for that cup of coffee while you wait for your queries to complete!
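The data skipping described above can be pictured with a small Python sketch. The post does not describe how DB2 tracks which sections of data qualify, so this assumes a generic scheme: keep the minimum and maximum value for each fixed-size block of a column and skip any block whose range cannot overlap the predicate. The block size, function names and sample data are hypothetical.

```python
def build_synopsis(column, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    synopsis = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        synopsis.append((start, start + len(block), min(block), max(block)))
    return synopsis


def scan_range(column, synopsis, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for start, end, block_min, block_max in synopsis:
        if block_max < low or block_min > high:
            continue  # no value in this block can qualify, so skip it
        blocks_read += 1
        hits.extend(v for v in column[start:end] if low <= v <= high)
    return hits, blocks_read


# Hypothetical column loaded in date order, four rows per year.
order_year = [2009] * 4 + [2010] * 4 + [2011] * 4 + [2012] * 4
synopsis = build_synopsis(order_year, block_size=4)

hits, blocks_read = scan_range(order_year, synopsis, 2012, 2012)
print(len(hits), "matching rows;", blocks_read, "of", len(synopsis), "blocks read")
```

In this toy case only one of the four blocks is touched. The benefit depends on how well the data is clustered: if every block contained every year, nothing could be skipped.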
3. Simplicity – this addresses the Big Data issue of development time and leads to lower operating costs. With BLU Acceleration, the complexity of analytic database design and tuning, which involves tedious and mind-boggling iterative tasks, disappears! You simply create your BLU tables (existing row-organized tables can easily be converted to BLU column-organized tables), load the data, and you are all set to go. Now, hold tight to your chair when you hear this:
- No partition and compression strategies to design and implement.
- No performance enhancing objects like indexes and aggregate tables to worry about. To spell it out:
- No indexes
- No MDC
- No MQTs
- No Materialized Views
- No tuning of memory and I/O
- No statistical views
- No optimizer hints
- No Statistics collection (it’s automated)
- No REORGs (it’s automated)
All of the above makes it very easy to create a reporting database from a transactional database within a single instance of DB2 to enable operational reporting. Operational reporting then needs no ETL process, no additional software or systems, and no transformation, integration or quality cleansing. Similarly, it becomes quite easy to create and load a BLU in-memory data mart for reporting within a single instance of DB2 from an analytics workload.
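As a rough illustration of how little there is to design, the Python sketch below builds the kind of statement used to create a BLU table. To my understanding, DB2 10.5 marks a table as column-organized with the ORGANIZE BY COLUMN clause on CREATE TABLE. The helper function, table name and columns are hypothetical, and the script only prints the statement rather than connecting to a database.

```python
def column_organized_ddl(table, columns):
    """Build a CREATE TABLE statement ending in ORGANIZE BY COLUMN."""
    column_list = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n  {column_list}\n) ORGANIZE BY COLUMN"


# Hypothetical fact table for a reporting data mart.
ddl = column_organized_ddl(
    "sales_fact",
    [
        ("order_id", "INTEGER NOT NULL"),
        ("order_date", "DATE"),
        ("region", "VARCHAR(20)"),
        ("amount", "DECIMAL(12,2)"),
    ],
)
print(ddl)
```

Note what is absent from the statement: no index definitions, no partitioning clause and no compression options, which matches the list above.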
4. Seamless Integration – BLU tables, with their columnar data store, and non-BLU tables co-exist in the core DB2 engine, with the optimiser being aware of both table types. Both types of tables can be accessed using the same SQL and language interfaces, process model, storage, memory and utilities. This follows the precedent set by pureXML, when the XML data store was embedded in the DB2 engine, with XML tables and non-XML tables co-existing in harmony. You decide what type of table best suits your application workload.
All of the above indeed qualifies as groundbreaking data management innovation, but the very fact that BLU Acceleration is embedded in the core DB2 engine, known for its reliability and stability, seals it for me. Do I sound excited?