Datawarehouse Concepts: 2014

SQL Server 2012 Columnstore Index

Introduction

Column Store Index is a new feature in SQL Server 2012 that improves performance of data warehouse queries several folds. Unlike regular indexes or heaps, which store data in B-Tree structure (in row-wise fashion), the column store index stores data in columnar fashion and uses compression aggressively to reduce the disk I/O needed to serve the query request.

How Column Store Improves the Performance of the Query

There are 2 main reasons for the performance improvement using the Columnar Index

Columnstore Compression

Compression certainly isn’t new to SQL Server. However, because data in a columnstore index is grouped by columns, rather than by rows, data can be compressed more efficiently than with rowstore indexes. Data read from a single column is more homogenous than data read from rows, and the more similar the data, the easier it is to compress. Add to that equation a low number of distinct values and the use of dictionaries, and the advantages only grow.
The xVelocity technology also brings with it sophisticated compression algorithms that can take full advantage of the indexes’ columnar nature. And the more effectively you can compress your data, the more data you can fit on a single page and the more data you can pull into memory, both of which lead to a lower I/O costs. If you consider the nature of BI workloads, which often involve aggregating large data sets, you can see the clear advantage of the columnstore structure. The need for the increased CPU power that such aggregation requires can be offset by the I/O savings, helping to improve the performance of those gargantuan queries that were bogging down your systems before the columnstore index came along.

Batch Mode Processing

Working with highly compressed columnar indexes is certainly a great start when taking on those BI workloads. But SQL Server 2012 adds another element to the mix to improve performance even more: batch mode processing. Analytical BI queries must often scan extremely large sets of data and perform numerous complex operations, even to produce a small result set. Batch mode processing can operate on many rows at a time—a batch—rather than one row at a time, as is typical with row-mode processing, the type used for most online transactional processing (OLTP) operations.
In SQL Server 2012, when the query processor executes a query, it can use batch mode, row mode, or both. To use both, the query optimizer can create plans that include sub-trees in batch mode even if the main tree is in row mode. However, the goal is to try to get your queries to run in batch mode from beginning to end, and if that’s not possible, to get as much of the query as you can to use batch mode. Even in mix mode, you’re likely to see performance gains.

When to Use Column Store Index

Column store index has been designed to substantially accelerate common data warehouse queries, which require scanning, aggregation and filtering of large amounts of data or joining multiple tables like a star schema. With column store index, you can get interactive response time for queries against billions of rows on an economical SMP server with enough RAM to hold your frequently accessed data.

Creating a columnstore index is a parallel operation, subject to the limitations on the number of CPUs available and any restrictions set on MAXDOP setting.

You can have both row store index and column store index on the same table. Query Optimizer will decide when to use the column store index and when to use other types of indexes. The Query Optimizer will also choose when to use the new batch execution mode and when to use row execution mode during usage of column store index.

Column store index might not be suited for OLTP load for the following reasons and you might continue using row store indexes in those scenarios:

Creating a column store index makes a table read-only table and update is not easy whereas updating data is simpler in row store indexes.
Column store index might not be the ideal choice for selective queries, which touch only one (or a few) rows or queries that lookup a single row or a small range of rows. In this scenario, row store index will perform better.

Limitations of Using Column Store Index

We can create only one non-clustered column store index on a table although we can include all the columns of the table in the single index (recommended too). When we create a column store index it makes the table read only; hence it's not suited for OLTP load where updates are high but it can be used in data warehouse fact tables or on dimension tables (if it contains millions of records) where the number of columns are really big and in a normal query only 10-20% of them are required to serve the request.

To update or load the data in a table with column store index, you can switch in a partition, or disable/drop the columnstore index, update/load the data in table, and rebuild the column store index. Creating a column store index might take slightly longer than creating clustered row store index on the same set of data, as extra CPU cycles are required for compression and all. Though with column store index, you can do aggregation on the fly and hence aggregations (pre-built summary aggregates, whether they are indexed views, user-defined summary tables, or OLAP cube) are not required to be created during data load, this time saving will offset the extra time needed for creating column store index and overall data load or ETL might reduce.

A column store index can be created on a table only; creation on indexed view is not allowed
A column store index can be non-cluster only
Only one column store index is allowed per table (though you can include all the columns of the table in single index; up to 1024 columns limit applies)
A column store index cannot be a filtered index
A column store index cannot include computed or sparse columns
A column store index cannot be unique and cannot be used as primary or foreign key
A column store index definition cannot be changed using ALTER INDEX command; for the index definition change, it has to be dropped and created
A column store index cannot be created using ASC/DESC or INCLUDE keywords
A column store index on a partitioned table must be partition-aligned
A column store cannot be combined with

PAGE or ROW compression
Replication
Change Tracking
Change Data Capture
Filestream

There are certain data types that are not supported and hence any column of that type cannot be part of a column store index like binary, varbinary, text, ntext, image, varchar(max), nvarchar(max), uniqueidentifier, rowversion, timestamp, decimal and numeric data types with precision greater than 18 digits, common language runtime and xml; apart from these, the rest of the other data types are supported.

Tuesday, October 28, 2014

SQL Server 2012 Columnstore Index

Columnstore Compression

Batch Mode Processing