The SQL Server 11.0 release (code named “Denali”) introduces a new data warehouse query acceleration feature based on a new type of index called the columnstore. Columnstore indexing is officially announced in SQL Server 2012. It is working based on xVelocity memory optimised technology and it improves data warehouse query performance significantly. Due to the fact that data warehousing, decision support systems and business intelligence applications are growing very quickly, we need to be able to read and process very large data sets quickly and accurately into useful information and knowledge. Columnstore index technology is especially appropriate for data warehousing data sets. It improves the common data warehousing queries’ performance significantly.
Columnstore index is storing data for each column and joins all the columns to complete the index. There are many advantages of using columnstore indexing in comparison with the traditional rowstore indexing. The term “rowstore” is using to describe either a heap or a B-tree that contains multiple rows per page. As columnstore indexing is pretty new, it has some restrictions and limitations. So, you should be aware of those limitations when you are planning to implement columnstore index in your data warehouse. In this article we will discuss about the below topics:
§ How columnstore index works?
§ Benefits of using columnstore indexes
§ Restrictions of columnstore indexes
§ How to create a SQL Server columnstore index?
§ Planning for creating columnstore index
§ Choosing columns for a columnstore index
How it works?
While rowstore indexing stores multiple rows per page, columnstore index stores each column in disk pages separately. The following image illustrates the difference between columnstore and rowstore indexing from storage perspective:
As you can see C1, C2…C6 are stored in different pages, so:
· only the columns needed in a query are fetched from the disk
· due to the redundancy of data within a column it is easier for data compression
· because of the data compression and frequently accessed parts of commonly used columns are still remain in memory, hence, buffer hit rate is improved.
As discussed, columnstore is working based on xVelocity technology that is in common with SQL Server Analysis Services Tabular Model as well as PowerPivot. Actually, it doesn’t mean that columnstore indexes have to fit in memory; however, they can use available server memory effectively to move portions of columns in and out of memory on demand. As columnstore indexes store all data for separate columns in separate pages, using columnstore indexes improves I/O scan performance significantly.
Benefits of using columnstore indexes
There are several benefits of using columnstore indexes in comparison with rowstore indexes as below:
· As discussed, only the columns needed in a query are fetched from the disk, so, the data warehouse query performance is way faster for common data warehouse queries
· As data is highly compressed using xVelocity technology the disk space reduces effectively
· As the pages are significantly compressed, the pages containing the most frequently accessed columns remain in memory
· As batch mode processing that is an advanced query execution technology that processes chunks of columns is used, the CPU usage is reduced.
Restrictions and limitations of columnstore indexes
Columnstore indexing is a new technology, so, you should be aware of its restrictions if you are planning to implement columnstore indexes. The following restrictions should be considered:
· Columnstore index is available only in SQL Server Enterprise, Developer and Evaluation editions, so, you will face to the following error message if you want to use columnstore index in other editions of SQL Server 2012: “CREATE INDEX statement failed because a columnstore index cannot be created in this edition of SQL Server.”
· Tables containing columnstore indexes cannot be updated. This restriction might be removed in the next releases of SQL Server. Now, how to insert, update or delete data in a table that contains a columnstore index? There are three solutions for this purpose; however, it seems that the first solution is more straightforward than the others.
1. Drop the columnstore index, perform any INSERT, UPDATE, DELETE or MERGE operations, and recreate the columnstore index.
2. Partition the table and switch partitions. For a bulk insert:
§ insert data into a staging table
§ build a columnstore index on the staging table
§ switch the staging table into an empty partition
For other updates:
§ switch a partition out of the main table into a staging table
§ disable or drop the columnstore index on the staging table
§ perform the update operations
§ rebuild or re-create the columnstore index on the staging table
§ switch the staging table back into the main table.
3. Place static data into a main table with a columnstore index, and put new data and recent data likely to change, into a separate table with the same schema that does not have a columnstore index. Apply updates to the table with the most recent data. To query the data, rewrite the query as two queries, one against each table, and then combine the two result sets with UNION ALL. The sub-query against the large main table will benefit from the columnstore index. If the updateable table is much smaller, the lack of the columnstore index will have less effect on performance. While it is also possible to query a view that is the UNION ALL of the two tables, you may not see a clear performance advantage. The performance will depend on the query plan, which will depend on the query, the data, and cardinality estimations. The advantage of using a view is that an INSTEAD OF trigger on the view can divert updates to the table that does not have a columnstore index and the view mechanism would be transparent to the user and to applications. If you use either of these approaches with UNION ALL, test the performance on typical queries and decide whether the convenience of using this approach outweighs any loss of performance benefit.
Note: As we discussed, the tables containing columnstore index, cannot be updated. But, it doesn’t seem to be a good idea to use columnstore to make a read-only table. Because, columnstore index is not designed for this particular purpose and it is possible that Microsoft removes this restriction in the next releases of SQL Server.
· Columnstore indexes are not supporting more than 1024 columns
· Only nonclustered columnstore indexes are available (there is no clustered columnstore index)
· A columnstore index cannot be a unique index
· Creating columnstore indexes on a view or indexed view is not supported
· Columnstore indexes cannot include a sparse column (an ordinary column that has an optimized storage for null values)
· Columnstore indexes cannot act as primary keys or foreign keys (remember that a columnstore index cannot be a unique index)
· Columnstore indexes cannot be modified using “ALTER INDEX” statement. However, the “ALTER INDEX” statement can be used to disable and rebuild a columnstore index. So the only way to modify a columnstore index is to drop and recreate the columnstore index.
· The keyword “INCLUDE” is not supported to create a columnstore index
· Sorting is not allowed in a columnstore index, so, “ASC” and “DESC” keywords are not supported. Actually, columnstore indexes are ordered according to the compression algorithm. Values selected from a columnstore index might be sorted by the search algorithm, but you must use the ORDER BY clause to guarantee sorting of a result set.
· A columnstore index does not use or even keep statistics as rowstore index does
· A columnstore index does not support FILESTREAM attribute, so, only the columns in the table that are not used in the columnstore index can contain the FILESTREAM attribute.
· As column store index is optimized for in-memory processing, so, server memory limitations should be considered
· Columnstore indexes do not support SEEK, so, if the table hint FORCESEEK is used, the optimizer will not consider the columnstore index.
· Columnstore indexes cannot be combined with page and row compression, as columnstore indexes are already compressed in a different format.
· Replication is not supported for tables containing columnstore index
· Change tracking and change data capture are not supported
· Filestream is not supported
· The following data types cannot be included in a columnstore index:
1. binary and varbinary
2. ntext , text, and image
3. varchar(max) and nvarchar(max)
4. uniqueidentifier
5. rowversion (and timestamp)
6. sql_variant
7. decimal (and numeric) with precision greater than 18 digits
8. datetimeoffset with scale greater than 2
9. CLR types (hierarchyid and spatial types)
10. xml
How to create a SQL Server columnstore index?
Creating a columnstore index is just like creating any other index. Generally, there are two ways to create a columnstore index, creating index using T-SQL statements or using SSMS (SQL Server Management Studio).
Creating a columnstore index using T-SQL
In a query editor window execute the following statement:
CREATE NONCLUSTERED COLUMNSTORE INDEX IndexName
ON TableName (Column1, Column2, …)
Creating a columnstore index using SSMS
Open SQL Server Management Studio (SSMS) and connect to a SQL Server database engine. Remember that columnstore index is available just in SQL Server 201 Enterprise Edition.
1. From “Object Explorer”-> expand the instance-> expand the databases-> expand the database-> expand the table-> right click on “Indexes”-> New Index-> Non-Clustered Columnstore Index
2. In “New Index” window-> Index Name (type a name)-> Add-> select the column-> OK-> OK
Now the columnstore index is created and you can see it in the “Indexes” in object explorer.
Planning for creating columnstore index
As columnstore index is a new technology, it has many limitations and restrictions. Although all of the columnstore index restrictions should be considered, one of the most general and important restrictions of columnstore index is that it is NOT available in all versions of SQL Server 2012. So, it is really important to know what version of SQL Server is going to be used in production environment. If your organisation is not going to use SQL Server 2012 Enterprise edition, you cannot use columnstore index at all. So, you have to plan to create rowstore indexes on your data warehouse.
Due to the fact that the indexing is really related to the queries, it should be investigated in a case by case basis. Although columnstore indexing is improving the query performance, however, in some cases it will cause poorer query performance.
Choosing columns for a columnstore index
Some of the performance benefit of a columnstore index is derived from the compression techniques that reduce the number of data pages that must be read and manipulated to process the query. Compression works best on character or numeric columns that have large amounts of duplicated values. For example, dimension tables might have columns for postal codes, cities, and sales regions. If many postal codes are located in each city, and if many cities are located in each sales region, then the sales region column would be the most compressed, the city column would have somewhat less compression, and the postal code would have the least compression. Although all columns are good candidates for a columnstore index, adding the sales region code column to the columnstore index will achieve the greatest benefit from columnstore compression, and the postal code will achieve the least.
References: SQL Server 2012 Books Online, SQL Server Technical Article: Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0; November 2010
Discover more from BI Insight
Subscribe to get the latest posts sent to your email.