Friday, July 25, 2008

Data Compression in SQL Server 2008

When we talk about compression feature in SQL Server 2008, it address two areas. Data Compression and Backup Compression. I have already discussed Backup compression here. This article is specifically discussing Data compression feature in SQL Server 2008. The idea behind Data compression is, if we can reduce the data size then the resource usage like memory (data buffer) , Storage space and Network traffic can be reduced therefore you will have better performance and manageability. But before going for Data Compression you should do proper analysis, whether this feature is appropriate for your environment or not. If the Database Server is having CPU resource crunch you must not opt Data compression because, Data Compression is more CPU oriented operation. When you have a system where there is no problem of CPU but it is more IO based, then you may go for data compression which will give you better performance.

SQL Server 2008 supports ROW and PAGE compression for both tables and indexes. Data compression can be configured for the following database objects:
• A whole table that is stored as a heap.
• A whole table that is stored as a clustered index.
• A whole non-clustered index.
• A whole indexed view.
• For partitioned tables and indexes, the compression option can be configured for each partition, and the various partitions of an object do not have to have the same compression setting.

As already mentioned there are two levels of compression, PAGE and ROW. Before going for compression of the data you must do proper estimation and planning. Use system stored procedure sp_estimate_data_compression_savings to check whether the data compression can be benefited for a particular table.

What is the difference between PAGE and ROW level compression?
(a) PAGE Level Compression• Compressing the leaf level of tables and indexes with page compression consists of three operations in the following order:
• Row compression
• Prefix compression
• Dictionary compression

(b) ROW Level Compression
• It reduces the metadata overhead that is associated with the record. This
metadata is information about columns, their lengths and offsets. In some
cases, the metadata overhead might be larger than the old storage format.
• It uses variable-length storage format for numeric types (for example
integer, decimal, and float) and the types that are based on numeric (for
example datetime and money).
• It stores fixed character strings by using variable-length format by not
storing the blank characters.

Note : You must read more about ROW and PAGE Compression in Books Online. There are few interesting points like when the PAGE compression is actually take place kind. PAGE compression is a Superset of ROW compression and system goes for page compression only when the page is filled. Till then it is a ROW level compression. Once the page is filled, The whole page is reviewed; each column is evaluated for prefix compression, and then all columns are evaluated for dictionary compression. If page compression has created enough room on the page for an additional row, the row is added, and the data is both row- and page-compressed.

How do we do data compression estimation on a table?
(a) Create a sample table Table
Use AdventureWorks
Go
Create table TestCompression (col int,comments char(8000))
Go
Insert into TestCompression select Message_id,text from sys.messages

Note : I have choosen CHAR Datatype intentionally so that I can demonstrate the estimation properly.
(b) Run the sp_estimate_data_compression_savings system stored procedure against this table
-------------------------------------------------------------------------------------
Row Compression Estimation Report

EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'ROW' ;
Go

object_name : TestCompression
schema_name :DBO
index_id :0
partition_number : 1
size_with_current_compression_setting(KB) : 704888
size_with_requested_compression_setting(KB) : 9952

sample_size_with_current_compression_setting(KB) : 40784
sample_size_with_requested_compression_setting(KB) : 576
-------------------------------------------------------------------------------------
Page Compression Estimation Report

EXEC sp_estimate_data_compression_savings 'dbo', 'TestCompression', NULL, NULL, 'Page' ;
Go
object_name : TestCompression
schema_name :DBO
index_id :0
partition_number : 1
size_with_current_compression_setting(KB) : 704888
size_with_requested_compression_setting(KB) : 9134

sample_size_with_current_compression_setting(KB) : 39664
sample_size_with_requested_compression_setting(KB) : 560

-------------------------------------------------------------------------------------
How to read this result ?
I am not going to explain, how to read the result of this stored procedure in detail, since it is clearly mentioned in Books Online. In the result the size_with_current_compression_setting(KB) : 704888 and
size_with_requested_compression_setting(KB) : 9134
will give you fair amount of idea how compression is going to benefits. See the difference in space usage shown in before compression and after the compression 704888: 9134. Without any doubt the compression is going to save lot of space in this table.

Why Page Compression showing more compression of data than ROW Compression?
If you see the result of PAGE and ROW level compression, you can see that page level compression save more space. Why so? Because Page compression do ROW compression as well as Dictionary and Prefix Compression. So PAGE compression is a superset of ROW Compression. So the point here is, if we want highest level of compression, go for page at the cost of more CPU cycle.

Note : One thing I have noticed in the system stored procedure is, for every run the result is different. There is few KBs size different in each run. I have reported this in Microsoft Connect.

Estimation Report in Query Analyser


Now we know the compression feature will be useful for this table. How do we enable compression for this table? Here we go…








There are few options available to enable Data Compress on a table.
(a) From Management Studio.
(b) While creating in the CREATE Table Statement

CREATE TABLE [dbo].[TestCompression](
[col] [int] NULL,
[comments] [char](8000) NULL
) ON [PRIMARY]
with( DATA_COMPRESSION = PAGE )

(c) In ALTER Table Statement

ALTER TABLE [TestCompression]
REBUILD WITH (DATA_COMPRESSION = PAGE);

Enabling Data Compression from Management Studio
Screen 1


Screen 2


Screen 3


Screen 4



Screen 5


Screen 6
















We have successfully enabled data compression on this table. How do we confirm this. You can query sys.partitions. Here we go….
Select OBJECT_NAME(object_id),data_compression_desc,* From sys.partitions where data_compression<>0.



Screen 7






Now we have enabled compression and let us check what happens if we run the same system stored procedure to do the estimation. See the result. No scope for improvement further because it is already compressed















Few more informations from Books online.
• The compression setting of a table is not automatically applied to its nonclustered indexes.
• Each index must be set individually.
• Tables and indexes can be compressed when they are created by using the CREATE TABLE and CREATE INDEX statements. To change the compression state of a table, index, or partition, use the ALTER TABLE or ALTER INDEX statements.
• Compression is not available for system tables.
• Apply compression only after exploring options like Data defragmentation etc.

Summary
Having explained the whole topic I kept one very important point for the summary section. That is, this feature will be only available in Enterprise Edition, Developer Edition and Evaluation Edition. So the system stored procedures and options in GUI will only be available in the above mentioned editions. In IO bound application certainly this feature will be useful. I keep my finger crossed for the feedback from the community who have successfully implemented and experience this feature.

No comments: