Using Column Sets for Sparsely Populated Columns
SQL Server has a little-known feature called Column Sets that is very useful when dealing with large numbers of sparse columns. Sparse columns are useful when you need a table with many columns where the rows are sparsely populated with data. Sparse columns can be a great alternative to using Entity-Attribute-Value design-patterns, and Key-Value stores, where the attributes or keys are known in advance.
Imagine you want to track values for a variety of types of exercise. You capture the type of exercise, the date, and a variety of other details that change with each type of exercise. So, if you’re running you might want to know how far you ran, the route, and the best segment time. If you’re using the treadmill, you might want to know the “program” you used, and the maximum speed you reached. When skiing, you want to know what difficulty level the ski run was, as well as where you skied. You could set this up in a single table, using sparse columns, like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
IF OBJECT_ID(N'dbo.ExerciseHistory', N'U') IS NOT NULL BEGIN DROP TABLE dbo.ExerciseHistory; END GO CREATE TABLE dbo.ExerciseHistory ( ExerciseHistoryID int NOT NULL CONSTRAINT ExerciseHistory_pk PRIMARY KEY CLUSTERED IDENTITY(1,1) , EntryDate datetime NOT NULL CONSTRAINT ExcerciseHistory_EntryDate_DF DEFAULT (GETDATE()) , ExerciseType varchar(9) NOT NULL CONSTRAINT ExerciseType_ck CHECK (ExerciseType IN ('run', 'treadmill', 'ski')) , RunDistance decimal(5,2) SPARSE , RunRouteName varchar(50) SPARSE , RunBestSegmentTime decimal(5,2) SPARSE , TreadmillProgram varchar(15) SPARSE , TreadmillMaxSpeed decimal(5,2) SPARSE , SkiFacilityName varchar(50) SPARSE , SkiRunNumber int SPARSE , SkiSlopeRating varchar(15) SPARSE , ExerciseDetails xml COLUMN_SET FOR ALL_SPARSE_COLUMNS ); |
Inserting three separate exercises can be accomplished two ways. First, the traditional method where we insert discrete values into each column:
1 2 3 4 5 6 7 8 9 |
INSERT INTO dbo.ExerciseHistory (ExerciseType, RunDistance, RunRouteName) VALUES ('run', 10, 'Route 1'); INSERT INTO dbo.ExerciseHistory (ExerciseType, TreadmillProgram, TreadmillMaxSpeed) VALUES ('treadmill', 'Mountain', 12); INSERT INTO dbo.ExerciseHistory (ExerciseType, SkiFacilityName, SkiRunNumber, SkiSlopeRating) VALUES ('ski', 'Whistler', 18, 'Black Diamond'); |
Using SELECT *
against a table with a COLUMN_SET
column is a little different. The column set column combines the sparse columns into a single XML column, with elements for each non-null sparse column. It looks like this:
1 2 |
SELECT * FROM dbo.ExerciseHistory eh; |
Notice many of the columns are not returned by SELECT *
, whereas normally, every column is part of the result set.
Since column set columns are directly updatable, you can insert values using this method:
1 2 3 4 |
INSERT INTO dbo.ExerciseHistory (ExerciseType, ExersizeDetails) VALUES ('run', '<RunDistance>10</RunDistance><RunRouteName>Route 1</RunRouteName>') , ('treadmill', '<TreadmillProgram>Mountain</TreadmillProgram><TreadmillMaxSpeed>12</TreadmillMaxSpeed>') , ('ski', '<SkiFacilityName>Whistler</SkiFacilityName><SkiRunNumber>18</SkiRunNumber><SkiSlopeRating>Black Diamond</SkiSlopeRating>'); |
As you can see from this query where we’re discretely choosing all columns in the table, both methods end up with data inserted the same way:
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT eh.ExerciseHistoryID , eh.ExerciseType , eh.RunDistance , eh.RunRouteName , eh.RunBestSegmentTime , eh.TreadmillProgram , eh.TreadmillMaxSpeed , eh.SkiFacilityName , eh.SkiRunNumber , eh.SkiSlopeRating FROM dbo.ExerciseHistory eh; |
This post shows how to make use of the underappreciated COLUMN SETS
feature in SQL Server. The next post in this series shows how to set up check constraints with column sets.
Check Microsoft’s documentation for column sets here.
This post is part of our series on T-SQL.