top of page
  • Writer's pictureCathy Basil-Jones

The Case for Partitioning Switching in Data Warehouse Workloads

Updated: Oct 19, 2023

Partition switching allows you to move large volumes of data quickly. If you are working in a data warehousing environment with multimillion row+ tables and require high availability of data to end users, along with frequent complex recalculations or complete refreshes of your dataset, partition switching might be for you. Partition switching is also ideal for efficient archival of data.

How it works

Instead of removing and reloading data into a user facing table, data is loaded into an identical non-user facing table in the background. Once the data has been loaded, the data is ‘switched’ into the user facing table using partition switching. This is extremely quick and appears seamless to the end user.


Requirements

You will need two tables – one accessible by your end users and the other for staging data. Both tables need to be partitioned and identical by design. Note there are certain limitations on partitioning and Microsoft’s documentation on the subject can be found here.


Example

We are going to create a simple sales fact table (dbo.SalesOrder) in our data warehouse and an identical staging table (staging.SalesOrder). In this example we are going to simulate an ETL that requires all data to be reloaded.


1. First let’s create our partition function and scheme. We are going to partition our data by the date our sale took place (which in our case is stored as an integer in our sales table). Because we aren’t a particularly big business, we are going to set up our partitions into yearly segments.


-- Create Partition Function 
IF (SELECT TOP 1 1 FROM sys.partition_functions WHERE name = 'pf_int_date') IS NULL 
BEGIN 
CREATE PARTITION FUNCTION [pf_int_date] (INT)   
AS RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101, 20130101, 20140101, 20150101, 20160101, 20170101, 20180101, 20190101, 
20200101, 20210101, 20220101, 20230101, 20240101, 20250101, 20260101, 20270101, 20280101, 20290101) 
END 
 
-- Create Partition Scheme 
IF (SELECT TOP 1 1 FROM sys.partition_schemes WHERE name = 'ps_int_date') IS NULL 
BEGIN 
CREATE PARTITION SCHEME [ps_int_date]   
AS PARTITION [pf_int_date]   
ALL TO ([PRIMARY]) 
END  

2. Next let’s create our two identical partitioned tables

-- Create dbo.SalesOrder 
IF (SELECT TOP 1 1 FROM sys.tables AS s WHERE name = 'SalesOrder' AND SCHEMA_NAME(schema_id) = 'dbo') IS NULL 
BEGIN 
CREATE TABLE [dbo].[SalesOrder]( 
[OrderDateKey] [int] NOT NULL, 
[ProductKey] [int] NOT NULL, 
[CustomerKey] [int] NOT NULL, 
[SalesOrderNumber] [nvarchar](20) NOT NULL, 
[SalesOrderLineNumber] [tinyint] NOT NULL, 
[OrderQuantity] [smallint] NOT NULL, 
[SalesAmount] [money] NOT NULL, 
) ON [ps_int_date]([OrderDateKey]) 
 
CREATE CLUSTERED INDEX [CL_SalesOrder__OrderDateKey] ON [dbo].[SalesOrder] ([OrderDateKey] ASC) ON [ps_int_date]([OrderDateKey]) 
END 
 
-- Create staging.SalesOrder 
IF (SELECT TOP 1 1 FROM sys.tables AS s WHERE name = 'SalesOrder' AND SCHEMA_NAME(schema_id) = 'staging') IS NULL 
BEGIN 
CREATE TABLE [staging].[SalesOrder]( 
[OrderDateKey] [int] NOT NULL, 
[ProductKey] [int] NOT NULL, 
[CustomerKey] [int] NOT NULL, 
[SalesOrderNumber] [nvarchar](20) NOT NULL, 
[SalesOrderLineNumber] [tinyint] NOT NULL, 
[OrderQuantity] [smallint] NOT NULL, 
[SalesAmount] [money] NOT NULL 
) ON [ps_int_date]([OrderDateKey]) 
 
CREATE CLUSTERED INDEX [CL_Staging_SalesOrder__OrderDateKey] ON [staging].[SalesOrder] ([OrderDateKey] ASC) ON [ps_int_date]([OrderDateKey]) 
END 

3. Let's create some test data

--Create a test workload 
DECLARE @LoopCounter INT = 0 
 
WHILE @LoopCounter < 5000 
BEGIN 
INSERT [staging].[SalesOrder] ([OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount]) 
VALUES (CAST(CONVERT(CHAR(8), DATEADD(DAY, -(ABS(CHECKSUM(NEWID()) % 3650 )), GETDATE()), 112) AS INT), --[OrderDateKey] 
 RAND() * 9 + 1, --[ProductKey] 
 RAND() * 9999 + 1, --[CustomerKey] 
 CONCAT(CHAR((RAND() * 25 + 65)), CHAR((RAND() * 25 + 65)), CAST(RAND() * 99 + 1 AS INT)), --[SalesOrderNumber] 
 RAND() * 9 + 1, --[SalesOrderLineNumber] 
 RAND() * 9 + 1, --[OrderQuantity] 
 ROUND(RAND(CHECKSUM(NEWID())) * (100), 2) --[SalesAmount] 
 ) 
  
 SET @LoopCounter = @LoopCounter + 1 
END 

4. We’ll also copy the data across to our user facing table so both table match

--Copy into our staging table so they match 
INSERT [dbo].[SalesOrder] 
SELECT	[OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount] 
FROM	[staging].[SalesOrder] 
GO 

5. Now, let’s insert a few new records to simulate data we’ve received as part of our ETL

-- Insert 10 new records 
DECLARE @LoopCounter INT = 0 
 
WHILE @LoopCounter <= 10 
BEGIN 
INSERT [staging].[SalesOrder] ([OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount]) 
VALUES (CAST(CONVERT(CHAR(8), DATEADD(DAY, -(ABS(CHECKSUM(NEWID()) % 3650 )), GETDATE()), 112) AS INT), --[OrderDateKey] 
 RAND() * 9 + 1, --[ProductKey] 
 RAND() * 9999 + 1, --[CustomerKey] 
 CONCAT(CHAR((RAND() * 25 + 65)), CHAR((RAND() * 25 + 65)), CAST(RAND() * 99 + 1 AS INT)), --[SalesOrderNumber] 
 RAND() * 9 + 1, --[SalesOrderLineNumber] 
 RAND() * 9 + 1, --[OrderQuantity] 
 ROUND(RAND(CHECKSUM(NEWID())) * (100), 2) --[SalesAmount] 
 ) 
  
 SET @LoopCounter = @LoopCounter + 1 
END
GO 

6. Performing the partition switch

DECLARE @CurrentPartition INT 
DECLARE @StagingRowCount INT  
DECLARE @dboRowCount INT  
 
WHILE (SELECT COUNT(*) FROM [staging].[SalesOrder] ) > 0 
BEGIN 
-- Get lowest numbered partition with data in it 
SELECT @CurrentPartition = MIN($PARTITION.[pf_int_date]([OrderDateKey])) FROM [staging].[SalesOrder] 
 
-- Get the row count from staging.SalesOrder and dbo.SalesOrder for @CurrentPartition 
SELECT @StagingRowCount = COUNT(*) FROM [staging].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition 
SELECT @dboRowCount = COUNT(*) FROM [dbo].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition 
 
-- If the row counts don’t match we must have new records in this partition. Move data from staging.SalesOrder to dbo.SalesOrder 
 
IF @StagingRowCount <> @dboRowCount 
BEGIN TRAN 
DELETE FROM [dbo].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition 
ALTER TABLE [staging].[SalesOrder] SWITCH PARTITION @CurrentPartition TO [dbo].[SalesOrder] PARTITION @CurrentPartition  
COMMIT TRAN 
END 
 
Availability

Partitioning is supported in Microsoft SQL Server versions 2016 SP1 and later, and Azure SQL and Azure SQL Managed Instances.

5 views
bottom of page