Partition switching allows you to move large volumes of data quickly. If you are working in a data warehousing environment with multimillion row+ tables and require high availability of data to end users, along with frequent complex recalculations or complete refreshes of your dataset, partition switching might be for you. Partition switching is also ideal for efficient archival of data.
How it works
Instead of removing and reloading data into a user facing table, data is loaded into an identical non-user facing table in the background. Once the data has been loaded, the data is ‘switched’ into the user facing table using partition switching. This is extremely quick and appears seamless to the end user.
Requirements
You will need two tables – one accessible by your end users and the other for staging data. Both tables need to be partitioned and identical by design. Note there are certain limitations on partitioning and Microsoft’s documentation on the subject can be found here.
Example
We are going to create a simple sales fact table (dbo.SalesOrder) in our data warehouse and an identical staging table (staging.SalesOrder). In this example we are going to simulate an ETL that requires all data to be reloaded.
1. First let’s create our partition function and scheme. We are going to partition our data by the date our sale took place (which in our case is stored as an integer in our sales table). Because we aren’t a particularly big business, we are going to set up our partitions into yearly segments.
-- Create Partition Function
IF (SELECT TOP 1 1 FROM sys.partition_functions WHERE name = 'pf_int_date') IS NULL
BEGIN
CREATE PARTITION FUNCTION [pf_int_date] (INT)
AS RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101, 20130101, 20140101, 20150101, 20160101, 20170101, 20180101, 20190101,
20200101, 20210101, 20220101, 20230101, 20240101, 20250101, 20260101, 20270101, 20280101, 20290101)
END
-- Create Partition Scheme
IF (SELECT TOP 1 1 FROM sys.partition_schemes WHERE name = 'ps_int_date') IS NULL
BEGIN
CREATE PARTITION SCHEME [ps_int_date]
AS PARTITION [pf_int_date]
ALL TO ([PRIMARY])
END
2. Next let’s create our two identical partitioned tables
-- Create dbo.SalesOrder
IF (SELECT TOP 1 1 FROM sys.tables AS s WHERE name = 'SalesOrder' AND SCHEMA_NAME(schema_id) = 'dbo') IS NULL
BEGIN
CREATE TABLE [dbo].[SalesOrder](
[OrderDateKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[OrderQuantity] [smallint] NOT NULL,
[SalesAmount] [money] NOT NULL,
) ON [ps_int_date]([OrderDateKey])
CREATE CLUSTERED INDEX [CL_SalesOrder__OrderDateKey] ON [dbo].[SalesOrder] ([OrderDateKey] ASC) ON [ps_int_date]([OrderDateKey])
END
-- Create staging.SalesOrder
IF (SELECT TOP 1 1 FROM sys.tables AS s WHERE name = 'SalesOrder' AND SCHEMA_NAME(schema_id) = 'staging') IS NULL
BEGIN
CREATE TABLE [staging].[SalesOrder](
[OrderDateKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[OrderQuantity] [smallint] NOT NULL,
[SalesAmount] [money] NOT NULL
) ON [ps_int_date]([OrderDateKey])
CREATE CLUSTERED INDEX [CL_Staging_SalesOrder__OrderDateKey] ON [staging].[SalesOrder] ([OrderDateKey] ASC) ON [ps_int_date]([OrderDateKey])
END
3. Let's create some test data
--Create a test workload
DECLARE @LoopCounter INT = 0
WHILE @LoopCounter < 5000
BEGIN
INSERT [staging].[SalesOrder] ([OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount])
VALUES (CAST(CONVERT(CHAR(8), DATEADD(DAY, -(ABS(CHECKSUM(NEWID()) % 3650 )), GETDATE()), 112) AS INT), --[OrderDateKey]
RAND() * 9 + 1, --[ProductKey]
RAND() * 9999 + 1, --[CustomerKey]
CONCAT(CHAR((RAND() * 25 + 65)), CHAR((RAND() * 25 + 65)), CAST(RAND() * 99 + 1 AS INT)), --[SalesOrderNumber]
RAND() * 9 + 1, --[SalesOrderLineNumber]
RAND() * 9 + 1, --[OrderQuantity]
ROUND(RAND(CHECKSUM(NEWID())) * (100), 2) --[SalesAmount]
)
SET @LoopCounter = @LoopCounter + 1
END
4. We’ll also copy the data across to our user facing table so both table match
--Copy into our staging table so they match
INSERT [dbo].[SalesOrder]
SELECT [OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount]
FROM [staging].[SalesOrder]
GO
5. Now, let’s insert a few new records to simulate data we’ve received as part of our ETL
-- Insert 10 new records
DECLARE @LoopCounter INT = 0
WHILE @LoopCounter <= 10
BEGIN
INSERT [staging].[SalesOrder] ([OrderDateKey], [ProductKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber], [OrderQuantity], [SalesAmount])
VALUES (CAST(CONVERT(CHAR(8), DATEADD(DAY, -(ABS(CHECKSUM(NEWID()) % 3650 )), GETDATE()), 112) AS INT), --[OrderDateKey]
RAND() * 9 + 1, --[ProductKey]
RAND() * 9999 + 1, --[CustomerKey]
CONCAT(CHAR((RAND() * 25 + 65)), CHAR((RAND() * 25 + 65)), CAST(RAND() * 99 + 1 AS INT)), --[SalesOrderNumber]
RAND() * 9 + 1, --[SalesOrderLineNumber]
RAND() * 9 + 1, --[OrderQuantity]
ROUND(RAND(CHECKSUM(NEWID())) * (100), 2) --[SalesAmount]
)
SET @LoopCounter = @LoopCounter + 1
END
GO
6. Performing the partition switch
DECLARE @CurrentPartition INT
DECLARE @StagingRowCount INT
DECLARE @dboRowCount INT
WHILE (SELECT COUNT(*) FROM [staging].[SalesOrder] ) > 0
BEGIN
-- Get lowest numbered partition with data in it
SELECT @CurrentPartition = MIN($PARTITION.[pf_int_date]([OrderDateKey])) FROM [staging].[SalesOrder]
-- Get the row count from staging.SalesOrder and dbo.SalesOrder for @CurrentPartition
SELECT @StagingRowCount = COUNT(*) FROM [staging].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition
SELECT @dboRowCount = COUNT(*) FROM [dbo].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition
-- If the row counts don’t match we must have new records in this partition. Move data from staging.SalesOrder to dbo.SalesOrder
IF @StagingRowCount <> @dboRowCount
BEGIN TRAN
DELETE FROM [dbo].[SalesOrder] WHERE $PARTITION.[pf_int_date]([OrderDateKey]) = @CurrentPartition
ALTER TABLE [staging].[SalesOrder] SWITCH PARTITION @CurrentPartition TO [dbo].[SalesOrder] PARTITION @CurrentPartition
COMMIT TRAN
END
Availability
Partitioning is supported in Microsoft SQL Server versions 2016 SP1 and later, and Azure SQL and Azure SQL Managed Instances.
Comments