Author – Hitesh Srivastava, Associate Data Engineer
This article describes how to configure CDC (Change Data Capture) using SSIS and dump the table from SQL Server 2016 hosted on Azure SQL-VM to Azure SQL Database by enabling CDC on the database and table at source i.e SQL Server 2016..
CDC (Change Data Capture)
Change Data Capture records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables, and makes a record available of what changed, where, and when, in simple relational ‘change tables’. CDC ensures that any modifications made in source data set are automatically transferred to destination data set. Enabling CDC makes the details of the changes available in a cdc.<captured_instance>_CT table with the ranking by which we can classify whether the row is updated, deleted or inserted. Rankings are :
- Delete
- Insert
- Updated row before the change
- Updated row after the change
Prerequisites:
1. For this blog we require Virtual Machine as Source.
2. For this blog we require Azure SQL Database as Destination.
STEPS :
1. For sample data, we will create a new database SourceDB, and create one table source table for this demo.
CREATE TABLE source (ID int, name varchar(250), CONSTRAINT PK PRIMARY KEY(ID) ) INSERT INTO source VALUES(1,'Hitesh') INSERT INTO source VALUES(2,'Ashutosh') INSERT INTO source VALUES(3,'Ayush')
2. Now, Enable CDC on database level and table level.
EXEC sp_changedbowner 'sa' GO USE SourceDB GO /*Enable CDC*/ EXEC sys.sp_cdc_enable_db GO /*Enable CDC to track tables*/ EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'source', @role_name = N'cdc_Admin', @supports_net_changes = 1 GO
After Enabling CDC you will have new tables in System Tables
Below is the screenshot of step 1 and 2
3. Create same table with same schema at destination (Azure SQL DB).
4. Download SQL Server Integration Services Project extension on Visual Studio, then create new project with Integration Services Project.
5. Open SSIS package and add 2 CDC control task tool and data flow task tool.
5.1. Double click the First Control Task (START) to bring up the editor.
- Add a new ADO NET connection manager for the Source database.
- Set CDC Control Operation to Mark initial load start.
- Create a new package variable (CDC_State) to hold the CDC state information.
- Set the 2nd connection manager for the Destination databases.
- Create a table for storing the state ([cdc_states]). This table will be used to track the CDC load information, so that you only pick up new changes each time the incremental load package is run. It will be created in the Destination database.
- Set the state name (CDC_State). This value acts as a key for the CDC state information. Packages that are accessing the same CDC data should be using a common CDC state name.
NOTE : In CDC Control Task Editor, the first connection manager will always be of source database and second connection manager will be always for destination database.
5.2. Double click on Data Flow and add ADO NET source and ADO NET Destination tool.
5.2.1. Double click on ADO NET Source and choose Source connection manager and the source table.
5.2.2. Double click on ADO NET Destination and choose destination connection manager and the destination table.
5.3. Configure the 2nd CDC Control Task(END) with the same settings as the first one, except the CDC Control Operation, it should be set to Mark initial load end.
5.4. Run this package and you can see data in the Destination Table.
6. After that, Create new SSIS Package for Incremental Load. We have to run this package every time we want to grab the latest changes from our Source table.
6.1. Create staging tables for deleted and updated rows at Destination Database(Azure SQL DB).
SELECT TOP 0 * INTO stg_UPDATES FROM dbo.destination SELECT TOP 0 * INTO stg_DELETES FROM dbo.destination
6.2. Create Control Flow by adding 2 CDC Control Task tool, 1 Data Flow Task tool and 4 Execute SQL Task tool.
6.3. Double click on CDC Control task START
- set the first connection manager – Source database.
- Set CDC Control Operations – Get processing range.
- Create a new package variable (CDC_State).
- Set the second connection manager – Destination database.
- Create a table for storing the state ([cdc_states]).
- Set the state name (CDC_State).
6.4. Double Click on data flow task, add CDC Source tool, CDC Splitter tool and 3 ADO NET Destination tool.
6.4.1. Double click on CDC Source:
- Set the Connection Manager to the Source database.
- Select the CDC enabled table which is your source table.
- Set the CDC processing mode to Net.
- Select the CDC_State variable.
6.4.2. CDC Splitter and ADO NET Destinations
- Set ADO NET Destination tools as “New Row”, “Delete” and “Update”.
- Connect the InsertOutput of the CDC Splitter to the “New rows” destination.
- Connect the DeleteOutput of the CDC Splitter to the “Delete” destination.
- Connect the UpdateOutput of the CDC Splitter to the “Update” destination.
6.4.3. Double click on ADO NET Destination – “New Row” and choose destination connection manager and on “use a table or view” choose destination table.
6.4.4. Double click on ADO NET Destination – “Delete” and choose destination connection manager and on “use a table or view” choose stg_DELETES table which you created at your destination database.
6.4.5. Double click on ADO NET Destination – “Update” and choose destination connection manager and on “use a table or view” choose stg_UPDATES table which you created at your destination database.
6.5. Come back to Control Flow and double click on “Perform Updates” and edit connection type to ADO.NET, edit connection to your destination connection manager and at the end edit SQLStatement by following command.
UPDATE [dbo].[destination] SET [dbo].[destination].[ID] = [dbo].[stg_UPDATES].[ID], [dbo].[destination].[name] = [dbo].[stg_UPDATES].[name] FROM [dbo].[destination], [dbo].[stg_UPDATES] WHERE [dbo].[stg_UPDATES].[ID] = [dbo].[destination].[ID]
6.7. Double click on “Perform Delete” and edit connection type to ADO.NET, edit connection to your destination connection manager and at the end edit SQLStatement by following command:-
DELETE FROM [dbo].[destination] WHERE[ID] IN ( SELECT [ID] FROM [dbo].[stg_DELETES] )
6.8. On the CDC Control task END, it should have the same settings as the first CDC Control Task in the package, except the CDC control operation, it will be Mark processed range in this control task.
6.9. Now we need to Truncate staging tables
6.9.1. Double click on Truncate Delete Stage and edit connection type to ADO.NET, edit connection to your destination connection manager and at the end edit SQLStatement by following command.
Truncate table stg_DELETES
6.9.2. Double click on Truncate Update Stage and edit connection type to ADO.NET, edit connection to your destination connection manager and at the end edit SQLStatement by following command:-
Truncate table stg_UPDATES
7. Now at the end, do some updates on source table and after that run incremental load package and the updates will be shown at your destination table.
Do some Updates on the source table.
Now run the Incremental Load Package and look at the destination table.