Data Warehousing Tip: Using T-SQL vs SSIS for Slowly Changing Dimension (SCD) Types 1 and 2

Here’s an example of using T-SQL to process a Slowly Changing Type 1&2 Dimension . Note:  SSIS has an SCD transform ,however, it does not process large dimensional datasets very quickly.
…eh, I’m putting it too nicely, the SSIS SCD transform is painfully slow!  This T-SQL performs 20X quicker than the SSIS SCD transform.
USE [SomeDB]
GO
/****** Object: StoredProcedure [dbo].[sp_UpsertDimGuest] Script Date: 9/11/2013 6:09:04 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_UpsertDimGuest]
AS
BEGIN
–DROP Constraints
–SCD 2 Historical Attribute Tracking via Upsert
INSERT INTO [dbo].[DimGuest]
( [GuestContactSrgKey],
[SourceSystem]
,[GuestIDAlternateKey]
,[GuestAcct]
,[GuestFirstName]
,[GuestLastName]
,[ETLLoadID]
,[EffStartDt]
,[EffEndDt]
,[IsCurrent]
)
SELECT
[GuestContactSrgKey]
,[SourceSystem]
,[GuestIDAlternateKey]
,[GuestAcct]
,[GuestFirstName]
,[GuestLastName]
,[ETLLoadID]
,GETDATE()
,’1/1/3000′
,’true’
FROM (
MERGE GTDW.dbo.DimGuest AS [Target]
USING GTDW_Staging.dbo.DimGuestStagingFinal AS [Source]
ON Target.GuestIDAlternateKey = Source.GuestIDAlternateKey
AND Target.IsCurrent = 1
WHEN MATCHED AND
(Target.[GuestFirstName] <> Source.[GuestFirstName]
OR Target.[GuestLastName] <> Source.[GuestLastName]
)
THEN UPDATE SET
IsCurrent = 0
,EffEndDt = GETDATE()
WHEN NOT MATCHED BY TARGET
THEN INSERT (
[GuestContactSrgKey]
,[SourceSystem]
,[GuestIDAlternateKey]
,[GuestAcct]
,[GuestFirstName]
,[GuestLastName]
,[ETLLoadID]
,[EffStartDt]
,[EffEndDt]
,[IsCurrent]
)
VALUES (
Source.[GuestContactSrgKey]
,Source.[SourceSystem]
,Source.[GuestIDAlternateKey]
,Source.[GuestAcct]
,Source.[GuestFirstName]
,Source.[GuestLastName]
,Source.[ETLLoadID]
,GETDATE()
,’1/1/3000′
,’true’
)
WHEN NOT MATCHED BY SOURCE AND Target.IsCurrent = 1
THEN UPDATE SET
IsCurrent = 0
,EffEndDt = GETDATE()
OUTPUT $action AS Action
,[Source].*
) AS MergeOutput
WHERE MergeOutput.Action = ‘UPDATE’
AND GuestIDAlternateKey IS NOT NULL
;
–Changing Attributes – history is not recorded
UPDATE DimG
SET
[GuestContactSrgKey] = DSF.[GuestContactSrgKey]
,[GuestAcct] = DSF.[GuestAcct]
,[ETLLoadID] = DSF.[ETLLoadID]
FROM DimGuest DimG
INNER JOIN [GTDW_Staging].[dbo].[DimGuestStagingFinal] DSF
ON DimG.GuestIDAlternateKey = DSF.GuestIDAlternateKey
AND DimG.IsCurrent = 1 –Optional
AND (
DimG.[GuestAcct] <> DSF.[GuestAcct]
)
END

SSIS How-To: Reuse Existing Connection Manager to Execute SQL in Script Task

There’s a myriad of examples out there on how to execute a sql task via OLEDB connections and C# in a SSIS Script Task.   However, I have yet to find a best practice solution that reuses an existing connection manager to properly establish, execute the SQL script (either dynamic or static), and dispose of an OLEDB connection.

Below is the base code that follows Microsoft’s best practice guidelines to execute dynamic SQL via C# against an existing SSIS connection manager.

1. In the script task, ensure that the following library is included:

‘using System.Data.OleDb;’


.
.
.
.

2. Import the following DLL to allow C# to import the connection string from the connection manager established in the package – Microsoft.SqlServedllSearchr.DTSRuntimeWrap.dll:

To import a missing DLL:
-If the Project Explorer pane is not visible in the Script Task script then click VIEW >> Project Explorer
-As the pane is now visible, right click REFERENCES >> ADD REFERENCE >> choose BROWSE and find the DLL

Note: On my Visual Studio installation, the DLL was imported from the following location – C:\Program Files\Common Files\microsoft shared\VSTA\Pipeline\AddInViews\Microsoft.SqlServedllSearchr.DTSRuntimeWrap.dll

After the DLL has been imported it can be viewed in the Project Explorer pane, as displayed below.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

3.  Here is a complete example as seen in public void Main():

VariableDispenser variableDispenser =this.Dts.VariableDispenser;
Variables lockedVariables =null;
variableDispenser.LockForRead(“User::VAR_CutoffDate”);
variableDispenser.GetVariables(ref lockedVariables);

string cutOffDate = lockedVariables[“VAR_CutoffDate”].Value.ToString();
string sqlScript = “SELECT “ + cutOffDate;

try
{

ConnectionManager cm = Dts.Connections[“Minerva”];       Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerDatabaseParameters100 cmParams = cm.InnerObject as Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerDatabaseParameters100;

    using (OleDbConnection conn = cmParams.GetConnectionForSchema() as OleDbConnection)
{
OleDbCommand command =new OleDbCommand(sqlScript, conn);
command.ExecuteNonQuery();
}

}

 catch (Exception e)
{
//event handling
    //MessageBox.Show(e.ToString());
 }

if (lockedVariables.Locked) { lockedVariables.Unlock(); }
Dts.TaskResult = (int)ScriptResults.Success;

Note: We’ve all been guilty of not accounting for common exceptions in C# logic, leaving the .Close() or .Dispose() function never to be called.

The line from above – ‘using (OleDbConnection conn = cmParams.GetConnectionForSchema() as OleDbConnection)’  follows Microsoft best practice of incorporating the the ‘using()’ function to automatically dispose of the OleDbConnection object (and any other disposable object for that matter) as opposed to .Close() or .Dispose() even if an exception is thrown.

So go with  ‘using’, it’s less problematic, saves us a few lines of coding, and leaving tools hanging isn’t as hilarious as it is on the street 🙂

%d bloggers like this: