Archive

Posts Tagged ‘SSIS’

Code Reuse in SSIS – Passing Values To and From Child Packages

August 15, 2014 3 comments

I recently found myself having to replicate the same piece of script code in a SSIS package multiple times, which of course makes any developer feel dirty as copied code tends to drift divergently and not be maintained correctly, especially where the number of copies is beyond a manageable few.

Given restrictions of various clients that I’ve worked with I’ve had to avoid creating SSIS extensions or DLLs that would have to be deployed along with the package. This has meant that code reuse is always a little tricky. The tricky aspect is that in order to reuse a script in any meaningful way parameter values need to be passed in and return values received back. Thus, with plain SSIS this is difficult.

Or so I thought. I’ve now found that it is possible, and in fact easy to pass parameters down to child packages, via the well-known ‘Parent package variable’ configuration, but also to pass values back up to the parent package. Below I’ll walk through a simple example showing how this is done.

Firstly, create a parent package as shown below. We’ll fill in the details shortly.

Contents of Parent Package

 

The parent package will execute a child package two times in parallel to show that there are no issues with parallel running.

The Sequence Container labelled ‘Start’ is purely a placeholder and contains no sub-tasks.

The two parallel Sequence Containers each contain a call to the child package and a script task to message box out the result. Message boxes clearly are not the best thing in an SSIS package, but they suit our purposes for this demo.

Against each Sequence Container I set up two variables, one as a parameter value to pass to the child package and another to store the result coming back. To show that the parameter value passed down is consumed correctly each container is given a different value as shown in the screenshots below.

Sequence 1 Variables

Sequence 2 Variables

 

The code to show the message box containing the resultant value is shown below. Don’t forget to add ‘User::Result’ as a read only variable for the script task.

public void Main()
{
    // display result in a message box
    MessageBox.Show(Dts.Variables["User::Result"].Value.ToString());

    // return success as the task result
    Dts.TaskResult = (int)ScriptResults.Success;
}

 

Before we can set up the task to execute the child package we first need to create the child package.

The child package simply consists of two script tasks. The first to create a resultant value from the parent parameter value and the second to sleep for 5 seconds to show parallel running.

Child Package

 

In order to access the parent parameter value we need to set up package configuration on the child package. To do this, firstly create a variable on the child package to hold the parent value.

Child Package Variables

 

Secondly, right click anywhere on the background of the child package and select ‘Package Configurations…’.

Tick the check box labelled ‘Enable package configurations’.

Click ‘Add…’ to add a new configuration item which will pop up the Package Configuration Wizard as per the screenshot below.

Package Configuration Wizard Step 1

 

Select ‘Parent package variable’ as the configuration type.

For ‘Parent variable’ simply enter the name and not the full path of the parent variable, ‘ParameterValue’. By not entering the path, the variable is searched for initially in the scope of the task calling the child package and if not found, then upwards from there. Since we want to reuse the child task it is important that a full path is not specified so that we can pick up the value from wherever the child task is called from.

Click ‘Next >’.

Now we need to assign the parent value to a property within the child package. Select the ‘Value’ property of the variable we created earlier called ‘ParentParameterValue’ as per the screenshot below.

Package Configuration Wizard Step 2

 

Click ‘Next >’.

Finally, give the configuration a name as per the screenshot below. Again, I’ve chosen ‘ParentParameterValue’.

Package Configuration Wizard Step 3

 

Finally, click ‘Finish’ and then ‘Close’.

So now, when the parent package is executed, the variable values set in the parent package can be picked up in the child package. Let’s now see how we can pass result values back. It is surprisingly simple.

Edit the script task in the child package called ‘Set Result Value’. In the initial dialogue box we need to set up the variables that the script can access. Set them up as per the screenshot below, i.e. adding ‘User::ParentParameterValue’ as a read only variable and ‘User::Result’ as a read/write variable.

Child Script Task Configuration

 

But wait! What is this ‘User::Result’ variable. We’ve not set it up in the child package. The script task won’t complain. What this in fact will refer to, once the package is run, is the nearest variable of that name in scope, starting from the child package, but then bubbling up the search through to the parent package starting at the point where the child package is called and then onwards and upwards. One way to think about it is to imagine that a copy of the contents of the child package is in fact inserted into the parent package in place of the ‘Execute Child Package’ task.

Once you think of it like that, the next logical question is whether it is possible to read parent package variables without having to add in the ‘Parent Package Variable’ configuration, and it is. So long as no explicit paths are given a variable reference will bubble up from the scope of the child package through to the parent calling package just as if the child were in fact part of the parent.

So why do we need ‘Parent Package Variable’ configuration at all? It is needed because validation of the child package for nearly all task types will fail if the variable references don’t point to variables that the validation process can see, i.e. are in scope in the current package as packages are validated in their own right and not in the context of any calling package. For Script Tasks this variable validation is not carried out. Variables within the script itself are available through the Dts.Variables object which is an aggregation of all variables in scope at the time of execution going up from the child package up to and including the parent package. Dts.Variables also has a ‘Contains’ method to allow you to check for the existence of a variable before use.

So, back to the demo. The script itself is shown below and simply shows that the parent value is accessible and also adds in a message from the child.

public void Main()
{
    // set the result value to be the parent value plus some changes
    Dts.Variables["User::Result"].Value = "Result: '" + Dts.Variables["User::ParentParameterValue"].Value + "'. Child says hello.";

    // return success as the task result
    Dts.TaskResult = (int)ScriptResults.Success;
}

 

The script for the ‘Sleep for 5 Seconds’ task is as below. You will need to add a ‘using’ to ‘System.Threading’ at the top of the script.

public void Main()
{
    // set the result value to be the parent value plus some changes
    Thread.Sleep(5000);

    // return success as the task result
    Dts.TaskResult = (int)ScriptResults.Success;
}

 

Finally, we can return to the parent package and complete the details required in the ‘Execute Child Package’ tasks, as per the screenshot below.

Execute Child Task

 

Executing the parent package, following the 5 second sleep, results in the following two message boxes popping up simultaneously.

Result 1Result 2

 

From this we can see that both child tasks executed in parallel, both could access variables in the parent package and could set values in the parent package and that the parent package itself could see those changes.

Going back to the initial code reuse problem this means that script tasks can be moved into child packages and can pick up in scope variables values from the executing parent and return values back for further consumption within the parent package. Brilliant.

 

Categories: SSIS Tags: , , ,

SSIS Package Logging

December 5, 2011 1 comment

 

Thank you to Adam Aspin, my much respected friend, who introduced me to, and inspires me in, all things SQL.

SSIS packages, especially once loaded onto a production server are very opaque. There is very little information provided about the progress of the package and no stats are available regarding which tasks are taking the most time, therefore some form of logging is essential.

In this post I’ll outline a very simple and easy to implement approach to logging that can be applied to the SSIS package as well as stored procedures that are called within it.

In order to maintain separation of log info from any particular application it would be preferable to create a separate logging database, however it is not essential. In this post I’ll create a ‘LOG’ schema and create the tables and stored procedures within this schema.

In brief we will:

  1. Create a reference table and data related to logging.
  2. Create a table to store the log entries.
  3. Create stored procedures to facilitate the logging.
  4. Create events within an SSIS package to add logging information.
  5. Annotate a sample stored procedure to include logging.
  6. Review the logging output.

 

Create a reference table and data related to logging

The only reference table we need is to define the type of each log entry. Primarily we need start and stop entries for each type of item we are going to track. Additionally I have added an extra two types, one for information and one for errors.

By using negative numbers mirroring the positive numbers it makes it much easier to see elements start and stop when reviewing a log.

CREATE TABLE LOG.LogEntryType (
    log_entry_type_id smallint NOT NULL,
    name varchar(50) NOT NULL,
    CONSTRAINT PK_LogStepType PRIMARY KEY CLUSTERED (log_entry_type_id ASC)
)
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (1, 'SSIS Package Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-1, 'SSIS Package Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (2, 'SSIS Task Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-2, 'SSIS Task Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (3, 'Stored Procedure Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-3, 'Stored Procedure Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (4, 'Stored Procedure Step Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-4, 'Stored Procedure Step Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (99, 'Information')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (100, 'Error')

 

Create a table to store the log entries

CREATE TABLE LOG.LogEntry(
    log_entry_id bigint IDENTITY(1,1) NOT NULL,
    log_entry_type_id smallint NOT NULL,
    date_created datetime NOT NULL,
    entity_id varchar(50) NULL,
    entity_name varchar(2000) NOT NULL,
    parent_entity_id varchar(50) NULL,
    parent_entity_name varchar(2000) NULL,
    value bigint NULL,
    description varchar(2000) NULL,
    CONSTRAINT PK_LogEntry PRIMARY KEY NONCLUSTERED (log_entry_id ASC)
)

ALTER TABLE LOG.LogEntry WITH CHECK 
ADD CONSTRAINT FK_LogEntry_LogEntryType FOREIGN KEY(log_entry_type_id)
REFERENCES LOG.LogEntryType (log_entry_type_id)

ALTER TABLE LOG.LogEntry CHECK CONSTRAINT FK_LogEntry_LogEntryType

 

Create stored procedures to facilitate the logging.

The code below creates the stored procedure required to add a simple log entry. The parameters will become clear as we work on adding logging to an SSIS package.

CREATE Procedure LOG.AddLogEntry
/******************************************************************************
Description:
    Adds an entry to the log.
History:
    05-Oct-2011 S.Wilber
        Created
******************************************************************************/
    @log_entry_type_id smallint
   ,@entity_id varchar(50)
   ,@entity_name varchar(2000)
   ,@parent_entity_id varchar(50) = Null
   ,@parent_entity_name varchar(2000) = Null
   ,@error_code int = Null
   ,@error_description varchar(2000) = Null
   ,@value bigint = Null
   ,@value_description varchar(2000) = Null
As

Insert          LOG.LogEntry 
                (
                    log_entry_type_id
                   ,date_created
                   ,entity_id
                   ,entity_name
                   ,parent_entity_id
                   ,parent_entity_name
                   ,value
                   ,description
                )
Values          (
                    @log_entry_type_id
                   ,GetDate()
                   ,@entity_id
                   ,@entity_name
                   ,@parent_entity_id
                   ,@parent_entity_name
                   ,Case When @error_code Is Not Null Then Convert(bigint, @error_code) Else @value End
                   ,Case When @error_code Is Not Null Then @error_description Else @value_description End
                )

 

The following code creates the stored procedure that we are going to use to add logging to our own stored procedures. This code is a little more complex as I want to minimise the logging code that I will need to add to any stored procedure. Rather than adding a log entry to start and stop each step within a store procedure I just want to log the start; the code below will automatically add the stop for the previous step as well as closing off the last step when finishing the stored procedure.

CREATE Procedure LOG.AddProcedureLogEntry
/******************************************************************************
Description:
    Adds an entry to the log. This is a wrapper to AddLogEntry but tailored
    for stored procedures.
    Depending on the combination of parameters it closes the previous step
    and starts the current step
History:
    05-Oct-2011 S.Wilber
        Created
******************************************************************************/
    @proc_id int
   ,@step_number smallint = Null Output
   ,@step_description varchar(2000) = Null
   ,@close_only bit = 0
   ,@parent_entity_id varchar(50) = Null
   ,@parent_entity_name varchar(2000) = Null
As


Declare         @log_entry_type_id smallint
               ,@entity_id varchar(50) = Convert(varchar(10), @proc_id) + '-' + Convert(varchar(10), @@spid)
               ,@entity_name varchar(2000) = Object_Name(@proc_id)
               ,@error_code int = Null
               ,@error_description varchar(2000) = Null
               ,@value bigint
               ,@value_description varchar(2000)
               ,@log_entry_type_proc_start smallint = 3
               ,@log_entry_type_proc_end smallint = -3
               ,@log_entry_type_proc_step_start smallint = 4
               ,@log_entry_type_proc_step_end smallint = -4


-- if this is for a step - close off the previous step and then make a new one
If @step_number Is Not Null
  Begin
    If @step_number != 0
      Begin
        Set             @log_entry_type_id = @log_entry_type_proc_step_end
        Set             @value = Convert(bigint, @step_number)
        
        -- get the last description
        Select Top 1    @value_description = description
        From            Log.LogEntry
        Where           log_entry_type_id = @log_entry_type_proc_step_start
        And             entity_id = @entity_id
        And             value = @value
        Order By        log_entry_id desc
        
        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
      End
        
    If @close_only != 1
      Begin
        -- increment the step number
        Set             @step_number = @step_number + 1
        Set             @log_entry_type_id = @log_entry_type_proc_step_start
        Set             @value = Convert(bigint, @step_number)
        Set             @value_description = IsNull(@step_description, 'Step: ' + Convert(varchar(10), @step_number))

        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
        
      End  
  End
Else
  Begin

    -- check that the last step has been closed
    Select Top 1    @value_description = description
                   ,@step_number = Convert(smallint, value)
                   ,@log_entry_type_id = log_entry_type_id
    From            Log.LogEntry
    Where           entity_id = @entity_id
    Order By        log_entry_id desc
    
    If @log_entry_type_id = @log_entry_type_proc_step_start
      Begin
        Set             @log_entry_type_id = @log_entry_type_proc_step_end
        Set             @value = Convert(bigint, @step_number)
        
        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
      End

    -- now just write the procedure start/end              
    Set @log_entry_type_id = Case When @close_only = 0 Then @log_entry_type_proc_start Else @log_entry_type_proc_end End
    Set @value = Null
    Set @value_description = Null
    
    Exec LOG.AddLogEntry
       @log_entry_type_id
      ,@entity_id
      ,@entity_name
      ,@parent_entity_id
      ,@parent_entity_name
      ,@error_code
      ,@error_description
      ,@value
      ,@value_description

  End    

 

Create events within an SSIS package to add logging information.

Logging within SSIS is performed by carrying out actions triggered by ‘events’.

The events we are interested in here are:

  • OnPreExecute – triggered just before the package or a task within the package starts to execute.
  • OnPostExecute – triggered just after the package or a task within the package has executed.
  • OnError – trigged when an error occurs.

Now very conveniently, if the events are created at the package scope, i.e. click on the background of your package before clicking on the ‘Event Handlers’ tab, then the events fire both for the package starting and stopping, but also for each and every task starting and stopping. Therefore if we called our stored procedure ‘LOG.AddLogEntry’ from within the events, then the whole package would be logged.

However, before you dive in, we will need to be able to differentiate between events triggered off for tasks and for the package, and we’ll do this by using a few little variables evaluated from expressions.

The first step is to create three variables, in the scope of the event, that will contain some of the values that we will send to the ‘LOG.AddLogEntry’ stored procedure. The details of the variables are given below, but note that in the properties for the variable, set ‘EvaluateAsExpression’ to ‘True’.

Name Type Expression
LogEntryID String @[System::SourceID]
LogEntryName String @[System::SourceID] == @[System::PackageID] ? @[System::PackageName] : @[System::SourceName]
LogEntryTypeID Int16 (DT_I2)(@[System::SourceID] == @[System::PackageID] ? 1 : 2)

The screenshot below shows the ‘OnPreExecute’ event, which is the same as on the ‘OnPostExecute’ event. You can see the three variables required on the left hand side (note their scope) and also the Execute SQL Task that will call the stored procedure to add the log entry.

Screenshot - OnPreExecute

 

For the execution of the stored procedure to add the log entry I’ve used an ADO.NET connection due to the clarity of the parameter mapping.

The Execute SQL Task merely calls the stored procedure ‘LOG.AddLogEntry’ as shown in the screenshot below.

Screenshot - Execute SQL Task

 

The parameters are mapped as shown in the screenshot below.

Screenshot - Execute SQL Task Parameters

 

The PostExecute event is very similar to the PreExecute event and I won’t go into further details here.

I also like to catch the OnError event, and its implementation is very similar. The ‘LogEntryTypeID’ variable is still require, but it does not need to be an expression and its value can just be set to 100.

Screenshot - OnError

 

The parameters require are slightly different and are shown in the following screenshot.

Screenshot - OnError Parameters

 

Annotate a sample stored procedure to include logging.

Adding logging to long and complex stored procedures can be a really valuable aid in clearly identifying long running sections or for pinning down errors or unexpected flow.

Since we have already created the stored procedure for logging from stored procedures above, here I’ll quickly go through the changes that need to be made to add logging.

The principle behind the code for logging from a stored procedure is to minimise the impact to the logged stored procedure in terms of additional code. Thus, each time the LOG.AddProcedureLogEntry is called potentially two entries are added to the log, one to close of the previous entry, i.e. to mark the previous step as finished and a second to mark the current step as starting.

Three additional parameters can be added to your stored procedure. These will become self-explanatory below.

@log bit = 1
@log_parent_entity_id varchar(50) = null
@log_parent_entity_name varchar(2000) = null

 

Additional to the parameters you will need one variable to record the current step number.

Declare @log_proc_step smallint = 0

 

Finally I add three types of log calls within the stored procedures. Note, you can dispense with the ‘if’ and the additional parameter if you always want logging to be switched on. Most of the parameters are fairly clear, but @close_only is used to instruct the procedure, in the case of a step, to only close off the previous step and not to create a new step entry, and also to differentiate between the stored procedure header/footer logging.

 

Stored procedure start – marking the start of the execution of the stored procedure

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = Null, @close_only = 0, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

 

Note that @step_number is null, so this is for either the header/footer and @close_only = 0, meaning that this is for the header.

 

Step – marking the start of each step

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = @log_proc_step Output, @step_description = 'Fetch Process Run IDs' ,@close_only = 0, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

Note the @step_description which allows you to add a note for the step. This makes reviewing the log so much more simple and I would strongly recommend going to the effort to write this note.

 

Stored Procedure finish – marking the end of the execution of the stored procedure

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = Null, @close_only = 1, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

Note that @step_number is null, so this is for either the header/footer and @close_only = 1, meaning that this is for the footer.

 

Review the logging output.

So, now all the logging is in place, what does the output look like. The screenshot below shows a section from a current project copied to Excel. The data is very simplistic, but you can write additional stored procedures to report out the logging information in a more informative way, including highlighting the % time that each step has used overall, using historical runs to identify the progress in terms of % of a current run, finding the top 10 slowest steps, etc.

Just for clarification I’ve coloured logging that has come from a stored procedures in pale purple.

Screenshot - Sample Log Results in Excel