Archive

Archive for the ‘SQL Server’ Category

Create SQL for Temp Table

Sometimes it is really useful to work with temp tables and to create them up front rather than using ‘Select Into’, but they can be a pain create for a couple of reasons:

  1. The temp table has loads of columns and you really don’t want to sit and type them all in.
  2. You are not aware of the structure of the source data.

The following stored procedure takes the hassle away. Merely select some data (use Select Top(0) for speed) into a temp table that has the desired structure and then execute the stored procedure below to generate the SQL will create the temp table.

Anyway, here’s the code.

 

Create Procedure dbo.GenerateTempTableSQL
/******************************************************************************
Description:
    Given a temp table the sql required to create a new temp table of the same
    structure is generated.
    This is useful during development to select into a temp table and
    then generate the code that could be used to create the temp table.
In:
    @input_temp_table_name
        The name of the temp table to use as the template the generate the SQL
        from.
    @output_temp_table_name [optional: @input_temp_table_name]
        The name of the table that is used in the create table script. If
        left as null, then the @input_temp_table_name will be used.
    @database_collation [optional: current database collation]
        The collation to use for char based columns.
History:
    16-Oct-2015 S.Wilber
        Created
    09-Nov-2015 S.Wilber
        Updated to include # in temp table name if it had been ommitted.
    14-Jul-2016 S.Wilber
        Updated to work from temp tables instead of permanent tables
******************************************************************************/
    @input_temp_table_name sysname
   ,@output_temp_table_name sysname = null
   ,@database_collation sysname = null
As

-- declare constants
Declare     @reserved_words varchar(max) = '|ADD|ALL|ALTER|AND|ANY|AS|ASC|AUTHORIZATION|BACKUP|BEGIN|BETWEEN|BREAK|BROWSE|BULK|BY|CASCADE|CASE|CHECK|CHECKPOINT|CLOSE|CLUSTERED|COALESCE|COLLATE|COLUMN|COMMIT|COMPUTE|CONSTRAINT|CONTAINS|CONTAINSTABLE|CONTINUE|CONVERT|CREATE|CROSS|CURRENT|CURRENT_DATE|CURRENT_TIME|CURRENT_TIMESTAMP|CURRENT_USER|CURSOR|DATABASE|DBCC|DEALLOCATE|DECLARE|DEFAULT|DELETE|DENY|DESC|DISK|DISTINCT|DISTRIBUTED|DOUBLE|DROP|DUMP|ELSE|END|ERRLVL|ESCAPE|EXCEPT|EXEC|EXECUTE|EXISTS|EXIT|EXTERNAL|FETCH|FILE|FILLFACTOR|FOR|FOREIGN|FREETEXT|FREETEXTTABLE|FROM|FULL|FUNCTION|GOTO|GRANT|GROUP|HAVING|HOLDLOCK|IDENTITY|IDENTITY_INSERT|IDENTITYCOL|IF|IN|INDEX|INNER|INSERT|INTERSECT|INTO|IS|JOIN|KEY|KILL|LEFT|LIKE|LINENO|LOAD|MERGE|NATIONAL|NOCHECK|NONCLUSTERED|NOT|NULL|NULLIF|OF|OFF|OFFSETS|ON|OPEN|OPENDATASOURCE|OPENQUERY|OPENROWSET|OPENXML|OPTION|OR|ORDER|OUTER|OVER|PERCENT|PIVOT|PLAN|PRECISION|PRIMARY|PRINT|PROC|PROCEDURE|PUBLIC|RAISERROR|READ|READTEXT|RECONFIGURE|REFERENCES|REPLICATION|RESTORE|RESTRICT|RETURN|REVERT|REVOKE|RIGHT|ROLLBACK|ROWCOUNT|ROWGUIDCOL|RULE|SAVE|SCHEMA|SECURITYAUDIT|SELECT|SEMANTICKEYPHRASETABLE|SEMANTICSIMILARITYDETAILSTABLE|SEMANTICSIMILARITYTABLE|SESSION_USER|SET|SETUSER|SHUTDOWN|SOME|STATISTICS|SYSTEM_USER|TABLE|TABLESAMPLE|TEXTSIZE|THEN|TO|TOP|TRAN|TRANSACTION|TRIGGER|TRUNCATE|TRY_CONVERT|TSEQUAL|UNION|UNIQUE|UNPIVOT|UPDATE|UPDATETEXT|USE|USER|VALUES|VARYING|VIEW|WAITFOR|WHEN|WHERE|WHILE|WITH|WITHIN GROUP|WRITETEXT|'

-- declare variables
Declare     @sql nvarchar(max)

-- table to store the columns
If Object_ID('tempdb..#temp') Is Not Null Drop Table #temp
Create Table #temp
(
    column_id int not null
   ,column_name nvarchar(130) null
   ,type_name nvarchar(130) not null
   ,max_length smallint not null
   ,scale tinyint not null
   ,[precision] tinyint not null
   ,is_nullable bit null
   ,collation_name sysname null
);

-- table to store the generated SQL
Declare     @output_sql table (line_id int identity(1, 1), sql nvarchar(max))
Declare     @object_id int

-- set the output name to be the input name if it is null or blank or whitespace
If NullIf(LTrim(RTrim(@output_temp_table_name)), '') Is Null Set @output_temp_table_name = @input_temp_table_name

-- set the collation if it's not set
If NullIf(LTrim(RTrim(@database_collation)), '') Is Null Set @database_collation = Cast(DatabasePropertyEx(db_name(), 'collation') As nvarchar(128))

-- add a # to the input temp table name if it does not already start with #
If Left(@input_temp_table_name, 1) != '#' Set @input_temp_table_name = '#' + @input_temp_table_name

-- add a # to the output temp table name if it does not already start with #
If Left(@output_temp_table_name, 1) != '#' Set @output_temp_table_name = '#' + @output_temp_table_name

-- get the object id
Select      @object_id = Object_ID('tempdb..' + @input_temp_table_name)

If @object_id Is Null
  Begin
    RaisError('''%s'' cannot be found.', 11, 1, @input_temp_table_name)
    Return
  End

-- generate the sql
Insert      @output_sql (sql)
Select      'If Object_ID(''tempdb..' + @output_temp_table_name + ''') Is Not Null Drop Table ' + @output_temp_table_name

Insert      @output_sql (sql)
Select      'Create Table ' + @output_temp_table_name

Insert      @output_sql (sql)
Select      '('

Insert      #temp (column_id, column_name, type_name, max_length, scale, precision, is_nullable, collation_name)
Select      c.column_id
           ,Case When CharIndex('|' + c.name + '|', @reserved_words) != 0 Then '[' + c.name + ']' Else c.name End As column_name
           ,Case When CharIndex('|' + t.name + '|', @reserved_words) != 0 Then '[' + t.name + ']' Else t.name End As type_name
           ,c.max_length
           ,c.scale
           ,c.precision
           ,c.is_nullable
           ,c.collation_name
From        tempdb.sys.objects o
Inner Join  tempdb.sys.schemas s On (s.schema_id = o.schema_id)
Inner Join  tempdb.sys.columns c On (o.object_id = c.object_id)
Inner Join  tempdb.sys.types t On (t.user_type_id = c.user_type_id)
Where       o.object_id = @object_id

Insert      @output_sql (sql)
Select      Case When column_id = 1 Then space(4) Else space(3) + ',' End
          + column_name + ' '
          + type_name
          + Case
                When Lower(type_name) In ('datetimeoffset', 'datetime2', 'time') Then
                    '(' + Cast(IsNull(c.scale, '') As varchar(10)) + ')'
                When Lower(type_name) In ('decimal', 'numeric') Then
                    '(' + Cast(IsNull(c.precision, '') As varchar(3)) + ', ' + Cast(IsNull(c.scale, '') As varchar(10)) + ')'
                When Lower(type_name) In ('varbinary', 'binary') Then
                    '('
                  + Case
                        When c.max_length = -1 Then
                            'Max'
                        Else
                            Cast(c.max_length As varchar(10))
                    End
                  + ')'
                When Lower(type_name) In ('char', 'varchar', 'nchar', 'nvarchar') Then
                    '('
                  + Case
                        When c.max_length = -1 Then
                            'Max'
                        When type_name In ('nchar', 'nvarchar') Then
                            Cast(c.max_length / 2 As varchar(10))
                        Else
                            Cast(IsNull(c.max_length, '') As varchar(10))
                    End
                  + ')'
                  + Case
                        When c.collation_name != @database_collation Then
                            ' Collate ' + c.collation_name
                        Else ''
                    End
                Else
                    ''
            End
          + Case
                When c.is_nullable = 0 Then
                    ' not null'
                Else
                    ' null'
            End
From        #temp c
Order By    column_id

Insert      @output_sql (sql)
Select      ');'

Select      sql
From        @output_sql
Order By    line_id

T-SQL Strip Invalid Characters

Occasionally I need to strip invalid characters from a string and the requirements are often subtly different, sometimes we want numbers only, other times numbers and decimal points, and maybe only letters.

We’ve probably all had to do this at some point and probably grabbed something off the internet, but I finally took the time to write a more generic and performant version as listed below which allows for a valid character pattern to be passed in.
 

Create Function dbo.StripInvalidCharacters
/******************************************************************************
Description:
    Strips invalid characters from a string. This is useful for cleaning up
    strings such as telephone numbers or human entered values such as
    '10.7 miles'
In:
    @input
        The string to strip characters from.
    @valid_character_pattern
        A search pattern that identifies invalid characters. Some example are:
        - '0-9'    - numbers excluding decimal point
        - '0-9.'   - numbers including decimal point
        - 'a-z'    - letters only
        - 'a-f0-9' - hexadecimal only
History:
    10-May-2016 S.Wilber
        Created
******************************************************************************/
(
    @input varchar(1000)
   ,@valid_character_pattern varchar(100)
)
Returns varchar(1000)
As
Begin
    While PatIndex('%[^' + @valid_character_pattern + ']%', @input) > 0
        Set @input = Stuff(@input,PatIndex('%[^' + @valid_character_pattern + ']%', @input), 1, '')     

    -- return
    Return @input
End

 
Feel free to copy and re-use. If you spot any improvements, please let me know and I’ll update this post.
 

Clear All Tables Within a Schema

August 11, 2015 Leave a comment

Often as part of an ETL process I want to simply clear out all the table within a schema, e.g. ‘landing’ or ‘etl’.

However, as new tables are added to the schema, the ‘clear out’ code needs to be maintained. In addition, if any referential integrity exists, the tables will need to be cleared in a certain order. Finally, the Truncate Table statement is significantly faster than a simple Delete, but there are restrictions on when it can be used.
All of which led me to create the following stored procedure so that I don’t have to think these concerns again, allowing me time to focus on the core of the ETL work.

Feel free to use/copy/amend as appropriate, but please let me have your comments/updates.


Create Procedure etl.ClearSchemaTables
/******************************************************************************
Description:
    Clears down any tables in the target schema.
    Tables are cleared down in referential order using truncate where possible.
In:
    @target_schema_name
        The name of the schema whose tables should be cleared, e.g. 'landing'.
History:
    03-Aug-2015 S.Wilber
        Created
******************************************************************************/
    @target_schema_name sysname
As

Set         NoCount On

-- declare variables
Declare     @object_id int
           ,@table_name sysname
           ,@schema_name sysname
           ,@is_referenced bit
           ,@success bit
           ,@sql nvarchar(max)

If Object_ID('tempdb..#table_dependencies') Is Not Null Drop Table #table_dependencies
Create Table #table_dependencies
(
    source_object_id int
   ,dependancy_object_id int
)

-- declare tables
If Object_ID('tempdb..#tables') Is Not Null Drop Table #tables
Create Table #tables
(
    object_id int
   ,schema_name sysname
   ,table_name sysname
   ,is_referenced bit default(0)
   ,is_processed bit default(0)
)

-- find all tables in the schema - look for fk references that are not from self
Insert      #tables (object_id, schema_name, table_name, is_referenced)
Select      t.object_id
           ,s.name
           ,t.name
           ,Case When fk.object_id Is Not Null Then 1 Else 0 End
From        sys.tables t
Inner Join  sys.schemas s On (s.schema_id = t.schema_id)
Left Join   sys.foreign_keys fk On (fk.referenced_object_id = t.object_id and fk.parent_object_id != t.object_id)
Where       s.name = @target_schema_name

-- retrieve all interdependancies between batch managed tables.
Insert      #table_dependencies (source_object_id, dependancy_object_id)
Select      fk.parent_object_id
           ,fk.referenced_object_id
From        sys.foreign_keys fk
Inner Join  #tables s On (s.object_id = fk.parent_object_id)
Inner Join  #tables t On (t.object_id = fk.referenced_object_id)

-- loop through all the tables delete the content, either via a truncate or delete depending on foreign keys
While Exists (Select 1 From #tables Where is_processed = 0)
  Begin
  
    -- find the first table where all of the dependant tables have been processed
    Select      @object_id = object_id
               ,@schema_name = schema_name
               ,@table_name = table_name
               ,@is_referenced = is_referenced
    From        #tables sbt
    Where       is_processed = 0
    And         Not Exists
                (
                    Select      1
                    From        #table_dependencies td
                    Inner Join  #tables tbt On (tbt.object_id = td.source_object_id)
                    Where       td.dependancy_object_id = sbt.object_id
                    And         tbt.is_processed = 0
                )

    -- if none found, then raise an error as it seems that we have a circular dependancy that we can't resolve
    If @object_id Is Null
      Begin
        RaisError ('A table where all dependant tables have been processed cannot be found. This implies that circular dependencies exist. Batch rollback cannot continue.', 11, 0)
      End

    Begin Try
        -- if referenced then we need to perform a delete otherwise we can perform a truncate
        If @is_referenced = 1 
          Begin
            Set         @sql = 'Delete From [' + @schema_name + '].[' + @table_name + ']'
          End
        Else
          Begin
            Set         @sql = 'Truncate Table [' + @schema_name + '].[' + @table_name + ']'
          End
          
        -- exeute the sql
        Exec        sp_executesql @stmt = @sql          
    
    End Try
    Begin Catch
        -- flag as unsuccessful
        Set         @success = 0

        -- declare variables
        Declare         @error_description varchar(2000)
                       ,@error_severity int
                       ,@error_state int

        -- fetch the error details
        Select         @error_description = ERROR_MESSAGE(), @error_severity = ERROR_SEVERITY(), @error_state = ERROR_STATE()

        -- throw the error
        RaisError (@error_description, @error_severity, @error_state) 
            
    End Catch
    
    -- update the table to state that that it has been processed
    Update      #tables
    Set         is_processed = 1
    Where       object_id = @object_id
    
  End  

Temporarily Disabling Referential Integrity Constraints on a SQL Table

I recently came across a situation where part of the data processing would put the data temporarily into a non-referential state. In order to accommodate this, without scripting Drop/Create statements which could get out of sync with other development work, I put together the following stored procedure which enables/disables check constraints.

Feel free to use/copy/amend as appropriate, but please let me have your comments/updates.

Create Procedure [etl].[SetCheckContraintsState]
/******************************************************************************
Description:
    Enables/Disables any check contraints that reference the target table.
    This is useful for when referenced data is going to be temporarily modified
    in a way that will invalidate any check constraints.
History:
    02-Jun-2015 S.Wilber
        Created
******************************************************************************/
    @target_schema_name sysname
   ,@target_table_name sysname
   ,@do_set_enabled bit
As

-- declare variables
Declare     @constraint_name sysname
           ,@referencing_schema_name sysname
           ,@referencing_table_name sysname
           ,@sql nvarchar(max)

Declare @constraints table
(
    ConstraintName sysname
   ,ReferencingSchemaName sysname
   ,ReferencingTableName sysname
)

Insert      @constraints (ConstraintName, ReferencingSchemaName, ReferencingTableName)
Select      fk.name As ConstraintName
           ,sch.name As ReferencingSchemaName
           ,t.name As ReferencingTableName
From        sys.foreign_keys fk
Inner Join  sys.tables t On (t.object_id = fk.parent_object_id)
Inner Join  sys.schemas sch On (sch.schema_id = t.schema_id)
Where       fk.referenced_object_id = Object_ID(QuoteName(@target_schema_name) + '.' + QuoteName(@target_table_name))

Declare     constraint_cursor Cursor Fast_Forward For
Select      ConstraintName
           ,ReferencingSchemaName
           ,ReferencingTableName
From        @constraints

Open constraint_cursor
Fetch From constraint_cursor Into @constraint_name, @referencing_schema_name, @referencing_table_name

While @@Fetch_Status = 0
  Begin
    If @do_set_enabled = 1
        Set @sql = 'Alter Table ' + QuoteName(@referencing_schema_name) + '.' + QuoteName(@referencing_table_name) + ' With Check Check Constraint ' + QuoteName(@constraint_name)
    Else
        Set @sql = 'Alter Table ' + QuoteName(@referencing_schema_name) + '.' + QuoteName(@referencing_table_name) + ' NoCheck Constraint ' + QuoteName(@constraint_name)

    Exec sp_executesql @sql

    -- get the next constraint
    Fetch From constraint_cursor Into @constraint_name, @referencing_schema_name, @referencing_table_name
End

Close constraint_cursor
Deallocate constraint_cursor

Fast Row Count for All Tables

November 12, 2013 Leave a comment

 

Using Count(*) to obtain a row count can be very slow on large tables. This is the only truly reliable way to obtain an accurate row count but, post SQL Server 2000, the mess that was sys.indexes, etc. was largely cleaned up and nowadays the row count can be obtained using the following code. Microsoft still states that the row count in sys.partitions is an “Approximate number of rows in this partition”, but it appears to be stable and accurate with appropriate caveats with regards to uncommitted transactions.

 

Select      s.Name
           ,t.Name
           ,Sum(p.rows) As Rows
From        sys.tables t
Inner Join  sys.schemas s On (s.schema_id = t.schema_id)
Inner Join  sys.indexes i On (i.object_id = t.object_id and i.index_id < 2)
Inner Join  sys.partitions p On (p.object_id = t.object_id)
Where       p.index_id In (0, 1)
Group By    s.Name
           ,t.Name
Order By    s.name
           ,t.name
Categories: SQL Server, T-SQL Tags: , , ,

Mark Items as Default in Reference Data … and Ensure There is Only One

November 6, 2013 Leave a comment

 

Often it is convenient to be able to default values in where they are not specified, or to not require items to be specified to make life easier.
For maintainability it is best to mark default values in the data itself, rather than in some code somewhere, and where better than in the source reference table.
By adding a bit column ‘IsDefault’ it is simple to mark a particular record as the default, however it is also easy to mark several items as default by mistake.
But with careful use of a filtered unique index it is possible to apply a constraint such that there can be only one default record as below.

 

Create Unique NonClustered Index IX_UNIQUE_IS_DEFAULT On MyReferenceTable
(
	IsDefault Asc
)
Where (IsDefault=(1))

 

By filtering, we don’t index the ‘false’ values so that more than one record can be marked as “not the default”, and by making it unique we ensure that we only have a single record marked as default.

Generate Strong Password in T-SQL

October 30, 2013 Leave a comment

 

Recently I needed to create a stored procedure to automate the creation of user logins within SQL Server and required a way to generate strong passwords. The following is the resultant code. It is fairly simplistic, but flexible and convenient.
It uses a recursive common table expression to create a number list and then uses the Crypt_Gen_Random function to generate a random number in the binary variable which is then used to dip into the character string to pull out the characters to build up the password. Due to the use of “For XML Path” characters which would be HTML encoded have been excluded.

 

Declare     @length int = 10
Declare     @chars char(92)
           ,@bytes binary(128)
           ,@password varchar(128)

Set         @chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"£$%^*()_+-={[}]:;@~#,.?/'
Set         @bytes = Crypt_Gen_Random(128)
  
;With cte_numbers (number) As
(
    Select      Cast(1 As int)
    Union All
    Select      number + 1
    From        cte_numbers
    Where       number < @length
)
Select      @password = 
            (
                Select      SubString(@chars, (SubString(@bytes, n.Number, 1) % Len(@chars)) + 1, 1)
                From        cte_numbers n
                For         XML Path ('')
            )
            
-- display the resultant password
Select      @password
Categories: SQL Server, T-SQL

Listing duplicates from the same table without repetition

December 8, 2011 Leave a comment

 

I recently came across what was apparently a simple problem, but which had me initially stumped. The problem was, how to list out duplicates within the same table without repetition. For example if rows A and B in a table are deemed to have matched, then selecting from the table joined to itself will give two results, A matches B and B matches A. Both are valid results, but it effectively doubles the number of duplicates reported. I only needed one of the two results.
The solution turned out to be simple, of course, but since I spent some time thinking it through I thought it worth creating a quick post.

Walking through a simple example, create the following sample table.

CREATE TABLE dbo.DuplicateSample(
    id int IDENTITY(1,1) NOT NULL,
    name varchar(50) NOT NULL
)

 

Throw a load of sample data into it.

Insert dbo.DuplicateSample (name) Values('Bob')
Insert dbo.DuplicateSample (name) Values('Paul')
Insert dbo.DuplicateSample (name) Values('James')
Insert dbo.DuplicateSample (name) Values('Claire')
Insert dbo.DuplicateSample (name) Values('Alex')
Insert dbo.DuplicateSample (name) Values('Bob')
Insert dbo.DuplicateSample (name) Values('Rebecca')
Insert dbo.DuplicateSample (name) Values('Paul')
Insert dbo.DuplicateSample (name) Values('Andrew')
Insert dbo.DuplicateSample (name) Values('Giles')
Insert dbo.DuplicateSample (name) Values('Terry')
Insert dbo.DuplicateSample (name) Values('Robert')
Insert dbo.DuplicateSample (name) Values('Ralph')
Insert dbo.DuplicateSample (name) Values('Alex')
Insert dbo.DuplicateSample (name) Values('Heather')
Insert dbo.DuplicateSample (name) Values('Alice')
Insert dbo.DuplicateSample (name) Values('Heather')
Insert dbo.DuplicateSample (name) Values('Oliver')
Insert dbo.DuplicateSample (name) Values('Jack')
Insert dbo.DuplicateSample (name) Values('Harry')
Insert dbo.DuplicateSample (name) Values('Alfie')
Insert dbo.DuplicateSample (name) Values('Charlie')
Insert dbo.DuplicateSample (name) Values('Thomas')
Insert dbo.DuplicateSample (name) Values('William')
Insert dbo.DuplicateSample (name) Values('Joshua')
Insert dbo.DuplicateSample (name) Values('George')
Insert dbo.DuplicateSample (name) Values('James')

 

Now our initial attempt at selecting duplicates might be something like the following.

Select          ds1.id As id1
               ,ds1.name As name1
               ,ds2.id As id2
               ,ds2.name As name2
From            DuplicateSample ds1
Inner Join      DuplicateSample ds2 On (ds1.name = ds2.name And ds1.id != ds2.id)

 

This will give us the following results.

id1 name1 id2 name2
6 Bob 1 Bob
8 Paul 2 Paul
27 James 3 James
14 Alex 5 Alex
1 Bob 6 Bob
2 Paul 8 Paul
5 Alex 14 Alex
17 Heather 15 Heather
15 Heather 17 Heather
3 James 27 James

 

Note that all the rows are shown twice, but switched round, i.e. Bob 1 matches Bob 6, but also Bob 6 matches Bob 1.

So, how do we solve this? By adding in an order to the rows and only matching ahead.

;With cte_DuplicateSample As (
    Select          Row_Number() Over (Order By id) As row_number
                   ,id
                   ,name
    From            DuplicateSample
)
Select          ds1.id As id1
               ,ds1.name As name1
               ,ds2.id As id2
               ,ds2.name As name2
From            cte_DuplicateSample ds1
Inner Join      cte_DuplicateSample ds2 On (ds1.name = ds2.name And ds1.row_number < ds2.row_number)

 

This will give us the following, much better, results.

id1 name1 id2 name2
1 Bob 6 Bob
2 Paul 8 Paul
5 Alex 14 Alex
15 Heather 17 Heather
3 James 27 James

 

The CTE selects out the same source data, but adds in a row_number, then in the joining clause we only match for rows that are ahead of the current row, i.e. when the row_number is greater that the current row_number.

Simple in the end.

Categories: SQL Server Tags: , ,

SSIS Package Logging

December 5, 2011 1 comment

 

Thank you to Adam Aspin, my much respected friend, who introduced me to, and inspires me in, all things SQL.

SSIS packages, especially once loaded onto a production server are very opaque. There is very little information provided about the progress of the package and no stats are available regarding which tasks are taking the most time, therefore some form of logging is essential.

In this post I’ll outline a very simple and easy to implement approach to logging that can be applied to the SSIS package as well as stored procedures that are called within it.

In order to maintain separation of log info from any particular application it would be preferable to create a separate logging database, however it is not essential. In this post I’ll create a ‘LOG’ schema and create the tables and stored procedures within this schema.

In brief we will:

  1. Create a reference table and data related to logging.
  2. Create a table to store the log entries.
  3. Create stored procedures to facilitate the logging.
  4. Create events within an SSIS package to add logging information.
  5. Annotate a sample stored procedure to include logging.
  6. Review the logging output.

 

Create a reference table and data related to logging

The only reference table we need is to define the type of each log entry. Primarily we need start and stop entries for each type of item we are going to track. Additionally I have added an extra two types, one for information and one for errors.

By using negative numbers mirroring the positive numbers it makes it much easier to see elements start and stop when reviewing a log.

CREATE TABLE LOG.LogEntryType (
    log_entry_type_id smallint NOT NULL,
    name varchar(50) NOT NULL,
    CONSTRAINT PK_LogStepType PRIMARY KEY CLUSTERED (log_entry_type_id ASC)
)
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (1, 'SSIS Package Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-1, 'SSIS Package Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (2, 'SSIS Task Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-2, 'SSIS Task Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (3, 'Stored Procedure Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-3, 'Stored Procedure Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (4, 'Stored Procedure Step Start')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (-4, 'Stored Procedure Step Finish')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (99, 'Information')
INSERT LOG.LogEntryType (log_entry_type_id, name) VALUES (100, 'Error')

 

Create a table to store the log entries

CREATE TABLE LOG.LogEntry(
    log_entry_id bigint IDENTITY(1,1) NOT NULL,
    log_entry_type_id smallint NOT NULL,
    date_created datetime NOT NULL,
    entity_id varchar(50) NULL,
    entity_name varchar(2000) NOT NULL,
    parent_entity_id varchar(50) NULL,
    parent_entity_name varchar(2000) NULL,
    value bigint NULL,
    description varchar(2000) NULL,
    CONSTRAINT PK_LogEntry PRIMARY KEY NONCLUSTERED (log_entry_id ASC)
)

ALTER TABLE LOG.LogEntry WITH CHECK 
ADD CONSTRAINT FK_LogEntry_LogEntryType FOREIGN KEY(log_entry_type_id)
REFERENCES LOG.LogEntryType (log_entry_type_id)

ALTER TABLE LOG.LogEntry CHECK CONSTRAINT FK_LogEntry_LogEntryType

 

Create stored procedures to facilitate the logging.

The code below creates the stored procedure required to add a simple log entry. The parameters will become clear as we work on adding logging to an SSIS package.

CREATE Procedure LOG.AddLogEntry
/******************************************************************************
Description:
    Adds an entry to the log.
History:
    05-Oct-2011 S.Wilber
        Created
******************************************************************************/
    @log_entry_type_id smallint
   ,@entity_id varchar(50)
   ,@entity_name varchar(2000)
   ,@parent_entity_id varchar(50) = Null
   ,@parent_entity_name varchar(2000) = Null
   ,@error_code int = Null
   ,@error_description varchar(2000) = Null
   ,@value bigint = Null
   ,@value_description varchar(2000) = Null
As

Insert          LOG.LogEntry 
                (
                    log_entry_type_id
                   ,date_created
                   ,entity_id
                   ,entity_name
                   ,parent_entity_id
                   ,parent_entity_name
                   ,value
                   ,description
                )
Values          (
                    @log_entry_type_id
                   ,GetDate()
                   ,@entity_id
                   ,@entity_name
                   ,@parent_entity_id
                   ,@parent_entity_name
                   ,Case When @error_code Is Not Null Then Convert(bigint, @error_code) Else @value End
                   ,Case When @error_code Is Not Null Then @error_description Else @value_description End
                )

 

The following code creates the stored procedure that we are going to use to add logging to our own stored procedures. This code is a little more complex as I want to minimise the logging code that I will need to add to any stored procedure. Rather than adding a log entry to start and stop each step within a store procedure I just want to log the start; the code below will automatically add the stop for the previous step as well as closing off the last step when finishing the stored procedure.

CREATE Procedure LOG.AddProcedureLogEntry
/******************************************************************************
Description:
    Adds an entry to the log. This is a wrapper to AddLogEntry but tailored
    for stored procedures.
    Depending on the combination of parameters it closes the previous step
    and starts the current step
History:
    05-Oct-2011 S.Wilber
        Created
******************************************************************************/
    @proc_id int
   ,@step_number smallint = Null Output
   ,@step_description varchar(2000) = Null
   ,@close_only bit = 0
   ,@parent_entity_id varchar(50) = Null
   ,@parent_entity_name varchar(2000) = Null
As


Declare         @log_entry_type_id smallint
               ,@entity_id varchar(50) = Convert(varchar(10), @proc_id) + '-' + Convert(varchar(10), @@spid)
               ,@entity_name varchar(2000) = Object_Name(@proc_id)
               ,@error_code int = Null
               ,@error_description varchar(2000) = Null
               ,@value bigint
               ,@value_description varchar(2000)
               ,@log_entry_type_proc_start smallint = 3
               ,@log_entry_type_proc_end smallint = -3
               ,@log_entry_type_proc_step_start smallint = 4
               ,@log_entry_type_proc_step_end smallint = -4


-- if this is for a step - close off the previous step and then make a new one
If @step_number Is Not Null
  Begin
    If @step_number != 0
      Begin
        Set             @log_entry_type_id = @log_entry_type_proc_step_end
        Set             @value = Convert(bigint, @step_number)
        
        -- get the last description
        Select Top 1    @value_description = description
        From            Log.LogEntry
        Where           log_entry_type_id = @log_entry_type_proc_step_start
        And             entity_id = @entity_id
        And             value = @value
        Order By        log_entry_id desc
        
        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
      End
        
    If @close_only != 1
      Begin
        -- increment the step number
        Set             @step_number = @step_number + 1
        Set             @log_entry_type_id = @log_entry_type_proc_step_start
        Set             @value = Convert(bigint, @step_number)
        Set             @value_description = IsNull(@step_description, 'Step: ' + Convert(varchar(10), @step_number))

        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
        
      End  
  End
Else
  Begin

    -- check that the last step has been closed
    Select Top 1    @value_description = description
                   ,@step_number = Convert(smallint, value)
                   ,@log_entry_type_id = log_entry_type_id
    From            Log.LogEntry
    Where           entity_id = @entity_id
    Order By        log_entry_id desc
    
    If @log_entry_type_id = @log_entry_type_proc_step_start
      Begin
        Set             @log_entry_type_id = @log_entry_type_proc_step_end
        Set             @value = Convert(bigint, @step_number)
        
        Exec LOG.AddLogEntry
           @log_entry_type_id
          ,@entity_id
          ,@entity_name
          ,@parent_entity_id
          ,@parent_entity_name
          ,@error_code
          ,@error_description
          ,@value
          ,@value_description
      End

    -- now just write the procedure start/end              
    Set @log_entry_type_id = Case When @close_only = 0 Then @log_entry_type_proc_start Else @log_entry_type_proc_end End
    Set @value = Null
    Set @value_description = Null
    
    Exec LOG.AddLogEntry
       @log_entry_type_id
      ,@entity_id
      ,@entity_name
      ,@parent_entity_id
      ,@parent_entity_name
      ,@error_code
      ,@error_description
      ,@value
      ,@value_description

  End    

 

Create events within an SSIS package to add logging information.

Logging within SSIS is performed by carrying out actions triggered by ‘events’.

The events we are interested in here are:

  • OnPreExecute – triggered just before the package or a task within the package starts to execute.
  • OnPostExecute – triggered just after the package or a task within the package has executed.
  • OnError – trigged when an error occurs.

Now very conveniently, if the events are created at the package scope, i.e. click on the background of your package before clicking on the ‘Event Handlers’ tab, then the events fire both for the package starting and stopping, but also for each and every task starting and stopping. Therefore if we called our stored procedure ‘LOG.AddLogEntry’ from within the events, then the whole package would be logged.

However, before you dive in, we will need to be able to differentiate between events triggered off for tasks and for the package, and we’ll do this by using a few little variables evaluated from expressions.

The first step is to create three variables, in the scope of the event, that will contain some of the values that we will send to the ‘LOG.AddLogEntry’ stored procedure. The details of the variables are given below, but note that in the properties for the variable, set ‘EvaluateAsExpression’ to ‘True’.

Name Type Expression
LogEntryID String @[System::SourceID]
LogEntryName String @[System::SourceID] == @[System::PackageID] ? @[System::PackageName] : @[System::SourceName]
LogEntryTypeID Int16 (DT_I2)(@[System::SourceID] == @[System::PackageID] ? 1 : 2)

The screenshot below shows the ‘OnPreExecute’ event, which is the same as on the ‘OnPostExecute’ event. You can see the three variables required on the left hand side (note their scope) and also the Execute SQL Task that will call the stored procedure to add the log entry.

Screenshot - OnPreExecute

 

For the execution of the stored procedure to add the log entry I’ve used an ADO.NET connection due to the clarity of the parameter mapping.

The Execute SQL Task merely calls the stored procedure ‘LOG.AddLogEntry’ as shown in the screenshot below.

Screenshot - Execute SQL Task

 

The parameters are mapped as shown in the screenshot below.

Screenshot - Execute SQL Task Parameters

 

The PostExecute event is very similar to the PreExecute event and I won’t go into further details here.

I also like to catch the OnError event, and its implementation is very similar. The ‘LogEntryTypeID’ variable is still require, but it does not need to be an expression and its value can just be set to 100.

Screenshot - OnError

 

The parameters require are slightly different and are shown in the following screenshot.

Screenshot - OnError Parameters

 

Annotate a sample stored procedure to include logging.

Adding logging to long and complex stored procedures can be a really valuable aid in clearly identifying long running sections or for pinning down errors or unexpected flow.

Since we have already created the stored procedure for logging from stored procedures above, here I’ll quickly go through the changes that need to be made to add logging.

The principle behind the code for logging from a stored procedure is to minimise the impact to the logged stored procedure in terms of additional code. Thus, each time the LOG.AddProcedureLogEntry is called potentially two entries are added to the log, one to close of the previous entry, i.e. to mark the previous step as finished and a second to mark the current step as starting.

Three additional parameters can be added to your stored procedure. These will become self-explanatory below.

@log bit = 1
@log_parent_entity_id varchar(50) = null
@log_parent_entity_name varchar(2000) = null

 

Additional to the parameters you will need one variable to record the current step number.

Declare @log_proc_step smallint = 0

 

Finally I add three types of log calls within the stored procedures. Note, you can dispense with the ‘if’ and the additional parameter if you always want logging to be switched on. Most of the parameters are fairly clear, but @close_only is used to instruct the procedure, in the case of a step, to only close off the previous step and not to create a new step entry, and also to differentiate between the stored procedure header/footer logging.

 

Stored procedure start – marking the start of the execution of the stored procedure

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = Null, @close_only = 0, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

 

Note that @step_number is null, so this is for either the header/footer and @close_only = 0, meaning that this is for the header.

 

Step – marking the start of each step

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = @log_proc_step Output, @step_description = 'Fetch Process Run IDs' ,@close_only = 0, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

Note the @step_description which allows you to add a note for the step. This makes reviewing the log so much more simple and I would strongly recommend going to the effort to write this note.

 

Stored Procedure finish – marking the end of the execution of the stored procedure

If @log = 1 Exec LOG.AddProcedureLogEntry @proc_id = @@ProcID, @step_number = Null, @close_only = 1, @parent_entity_id = @log_parent_entity_id, @parent_entity_name = @log_parent_entity_name

Note that @step_number is null, so this is for either the header/footer and @close_only = 1, meaning that this is for the footer.

 

Review the logging output.

So, now all the logging is in place, what does the output look like. The screenshot below shows a section from a current project copied to Excel. The data is very simplistic, but you can write additional stored procedures to report out the logging information in a more informative way, including highlighting the % time that each step has used overall, using historical runs to identify the progress in terms of % of a current run, finding the top 10 slowest steps, etc.

Just for clarification I’ve coloured logging that has come from a stored procedures in pale purple.

Screenshot - Sample Log Results in Excel