Archive
Create SQL for Temp Table
Sometimes it is really useful to work with temp tables and to create them up front rather than using ‘Select Into’, but they can be a pain create for a couple of reasons:
- The temp table has loads of columns and you really don’t want to sit and type them all in.
- You are not aware of the structure of the source data.
The following stored procedure takes the hassle away. Merely select some data (use Select Top(0) for speed) into a temp table that has the desired structure and then execute the stored procedure below to generate the SQL will create the temp table.
Anyway, here’s the code.
Create Procedure dbo.GenerateTempTableSQL /****************************************************************************** Description: Given a temp table the sql required to create a new temp table of the same structure is generated. This is useful during development to select into a temp table and then generate the code that could be used to create the temp table. In: @input_temp_table_name The name of the temp table to use as the template the generate the SQL from. @output_temp_table_name [optional: @input_temp_table_name] The name of the table that is used in the create table script. If left as null, then the @input_temp_table_name will be used. @database_collation [optional: current database collation] The collation to use for char based columns. History: 16-Oct-2015 S.Wilber Created 09-Nov-2015 S.Wilber Updated to include # in temp table name if it had been ommitted. 14-Jul-2016 S.Wilber Updated to work from temp tables instead of permanent tables ******************************************************************************/ @input_temp_table_name sysname ,@output_temp_table_name sysname = null ,@database_collation sysname = null As -- declare constants Declare @reserved_words varchar(max) = '|ADD|ALL|ALTER|AND|ANY|AS|ASC|AUTHORIZATION|BACKUP|BEGIN|BETWEEN|BREAK|BROWSE|BULK|BY|CASCADE|CASE|CHECK|CHECKPOINT|CLOSE|CLUSTERED|COALESCE|COLLATE|COLUMN|COMMIT|COMPUTE|CONSTRAINT|CONTAINS|CONTAINSTABLE|CONTINUE|CONVERT|CREATE|CROSS|CURRENT|CURRENT_DATE|CURRENT_TIME|CURRENT_TIMESTAMP|CURRENT_USER|CURSOR|DATABASE|DBCC|DEALLOCATE|DECLARE|DEFAULT|DELETE|DENY|DESC|DISK|DISTINCT|DISTRIBUTED|DOUBLE|DROP|DUMP|ELSE|END|ERRLVL|ESCAPE|EXCEPT|EXEC|EXECUTE|EXISTS|EXIT|EXTERNAL|FETCH|FILE|FILLFACTOR|FOR|FOREIGN|FREETEXT|FREETEXTTABLE|FROM|FULL|FUNCTION|GOTO|GRANT|GROUP|HAVING|HOLDLOCK|IDENTITY|IDENTITY_INSERT|IDENTITYCOL|IF|IN|INDEX|INNER|INSERT|INTERSECT|INTO|IS|JOIN|KEY|KILL|LEFT|LIKE|LINENO|LOAD|MERGE|NATIONAL|NOCHECK|NONCLUSTERED|NOT|NULL|NULLIF|OF|OFF|OFFSETS|ON|OPEN|OPENDATASOURCE|OPENQUERY|OPENROWSET|OPENXML|OPTION|OR|ORDER|OUTER|OVER|PERCENT|PIVOT|PLAN|PRECISION|PRIMARY|PRINT|PROC|PROCEDURE|PUBLIC|RAISERROR|READ|READTEXT|RECONFIGURE|REFERENCES|REPLICATION|RESTORE|RESTRICT|RETURN|REVERT|REVOKE|RIGHT|ROLLBACK|ROWCOUNT|ROWGUIDCOL|RULE|SAVE|SCHEMA|SECURITYAUDIT|SELECT|SEMANTICKEYPHRASETABLE|SEMANTICSIMILARITYDETAILSTABLE|SEMANTICSIMILARITYTABLE|SESSION_USER|SET|SETUSER|SHUTDOWN|SOME|STATISTICS|SYSTEM_USER|TABLE|TABLESAMPLE|TEXTSIZE|THEN|TO|TOP|TRAN|TRANSACTION|TRIGGER|TRUNCATE|TRY_CONVERT|TSEQUAL|UNION|UNIQUE|UNPIVOT|UPDATE|UPDATETEXT|USE|USER|VALUES|VARYING|VIEW|WAITFOR|WHEN|WHERE|WHILE|WITH|WITHIN GROUP|WRITETEXT|' -- declare variables Declare @sql nvarchar(max) -- table to store the columns If Object_ID('tempdb..#temp') Is Not Null Drop Table #temp Create Table #temp ( column_id int not null ,column_name nvarchar(130) null ,type_name nvarchar(130) not null ,max_length smallint not null ,scale tinyint not null ,[precision] tinyint not null ,is_nullable bit null ,collation_name sysname null ); -- table to store the generated SQL Declare @output_sql table (line_id int identity(1, 1), sql nvarchar(max)) Declare @object_id int -- set the output name to be the input name if it is null or blank or whitespace If NullIf(LTrim(RTrim(@output_temp_table_name)), '') Is Null Set @output_temp_table_name = @input_temp_table_name -- set the collation if it's not set If NullIf(LTrim(RTrim(@database_collation)), '') Is Null Set @database_collation = Cast(DatabasePropertyEx(db_name(), 'collation') As nvarchar(128)) -- add a # to the input temp table name if it does not already start with # If Left(@input_temp_table_name, 1) != '#' Set @input_temp_table_name = '#' + @input_temp_table_name -- add a # to the output temp table name if it does not already start with # If Left(@output_temp_table_name, 1) != '#' Set @output_temp_table_name = '#' + @output_temp_table_name -- get the object id Select @object_id = Object_ID('tempdb..' + @input_temp_table_name) If @object_id Is Null Begin RaisError('''%s'' cannot be found.', 11, 1, @input_temp_table_name) Return End -- generate the sql Insert @output_sql (sql) Select 'If Object_ID(''tempdb..' + @output_temp_table_name + ''') Is Not Null Drop Table ' + @output_temp_table_name Insert @output_sql (sql) Select 'Create Table ' + @output_temp_table_name Insert @output_sql (sql) Select '(' Insert #temp (column_id, column_name, type_name, max_length, scale, precision, is_nullable, collation_name) Select c.column_id ,Case When CharIndex('|' + c.name + '|', @reserved_words) != 0 Then '[' + c.name + ']' Else c.name End As column_name ,Case When CharIndex('|' + t.name + '|', @reserved_words) != 0 Then '[' + t.name + ']' Else t.name End As type_name ,c.max_length ,c.scale ,c.precision ,c.is_nullable ,c.collation_name From tempdb.sys.objects o Inner Join tempdb.sys.schemas s On (s.schema_id = o.schema_id) Inner Join tempdb.sys.columns c On (o.object_id = c.object_id) Inner Join tempdb.sys.types t On (t.user_type_id = c.user_type_id) Where o.object_id = @object_id Insert @output_sql (sql) Select Case When column_id = 1 Then space(4) Else space(3) + ',' End + column_name + ' ' + type_name + Case When Lower(type_name) In ('datetimeoffset', 'datetime2', 'time') Then '(' + Cast(IsNull(c.scale, '') As varchar(10)) + ')' When Lower(type_name) In ('decimal', 'numeric') Then '(' + Cast(IsNull(c.precision, '') As varchar(3)) + ', ' + Cast(IsNull(c.scale, '') As varchar(10)) + ')' When Lower(type_name) In ('varbinary', 'binary') Then '(' + Case When c.max_length = -1 Then 'Max' Else Cast(c.max_length As varchar(10)) End + ')' When Lower(type_name) In ('char', 'varchar', 'nchar', 'nvarchar') Then '(' + Case When c.max_length = -1 Then 'Max' When type_name In ('nchar', 'nvarchar') Then Cast(c.max_length / 2 As varchar(10)) Else Cast(IsNull(c.max_length, '') As varchar(10)) End + ')' + Case When c.collation_name != @database_collation Then ' Collate ' + c.collation_name Else '' End Else '' End + Case When c.is_nullable = 0 Then ' not null' Else ' null' End From #temp c Order By column_id Insert @output_sql (sql) Select ');' Select sql From @output_sql Order By line_id
Clear All Tables Within a Schema
Often as part of an ETL process I want to simply clear out all the table within a schema, e.g. ‘landing’ or ‘etl’.
However, as new tables are added to the schema, the ‘clear out’ code needs to be maintained. In addition, if any referential integrity exists, the tables will need to be cleared in a certain order. Finally, the Truncate Table statement is significantly faster than a simple Delete, but there are restrictions on when it can be used.
All of which led me to create the following stored procedure so that I don’t have to think these concerns again, allowing me time to focus on the core of the ETL work.
Feel free to use/copy/amend as appropriate, but please let me have your comments/updates.
Create Procedure etl.ClearSchemaTables /****************************************************************************** Description: Clears down any tables in the target schema. Tables are cleared down in referential order using truncate where possible. In: @target_schema_name The name of the schema whose tables should be cleared, e.g. 'landing'. History: 03-Aug-2015 S.Wilber Created ******************************************************************************/ @target_schema_name sysname As Set NoCount On -- declare variables Declare @object_id int ,@table_name sysname ,@schema_name sysname ,@is_referenced bit ,@success bit ,@sql nvarchar(max) If Object_ID('tempdb..#table_dependencies') Is Not Null Drop Table #table_dependencies Create Table #table_dependencies ( source_object_id int ,dependancy_object_id int ) -- declare tables If Object_ID('tempdb..#tables') Is Not Null Drop Table #tables Create Table #tables ( object_id int ,schema_name sysname ,table_name sysname ,is_referenced bit default(0) ,is_processed bit default(0) ) -- find all tables in the schema - look for fk references that are not from self Insert #tables (object_id, schema_name, table_name, is_referenced) Select t.object_id ,s.name ,t.name ,Case When fk.object_id Is Not Null Then 1 Else 0 End From sys.tables t Inner Join sys.schemas s On (s.schema_id = t.schema_id) Left Join sys.foreign_keys fk On (fk.referenced_object_id = t.object_id and fk.parent_object_id != t.object_id) Where s.name = @target_schema_name -- retrieve all interdependancies between batch managed tables. Insert #table_dependencies (source_object_id, dependancy_object_id) Select fk.parent_object_id ,fk.referenced_object_id From sys.foreign_keys fk Inner Join #tables s On (s.object_id = fk.parent_object_id) Inner Join #tables t On (t.object_id = fk.referenced_object_id) -- loop through all the tables delete the content, either via a truncate or delete depending on foreign keys While Exists (Select 1 From #tables Where is_processed = 0) Begin -- find the first table where all of the dependant tables have been processed Select @object_id = object_id ,@schema_name = schema_name ,@table_name = table_name ,@is_referenced = is_referenced From #tables sbt Where is_processed = 0 And Not Exists ( Select 1 From #table_dependencies td Inner Join #tables tbt On (tbt.object_id = td.source_object_id) Where td.dependancy_object_id = sbt.object_id And tbt.is_processed = 0 ) -- if none found, then raise an error as it seems that we have a circular dependancy that we can't resolve If @object_id Is Null Begin RaisError ('A table where all dependant tables have been processed cannot be found. This implies that circular dependencies exist. Batch rollback cannot continue.', 11, 0) End Begin Try -- if referenced then we need to perform a delete otherwise we can perform a truncate If @is_referenced = 1 Begin Set @sql = 'Delete From [' + @schema_name + '].[' + @table_name + ']' End Else Begin Set @sql = 'Truncate Table [' + @schema_name + '].[' + @table_name + ']' End -- exeute the sql Exec sp_executesql @stmt = @sql End Try Begin Catch -- flag as unsuccessful Set @success = 0 -- declare variables Declare @error_description varchar(2000) ,@error_severity int ,@error_state int -- fetch the error details Select @error_description = ERROR_MESSAGE(), @error_severity = ERROR_SEVERITY(), @error_state = ERROR_STATE() -- throw the error RaisError (@error_description, @error_severity, @error_state) End Catch -- update the table to state that that it has been processed Update #tables Set is_processed = 1 Where object_id = @object_id End
Temporarily Disabling Referential Integrity Constraints on a SQL Table
I recently came across a situation where part of the data processing would put the data temporarily into a non-referential state. In order to accommodate this, without scripting Drop/Create statements which could get out of sync with other development work, I put together the following stored procedure which enables/disables check constraints.
Feel free to use/copy/amend as appropriate, but please let me have your comments/updates.
Create Procedure [etl].[SetCheckContraintsState] /****************************************************************************** Description: Enables/Disables any check contraints that reference the target table. This is useful for when referenced data is going to be temporarily modified in a way that will invalidate any check constraints. History: 02-Jun-2015 S.Wilber Created ******************************************************************************/ @target_schema_name sysname ,@target_table_name sysname ,@do_set_enabled bit As -- declare variables Declare @constraint_name sysname ,@referencing_schema_name sysname ,@referencing_table_name sysname ,@sql nvarchar(max) Declare @constraints table ( ConstraintName sysname ,ReferencingSchemaName sysname ,ReferencingTableName sysname ) Insert @constraints (ConstraintName, ReferencingSchemaName, ReferencingTableName) Select fk.name As ConstraintName ,sch.name As ReferencingSchemaName ,t.name As ReferencingTableName From sys.foreign_keys fk Inner Join sys.tables t On (t.object_id = fk.parent_object_id) Inner Join sys.schemas sch On (sch.schema_id = t.schema_id) Where fk.referenced_object_id = Object_ID(QuoteName(@target_schema_name) + '.' + QuoteName(@target_table_name)) Declare constraint_cursor Cursor Fast_Forward For Select ConstraintName ,ReferencingSchemaName ,ReferencingTableName From @constraints Open constraint_cursor Fetch From constraint_cursor Into @constraint_name, @referencing_schema_name, @referencing_table_name While @@Fetch_Status = 0 Begin If @do_set_enabled = 1 Set @sql = 'Alter Table ' + QuoteName(@referencing_schema_name) + '.' + QuoteName(@referencing_table_name) + ' With Check Check Constraint ' + QuoteName(@constraint_name) Else Set @sql = 'Alter Table ' + QuoteName(@referencing_schema_name) + '.' + QuoteName(@referencing_table_name) + ' NoCheck Constraint ' + QuoteName(@constraint_name) Exec sp_executesql @sql -- get the next constraint Fetch From constraint_cursor Into @constraint_name, @referencing_schema_name, @referencing_table_name End Close constraint_cursor Deallocate constraint_cursor
Listing duplicates from the same table without repetition
I recently came across what was apparently a simple problem, but which had me initially stumped. The problem was, how to list out duplicates within the same table without repetition. For example if rows A and B in a table are deemed to have matched, then selecting from the table joined to itself will give two results, A matches B and B matches A. Both are valid results, but it effectively doubles the number of duplicates reported. I only needed one of the two results.
The solution turned out to be simple, of course, but since I spent some time thinking it through I thought it worth creating a quick post.
Walking through a simple example, create the following sample table.
CREATE TABLE dbo.DuplicateSample( id int IDENTITY(1,1) NOT NULL, name varchar(50) NOT NULL )
Throw a load of sample data into it.
Insert dbo.DuplicateSample (name) Values('Bob') Insert dbo.DuplicateSample (name) Values('Paul') Insert dbo.DuplicateSample (name) Values('James') Insert dbo.DuplicateSample (name) Values('Claire') Insert dbo.DuplicateSample (name) Values('Alex') Insert dbo.DuplicateSample (name) Values('Bob') Insert dbo.DuplicateSample (name) Values('Rebecca') Insert dbo.DuplicateSample (name) Values('Paul') Insert dbo.DuplicateSample (name) Values('Andrew') Insert dbo.DuplicateSample (name) Values('Giles') Insert dbo.DuplicateSample (name) Values('Terry') Insert dbo.DuplicateSample (name) Values('Robert') Insert dbo.DuplicateSample (name) Values('Ralph') Insert dbo.DuplicateSample (name) Values('Alex') Insert dbo.DuplicateSample (name) Values('Heather') Insert dbo.DuplicateSample (name) Values('Alice') Insert dbo.DuplicateSample (name) Values('Heather') Insert dbo.DuplicateSample (name) Values('Oliver') Insert dbo.DuplicateSample (name) Values('Jack') Insert dbo.DuplicateSample (name) Values('Harry') Insert dbo.DuplicateSample (name) Values('Alfie') Insert dbo.DuplicateSample (name) Values('Charlie') Insert dbo.DuplicateSample (name) Values('Thomas') Insert dbo.DuplicateSample (name) Values('William') Insert dbo.DuplicateSample (name) Values('Joshua') Insert dbo.DuplicateSample (name) Values('George') Insert dbo.DuplicateSample (name) Values('James')
Now our initial attempt at selecting duplicates might be something like the following.
Select ds1.id As id1 ,ds1.name As name1 ,ds2.id As id2 ,ds2.name As name2 From DuplicateSample ds1 Inner Join DuplicateSample ds2 On (ds1.name = ds2.name And ds1.id != ds2.id)
This will give us the following results.
id1 | name1 | id2 | name2 |
---|---|---|---|
6 | Bob | 1 | Bob |
8 | Paul | 2 | Paul |
27 | James | 3 | James |
14 | Alex | 5 | Alex |
1 | Bob | 6 | Bob |
2 | Paul | 8 | Paul |
5 | Alex | 14 | Alex |
17 | Heather | 15 | Heather |
15 | Heather | 17 | Heather |
3 | James | 27 | James |
Note that all the rows are shown twice, but switched round, i.e. Bob 1 matches Bob 6, but also Bob 6 matches Bob 1.
So, how do we solve this? By adding in an order to the rows and only matching ahead.
;With cte_DuplicateSample As ( Select Row_Number() Over (Order By id) As row_number ,id ,name From DuplicateSample ) Select ds1.id As id1 ,ds1.name As name1 ,ds2.id As id2 ,ds2.name As name2 From cte_DuplicateSample ds1 Inner Join cte_DuplicateSample ds2 On (ds1.name = ds2.name And ds1.row_number < ds2.row_number)
This will give us the following, much better, results.
id1 | name1 | id2 | name2 |
---|---|---|---|
1 | Bob | 6 | Bob |
2 | Paul | 8 | Paul |
5 | Alex | 14 | Alex |
15 | Heather | 17 | Heather |
3 | James | 27 | James |
The CTE selects out the same source data, but adds in a row_number, then in the joining clause we only match for rows that are ahead of the current row, i.e. when the row_number is greater that the current row_number.
Simple in the end.