Thursday, March 28, 2013

Basics of SQL Server Job Scheduling


To create and attach a schedule to a job

1.    In Object Explorer, connect to an instance of the SQL Server Database Engine, and then expand that instance.
2.    Expand SQL Server Agent, expand Jobs, right-click the job you want to schedule, and click Properties.
3.    Select the Schedules page, and then click New.
4.    In the Name box, type a name for the new schedule.
5.    Clear the Enabled check box if you do not want the schedule to take effect immediately following its creation.
6.    For Schedule Type, select one of the following:
·         Click Start automatically when SQL Server Agent starts to start the job when the SQL Server Agent service is started.
·         Click Start whenever the CPUs become idle to start the job when the CPUs reach an idle condition.
·         Click Recurring if you want a schedule to run repeatedly. To set the recurring schedule, complete the Frequency, Daily Frequency, and Duration groups on the dialog.
·         Click one time if you want the schedule to run only once. To set the One time schedule, complete the One-time occurrence group on the dialog.

To attach a schedule to a job

1.    In Object Explorer, connect to an instance of the SQL Server Database Engine, and then expand that instance.
2.    Expand SQL Server Agent, expand Jobs, right-click the job that you want to schedule, and click Properties.
3.    Select the Schedules page, and then click Pick.
4.    Select the schedule that you want to attach, and then click OK.
5.    In the Job Properties dialog box, double-click the attached schedule.
6.    Verify that Start date is set correctly. If it is not, set the date when you want for the schedule to start, and then click OK.
7.    In the Job Properties dialog box, click OK.

To prevent a scheduled job from running, you must do one of the following:

·         Disable the schedule.
·         Disable the job.
·         Detach the schedule from the job.
·         Stop the SQL Server Agent service.
·         Delete the schedule.
If the schedule is not enabled, the job can still run in response to an alert or when a user runs the job manually. When a job schedule is not enabled, the schedule is not enabled for any job that uses the schedule.
You must explicitly re-enable a schedule that has been disabled. Editing the schedule does not automatically re-enable the schedule.

Monday, March 25, 2013

Deleting Duplicate Data From SQL Server Table

Duplicate data is a very well known and frequently occurring issue in any DBMS and dealing with deletion of duplicate data is something not that easy for one specially one who is dealing with crucial and huge data.

There are 2 scenarios which can come while deleting duplicate data -

1. when we have an identity column in our table containing duplicates.

The problem becomes much easier here and we can just keep the data with MIN or MAX keys removing all other.


--Create New Table
create table Person
(
ID int identity(1,1),
Name varchar(30),
age int
)

--Insert Data into Table with multiple Duplicate entries

insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Gaurav',26)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)

--Check the data before deleting duplicates
select * from Person

DELETE FROM Person
WHERE ID NOT IN
(
SELECT MIN(ID) FROM Person
GROUP BY Name,age HAVING COUNT(1) > 1
UNION
SELECT MIN(ID) FROM Person
GROUP BY Name,age HAVING COUNT(1) = 1
)

--Check the data After deleting duplicates
select * from Person


Use Below Query --

2. when we don't have any unique identifier in our table and it's a bit tricky here -

there are 2 ways of doing it here again --

(i) using the Row_Number() function comes with only SQL Server 2005 and later versions


--Check if Table already exists

IF EXISTS(select * from INFORMATION_SCHEMA.TABLES where TABLE_NAME = 'Person')
BEGIN
      --Drop if already exists
      Drop Table Person
END

--Create New Table
create table Person
(
Name varchar(30),
age int
)

--Insert Data into Table with multiple Duplicate entries

insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Gaurav',26)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)

--Check the data before deleting duplicates
select * from Person

;WITH CTE (Name,Age, DuplicateCount)
AS
(
SELECT Name,Age,
ROW_NUMBER() OVER(PARTITION BY Name,Age ORDER BY Name) AS DuplicateCount
FROM Person
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO

--Check the data After deleting duplicates
select * from Person


(ii) Here is some piece of code which i will suggest as best to deal with this and can be used anywhere in even the older version of SQL Server like 2000 without using any special functions.


--Check if Table already exists

IF EXISTS(select * from INFORMATION_SCHEMA.TABLES where TABLE_NAME = 'Person')
BEGIN
      --Drop if already exists
      Drop Table Person
END

--Create New Table
create table Person
(
Name varchar(30),
age int
)

--Insert Data into Table with multiple Duplicate entries

insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Ashish',24)
insert into Person(Name,age) values('Gaurav',26)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)
insert into Person(Name,age) values('Nayan',25)

--Check the data before deleting duplicates
select * from Person

--Declare 2 variables as @name (which is duplicate key) and @duplicateCount as Number of Duplicates
DECLARE @name varchar(30), @duplicateCount int

--Declare a Cursor to iterate through each duplicate name
DECLARE Cursor_Person CURSOR FOR
select Name,(count(1)-1) cnt
from Person
group by Name
having count(1) > 1
OPEN Cursor_Person

FETCH NEXT FROM Cursor_Person INTO @name, @duplicateCount
WHILE @@FETCH_STATUS = 0
BEGIN

--delete all duplicate entries for each duplicate key keeping only one row
delete top (@duplicateCount) from Person where Name = @name

FETCH NEXT FROM Cursor_Person INTO @name, @duplicateCount
END
CLOSE Cursor_Person
DEALLOCATE Cursor_Person

--Check the data After deleting duplicates
select * from Person

Please do comment if you find this post useful, suggestion are appreciated.
Happy Coding !!



Generic Function for Padding a Character in SQL Server

Padding character to a string is something which we use quite often in our day to day development activities and is what i found as a very good generic way of doing this -

CREATE FUNCTION dbo.fn_PadValue
(
  @InValue varchar(50),
  @PadLimit int,
  @PadChar char(1),
  @PadSide char(1)
)
RETURNS varchar(50)
AS
BEGIN
  Declare @Result varchar(50)
  Declare @Padding int
  --Determine the amount of padding
  Set @Padding = @PadLimit - Len(@InValue)
  IF @Padding < 0
    Set @Padding = 0

  IF @PadSide = 'L' -- Pad the left side
    Set @Result = Replicate(@PadChar, @Padding) + @InValue

  IF @PadSide = 'R' -- Pad the right side
    Set @Result = @InValue + Replicate(@PadChar, @Padding)

  RETURN (@Result)
END

The Above code can be used in both LEFT and RIGHT side padding just by passing appropriate parameter to the function.

Usage --

select dbo.fn_PadValue('171',6,'0','L')         --For Left Side Padding
select dbo.fn_PadValue('171',6,'0','R')         --For Right Side Padding


References -
http://www.sqlservercentral.com/scripts/T-SQL/64358/

Thursday, March 21, 2013

Popular Date Time formats in SQL Server

We usually come across scenario's where different date time conversions and formats are required in our day to day development activities.

here are some useful format codes used in SQL Server -


select convert(varchar(20),GETDATE(),100) --Apr 6 2011 4:38AM
select convert(varchar(20),GETDATE(),101) --04/06/2011            (MM/DD/YYYY)
select convert(varchar(20),GETDATE(),102) --2011.04.06            (YYYY.MM.DD)
select convert(varchar(20),GETDATE(),103) --06/04/2011            (DD/MM/YYYY)
select convert(varchar(20),GETDATE(),104) --06.04.2011            (DD.MM.YYYY)
select convert(varchar(20),GETDATE(),105) --06-04-2011            (DD-MM-YYYY)
select convert(varchar(20),GETDATE(),106) --06 Apr 2011
select convert(varchar(20),GETDATE(),107) --Apr 06, 2011
select convert(varchar(20),GETDATE(),108) --04:42:19              (HH:MM:SS)
select convert(varchar(20),GETDATE(),109) --Apr 6 2011 4:42:47
select convert(varchar(20),GETDATE(),110) --04-06-2011            (MM-DD-YYYY)
select convert(varchar(20),GETDATE(),111) --2011/04/06            (YYYY/MM/DD)
select convert(varchar(20),GETDATE(),112) --20110406              (YYYYMMDD)
select convert(varchar(20),GETDATE(),113) --06 Apr 2011 04:44:10
select convert(varchar(20),GETDATE(),114) --04:44:25:583          (HH:MM:SS:sss)
select convert(varchar(20),GETDATE(),120) --2011-04-06 04:45:20

List all SQL Agent Jobs with atleast one SSIS Package used in it


Use [msdb]

SELECT sysjobs.name AS 'Job Name',
sysjobsteps.step_id AS 'Step Number',
sysjobsteps.command AS 'SSIS Package Location'
FROM sysjobs 
INNER JOIN
sysjobsteps ON sysjobs.job_id = sysjobsteps.job_id
where subsystem='ssis'
order by 1

Get the latest run duration of all jobs in SQL Server

The Below Query helps to find the run duration of all jobs in a particular server --


SELECT
    j.name,
    h.run_status,
    durationHHMMSS = STUFF(STUFF(REPLACE(STR(h.run_duration,7,0),
        ' ','0'),4,0,':'),7,0,':'),
    [start_date] = CONVERT(DATETIME, RTRIM(run_date) + ' '
        + STUFF(STUFF(REPLACE(STR(RTRIM(h.run_time),6,0),
        ' ','0'),3,0,':'),6,0,':'))
FROM
    msdb.dbo.sysjobs AS j
INNER JOIN
    (
        SELECT job_id, instance_id = MAX(instance_id)
            FROM msdb.dbo.sysjobhistory
            GROUP BY job_id
    ) AS l
    ON j.job_id = l.job_id
INNER JOIN
    msdb.dbo.sysjobhistory AS h
    ON h.job_id = l.job_id
    AND h.instance_id = l.instance_id
ORDER BY
    CONVERT(INT, h.run_duration) DESC,
    [start_date] DESC

Monday, March 11, 2013

SSIS vs. T-SQL

This is a common question which comes often before any developer to choose between the two as to follow which path to choose as SSIS or T-SQL for developing ETL solutions.

well i also got similar scenario some time back and was in a dilemma to decide this.

on a big picture it's always better to go with conventional way i.e. using T-SQL and not to go with the tool.
But on doing some more reading and research over many blogs and forums and also the development phase of my work over some 2 and a half odd years. I found some good reasons and explanations over this as not to always stick to a particular approach and do proper analysis of the requirements to take a decision.

Below are some of the key points which i was able to collect regarding this.

1. If the data does need to be persisted temporarily it can be to a raw file in SSIS whereas T-SQL requires temporary tables for which there may be a management/security overhead.


2. SSIS have a very good visual representation of the "work".

3. SSIS manages memory very efficiently, which can result in big performance improvements compared to     T-SQL.

4. Bad data can be captured to a different data sink for examination later in SSIS as it have an in-built debugger where we can attach the data in between the transformations without using any temporary object and can copy that data to excel file to analyze.

5.  Exception Handling is better in SSIS as compared to T-SQL.


6. Data-flows are, to some extent, self-documenting so improves code understand-ability as far as documentation is concerned.

7. User-controlled parallel execution of data-flows is possible where it isn't really in the inherent batch operated world of stored procedures.

8. "Heavy-lifting" of data can occur on a different to machine to that storing the data.
Regarding the performance impact on the source server, since the Stored Proc's have to do all the work on the server itself, while the data flow can be pulling data in an asynchronous way and doing the heavy lifting on a different machine.

9. By default, events containing very pertinent information such as "component "<component name>" (5824)" wrote 2398156 rows" are rose which might be very useful for logging and process tracking in case of any failures or logging scenarios.

10. For each Loop Container in Integration Services takes a data type of System.Object (not the physical SQL table), which is an in-memory object that is native to the Integration Services, which is very effective in terms of performance optimization as compared to Temp tables\tables variables\cursors.

Hope you find this post Useful.

Do comment if it you need any further information on this or if you find anything here which doesn't looks convincing.

Happy Learning !!