List Failed SQL Server Agent Jobs, with Restart Command
Managing a large number of SQL Server instances, with many SQL Server Agent jobs on each instance, can provide a large amount of tedious work when many jobs fail simultaneously. For instance, if you run backups to a file share, but the file server is restarted during your backups, every running backup job will fail. Failed SQL Server Agent jobs can be really, really painful, if you have to point-and-click on every failed job!
The script below returns a single row for each SQL Server Agent job that has failed its most recent run. Failed SQL Server Agent Jobs are only included in the list if they have failed within the past day, and are scheduled for a future run.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
;WITH sys_job_history AS ( SELECT * , rn = ROW_NUMBER() OVER (PARTITION BY sjh.job_id ORDER BY msdb.dbo.agent_datetime(sjh.RUN_DATE, sjh.RUN_TIME) DESC) FROM msdb.dbo.sysjobhistory sjh WHERE sjh.step_id > 0 AND sjh.run_date > CONVERT(int, CONVERT(varchar(8), DATEADD(DAY, -1, GETDATE()), 112)) ) SELECT sj.name , sjh.step_name , sjh.step_id , sjh.sql_message_id , sjh.sql_severity , sjh.message , sjh.server , RunDateTime = msdb.dbo.agent_datetime(sjh.run_date, sjh.run_time) , StartJobCommand = N'EXEC msdb.dbo.sp_start_job @job_name = ''' + sj.name + N''', @step_name = ''' + sjh.step_name + N''';' FROM sys_job_history sjh INNER JOIN msdb.dbo.sysjobs sj ON sjh.job_id = sj.job_id INNER JOIN msdb.dbo.sysjobschedules sjsch ON sj.job_id = sjsch.job_id INNER JOIN msdb.dbo.sysschedules ss ON sjsch.schedule_id = ss.schedule_id WHERE sjh.rn = 1 AND sjh.run_status = 0 --failed AND ss.enabled = 1 ORDER BY sj.name; |
The last column in the output contains the sp_start_job command you’d need to run to manually restart the job at the failed step. In the case of the backup file server being rebooted in the middle of your backups, you could simply copy-and-paste that entire column into a new query window to quickly and easily restart all failed backups. Obviously, you’ll want to validate the commands before blindly running them in case you have some jobs that shouldn’t run except during the precise window where they are scheduled.
As an aside, a nice mitigation for failed SQL Server Agent Jobs that fail frequently is to add an automatic retry to the job step, as shown in this snippet:
I hope this post helps ease your SQL Server Agent management pain!
This script is part of our ongoing series on SQL Server Agent.