Unreported SQL Server Agent Job Failures
SQL Server Agent provides an easy-to-configure framework for notifying DBAs when jobs fail to run for any reason. Unfortunately, there are circumstances where job failures may not result in SQL Server sending a notification. For instance, the job may have been misconfigured, or perhaps the email server was down when the job failed. Either way, you think everything is fine, but that critical backup job hasn’t ran in 3 days and you’re about to need a résumé update.
Instead of thinking everything is fine, run these queries against your SQL Server instances on a regular basis to ensure you’re not missing notifications for jobs that have failed, or have been misconfigured.
This quick query shows jobs that haven’t been configured to notify the operator via email. You’ll want to look at these jobs and ensure they have been configured properly.
1 2 3 4 5 6 7 8 9 10 11 |
SELECT [ServerName] = @@SERVERNAME , [JobName] = [sj].[name] FROM [msdb].[dbo].[sysjobs] [sj] WHERE [sj].[notify_level_email] = 0 OR [sj].[notify_email_operator_id] = 0 ORDER BY [sj].[name]; |
This query shows jobs that have failed over the past 30 days, where no notification was sent, even though the job is configured to notify the operator via email.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
DECLARE @cutoff_date int = CONVERT(int, CONVERT(varchar(8), GETDATE(), 112)) - 30; SELECT [ServerName] = [jh].[server] , [JobName] = [j].[name] , [JobStep] = [js].[step_id] , [JobStepName] = [js].[step_name] , [SQLSeverity] = [jh].[sql_severity] , [JobCommand] = [js].[command] , [MostRecentFailure] = MAX(CONVERT(nvarchar(30), [msdb].[dbo].[agent_datetime]([jh].[run_date], [jh].[run_time]), 120)) , [NumberOfErrors] = COUNT_BIG(1) FROM [msdb].[dbo].[sysjobhistory] [jh] INNER JOIN [msdb].[dbo].[sysjobs] [j] ON [jh].[job_id] = [j].[job_id] INNER JOIN [msdb].[dbo].[sysjobsteps] [js] ON [j].[job_id] = [js].[job_id] AND [jh].[step_id] = [js].[step_id] WHERE [jh].[run_date] > @cutoff_date AND [jh].[run_status] = 0 /* Failed */ AND [jh].[step_id] = 0 /* (Job outcome) */ AND ( [jh].[operator_id_emailed] = 0 AND [j].[notify_level_email] > 0 AND [j].[notify_email_operator_id] > 0 ) GROUP BY [jh].[server] , [j].[name] , [js].[step_id] , [js].[step_name] , [jh].[sql_severity] , [js].[command] ORDER BY [jh].[server] , [j].[name]; |
I’ve seen a surprising number of unreported job failures across a variety of SQL Server versions, that can’t be reliably proven to be a result of the email server being unavailable, so I’m not certain what is causing properly-configured jobs to fail without an email being sent.
In case you haven’t seen it, Microsoft has great documentation for setting up SQL Server Agent Job Notifications.
This post is part of our series on SQL Server Maintenance
[…] that ensures your recovery point objective can be met. So you schedule log backups to occur, setup email notifications so you can respond when those log backups fail (and they invariably will at some point!), then go […]