Saturday, January 9, 2021

One way to fix the “Failed to run web job” error on Azure App Service

If you have just updated a web job (see the related articles Upload WebJob to Azure App service with predefined schedule and How to remove Azure web job via Azure RM PowerShell), you may face the following issue: when you try to run the updated web job, you get the error “Failed to run web job”. The error notification doesn’t contain any description, so it is hard to find the actual cause of the problem. In this article I will describe one of the possible causes.

When a web job starts, it creates a special lock file, triggeredJob.lock, in the /data/jobs/triggered/{webJobName} folder of the SCM site. You can check this by connecting to the site via FTP.

The same folder contains subfolders named with timestamps, one for each web job run. In these subfolders you will find the output logs of each run – the same logs that are shown in Azure portal > App service > Web jobs > Logs. The purpose of the triggeredJob.lock file is to prevent a second instance of the web job from launching while the previous instance hasn’t finished yet. The problem is that if the web job was running during the update, this lock file may not be deleted successfully. As a result, when you try to run the updated version of the web job, you get the “Failed to run web job” error.
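The folder layout described above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the job folder is reachable as a local path (for example a copy downloaded via FTP), and the function name inspect_job_folder is hypothetical, not part of any Azure tooling.

```python
from pathlib import Path

# Name of the lock file the web job creates in its folder.
LOCK_FILE_NAME = "triggeredJob.lock"


def inspect_job_folder(job_dir):
    """Return (lock_present, run_folders) for a triggered web job folder.

    job_dir is assumed to point at a local copy of
    /data/jobs/triggered/{webJobName}.
    """
    job_dir = Path(job_dir)
    # True if the lock file is present in the job folder.
    lock_present = (job_dir / LOCK_FILE_NAME).is_file()
    # Every subfolder is treated as one web job run.
    run_folders = sorted(p.name for p in job_dir.iterdir() if p.is_dir())
    return lock_present, run_folders
```

If lock_present is True while no run of the web job is in progress, the lock file is most likely a leftover from an interrupted run.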

The solution is to delete this file manually. However, that is not straightforward either. If you try to remove it from an FTP client, you will get the error “The process cannot access the file because it is being used by another process”.

In order to delete it we need to stop both the App Service and the SCM site. Note that it is mandatory to stop both sites – if you only stop the App Service from Azure portal, triggeredJob.lock will still be in use by the process. Stopping the SCM site is trickier than stopping the App Service. The process is described here: Full stopping a Web App. You need to go to Resource Explorer (azure.com) and select your App service. After that, in the JSON view on the right side, click Edit and do 2 things:

  1. Change “state” from “Running” to “Stopped” – this is the same as stopping the App service from Azure portal
  2. Find the “scmSiteAlsoStopped” property and change it from “false” to “true” – this will stop the SCM site

After that, click the PUT button at the top. This stops both sites, and you will now be able to delete triggeredJob.lock. Then go to Azure portal and start the App service – this starts both sites. After all these steps you should be able to run the web job again.
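The two edits made in the JSON view can also be expressed as a small Python function, which may help if you want to prepare the modified JSON before pasting it into Resource Explorer. This is a sketch under assumptions: it assumes both settings sit under the “properties” key of the site JSON, and the function name with_full_stop is hypothetical. It only transforms a dictionary and doesn’t call Azure.

```python
import copy


def with_full_stop(site):
    """Return a copy of the site JSON with both sites marked as stopped.

    Assumption: "state" and "scmSiteAlsoStopped" are located under the
    "properties" key of the JSON shown in Resource Explorer.
    """
    updated = copy.deepcopy(site)  # leave the caller's dictionary untouched
    props = updated.setdefault("properties", {})
    props["state"] = "Stopped"          # step 1: stop the App service
    props["scmSiteAlsoStopped"] = True  # step 2: stop the SCM site as well
    return updated
```

The function returns a modified copy, so the original JSON stays available for comparison.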