2021-04-29

Hangfire Recurring Jobs Break

I have a Hangfire setup in a .NET Core app with several recurring jobs, each set to run every 15 minutes. When scheduling is working correctly, the Set table shows each job's next run time in Unix epoch format.
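Those epoch values are easier to check once they are converted to UTC dates. Below is a minimal C# sketch. It assumes each value is a whole number of seconds since 1970-01-01T00:00:00Z. `EpochHelper` is a hypothetical name, and the class uses only the standard `DateTimeOffset` API, not anything from Hangfire.

```csharp
using System;

// Hypothetical helper for reading the epoch-style "next run" values seen in
// the Set table. ASSUMPTION: values are whole seconds since the Unix epoch (UTC).
public static class EpochHelper
{
    // Converts Unix seconds to a DateTimeOffset with a zero (UTC) offset.
    public static DateTimeOffset FromEpochSeconds(long seconds) =>
        DateTimeOffset.FromUnixTimeSeconds(seconds);

    // Converts a DateTimeOffset back to Unix seconds; any sub-second part is dropped.
    public static long ToEpochSeconds(DateTimeOffset value) =>
        value.ToUnixTimeSeconds();
}
```

For example, `EpochHelper.FromEpochSeconds(1619654400)` is 2021-04-29 00:00:00 UTC. With a healthy 15-minute schedule, the value should keep moving forward in roughly 900-second steps.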

However, seemingly at random, the scheduling dies and the Set table entries no longer look like that.

Following @Satpal's suggestion, I connected the Hangfire Dashboard, which has shed more light on the situation. It shows the following exception:

System.TypeLoadException: Could not load type 'X.API.Controllers.YController' from assembly 'X.API, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
   at System.Reflection.RuntimeAssembly.GetType(QCallAssembly assembly, String name, Boolean throwOnError, Boolean ignoreCase, ObjectHandleOnStack type, ObjectHandleOnStack keepAlive, ObjectHandleOnStack assemblyLoadContext)
   at System.Reflection.RuntimeAssembly.GetType(String name, Boolean throwOnError, Boolean ignoreCase)
   at Hangfire.Common.TypeHelper.TypeResolver(Assembly assembly, String typeName, Boolean ignoreCase)
   at System.TypeNameParser.ResolveType(Assembly assembly, String[] names, Func`4 typeResolver, Boolean throwOnError, Boolean ignoreCase, StackCrawlMark& stackMark)
   at System.TypeNameParser.ConstructType(Func`2 assemblyResolver, Func`4 typeResolver, Boolean throwOnError, Boolean ignoreCase, StackCrawlMark& stackMark)
   at System.TypeNameParser.GetType(String typeName, Func`2 assemblyResolver, Func`4 typeResolver, Boolean throwOnError, Boolean ignoreCase, StackCrawlMark& stackMark)
   at System.Type.GetType(String typeName, Func`2 assemblyResolver, Func`4 typeResolver, Boolean throwOnError)
   at Hangfire.Common.TypeHelper.DefaultTypeResolver(String typeName)
   at Hangfire.Storage.InvocationData.DeserializeJob()

I know that the method called by the recurring job is currently available and working, because manually rescheduling one of the failed jobs worked as expected. My suspicion is that the methods are unavailable while updates are being deployed: the window in which the app service restarts may be long enough for the jobs to fail and stop running. I believe this is supported by a section of Hangfire's documentation, which notes that:

background processing will be stopped after 10 retry attempts with increasing delay modifier
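Here is a sketch that makes "10 retry attempts with increasing delay" concrete by computing a ten-attempt schedule. The delay formula (15 seconds plus attempt^4 seconds) and the `RetrySchedule` class are hypothetical and chosen for illustration. They are not taken from Hangfire's source, and Hangfire's actual delays may differ.

```csharp
using System;
using System.Collections.Generic;

// Illustrative retry schedule with a polynomial ("increasing") delay.
// ASSUMPTION: delay(attempt) = BaseSeconds + attempt^4 seconds. This formula is
// hypothetical and not claimed to be Hangfire's implementation.
public static class RetrySchedule
{
    public const int MaxAttempts = 10; // mirrors the "10 retry attempts" in the quoted note
    public const int BaseSeconds = 15; // hypothetical minimum delay

    // Delay before the given retry attempt (1-based).
    public static TimeSpan DelayFor(int attempt)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        double seconds = BaseSeconds + Math.Pow(attempt, 4);
        return TimeSpan.FromSeconds(seconds);
    }

    // Delays for attempts 1..MaxAttempts, in order.
    public static IReadOnlyList<TimeSpan> Delays()
    {
        var delays = new List<TimeSpan>(MaxAttempts);
        for (int attempt = 1; attempt <= MaxAttempts; attempt++)
            delays.Add(DelayFor(attempt));
        return delays;
    }

    // Sum of all delays: time from the first failure to the last retry.
    public static TimeSpan TotalWindow()
    {
        var total = TimeSpan.Zero;
        foreach (var delay in Delays()) total += delay;
        return total;
    }
}
```

Under this assumed formula, the delays grow from 16 seconds before the first retry to 10,015 seconds before the tenth. `TotalWindow()` returns 25,483 seconds, a little over seven hours. One way to test the deployment theory is to compare a window like this with how long a restart actually takes, keeping in mind that Hangfire's real schedule may differ.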

I have two questions about this:

  1. What is the best way to be notified of a job failure? A quick Google search suggests adding logging to Hangfire and using an error log event to trigger an email. Is there a better way?
  2. Is there a way to automatically re-trigger recurring jobs that have stopped because of a job failure?
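For the first question, one option that does not depend on Hangfire's logging is a periodic staleness check on the next-run values in the Set table. Below is a minimal sketch with several assumptions. The value is Unix seconds. A healthy schedule keeps pushing it forward. Anything more than one interval plus a grace period in the past counts as stale. `RecurringScheduleCheck` and `IsStale` are hypothetical names, not Hangfire APIs.

```csharp
using System;

// Hypothetical health check that flags a recurring job whose recorded next run
// time (Unix seconds, as read from the Set table) is overdue. The names and the
// grace-period policy are illustrative assumptions, not part of Hangfire.
public static class RecurringScheduleCheck
{
    // Returns true when the next run time lies further in the past than one
    // full interval plus the grace period. ASSUMPTION: a working schedule keeps
    // moving its next run time forward, so it should not lag behind this much.
    public static bool IsStale(long nextRunEpochSeconds, DateTimeOffset nowUtc,
                               TimeSpan interval, TimeSpan grace)
    {
        DateTimeOffset nextRun = DateTimeOffset.FromUnixTimeSeconds(nextRunEpochSeconds);
        return nowUtc - nextRun > interval + grace;
    }
}
```

This sketch leaves out how the values are read and how the alert is sent (email or otherwise). With a 15-minute interval and a 5-minute grace period, a next run time exactly 20 minutes in the past still counts as healthy, while one an hour in the past is flagged.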

Thanks!



from Recent Questions - Stack Overflow https://ift.tt/3t0fDsC
https://ift.tt/3e08NiI
