Recently, in one of our projects, we experienced some strange errors from our Delayed::Job workers. The workers started successfully, but as soon as they began locking jobs they failed with PG::Error: no connection to server or PG::Error: FATAL: invalid frontend message type 60 errors.
We started isolating the problem by digging through the recent changes we had made to the project. Since the last release, the only significant modification had been to internationalization: we had started using I18n-active_record.
For Delayed Job workers, our locales initializer had an extra check.
After some serious digging through both the Delayed::Job source code and the way we were setting up its configuration, we started noticing some issues.
The first thing we found was that the problem did not occur when the workers were started with the rake jobs:work task.
After looking at the DelayedJob internals, we found that the main difference between the rake task and the binstub was the fork performed by the binstub. The rake task simply starts a worker in the current process, whereas the binstub runs the worker through Delayed::Command#run_process, which hands it to Daemons.run_proc, so the worker goes through a slightly different lifecycle.
Let’s take a look at the DelayedJob internals before proceeding. DelayedJob has a system of hooks that can be used both by plugin writers and in our own applications.
All of this event functionality lives in the Delayed::Lifecycle class. Each worker process has its own instance of that class.
So, which events exactly do we have here?
You can set up callbacks to run before, after, or around these events using the lifecycle's before, after, and around methods.
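To make the before/after/around semantics concrete, here is a small, self-contained Ruby model. It is a hypothetical simplification: MiniLifecycle and its two-event list are our own names, not Delayed::Lifecycle itself, which supports more events. As far as we can tell from the DelayedJob source, the ordering is the same: before callbacks run first, around callbacks wrap the action with the first registered one outermost, and after callbacks run last.

```ruby
# A simplified, hypothetical model of a before/after/around callback registry,
# loosely shaped like Delayed::Lifecycle. It is NOT the delayed_job source.
class MiniLifecycle
  # Only a subset of event names is modelled here.
  EVENTS = [:execute, :perform].freeze

  def initialize
    @callbacks = {}
    EVENTS.each { |event| @callbacks[event] = { before: [], after: [], around: [] } }
  end

  def before(event, &block)
    add(:before, event, block)
  end

  def after(event, &block)
    add(:after, event, block)
  end

  def around(event, &block)
    add(:around, event, block)
  end

  # Runs the before callbacks, then the action wrapped in the around callbacks
  # (first registered = outermost), then the after callbacks.
  def run_callbacks(event, *args, &action)
    raise ArgumentError, "a block is required" unless action

    hooks = @callbacks.fetch(event) { raise ArgumentError, "unknown event: #{event}" }
    hooks[:before].each { |callback| callback.call(*args) }
    chain = hooks[:around].reverse.inject(action) do |inner, callback|
      proc { |*call_args| callback.call(*call_args, &inner) }
    end
    result = chain.call(*args)
    hooks[:after].each { |callback| callback.call(*args) }
    result
  end

  private

  def add(type, event, block)
    raise ArgumentError, "unknown event: #{event}" unless @callbacks.key?(event)
    raise ArgumentError, "a block is required" unless block

    @callbacks[event][type] << block
  end
end

log = []
lifecycle = MiniLifecycle.new
lifecycle.before(:execute) { |worker| log << "before #{worker}" }
lifecycle.around(:execute) do |worker, &block|
  log << "around-start"
  block.call(worker)
  log << "around-end"
end
lifecycle.after(:execute) { |worker| log << "after #{worker}" }

lifecycle.run_callbacks(:execute, "worker-1") { |worker| log << "work #{worker}" }
p log
# prints ["before worker-1", "around-start", "work worker-1", "around-end", "after worker-1"]
```

The around callback receives the wrapped action as a block and decides when to call it, which is what makes it suitable for things like timing or wrapping work in a transaction.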
Let’s move on to our problem. It turned out that the delayed_job_active_record gem closes all database connections in the before_fork hook and re-establishes them in the after_fork hook.
It was clear that I18n-active_record did not play well with this, which caused the issue at hand.
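Why close connections before forking at all? A database connection is ultimately a socket, and a socket opened before fork is shared by the parent and the child. The plain-Ruby sketch below involves no PostgreSQL or Delayed::Job and needs a platform with Process.fork, so not Windows. Both processes write to one inherited UNIX socket, and the other end receives messages from two different PIDs on a single stream. With PostgreSQL's wire protocol, two processes interleaving messages on one connection is a well-known cause of errors like invalid frontend message type. That is why the backend drops its connections before forking and reconnects afterwards.

```ruby
require 'socket'

# One end stands in for "the database server", the other for a client connection.
server_end, client_end = UNIXSocket.pair

# The client connection is opened BEFORE forking, so the child inherits it.
child_pid = Process.fork do
  server_end.close
  client_end.write("hello from pid #{Process.pid}\n")
  client_end.close
  exit!(0) # leave without running at_exit handlers inherited from the parent
end

# The parent keeps using the very same connection.
client_end.write("hello from pid #{Process.pid}\n")
client_end.close
Process.wait(child_pid)

# Once every copy of client_end is closed, the "server" reads everything until EOF.
lines = server_end.read.lines.map(&:chomp)
server_end.close

senders = lines.map { |line| line[/\d+\z/].to_i }.uniq
puts "one connection, #{senders.size} different sender processes: #{senders.inspect}"
```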
We looked through the DelayedJob lifecycle and chose the before :execute hook, which runs after the ActiveRecord backend has finished all of its connection handling.
Finally, we changed the locales initializer for delayed_job workers to do its I18n setup in that hook.
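Here is a minimal sketch of what such an initializer could look like. It assumes the standard I18n::Backend::ActiveRecord chain setup from the i18n-active_record README. The post does not show the original "extra check", so the binstub check here is a hypothetical stand-in, and LocaleSetup and LocaleSetupPlugin are illustrative names. The hook is registered through a Delayed::Plugin rather than by calling Delayed::Worker.lifecycle.before directly. As far as we can tell from the delayed_job source, Delayed::Worker.new rebuilds the lifecycle and re-applies only the registered plugins, so a callback added straight to the lifecycle from an initializer may be discarded.

```ruby
# config/initializers/locale.rb (hypothetical sketch, not the project's original file)
require 'i18n/backend/active_record'

# Hypothetical helper module: puts the database-backed backend in front of the current one.
module LocaleSetup
  def self.install_backend
    I18n.backend = I18n::Backend::Chain.new(I18n::Backend::ActiveRecord.new, I18n.backend)
  end
end

# Hypothetical check standing in for the original "extra check":
# are we being loaded by the bin/delayed_job binstub?
if File.basename($PROGRAM_NAME) == 'delayed_job'
  # Hypothetical plugin. Its before(:execute) callback runs inside the forked worker,
  # after Delayed::Worker.after_fork has re-established the database connections.
  class LocaleSetupPlugin < Delayed::Plugin
    callbacks do |lifecycle|
      lifecycle.before(:execute) { |_worker| LocaleSetup.install_backend }
    end
  end

  Delayed::Worker.plugins << LocaleSetupPlugin
else
  # Web processes, consoles and rake jobs:work set the backend up immediately.
  LocaleSetup.install_backend
end
```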
This helped us to mitigate the connection errors, and connections stopped dying abruptly.