Sidekiq is a background job processing library for Ruby. Sidekiq comes in three versions: OSS, Pro and Enterprise.
OSS is free and open source and has the basic features. The Pro and Enterprise versions are closed source and paid, and thus come with more advanced features. To compare the features offered by each of these versions, please visit the Sidekiq website.
In this post, we will discuss the benefits of using the super_fetch strategy, introduced in Sidekiq Pro 3.4.0, to reliably fetch jobs from the queue in Redis. The open source version of Sidekiq comes with the basic_fetch strategy.
Let’s see an example to understand how it works. Let’s add Sidekiq to our Gemfile and run bundle install to install it.
Add the following Sidekiq worker. This worker does nothing great; it just sleeps for 30 seconds.
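A minimal sketch of such a worker (the file path and log lines are illustrative; the name argument matches the job arguments used later in this post):

```ruby
# app/workers/sleep_worker.rb
class SleepWorker
  include Sidekiq::Worker

  def perform(name)
    logger.info "Started job #{name}"
    sleep 30 # simulate a long-running job
    logger.info "Finished job #{name}"
  end
end
```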
Let’s open Rails console and schedule this worker to run as a background job asynchronously.
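Scheduling the worker and inspecting the queue might look like this (the "A" argument is illustrative):

```ruby
require "sidekiq/api" # Sidekiq's introspection API

SleepWorker.perform_async("A")

# Number of jobs waiting in the default queue.
Sidekiq::Queue.new.size # => 1
```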
As we can see, the queue now has 1 job scheduled to be processed.
Let’s start Sidekiq in another terminal tab.
As we can see, the job was picked up by Sidekiq, which started processing it.
If we check the Sidekiq queue size in the Rails console, it will be zero now.
Let’s shut down the Sidekiq process gracefully while it is still in the middle of processing our scheduled job, either by hitting Ctrl-C or by running the kill -SIGINT <PID> command.
As we can see, Sidekiq pushed the unfinished job back to the queue in Redis when it received the SIGINT signal. Let’s verify it.
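One way to verify is to check the queue size again from the Rails console (a quick sketch using Sidekiq's API):

```ruby
require "sidekiq/api"

# The unfinished job was pushed back, so it sits in the queue again.
Sidekiq::Queue.new.size # => 1
```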
Before we move on, let’s learn some basics about signals such as SIGINT, SIGTERM, TSTP and SIGKILL.
A crash course on POSIX signals
SIGINT is an interrupt signal. It is an alternative to hitting Ctrl-C from the keyboard. When a process is running in the foreground, we can hit Ctrl-C to signal the process to shut down. When the process is running in the background, we can use the kill command to send a SIGINT signal to the process’ PID. A process can optionally catch this signal and shut itself down gracefully. If the process does not respect this signal and ignores it, then nothing really happens and the process keeps running. INT and SIGINT are identical signals.
Another useful signal is SIGTERM. It is called a termination signal. A process can either catch it and perform the necessary cleanup, or just ignore it. Similar to a SIGINT, if a process ignores this signal, the process keeps running. Note that if no signal is supplied to the kill command, SIGTERM is used by default. TERM and SIGTERM are identical signals.
TSTP is called a terminal stop signal. It is an alternative to hitting Ctrl-Z on the keyboard. This signal causes a process to suspend further execution.
SIGKILL is known as the kill signal. This signal is intended to kill the process immediately and forcefully. A process cannot catch this signal, therefore the process cannot perform cleanup or a graceful shutdown. This signal is used when a process does not respect and respond to the other termination signals. SIGKILL and 9 are identical signals.
There are a lot of other signals besides these, but they are not relevant for this post. Please check them out here.
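To see these rules in action, here is a small, hypothetical Ruby script that traps INT and TERM; note that Ruby will not let us trap KILL at all:

```ruby
# signal_demo.rb - a minimal sketch of signal handling
Signal.trap("INT")  { puts "Received SIGINT, shutting down gracefully"; exit 0 }
Signal.trap("TERM") { puts "Received SIGTERM, cleaning up"; exit 0 }

# Trapping SIGKILL raises an ArgumentError; it cannot be caught:
# Signal.trap("KILL") { ... }

puts "Running with PID #{Process.pid}. Try: kill -SIGINT #{Process.pid}"
sleep # block until a signal arrives
```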
A Sidekiq process pays respect to all of these signals and behaves as we expect. When Sidekiq receives a SIGINT or a SIGTERM signal, it terminates itself gracefully.
Back to our example
Coming back to our example from above, we had sent a SIGINT signal to the Sidekiq process. On receiving this signal, the Sidekiq process having PID 40510 terminated quiet workers, paused the queue, and waited for a while to let busy workers finish their jobs. Since our busy SleepWorker did not finish quickly, Sidekiq terminated that busy worker and pushed its unfinished job back to the queue in Redis. After that, Sidekiq gracefully terminated itself with an exit code 0.
Note that the default timeout is 8 seconds, for which Sidekiq waits to let the busy workers finish; otherwise it pushes the unfinished jobs back to the queue in Redis. This timeout can be changed with the -t option given at the startup of the Sidekiq process.
It is recommended to send a TSTP and then a TERM signal to ensure that the Sidekiq process shuts down safely and gracefully.
On receiving a TSTP signal, Sidekiq stops pulling new work and finishes the work which is in progress. The idea is to first send a TSTP signal, wait as long as possible (by default for 8 seconds, as discussed above) to ensure that busy workers finish their jobs, and then send a TERM signal to shut down the process.
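In Ruby, that shutdown sequence might be sketched like this (pid is assumed to hold the Sidekiq process ID, and the 10-second pause is an arbitrary example):

```ruby
Process.kill("TSTP", pid) # tell Sidekiq to stop fetching new jobs
sleep 10                  # give busy workers time to finish
Process.kill("TERM", pid) # trigger a graceful shutdown
```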
Sidekiq pushes the unprocessed jobs back to Redis when terminated gracefully. This means that Sidekiq fetches the unfinished job and starts processing it again when we restart the Sidekiq process.
We can see that Sidekiq pulled the previously terminated job 5d8bf898c36a60a1096cf4d3 and processed that job again. So far so good.
This behavior is implemented using the basic_fetch strategy, which is present in the open source version of Sidekiq.
Sidekiq uses the BRPOP Redis command to fetch a scheduled job from the queue. When a job is fetched, that job gets removed from the queue and no longer exists in Redis.
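To make that concrete, here is roughly what such a fetch looks like using the redis-rb gem (the queue name and payload are illustrative; this is not Sidekiq's actual code):

```ruby
require "redis"

redis = Redis.new

redis.lpush("queue:default", "job-payload")

# BRPOP blocks until an element is available, then atomically
# removes and returns it. The job now lives only in this
# process's memory; Redis no longer knows about it.
queue, job = redis.brpop("queue:default")
```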
If this fetched job is processed, then all is good.
Also, if the Sidekiq process is terminated gracefully on receiving either a SIGINT or a SIGTERM signal, Sidekiq will push the unfinished jobs back to the queue in Redis.
But what if the Sidekiq process crashes in the middle of processing that fetched job? A process is considered crashed if it does not shut down gracefully. As we discussed before, when we send a SIGKILL signal to a process, the process cannot receive or catch this signal. Because the process cannot shut down gracefully, it crashes. When a Sidekiq process crashes, the jobs fetched by that process which are not yet finished get lost forever.
Let’s try to reproduce this scenario.
We will schedule another job.
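For example, in the Rails console (the "B" argument matches the job referenced below):

```ruby
SleepWorker.perform_async("B")
```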
Now, let’s start the Sidekiq process and kill it using the kill -SIGKILL <PID> command.
Let’s check if Sidekiq pushed the busy (unprocessed) job back to the queue in Redis before terminating. No, it did not. The Sidekiq process did not get a chance to shut down gracefully when it received the SIGKILL signal. If we restart the Sidekiq process, it cannot fetch that unprocessed job since the job was not pushed back to the queue in Redis at all.
Therefore, the job having the name argument B is completely lost. There is no way to get that job back. Losing jobs like this may not be a problem for some applications, but for critical applications it could be a huge issue.
We faced a similar problem. One of our clients’ applications is deployed on a Kubernetes cluster. Our Sidekiq process runs in a Docker container inside a Kubernetes pod, which we call the background pod.
Here’s our stripped down version of Kubernetes deployment manifest which creates a Kubernetes deployment resource. Our Sidekiq process runs in the pods spawned by that deployment resource.
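A stripped-down sketch of what such a manifest might look like is below (resource names, image and pidfile handling are all illustrative; only the lifecycle hook and grace period settings matter for this discussion):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background
spec:
  replicas: 1
  selector:
    matchLabels:
      app: background
  template:
    metadata:
      labels:
        app: background
    spec:
      # Time Kubernetes waits before sending SIGKILL to the container.
      terminationGracePeriodSeconds: 80
      containers:
        - name: background
          image: example/app:latest
          command: ["bundle", "exec", "sidekiq"]
          lifecycle:
            preStop:
              exec:
                # $pid stands for the Sidekiq process' pidfile,
                # as discussed below.
                command: ["/bin/sh", "-c", "sidekiqctl stop $pid 60"]
```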
When we apply an updated version of this manifest, say for changing the Docker image, the running pods are terminated and new pods are created.
Before terminating the only container in the pod, Kubernetes executes the sidekiqctl stop $pid 60 command which we have defined using the preStop event handler. Note that Kubernetes sends a SIGTERM signal to the container being terminated inside the pod only after the preStop event handler finishes.
The default termination grace period is 30 seconds and it is configurable.
If the container doesn’t terminate within the termination grace period, a SIGKILL signal will be sent to forcefully terminate the container.
The sidekiqctl stop $pid 60 command executed in the preStop handler does the following:
- Sends a SIGTERM signal to the Sidekiq process running in the container.
- Waits for 60 seconds.
- Sends a SIGKILL signal to kill the Sidekiq process forcefully if the process has not terminated gracefully yet.
This worked for us when the count of busy jobs was relatively small. When the number of jobs being processed was higher, Sidekiq did not get enough time to quiet the busy workers and failed to push some of them back to the Redis queue. We found that some of the jobs were getting lost whenever the background pod restarted. We had to restart our background pod for reasons such as updating the Kubernetes deployment manifest, or the pod being automatically evicted by Kubernetes due to the host node encountering an OOM (out of memory) issue, etc.
We tried increasing both terminationGracePeriodSeconds in the deployment manifest as well as the sidekiqctl stop command’s timeout. But we still kept facing the same issue of losing jobs whenever a pod restarted.
We even tried sending a TSTP and then a TERM after a timeout relatively longer than 60 seconds. But the pod was still getting harshly terminated without gracefully shutting down the Sidekiq process running inside it. Therefore we kept losing the busy jobs which were running during the pod termination.
Sidekiq Pro’s super_fetch
We were looking for a way to stop losing our Sidekiq jobs, or a way to recover them reliably, when our background Kubernetes pod restarts. We realized that the commercial version of Sidekiq, Sidekiq Pro, offers an additional fetch strategy, super_fetch, which seemed more efficient and reliable than basic_fetch. Let’s see what difference super_fetch makes.
We will need to use the sidekiq-pro gem, which needs to be purchased. Since the Sidekiq Pro gem is closed source, we cannot fetch it from the default public gem registry. Instead, we will have to fetch it from a private gem registry, access to which we get after purchasing it.
We add the following code to our Gemfile and run bundle install. Then, we need to add the following code in an initializer.
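A sketch of both pieces, assuming the private registry credentials received on purchase (the source URL shown is a placeholder):

```ruby
# Gemfile
source "https://<username>:<password>@gems.contribsys.com/" do
  gem "sidekiq-pro"
end
```

```ruby
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # Enable Sidekiq Pro's reliable fetch strategy.
  config.super_fetch!
end
```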
Well, that’s it. Sidekiq will now use super_fetch instead of basic_fetch. When super_fetch is activated, the Sidekiq process’ graceful shutdown behavior is similar to that of basic_fetch.
That looks good. As we can see, Sidekiq moved the busy job back from a private queue to the queue in Redis when it received the shutdown signal.
Now, let’s try to kill the Sidekiq process forcefully, without allowing a graceful shutdown, by sending a SIGKILL signal.
Since Sidekiq was gracefully shut down before, if we restart Sidekiq again, it will start re-processing the pushed back job.
It appears that Sidekiq didn’t get any chance to push the busy job back to the queue in Redis on receiving the SIGKILL signal. So, where is the magic of super_fetch? Did we lose our job again? Let’s restart Sidekiq and see for ourselves.
Whoa, isn’t that cool? See the line where it says SuperFetch: recovered 1 jobs. Although the job wasn’t pushed back to the queue in Redis, Sidekiq somehow recovered our lost job and processed it again! Interested to learn how Sidekiq did that? Keep on reading.
Note that since Sidekiq Pro is closed source, commercial software, we cannot explain super_fetch’s exact implementation details.
As we discussed in depth before, the basic_fetch strategy uses the BRPOP Redis command to fetch a job from the queue in Redis. It works great to some extent, but it is prone to losing jobs if Sidekiq crashes or does not shut down gracefully.
On the other hand, Sidekiq Pro offers the super_fetch strategy, which uses the RPOPLPUSH Redis command to fetch a job. The RPOPLPUSH Redis command provides a unique approach towards implementing a reliable queue. It accepts two lists, namely a source list and a destination list. The command atomically returns and removes the last element from the source list, and pushes that element as the first element of the destination list. Atomically means that both the pop and the push operations are performed as a single operation at the same time; i.e. either both succeed, or both are treated as failed.
super_fetch registers a private queue in Redis for each Sidekiq process on start-up. It atomically fetches a scheduled job from the public queue in Redis and pushes that job into the private queue (or working queue) using the RPOPLPUSH Redis command. Once the job is finished processing, Sidekiq removes that job from the private queue.
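A rough sketch of this pattern with the redis-rb gem (queue names and the acknowledgement step are illustrative; Sidekiq Pro's real implementation is closed source):

```ruby
require "redis"

redis = Redis.new

redis.lpush("queue:default", "job-payload")

# Atomically move the job from the public queue to this
# process's private working queue.
job = redis.rpoplpush("queue:default", "queue:default_private")

# ... process the job here ...

# Acknowledge: remove the finished job from the private queue.
# If the process had crashed before this point, the job would
# still be sitting in the private queue, recoverable later.
redis.lrem("queue:default_private", -1, job)
```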
During a graceful shutdown, Sidekiq moves the unfinished jobs back from the private queue to the public queue. If the shutdown of a Sidekiq process is not graceful, the unfinished jobs of that process remain in the private queue; these are called orphaned jobs.
On restarting, or on starting another Sidekiq process, super_fetch looks for such orphaned jobs in the private queues. If Sidekiq finds orphaned jobs, it re-enqueues and processes them again. It may happen that we have multiple Sidekiq processes running at the same time; if one process among them dies, its unfinished jobs become orphans.
This Sidekiq wiki describes in detail the criteria which super_fetch relies upon for identifying which jobs are orphaned and which are not. If we don’t restart or start another process, super_fetch may take anywhere from 5 minutes to 3 hours to recover such orphaned jobs. The recommended approach is to restart or start another Sidekiq process to trigger super_fetch to look for orphans.
Interestingly, in the older versions of Sidekiq Pro, super_fetch performed its checks for orphaned jobs and queues only every 24 hours after the Sidekiq process startup. Due to this, when the Sidekiq process crashed, the orphaned jobs of that process remained unpicked for up to 24 hours until the next restart. This orphan check delay window was later lowered to 1 hour in Sidekiq Pro 3.4.1.
Another fun thing to know is that there existed two other fetch strategies, namely reliable_fetch and timed_fetch, in the older versions of Sidekiq Pro. reliable_fetch did not work with Docker, and timed_fetch had a higher asymptotic computational complexity than super_fetch. Both of these strategies were deprecated in Sidekiq Pro 3.4.0 in favor of super_fetch. Later, both of them were removed in Sidekiq Pro 4.0 and are no longer documented anywhere.
We have enabled super_fetch in our application and it seems to be working without any major issues so far. Our background pods do not seem to be losing any jobs when they are restarted.
Update: Mike Perham, the author of Sidekiq, posted the following comment.
Faktory provides all of the beanstalkd functionality, including the same reliability, with a nicer Web UI. It’s free and OSS. https://github.com/contribsys/faktory http://contribsys.com/faktory/