
Faster JSON generation using PostgreSQL JSON function

There are various ways to generate JSON in Rails. There is the to_json method built into Rails. We can also use the jbuilder gem or the active_model_serializers gem to achieve the same.
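
For reference, the plain Rails approach looks roughly like this; a hedged sketch assuming a Racer model with a has_many :laps association (the actual models in the application may differ).

Racer.where(first_name: "Jack")
     .to_json(only: [:first_name, :last_name],
              include: { laps: { only: [:number, :position, :time, :flag_type] } })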

As the number of records in the database grows, Rails can take a very long time to generate a response. The bottleneck can generally be traced back to JSON generation.

Recently, in one of our applications, we faced this issue where a page was taking too long to load. The load time was critical for us as this was the most visited page on the site. The page was loading a race, its racers and their lap details.

The page was loading fine for short races, with 10-15 racers and each racer having 30-50 laps. But for endurance races, with around 50-80 racers and each racer having around 700-800 laps, we were hitting the bottleneck with load times.

After benchmarking, JSON generation at the backend was found to be the culprit.

Looking out for solutions to fix the problem, we came across PostgreSQL JSON functions.

PostgreSQL 9.2 and above have built-in support for generating JSON using the functions row_to_json and array_to_json. Let’s look into both of them in detail.

row_to_json

row_to_json returns each row as a JSON object.

select row_to_json(laps) from laps;

{"id":1,
 "number":1,
 "position":4,
 "time":"628.744",
 "flag_type":"Green"
}
.
.
.

We can use a subquery to fetch only the attributes/columns which we require.

select row_to_json(lap)
from (
  select id, number, position, time, flag_type from laps
) lap;

{"id":1,"number":1,"position":4,"time":"628.744","flag_type":"Green"}
{"id":2,"number":2,"position":4,"time":"614.424","flag_type":"Green"}
.
.
.

array_to_json

To understand the array_to_json function, we must first look into array_agg. array_agg is an aggregate function. Aggregate functions compute a single result from a set of input values; sum, min and max are some other examples of aggregate functions. array_agg concatenates all the input values into a PostgreSQL array.

select array_agg(lap)
from (
  select id, number, position, time, flag_type from laps
) lap;

{"(1,1,4,\"628.744\",\"Green\")","(2,2,4,\"614.424\",\"Green\")", ... }

To convert this PostgreSQL array into JSON, we can use the array_to_json function.

select array_to_json(array_agg(lap))
from (
  select id, number, position, time, flag_type from laps
) lap;

[{"id":1,
  "number":1,
  "position":4,
  "time":"628.744",
  "flag_type":"Green"},
  ...]

A more complex example

We can use the above two functions together to generate a custom JSON response.

select row_to_json(u)
from (
  select first_name, last_name,
    (
      select array_to_json(array_agg(b))
      from (
        select number, position, time, flag_type
        from laps
        inner join racer_laps
        on laps.id = racer_laps.lap_id
        where racer_laps.racer_id = racers.id
      ) b
    ) as laps
  from racers
  where first_name = 'Jack'
) u;

{
  "first_name": "Jack",
  "last_name": "Altenwerth",
  "laps": [
    {
      "number": 1,
      "position": 4,
      "time": "628.744",
      "flag_type": "Green"
    },
    {
      "number": 2,
      "position": 4,
      "time": "614.424",
      "flag_type": "Green"
    },
    ...
  ]
}

Using the functions in Rails

We can use the above-mentioned functions in Rails as shown here.

query = <<~EOQ
select row_to_json(u)
from (
  select first_name, last_name,
    (
      select array_to_json(array_agg(b))
      from (
        select number, position, time, flag_type
        from laps
        inner join racer_laps
        on laps.id = racer_laps.lap_id
        where racer_laps.racer_id = racers.id
      ) b
    ) as laps
  from racers
  where first_name = 'Jack'
) u;
EOQ

generated_json = ActiveRecord::Base.connection.execute(query).values;
puts generated_json

{
  "first_name": "Jack",
  "last_name": "Altenwerth",
  "laps": [
    {
      "number": 1,
      "position": 4,
      "time": "628.744",
      "flag_type": "Green"
    },
    {
      "number": 2,
      "position": 4,
      "time": "614.424",
      "flag_type": "Green"
    },
    ...
  ]
}

Although generating JSON this way is more verbose and less readable than the other ways of generating JSON in Rails, it is much more performant.
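
For completeness, the generated string can also be handed straight to the client from a controller. This is only a hedged sketch; RacersController and RACER_JSON_QUERY are assumed names and not part of the original application.

class RacersController < ApplicationController
  def show
    # RACER_JSON_QUERY is assumed to hold the row_to_json query shown above.
    json = ActiveRecord::Base.connection.select_value(RACER_JSON_QUERY)

    # The database already returns a JSON string, so render it as-is
    # instead of serializing Ruby objects again.
    render json: json
  end
end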

Observations

On the racers page, generating the JSON using the PostgreSQL functions gave us the following improvements.

For short races (10-15 racers and each racer having 30-50 laps), the average API response time decreased from 40ms to 15ms.

For endurance races (50-80 racers and each racer having around 700-800 laps), the average API response time decreased from 1200ms to 20ms.

Conclusion

Use the Rails way of generating JSON as long as you can. If performance starts to become an issue, don’t be afraid of using the features made available by the database. In this case we are trading code simplicity for performance, and sometimes this trade is worth it.

APISnapshot built using Elm and Ruby on Rails is open source

APISnapshot is built using Elm and Ruby on Rails. Today we’re happy to announce that the code is publicly available on GitHub.

We built APISnapshot for two reasons.

We wanted to work with Elm.

We wanted a tool that is easy to use and that helps us capture the response we are getting from an API in a format that is easy to share in a GitHub issue, in Slack or in an email. As a consulting company, we work with various teams around the world, and during the development phase the API is often either unstable or not doing what it should be doing.

We originally built this tool using React. The Elm compiler is quite strict and forced us to take all possibilities into consideration. This led us to notice a few bugs which were still present in the React code. In this way the Elm compiler helped us produce “correct” software by eliminating some of the bugs that we would otherwise have found later.

JSON encoding/decoding is a hard problem in Elm in general. In most cases we know the shape of the API response we are going to get.

In the case of APISnapshot we do not know the shape of the JSON response we will get. Because of that it took us a bit longer to build the application. However, this forced us to really dig deep into the JSON encoding/decoding problem in Elm and we learned a lot.

We would like to thank all the contributors to the project. Special shout out goes to Jasim for the discussion and the initial work on parsing the JSON file.

We like the combination of Elm and Ruby on Rails so much that we are building AceHelp using the same technologies. AceHelp has been open source from day one.

Generating filmstrip using puppeteer for better debugging

We are writing a lot of automation tests using Puppeteer.

Since puppeteer scripts execute so fast, certain tests fail when they should be passing. Debugging those tests can be a challenge.

Chrome DevTools comes with a filmstrip feature. In the “Performance” tab we can see screenshots of the site as they change over time.

We wanted puppeteer to generate a similar filmstrip so that we can visually identify the source of the problem.

It turns out that puppeteer makes this very easy. Here is the full code.

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 1024 });

  await page.tracing.start({path: 'trace.json', screenshots: true});

  await page.goto('https://www.bigbinary.com');
  await Promise.all([
    page.waitForNavigation(),
    page.click('#navbar > ul > li:nth-child(1) > a')
  ]);

  await page.tracing.stop();

  await browser.close();
})();

If we execute this script, it will generate a file called trace.json. This file has images embedded in it which are base64 encoded.
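
If we want the individual frames as standalone image files, they can be pulled out of the trace. Below is a rough Ruby sketch, assuming the standard Chrome trace format where screenshots appear as "Screenshot" events carrying base64-encoded image data.

require 'json'
require 'base64'

# Read the trace generated by puppeteer and collect the screenshot events.
trace = JSON.parse(File.read('trace.json'))
screenshots = trace['traceEvents'].select { |event| event['name'] == 'Screenshot' }

# Decode each screenshot and write it out as an image file.
screenshots.each_with_index do |event, index|
  image = Base64.decode64(event['args']['snapshot'])
  File.binwrite(format('screenshot-%03d.jpeg', index), image)
end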

To see the filmstrip, drag the trace.json file to the “Performance” tab in Chrome DevTools. Here is a quick video explaining this.

Ruby 2.5 added lazy proc allocation for block parameters

This blog is part of our Ruby 2.5 series.

irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb> greet_with_welcome { p 'BigBinary' }
Welcome
"BigBinary"
  => "BigBinary"

In Ruby 2.4, when we pass a block to a method which further passes it to another method, Ruby creates a new Proc object from the given block before passing this proc to the other method.

This creates unnecessary objects even when the block parameter is not accessed. It also creates a chain of Proc objects when the block parameter is passed through various methods.

Proc creation is a heavyweight operation because all of the local variables (represented by Env objects in MRI internals) need to be stored on the heap.

Ruby 2.5 introduced lazy proc allocation. Ruby 2.5 does not create a Proc object while passing a block to another method. It rather passes the block information. Only if the block is accessed somewhere does it create a Proc object from the given block.

This results in less memory allocation and faster execution.

Ruby 2.4

irb> require 'benchmark'
  => true
irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb>
irb> Benchmark.measure { 1000.times { greet_with_welcome { 'BigBinary' } } }
Welcome
Welcome
...
...
...
  => #<Benchmark::Tms:0x007fe6ab929de0 @label="", @real=0.022295999999187188, @cstime=0.0, @cutime=0.0, @stime=0.01, @utime=0.0, @total=0.01>

Ruby 2.5

irb> require 'benchmark'
  => true
irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb>
irb> Benchmark.measure { 1000.times { greet_with_welcome { 'BigBinary' } } }
Welcome
Welcome
...
...
...
  => #<Benchmark::Tms:0x00007fa4400871b8 @label="", @real=0.004612999997334555, @cstime=0.0, @cutime=0.0, @stime=0.001524000000000001, @utime=0.0030690000000000023, @total=0.004593000000000003>

As we can see there is considerable improvement in execution time in Ruby 2.5 when a block param is passed.

Here are the relevant commit and discussion.

Rails 5.2 fixes query caching in MySQL and PostgreSQL adapters

Prior to Rails 5.2, the MySQL and PostgreSQL adapters had their own select_value, select_values and select_rows (select_{value,values,rows}) methods. These improved performance by not instantiating ActiveRecord::Result.
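
For context, here is a hedged illustration of what these lower-level methods do (the table and values are made up); they hand back plain Ruby values instead of an ActiveRecord::Result.

ActiveRecord::Base.connection.select_value("SELECT COUNT(*) FROM users")
# => 42
ActiveRecord::Base.connection.select_values("SELECT id FROM users LIMIT 3")
# => [1, 2, 3]
ActiveRecord::Base.connection.select_rows("SELECT id, email FROM users LIMIT 2")
# => [[1, "a@example.com"], [2, "b@example.com"]]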

However these methods broke query caching of ActiveRecord::FinderMethods#exists? method. Let’s check the issue.

>> User.cache do
>>   2.times { User.exists?(1) }
>> end

User Exists (2.1ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

User Exists (2ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

As we can see, the query was not cached and the SQL was executed a second time.

From Rails 5.2, the MySQL and PostgreSQL adapters no longer override the select_{value,values,rows} methods, which fixes this query caching issue.

Also, the performance improvement provided by these methods was marginal and they were not a hotspot in Active Record, so this change was accepted.

Let’s check query caching of ActiveRecord::FinderMethods#exists? after the change.

>> User.cache do
>>   2.times { User.exists?(1) }
>> end

User Exists (2.1ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

CACHE User Exists (0.0ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

Now, the query has been cached as expected.

This change has been backported to Rails 5.1 as well, starting from version 5.1.2.

How to mitigate DDoS using Rack::Attack

Recently, we faced a DDoS attack in one of our clients’ projects. There were many requests from different IPs to the root and login paths, and we were running thrice the usual number of servers to keep the system alive.

We are using Cloudflare’s HTTP proxy and it was doing a great job preventing malicious requests, but we wanted to check if we can avoid the loading/captcha pages which Cloudflare uses to filter requests. We came to a conclusion that we would be able to mitigate the ongoing attack if we could throttle requests by IP.

Cloudflare has an inbuilt Rate Limiting feature to throttle requests, but it would be a little expensive in our case since Cloudflare charges by the number of good requests and it was a high traffic website. On further analysis, we found that throttling at application level would be enough in that situation and the gem Rack::Attack helped us with that.

Rack::Attack is a Rack middleware from Kickstarter. It can be configured to throttle requests based on IP or any other parameter.

To use Rack::Attack, include the gem in Gemfile.

gem "rack-attack"

After bundle install, configure the middleware in config/application.rb:

config.middleware.use Rack::Attack

Now we can create the initializer config/initializers/rack_attack.rb to configure Rack::Attack.

By default, Rack::Attack uses Rails.cache to store request information. In our case, we wanted a separate cache for Rack::Attack and it was configured as follows.

redis_client = Redis.connect(url: ENV["REDIS_URL"])
Rack::Attack.cache.store = Rack::Attack::StoreProxy::RedisStoreProxy.new(redis_client)

If the web server is behind a proxy like Cloudflare, we have to configure a method to fetch the correct remote_ip address. Otherwise, it would throttle based on the proxy’s IP address and would end up blocking a lot of legitimate requests.

class Rack::Attack
  class Request < ::Rack::Request
    def remote_ip
      # Cloudflare stores remote IP in CF_CONNECTING_IP header
      @remote_ip ||= (env['HTTP_CF_CONNECTING_IP'] ||
                      env['action_dispatch.remote_ip'] ||
                      ip).to_s
    end
  end
end

Requests can be throttled based on IP address or any other parameter. In the following example, we are setting a limit of 40 requests per minute per IP for the “/” path.

class Rack::Attack
  throttle("req/ip", :limit => 40, :period => 1.minute) do |req|
    req.remote_ip if req.path == "/"
  end
end

The downside of this configuration is that after the 1 minute period, the attacker can launch another 40 requests/IP simultaneously and it would exert pressure on the servers. This can be solved using exponential backoff.

class Rack::Attack
  # Exponential backoff for all requests to "/" path
  #
  # Allows 240 requests/IP in ~8 minutes
  #        480 requests/IP in ~1 hour
  #        960 requests/IP in ~8 hours (~2,880 requests/day)
  (3..5).each do |level|
    throttle("req/ip/#{level}",
               :limit => (30 * (2 ** level)),
               :period => (0.9 * (8 ** level)).to_i.seconds) do |req|
      req.remote_ip if req.path == "/"
    end
  end
end

If we want to turn off throttling for some IPs (e.g. health check services), then those IPs can be safelisted.

class Rack::Attack
  class Request < ::Rack::Request
    def allowed_ip?
      allowed_ips = ["127.0.0.1", "::1"]
      allowed_ips.include?(remote_ip)
    end
  end

  safelist('allow from localhost') do |req|
    req.allowed_ip?
  end
end

We can log blocked requests separately, which is helpful for analyzing the attack.

ActiveSupport::Notifications.subscribe('rack.attack') do |name, start, finish, request_id, req|
  if req.env["rack.attack.match_type"] == :throttle
    request_headers = { "CF-RAY" => req.env["HTTP_CF_RAY"],
                        "X-Amzn-Trace-Id" => req.env["HTTP_X_AMZN_TRACE_ID"] }

    Rails.logger.info "[Rack::Attack][Blocked]" <<
                      "remote_ip: \"#{req.remote_ip}\"," <<
                      "path: \"#{req.path}\", " <<
                      "headers: #{request_headers.inspect}"
  end
end

A sample initializer with these configurations can be downloaded from here.

The application will now throttle requests and respond with an HTTP 429 Too Many Requests response for the throttled requests.
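
Depending on the Rack::Attack version in use, the response sent for throttled requests can also be customized. A rough sketch, assuming the version in use exposes the throttled_response hook:

class Rack::Attack
  # Assumption: this Rack::Attack version exposes throttled_response.
  # The default already responds with HTTP 429.
  self.throttled_response = lambda do |env|
    match_data = env["rack.attack.match_data"] || {}
    headers = { "Content-Type" => "application/json",
                "Retry-After" => match_data[:period].to_s }

    [429, headers, [{ error: "Throttled, retry later" }.to_json]]
  end
end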

We now block a lot of malicious requests using Rack::Attack. Here’s a graph with % of blocked requests over a week.

Blocked requests %

EDIT: Updated the post to add more context to the situation.

Increase reliability of background job processing using super_fetch of Sidekiq Pro

Sidekiq is a background job processing library for Ruby. Sidekiq offers three versions: OSS, Pro and Enterprise.

OSS is free and open source and has basic features. The Pro and Enterprise versions are closed source and paid, and thus come with more advanced features. To compare the list of features offered by each of these versions, please visit the Sidekiq website.

Sidekiq Pro 3.4.0 introduced the super_fetch strategy to reliably fetch jobs from the queue in Redis.

In this post, we will discuss the benefits of using super_fetch strategy.

Problem

The open source version of Sidekiq comes with the basic_fetch strategy. Let’s see an example to understand how it works.

Let’s add Sidekiq to our Gemfile and run bundle install to install it.

gem 'sidekiq'

Add the following Sidekiq worker in app/workers/sleep_worker.rb.

class SleepWorker
  include Sidekiq::Worker

  def perform(name)
    puts "Started #{name}"
    sleep 30
    puts "Finished #{name}"
  end
end

This worker does nothing but sleep for 30 seconds.

Let’s open Rails console and schedule this worker to run as a background job asynchronously.

>> require "sidekiq/api"
=> true
>> Sidekiq::Queue.new.size
=> 0
>> SleepWorker.perform_async("A")
=> "5d8bf898c36a60a1096cf4d3"
>> Sidekiq::Queue.new.size
=> 1

As we can see, the queue now has 1 job scheduled to be processed.

Let’s start Sidekiq in another terminal tab.

$ bundle exec sidekiq

40510 TID-owu1swr1i INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-40510", :url=>nil}
40510 TID-owu1swr1i INFO: Starting processing, hit Ctrl-C to stop
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A

As we can see, the job with ID 5d8bf898c36a60a1096cf4d3 was picked up by Sidekiq and it started processing the job.

If we check the Sidekiq queue size in the Rails console, it will be zero now.

>> Sidekiq::Queue.new.size
=> 0

Let’s shut down the Sidekiq process gracefully while it is still in the middle of processing our scheduled job. Press Ctrl-C or run the kill -SIGINT <PID> command.

$ kill -SIGINT 40510
40510 TID-owu1swr1i INFO: Shutting down
40510 TID-owu1swr1i INFO: Terminating quiet workers
40510 TID-owu1x00rm INFO: Scheduler exiting...
40510 TID-owu1swr1i INFO: Pausing to allow workers to finish...
40510 TID-owu1swr1i WARN: Terminating 1 busy worker threads
40510 TID-owu1swr1i WARN: Work still in progress [#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="{\"class\":\"SleepWorker\",\"args\":[\"A\"],\"retry\":true,\"queue\":\"default\",\"jid\":\"5d8bf898c36a60a1096cf4d3\",\"created_at\":1525427293.956314,\"enqueued_at\":1525427293.957355}">]
40510 TID-owu1swr1i INFO: Pushed 1 jobs back to Redis
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: fail: 19.576 sec
40510 TID-owu1swr1i INFO: Bye!

As we can see, Sidekiq pushed the unfinished job back to the Redis queue when it received a SIGINT signal.

Let’s verify it.

>> Sidekiq::Queue.new.size
=> 1

Before we move on, let’s learn some basics about signals such as SIGINT.

A crash course on POSIX signals

SIGINT is an interrupt signal. It is an alternative to hitting Ctrl-C from the keyboard. When a process is running in the foreground, we can hit Ctrl-C to signal the process to shut down. When the process is running in the background, we can use the kill command to send a SIGINT signal to the process’ PID. A process can optionally catch this signal and shut itself down gracefully. If the process does not respect this signal and ignores it, then nothing really happens and the process keeps running. Both INT and SIGINT are identical signals.

Another useful signal is SIGTERM. It is called a termination signal. A process can either catch it and perform necessary cleanup or just ignore it. Similar to a SIGINT signal, if a process ignores this signal, then the process keeps running. Note that, if no signal is supplied to the kill command, SIGTERM is used by default. Both TERM and SIGTERM are identical signals.

SIGTSTP or TSTP is called terminal stop signal. It is an alternative to hitting Ctrl-Z on the keyboard. This signal causes a process to suspend further execution.

SIGKILL is known as the kill signal. This signal is intended to kill the process immediately and forcefully. A process cannot catch this signal, therefore the process cannot perform cleanup or a graceful shutdown. This signal is used when a process does not respect and respond to both the SIGINT and SIGTERM signals. KILL, SIGKILL and 9 are identical signals.

There are a lot of other signals besides these, but they are not relevant for this post. Please check them out here.

A Sidekiq process respects all of these signals and behaves as we expect. When Sidekiq receives a TERM or SIGTERM signal, it terminates itself gracefully.

Back to our example

Coming back to our example from above, we had sent a SIGINT signal to the Sidekiq process.

$ kill -SIGINT 40510

On receiving this SIGINT signal, the Sidekiq process with PID 40510 terminated the quiet workers, paused the queue and waited for a while to let the busy workers finish their jobs. Since our busy SleepWorker did not finish quickly, Sidekiq terminated that busy worker and pushed its job back to the queue in Redis. After that, Sidekiq gracefully terminated itself with exit code 0. Note that the default timeout is 8 seconds; Sidekiq waits that long for busy workers to finish before pushing the unfinished jobs back to the queue in Redis. This timeout can be changed with the -t option given at the startup of the Sidekiq process.

Sidekiq recommends sending a TSTP and a TERM together to ensure that the Sidekiq process shuts down safely and gracefully. On receiving a TSTP signal, Sidekiq stops pulling new work and finishes the work which is in progress. The idea is to first send a TSTP signal, wait as long as possible (by default 8 seconds, as discussed above) to let the busy workers finish their jobs, and then send a TERM signal to shut down the process.
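
A minimal sketch of that sequence using Ruby’s Process.kill; this is not Sidekiq’s own tooling, and the pidfile path is an assumption.

# Assumes the Sidekiq pidfile lives at tmp/pids/sidekiq.pid.
sidekiq_pid = File.read("tmp/pids/sidekiq.pid").to_i

Process.kill("TSTP", sidekiq_pid) # stop fetching new jobs
sleep 8                           # give busy workers time to finish
Process.kill("TERM", sidekiq_pid) # then shut the process down gracefully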

Sidekiq pushes the unprocessed jobs back to Redis when it is terminated gracefully. This means that when we restart the Sidekiq process, it pulls the unfinished job and starts processing it again.

$ bundle exec sidekiq

45916 TID-ovfq8ll0k INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-45916", :url=>nil}
45916 TID-ovfq8ll0k INFO: Starting processing, hit Ctrl-C to stop
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A
Finished A
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: done: 30.015 sec

We can see that Sidekiq pulled the previously terminated job with ID 5d8bf898c36a60a1096cf4d3 and processed that job again.

So far so good.

This behavior is implemented using the basic_fetch strategy which is present in the open source version of Sidekiq.

Sidekiq uses the BRPOP Redis command to fetch a scheduled job from the queue. When a job is fetched, it gets removed from the queue and no longer exists in Redis. If the fetched job is processed successfully, then all is good. Also, if the Sidekiq process is terminated gracefully on receiving either a SIGINT or a SIGTERM signal, Sidekiq will push the unfinished jobs back to the queue in Redis.
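
Here is a rough sketch of the basic_fetch idea in plain Ruby (this is not Sidekiq’s actual code; process is a placeholder for running the job).

require 'redis'

redis = Redis.new

# BRPOP pops the job off the queue, so while the job is being processed
# it no longer exists anywhere in Redis.
queue, job = redis.brpop("queue:default", timeout: 2)

if job
  # If the process crashes at this point, the job is simply gone.
  process(job)
end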

But what if the Sidekiq process crashes in the middle while processing that fetched job?

A process is considered crashed if it does not shut down gracefully. As we discussed before, when we send a SIGKILL signal to a process, the process cannot receive or catch this signal. Because the process cannot shut down gracefully, it is considered crashed.

When a Sidekiq process crashes, the unfinished jobs fetched by that Sidekiq process are lost forever.

Let’s try to reproduce this scenario.

We will schedule another job.

>> SleepWorker.perform_async("B")
=> "37a5ab4139796c4b9dc1ea6d"

>> Sidekiq::Queue.new.size
=> 1

Now, let’s start the Sidekiq process and kill it using the SIGKILL or 9 signal.

$ bundle exec sidekiq

47395 TID-ow8q4nxzf INFO: Starting processing, hit Ctrl-C to stop
47395 TID-ow8qba0x7 SleepWorker JID-37a5ab4139796c4b9dc1ea6d INFO: start
Started B
[1]    47395 killed     bundle exec sidekiq
$ kill -SIGKILL 47395

Let’s check if Sidekiq had pushed the busy (unprocessed) job back to the queue in Redis before terminating.

>> Sidekiq::Queue.new.size
=> 0

No, it did not.

The Sidekiq process did not get a chance to shut down gracefully when it received the SIGKILL signal.

If we restart the Sidekiq process, it cannot fetch that unprocessed job since the job was never pushed back to the queue in Redis.

$ bundle exec sidekiq

47733 TID-ox1lau26l INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-47733", :url=>nil}
47733 TID-ox1lau26l INFO: Starting processing, hit Ctrl-C to stop

Therefore, the job with the name argument B and ID 37a5ab4139796c4b9dc1ea6d is completely lost. There is no way to get that job back.

Losing jobs like this may not be a problem for some applications, but for critical applications it could be a huge issue.

We faced a similar problem. One of our clients’ applications is deployed on a Kubernetes cluster. Our Sidekiq process runs in a Docker container in Kubernetes pods which we call background pods.

Here’s a stripped down version of our Kubernetes deployment manifest which creates a Kubernetes deployment resource. Our Sidekiq process runs in the pods spawned by that deployment resource.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: background
spec:
  replicas: 2
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: background
        image: <%= ENV['IMAGE'] %>
        env:
        - name: POD_TYPE
          value: background
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -l
              - -c
              - for pid in tmp/pids/sidekiq*.pid; do bin/bundle exec sidekiqctl stop $pid 60; done

When we apply an updated version of this manifest, for, say, changing the Docker image, the running pods are terminated and new pods are created.

Before terminating the only container in the pod, Kubernetes executes the sidekiqctl stop $pid 60 command which we have defined using the preStop event handler. Note that Kubernetes sends the SIGTERM signal to the container being terminated inside the pod once the preStop event handler completes. The default termination grace period is 30 seconds and it is configurable. If the container doesn’t terminate within the termination grace period, a SIGKILL signal is sent to forcefully terminate the container.

The sidekiqctl stop $pid 60 command executed in the preStop handler does three things.

  1. Sends a SIGTERM signal to the Sidekiq process running in the container.
  2. Waits for 60 seconds.
  3. Sends a SIGKILL signal to kill the Sidekiq process forcefully if the process has not terminated gracefully yet.

This worked for us when the count of busy jobs was relatively small.

When the number of in-progress jobs was higher, Sidekiq did not get enough time to quiet the busy workers and failed to push some of them back to the Redis queue.

We found that some of the jobs were getting lost when our background pod restarted. We had to restart our background pod for reasons such as updating the Kubernetes deployment manifest, or the pod being automatically evicted by Kubernetes due to the host node encountering an OOM (out of memory) issue.

We tried increasing both terminationGracePeriodSeconds in the deployment manifest and the sidekiqctl stop command’s timeout. Despite that, we kept facing the same issue of losing jobs whenever a pod restarted.

We even tried sending TSTP and then TERM after a timeout considerably longer than 60 seconds. But the pod was being harshly terminated without gracefully terminating the Sidekiq process running inside it, so we kept losing the busy jobs which were running during the pod termination.

Sidekiq Pro’s super_fetch

We were looking for a way to stop losing our Sidekiq jobs or a way to recover them reliably when our background Kubernetes pod restarts.

We realized that the commercial version of Sidekiq, Sidekiq Pro, offers an additional fetch strategy, super_fetch, which seemed more efficient and reliable than the basic_fetch strategy.

Let’s see what difference super_fetch strategy makes over basic_fetch.

We will need to use the sidekiq-pro gem, which needs to be purchased. Since the Sidekiq Pro gem is closed source, we cannot fetch it from the default public gem registry, https://rubygems.org. Instead, we have to fetch it from a private gem registry which we get access to after purchasing it. We add the following code to our Gemfile and run bundle install.

source ENV['SIDEKIQ_PRO_GEM_URL'] do
  gem 'sidekiq-pro'
end

To enable super_fetch, we need to add the following code in an initializer config/initializers/sidekiq.rb.

Sidekiq.configure_server do |config|
  config.super_fetch!
end

Well, that’s it. Sidekiq will use super_fetch instead of basic_fetch now.

$ bundle exec sidekiq

75595 TID-owsytgvqj INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
75595 TID-owsytgvqj INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-75595", :url=>nil}
75595 TID-owsytgvqj INFO: Starting processing, hit Ctrl-C to stop
75595 TID-owsys5imz INFO: SuperFetch activated

When super_fetch is activated, Sidekiq process’ graceful shutdown behavior is similar to that of basic_fetch.

>> SleepWorker.perform_async("C")
=> "f002a41393f9a79a4366d2b5"
>> Sidekiq::Queue.new.size
=> 1
$ bundle exec sidekiq

76021 TID-ow6kdcca5 INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
76021 TID-ow6kdcca5 INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-76021", :url=>nil}
76021 TID-ow6kdcca5 INFO: Starting processing, hit Ctrl-C to stop
76021 TID-ow6klq2cx INFO: SuperFetch activated
76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
>> Sidekiq::Queue.new.size
=> 0
$ kill -SIGTERM 76021
76021 TID-ow6kdcca5 INFO: Shutting down
76021 TID-ow6kdcca5 INFO: Terminating quiet workers
76021 TID-ow6kieuwh INFO: Scheduler exiting...
76021 TID-ow6kdcca5 INFO: Pausing to allow workers to finish...
76021 TID-ow6kdcca5 WARN: Terminating 1 busy worker threads
76021 TID-ow6kdcca5 WARN: Work still in progress [#<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"SleepWorker\",\"args\":[\"C\"],\"retry\":true,\"queue\":\"default\",\"jid\":\"f002a41393f9a79a4366d2b5\",\"created_at\":1525500653.404454,\"enqueued_at\":1525500653.404501}", local_queue="queue:sq|vishal.local:76021:3e64c4b08393|default">]
76021 TID-ow6kdcca5 INFO: SuperFetch: Moving job from queue:sq|vishal.local:76021:3e64c4b08393|default back to queue:default
76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: fail: 13.758 sec
76021 TID-ow6kdcca5 INFO: Bye!
>> Sidekiq::Queue.new.size
=> 1

That looks good. As we can see, Sidekiq moved the busy job from its private queue back to the queue in Redis when it received a SIGTERM signal.

Now, let’s try to kill the Sidekiq process forcefully, without allowing a graceful shutdown, by sending a SIGKILL signal.

Since Sidekiq was gracefully shut down before, if we restart Sidekiq again, it will re-process the pushed back job having ID f002a41393f9a79a4366d2b5.

$ bundle exec sidekiq

76890 TID-oxecurbtu INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
76890 TID-oxecurbtu INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-76890", :url=>nil}
76890 TID-oxecurbtu INFO: Starting processing, hit Ctrl-C to stop
76890 TID-oxecyhftq INFO: SuperFetch activated
76890 TID-oxecyotvm SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
[1]    76890 killed     bundle exec sidekiq
$ kill -SIGKILL 76890
>> Sidekiq::Queue.new.size
=> 0

It appears that Sidekiq didn’t get any chance to push the busy job back to the queue in Redis on receiving a SIGKILL signal.

So, where is the magic of super_fetch?

Did we lose our job again?

Let’s restart Sidekiq and see for ourselves.

$ bundle exec sidekiq

77496 TID-oum04ghgw INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
77496 TID-oum04ghgw INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-77496", :url=>nil}
77496 TID-oum04ghgw INFO: Starting processing, hit Ctrl-C to stop
77496 TID-oum086w9s INFO: SuperFetch activated
77496 TID-oum086w9s WARN: SuperFetch: recovered 1 jobs
77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
Finished C
77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: done: 30.011 sec

Whoa, isn’t that cool?

See that line where it says SuperFetch: recovered 1 jobs.

Although the job wasn’t pushed back to the queue in Redis, Sidekiq somehow recovered our lost job with ID f002a41393f9a79a4366d2b5 and reprocessed it!

Interested in learning how Sidekiq did that? Keep on reading.

Note that since Sidekiq Pro is closed source, commercial software, we cannot explain super_fetch’s exact implementation details.

As we discussed in depth before, Sidekiq’s basic_fetch strategy uses the BRPOP Redis command to fetch a job from the queue in Redis. It works great to some extent, but it is prone to losing jobs if Sidekiq crashes or does not shut down gracefully.

On the other hand, Sidekiq Pro offers the super_fetch strategy which uses the RPOPLPUSH Redis command to fetch a job.

The RPOPLPUSH Redis command provides a unique approach towards implementing a reliable queue. RPOPLPUSH accepts two lists, a source list and a destination list. The command atomically returns and removes the last element from the source list, and pushes that element as the first element of the destination list. Atomically means that both the pop and the push are performed as a single operation; either both succeed, or both are treated as failed.

super_fetch registers a private queue in Redis for each Sidekiq process on start-up. super_fetch atomically fetches a scheduled job from the public queue in Redis and pushes that job into the private queue (or working queue) using the RPOPLPUSH Redis command. Once the job finishes processing, Sidekiq removes it from the private queue. During a graceful shutdown, Sidekiq moves the unfinished jobs from the private queue back to the public queue. If the shutdown of a Sidekiq process is not graceful, the unfinished jobs of that process remain in the private queue; these are called orphaned jobs. On restarting or starting another Sidekiq process, super_fetch looks for such orphaned jobs in the private queues. If Sidekiq finds orphaned jobs, it re-enqueues and processes them again.
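
The underlying pattern can be sketched in plain Ruby as follows; this is only an illustration of the reliable-queue idea, not Sidekiq Pro’s actual implementation, and process is a placeholder for running the job.

require 'redis'
require 'socket'

redis = Redis.new

public_queue  = "queue:default"
private_queue = "queue:sq|#{Socket.gethostname}:#{Process.pid}|default"

# Atomically move a job from the public queue into this process' private queue.
job = redis.rpoplpush(public_queue, private_queue)

if job
  process(job)                       # run the job
  redis.lrem(private_queue, -1, job) # remove it from the private queue once done
end

# A crashed process leaves its unfinished jobs in its private queue,
# from where they can later be detected as orphans and re-enqueued.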

It may happen that we have multiple Sidekiq processes running at the same time. If one of those processes dies, its unfinished jobs become orphans. This Sidekiq wiki describes in detail the criteria which super_fetch relies upon to identify which jobs are orphaned and which are not. If we don’t restart or start another process, super_fetch may take anywhere from 5 minutes to 3 hours to recover such orphaned jobs. The recommended approach is to restart or start another Sidekiq process to signal super_fetch to look for orphans.

Interestingly, in older versions of Sidekiq Pro, super_fetch performed checks for orphaned jobs and queues every 24 hours at Sidekiq process startup. Due to this, when a Sidekiq process crashed, its orphaned jobs could remain unpicked for up to 24 hours until the next restart. This orphan check delay was later lowered to 1 hour in Sidekiq Pro 3.4.1.

Another fun thing to know is that older versions of Sidekiq Pro had two other fetch strategies, namely reliable_fetch and timed_fetch. Apparently, reliable_fetch did not work with Docker, and timed_fetch had asymptotic computational complexity O(log N), comparatively less efficient than super_fetch, which has asymptotic computational complexity O(1). Both of these strategies were deprecated in Sidekiq Pro 3.4.0 in favor of super_fetch. Later, both were removed in Sidekiq Pro 4.0 and are no longer documented anywhere.

Final result

We have enabled super_fetch in our application and it seems to be working without any major issues so far. Our Kubernetes background pods do not seem to be losing any jobs when these pods are restarted.

Update: Mike Perham, the author of Sidekiq, posted the following comment.

Faktory provides all of the beanstalkd functionality, including the same reliability, with a nicer Web UI. It’s free and OSS. https://github.com/contribsys/faktory http://contribsys.com/faktory/

Rails 5.2 sets Ruby version in Gemfile and adds .ruby-version file by default

This blog is part of our Rails 5.2 series.

For Ruby developers, it’s common to switch between multiple Ruby versions as per the needs of each project. Sometimes, going back and forth between multiple Ruby versions can be frustrating for the developer. To avoid this, we add a .ruby-version file to our projects so that version manager tools such as rvm, rbenv etc. can easily determine which Ruby version should be used for that particular project.

Another thing Rails developers have to take care of is ensuring that the Ruby version used by the deployment tools to run Rails is the desired one. To ensure that, we add the Ruby version to the Gemfile. This helps bundler install dependencies scoped to the specified Ruby version.
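
Before Rails 5.2 this meant maintaining the version in two places by hand; a minimal sketch of what we would add manually, assuming Ruby 2.5.0:

# .ruby-version (a single line read by rvm/rbenv)
2.5.0

# Gemfile (read by bundler and deployment tools)
ruby "2.5.0"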

Good News! Rails 5.2 makes our work easy.

In Rails 5.2, changes have been made to add a .ruby-version file and also add the Ruby version to the Gemfile by default when creating a new app.

Let’s create a new project with Ruby 2.5.

$ rvm list default

  Default Ruby (for new shells)

     ruby-2.5 [ x86_64 ]


$ rails new my_new_app

In our new project, we should be able to see .ruby-version in the root directory and it will contain the value 2.5. Also, we should see the following line in the Gemfile.

ruby "2.5"

Deploying Docker Registry on Kubernetes using S3 Storage

In today’s era of containerization, no matter what container we are using, we need an image to run the container. Docker images are stored on container registries like Docker Hub (cloud), Google Container Registry (GCR), AWS ECR, quay.io etc.

We can also self-host a Docker registry on any Docker platform. In this blog post, we will see how to deploy a Docker registry on Kubernetes using the S3 storage driver.

Pre-requisite:

As per the Docker registry documentation, we can simply start the registry using the registry Docker image.

The basic parameters to consider when deploying a production registry are:

  • Authentication
  • SSL
  • Storage

We will use htpasswd authentication for this post, though the registry image supports silly and token-based authentication as well.

The Docker registry requires an SSL certificate and key. We will use a Kubernetes service which terminates SSL at the ELB level using annotations.

For registry storage, we can use filesystem, s3, azure, swift etc. For the complete list of options, please visit the Docker site.

We need to store the Docker images pushed to the registry. We will use S3 to store these images.

Steps for deploying the registry on Kubernetes.

Get the ARN of the SSL certificate to be used for SSL.

If you don’t have the SSL certificate on AWS IAM, upload it using the following command.

$aws iam upload-server-certificate --server-certificate-name registry --certificate-body file://registry.crt --private-key file://key.pem

Get the ARN for the certificate using the following command.

$aws iam get-server-certificate --server-certificate-name registry  | grep Arn

Create an S3 bucket, which will be used to store the Docker images, using s3cmd or aws s3.


$s3cmd mb s3://myregistry
Bucket 's3://myregistry/' created

Create a separate namespace, configmap, deployment and service for the registry using the following templates.

---
apiVersion: v1
kind: Namespace
metadata:
  name: container-registry

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: auth
  namespace: container-registry
data:
  htpasswd: |
    admin:$2y$05$TpZPzI7U7cr3cipe6jrOPe0bqohiwgEerEB6E4bFLsUf7Bk.SEBRi

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
      - env:
        - name: REGISTRY_AUTH
          value: htpasswd
        - name: REGISTRY_AUTH_HTPASSWD_PATH
          value: /auth/htpasswd
        - name: REGISTRY_AUTH_HTPASSWD_REALM
          value: Registry Realm
        - name: REGISTRY_STORAGE
          value: s3
        - name: REGISTRY_STORAGE_S3_ACCESSKEY
          value: <your-s3-access-key>
        - name: REGISTRY_STORAGE_S3_BUCKET
          value: <your-registry-bucket>
        - name: REGISTRY_STORAGE_S3_REGION
          value: us-east-1
        - name: REGISTRY_STORAGE_S3_SECRETKEY
          value: <your-secret-s3-key>
        image: registry:2
        name: registry
        ports:
        - containerPort: 5000
        volumeMounts:
          - name: auth
            mountPath: /auth
      volumes:
        - name: auth
          configMap:
            name: auth
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <your-iam-certificate-arn>
    service.beta.kubernetes.io/aws-load-balancer-instance-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: '443'
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  ports:
  - name: "443"
    port: 443
    targetPort: 5000
  selector:
    app: registry
  type: LoadBalancer

Let’s apply these manifests using kubectl apply.

kubectl apply -f registry-namespace.yml -f registry-configmap.yml -f registry-deployment.yaml -f registry-service.yml
namespace "registry" created
configmap "auth" created
deployment "registry" created
service "registry" created

Now that we have created the registry, we should map a DNS record to the registry service’s ELB endpoint. We can get the ELB endpoint using the following command.

$kubectl -n registry get svc registry -o wide

NAME       CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)         AGE       SELECTOR
registry   100.71.250.56   abcghccf8540698e8bff782799ca8h04-1234567890.us-east-2.elb.amazonaws.com   443:30494/TCP   1h       app=registry

We will point DNS to this ELB endpoint with the domain registry.myapp.com.

Now that we have the registry running, it’s time to push an image to it.

First, pull an image or build one locally to push.

On the local machine, run the following commands:

$docker pull busybox
latest: Pulling from busybox
f9ea5e501ad7: Pull complete
ac3f08b78d4e: Pull complete
Digest: sha256:da268b65d710e5ca91271f161d0ff078dc63930bbd6baac88d21b20d23b427ec
Status: Downloaded newer image for busybox:latest

Now login to our registry using the following command.

$ sudo docker login registry.myapp.com

Username: admin

Password:

Login Succeeded

Now tag the image to point it to our registry using the docker tag command.

$ sudo docker tag busybox registry.myapp.com/my-app:latest

Once the image is tagged, we are good to push.

Let’s push the image using the docker push command.

$ sudo docker push registry.myapp.com/my-app:latest

The push refers to a repository [registry.myapp.com/my-app]

05732a3f47b5: Pushed
30de36c4bd15: Pushed
5237590c0d08: Pushed
latest: digest: sha256:f112e608b2639b21498bd4dbca9076d378cc216a80d52287f7f0f6ea6ad739ab size: 205

We are now able to push images to the registry running on Kubernetes and stored on S3. Let’s verify that the data exists on S3.

Navigate to our S3 bucket and we can see that the Docker registry repository busybox has been created.

$ s3cmd ls s3://myregistry/docker/registry/v2/repositories/
DIR   s3://myregistry/docker/registry/v2/repositories/busybox/

All our image related files are stored on S3.

In this way, we can self-host a container registry on Kubernetes backed by S3 storage.

Ruby 2.6 added option to raise exception in Kernel#system

This blog is part of our Ruby 2.6 series. Ruby 2.6.0-preview2 was recently released.

We write scripts to automate the setup and deployment of Rails applications. In those scripts, at many places, we need to run system commands like bundle install, rake db:create, rake db:migrate and many more.

Let’s suppose we need to run migrations using rake db:migrate in a Rails project setup script. We can use the Kernel#system method.

irb> system('rake db:migrate')

Ruby 2.5.0

Executing system returns true or false. Another feature of system is that it swallows exceptions.

Let’s suppose our migrations run successfully. In this case the system command for running migrations will return true.

irb> system('rake db:migrate')
 => true

Let’s suppose we have a migration which tries to add a column to a table which does not exist. In this case, the system command for running migrations will return false.

irb> system('rake db:migrate')
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
 => false

As we can see, even when there is a failure in executing the system command, the return value is just false. Ruby does not raise an exception in those cases.

However, we can use raise explicitly to raise an exception and halt the setup script’s execution.

irb> system('rake db:migrate') || raise('Failed to run migrations')
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
Traceback (most recent call last):
        2: from /Users/amit/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `<main>'
        1: from (irb):4
RuntimeError (Failed to run migrations)

Ruby 2.6.0-preview1

Ruby 2.6 makes our lives easier by providing the option exception: true, so that we do not need to use raise explicitly to halt script execution.

irb> system('rake db:migrate', exception: true)
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
Traceback (most recent call last):
        3: from /Users/amit/.rvm/rubies/ruby-2.6.0-preview1/bin/irb:11:in `<main>'
        2: from (irb):2
        1: from (irb):2:in `system'
RuntimeError (Command failed with exit 1: rake)

Ruby 2.6 works the same way as previous Ruby versions when the exception option is not used or when it is set to false.

irb> system('rake db:migrate', exception: false)
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
 => false

Here are the relevant commit and discussion for this change.

system is not the only way to execute such commands. We wrote a blog post 6 years ago which discusses the differences between running commands using backtick, exec, sh, popen3, popen2e and Process.spawn.

The Chinese version of this blog is available here.

Ruby 2.5 enables Thread.report_on_exception by default and we get more info when threads die

This blog is part of our Ruby 2.5 series.

Let’s see what happens when an exception is raised inside a thread.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

sleep 1

puts "In the main thread"

Execution of it looks like this.

$ RBENV_VERSION=2.4.0 ruby thread_example_1.rb

Calculating 4/0 in division_thread

In the main thread

Note that the last two lines from the block were not printed. Also notice that after failing in the thread, the program continued to run in the main thread. That’s why we got the message “In the main thread”.

This is because the default behavior of Ruby is to silently ignore exceptions in threads and then continue executing in the main thread.

Enabling abort_on_exception to stop on failure

If we want an exception in a thread to stop further processing both in the thread and in the main thread, then we can enable Thread[.#]abort_on_exception on that thread.

Notice that in the below code we are using Thread.current.

division_thread = Thread.new do
  Thread.current.abort_on_exception = true

  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

sleep 1

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_2.rb

Calculating 4/0 in division_thread

thread_example_2.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_2.rb:5:in `block in <main>'

As we can see, once an exception was encountered in the thread, processing stopped both in the thread and in the main thread.

Note that Thread.current.abort_on_exception = true activates this behavior only for the current thread.

If we want this behavior globally for all the threads then we need to use Thread.abort_on_exception = true.

Running program with debug flag to stop on failure

Let’s run the original code with --debug option.

$ RBENV_VERSION=2.4.0 ruby --debug thread_example_1.rb

thread_example_1.rb:1: warning: assigned but unused variable - division_thread

Calculating 4/0 in division_thread

Exception `ZeroDivisionError' at thread_example_1.rb:3 - divided by 0
Exception `ZeroDivisionError' at thread_example_1.rb:7 - divided by 0
thread_example_1.rb:3:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_1.rb:3:in `block in <main>'

In this case the exception is printed in detail and the code in the main thread was not executed.

Usually when we execute a program with the --debug option, the behavior of the program does not change. We expect the program to print more stuff but we do not expect the behavior to change. However, in this case the --debug option changes the behavior of the program.

Running program with join on thread to stop on failure

If a thread raises an exception and the abort_on_exception and $DEBUG flags are not set, then that exception will be processed at the time of joining the thread.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

division_thread.join

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_3.rb

Calculating 4/0 in division_thread

thread_example_3.rb:3:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_3.rb:3:in `block in <main>'

Both Thread#join and Thread#value will stop processing in the thread and in the main thread once an exception is encountered.

Introduction of report_on_exception in Ruby 2.4

Almost 6 years ago, Charles Nutter (headius) proposed that exceptions raised in threads should be automatically logged and reported by default. To make his point, he described issues similar to the ones we discussed above about Ruby’s behavior of silently ignoring exceptions in threads. Here is the relevant discussion on his proposal.

Following are some of the notable points discussed.

  • Enabling Thread[.#]abort_on_exception, by default, is not always a good idea.
  • There should be a flag which, if enabled, would print the thread-killing exception info.
  • In many cases, people spawn one-off threads which are not hard-referenced using Thread#join or Thread#value. Such threads get garbage collected. Should it report the thread-killing exception at the time of garbage collection if such a flag is enabled?
  • Should it warn using Warning#warn or redirect to STDERR device while reporting?

Charles Nutter suggested that a configurable global flag Thread.report_on_exception and an instance-level flag Thread#report_on_exception should be implemented with a default value of true. When set to true, it should print the exception information.

Matz and other core members approved implementing Thread[.#]report_on_exception with its default value set to false.

Charles Nutter, Benoit Daloze and others argued that it should be true by default so that programmers can be aware of threads silently disappearing because of exceptions.

Shyouhei Urabe advised that due to some technical challenges the default value should be set to false so that the feature could land in Ruby; once the feature was in, the default could be changed in a later release.

Nobuyoshi Nakada (nobu) pushed an implementation of Thread[.#]report_on_exception with a default value of false. It was released in Ruby 2.4.0.

Let’s try enabling report_on_exception globally using Thread.report_on_exception.

Thread.report_on_exception = true

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_4.rb

Calculating 4/0 in division_thread

#<Thread:0x007fb10f018200@thread_example_4.rb:3 run> terminated with exception:
thread_example_4.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_4.rb:5:in `block in <main>'

Calculating nil+4 in addition_thread

#<Thread:0x007fb10f01aca8@thread_example_4.rb:9 run> terminated with exception:
thread_example_4.rb:11:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

It now reports the exceptions in all threads. It prints that Thread:0x007fb10f018200 was terminated with the exception divided by 0 (ZeroDivisionError). Similarly, the other thread Thread:0x007fb10f01aca8 was terminated with the exception undefined method '+' for nil:NilClass (NoMethodError).

Instead of enabling it globally for all threads, we can enable it for a particular thread using the instance-level Thread#report_on_exception.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  Thread.current.report_on_exception = true

  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"

In the above case we have enabled report_on_exception flag just for addition_thread.

Let’s execute it.

$ RBENV_VERSION=2.4.0 ruby thread_example_5.rb

Calculating 4/0 in division_thread

Calculating nil+4 in addition_thread

#<Thread:0x007f8e6b007f70@thread_example_5.rb:7 run> terminated with exception:
thread_example_5.rb:11:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

Notice how it didn’t report the exception which killed thread division_thread. As expected, it reported the exception that killed thread addition_thread.

With the above changes, Ruby reports the exception as soon as it is encountered. However, if these threads are joined, they will still raise the exception.

division_thread = Thread.new do
  Thread.current.report_on_exception = true

  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

begin
  division_thread.join
rescue => exception
  puts "Explicitly caught - #{exception.class}: #{exception.message}"
end

puts "In the main thread"

$ RBENV_VERSION=2.4.0 ruby thread_example_6.rb

Calculating 4/0 in division_thread

#<Thread:0x007f969d00d828@thread_example_6.rb:1 run> terminated with exception:
thread_example_6.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_6.rb:5:in `block in <main>'

Explicitly caught - ZeroDivisionError: divided by 0

In the main thread

See how we were still able to handle the exception raised in division_thread after joining it, even though it had already been reported because of the Thread#report_on_exception flag.

report_on_exception defaults to true in Ruby 2.5

Benoit Daloze (eregon) strongly advocated that both Thread.report_on_exception and Thread#report_on_exception should have a default value of true. Here is the relevant feature request.

After approval from Matz, Benoit Daloze pushed the implementation by fixing the failing tests and silencing the unnecessary verbose warnings.

It was released as part of Ruby 2.5.

Now in Ruby 2.5 we can simply write this.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"

Let’s execute it with Ruby 2.5.

$ RBENV_VERSION=2.5.0 ruby thread_example_7.rb

Calculating 4/0 in division_thread

#<Thread:0x00007f827689a238@thread_example_7.rb:1 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
  1: from thread_example_7.rb:3:in `block in <main>'
thread_example_7.rb:3:in `/': divided by 0 (ZeroDivisionError)

Calculating nil+4 in addition_thread

#<Thread:0x00007f8276899b58@thread_example_7.rb:7 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
thread_example_7.rb:9:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

We can disable the thread exception reporting globally using Thread.report_on_exception = false or for a particular thread using Thread.current.report_on_exception = false.
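
For example, here is a minimal sketch (along the same lines as the earlier examples) that keeps reporting enabled globally but silences it for a single fire-and-forget thread.

quiet_thread = Thread.new do
  # This failure is expected, so we opt this one thread out of reporting.
  Thread.current.report_on_exception = false

  raise "expected failure which we intentionally ignore"
end

sleep 1

puts "In the main thread"
# Nothing is reported for quiet_thread, while other threads still report.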

Future Possibilities

In addition to this feature, Charles Nutter also suggested that it would be good to have a callback handler which accepts a block to be executed when a thread dies due to an exception. The callback handler could be registered at the global level or for a specific thread.

Thread.on_exception do
  # some stuff
end

In the absence of such a handler, libraries need to resort to custom code to handle exceptions. Here is how Sidekiq handles exceptions raised in threads.
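
The general shape of such custom code is a wrapper that creates the thread and rescues, logs and re-raises anything that escapes the block. The following is only an illustrative sketch of that pattern (using a hypothetical safe_thread helper), not Sidekiq's actual implementation.

def safe_thread(name, &block)
  Thread.new do
    Thread.current.name = name

    begin
      block.call
    rescue Exception => exception
      # A real library would hand the exception to its configured error
      # handlers or logger before letting the thread die.
      warn "Thread #{name} died: #{exception.class}: #{exception.message}"
      raise
    end
  end
end

worker = safe_thread("worker") { 4 / 0 }
worker.join rescue nil # the exception was already logged by the wrapper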

An important thing to note is that report_on_exception does not change the behavior of the code. It only adds reporting when a thread dies, and when it comes to dying threads, more reporting is a good thing.

Rails 5.2 added Date#prev_occurring and Date#next_occurring to return specified next & previous occurring day of week

This blog is part of our Rails 5.2 series.

Before Rails 5.2, this is how we would find the next or previous occurrence of a day of the week.

Assume that the current date is Tue, 27 Feb 2018.

# find previous thursday
>> Date.yesterday.beginning_of_week(:thursday)
=> Thu, 22 Feb 2018

# find next thursday
>> Date.tomorrow.end_of_week(:friday)
=> Thu, 01 Mar 2018

Rails 5.2 has introduced Date#prev_occurring and Date#next_occurring to find the previous and next occurrence of a given day of the week.

# find previous thursday
>> Date.prev_occurring(:thursday)
=> Thu, 22 Feb 2018

# find next thursday
>> Date.next_occurring(:thursday)
=> Thu, 01 Mar 2018

Ruby 2.5 supports measuring branch and method coverages

Ruby has shipped with Coverage, a simple standard library for measuring test coverage, for a long time.

Before Ruby 2.5

Before Ruby 2.5, we could measure only line coverage using Coverage.

Line coverage tells us whether a line was executed or not and, if it was, how many times.

We have a file called score.rb.

score = 33

if score >= 40
  p :PASSED
else
  p :FAILED
end

Now create another file score_coverage.rb.

require "coverage"

Coverage.start
load "score.rb"
p Coverage.result

We used the Coverage#start method to measure the coverage of the score.rb file. Coverage#result returns the coverage result.

Let’s run it with Ruby 2.4.

$ RBENV_VERSION=2.4.0 ruby score_coverage.rb
:FAILED
{ "score.rb"=> [1, nil, 1, 0, nil, 1, nil] }

Let’s look at the output. Each value in the array [1, nil, 1, 0, nil, 1, nil] denotes how many times the interpreter executed the corresponding line in the score.rb file.

This array is also called the "line coverage" of the score.rb file.

A nil value in the line coverage array means coverage is disabled for that particular line, or that the line is not relevant. Lines like else, end and blank lines have line coverage disabled.

Here’s how we can read the above line coverage result.

  • Line number 1 (i.e. 0th index in the above result array) was executed once.
  • Coverage was disabled for line number 2 (i.e. index 1) as it is blank.
  • Line number 3 (i.e. index 2) was executed once.
  • Line number 4 did not execute.
  • Coverage was disabled for line number 5 as it contains only else clause.
  • Line number 6 was executed once.
  • Coverage was disabled for line number 7 as it contains just end keyword.

After Ruby 2.5

There was a pull request opened in 2014 to add method coverage and decision coverage metrics in Ruby. It was rejected by Yusuke Endoh as he saw some issues with it and mentioned that he was also working on a similar implementation.

In Ruby 2.5, Yusuke Endoh added branch coverage and method coverage feature to the Coverage library.

Let’s see what’s changed in Coverage library in Ruby 2.5.

Line Coverage

If we execute the above example using Ruby 2.5, we will see no change in the result.

$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" => [1, nil, 1, 0, nil, 1, nil] }

This behavior is maintained to ensure that the Coverage#start API stays 100% backward compatible.

If we explicitly enable the lines option on the Coverage#start method in the above score_coverage.rb file, the coverage result looks different.

require "coverage"

Coverage.start(lines: true)
load "score.rb"
p Coverage.result

$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" => {
    :lines => [1, nil, 1, 0, nil, 1, nil]
  }
}

We can see that the coverage result is now a hash, which reads that the score.rb file has line coverage of [1, nil, 1, 0, nil, 1, nil].

Branch Coverage

Branch coverage helps us identify which branches are executed and which are not.

Let’s see how to get branch coverage.

We will update score_coverage.rb by enabling the branches option.

require "coverage"

Coverage.start(branches: true)
load "score.rb"
p Coverage.result

$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" =>
  { :branches => {
      [:if, 0, 3, 0, 7, 3] => {
        [:then, 1, 4, 2, 4, 15] => 0,
        [:else, 2, 6, 2, 6, 15] => 1
      }
    }
  }
}

Here is how to read the data in these key arrays.

[
  BRANCH_TYPE,
  UNIQUE_ID,
  START_LINE_NUMBER,
  START_COLUMN_NUMBER,
  END_LINE_NUMBER,
  END_COLUMN_NUMBER
]

Please note that column numbers start from 0 and line numbers start from 1.

Let’s try to read the branch coverage result printed above.

[:if, 0, 3, 0, 7, 3] reads that the if statement starts at line 3 & column 0 and ends at line 7 & column 3.

[:then, 1, 4, 2, 4, 15] reads that the then clause starts at line 4 & column 2 and ends at line 4 & column 15.

Similarly, [:else, 2, 6, 2, 6, 15] reads that the else clause starts at line 6 & column 2 and ends at line 6 & column 15.

Most importantly, as per the branch coverage format, we can see that the branch from if to then was never executed since its counter is 0. The other branch, from if to else, was executed once since its counter is 1.

Method Coverage

Measuring method coverage helps us identify which methods were invoked and which were not.

We have a file grade_calculator.rb.

students_scores = { "Sam" => [53, 91, 72],
                    "Anna" => [91, 97, 95],
                    "Bob" => [33, 69, 63] }

def average(scores)
  scores.reduce(&:+)/scores.size
end

def grade(average_score)
  case average_score
  when 90.0..100.0 then :A
  when 80.0..90.0 then :B
  when 70.0..80.0 then :C
  when 60.0..70.0 then :D
  else :F
  end
end

def greet
  puts "Congratulations!"
end

def warn
  puts "Try hard next time!"
end


students_scores.each do |student_name, scores|
  achieved_grade = grade(average(scores))

  puts "#{student_name}, you've got '#{achieved_grade}' grade."

  if achieved_grade == :A
    greet
  elsif achieved_grade == :F
    warn
  end

  puts
end

To measure the method coverage of the above file, let’s create grade_calculator_coverage.rb, enabling the methods option on the Coverage#start method.

require "coverage"

Coverage.start(methods: true)
load "grade_calculator.rb"
p Coverage.result

Let’s run it using Ruby 2.5.

$ RBENV_VERSION=2.5.0 ruby grade_calculator_coverage.rb
Sam, you've got 'C' grade.

Anna, you've got 'A' grade.
Congratulations!

Bob, you've got 'F' grade.
Try hard next time!

{ "grade_calculator.rb" => {
    :methods => {
      [Object, :warn, 23, 0, 25, 3] => 1,
      [Object, :greet, 19, 0, 21, 3] => 1,
      [Object, :grade, 9, 0, 17, 3] => 3,
      [Object, :average, 5, 0, 7, 3] => 3
    }
  }
}

The format of method coverage result is defined as shown below.

[ CLASS_NAME,
  METHOD_NAME,
  START_LINE_NUMBER,
  START_COLUMN_NUMBER,
  END_LINE_NUMBER,
  END_COLUMN_NUMBER ]

Therefore, [Object, :grade, 9, 0, 17, 3] => 3 reads that the Object#grade method, which spans from line 9 & column 0 to line 17 & column 3, was invoked 3 times.
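
Since the values are plain invocation counts, the result is easy to post-process. As a small sketch, here is how we might list the methods that a particular run never invoked (in the grade_calculator.rb run above every method was invoked, so this list would be empty).

# Assumes Coverage.start(methods: true) was called and the target file loaded.
result = Coverage.result # call this only once; it also stops the measurement
methods_coverage = result["grade_calculator.rb"][:methods]

never_invoked = methods_coverage.select { |_location, count| count.zero? }
never_invoked.each_key do |klass, name, start_line, *|
  puts "#{klass}##{name} defined at line #{start_line} was never invoked"
end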

Conclusion

We can also measure all types of coverage at once.

Coverage.start(lines: true, branches: true, methods: true)

What’s the use of these different types of coverage anyway?

Well, one use case is to integrate this into a test suite to determine which lines, branches and methods are exercised by the tests and which are not. Further, we can sum these up and evaluate the total coverage of a test suite.
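
As a rough sketch, a tiny coverage helper could start all three measurements before the application and test files are loaded, and print a summary when the process exits. This is a simplified illustration; tools like SimpleCov do much more.

# coverage_helper.rb -- require this before loading application and test code.
require "coverage"

Coverage.start(lines: true, branches: true, methods: true)

at_exit do
  Coverage.result.each do |file, data|
    lines = data[:lines].compact
    next if lines.empty?

    covered = lines.count(&:positive?)
    puts format("%-40s %d/%d lines executed", file, covered, lines.size)
  end
end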

The author of this feature, Yusuke Endoh (mame), has released the coverage-helpers gem, which allows further advanced manipulation and processing of coverage results obtained using Coverage#result.

GPG decryption without pin entry pop up using GPGME

In one of our projects, we implemented GPG decryption.

What is GPG?

GPG is a complete and free implementation of the OpenPGP standard as defined by RFC4880 (also known as PGP).

We used the GPGME gem for this purpose. It provides three levels of API. In our case, we used GPGME::Crypto, which has high-level convenience methods to encrypt, decrypt, sign and verify signatures.

We needed to import a private key to decrypt a file that was encrypted using the paired public key. First, let's import the required private key.

GPGME::Key.import File.open('certs/pgp.key')

Let’s decrypt the file.

crypto = GPGME::Crypto.new
options = { output: File.open('file.csv', 'wb') }

crypto.decrypt File.open('file.csv.gpg'), options

The above code has one problem. It opens a pop up asking for the password that was used when the public and private keys were generated.

To supply the password without a pop up, we updated the code as shown below.

crypto = GPGME::Crypto.new
options = {
            output: File.open('file.csv', 'wb'),
            pinentry_mode: GPGME::PINENTRY_MODE_LOOPBACK,
            password: 'welcome'
          }

crypto.decrypt File.open('file.csv.gpg'), options

Here, the pinentry_mode option allows password input without a pop up.
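
To keep this logic in one place, we can wrap the import and decryption in a small helper. This is just a sketch reusing the options shown above; the key path, file names and passphrase are placeholders.

require "gpgme"

def decrypt_file(encrypted_path, output_path, passphrase)
  # Import the private key needed for decryption.
  GPGME::Key.import(File.open("certs/pgp.key"))

  crypto = GPGME::Crypto.new
  crypto.decrypt File.open(encrypted_path),
                 output: File.open(output_path, "wb"),
                 pinentry_mode: GPGME::PINENTRY_MODE_LOOPBACK,
                 password: passphrase
end

decrypt_file("file.csv.gpg", "file.csv", "welcome")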

We did not use the latest version of GPG since it does not support the pinentry_mode option. Instead, we used version 2.1.20, which supports this option. Here is the build instruction for that.

Practical usage of identity function

If you are learning functional programming, you can't go far without running into the "identity function".

An identity function is a very basic function that

  • takes one argument
  • returns the argument

f(x) = x;

This seems like the most useless function in the world. We never seem to need a function like this while building applications. So what's the big deal about the identity function?

In this blog we will see how this identity concept is used in the real world.

For the implementation we will be using Ramda.js. We previously wrote about how we, at BigBinary, write JavaScript code using Ramda.js.

Again, please note that in the following code R stands for Ramda and not for the programming language R.

Example 1

Here is JavaScript code.

if (x) return x;
return [];

Here is the same code using Ramda.js.

R.ifElse(
  R.isNil,
  () => [],
  R.identity
);

Example 2

Here we will use identity as the return value in the default case.

R.cond([
  [R.equals(0), R.always("0")],
  [R.equals(10), R.always("10")],
  [R.T, R.identity]
]);

Example 3

Get the unique items from the list.

R.uniqBy(R.identity, [1,1,2])

Example 4

Count occurrences of items in the list.

R.countBy(R.identity, ["a","a","b","c","c","c"]);

Example 5

Generate values from zero all the way to n-1.

R.times(R.identity, 5)

Example 6

Keep only the truthy values.

R.filter(R.identity, [{a:1}, false, {b:2}, true, '', undefined, null, 0, {}, 1]);

Ruby 2.5 adds Exception#full_message method

This blog is part of our Ruby 2.5 series.

Before Ruby 2.5, if we wanted to log a caught exception, we needed to format it ourselves.

class AverageService
  attr_reader :numbers, :coerced_numbers

  def initialize(numbers)
    @numbers = numbers
    @coerced_numbers = coerce_numbers
  end

  def average
    sum / count
  end

  private

  def coerce_numbers
    numbers.map do |number|
      begin
        Float(number)
      rescue Exception => exception
        puts "#{exception.message} (#{exception.class})\n\t#{exception.backtrace.join("\n\t")}"
        puts "Coercing '#{number}' as 0.0\n\n"

        0.0
      end
    end
  end

  def sum
    coerced_numbers.map(&:to_f).sum
  end

  def count
    coerced_numbers.size.to_f
  end
end

average = AverageService.new(ARGV).average
puts "Average is: #{average}"

$ RBENV_VERSION=2.4.0 ruby average_service.rb 5 4f 7 1s0
invalid value for Float(): "4f" (ArgumentError)
	average_service.rb:18:in `Float'
	average_service.rb:18:in `block in coerce_numbers'
	average_service.rb:16:in `map'
	average_service.rb:16:in `coerce_numbers'
	average_service.rb:6:in `initialize'
	average_service.rb:37:in `new'
	average_service.rb:37:in `<main>'

Coercing '4f' as 0.0

invalid value for Float(): "1s0" (ArgumentError)
	average_service.rb:18:in `Float'
	average_service.rb:18:in `block in coerce_numbers'
	average_service.rb:16:in `map'
	average_service.rb:16:in `coerce_numbers'
	average_service.rb:6:in `initialize'
	average_service.rb:37:in `new'
	average_service.rb:37:in `<main>'

Coercing '1s0' as 0.0

Average of [5.0, 0.0, 7.0, 0.0] is: 3.0

It was proposed that there should be a simple method to print a caught exception using the same format that Ruby uses when printing an uncaught exception.

Some of the proposed method names were display, formatted, to_formatted_s, long_message, and full_message.

Matz approved the Exception#full_message method name.
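
Before applying it to our example, here is a minimal sketch of the method in isolation.

begin
  Integer("oops")
rescue ArgumentError => exception
  # Prints the exception in the same format Ruby uses for an uncaught
  # exception, including the backtrace.
  puts exception.full_message
end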

In Ruby 2.5, we can rewrite the above example as follows.

class AverageService
  attr_reader :numbers, :coerced_numbers

  def initialize(numbers)
    @numbers = numbers
    @coerced_numbers = coerce_numbers
  end

  def average
    sum / count
  end

  private

  def coerce_numbers
    numbers.map do |number|
      begin
        Float(number)
      rescue Exception => exception
        puts exception.full_message
        puts "Coercing '#{number}' as 0.0\n\n"

        0.0
      end
    end
  end

  def sum
    coerced_numbers.map(&:to_f).sum
  end

  def count
    coerced_numbers.size.to_f
  end
end

average = AverageService.new(ARGV).average
puts "Average is: #{average}"

$ RBENV_VERSION=2.5.0 ruby average_service.rb 5 4f 7 1s0
Traceback (most recent call last):
	6: from average_service.rb:37:in `<main>'
	5: from average_service.rb:37:in `new'
	4: from average_service.rb:6:in `initialize'
	3: from average_service.rb:16:in `coerce_numbers'
	2: from average_service.rb:16:in `map'
	1: from average_service.rb:18:in `block in coerce_numbers'
average_service.rb:18:in `Float': invalid value for Float(): "4f" (ArgumentError)

Coercing '4f' as 0.0

Traceback (most recent call last):
	6: from average_service.rb:37:in `<main>'
	5: from average_service.rb:37:in `new'
	4: from average_service.rb:6:in `initialize'
	3: from average_service.rb:16:in `coerce_numbers'
	2: from average_service.rb:16:in `map'
	1: from average_service.rb:18:in `block in coerce_numbers'
average_service.rb:18:in `Float': invalid value for Float(): "1s0" (ArgumentError)

Coercing '1s0' as 0.0

Average of [5.0, 0.0, 7.0, 0.0] is: 3.0

Note that Ruby 2.5 prints the exception backtrace in reverse order if STDERR is unchanged and is a TTY, as discussed in our previous blog post.

Ruby 2.5 prints backtrace and error message in reverse order

This blog is part of our Ruby 2.5 series.

Stack trace or backtrace is a sequential representation of the stack of method calls in a program which gets printed when an exception is raised. It is often used to find out the exact location in a program from where the exception was raised.

Before Ruby 2.5

Before Ruby 2.5, the printed backtrace had the error message, the exception class and the exact location where the exception was raised at the top. The subsequent lines contained the cascaded method calls.

Consider a simple Ruby program.

class DivisionService
  attr_reader :a, :b

  def initialize(a, b)
    @a, @b = a.to_i, b.to_i
  end

  def divide
    puts a / b
  end
end

DivisionService.new(ARGV[0], ARGV[1]).divide

Let’s execute it using Ruby 2.4.

$ RBENV_VERSION=2.4.0 ruby division_service.rb 5 0

division_service.rb:9:in `/': divided by 0 (ZeroDivisionError)
	from division_service.rb:9:in `divide'
	from division_service.rb:13:in `<main>'

In the printed backtrace above, the first line shows the location, error message and the exception class name, whereas the subsequent lines show the caller method names and their locations. Each line in the backtrace above is often considered a stack frame placed on the call stack.

Most of the time, a backtrace has so many lines that it is difficult to fit the whole backtrace in the visible viewport of the terminal.

Since the backtrace is printed in top-to-bottom order, the meaningful information like the error message, the exception class and the exact location where the exception was raised is displayed at the top of the backtrace. It means developers often need to scroll to the top of the terminal window to find out what went wrong.

After Ruby 2.5

Over 4 years ago, an issue was created to make printing the backtrace in reverse order configurable.

After much discussion, Nobuyoshi Nakada made the commit to print the backtrace and error message in reverse order only when the error output device (STDERR) is a TTY (i.e. a terminal). The message is not printed in reverse order if the original STDERR is redirected to something like a File object.

Look at the code here where Ruby checks whether STDERR is a TTY and is unchanged.

Let’s execute the same program using Ruby 2.5.

$ RBENV_VERSION=2.5.0 ruby division_service.rb 5 0

Traceback (most recent call last):
	2: from division_service.rb:13:in `<main>'
	1: from division_service.rb:9:in `divide'
division_service.rb:9:in `/': divided by 0 (ZeroDivisionError)

$

We can notice two new changes in the above backtrace.

  1. The error message and exception class are printed last (i.e. at the bottom).
  2. Each stack frame is numbered when the backtrace is printed in reverse order.

This feature makes debugging convenient when the backtrace is quite big and cannot fit in the terminal window. We can now easily see the error message without scrolling up.

Note that the Exception#backtrace attribute still holds an array of stack frames in the top-to-bottom order, as before.

So if we rescue the exception and print the backtrace manually

class DivisionService
  attr_reader :a, :b

  def initialize(a, b)
    @a, @b = a.to_i, b.to_i
  end

  def divide
    puts a / b
  end
end

begin
  DivisionService.new(ARGV[0], ARGV[1]).divide
rescue Exception => e
  puts "#{e.class}: #{e.message}"
  puts e.backtrace.join("\n")
end

we will get the old behavior.

$ RBENV_VERSION=2.5.0 ruby division_service.rb 5 0

ZeroDivisionError: divided by 0
division_service.rb:9:in `/'
division_service.rb:9:in `divide'
division_service.rb:16:in `<main>'

$

Also, note that if we assign a File object to STDERR, thus making it a non-TTY,

puts "STDERR is a TTY? [before]: #{$stderr.tty?}"
$stderr = File.new("stderr.log", "w")
$stderr.sync = true
puts "STDERR is a TTY? [after]: #{$stderr.tty?}"

class DivisionService
  attr_reader :a, :b

  def initialize(a, b)
    @a, @b = a.to_i, b.to_i
  end

  def divide
    puts a / b
  end
end

DivisionService.new(ARGV[0], ARGV[1]).divide

we get the old top-to-bottom behavior, but the backtrace is written to the specified file and not to the terminal.

$ RBENV_VERSION=2.5.0 ruby division_service.rb 5 0

STDERR is a TTY? [before]: true
STDERR is a TTY? [after]: false

$ cat stderr.log

division_service.rb:14:in `/': divided by 0 (ZeroDivisionError)
	from division_service.rb:14:in `divide'
	from division_service.rb:18:in `<main>'

$

This feature has been tagged as experimental, which means the Ruby team is gathering feedback on it.

Deploying Ruby on Rails application using HAProxy Ingress with unicorn/puma and websockets

After months of testing, we recently moved a Ruby on Rails application that runs on a Kubernetes cluster to production.

In this article we will discuss how to set up path-based routing for a Ruby on Rails application in Kubernetes using HAProxy ingress.

This post assumes that you have a basic understanding of Kubernetes terms like pods, deployments, services, configmaps and ingress.

Typically, our Rails apps have services like unicorn/puma, sidekiq/delayed-job/resque, websockets and some dedicated API services. We had one web service exposed to the world using a load balancer and it was working well. But as the traffic increased, it became necessary to route traffic based on URL paths.

However, Kubernetes does not support this type of load balancing out of the box. There is work in progress on alb-ingress-controller to support this, but we could not rely on it for production usage as it is still in alpha.

The best way to achieve path-based routing was to use an ingress controller.

We researched and found that there are different types of ingress controllers available in the k8s world.

  1. nginx-ingress
  2. ingress-gce
  3. HAProxy-ingress
  4. traefik
  5. voyager

We experimented with nginx-ingress and HAProxy and decided to go with HAProxy. HAProxy has better support for Rails websockets, which we needed in this project.

We will walk you step by step through using HAProxy ingress in a Rails app.

Configuring Rails app with HAProxy ingress controller

Here is what we are going to do.

  • Create a Rails app with different services and deployments.
  • Create a TLS secret for SSL.
  • Create the HAProxy ingress configmap.
  • Create the HAProxy ingress controller.
  • Expose the ingress with a service of type LoadBalancer.
  • Set up the app DNS with the ingress service.
  • Create different ingress rules specifying path-based routing.
  • Test the path-based routing.

Now let’s build the Rails application deployment manifests for services like web (unicorn), background (sidekiq), websocket (ruby thin) and API (dedicated unicorn).

Here is our web app deployment and service template.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-web
  labels:
    app: test-production-web
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-web
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: web
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-web
  labels:
    app: test-production-web
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-web

Here is background app deployment and service template.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-background
  labels:
    app: test-production-background
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-background
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: background
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-background
  labels:
    app: test-production-background
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-background

Here is websocket app deployment and service template.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-websocket
  labels:
    app: test-production-websocket
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-websocket
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: websocket
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-websocket
  labels:
    app: test-production-websocket
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-websocket

Here is API app deployment and service info.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-production-api
  labels:
    app: test-production-api
  namespace: test
spec:
  template:
    metadata:
      labels:
        app: test-production-api
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-production
        imagePullPolicy: Always
        env:
        - name: POSTGRES_HOST
          value: test-production-postgres
        - name: REDIS_HOST
          value: test-production-redis
        - name: APP_ENV
          value: production
        - name: APP_TYPE
          value: api
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      imagePullSecrets:
        - name: registrykey
---
apiVersion: v1
kind: Service
metadata:
  name: test-production-api
  labels:
    app: test-production-api
  namespace: test
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: test-production-api

Let’s launch these manifests using kubectl apply.

$ kubectl apply -f test-web.yml -f test-background.yml -f test-websocket.yml -f test-api.yml
deployment "test-production-web" created
service "test-production-web" created
deployment "test-production-background" created
service "test-production-background" created
deployment "test-production-websocket" created
service "test-production-websocket" created
deployment "test-production-api" created
service "test-production-api" created

Once our app is deployed and running, we should create the HAProxy ingress. Before that, let’s create a TLS secret with our SSL key and certificate.

This is also used to enable HTTPS for the app URL and to terminate SSL at L7.

$ kubectl create secret tls tls-certificate --key server.key --cert server.pem

Here server.key is our SSL key and server.pem is our SSL certificate in pem format.

Now let’s create the HAProxy controller resources.

HAProxy configmap

For all the available configuration parameters of HAProxy ingress, refer here.

apiVersion: v1
data:
    dynamic-scaling: "true"
    backend-server-slots-increment: "4"
kind: ConfigMap
metadata:
  name: haproxy-configmap
  namespace: test

HAProxy Ingress controller deployment

This is the deployment template for the ingress controller, with at least 2 replicas to manage rolling deploys.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: test
spec:
  replicas: 2
  selector:
    matchLabels:
      run: haproxy-ingress
  template:
    metadata:
      labels:
        run: haproxy-ingress
    spec:
      containers:
      - name: haproxy-ingress
        image: quay.io/jcmoraisjr/haproxy-ingress:v0.5-beta.1
        args:
        - --default-backend-service=$(POD_NAMESPACE)/test-production-web
        - --default-ssl-certificate=$(POD_NAMESPACE)/tls-certificate
        - --configmap=$(POD_NAMESPACE)/haproxy-configmap
        - --ingress-class=haproxy
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        - name: stat
          containerPort: 1936
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

The notable fields in the above manifest are the arguments passed to the controller.

--default-backend-service is the service that serves a request when no ingress rule matches.

In our case it is the test-production-web service, but it can be a custom 404 page or whatever you think is better.

--default-ssl-certificate is the SSL secret we just created above. SSL is terminated at L7 and our app is served over HTTPS to the outside world.

HAProxy Ingress service

This is a LoadBalancer type service that allows client traffic to reach our ingress controller.

The LoadBalancer has access to both the public network and the internal Kubernetes network while retaining the L7 routing of the ingress controller.

apiVersion: v1
kind: Service
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: test
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: stat
    port: 1936
    protocol: TCP
    targetPort: 1936
  selector:
    run: haproxy-ingress

Now let’s apply all the manifests of HAProxy.

$ kubectl apply -f haproxy-configmap.yml -f haproxy-deployment.yml -f haproxy-service.yml
configmap "haproxy-configmap" created
deployment "haproxy-ingress" created
service "haproxy-ingress" created

Once all the resources are running, get the LoadBalancer endpoint using the following command.

$ kubectl -n test get svc haproxy-ingress -o wide

NAME               TYPE           CLUSTER-IP       EXTERNAL-IP                                                            PORT(S)                                     AGE       SELECTOR
haproxy-ingress   LoadBalancer   100.67.194.186   a694abcdefghi11e8bc3b0af2eb5c5d8-806901662.us-east-1.elb.amazonaws.com   80:31788/TCP,443:32274/TCP,1936:32157/TCP   2m        run=ingress

DNS mapping with application URL

Once we have the ELB endpoint of the ingress service, map it in DNS to the application URL, e.g. test-rails-app.com.

Ingress Implementation

Now, after doing all the hard work, it is time to configure the ingress and path-based rules.

In our case we want the following rules.

Requests to https://test-rails-app.com should be served by test-production-web.

Requests to https://test-rails-app.com/websocket should be served by test-production-websocket.

Requests to https://test-rails-app.com/api should be served by test-production-api.

Let’s create an ingress manifest defining all the rules.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  namespace: test
spec:
  tls:
    - hosts:
      - test-rails-app.com
      secretName: tls-certificate
  rules:
    - host: test-rails-app.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test-production-web
              servicePort: 80
          - path: /api
            backend:
              serviceName: test-production-api
              servicePort: 80
          - path: /websocket
            backend:
              serviceName: test-production-websocket
              servicePort: 80

Moreover, there are ingress annotations available for further configuration changes.

As expected, default traffic on / is now routed to the test-production-web service.

/api is routed to the test-production-api service.

/websocket is routed to the test-production-websocket service.

Thus the ingress implementation solves our need for path-based routing and SSL termination at L7 on Kubernetes.

Rails 5.2 adds default option to module and class attribute accessors

This blog is part of our Rails 5.2 series.

When DHH introduced support for specifying a default value for class_attribute, Genadi Samokovarov pointed out that the module and class attribute accessor macros also support specifying a default value, but using a block rather than a default option.

To have consistent and symmetrical behaviour across all the attribute extensions, it was decided to support specifying a default value using the default option for all the module and class attribute macros as well.

mattr_accessor, mattr_reader and mattr_writer macros generate getter and setter methods at the module level.

Similarly, cattr_accessor, cattr_reader, and cattr_writer macros generate getter and setter methods at the class level.
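
For example, here is roughly what these macros give us, sketched with a hypothetical AppConfig module.

require "active_support/core_ext/module/attribute_accessors"

module AppConfig
  # Defines AppConfig.timeout and AppConfig.timeout=, backed by @@timeout.
  mattr_accessor :timeout
end

AppConfig.timeout = 30
AppConfig.timeout # => 30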

Before Rails 5.2

Before Rails 5.2, this is how we would set the default values for the module and class attribute accessor macros.

module ActivityLoggerHelper
  mattr_accessor :colorize_logs
  mattr_writer :log_ip { false }

  self.colorize_logs = true
end

class ActivityLogger
  include ActivityLoggerHelper

  cattr_writer :logger { Logger.new(STDOUT) }
  cattr_accessor :level
  cattr_accessor :settings
  cattr_reader :pid { Process.pid }

  @@level = Logger::DEBUG
  self.settings = {}
end

After Rails 5.2

We can still set a default value for a module or class attribute accessor by providing a block. In this pull request, support was introduced for specifying a default value using a new default option.

So instead of

cattr_writer :logger { Logger.new(STDOUT) }

or

cattr_writer :logger
self.logger = Logger.new(STDOUT)

or

cattr_writer :logger
@@logger = Logger.new(STDOUT)

we can now easily write

cattr_writer :logger, default: Logger.new(STDOUT)

The same applies to the other attribute accessor macros like mattr_accessor, mattr_reader, mattr_writer, cattr_accessor, and cattr_reader.

Note that the old way of specifying a default value using the block syntax still works, but it will not be documented anywhere.

Also, note that if we try to set the default value both ways, i.e. by providing a block as well as by specifying a default option, the value provided by the default option always takes precedence.

mattr_accessor(:colorize_logs, default: true) { false }

Here, @@colorize_logs would be set to true as per the above precedence rule.

Here is a test which verifies this behavior.

Finally, here is the simplified version using the new default option.

module ActivityLoggerHelper
  mattr_accessor :colorize_logs, default: true
  mattr_writer :log_ip, default: false
end

class ActivityLogger
  include ActivityLoggerHelper

  cattr_writer :logger, default: Logger.new(STDOUT)
  cattr_accessor :level, default: Logger::DEBUG
  cattr_accessor :settings, default: {}
  cattr_reader :pid, default: Process.pid
end

Rails 5.2 supports specifying default value for a class_attribute

This blog is part of our Rails 5.2 series.

It is very common to set a default value for a class_attribute.

Before Rails 5.2, to specify a default value for a class_attribute, we needed to write like this.

class ActivityLogger
  class_attribute :logger
  class_attribute :settings

  self.logger = Logger.new(STDOUT)
  self.settings = {}
end

As we can see above, it requires additional keystrokes to set a default value for each class_attribute.

Rails 5.2 has added support for specifying a default value for a class_attribute using the default option.

class ActivityLogger
  class_attribute :logger, default: Logger.new(STDOUT)
  class_attribute :settings, default: {}
end

This enhancement was introduced in this pull request.