Our Thoughts on iOS 12

iOS 12 on iPhone 8 Red

Apple announced iOS 12 at WWDC 2018 a few days ago. Honestly, it was a bit disappointing to see some of the most requested features missing from iOS 12. Users have been asking for a dark mode since before iOS 11, along with the ability to set default apps. It’s more of an update focused on performance improvements, usability, and compatibility. The fact that iOS 12 also runs on iPhones Apple released five years ago is a great effort from Apple to keep users happy. And unlike the last couple of years, this time we decided to satisfy our curiosity and installed iOS 12 beta 1 on our devices right after Apple released it to developers. This blog is based on our experience with iOS 12 on an iPhone 8 Plus.

Installing iOS 12 on your iPhone

First things first, make sure you have an iPhone 5s or newer. Before getting started, plug your phone into iTunes and take a full backup in case your phone gets bricked while installing iOS 12, which is very unlikely.

Once done, download and install the beta profile for iOS 12, then download and install the update from the “Software Update” section just like a regular iOS update. It’s the straightforward OTA update process you’re already familiar with.

Note: This beta profile is from a third-party developer and is not officially from Apple. Apple will release an official public beta in around a month.

We’ve been running iOS 12 for a week now, and here are our thoughts on the additions and changes introduced in iOS 12.

iOS 12 is fast

The performance improvements are significant and definitely noticeable. On iOS 11, accessing Spotlight by swiping down from the home screen used to lag. And not just that, sometimes the keyboard wouldn’t even show up and we had to repeat the gesture to make it work. Things are faster and better in iOS 12: the keyboard comes up as soon as Spotlight appears.

Another thing we’ve noticed: the multitasking shortcut for switching to the last app by 3D Touching the left edge of the screen was so unreliable in iOS 11 that it was easier to ignore the feature altogether than to use it. In iOS 12 the same shortcut is very smooth. There are still times when it doesn’t work well, but that’s rare.

Spotlight widgets load faster in iOS 12 than they did in iOS 11. Apart from this, 3D Touch feels pretty smooth too. It’s good to see Apple squeezing out more power to improve the already good performance of iOS.

Notifications

Notifications in iOS 12

Notifications in iOS 11 are a mess: there’s no grouping, no way to quickly control notifications, no way to set priorities. Apple has added notification grouping and better notification management in iOS 12, so notifications from the same app are now grouped together and you can control how often you want to receive notifications from an app right from the notification itself.

We think the implementation can be a whole lot better. For grouped notifications you only get to see the last notification from that app; a better way would have been to show two or three notifications and collapse the rest. There’s no notification pinning or snoozing either, both of which would have been very useful features.

Screen time and App limits

There’s a new feature in iOS 12 called Screen Time which gives you a bird’s-eye view of your phone usage. There’s a saying that you can’t improve what you can’t measure. Screen Time is going to be very useful for everyone who wants to cut down time spent on social apps or on the phone overall. It shows you in detail which apps you use, for how long, and at what times. Not only this, it also keeps track of how many times you pick up your phone and how many notifications you receive from each app on your phone.

Screen Time in iOS 12

Another useful sub-feature of Screen Time is App Limits, which lets you limit app usage per app or per category. So let’s say you don’t want to use WhatsApp for more than 30 minutes a day; you can enforce that through App Limits. It works for app categories including Games, Social Networking, Entertainment, Creativity, Productivity, Education, Reading & Reference, and Health & Fitness, so you can set a limit on a whole category that works across apps. Plus, it syncs across your other iOS devices, so you can’t cheat that way.

Siri and Shortcuts app

Siri Shortcuts in iOS 12

In iOS 12, you can assign custom Siri shortcuts to trigger specific actions, and this works not only with system apps but also with third-party apps. So if you regularly send a specific message to someone on WhatsApp, you can assign a phrase to that action and trigger it from Siri using that phrase.

Apple has also introduced a new app in iOS 12 called Shortcuts, which lets you group actions and run them quickly. Although the Shortcuts app isn’t included in iOS 12 beta 1, we think it’s one of the best additions in iOS 12.

Updated Photos app

The Photos app now has a new section called “For You”, where it shows new albums, your best moments, sharing suggestions, and photo effect suggestions. It’s much like the Assistant tab in the Google Photos app. You can also share selected photos or albums with your friends right from the Photos app.

The Albums tab in the Photos app has been redesigned for easier navigation. There’s also a new Search tab with smarter search, so you can now find photos using tags like “surfing” and “vacation”.

It’s good to see Apple paying attention to the Photos app, but we still think Google Photos is the better option for the average person, considering it lets you store photos in the cloud for free. Photo organization in Google Photos is also much better than in the new Photos app in iOS 12.

Enhanced Do Not Disturb Mode

Do Not Disturb in iOS 12 is more flexible. You can now set Do Not Disturb to end automatically in an hour, at night, at the end of a calendar event, or even when you leave a location.

Not just that, Do Not Disturb has a new Bedtime mode which, when enabled, silences all your notifications during bedtime and dims your display. When you wake up, it shows a welcome back message along with the weather on the lock screen.

Conclusion

There are other updates and under-the-hood improvements as well, like the new Measure app, the redesigned Books app, tracking prevention, Group FaceTime, etc. Overall, we think it’s a decent update, with fewer bugs than you would expect from a first beta. The 3D Touch gesture on the keyboard for dragging the cursor doesn’t work, and Skype and some other apps crash, but for the most part it’s good enough to be installed on your primary device.

Using Concurrent Ruby in a Ruby on Rails Application

Concurrent Ruby is a concurrency toolkit that builds on a lot of interesting ideas from functional languages and classic concurrency patterns. When it comes to writing threaded code in Rails applications, look no further: concurrent-ruby is already included in Rails via Active Support.

Using Concurrent::Future

In one of our applications, we added threaded code using Concurrent::Future to improve performance. It worked really well for us until one day it stopped working.

“Why threads?” one might ask. The code in question was a textbook threading use case: a few API calls, some DB requests, and finally an action performed on all the aggregated data.

Let us look at what this code looks like.

Non threaded code

selected_shipping_companies.each do |carrier|
  # api calls
  distance_in_miles = find_distance_from_origin_to_destination
  historical_average_rate = historical_average_for_this_particular_carrier

  # action performed
  build_price_details_for_this_carrier(distance_in_miles,
                                       historical_average_rate)
end

Converting the above code to use Concurrent::Future is trivial.

futures = selected_shipping_companies.map do |carrier|
  Concurrent::Future.execute do
    # api calls
    distance_in_miles = find_distance_from_origin_to_destination
    historical_average_rate = historical_average_for_this_particular_carrier

    # action performed
    build_price_details_for_this_carrier(distance_in_miles,
                                         historical_average_rate)
  end
end

futures.map(&:value)

A bit more about Concurrent::Future

It is often intimidating to work with threads. They bring in complexity and can behave unpredictably when thread safety is lacking. Since Ruby is a language of mutable references, we often find it difficult to write 100% thread-safe code.

Inspired by Clojure’s future function, Concurrent::Future is a primitive that guarantees thread safety. It takes a block of work and performs the work asynchronously using Concurrent Ruby’s global thread pool. Once a block of work is scheduled, Concurrent Ruby gives us a handle to this future work; calling #value (or #deref) on it returns the block’s value.

The Bug

Usually, when an exception occurs in the main thread, the interpreter stops and gathers the exception data. In the case of Ruby threads, any unhandled exception is reported only when Thread#join is called. Setting Thread.abort_on_exception to true is a better alternative; it causes the whole process to exit when an exception is raised in any running thread. We published a blog recently which talks about this in great detail.
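Here is a minimal illustration of both behaviors:

t = Thread.new { raise "Boom!" }
sleep 0.1        # the thread has already died, but nothing is reported yet

begin
  t.join         # the unhandled exception surfaces only here
rescue => e
  puts e.message #=> "Boom!"
end

# Alternatively, make an unhandled exception in any thread
# abort the whole process immediately:
Thread.abort_on_exception = true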

Exception handling in Concurrent Ruby

future = Concurrent::Future.execute do
  raise StandardError.new("Boom!")
end

sleep(0.1) # giving arbitrary time for future to execute

future.value     #=> nil

Where did the exception go? This code fails silently and swallows the exception. How can we find out whether the code executed successfully?

future = Concurrent::Future.execute do
  raise StandardError.new("Boom!")
end

sleep(0.1) # giving arbitrary time for future to execute

future.value     #=> nil

future.rejected? #=> true
future.reason    #=> #<StandardError: Boom!>

How we fixed our issue

We found places in our application where Concurrent::Future was used in a way that would swallow exceptions. It is also quite possible to overlook the need to explicitly check for exceptions. We addressed these concerns with the following wrapper class.

module ConcurrentExecutor
  class Error < StandardError
    def initialize(exceptions)
      @exceptions = exceptions
      super
    end

    def message
@exceptions.map { |e| e.message }.join "\n"
    end

    def backtrace
      traces = @exceptions.map { |e| e.backtrace }
      ["ConcurrentExecutor::Error START", traces, "END"].flatten
    end
  end

  class Future
    def initialize(pool: nil)
      @pool = pool || Concurrent::FixedThreadPool.new(20)
      @exceptions = Concurrent::Array.new
    end

    # Sample Usage
    # executor = ConcurrentExecutor::Future.new(pool: pool)
    # executor.execute(carriers) do |carrier|
    #   ...
    # end
    #
    # values = executor.resolve

    def execute(array, &block)
      @futures = array.map do |element|
        Concurrent::Future.execute({ executor: @pool }) do
          yield(element)
        end.rescue do |exception|
          @exceptions << exception
        end
      end

      self
    end

    def resolve
      values = @futures.map(&:value)

      if @exceptions.length > 0
        raise ConcurrentExecutor::Error.new(@exceptions)
      end

      values
    end
  end
end

Please note that using Concurrent Ruby futures caused a segmentation fault while running specs on CircleCI. As of this writing, we are using normal looping instead of futures on CircleCI until the reason for the segfault is isolated and fixed.

Update

Concurrent::Future also gives us another API, value!, which not only returns the value of the block but also raises in the calling thread any exception that occurred.

thread_pool = Concurrent::FixedThreadPool.new(20)
executors = [1, 2, 3, 4].map do |random_number|
  Concurrent::Future.execute({ executor: thread_pool }) do
    random_number / (random_number.even? ? 0 : 1)
  end
end

executors.map(&:value)
=> [1, nil, 3, nil]

executors.map(&:value!)

> ZeroDivisionError: divided by 0
> from (pry):4:in `/'

We thank Jonathan Rochkind for pointing us to this undocumented API in his Reddit post.

Modelling state in Elm to reflect business logic

We recently made ApiSnapshot open source. As mentioned in that blog, we ported the code from React.js to Elm.

One of the features of ApiSnapshot is support for Basic Authentication.

ApiSnapshot with basic authentication

While we were rebuilding the whole application in Elm, we had to port the “Add Basic Authentication” feature. This feature can be accessed from the “More” drop-down on the right-hand side of the app, and it lets the user add a username and password to the request.

Let’s see how the Model of our Elm app looks.

type alias Model =
    { request : Request.MainRequest.Model
    , response : Response.MainResponse.Model
    , route : Route
    }

Here is the Model in Request.MainRequest module.

type alias APISnapshotRequest =
    { url : String
    , httpMethod : HttpMethod
    , requestParameters : RequestParameters
    , requestHeaders : RequestHeaders
    , username : Maybe String
    , password : Maybe String
    , requestBody : Maybe RequestBody
    }

type alias Model =
    { request : APISnapshotRequest
    , showErrors : Bool
    }

The username and password fields are optional for users, so we kept them as Maybe types.

Note that the API always responds with username and password whether or not the user chose to add Basic Authentication. The API responds with null for both username and password when a user retrieves a snapshot for which no username and password were filled in.

Here is a sample API response.

{
"url": "http://dog.ceo/api/breed/affenpinscher/images/random",
"httpMethod": "GET",
"requestParams": {},
"requestHeaders": {},
"requestBody": null,
"username": "alanturning",
"password": "welcome",
"assertions": [],
"response": {
    "response_headers": {
        "age": "0",
        "via": "1.1 varnish (Varnish/6.0), 1.1 varnish (Varnish/6.0)",
        "date": "Thu, 03 May 2018 09:43:11 GMT",
        "vary": "",
        "cf_ray": "4151c826ac834704-EWR",
        "server": "cloudflare"
    },
    "response_body": "{\"status\":\"success\",\"message\":\"https:\\/\\/images.dog.ceo\\/breeds\\/affenpinscher\\/n02110627_13221.jpg\"}",
    "response_code": "200"
  }
}

Let’s look at the view code which renders the data received from the API.

view : (Maybe String, Maybe String) -> Html Msg
view usernameAndPassword =
    case usernameAndPassword of
        (Nothing, Nothing) -> text ""
        (Just username, Nothing) -> basicAuthenticationView username ""
        (Nothing, Just password) -> basicAuthenticationView "" password
        (Just username, Just password) -> basicAuthenticationView username password


basicAuthenticationView : String -> String -> Html Msg
basicAuthenticationView username password =
    [ div [ class "form-row" ]
        [ input
            [ type_ "text"
            , placeholder "Username"
            , value username
            , onInput (UpdateUsername)
            ]
            []
        , input
            [ type_ "password"
            , placeholder "Password"
            , value password
            , onInput (UpdatePassword)
            ]
            []
        , a
            [ href "javascript:void(0)"
            , onClick (RemoveBasicAuthentication)
            ]
            [ text "×" ]
        ]
    ]

To get the desired view we apply the following rules.

  1. Check whether both values are strings.
  2. Check whether either of the values is a string.
  3. Otherwise assume that both values are null.

This works but we can do a better job of modelling it.

What’s happening here is that we were trying to translate our API responses directly into the Model. Let’s try to combine username and password into a new type called BasicAuthentication.

In the model, add a field called basicAuthentication of type Maybe BasicAuthentication. This way, if the user has opted to use the basic authentication fields, it is a Just BasicAuthentication and we show the input boxes. Otherwise it is Nothing and we show nothing!

Here is what the updated Model for Request.MainRequest would look like.

type alias BasicAuthentication =
    { username : String
    , password : String
    }


type alias APISnapshotRequest =
    { url : String
    , httpMethod : HttpMethod
    , requestParameters : RequestParameters
    , requestHeaders : RequestHeaders
    , basicAuthentication : Maybe BasicAuthentication
    , requestBody : Maybe RequestBody
    }


type alias Model =
    { request : APISnapshotRequest
    , showErrors : Bool
    }

The Elm compiler now complains that we need to change the JSON decoding for the APISnapshotRequest type because of this change.

Before we fix that let’s take a look at how JSON decoding is currently being done.

import Json.Decode as JD
import Json.Decode.Pipeline as JP

decodeAPISnapshotRequest : Response -> APISnapshotRequest
decodeAPISnapshotRequest hitResponse =
    let
        result =
            JD.decodeString requestDecoder hitResponse.body
    in
        case result of
            Ok decodedValue ->
                decodedValue

            Err err ->
                emptyRequest


requestDecoder : JD.Decoder APISnapshotRequest
requestDecoder =
    JP.decode Request
        |> JP.optional "username" (JD.map Just JD.string) Nothing
        |> JP.optional "password" (JD.map Just JD.string) Nothing

Now we need to derive the state of the application from our API response.

Let’s introduce a type called ReceivedAPISnapshotRequest, which is the shape of our old APISnapshotRequest without the basicAuthentication field, and update our requestDecoder function to return a Decoder of type ReceivedAPISnapshotRequest instead of APISnapshotRequest.

type alias ReceivedAPISnapshotRequest =
    { url : String
    , httpMethod : HttpMethod
    , requestParameters : RequestParameters
    , requestHeaders : RequestHeaders
    , username : Maybe String
    , password : Maybe String
    , requestBody : Maybe RequestBody
    }


requestDecoder : JD.Decoder ReceivedAPISnapshotRequest

We now need to move the logic that checks whether a user has opted to use the basic authentication fields from the view function to the decodeAPISnapshotRequest function.

decodeAPISnapshotRequest : Response -> APISnapshotRequest
decodeAPISnapshotRequest hitResponse =
    let
        result =
            JD.decodeString requestDecoder hitResponse.body
    in
        case result of
            Ok value ->
                let
                    extractedCreds =
                        ( value.username, value.password )

                    derivedBasicAuthentication =
                        case extractedCreds of
                            ( Nothing, Nothing ) ->
                                Nothing

                            ( Just receivedUsername, Nothing ) ->
                                Just { username = receivedUsername, password = "" }

                            ( Nothing, Just receivedPassword ) ->
                                Just { username = "", password = receivedPassword }

                            ( Just receivedUsername, Just receivedPassword ) ->
                                Just { username = receivedUsername, password = receivedPassword }
                in
                    { url = value.url
                    , httpMethod = value.httpMethod
                    , requestParameters = value.requestParameters
                    , requestHeaders = value.requestHeaders
                    , basicAuthentication = derivedBasicAuthentication
                    , requestBody = value.requestBody
                    }

            Err err ->
                emptyRequest

We extract the username and password from ReceivedAPISnapshotRequest into extractedCreds as a pair after decoding, and construct our APISnapshotRequest from it.

And now we have a clean view function which simply takes a BasicAuthentication value and returns an Html Msg.

view : BasicAuthentication -> Html Msg
view b =
    [ div [ class "form-row" ]
        [ input
            [ type_ "text"
            , placeholder "Username"
            , value b.username
            , onInput (UpdateUsername)
            ]
            []
        , input
            [ type_ "password"
            , placeholder "Password"
            , value b.password
            , onInput (UpdatePassword)
            ]
            []
        , a
            [ href "javascript:void(0)"
            , onClick (RemoveBasicAuthentication)
            ]
            [ text "×" ]
        ]
    ]

We now have a Model that better captures the business logic. And should we change the logic of basic authentication parameter selection in the future, we do not have to worry about updating the logic in the view.

Using Logtrail to tail log with Elasticsearch and Kibana on Kubernetes

Monitoring and logging are important aspects of deployments. Centralized logging is always useful in helping us identify problems.

EFK (Elasticsearch, Fluentd, Kibana) is a beautiful combination of tools to store logs centrally and visualize them in one place. There are many other open-source logging tools available, but EFK (ELK if Logstash is used) is one of the most widely used centralized logging stacks.

This blog post shows how to integrate Logtrail, which provides a Papertrail-like UI for tailing logs. Using Logtrail we can also apply filters while tailing logs centrally.

As EFK ships as an addon with Kubernetes, all we have to do is deploy the EFK addon on our k8s cluster.

Pre-requisite: a running Kubernetes cluster.

Installing the EFK addon from the Kubernetes upstream is simple. Deploy EFK using the following command.

$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/logging-elasticsearch/v1.6.0.yaml
serviceaccount "elasticsearch-logging" created
clusterrole "elasticsearch-logging" created
clusterrolebinding "elasticsearch-logging" created
serviceaccount "fluentd-es" created
clusterrole "fluentd-es" created
clusterrolebinding "fluentd-es" created
daemonset "fluentd-es" created
service "elasticsearch-logging" created
statefulset "elasticsearch-logging" created
deployment "kibana-logging" created
service "kibana-logging" created

Once the k8s resources are created, access the Kibana dashboard. To get its URL, use kubectl cluster-info.

$ kubectl cluster-info | grep Kibana
Kibana is running at https://api.k8s-test.com/api/v1/proxy/namespaces/kube-system/services/kibana-logging

Now go to the Kibana dashboard, and we should be able to see the logs.

Kibana dashboard

The dashboard above shows the Kibana UI. We can create metrics and graphs as per our requirements.

We also want to view logs in tail style, and we will use Logtrail for that. For this we need a Docker image with the Logtrail plugin pre-installed.

Note: If the upstream Kibana version of the k8s EFK addon is 4.x, use a Kibana 4.x image when installing the Logtrail plugin in your custom image. If the addon ships with Kibana 5.x, make sure you pre-install Logtrail on a Kibana 5 image.

Check the Kibana version for the addon here.

We will replace the default Kibana image with the kubernetes-logtrail image.

To replace the Docker image, update the Kibana deployment using the command below.

$ kubectl -n kube-system set image deployment/kibana-logging kibana-logging=rahulmahale/kubernetes-logtrail:latest
deployment "kibana-logging" image updated

Once the image is deployed, go to the Kibana dashboard and click on Logtrail as shown below.

Switch to logtrail

After switching to Logtrail we will start seeing all the logs in real time as shown below.

Logs in Logtrail

This centralized logging dashboard with Logtrail allows us to filter on several parameters.

For example, let’s say we want to check all the logs for the namespace myapp. We can use the filter kubernetes.namespace_name:"myapp". We can use the filter kubernetes.container_name:"mycontainer" to monitor logs for a specific container.

RubyKaigi 2018 Day two

RubyKaigi is happening in Sendai, Japan from 31st May to 2nd June. It is perhaps the only conference where one can find almost all the core Ruby team members in attendance.

This is Prathamesh. I will bring you live details about what is happening at the Kaigi over the next two days. If you are at the conference, please come and say “Hi” to me.

Check out what happened on day 1.

Faster Apps, No Memory Thrash: Get Your Memory Config Right by Noah Gibbs

Noah gave an awesome talk on techniques to manage the memory used by Ruby applications. One of the main points while dealing with GC is to make it run less, which means not creating too many objects. He also mentioned that, where the application permits, destructive operations such as gsub! or concat should be used since they save CPU cycles and memory. Ruby allows setting environment variables for managing heap memory, but it is really hard to choose values for these variables blindly.

Noah has built a tool which uses GC.stat results from applications to estimate the values of the memory-related environment variables. Check out the EnvMem gem.

In the end, he discussed some advanced debugging methods like checking the fragmentation percentage. The formula below was prepared by Nate Berkopec.

s = GC.stat
used_ratio = s[:heap_live_slots].to_f / (s[:heap_eden_pages] * 408)
fragmentation = 1 - used_ratio

We can also use GC::Profiler to profile the code in real time to see how GC is behaving.
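For example, a minimal profiling session might look like this:

GC::Profiler.enable

# ... exercise the code you want to observe ...
100_000.times { "a" * 100 }

GC::Profiler.report  # prints a table of GC runs with timings
GC::Profiler.disable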

The benchmark used for this talk can be found here. Slides for this talk can be found here.

Guild prototype

Next I attended the talk by Koichi Sasada on the Guild prototype. He discussed the proposed design spec for Guilds with a demo of a Fibonacci number program running on a 40-core CPU with 40 guilds. One of the interesting observations is that performance drops as the number of guilds increases, because of global locking.

Guild performance

He discussed the concept of sharable and non-sharable objects. Sharable objects can be shared across multiple Guilds, whereas non-sharable objects can only be used in one Guild. This also means you can’t write a thread-unsafe program with Guilds “by design”. He discussed the challenges in specifying sharable objects for Guilds.

Overall, there is still a lot of work left to be done for Guilds to become a part of Ruby. It includes defining protocols for sharable and non-sharable objects, making sure GC runs properly in the presence of Guilds, and synchronization between different Guilds.

The slides for this talk can be found here.

Ruby programming with type checking

Soutaro from SideCI gave a talk on Steep, a gradual type checker for Ruby.

In the past, Matz has said that he doesn’t like type definitions to be present in Ruby code. Steep requires type definitions to be present in separate files with the extension .rbi. The Ruby source code needs only a small amount of annotation. Steep also has a scaffold generator to generate basic type definitions for existing code.

Steep v/s Sorbet

As of now, Steep runs slower than Sorbet, which was presented yesterday by the Stripe team. Soutaro also discussed issues with type definitions for libraries that rely on metaprogramming, such as Active Record. That looks like a challenge for Steep as of now.

Web console

After the tea break, I attended Genadi’s talk on how web-console works.

He discussed the implementation of web-console in detail, with references to the Ruby internals related to bindings. He compared the web-console interface with IRB and pry and explained the differences. As of now, web-console has to monkey-patch some Rails internals. Genadi has added support for registering interceptors, which will remove the need for this monkey patching in Rails 6. He is also mentoring a Google Summer of Code student working on the Actionable Errors project, where the user can take actions like running pending migrations from the error page itself.

Ruby committers v/s the World

Ruby Committers v/s the World

RubyKaigi offers this unique event where all the Ruby committers come on stage and face questions from the audience. This year the format was slightly different: it was run in the style of a Ruby core developer meeting. The agenda was decided in advance, with some questions from attendees and some tickets to discuss. The session started with a discussion of features coming up in Ruby 2.6.

After that, the questions and the tickets on the agenda were discussed. It was good to see how the Ruby team makes decisions about features and suggestions.

Apart from this, there were talks on SciRuby, mruby, linting, gem upgrades, C extensions and more which I could not attend.

That’s all for the day two. Looking forward to the day three already!

Oh, and here is the world map of all the attendees of RubyKaigi.

World map of all attendees

RubyKaigi 2018 Day one

RubyKaigi is happening in Sendai, Japan from 31st May to 2nd June. It is perhaps the only conference where one can find almost all the core Ruby team members in attendance.

This is Prathamesh. I will bring you live details about what is happening at the Kaigi over the next three days. If you are at the conference, please come and say “Hi” to me.

Matz’s keynote

RubyKaigi started with Matz’s keynote. He used a lot of proverbs, applying them to the Ruby language and software development.

He talked about one of the hardest problems in programming, naming, with the example of yield_self. Matz added the alias then for the yield_self method just yesterday. He also discussed the googlability of names. Ironically, Ruby was named in 1993, before Google existed.
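For reference, yield_self passes the receiver to the block and returns the block’s result, and then is simply a friendlier name for it:

5.yield_self { |n| n * 2 } #=> 10

# With the new alias (coming in Ruby 2.6):
5.then { |n| n * 2 }       #=> 10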

Matz also touched upon the JIT option being introduced in Ruby 2.6 and Guilds, as the ways the language continues to improve in performance and concurrency. There is a talk on Guilds by Koichi Sasada on the second day of RubyKaigi which will cover the details.

Matz ended the keynote talking about the need to maintain backward compatibility and not run into a situation like Ruby 1.9 or Python 3, where compatibility was broken. He also stressed the community aspect of the Ruby language and its importance in Ruby’s success.

ETL processing in Ruby using Kiba

Thibaut Barrère gave a talk on Kiba, a data processing ETL framework for Ruby. He discussed the design decisions that went into version 1 and how it evolved into version 2, which was recently released.

Kiba provides a programmatic API which can be used in background jobs instead of shelling out. It also supports multistep batch processing.

Thibaut also explained how it can be used for data migrations, reusing components, and big rewrites. He observed that its performance has gradually increased with each Ruby release over the years.

The slides for this talk can be found here.

Architecture of Hanami applications

Next I attended a talk by Anton Davydov on architecture patterns in Hanami apps. He discussed the problems typical Rails applications face and how abstractions can address those issues. He explained how Hanami tries to achieve business logic isolation, avoid global state, keep logic sequential, and maintain test coverage. Functional callable objects, containers, dry-container, dry-auto_inject and event sourcing are some of the abstractions that can be used in Hanami apps to achieve this.

Lightning talks

The last session of the day was lightning talks.

The talk on Rib (a wordplay on IRB) was an interesting one. Rib is yet another interactive Ruby shell, but lightweight compared to IRB and pry. It has some nice features like auto-indent, multiline history, and filtering of callers. It can also beep when the console starts, so you know it is time to get back to work.

I liked another talk in which Watson described his work on improving the performance of the JSON gem. He achieved this by using the CRuby API wherever applicable and avoiding heavy calls like rb_funcall. Check these two pull requests for benchmarks and more discussion.

Apart from these talks, there were a lot of other talks which I could not attend. The Stripe team is building a type checker for Ruby which looks very interesting and is extremely fast.

Bozhidar Batsov gave a talk on the RuboCop project and how it has evolved over the years. There was also a talk on Karafka and event-driven architecture in Ruby. This talk was a good precursor to the Hanami talk, where event-driven architecture was mentioned again.

Other talks from day one ranged from memory management, playing with Ruby syntax, code highlighting, deep learning, and C extensions to RubyGems.

That’s all for the day one. Looking forward to the day two already!

Rails 5.2 introduces allow_other_host option to redirect_back method

Rails 5.0 introduced the redirect_back method to redirect to the path present in HTTP_REFERER. If no HTTP_REFERER is present, the request is redirected to the fallback_location.

Now consider the following scenario.

In a search on google.com, we see a link to bigbinary.com. On clicking the link, we navigate to bigbinary.com.

When somebody gets redirected to bigbinary.com from google.com, the HTTP referer is set to google.com.

If bigbinary.com uses redirect_back in its code, the user will get redirected back to google.com, which might be undesired behavior for some applications.

To avoid such cases, Rails 5.2 has added a flag allow_other_host to disallow redirecting to a host other than the current one.

By default, the allow_other_host option is set to true. So if you do not want users to be sent back to google.com, you need to explicitly pass allow_other_host: false.

> request.host
#=> "www.bigbinary.com"

> request.headers["Referer"]
#=> "http://www.google.com"

# This will redirect back to google.com
redirect_back(fallback_location: "/")

# This will not redirect back to google.com
redirect_back(fallback_location: "/", allow_other_host: false)

Faster JSON generation using PostgreSQL JSON function

There are various ways to generate JSON in Rails. There is the to_json method built into Rails. We can also use the jbuilder gem or the active_model_serializers gem to achieve the same.

As the number of records in the database grows, Rails can take a very long time to generate a response. The bottleneck can generally be traced back to JSON generation.

Recently, in one of our applications, we faced this issue where a page was taking too long to load. The load time was critical for us as this was the most visited page on the site. The page was loading a race, its racers and their lap details.

The page was loading fine for short races, with 10-15 racers and each racer having 30-50 laps. But for endurance races, with around 50-80 racers and each racer having around 700-800 laps, we were hitting the bottleneck with load times.

After benchmarking, JSON generation at the backend was found to be the culprit.
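A quick way to confirm such a culprit is to time just the serialization step in the console. A sketch, assuming a Racer model with a laps association:

require "benchmark"

# Load the records first so that only JSON generation is measured:
racers = Racer.includes(:laps).to_a
puts Benchmark.measure { racers.to_json }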

Looking out for solutions to fix the problem, we came across PostgreSQL JSON functions.

PostgreSQL 9.2 and above have built-in support for generating JSON using the functions row_to_json and array_to_json. Let’s look at both of them in detail.

row_to_json

row_to_json returns each row as a JSON object.

select row_to_json(laps) from laps;

{"id":1,
 "number":1,
 "position":4,
 "time":"628.744",
 "flag_type":"Green"
}
.
.
.

We can use a subquery to fetch only the attributes/columns we require.

select row_to_json(lap)
from (
  select id, number, position, time, flag_type from laps
) lap;

{"id":1,"number":1,"position":4,"time":"628.744","flag_type":"Green"}
{"id":2,"number":2,"position":4,"time":"614.424","flag_type":"Green"}
.
.
.

array_to_json

To understand the array_to_json function, we must first look at array_agg. array_agg is an aggregate function. Aggregate functions compute a single result from a set of input values; sum, min and max are some other examples. array_agg concatenates all the input values into a PostgreSQL array.

select array_agg(lap)
from (
  select id, number, position, time, flag_type from laps
) lap;

{"(1,1,4,\"628.744\",\"Green\")","(2,2,4,\"614.424\",\"Green\")", ... }

To convert this PostgreSQL array into JSON, we can use the array_to_json function.

select array_to_json(array_agg(lap))
from (
  select id, number, position, time, flag_type from laps
) lap;

[{"id":1,
  "number":1,
  "position":4,
  "time":"628.744",
  "flag_type":"Green"},
  ...]

A more complex example

We can use the above two functions together to generate a custom JSON response.

select row_to_json(u)
from (
  select first_name, last_name,
    (
      select array_to_json(array_agg(b))
      from (
        select number, position, time, flag_type
        from laps
        inner join racer_laps
        on laps.id = racer_laps.lap_id
        where racer_laps.racer_id = racers.id
      ) b
    ) as laps
  from racers
  where first_name = 'Jack'
) u;

{
  "first_name": "Jack",
  "last_name": "Altenwerth",
  "laps": [
    {
      "number": 1,
      "position": 4,
      "time": "628.744",
      "flag_type": "Green"
    },
    {
      "number": 2,
      "position": 4,
      "time": "614.424",
      "flag_type": "Green"
    },
    ...
  ]
}

Using the functions in Rails

We can use the above-mentioned functions in Rails as shown below.

query = <<~EOQ
select row_to_json(u)
from (
  select first_name, last_name,
    (
      select array_to_json(array_agg(b))
      from (
        select number, position, time, flag_type
        from laps
        inner join racer_laps
        on laps.id = racer_laps.lap_id
        where racer_laps.racer_id = racers.id
      ) b
    ) as laps
  from racers
  where first_name = 'Jack'
) u;
EOQ

generated_json = ActiveRecord::Base.connection.execute(query).values
puts generated_json

{
  "first_name": "Jack",
  "last_name": "Altenwerth",
  "laps": [
    {
      "number": 1,
      "position": 4,
      "time": "628.744",
      "flag_type": "Green"
    },
    {
      "number": 2,
      "position": 4,
      "time": "614.424",
      "flag_type": "Green"
    },
    ...
  ]
}

Although generating JSON this way is more verbose and less readable than the other ways of generating JSON in Rails, it is more performant.

Observations

On the racers page, generating the JSON using the PostgreSQL functions gave us the following improvements.

For short races (10-15 racers and each racer having 30-50 laps), the average API response time decreased from 40ms to 15ms.

For endurance races (50-80 racers and each racer having around 700-800 laps), the average API response time decreased from 1200ms to 20ms.

Conclusion

Use the Rails way of generating JSON as long as you can. If performance starts to become an issue, don’t be afraid of using the features made available by the database. In this case we trade some code complexity for performance, and sometimes this trade is worth it.

APISnapshot built using Elm and Ruby on Rails is open source

APISnapshot is built using Elm and Ruby on Rails. Today we’re happy to announce that the code is publicly available on GitHub.

We built APISnapshot for two reasons.

We wanted to work with Elm.

We wanted a tool that is easy to use and that helps us capture the response we are getting from an API in a format that is easy to share in a GitHub issue, in Slack, or in an email. As a consulting company we work with various teams around the world, and during the development phase either the API is unstable or it does not do what it should.

We originally built this tool using React. The Elm compiler is quite strict and forced us to take all possibilities into consideration. This led us to notice a few bugs which were still present in the React code. In this way the Elm compiler helped us produce “correct” software by eliminating some bugs that we would otherwise have found later.

JSON encoding/decoding is a hard problem in Elm in general. In most cases we know the shape of the API response we are going to get.

In the case of APISnapshot we do not know the shape of the JSON response we will get. Because of that it took us a bit longer to build the application. However, this forced us to really dig deep into JSON encoding/decoding in Elm, and we learned a lot.

We would like to thank all the contributors to the project. Special shout out goes to Jasim for the discussion and the initial work on parsing the JSON file.

We like the combination of Elm and Ruby on Rails so much that we are building AceHelp using the same technologies. AceHelp is open source from day one.

Generating filmstrip using puppeteer for better debugging

We are writing a lot of automation tests using Puppeteer.

Since Puppeteer scripts execute so fast, certain tests fail when they should pass. Debugging those tests can be a challenge.

Chrome DevTools comes with a filmstrip feature: in the “Performance” tab we can see screenshots of the site as it changes over time.

We wanted Puppeteer to generate a similar filmstrip so that we can visually identify the source of a problem.

It turns out that Puppeteer makes this very easy. Here is the full code.

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 1024 });

  await page.tracing.start({path: 'trace.json', screenshots: true});

  await page.goto('https://www.bigbinary.com');
  await Promise.all([
    page.waitForNavigation(),
    page.click('#navbar > ul > li:nth-child(1) > a')
  ]);

  await page.tracing.stop();

  await browser.close();
})();

If we execute this script, it will generate a file called trace.json. This file has base64-encoded images embedded in it.
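If we want the frames as standalone image files, we can pull them out of the trace ourselves. Here is a small Ruby sketch (our own helper, not part of Puppeteer); it assumes the trace was recorded with screenshots: true as above:

require "json"
require "base64"

trace = JSON.parse(File.read("trace.json"))

# Screenshot frames are trace events named "Screenshot" which carry
# a base64-encoded JPEG in args.snapshot.
frames = trace["traceEvents"].select { |event| event["name"] == "Screenshot" }

frames.each_with_index do |frame, index|
  File.binwrite(format("filmstrip-%03d.jpg", index),
                Base64.decode64(frame["args"]["snapshot"]))
end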

To see the filmstrip, drag the trace.json file onto the “Performance” tab in Chrome DevTools. Here is a quick video explaining this.

Ruby 2.5 added lazy proc allocation for block parameters

This blog is part of our Ruby 2.5 series.

irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb> greet_with_welcome { p 'BigBinary' }
Welcome
"BigBinary"
  => "BigBinary"

In Ruby 2.4, when we pass a block to a method which further passes it to another method, Ruby creates a new Proc object from the given block before passing the proc on to the other method.

This creates unnecessary objects even when the block parameter is never accessed. It also creates a chain of Proc objects when the block parameter is passed through several methods.

Proc creation is a heavyweight operation, because all local variables (represented by Env objects inside MRI) need to be stored on the heap.

Ruby 2.5 introduced lazy proc allocation. Ruby 2.5 does not create a Proc object when merely passing a block on to another method; it passes the block information instead. Only if the block parameter is actually accessed somewhere does it create a Proc object from the given block.

This results in less memory allocation and faster execution.
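Here is our own illustration of where the allocation now happens:

def outer(&block)
  # Ruby 2.5: no Proc is allocated here; the block is only forwarded.
  inner(&block)
end

def inner(&block)
  # Accessing the block parameter as an object forces the Proc
  # to be allocated at this point.
  block.call
end

outer { puts "hello" }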

Ruby 2.4

irb> require 'benchmark'
  => true
irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb>
irb> Benchmark.measure { 1000.times { greet_with_welcome { 'BigBinary' } } }
Welcome
Welcome
...
...
...
  => #<Benchmark::Tms:0x007fe6ab929de0 @label="", @real=0.022295999999187188, @cstime=0.0, @cutime=0.0, @stime=0.01, @utime=0.0, @total=0.01>

Ruby 2.5

irb> require 'benchmark'
  => true
irb> def greet
irb>   yield
irb> end
  => :greet
irb>
irb> def greet_with_welcome(&block)
irb>   puts 'Welcome'
irb>   greet(&block)
irb> end
  => :greet_with_welcome
irb>
irb> Benchmark.measure { 1000.times { greet_with_welcome { 'BigBinary' } } }
Welcome
Welcome
...
...
...
  => #<Benchmark::Tms:0x00007fa4400871b8 @label="", @real=0.004612999997334555, @cstime=0.0, @cutime=0.0, @stime=0.001524000000000001, @utime=0.0030690000000000023, @total=0.004593000000000003>

As we can see, there is a considerable improvement in execution time in Ruby 2.5 when a block parameter is passed through.

Here are the relevant commit and discussion.

Rails 5.2 fixes query caching in MySQL and PostgreSQL adapters

Prior to Rails 5.2, the MySQL and PostgreSQL adapters had their own select_value, select_values and select_rows (select_{value,values,rows}) methods. These overrides improved performance by not instantiating ActiveRecord::Result.
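For instance, select_value returns a bare value straight from the driver (illustrative query):

# No ActiveRecord::Result is instantiated here:
ActiveRecord::Base.connection.select_value("SELECT COUNT(*) FROM users")
#=> 42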

However, these methods broke query caching of the ActiveRecord::FinderMethods#exists? method. Let’s check the issue.

>> User.cache do
>>   2.times { User.exists?(1) }
>> end

User Exists (2.1ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

User Exists (2ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

As we can see, the query was not cached and the SQL was executed a second time.

From Rails 5.2, the MySQL and PostgreSQL adapters no longer override the select_{value,values,rows} methods, which fixes this query caching issue.

Also, the performance improvement provided by these overrides was marginal and not a hotspot in Active Record, so this change was accepted.

Let’s check the query caching of ActiveRecord::FinderMethods#exists? after the change.

>> User.cache do
>>   2.times { User.exists?(1) }
>> end

User Exists (2.1ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

CACHE User Exists (0.0ms)  SELECT  1 AS one FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

Now the query is cached as expected.

This change has also been backported to Rails 5.1, starting with version 5.1.2.

How to mitigate DDoS using Rack::Attack

Recently, we faced a DDoS attack in one of our clients’ projects. There were many requests from different IPs to the root and login paths, and we were running three times the usual number of servers to keep the system alive.

We are using Cloudflare’s HTTP proxy, and it was doing a great job preventing malicious requests, but we wanted to check if we could avoid the loading/captcha pages which Cloudflare uses to filter requests. We came to the conclusion that we would be able to mitigate the ongoing attack if we could throttle requests by IP.

Cloudflare has a built-in Rate Limiting feature to throttle requests, but it would have been a little expensive in our case since Cloudflare charges by the number of good requests and this was a high-traffic website. On further analysis, we found that throttling at the application level would be enough in this situation, and the Rack::Attack gem helped us with that.

Rack::Attack is a Rack middleware from Kickstarter. It can be configured to throttle requests based on IP address or any other parameter.

To use Rack::Attack, include the gem in Gemfile.

gem "rack-attack"

After bundle install, configure the middleware in config/application.rb:

config.middleware.use Rack::Attack

Now we can create the initializer config/initializers/rack_attack.rb to configure Rack::Attack.

By default, Rack::Attack uses Rails.cache to store request information. In our case, we wanted a separate cache for Rack::Attack, and it was configured as follows.

redis_client = Redis.new(url: ENV["REDIS_URL"])
Rack::Attack.cache.store = Rack::Attack::StoreProxy::RedisStoreProxy.new(redis_client)

If the web server is behind a proxy like Cloudflare, we have to configure a method to fetch the correct remote IP address. Otherwise, Rack::Attack would throttle based on the proxy’s IP address, which would end up blocking a lot of legitimate requests.

class Rack::Attack
  class Request < ::Rack::Request
    def remote_ip
      # Cloudflare stores remote IP in CF_CONNECTING_IP header
      @remote_ip ||= (env['HTTP_CF_CONNECTING_IP'] ||
                      env['action_dispatch.remote_ip'] ||
                      ip).to_s
    end
  end
end

Requests can be throttled based on IP address or any other parameter. In the following example, we set a limit of 40 requests per minute per IP for the “/” path.

class Rack::Attack
  throttle("req/ip", :limit => 40, :period => 1.minute) do |req|
    req.remote_ip if req.path == "/"
  end
end

The downside of this configuration is that after the 1-minute period, the attacker can launch another 40 requests/IP simultaneously, which would still exert pressure on the servers. This can be solved using exponential backoff.

class Rack::Attack
  # Exponential backoff for all requests to "/" path
  #
  # Allows 240 requests/IP in ~8 minutes
  #        480 requests/IP in ~1 hour
  #        960 requests/IP in ~8 hours (~2,880 requests/day)
  (3..5).each do |level|
    throttle("req/ip/#{level}",
               :limit => (30 * (2 ** level)),
               :period => (0.9 * (8 ** level)).to_i.seconds) do |req|
      req.remote_ip if req.path == "/"
    end
  end
end

If we want to turn off throttling for some IPs (e.g. health check services), those IPs can be safelisted.

class Rack::Attack
  class Request < ::Rack::Request
    def allowed_ip?
      allowed_ips = ["127.0.0.1", "::1"]
      allowed_ips.include?(remote_ip)
    end
  end

  safelist('allow from localhost') do |req|
    req.allowed_ip?
  end
end

We can log blocked requests separately, which is helpful for analyzing the attack.

ActiveSupport::Notifications.subscribe('rack.attack') do |name, start, finish, request_id, req|
  if req.env["rack.attack.match_type"] == :throttle
    request_headers = { "CF-RAY" => req.env["HTTP_CF_RAY"],
                        "X-Amzn-Trace-Id" => req.env["HTTP_X_AMZN_TRACE_ID"] }

    Rails.logger.info "[Rack::Attack][Blocked]" <<
                      "remote_ip: \"#{req.remote_ip}\"," <<
                      "path: \"#{req.path}\", " <<
                      "headers: #{request_headers.inspect}"
  end
end

A sample initializer with these configurations can be downloaded from here.

The application will now throttle requests and respond with an HTTP 429 Too Many Requests response for the throttled ones.
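To verify the behavior locally, a small sketch like the following can hammer the root path and watch the response codes flip to 429 (assuming the app runs on localhost:3000 with the 40 requests/minute rule above):

require "net/http"

45.times do |i|
  response = Net::HTTP.get_response(URI("http://localhost:3000/"))
  puts "request #{i + 1}: #{response.code}"
end
# The first 40 requests should return 200, the rest 429.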

We now block a lot of malicious requests using Rack::Attack. Here’s a graph with the % of blocked requests over a week.

Blocked requests %

EDIT: Updated the post to add more context to the situation.

Increase reliability of background job processing using super_fetch of Sidekiq Pro

Sidekiq is a background job processing library for Ruby. Sidekiq offers three versions: OSS, Pro and Enterprise.

OSS is free, open source, and has the basic features. The Pro and Enterprise versions are closed source and paid, and thus come with more advanced features. To compare the list of features offered by each of these versions, please visit the Sidekiq website.

Sidekiq Pro 3.4.0 introduced the super_fetch strategy to reliably fetch jobs from the queue in Redis.

In this post, we will discuss the benefits of using the super_fetch strategy.

Problem

The open source version of Sidekiq comes with the basic_fetch strategy. Let’s see an example to understand how it works.

Let’s add Sidekiq to our Gemfile and run bundle install to install it.

gem 'sidekiq'

Add the following Sidekiq worker in app/workers/sleep_worker.rb.

class SleepWorker
  include Sidekiq::Worker

  def perform(name)
    puts "Started #{name}"
    sleep 30
    puts "Finished #{name}"
  end
end

This worker does nothing but sleep for 30 seconds.

Let’s open a Rails console and schedule this worker to run as a background job asynchronously.

>> require "sidekiq/api"
=> true
>> Sidekiq::Queue.new.size
=> 0
>> SleepWorker.perform_async("A")
=> "5d8bf898c36a60a1096cf4d3"
>> Sidekiq::Queue.new.size
=> 1

As we can see, the queue now has 1 job scheduled to be processed.

Let’s start Sidekiq in another terminal tab.

$ bundle exec sidekiq

40510 TID-owu1swr1i INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-40510", :url=>nil}
40510 TID-owu1swr1i INFO: Starting processing, hit Ctrl-C to stop
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A

As we can see, the job with ID 5d8bf898c36a60a1096cf4d3 was picked up by Sidekiq, which started processing it.

If we check the Sidekiq queue size in the Rails console, it will now be zero.

>> Sidekiq::Queue.new.size
=> 0

Let’s shut down the Sidekiq process gracefully while it is still in the middle of processing our scheduled job. Press Ctrl-C or run the kill -SIGINT <PID> command.

$ kill -SIGINT 40510
40510 TID-owu1swr1i INFO: Shutting down
40510 TID-owu1swr1i INFO: Terminating quiet workers
40510 TID-owu1x00rm INFO: Scheduler exiting...
40510 TID-owu1swr1i INFO: Pausing to allow workers to finish...
40510 TID-owu1swr1i WARN: Terminating 1 busy worker threads
40510 TID-owu1swr1i WARN: Work still in progress [#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="{\"class\":\"SleepWorker\",\"args\":[\"A\"],\"retry\":true,\"queue\":\"default\",\"jid\":\"5d8bf898c36a60a1096cf4d3\",\"created_at\":1525427293.956314,\"enqueued_at\":1525427293.957355}">]
40510 TID-owu1swr1i INFO: Pushed 1 jobs back to Redis
40510 TID-owu1tr5my SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: fail: 19.576 sec
40510 TID-owu1swr1i INFO: Bye!

As we can see, Sidekiq pushed the unfinished job back to the Redis queue when it received the SIGINT signal.

Let’s verify it.

>> Sidekiq::Queue.new.size
=> 1

Before we move on, let’s learn some basics about signals such as SIGINT.

A crash course on POSIX signals

SIGINT is an interrupt signal, an alternative to hitting Ctrl-C on the keyboard. When a process is running in the foreground, we can hit Ctrl-C to signal it to shut down. When the process is running in the background, we can use the kill command to send a SIGINT signal to the process’ PID. A process can optionally catch this signal and shut itself down gracefully. If the process ignores the signal, nothing happens and it keeps running. Both INT and SIGINT are identical signals.

Another useful signal is SIGTERM, the termination signal. A process can either catch it and perform the necessary cleanup, or just ignore it, in which case it keeps running. Note that if no signal is supplied to the kill command, SIGTERM is used by default. Both TERM and SIGTERM are identical signals.

SIGTSTP or TSTP is the terminal stop signal, an alternative to hitting Ctrl-Z on the keyboard. This signal causes a process to suspend further execution.

SIGKILL is known as the kill signal. It is intended to kill the process immediately and forcefully. A process cannot catch this signal, and therefore cannot perform cleanup or a graceful shutdown. This signal is used when a process does not respond to either SIGINT or SIGTERM. KILL, SIGKILL and 9 are identical signals.

There are a lot of other signals besides these, but they are not relevant for this post. Please check them out here.
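As an aside, here is how a Ruby process can catch SIGINT and SIGTERM to shut down gracefully, a minimal sketch of the pattern Sidekiq itself follows:

shutdown = false

# Catch both signals and just flip a flag; the work loop below
# notices the flag and exits cleanly.
Signal.trap("INT")  { shutdown = true }
Signal.trap("TERM") { shutdown = true }

until shutdown
  # ... process one unit of work ...
  sleep 1
end

puts "Shutting down gracefully"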

A Sidekiq process respects all of these signals and behaves as we expect. When Sidekiq receives a TERM or SIGTERM signal, it terminates gracefully.

Back to our example

Coming back to our example from above, we had sent a SIGINT signal to the Sidekiq process.

$ kill -SIGINT 40510

On receiving this SIGINT signal, the Sidekiq process with PID 40510 terminated its quiet workers, paused the queue, and waited a while to let busy workers finish their jobs. Since our busy SleepWorker did not finish quickly, Sidekiq terminated that busy worker and pushed its job back to the queue in Redis. After that, Sidekiq gracefully terminated itself with exit code 0. Note that the default timeout is 8 seconds, which is how long Sidekiq waits for busy workers to finish before pushing their unfinished jobs back to the queue in Redis. This timeout can be changed with the -t option at the startup of the Sidekiq process.

Sidekiq recommends sending a TSTP and a TERM together to ensure that the Sidekiq process shuts down safely and gracefully. On receiving a TSTP signal, Sidekiq stops pulling in new work and finishes the work that is in progress. The idea is to first send a TSTP signal, wait as long as possible (by default 8 seconds, as discussed above) for busy workers to finish their jobs, and then send a TERM signal to shut down the process.

Sidekiq pushes unprocessed jobs back to Redis when terminated gracefully. This means Sidekiq pulls the unfinished job and starts processing it again when we restart the Sidekiq process.

$ bundle exec sidekiq

45916 TID-ovfq8ll0k INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-45916", :url=>nil}
45916 TID-ovfq8ll0k INFO: Starting processing, hit Ctrl-C to stop
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: start
Started A
Finished A
45916 TID-ovfqajol4 SleepWorker JID-5d8bf898c36a60a1096cf4d3 INFO: done: 30.015 sec

We can see that Sidekiq pulled the previously terminated job with ID 5d8bf898c36a60a1096cf4d3 and processed that job again.

So far so good.

This behavior is implemented by the basic_fetch strategy which is present in the open source version of Sidekiq.

Sidekiq uses the BRPOP Redis command to fetch a scheduled job from the queue. When a job is fetched, it gets removed from the queue and no longer exists in Redis. If this fetched job is processed successfully, then all is good. Also, if the Sidekiq process is terminated gracefully on receiving either a SIGINT or a SIGTERM signal, Sidekiq will push the unfinished jobs back to the queue in Redis.
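
For the curious, here is a minimal Ruby sketch of a BRPOP-based fetch using the redis gem. The names are our assumptions for illustration; this is not Sidekiq’s actual internal code.

require "redis"

redis = Redis.new

# BRPOP blocks until a job is available, then atomically removes and
# returns it. From this moment the job lives only in this process's
# memory; if we crash before finishing it, the job is gone.
queue, job = redis.brpop("queue:default", timeout: 2)
puts "Fetched #{job} from #{queue}" if job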

But what if the Sidekiq process crashes in the middle of processing that fetched job?

A process is considered crashed if it does not shut down gracefully. As we discussed before, when we send a SIGKILL signal to a process, the process cannot receive or catch this signal. Because the process gets no chance to shut down gracefully, it simply crashes.

When a Sidekiq process crashes, the jobs it had fetched but not yet finished are lost forever.

Let’s try to reproduce this scenario.

We will schedule another job.

>> SleepWorker.perform_async("B")
=> "37a5ab4139796c4b9dc1ea6d"

>> Sidekiq::Queue.new.size
=> 1

Now, let’s start the Sidekiq process and kill it using the SIGKILL (9) signal.

$ bundle exec sidekiq

47395 TID-ow8q4nxzf INFO: Starting processing, hit Ctrl-C to stop
47395 TID-ow8qba0x7 SleepWorker JID-37a5ab4139796c4b9dc1ea6d INFO: start
Started B
[1]    47395 killed     bundle exec sidekiq
$ kill -SIGKILL 47395

Let’s check if Sidekiq had pushed the busy (unprocessed) job back to the queue in Redis before terminating.

>> Sidekiq::Queue.new.size
=> 0

No, it did not.

Actually, the Sidekiq process did not get a chance to shut down gracefully when it received the SIGKILL signal.

If we restart the Sidekiq process, it cannot fetch that unprocessed job since the job was not pushed back to the queue in Redis at all.

$ bundle exec sidekiq

47733 TID-ox1lau26l INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-47733", :url=>nil}
47733 TID-ox1lau26l INFO: Starting processing, hit Ctrl-C to stop

Therefore, the job with name argument B and ID 37a5ab4139796c4b9dc1ea6d is completely lost. There is no way to get it back.

Losing jobs like this may not be a problem for some applications, but for critical applications it could be a huge issue.

We faced a similar problem. One of our clients’ applications is deployed on a Kubernetes cluster. Our Sidekiq process runs in a Docker container inside Kubernetes pods which we call background pods.

Here’s a stripped-down version of our Kubernetes deployment manifest, which creates a Kubernetes deployment resource. Our Sidekiq process runs in the pods spawned by that deployment resource.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: background
spec:
  replicas: 2
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: background
        image: <%= ENV['IMAGE'] %>
        env:
        - name: POD_TYPE
          value: background
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -l
              - -c
              - for pid in tmp/pids/sidekiq*.pid; do bin/bundle exec sidekiqctl stop $pid 60; done

When we apply an updated version of this manifest, say for changing the Docker image, the running pods are terminated and new pods are created.

Before terminating the only container in the pod, Kubernetes executes the sidekiqctl stop $pid 60 command which we have defined using the preStop event handler. Note that Kubernetes sends the SIGTERM signal to the container being terminated after the preStop event handler completes. The default termination grace period is 30 seconds and it is configurable. If the container doesn’t terminate within the termination grace period, a SIGKILL signal is sent to forcefully terminate the container.

The sidekiqctl stop $pid 60 command executed in the preStop handler does three things.

  1. Sends a SIGTERM signal to the Sidekiq process running in the container.
  2. Waits for 60 seconds.
  3. Sends a SIGKILL signal to kill the Sidekiq process forcefully if the process has not terminated gracefully yet.

This worked for us when the number of busy jobs was relatively small.

However, when the number of in-progress jobs is higher, Sidekiq does not get enough time to quiet the busy workers and fails to push some of their jobs back to the Redis queue.

We found that some of the jobs were getting lost when our background pod restarted. We had to restart our background pod for reasons such as updating the Kubernetes deployment manifest, pod being automatically evicted by Kubernetes due to host node encountering OOM (out of memory) issue, etc.

We tried increasing both terminationGracePeriodSeconds in the deployment manifest and the sidekiqctl stop command’s timeout. Despite that, we still kept facing the same issue of losing jobs whenever a pod restarted.

We even tried sending TSTP and then TERM after a timeout considerably longer than 60 seconds. But the pod was still terminated harshly, without gracefully terminating the Sidekiq process running inside it. Therefore we kept losing the busy jobs which were running during pod termination.

Sidekiq Pro’s super_fetch

We were looking for a way to stop losing our Sidekiq jobs or a way to recover them reliably when our background Kubernetes pod restarts.

We realized that the commercial version of Sidekiq, Sidekiq Pro, offers an additional fetch strategy, super_fetch, which seemed more efficient and reliable compared to the basic_fetch strategy.

Let’s see what difference the super_fetch strategy makes over basic_fetch.

We will need to use the sidekiq-pro gem, which needs to be purchased. Since the Sidekiq Pro gem is closed source, we cannot fetch it from the default public gem registry, https://rubygems.org. Instead, we have to fetch it from a private gem registry which we get access to after purchasing it. We add the following code to our Gemfile and run bundle install.

source ENV['SIDEKIQ_PRO_GEM_URL'] do
  gem 'sidekiq-pro'
end

To enable super_fetch, we need to add the following code in an initializer, config/initializers/sidekiq.rb.

Sidekiq.configure_server do |config|
  config.super_fetch!
end

Well, that’s it. Sidekiq will use super_fetch instead of basic_fetch now.

$ bundle exec sidekiq

75595 TID-owsytgvqj INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
75595 TID-owsytgvqj INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-75595", :url=>nil}
75595 TID-owsytgvqj INFO: Starting processing, hit Ctrl-C to stop
75595 TID-owsys5imz INFO: SuperFetch activated

When super_fetch is activated, the Sidekiq process’s graceful shutdown behavior is similar to that of basic_fetch.

>> SleepWorker.perform_async("C")
=> "f002a41393f9a79a4366d2b5"
>> Sidekiq::Queue.new.size
=> 1
$ bundle exec sidekiq

76021 TID-ow6kdcca5 INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
76021 TID-ow6kdcca5 INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-76021", :url=>nil}
76021 TID-ow6kdcca5 INFO: Starting processing, hit Ctrl-C to stop
76021 TID-ow6klq2cx INFO: SuperFetch activated
76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
>> Sidekiq::Queue.new.size
=> 0
$ kill -SIGTERM 76021
76021 TID-ow6kdcca5 INFO: Shutting down
76021 TID-ow6kdcca5 INFO: Terminating quiet workers
76021 TID-ow6kieuwh INFO: Scheduler exiting...
76021 TID-ow6kdcca5 INFO: Pausing to allow workers to finish...
76021 TID-ow6kdcca5 WARN: Terminating 1 busy worker threads
76021 TID-ow6kdcca5 WARN: Work still in progress [#<struct Sidekiq::Pro::SuperFetch::Retriever::UnitOfWork queue="queue:default", job="{\"class\":\"SleepWorker\",\"args\":[\"C\"],\"retry\":true,\"queue\":\"default\",\"jid\":\"f002a41393f9a79a4366d2b5\",\"created_at\":1525500653.404454,\"enqueued_at\":1525500653.404501}", local_queue="queue:sq|vishal.local:76021:3e64c4b08393|default">]
76021 TID-ow6kdcca5 INFO: SuperFetch: Moving job from queue:sq|vishal.local:76021:3e64c4b08393|default back to queue:default
76021 TID-ow6kiesnp SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: fail: 13.758 sec
76021 TID-ow6kdcca5 INFO: Bye!
>> Sidekiq::Queue.new.size
=> 1

That looks good. As we can see, Sidekiq moved the busy job from a private queue back to the public queue in Redis when it received a SIGTERM signal.

Now, let’s try to kill the Sidekiq process forcefully, without allowing a graceful shutdown, by sending a SIGKILL signal.

Since Sidekiq was gracefully shut down before, restarting it will make it re-process the pushed-back job having ID f002a41393f9a79a4366d2b5.

$ bundle exec sidekiq

76890 TID-oxecurbtu INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
76890 TID-oxecurbtu INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-76890", :url=>nil}
76890 TID-oxecurbtu INFO: Starting processing, hit Ctrl-C to stop
76890 TID-oxecyhftq INFO: SuperFetch activated
76890 TID-oxecyotvm SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
[1]    76890 killed     bundle exec sidekiq
$ kill -SIGKILL 76890
>> Sidekiq::Queue.new.size
=> 0

It appears that Sidekiq didn’t get a chance to push the busy job back to the queue in Redis on receiving the SIGKILL signal.

So, where is the magic of super_fetch?

Did we lose our job again?

Let’s restart Sidekiq and see for ourselves.

$ bundle exec sidekiq

77496 TID-oum04ghgw INFO: Sidekiq Pro 4.0.2, commercially licensed.  Thanks for your support!
77496 TID-oum04ghgw INFO: Booting Sidekiq 5.1.3 with redis options {:id=>"Sidekiq-server-PID-77496", :url=>nil}
77496 TID-oum04ghgw INFO: Starting processing, hit Ctrl-C to stop
77496 TID-oum086w9s INFO: SuperFetch activated
77496 TID-oum086w9s WARN: SuperFetch: recovered 1 jobs
77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: start
Started C
Finished C
77496 TID-oum08eu3o SleepWorker JID-f002a41393f9a79a4366d2b5 INFO: done: 30.011 sec

Whoa, isn’t that cool?

See that line where it says SuperFetch: recovered 1 jobs.

Although the job wasn’t pushed back to the queue in Redis, Sidekiq somehow recovered our lost job having ID f002a41393f9a79a4366d2b5 and processed it again!

Interested in learning how Sidekiq did that? Keep reading.

Note that, since Sidekiq Pro is closed source commercial software, we cannot explain super_fetch’s exact implementation details.

As we discussed in depth before, Sidekiq’s basic_fetch strategy uses the BRPOP Redis command to fetch a job from the queue in Redis. It works great to some extent, but it is prone to losing jobs if Sidekiq crashes or does not shut down gracefully.

On the other hand, Sidekiq Pro offers the super_fetch strategy which uses the RPOPLPUSH Redis command to fetch a job.

The RPOPLPUSH Redis command provides a unique approach to implementing a reliable queue. RPOPLPUSH accepts two lists, a source list and a destination list. The command atomically returns and removes the last element of the source list and pushes that element as the first element of the destination list. Atomically means that the pop and the push are performed as a single operation: either both succeed or both are treated as failed.

super_fetch registers a private queue in Redis for each Sidekiq process on start-up. It atomically fetches a scheduled job from the public queue in Redis and pushes that job into the private queue (or working queue) using the RPOPLPUSH Redis command. Once the job is finished, Sidekiq removes it from the private queue. During a graceful shutdown, Sidekiq moves the unfinished jobs from the private queue back to the public queue. If the shutdown of a Sidekiq process is not graceful, the unfinished jobs of that process remain in its private queue; these are called orphaned jobs. On restarting, or on starting another Sidekiq process, super_fetch looks for such orphaned jobs in the private queues. If Sidekiq finds orphaned jobs, it re-enqueues and processes them again.
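
Since Sidekiq Pro is closed source, we can only sketch the general RPOPLPUSH reliable-queue pattern described above. The queue names below are our assumptions for illustration, not Sidekiq Pro’s actual implementation.

require "socket"
require "redis"

redis = Redis.new
public_queue  = "queue:default"
# A private working queue unique to this process, similar in spirit
# to the queue:sq|... name visible in the logs above.
private_queue = "queue:sq|#{Socket.gethostname}:#{Process.pid}|default"

# Atomically move a job from the public queue to the private queue.
# If we crash after this line, the job still exists in Redis and can
# be recovered as an orphan later.
if (job = redis.rpoplpush(public_queue, private_queue))
  puts "Processing #{job}"           # the actual job processing goes here
  redis.lrem(private_queue, -1, job) # remove the job once finished
end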

It may happen that we have multiple Sidekiq processes running at the same time. If a process among them dies, its unfinished jobs become orphans. This Sidekiq wiki describes in detail the criteria super_fetch relies upon to identify which jobs are orphaned and which are not. If we don’t restart or start another process, super_fetch may take anywhere from 5 minutes to 3 hours to recover such orphaned jobs. The recommended approach is to restart or start another Sidekiq process to signal super_fetch to look for orphans.

Interestingly, in older versions of Sidekiq Pro, super_fetch checked for orphaned jobs and queues at Sidekiq process startup, at most once every 24 hours. Due to this, when a Sidekiq process crashed, its orphaned jobs could remain unpicked for up to 24 hours, until the next restart. This orphan check delay was later lowered to 1 hour in Sidekiq Pro 3.4.1.

Another fun fact: there existed two other fetch strategies, reliable_fetch and timed_fetch, in older versions of Sidekiq Pro. Apparently, reliable_fetch did not work well with Docker, and timed_fetch had asymptotic computational complexity O(log N), comparatively less efficient than super_fetch, which has asymptotic computational complexity O(1). Both of these strategies were deprecated in Sidekiq Pro 3.4.0 in favor of super_fetch. Later, both were removed in Sidekiq Pro 4.0 and are no longer documented.

Final result

We have enabled super_fetch in our application and it seems to be working without any major issues so far. Our Kubernetes background pods do not seem to be losing any jobs when they are restarted.

Update: Mike Perham, author of Sidekiq, posted the following comment.

Faktory provides all of the beanstalkd functionality, including the same reliability, with a nicer Web UI. It’s free and OSS. https://github.com/contribsys/faktory http://contribsys.com/faktory/

Rails 5.2 sets Ruby version in Gemfile and adds .ruby-version file by default

This blog is part of our Rails 5.2 series.

For Ruby developers, it’s common to switch between multiple Ruby versions as per the needs of each project. Sometimes, going back and forth between multiple Ruby versions can be frustrating. To avoid this, we add a .ruby-version file to our projects so that version manager tools such as rvm, rbenv, etc. can easily determine which Ruby version should be used for that particular project.

Another thing Rails developers have to take care of is ensuring that deployment tools run Rails with the desired Ruby version. To ensure that, we add the Ruby version to the Gemfile, as shown below. This helps Bundler install dependencies scoped to the specified Ruby version.
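
For example, before Rails 5.2 we would add something like this by hand (the version number here is just illustrative). In .ruby-version:

2.5.0

And in the Gemfile:

ruby "2.5.0"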

Good News! Rails 5.2 makes our work easy.

In Rails 5.2, changes have been made so that a newly created app gets a .ruby-version file and a Ruby version entry in the Gemfile by default.

Let’s create a new project with Ruby 2.5.

$ rvm list default

  Default Ruby (for new shells)

     ruby-2.5 [ x86_64 ]


$ rails new my_new_app

In our new project, we should see .ruby-version in the root directory, and it will contain the value 2.5. Also, we should see the following line in the Gemfile.

ruby "2.5"

Deploying Docker Registry on Kubernetes using S3 Storage

In today’s era of containerization, no matter what container we are running, we need an image to run it. Docker images are stored on container registries like Docker Hub (cloud), Google Container Registry (GCR), AWS ECR, quay.io, etc.

We can also self-host a Docker registry on any Docker platform. In this blog post, we will see how to deploy a Docker registry on Kubernetes using the S3 storage driver.

Prerequisites:

As per the Docker registry documentation, we can simply start the registry using the registry Docker image.

Basic parameters to consider when deploying a production registry are:

  • Authentication
  • SSL
  • Storage

We will use htpasswd authentication for this post, though the registry image supports silly and token-based authentication as well.
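
For reference, the htpasswd entry used in the ConfigMap below can be generated with Apache’s htpasswd utility, assuming it is installed; the registry expects bcrypt-hashed passwords, hence the -B flag.

$ htpasswd -nB admin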

The Docker registry requires an SSL certificate and key. We will use a Kubernetes service which terminates SSL at the ELB level using annotations.

For registry storage, we can use filesystem, s3, azure, swift, etc. For the complete list of options, please visit the Docker site.

We need to store the docker images pushed to the registry. We will use S3 to store these docker images.

Steps for deploying the registry on Kubernetes:

Get the ARN of the SSL certificate to be used.

If you haven’t uploaded the SSL certificate to AWS IAM yet, upload it using the following command.

$ aws iam upload-server-certificate --server-certificate-name registry --certificate-body file://registry.crt --private-key file://key.pem

Get the ARN of the certificate using the following command.

$ aws iam get-server-certificate --server-certificate-name registry | grep Arn

Create an S3 bucket which will be used to store the Docker images, using s3cmd or aws s3.

$ s3cmd mb s3://myregistry
Bucket 's3://myregistry/' created

Create a separate namespace, configmap, deployment and service for the registry using the following templates.

---
apiVersion: v1
kind: Namespace
metadata:
  name: container-registry

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: auth
  namespace: container-registry
data:
  htpasswd: |
    admin:$2y$05$TpZPzI7U7cr3cipe6jrOPe0bqohiwgEerEB6E4bFLsUf7Bk.SEBRi

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
      - env:
        - name: REGISTRY_AUTH
          value: htpasswd
        - name: REGISTRY_AUTH_HTPASSWD_PATH
          value: /auth/htpasswd
        - name: REGISTRY_AUTH_HTPASSWD_REALM
          value: Registry Realm
        - name: REGISTRY_STORAGE
          value: s3
        - name: REGISTRY_STORAGE_S3_ACCESSKEY
          value: <your-s3-access-key>
        - name: REGISTRY_STORAGE_S3_BUCKET
          value: <your-registry-bucket>
        - name: REGISTRY_STORAGE_S3_REGION
          value: us-east-1
        - name: REGISTRY_STORAGE_S3_SECRETKEY
          value: <your-secret-s3-key>
        image: registry:2
        name: registry
        ports:
        - containerPort: 5000
        volumeMounts:
          - name: auth
            mountPath: /auth
      volumes:
        - name: auth
          configMap:
            name: auth
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <your-iam-certificate-arn>
    service.beta.kubernetes.io/aws-load-balancer-instance-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: '443'
  labels:
    app: registry
  name: registry
  namespace: container-registry
spec:
  ports:
  - name: "443"
    port: 443
    targetPort: 5000
  selector:
    app: registry
  type: LoadBalancer

Let’s launch this manifest using kubectl apply.

kubectl apply -f registry-namespace.yml -f registry-configmap.yml -f registry-deployment.yaml -f registry-service.yml
namespace "container-registry" created
configmap "auth" created
deployment "registry" created
service "registry" created

Now that we have created the registry, we should map DNS to the registry service’s ELB endpoint. We can get the ELB endpoint using the following command.

$ kubectl -n container-registry get svc registry -o wide

NAME       CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)         AGE       SELECTOR
registry   100.71.250.56   abcghccf8540698e8bff782799ca8h04-1234567890.us-east-2.elb.amazonaws.com   443:30494/TCP   1h       app=registry

We will point DNS to this ELB endpoint with the domain registry.myapp.com.

Now that we have the registry running, it’s time to push an image to it.

First, pull an image or build one locally to push.

On the local machine, run the following commands:

$ docker pull busybox
latest: Pulling from busybox
f9ea5e501ad7: Pull complete
ac3f08b78d4e: Pull complete
Digest: sha256:da268b65d710e5ca91271f161d0ff078dc63930bbd6baac88d21b20d23b427ec
Status: Downloaded newer image for busybox:latest

Now log in to our registry using the following command.

$ sudo docker login registry.myapp.com

Username: admin

Password:

Login Succeeded

Now tag the image to point to our registry using the docker tag command.

$ sudo docker tag busybox registry.myapp.com/my-app:latest

Once the image is tagged, we are good to push.

Let’s push the image using the docker push command.

$ sudo docker push registry.myapp.com/my-app:latest

The push refers to a repository [registry.myapp.com/my-app]

05732a3f47b5: Pushed
30de36c4bd15: Pushed
5237590c0d08: Pushed
latest: digest: sha256:f112e608b2639b21498bd4dbca9076d378cc216a80d52287f7f0f6ea6ad739ab size: 205

We were successfully able to push an image to a registry running on Kubernetes and stored on S3. Let’s verify that it exists on S3.

Navigate to our S3 bucket and we can see that the Docker registry repository busybox has been created.

$ s3cmd ls s3://myregistry/docker/registry/v2/repositories/
DIR   s3://myregistry/docker/registry/v2/repositories/busybox/

All our image related files are stored on S3.

In this way, we can self-host a container registry on Kubernetes, backed by S3 storage.

Ruby 2.6 added option to raise exception in Kernel#system

This blog is part of our Ruby 2.6 series. Ruby 2.6.0-preview1 was recently released.

We write scripts to automate setup and deployment of Rails applications. In those scripts, at many places, we need to run system commands like bundle install, rake db:create, rake db:migrate and many more.

Let’s suppose we need to run migrations using rake db:migrate in a Rails project setup script. We can use the Kernel#system method.

irb> system('rake db:migrate')

Ruby 2.5.0

Executing system returns true when the command succeeds and false when it exits with a non-zero status. Another characteristic of system is that it swallows failures instead of raising exceptions.
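
For instance, here is a quick illustration of ours (not from the original post) showing the possible return values; note that system returns nil when the command cannot be executed at all.

irb> system('ls')
 => true
irb> system('ls /nonexistent')
 => false
irb> system('no-such-command')
 => nil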

Let’s suppose our migrations run successfully. In this case, the system command for running migrations will return true.

irb> system('rake db:migrate')
 => true

Now let’s suppose we have a migration which tries to add a column to a table which does not exist. In this case, the system command for running migrations will return false.

irb> system('rake db:migrate')
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
 => false

As we can see, even when there is a failure while executing the command, the return value is just false. Ruby does not raise an exception in such cases.

However, we can use raise explicitly to raise an exception and halt the setup script execution.

irb> system('rake db:migrate') || raise('Failed to run migrations')
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
Traceback (most recent call last):
        2: from /Users/amit/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `<main>'
        1: from (irb):4
RuntimeError (Failed to run migrations)

Ruby 2.6.0-preview1

Ruby 2.6 makes our lives easier by providing an exception: true option, so that we do not need to use raise explicitly to halt script execution.

irb> system('rake db:migrate', exception: true)
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
Traceback (most recent call last):
        3: from /Users/amit/.rvm/rubies/ruby-2.6.0-preview1/bin/irb:11:in `<main>'
        2: from (irb):2
        1: from (irb):2:in `system'
RuntimeError (Command failed with exit 1: rake)

Ruby 2.6 works the same way as previous Ruby versions when used without the exception option, or with exception set to false.

irb> system('rake db:migrate', exception: false)
== 20180311211836 AddFirstNameToAdmins: migrating =============================
-- add_column(:admins, :first_name, :string)
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:

PG::UndefinedTable: ERROR:  relation "admins" does not exist
: ALTER TABLE "admins" ADD "first_name" character varying
.
.
.
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
 => false

Here are the relevant commit and discussion for this change.

system is not the only way to run such commands. We wrote a blog 6 years back which discusses the differences between running commands using backtick, exec, sh, popen3, popen2e and Process.spawn.

Ruby 2.5 enables Thread.report_on_exception by default and we get more info when threads die

This blog is part of our Ruby 2.5 series.

Let’s see what happens when an exception is raised inside a thread.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

sleep 1

puts "In the main thread"

Executing it looks like this.

$ RBENV_VERSION=2.4.0 ruby thread_example_1.rb

Calculating 4/0 in division_thread

In the main thread

Note that the last two lines from the block were not printed. Also notice that after the failure in the thread, the program continued to run in the main thread. That’s why we got the message “In the main thread”.

This is because the default behavior of Ruby is to silently ignore exceptions in threads and to continue executing the main thread.

Enabling abort_on_exception to stop on failure

If we want an exception in a thread to stop further processing both in that thread and in the main thread, we can enable Thread[.#]abort_on_exception on that thread.

Notice that in the code below we are using Thread.current.

division_thread = Thread.new do
  Thread.current.abort_on_exception = true

  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

sleep 1

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_2.rb

Calculating 4/0 in division_thread

thread_example_2.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_2.rb:5:in `block in <main>'

As we can see, once an exception was encountered in the thread, processing stopped both in the thread and in the main thread.

Note that Thread.current.abort_on_exception = true activates this behavior only for the current thread.

If we want this behavior globally for all the threads then we need to use Thread.abort_on_exception = true.
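
For example, a global variant of the earlier script could look like this; the exception in the thread now aborts the main thread too.

Thread.abort_on_exception = true

Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
end

sleep 1

# Never reached; the ZeroDivisionError re-raises in the main thread.
puts "In the main thread"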

Running program with debug flag to stop on failure

Let’s run the original code with --debug option.

$ RBENV_VERSION=2.4.0 ruby --debug thread_example_1.rb

thread_example_1.rb:1: warning: assigned but unused variable - division_thread

Calculating 4/0 in division_thread

Exception `ZeroDivisionError' at thread_example_1.rb:3 - divided by 0
Exception `ZeroDivisionError' at thread_example_1.rb:7 - divided by 0
thread_example_1.rb:3:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_1.rb:3:in `block in <main>'

In this case the exception is printed in detail and the code in the main thread was not executed.

Usually, when we execute a program with the --debug option, the behavior of the program does not change. We expect the program to print more information, but we do not expect its behavior to change. However, in this case the --debug option changes the behavior of the program.

Running program with join on thread to stop on failure

If a thread raises an exception and the abort_on_exception and $DEBUG flags are not set, then that exception will be processed at the time of joining the thread.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

division_thread.join

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_3.rb

Calculating 4/0 in division_thread

thread_example_3.rb:3:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_3.rb:3:in `block in <main>'

Both Thread#join and Thread#value will stop processing in the thread and in the main thread once an exception is encountered.

Introduction of report_on_exception in Ruby 2.4

Almost 6 years ago, Charles Nutter (headius) proposed that exceptions raised in threads should be automatically logged and reported by default. To make his point, he explained issues similar to the ones we discussed above about Ruby’s behavior of silently ignoring exceptions in threads. Here is the relevant discussion on his proposal.

Following are some of the notable points discussed.

  • Enabling Thread[.#]abort_on_exception, by default, is not always a good idea.
  • There should be a flag which, if enabled, would print the thread-killing exception info.
  • In many cases, people spawn one-off threads which are not hard-referenced using Thread#join or Thread#value. Such threads get garbage collected. Should the thread-killing exception be reported at the time of garbage collection if such a flag is enabled?
  • Should it warn using Warning#warn or redirect to STDERR device while reporting?

Charles Nutter suggested that a configurable global flag Thread.report_on_exception and an instance-level flag Thread#report_on_exception should be implemented with a default value of true. When set to true, it should print exception information.

Matz and other core members approved that Thread[.#]report_on_exception could be implemented with its default value set to false.

Charles Nutter, Benoit Daloze and others argued that it should be true by default, so that programmers would be aware of threads silently dying because of exceptions.

Shyouhei Urabe advised that due to some technical challenges the default should initially be false, so that the feature could land in Ruby; the default could then be changed in a later release.

Nobuyoshi Nakada (nobu) pushed an implementation of Thread[.#]report_on_exception with the default value set to false. It was released in Ruby 2.4.0.

Let’s try enabling report_on_exception globally using Thread.report_on_exception.

Thread.report_on_exception = true

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_4.rb

Calculating 4/0 in division_thread

#<Thread:0x007fb10f018200@thread_example_4.rb:3 run> terminated with exception:
thread_example_4.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_4.rb:5:in `block in <main>'

Calculating nil+4 in addition_thread

#<Thread:0x007fb10f01aca8@thread_example_4.rb:9 run> terminated with exception:
thread_example_4.rb:11:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

It now reports the exceptions in all threads. It prints that Thread:0x007fb10f018200 was terminated with exception: divided by 0 (ZeroDivisionError). Similarly, the other thread, Thread:0x007fb10f01aca8, was terminated with exception: undefined method '+' for nil:NilClass (NoMethodError).

Instead of enabling it globally for all threads, we can enable it for a particular thread using instance-level Thread#report_on_exception.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  Thread.current.report_on_exception = true

  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"

In the above case we have enabled report_on_exception flag just for addition_thread.

Let’s execute it.

$ RBENV_VERSION=2.4.0 ruby thread_example_5.rb

Calculating 4/0 in division_thread

Calculating nil+4 in addition_thread

#<Thread:0x007f8e6b007f70@thread_example_5.rb:7 run> terminated with exception:
thread_example_5.rb:11:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

Notice how it didn’t report the exception which killed division_thread. As expected, it reported only the exception that killed addition_thread.

With the above changes, Ruby reports the exception as soon as it is encountered. However, if these threads are joined, they will still raise the exception.

division_thread = Thread.new do
  Thread.current.report_on_exception = true

  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

begin
  division_thread.join
rescue => exception
  puts "Explicitly caught - #{exception.class}: #{exception.message}"
end

puts "In the main thread"
$ RBENV_VERSION=2.4.0 ruby thread_example_6.rb

Calculating 4/0 in division_thread

#<Thread:0x007f969d00d828@thread_example_6.rb:1 run> terminated with exception:
thread_example_6.rb:5:in `/': divided by 0 (ZeroDivisionError)
  from thread_example_6.rb:5:in `block in <main>'

Explicitly caught - ZeroDivisionError: divided by 0

In the main thread

See how we were still able to handle the exception raised in division_thread after joining it, even though it had already been reported due to the Thread#report_on_exception flag.

report_on_exception defaults to true in Ruby 2.5

Benoit Daloze (eregon) strongly advocated that both Thread.report_on_exception and Thread#report_on_exception should have a default value of true. Here is the relevant feature request.

After approval from Matz, Benoit Daloze pushed the implementation, fixing the failing tests and silencing the unnecessarily verbose warnings.

It was released as part of Ruby 2.5.

Now, in Ruby 2.5, we can simply write the following.

division_thread = Thread.new do
  puts "Calculating 4/0 in division_thread"
  puts "Result is: #{4/0}"
  puts "Exiting from division_thread"
end

addition_thread = Thread.new do
  puts "Calculating nil+4 in addition_thread"
  puts "Result is: #{nil+4}"
  puts "Exiting from addition_thread"
end

sleep 1

puts "In the main thread"

Let’s execute it with Ruby 2.5.

$ RBENV_VERSION=2.5.0 ruby thread_example_7.rb

Calculating 4/0 in division_thread

#<Thread:0x00007f827689a238@thread_example_7.rb:1 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
  1: from thread_example_7.rb:3:in `block in <main>'
thread_example_7.rb:3:in `/': divided by 0 (ZeroDivisionError)

Calculating nil+4 in addition_thread

#<Thread:0x00007f8276899b58@thread_example_7.rb:7 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
thread_example_7.rb:9:in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)

In the main thread

We can disable the thread exception reporting globally using Thread.report_on_exception = false or for a particular thread using Thread.current.report_on_exception = false.

Future Possibilities

In addition to this feature, Charles Nutter also suggested that it would be good to have a callback handler which accepts a block to be executed when a thread dies due to an exception. The callback handler could be at the global level or for a specific thread.

Thread.on_exception do
  # some stuff
end

In the absence of such a handler, libraries need to resort to custom code to handle exceptions. Here is how Sidekiq handles exceptions raised in threads.

An important thing to note is that report_on_exception does not change the behavior of the code. It only adds reporting when a thread dies, and when it comes to dying threads, more reporting is a good thing.

Rails 5.2 added Date#prev_occurring and Date#next_occurring to return specified next & previous occurring day of week

This blog is part of our Rails 5.2 series.

Before Rails 5.2, this is how we would find the next or previous occurrence of a given day of the week.

Assume that the current date is Tue, 27 Feb 2018.

# find previous thursday
>> Date.yesterday.beginning_of_week(:thursday)
=> Thu, 22 Feb 2018

# find next thursday
>> Date.tomorrow.end_of_week(:friday)
=> Thu, 01 Mar 2018

Rails 5.2 has introduced methods Date#prev_occurring and Date#next_occurring to find next & previous occurring day of the week.

# find previous thursday
>> Date.today.prev_occurring(:thursday)
=> Thu, 22 Feb 2018

# find next thursday
>> Date.today.next_occurring(:thursday)
=> Thu, 01 Mar 2018

Ruby 2.5 supports measuring branch and method coverages

Ruby has shipped with Coverage, a simple standard library for test coverage measurement, for a long time.

Before Ruby 2.5

Before Ruby 2.5, we could measure only line coverage using Coverage.

Line coverage tells us whether a line was executed or not, and if executed, how many times.

We have a file called score.rb.

score = 33

if score >= 40
  p :PASSED
else
  p :FAILED
end

Now create another file score_coverage.rb.

require "coverage"

Coverage.start
load "score.rb"
p Coverage.result

We used the Coverage#start method to measure the coverage of the score.rb file. Coverage#result returns the coverage result.

Let’s run it with Ruby 2.4.

$ RBENV_VERSION=2.4.0 ruby score_coverage.rb
:FAILED
{ "score.rb"=> [1, nil, 1, 0, nil, 1, nil] }

Let’s look at the output. Each value in the array [1, nil, 1, 0, nil, 1, nil] denotes the number of times the interpreter executed the corresponding line of the score.rb file.

This array is also called the “line coverage” of score.rb file.

A nil value in the line coverage array means coverage is disabled for that particular line, or that it is not a relevant line. Lines like else, end and blank lines have line coverage disabled.

Here’s how we can read the above line coverage result.

  • Line number 1 (i.e. 0th index in the above result array) was executed once.
  • Coverage was disabled for line number 2 (i.e. index 1) as it is blank.
  • Line number 3 (i.e. index 2) was executed once.
  • Line number 4 did not execute.
  • Coverage was disabled for line number 5 as it contains only else clause.
  • Line number 6 was executed once.
  • Coverage was disabled for line number 7 as it contains just end keyword.

After Ruby 2.5

There was a pull request opened in 2014 to add method coverage and decision coverage metrics to Ruby. It was rejected by Yusuke Endoh as he saw some issues with it and mentioned that he was also working on a similar implementation.

In Ruby 2.5, Yusuke Endoh added branch coverage and method coverage features to the Coverage library.

Let’s see what’s changed in the Coverage library in Ruby 2.5.

Line Coverage

If we execute the above example using Ruby 2.5, we will see no change in the result.

$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" => [1, nil, 1, 0, nil, 1, nil] }

This behavior is maintained to ensure that the Coverage#start API stays 100% backward compatible.

If we explicitly enable the lines option on the Coverage#start method in the above score_coverage.rb file, the coverage result will look different.

require "coverage"

Coverage.start(lines: true)
load "score.rb"
p Coverage.result
$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" => {
    :lines => [1, nil, 1, 0, nil, 1, nil]
  }
}

We can see that the coverage result is now a hash which reads that the score.rb file has line coverage of [1, nil, 1, 0, nil, 1, nil].

Branch Coverage

Branch coverage helps us identify which branches are executed and which are not.

Let’s see how to get branch coverage.

We will update score_coverage.rb by enabling the branches option.

require "coverage"

Coverage.start(branches: true)
load "score.rb"
p Coverage.result
$ RBENV_VERSION=2.5.0 ruby score_coverage.rb
:FAILED
{ "score.rb" =>
  { :branches => {
      [:if, 0, 3, 0, 7, 3] => {
        [:then, 1, 4, 2, 4, 15] => 0,
        [:else, 2, 6, 2, 6, 15] => 1
      }
    }
  }
}

Here is how to read the data in these arrays; each one has the following form.

[
  BRANCH_TYPE,
  UNIQUE_ID,
  START_LINE_NUMBER,
  START_COLUMN_NUMBER,
  END_LINE_NUMBER,
  END_COLUMN_NUMBER
]

Please note that column numbers start from 0 and line numbers start from 1.

Let’s try to read the branch coverage result printed above.

[:if, 0, 3, 0, 7, 3] reads that the if statement starts at line 3 & column 0 and ends at line 7 & column 3.

[:then, 1, 4, 2, 4, 15] reads that the then clause starts at line 4 & column 2 and ends at line 4 & column 15.

Similarly, [:else, 2, 6, 2, 6, 15] reads that the else clause starts at line 6 & column 2 and ends at line 6 & column 15.

Most importantly, as per the branch coverage format, we can see that the branch from if to then was never executed since its counter is 0. The other branch, from if to else, was executed once since its counter is 1.

Method Coverage

Measuring method coverage helps us identify which methods were invoked and which were not.

We have a file grade_calculator.rb.

students_scores = { "Sam" => [53, 91, 72],
                    "Anna" => [91, 97, 95],
                    "Bob" => [33, 69, 63] }

def average(scores)
  scores.reduce(&:+)/scores.size
end

def grade(average_score)
  case average_score
  when 90.0..100.0 then :A
  when 80.0..90.0 then :B
  when 70.0..80.0 then :C
  when 60.0..70.0 then :D
  else :F
  end
end

def greet
  puts "Congratulations!"
end

def warn
  puts "Try hard next time!"
end


students_scores.each do |student_name, scores|
  achieved_grade = grade(average(scores))

  puts "#{student_name}, you've got '#{achieved_grade}' grade."

  if achieved_grade == :A
    greet
  elsif achieved_grade == :F
    warn
  end

  puts
end

To measure the method coverage of the above file, let’s create grade_calculator_coverage.rb, enabling the methods option on the Coverage#start method.

require "coverage"

Coverage.start(methods: true)
load "grade_calculator.rb"
p Coverage.result

Let’s run it using Ruby 2.5.

$ RBENV_VERSION=2.5.0 ruby grade_calculator_coverage.rb
Sam, you've got 'C' grade.

Anna, you've got 'A' grade.
Congratulations!

Bob, you've got 'F' grade.
Try hard next time!

{ "grade_calculator.rb" => {
    :methods => {
      [Object, :warn, 23, 0, 25, 3] => 1,
      [Object, :greet, 19, 0, 21, 3] => 1,
      [Object, :grade, 9, 0, 17, 3] => 3,
      [Object, :average, 5, 0, 7, 3] => 3
    }
  }
}

The format of the method coverage result is shown below.

[ CLASS_NAME,
  METHOD_NAME,
  START_LINE_NUMBER,
  START_COLUMN_NUMBER,
  END_LINE_NUMBER,
  END_COLUMN_NUMBER ]

Therefore, [Object, :grade, 9, 0, 17, 3] => 3 reads that the Object#grade method which starts from line 9 & column 0 to line 17 & column 3 was invoked 3 times.

Conclusion

We can also measure all the coverages at once.

Coverage.start(lines: true, branches: true, methods: true)

What’s the use of these different types of coverage anyway?

Well, one use case is to integrate this into a test suite to determine which lines, branches and methods are executed by the tests and which are not. Further, we can sum these up and evaluate the total coverage of the test suite.
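
As a rough illustration (our sketch, not part of the Coverage API), here is how one might compute a simple line coverage percentage from Coverage.result for the score.rb file above.

require "coverage"

Coverage.start(lines: true)
load "score.rb"
lines = Coverage.result["score.rb"][:lines]

relevant = lines.compact               # nil lines are not executable
executed = relevant.count { |hits| hits > 0 }
puts "Line coverage: #{(100.0 * executed / relevant.size).round(1)}%"

For score.rb above, this prints a line coverage of 75.0%, since 3 of the 4 relevant lines were executed.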

The author of this feature, Yusuke Endoh (mame), has released the coverage-helpers gem which allows further advanced manipulation and processing of coverage results obtained using Coverage#result.