Rails 6 raises ActiveModel::MissingAttributeError when update_columns is used with non-existing attribute

This blog is part of our Rails 6 series. Rails 6.0.0.beta2 was recently released.

Rails 6 raises ActiveModel::MissingAttributeError when update_columns is used with a non-existing attribute. Before Rails 6, update_columns raised an ActiveRecord::StatementInvalid error.

Rails 5.2

>> User.first.update_columns(email: 'amit@bigbinary.com')
SELECT  "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1]]
UPDATE "users" SET "email" = $1 WHERE "users"."id" = $2  [["email", "amit@bigbinary.com"], ["id", 1]]

=> Traceback (most recent call last):
        1: from (irb):8
ActiveRecord::StatementInvalid (PG::UndefinedColumn: ERROR:  column "email" of relation "users" does not exist)
LINE 1: UPDATE "users" SET "email" = $1 WHERE "users"."id" = $2
                           ^
: UPDATE "users" SET "email" = $1 WHERE "users"."id" = $2

Rails 6.0.0.beta2

>> User.first.update_columns(email: 'amit@bigbinary.com')
SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ?  [["LIMIT", 1]]

Traceback (most recent call last):
        1: from (irb):1
ActiveModel::MissingAttributeError (can't write unknown attribute `email`)

Here is the relevant commit.


Rails 6 changed ActiveRecord::Base.configurations result to an object

This blog is part of our Rails 6 series. Rails 6.0.0.beta2 was recently released.

Rails 6 changed the return value of ActiveRecord::Base.configurations to an object of ActiveRecord::DatabaseConfigurations. Before Rails 6, ActiveRecord::Base.configurations returned a hash with all the database configurations. We can call to_h on the ActiveRecord::DatabaseConfigurations object to get a hash.

A method named configs_for has also been added to fetch configurations for a particular environment.

Rails 5.2

>> ActiveRecord::Base.configurations

=> {"development"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/development.sqlite3"}, "test"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/test.sqlite3"}, "production"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/production.sqlite3"}}

Rails 6.0.0.beta2

>> ActiveRecord::Base.configurations

=> #<ActiveRecord::DatabaseConfigurations:0x00007fc18274f9f0 @configurations=[#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fc18274f680 @env_name="development", @spec_name="primary", @config={"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/development.sqlite3"}>, #<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fc18274f608 @env_name="test", @spec_name="primary", @config={"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/test.sqlite3"}>, #<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fc18274f590 @env_name="production", @spec_name="primary", @config={"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/production.sqlite3"}>]>

>> ActiveRecord::Base.configurations.to_h

=> {"development"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/development.sqlite3"}, "test"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/test.sqlite3"}, "production"=>{"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/production.sqlite3"}}

>> ActiveRecord::Base.configurations['development']

=> {"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/development.sqlite3"}

>> ActiveRecord::Base.configurations.configs_for(env_name: "development")

=> [#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fc18274f680 @env_name="development", @spec_name="primary", @config={"adapter"=>"sqlite3", "pool"=>5, "timeout"=>5000, "database"=>"db/development.sqlite3"}>]

Here is the relevant pull request.


Rails 6 shows unpermitted params in logs in color

Strong parameters allow us to control the user input in our Rails app. In the development environment, unpermitted parameters are shown in the log as follows.

Unpermitted params before Rails 6

It is easy to miss this message in the flurry of other messages.

Rails 6 has added a change to show these params in red color for better visibility.

Unpermitted params after Rails 6
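
As a quick refresher, here is a minimal sketch of strong parameters in a controller. The controller, model, and parameter names are hypothetical. Any attribute that is not whitelisted by permit, such as an admin flag injected by a malicious user, is dropped and reported in the log as an unpermitted parameter.

class UsersController < ApplicationController
  def create
    @user = User.new(user_params)
    @user.save!
    redirect_to @user
  end

  private

  def user_params
    # If the request also carries params[:user][:admin], it is dropped
    # and an unpermitted parameter message is written to the log.
    params.require(:user).permit(:name, :email)
  end
end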


Marketing strategy at BigBinary

BigBinary started in 2011. Here are our revenue numbers for the last 7 years.

BigBinary revenue

We achieved this without having any outbound marketing or sales strategy.

  • We have never sent a cold email.
  • We have never sent a cold LinkedIn message.
  • The only time we advertised was a two-month period when we tried Google advertisements, with no results.
  • We do not sponsor any podcast.
  • We have never had a salesperson.
  • We have never had a marketing person.

We have kept our heads down and focused on what we do best: designing, developing, debugging, devops, and blogging.

This is what has worked out for us so far:

  • We contribute to the community through blog posts and open source.
  • We sponsor community events like Rails Girls and Ruby Conf India.
  • We sponsor many React and Ruby meetups.
  • We focus on keeping our existing clients happy.

Over the years I have come across many people who aspire to be freelancers. While it is not for everyone, I encourage them to give freelancing a try.

The greatest hindrance I have seen is that they stress over sales and marketing, and rightly so. Being a freelancer means a constant need to find your next client.

I’m not here to say what others ought to do. I’m here to say what has worked out for BigBinary over the last 7 years.

While we plan to experiment with new forms of marketing, networking, and sales channels as we grow, these are not the be-all and end-all for freelancers. While marketing, networking, and sales may be effective for some, that was not how we started BigBinary and it may not be how you want to start either.

For us at BigBinary, it has been writing blogs. When we come across a potentially intriguing blog topic, we save the topic by creating a GitHub issue. When we have downtime, we pick up a topic from our issues list. It’s as simple as that and has been our primary driver of growth thus far.

You should experiment to find out what works best for you and what suits your personality. If you are good at teaching through videos, consider creating your own YouTube channel. If you contribute to open source, try blogging about your efforts and learnings. If you are good at concentrating on a niche technology, build your marketing and business around that.

I can confidently say that the majority of people I have met who want to be freelancers would do fine if they simply shared what they are learning. Most of these people do technical work. Some of them already blog and others could. Nearly everybody will say that a blog is a decent start. I am saying that it is a good end too.

If you do not want to do any other form of marketing then that’s fine too. Just blogging will work out fine for you, just like it has for us at BigBinary.

Just because you are going to be a freelancer doesn’t mean you have to change who you are. If you don’t like sending cold emails then don’t. If you do not like networking then that’s alright as well. Write personal emails, drop the corporate talk, show compassion and be genuine.

So go on and do some freelancing. It will teach you a lot about software development, business, life, managing money, creating value and capturing value. It will be rough at times and hard at times. But it will also be a ton of fun.


Rails 6 adds delete_by and destroy_by as ActiveRecord::Relation methods

This blog is part of our Rails 6 series. Rails 6.0.0.beta2 was recently released.

As described by DHH in the issue, Rails has find_or_create_by, find_by and similar methods to create and find records matching the specified conditions. Rails was missing a similar feature for deleting/destroying record(s).

Before Rails 6, deleting/destroying the record(s) matching a given condition was done as shown below.

  # Example to destroy all authors matching the given condition
  Author.find_by(email: "abhay@example.com").destroy
  Author.where(email: "abhay@example.com", rating: 4).destroy_all

  # Example to delete all authors matching the given condition
  Author.find_by(email: "abhay@example.com").delete
  Author.where(email: "abhay@example.com", rating: 4).delete_all

The above examples lacked the symmetry of the find_or_create_by and find_by methods.

In Rails 6, the new delete_by and destroy_by methods have been added as ActiveRecord::Relation methods. ActiveRecord::Relation#delete_by is short-hand for relation.where(conditions).delete_all. Similarly, ActiveRecord::Relation#destroy_by is short-hand for relation.where(conditions).destroy_all.

Here is how it can be used.

  # Example to destroy all authors matching the given condition using destroy_by
  Author.destroy_by(email: "abhay@example.com")
  Author.destroy_by(email: "abhay@example.com", rating: 4)

  # Example to delete all authors matching the given condition using delete_by
  Author.delete_by(email: "abhay@example.com")
  Author.delete_by(email: "abhay@example.com", rating: 4)

Check out the pull request for more details on this.


Rails 6 adds ActiveRecord::Relation#touch_all

This blog is part of our Rails 6 series. Rails 6.0.0.beta2 was recently released.

Before moving forward, we need to understand what the touch method is. By default, touch updates the updated_at timestamp to the current time. It also accepts a custom time or additional columns as parameters.
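
For instance, here is a quick sketch of touch in action. The last_seen_at column is hypothetical.

>> user = User.first
>> user.touch                       # updates updated_at to the current time
>> user.touch(time: 2.days.ago)     # updates updated_at to the given time
>> user.touch(:last_seen_at)        # updates last_seen_at along with updated_at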

Rails 6 has added touch_all on ActiveRecord::Relation to touch multiple records in one go. Before Rails 6, we needed to iterate over all the records to achieve this.

Let’s take an example in which we call touch_all on all User records.

Rails 5.2

>> User.count
SELECT COUNT(*) FROM "users"

=> 3

>> User.all.touch_all

=> Traceback (most recent call last):
        1: from (irb):2
NoMethodError (undefined method 'touch_all' for #<User::ActiveRecord_Relation:0x00007fe6261f9c58>)

>> User.all.each(&:touch)
SELECT "users".* FROM "users"
begin transaction
  UPDATE "users" SET "updated_at" = ? WHERE "users"."id" = ?  [["updated_at", "2019-03-05 17:45:51.495203"], ["id", 1]]
commit transaction
begin transaction
  UPDATE "users" SET "updated_at" = ? WHERE "users"."id" = ?  [["updated_at", "2019-03-05 17:45:51.503415"], ["id", 2]]
commit transaction
begin transaction
  UPDATE "users" SET "updated_at" = ? WHERE "users"."id" = ?  [["updated_at", "2019-03-05 17:45:51.509058"], ["id", 3]]
commit transaction

=> [#<User id: 1, name: "Sam", created_at: "2019-03-05 16:09:29", updated_at: "2019-03-05 17:45:51">, #<User id: 2, name: "John", created_at: "2019-03-05 16:09:43", updated_at: "2019-03-05 17:45:51">, #<User id: 3, name: "Mark", created_at: "2019-03-05 16:09:45", updated_at: "2019-03-05 17:45:51">]

Rails 6.0.0.beta2

>> User.count
SELECT COUNT(*) FROM "users"

=> 3

>> User.all.touch_all
UPDATE "users" SET "updated_at" = ?  [["updated_at", "2019-03-05 16:08:47.490507"]]

=> 3

touch_all returns the count of the records on which it is called.

touch_all also takes a custom time or different columns as parameters.

Rails 6.0.0.beta2

>> User.count
SELECT COUNT(*) FROM "users"

=> 3

>> User.all.touch_all(time: Time.new(2019, 3, 2, 1, 0, 0))
UPDATE "users" SET "updated_at" = ?  [["updated_at", "2019-03-02 00:00:00"]]

=> 3

>> User.all.touch_all(:created_at)
UPDATE "users" SET "updated_at" = ?, "created_at" = ?  [["updated_at", "2019-03-05 17:55:41.828347"], ["created_at", "2019-03-05 17:55:41.828347"]]

=> 3

Here is the relevant pull request.


Rails 6 adds negative scopes on enum

This blog is part of our Rails 6 series. Rails 6.0.0.beta2 was recently released.

When an enum attribute is defined on a model, Rails adds default scopes to filter records based on the values of the enum field.

Here is how these scopes can be used.

class Post < ActiveRecord::Base
  enum status: %i[drafted active trashed]
end

Post.drafted # => where(status: :drafted)
Post.active  # => where(status: :active)

In Rails 6, negative scopes are added on the enum values.

As mentioned by DHH in the pull request,

these negative scopes are convenient when you want to disallow access in controllers

Here is how they can be used.

class Post < ActiveRecord::Base
  enum status: %i[drafted active trashed]
end

Post.not_drafted # => where.not(status: :drafted)
Post.not_active  # => where.not(status: :active)

Check out the pull request for more details on this.


MJIT Support in Ruby 2.6

This blog is part of our Ruby 2.6 series. Ruby 2.6.0 was released on Dec 25, 2018.

What is JIT?

JIT stands for Just-In-Time compiler. It converts repeatedly executed code into native machine code which the processor can run directly, thereby saving the time of interpreting the same piece of code again and again.

Ruby 2.6

MJIT was introduced in Ruby 2.6. It is commonly known as MRI JIT or Method Based JIT.

It is part of the Ruby 3x3 project started by Matz. The name signifies that Ruby 3.0 will be 3 times faster than Ruby 2.0, and the project focuses mainly on performance. Apart from performance it also aims for the following.

  1. Portability
  2. Stability
  3. Security

MJIT is still in development and hence it is optional in Ruby 2.6. If you are running Ruby 2.6, execute the following command.

ruby --help

You will see the following options.

--jit-wait # Wait until JIT compilation finishes before resuming execution.
--jit-verbose=num # Print JIT logs of level num or less to stderr.
--jit-min-calls=num # Number of calls needed to trigger JIT compilation (default: 5).
--jit-max-cache=num # Maximum number of methods to keep in the JIT cache.
--jit-save-temps # Save the compiled C files and libraries to disk.

Vladimir Makarov proposed to improve performance by replacing VM instructions with RTL (Register Transfer Language) instructions and introducing a method based JIT compiler.

Vladimir explained MJIT architecture in his RubyKaigi 2017 conference keynote.

Ruby’s compiler converts the code to YARV (Yet Another Ruby VM) instructions and these instructions are then run by the Ruby virtual machine. Code that is executed often is converted to RTL instructions, which run faster.

Let’s take a look at how MJIT works.

# mjit.rb

require 'benchmark'

puts Benchmark.measure {
  def test_while
    start_time = Time.now
    i = 0

    while i < 4
      i += 1
    end

    puts "Time taken is #{Time.now - start_time}"
  end

  4.times { test_while }
}

Let’s run this code with MJIT options and check what we got.

ruby --jit --jit-verbose=1 --jit-wait --disable-gems mjit.rb
Time taken is 4.0e-06
Time taken is 0.0
Time taken is 0.0
Time taken is 0.0
  0.000082   0.000032   0.000114 (  0.000105)
Successful MJIT finish

Nothing interesting, right? Why is that? Because we call the method only 4 times and the default number of calls after which MJIT kicks in is 5. We can decide after how many calls MJIT should compile a method by providing the --jit-min-calls=num option.

Let’s tweak the program a bit so that MJIT gets some work to do.

require 'benchmark'

puts Benchmark.measure {
  def test_while
    start_time = Time.now
    i = 0

    while i < 4_00_00_000
      i += 1
    end

    puts "Time taken is #{Time.now - start_time}"
  end

  10.times { test_while }
}

After running the above code we can see some work done by MJIT.

Time taken is 0.457916
Time taken is 0.455921
Time taken is 0.454672
Time taken is 0.452823
JIT success (72.5ms): block (2 levels) in <main>@mjit.rb:15 -> /var/folders/v6/_6sh53vn5gl3lct18w533gr80000gn/T//_ruby_mjit_p66220u0.c
JIT success (140.9ms): test_while@mjit.rb:4 -> /var/folders/v6/_6sh53vn5gl3lct18w533gr80000gn/T//_ruby_mjit_p66220u1.c
JIT compaction (23.0ms): Compacted 2 methods -> /var/folders/v6/_6sh53vn5gl3lct18w533gr80000gn/T//_ruby_mjit_p66220u2.bundle
Time taken is 0.463703
Time taken is 0.102852
Time taken is 0.103335
Time taken is 0.103299
Time taken is 0.103252
Time taken is 0.103261
  2.797843   0.005357   3.141944 (  2.801391)
Successful MJIT finish

Here’s what’s happening. The method ran 4 times and on the 5th call MJIT found that the same code was being run again. So MJIT started a separate thread to convert the code into RTL instructions, which created a shared object library. Subsequent calls picked up that shared code and executed it directly. As we passed the --jit-verbose=1 option, we can see what MJIT did.

The output shows the following.

  1. Time taken to compile.
  2. Which block of code was compiled by JIT.
  3. Location of the compiled code.

We can open the file and see how MJIT converted the piece of code into binary instructions, but for that we need to pass another option, --jit-save-temps, and then inspect those files.

After the code is compiled to RTL instructions, take a look at the execution time. It dropped from 0.46 seconds to 0.10 seconds. That’s a neat speed bump.

Here is a comparison across some Ruby versions for some basic operations.

Ruby time comparison in different versions

Rails comparison on Ruby 2.5, Ruby 2.6 and Ruby 2.6 with JIT

Create a Rails application with each Ruby version and start a server. We can start the Rails server with the JIT option as shown below.

RUBYOPT="--jit" bundle exec rails s

Now we can start testing performance on the servers. What we found is that Ruby 2.6 is faster than Ruby 2.5, but enabling JIT in Ruby 2.6 does not add more value to a Rails application.

MJIT status and future directions

  • It is in an early development stage.
  • Doesn’t work on Windows.
  • Needs more time to mature.
  • Needs more optimisations.
  • MJIT can use either GCC or LLVM (Clang) as the C compiler.

Further reading

  1. Ruby 3x3 Performance Goal
  2. The method JIT compiler for Ruby2.6
  3. Vladimir Makarov’s Ruby Edition

Resolve foreign key constraint conflict while copying data using topological sort

We have a client that uses a multi-tenant setup where a separate database holds the data of each of their customers. Whenever a new customer is added, a service dynamically creates a new database for them. To seed this new database, we were tasked with implementing a feature to copy data from an existing “demo” database.

The “demo” database is actually a live database in which the sales team does demos. This ensures that the copied data is fresh and not stale.

We implemented a solution where we simply listed all the tables in the namespace and used the activerecord-import gem to copy the table data. We used activerecord-import to keep the code agnostic of the underlying database, as we used different databases in development and production. Production uses SQL Server and development uses PostgreSQL. Why this project ended up with different databases in development and production is worthy of a separate blog.

When we started using the above mentioned strategy, we quickly ran into a problem. Inserts for some tables were failing.

insert or update on table "dependent_table" violates foreign key constraint "fk_rails"
Detail: Key (column)=(1) is not present in table "main_table".

The issue was that we had foreign key constraints on some tables and the “dependent” table was being processed before the “main” table.

Initially we thought of simply hard coding the sequence in which to process the tables. But that would mean that whenever a new table is added we would have to update the service to include it. So we needed a way to identify the foreign key dependencies and determine the sequence in which to copy the tables at runtime. To resolve this, we decided to use Topological Sorting.

Topological Sorting

To get started we need the list of dependencies between the “main” and “dependent” tables. In PostgreSQL, the following SQL query fetches the table dependencies.

SELECT
    tc.table_name AS dependent_table,
    ccu.table_name AS main_table
FROM
    information_schema.table_constraints AS tc
    JOIN information_schema.key_column_usage AS kcu
      ON tc.constraint_name = kcu.constraint_name
      AND tc.table_schema = kcu.table_schema
    JOIN information_schema.constraint_column_usage AS ccu
      ON ccu.constraint_name = tc.constraint_name
      AND ccu.table_schema = tc.table_schema
WHERE constraint_type = 'FOREIGN KEY'
and (tc.table_name like 'namespace_%' or ccu.table_name like 'namespace_%');

=> dependent_table  | main_table
-----------------------------------
   dependent_table1 | main_table1
   dependent_table2 | main_table2

The above query fetches the dependencies only for the tables with the namespace prefix, i.e. the tables we are interested in. The output of the above query was [[dependent_table1, main_table1], [dependent_table2, main_table2]].

Ruby has a TSort module for implementing topological sorts, so we needed to run a topological sort on the dependencies. We inserted the dependencies into a hash and included the TSort functionality in it. Following is the way to include the TSort module in a hash by subclassing Hash.

require "tsort"

class TsortableHash < Hash
  include TSort

  alias tsort_each_node each_key

  def tsort_each_child(node, &block)
    fetch(node).each(&block)
  end
end
# Borrowed from https://www.viget.com/articles/dependency-sorting-in-ruby-with-tsort/

Then we simply added all the tables to the dependency hash, as shown below.

tables_to_sort = ["dependent_table1", "dependent_table2", "main_table1"]
dependency_graph = tables_to_sort.inject(TsortableHash.new) {|hash, table| hash[table] = []; hash }

table_dependency_map = fetch_table_dependencies_from_database
=> [["dependent_table1", "main_table1"], ["dependent_table2", "main_table2"]]

# Add missing tables to dependency graph
table_dependency_map.flatten.each {|table| dependency_graph[table] ||= [] }

table_dependency_map.each {|constraint| dependency_graph[constraint[0]] << constraint[1] }

dependency_graph.tsort
=> ["main_table1", "dependent_table1", "main_table2", "dependent_table2"]

The output above is the dependency-resolved sequence of tables.

Topological sorting is pretty useful in situations where we need to resolve dependencies, and Ruby provides a really helpful tool in TSort to implement it without going into the implementation details. Although I did spend time understanding the underlying algorithm for fun.


How to cache all files using Cloudflare worker along with HMAC authentication

Cloudflare is a Content Delivery Network (CDN) company that provides various network and security services. In March 2018, they released the “Cloudflare Workers” feature to the public. Cloudflare Workers allow us to write JavaScript code and run it on Cloudflare edges. This is helpful when we want to pre-process requests before forwarding them to the origin. In this post, we will explain how we implemented HMAC authentication while caching all files on Cloudflare edges.

We have a bunch of files hosted in S3 which are served through CloudFront. To reduce the CloudFront bandwidth cost and to make use of a global CDN (we use Price Class 100 in CloudFront), we decided to use Cloudflare for file downloads. This would help us cache files on Cloudflare edges and eventually reduce the bandwidth costs at the origin (CloudFront). But to do this, we had to solve a few problems.

We had been signing CloudFront download URLs to restrict their usage after a period of time. This means the file download URLs are always unique. Since Cloudflare caches files based on URLs, caching will not work when the URLs are signed. We had to remove the URL signing to get it working with Cloudflare, but we can’t allow people to continuously use the same download URL. Cloudflare Workers helped us with this.

We negotiated a deal with Cloudflare and upgraded our subscription to the Enterprise plan. The Enterprise plan lets us define a Custom Cache Key, using which we can configure Cloudflare to cache based on a user defined key. The Enterprise plan also increased the cache file size limits. We wrote the following Worker code, which configures a custom cache key and authenticates URLs using HMAC.

A Cloudflare worker starts by attaching a method to the "fetch" event.

addEventListener("fetch", event => {
  event.respondWith(verifyAndCache(event.request));
});

The verifyAndCache function can be defined as follows.

async function verifyAndCache(request) {
  /**
  source:

  https://jameshfisher.com/2017/10/31/web-cryptography-api-hmac.html
  https://github.com/diafygi/webcrypto-amples#hmac-verify
  https://stackoverflow.com/questions/17191945/conversion-between-utf-8-arraybuffer-and-string
  **/

  // Convert the string to array of its ASCII values
  function str2ab(str) {
    let uintArray = new Uint8Array(
      str.split("").map(function(char) {
        return char.charCodeAt(0);
      })
    );
    return uintArray;
  }

  // Retrieve the token from the query string; it is in the format "<time>-<auth_code>"
  function getFullToken(url, query_string_key) {
    let full_token = url.split(query_string_key)[1];
    return full_token
  }

  // Fetch the authentication code from token
  function getAuthCode(full_token) {
    let token = full_token.split("-");
    return token[1].split("/")[0];
  }

  // Fetch timestamp from token
  function getExpiryTimestamp(full_token) {
    let timestamp = full_token.split("-");
    return timestamp[0];
  }

  // Fetch file path from URL
  function getFilePath(url) {
    let url_obj = new URL(url);
    return decodeURI(url_obj.pathname)
  }

  const full_token = getFullToken(request.url, "&verify=")
  const token      = getAuthCode(full_token);
  const str        = getFilePath(encodeURI(request.url)) + "/" + getExpiryTimestamp(full_token);
  const secret     = "< HMAC KEY >";

  // Generate the SHA-256 hash from the secret string
  let key = await crypto.subtle.importKey(
    "raw",
    str2ab(secret),
    { name: "HMAC", hash: { name: "SHA-256" } },
    false,
    ["sign", "verify"]
  );

  // Sign the "str" with the key generated previously
  let sig = await crypto.subtle.sign({ name: "HMAC" }, key, str2ab(str));

  // convert the Arraybuffer "sig" in string and then, in Base64 digest, and then URLencode it
  let verif = encodeURIComponent(
    btoa(String.fromCharCode.apply(null, new Uint8Array(sig)))
  );

  // Get time in Unix epoch
  let time = Math.floor(Date.now() / 1000);

  if (time > getExpiryTimestamp(full_token) || verif != token) {
   // Render error response
    const init = {
      status: 403
    };
    const modifiedResponse = new Response(
      `Invalid token`,
      init
    );
    return modifiedResponse;
  } else {
    let url = new URL(request.url);

    // Generate a cache key from URL excluding the unique query string
    let cache_key = url.host + url.pathname;

    let headers = new Headers(request.headers)

    /**
    Set an optional header/auth token for additional security in origin.
    For example, using AWS Web Application Firewall (WAF), it is possible to create a filter
    that allows requests only with a custom header to pass through CloudFront distribution.
    **/
    headers.set("X-Auth-token", "< Optional Auth Token >")

    /**
    Fetch the file using cache_key. File will be served from cache if it's already there,
    or it will send the request to origin. Please note 'cacheKey' is available only in
    Enterprise plan.
    **/

    const response = await fetch(request, { cf: { cacheKey: cache_key }, headers: headers })
    return response;
  }
}

Once the worker is added, configure an associated route in "Workers -> Routes -> Add Route" in Cloudflare.

Add Cloudflare Worker route

Now, all requests will go through the configured Cloudflare worker. Each request will be verified using HMAC authentication and all files will be cached in Cloudflare edges. This would reduce bandwidth costs at the origin.


Replacing PhantomJS with headless Chrome

We recently replaced PhantomJS with ChromeDriver for system tests in a project since PhantomJS is no longer maintained. Many modern browser features required workarounds and hacks to work on PhantomJS. For example, the Element.trigger('click') method does not actually click an element but simulates a DOM click event. These workarounds meant that the code was not being tested as it would behave in a real production environment.

ChromeDriver Installation & Configuration

ChromeDriver is needed to use Chrome as the browser for system tests. It can be installed on macOS using homebrew.

brew cask install chromedriver

Remove poltergeist from Gemfile and add selenium-webdriver.

#Gemfile

- gem "poltergeist"
+ gem "selenium-webdriver"

Configure Capybara to use ChromeDriver by adding the following snippet.

require 'selenium-webdriver'

Capybara.register_driver(:chrome_headless) do |app|
  args = []
  args << 'headless' unless ENV['CHROME_HEADLESS']

  capabilities = Selenium::WebDriver::Remote::Capabilities.chrome(
    chromeOptions: { args: args }
  )

  Capybara::Selenium::Driver.new(
    app,
    browser: :chrome,
    desired_capabilities: capabilities
  )
end

Capybara.default_driver = :chrome_headless

The above code runs tests in headless mode by default. For debugging purposes we would like to see the actual browser. That can easily be done by executing the following command.

CHROME_HEADLESS=false bin/rails test:system

After switching from PhantomJS to headless Chrome, we ran into many test failures due to differences in the implementation of the Capybara API when using ChromeDriver. Here are solutions to some of the issues we faced.

1. Element.trigger(‘click’) does not exist

Element.trigger('click') simulates a DOM click event instead of actually clicking the element. This is a bad practice because the element might be obscured behind another element and still trigger the click. Selenium does not support this method. Element.click works as a solution, but it is not a drop-in replacement. We can replace Element.trigger('click') with Element.send_keys(:return) or by executing JavaScript to trigger the click event.

#example

find('.foo-link').trigger('click')

# solutions

find('.foo-link').click

# or

find('.foo-link').send_keys(:return)

# or
# if the link is not visible or is overlapped by another element

execute_script("$('.foo-link').click();")

2. Element is not visible to click

When we switched to Element.click, some tests were failing because the element was not visible as it was behind another element. The easiest way to fix these failing tests was to use Element.send_keys(:return), but the purpose of the test is to simulate a real user clicking the element. So we had to make sure the element is visible, and we fixed the UI issues which prevented that.

3. Setting the value of hidden fields does not work

When we try to set the value of a hidden input field using the set method of an element, Capybara throws an element not interactable error.

#example
find(".foo-field", visible: false).set("some text")
#Error: element not interactable

#solution
page.execute_script('$(".foo-field").val("some text")')

4. Element.visible? returns false if the element is empty

The ignore_hidden_elements option of Capybara is true by default. If ignore_hidden_elements is true, Capybara will only find elements which are visible on the page. Let’s say we have <div class="empty-element"></div> on our page. find(".empty-element").visible? returns false because Selenium considers empty elements to be invisible. This issue can be resolved by using visible: :any.

#example

#ignore hidden elements
Capybara.ignore_hidden_elements = true

find(".empty-element").visible?
# returns false

#solution
find('.empty-element', visible: :any)

#or

find('.empty-element', visible: :all)

#or

find('.empty-element', visible: false)

Rails 6 adds ActiveRecord::Relation#pick

Before Rails 6, selecting only the first value of a column from a set of records was cumbersome. Let’s say we want only the name of the first post with category "Rails 6".

>> Post.where(category: "Rails 6").limit(1).pluck(:name).first
   SELECT "posts"."name"
   FROM "posts"
   WHERE "posts"."category" = ?
   LIMIT ?  [["category", "Rails 6"], ["LIMIT", 1]]
=> "Rails 6 introduces awesome shiny features!"

In Rails 6, the new ActiveRecord::Relation#pick method has been added which provides a shortcut to select the first value.

>> Post.where(category: "Rails 6").pick(:name)
   SELECT "posts"."name"
   FROM "posts"
   WHERE "posts"."category" = ?
   LIMIT ?  [["category", "Rails 6"], ["LIMIT", 1]]
=> "Rails 6 introduces awesome shiny features!"

This method internally applies limit(1) on the relation before picking the first value, so it is useful when the relation is already reduced to a single row.

It can also select values for multiple columns.

>> Post.where(category: "Rails 6").pick(:name, :author)
   SELECT "posts"."name", "posts"."author"
   FROM "posts"
   WHERE "posts"."category" = ?
   LIMIT ?  [["category", "Rails 6"], ["LIMIT", 1]]
=> ["Rails 6.0 new features", "prathamesh"]

Target Tracking Policy for Auto Scaling

In July 2017, AWS introduced the Target Tracking Policy for Auto Scaling in EC2. It helps to autoscale based on metrics like average CPU utilization, load balancer requests per target, and so on. Simply stated, it scales resources up and down to keep the metric at a fixed value. For example, if the configured metric is average CPU utilization and the target value is 60%, the Target Tracking Policy will launch more instances when the average CPU utilization goes beyond 60%. It will automatically scale down when the usage decreases. Target Tracking Policy works using a set of CloudWatch alarms which are automatically set when the policy is configured.

It can be configured in EC2 -> Auto Scaling Groups -> Scaling Policies.

EC2 Target Tracking Policy

We can also configure a warm-up period so that the policy waits before launching more instances to keep the metric at the configured value.

Internally, we use Terraform to manage AWS resources. We can configure a Target Tracking Policy using Terraform as follows.

resource "aws_launch_configuration" "web_cluster" {
  name_prefix     = "staging-web-cluster"
  image_id        = "<image ID>"
  instance_type   = "<instance type>"
  key_name        = "<ssh key name>"
  security_groups = ["<security group>"]
  user_data       = "<user_data script>"

  root_block_device {
    volume_size = "<volume size>"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "web_cluster" {
  name                      = "staging-web-cluster-asg"
  min_size                  = "<min ASG size>"
  max_size                  = "<max ASG size>"
  default_cooldown          = "300"
  launch_configuration      = "${ aws_launch_configuration.web_cluster.name }"
  vpc_zone_identifier       = ["<subnet ID>"]
  health_check_type         = "EC2"
  health_check_grace_period = 300

  target_group_arns = ["<target group arn>"]
}

resource "aws_autoscaling_policy" "web_cluster_target_tracking_policy" {
  name                      = "staging-web-cluster-target-tracking-policy"
  policy_type               = "TargetTrackingScaling"
  autoscaling_group_name    = "${aws_autoscaling_group.web_cluster.name}"
  estimated_instance_warmup = 200

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    target_value = "60"
  }
}

Target Tracking Policy allows us to easily configure and manage autoscaling in EC2. It’s particularly helpful while running services like web servers.


Migrating Gumroad from RequireJS to webpack

BigBinary has been working with Gumroad for a while. The following blog post is published with permission from Gumroad and we are very grateful to Sahil for allowing us to discuss the work in such an open environment.

This application is a JavaScript-heavy application as most consumer-oriented applications are these days. We recently changed the JavaScript build system for Gumroad from RequireJS to webpack. We’d like to talk about how we went about doing this.

Gumroad’s web application is built using Ruby on Rails. The project was started way back in 2011 as this Hacker News post suggests. When we began working on the code it was building JavaScript assets through two systems: Sprockets and RequireJS. From what we could tell, all the code which was using a new (at the time) frontend framework was processed by RequireJS first and then Sprockets, whereas the JavaScript files usually present under app/assets/javascripts and vendor/assets/javascripts in a typical Rails application were present as well but were not processed by RequireJS. Also, there were some libraries which were sourced using Bower.

We were tasked with the work of migrating the RequireJS build system over to webpack and replacing Bower with NPM. The reason behind this was that we wanted to use newer tools with wider community support. Another reason was that we wanted to be able to take advantage of all the goodies that webpack comes with though that was not a strong motivation at that point.

We decided to break down the task into small pieces which could be worked on in iterations and, more importantly, could be shipped in iterations. This would enable us to work on other tasks in the application in parallel and not be blocked on a big chunk of work. Keeping that in mind we split the task in three different steps.

Step 1: Migrate from RequireJS to webpack with the minimal amount of changes in the actual code.

Step 2: Use NPM packages in place of Bower components.

Step 3: Use NPM packages in place of libraries present under vendor/assets/javascripts.

Step 1: Migrate from RequireJS to webpack with the minimal amount of changes in the actual code

The first thing we did here was create a new webpack.config.js configuration file which would be used by webpack. We did our best to accurately translate the configuration from the RequireJS configuration file using multiple resources available online.

Here is how most JavaScript files which were to be processed by RequireJS looked.

'use strict';

define([
      'braintree'
    , '$app/ui/product/edit'
    , '$app/data/product'
  ],

  function(Braintree, ProductEditUI, ProductData) {
    // Do something with Braintree, ProductEditUI, and ProductData
  }
);

As you can see, the code did not use the newer import statements which you’d see in comparatively newer JavaScript code. As we’ve mentioned earlier, our goal was to have minimal code changes so we did not want to change to import just yet. Luckily for us, webpack supports the define API for specifying dependencies. This meant that we would not need to change how dependencies were specified in any of the JavaScript files.

In this step we also changed the build system configuration (the webpack.config.js file in this case) to use NPM packages where possible instead of using libraries from the vendor/ directory. This meant that we would need to have aliases in place for instances where the package name was different from the name we had aliased the library to.

For example, this is how the ‘braintree’ alias was set earlier in order to refer to the Braintree SDK. With this in place, all the code had to do was mention that braintree was a dependency.

require.config({
  paths: {
    braintree: '/vendor/assets/javascripts/braintree-2.16.0'
  }
});

With the change to use the NPM package in place of the JavaScript file, the dependency sourcing did not work as expected because the NPM package name was ‘braintree-web’ and the source code was trying to load ‘braintree’, which was not known to the build system (webpack). In order to avoid making changes to the source code we used the “alias” feature provided by webpack as shown below.

module.exports = {
  resolve: {
    alias: {
      braintree: 'braintree-web',
    }
  }
};

We did this for all the dependencies which had been given an alias in the RequireJS configuration and we got dependency resolution to work as expected.

As a part of this step, we also created a new common chunk and used it to improve caching. You can read more about this feature here. Note that we would tweak this iteratively later but we thought it would be good to get started with the basic configuration right away.

Step 2: Use NPM packages in place of Bower components

Another goal of the migration was to remove Bower so as to make the build system simpler. The first reason behind this was that all the Bower packages we were using were available as NPM packages. The second reason was that Bower itself has been recommending that users migrate to Yarn/webpack for a while now.

What we did here was simple. We removed Bower and the Bower configuration file. Then, we sourced the required Bower components as NPM packages instead by adding them to package.json. We also removed the aliases added to source them from the webpack configuration.

For example, here’s the change required to the configuration file after sourcing clipboard as an NPM package instead of a Bower component.

resolve: {
  alias: {
    // Other Code

    $app:           path.resolve(__dirname, '../../app/javascript'),
    $lib:           path.resolve(__dirname, '../../lib/assets/javascripts')
-   clipboard:      path.resolve(__dirname, '../../vendor/assets/javascripts/clipboard.min.js')
  }
}

Step 3: Use NPM packages in place of libraries present under vendor/assets/javascripts

We had a lot of JavaScript libraries present under vendor/assets/javascripts which were sourced in the required JavaScript files. We deleted those files from the project and sourced them as NPM packages instead. This way we could have better visibility and control over the versions of these packages.

As part of this migration we also did some asset-related cleanups. These included removing unused JavaScript files, including JavaScript files only where required instead of sourcing them into the global scope, etc.

We were continuously measuring the performance of the application before and after applying changes to make sure that we were not worsening the performance during the migration. In the end, we found that we had improved the page load speeds by an average of 2%. Note that this task was not undertaken to improve the performance of the application. We are now planning to leverage webpack features and try to improve on this metric further.


Rails 5 Active Record attributes API

This blog is part of our Rails 5 series.

Rails 5 was a major release with a lot of new features like Action Cable, API applications, etc. The Active Record attributes API was also one of the features of the Rails 5 release, but it did not receive much attention.

The Active Record attributes API has been used by Rails internally for a long time. In the Rails 5 release, the attributes API was made public, with support for custom types.

What is the attributes API?

The attributes API converts the attribute value to an appropriate Ruby type. Here is what the syntax looks like.

attribute(name, cast_type, options)

The first argument is the name of the attribute and the second argument is the cast type. The cast type can be string, integer, or a custom type object.

# db/schema.rb

create_table :movie_tickets, force: true do |t|
  t.float :price
end

# without attribute API

class MovieTicket < ActiveRecord::Base
end

movie_ticket = MovieTicket.new(price: 145.40)
movie_ticket.save!

movie_ticket.price   # => Float(145.40)

# with attribute API

class MovieTicket < ActiveRecord::Base
  attribute :price, :integer
end

movie_ticket.price   # => 145

Before using the attribute API, the movie ticket price was a float value, but after applying attribute on price, the price value is typecast to an integer.

The database still stores the price as a float; this conversion happens only in Ruby land.
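
We can verify this by comparing the cast value with the raw value using the *_before_type_cast reader. This is a sketch; the exact raw value depends on the database adapter.

movie_ticket = MovieTicket.last

movie_ticket.price                    # => 145
movie_ticket.price_before_type_cast   # => 145.4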

Now, we will typecast the movie release_date from datetime to date type.

# db/schema.rb

create_table :movies, force: true do |t|
  t.datetime :release_date
end

class Movie < ActiveRecord::Base
  attribute :release_date, :date
end

movie.release_date # => Thu, 01 Mar 2018

We can also add a default value for an attribute.

# db/schema.rb

create_table :movies, force: true do |t|
  t.string :license_number
end

class Movie < ActiveRecord::Base
  attribute :license_number,
            :string,
            default: "IN00#{Date.current.strftime('%Y%m%d')}00#{rand(100)}"
end

# without attribute API with default value on license number

Movie.new.license_number  # => nil

# with attribute API with default value on license number

Movie.new.license_number  # => "IN00201805250068"

Custom Types

Let’s say we want people to rate a movie as a percentage. Traditionally, we would do something like this.

class MovieRating < ActiveRecord::Base

  TOTAL_STARS = 5

  before_save :convert_percent_rating_to_stars

  def convert_percent_rating_to_stars
    rating_in_percentage = rating.gsub(/\%/, '').to_f

    self.rating = (rating_in_percentage * TOTAL_STARS) / 100
  end
end

With the attributes API we can create a custom type which will be responsible for casting the percentage rating to a number of stars.

We have to define the cast method in the custom type class which casts the given value to the expected output.

# db/schema.rb

create_table :movie_ratings, force: true do |t|
  t.integer :rating
end

# app/types/star_rating_type.rb

class StarRatingType < ActiveRecord::Type::Integer
  TOTAL_STARS = 5

  def cast(value)
    if value.present? && !value.kind_of?(Integer)
      rating_in_percentage = value.gsub(/\%/, '').to_i

      star_rating = (rating_in_percentage * TOTAL_STARS) / 100
      super(star_rating)
    else
      super
    end
  end
end

# config/initializers/types.rb

ActiveRecord::Type.register(:star_rating, StarRatingType)

# app/models/movie.rb

class MovieRating < ActiveRecord::Base
  attribute :rating, :star_rating
end

Querying

The attributes API also supports the where clause. The query will be converted to SQL by calling the serialize method on the type object.

class StarRatingType < ActiveRecord::Type::Integer
  TOTAL_STARS = 5

  def serialize(value)
    if value.present? && !value.kind_of?(Integer)
      rating_in_percentage = value.gsub(/\%/, '').to_i

      star_rating = (rating_in_percentage * TOTAL_STARS) / 100
      super(star_rating)
    else
      super
    end
  end
end


# Add new movie rating with rating as 25.6%.
# So the movie rating in stars will be 1 of 5 stars.
movie_rating = MovieRating.new(rating: "25.6%")
movie_rating.save!

movie_rating.rating   # => 1

# Querying with rating in percentage 25.6%
MovieRating.where(rating: "25.6%")

# => #<ActiveRecord::Relation [#<MovieRating id: 1000, rating: 1 ... >]>

Passing current_user by default in Sidekiq

In one of our projects we need to capture user activity throughout the application. For example, when a user updates the projected distance of a delivery, the application should create an activity for that action. To create an activity we need the id of the currently logged in user since we need to associate the activity with that user.

We are using the devise gem for authentication, which provides the current_user method by default in controllers. Any business logic residing at the controller level can use current_user to associate the activity with the logged in user. However, some business logic resides in Sidekiq jobs, where current_user is not available.

Passing current_user to Sidekiq job

One way to solve this issue is to pass the current_user directly to the Sidekiq job. Here’s how we can do it.

  class DeliveryController < ApplicationController
    def update
      # update attributes
      DeliveryUpdateWorker.
        perform_async(params[:delivery], current_user.login)
      # render delivery
    end
  end
  class DeliveryUpdateWorker
    include Sidekiq::Worker

    def perform(delivery, user_login)
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

That works. Now let’s say we add another endpoint in which we need to track when a delivery is deleted. Here’s the updated code.

  class DeliveryController < ApplicationController
    def update
      # update attributes
      DeliveryUpdateWorker.
        perform_async(params[:delivery], current_user.login)
      # render delivery
    end

    def destroy
      # delete attributes
      DeliveryDeleteWorker.
        perform_async(params[:delivery], current_user.login)
      # render :ok
    end
  end
  class DeliveryDeleteWorker
    include Sidekiq::Worker

    def perform(delivery, user_login)
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

Again we needed to pass the current_user login in the new endpoint. You can notice a pattern here. For each endpoint which needs to track activity we need to pass current_user. What if we could pass the current_user info by default?

The main reason we want to pass current_user by default is that we’re tracking model attribute changes in the models’ before_save callbacks.

For this we store the current_user info in Thread.current and access it in the before_save callbacks of the model which generates the relevant activity.
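
Here is a rough sketch of what such a callback could look like. The Delivery model, the callback name and the Thread.current key are illustrative, not the exact implementation.

  class Delivery < ApplicationRecord
    before_save :generate_activity

    private

    def generate_activity
      # The controller layer stores the logged in user's login in Thread.current
      user = User.find_by(login: Thread.current[:request_user_login])
      ActivityGenerationService.new(self, user) if user
    end
  end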

This works fine for model attribute changes made in controllers and services where Thread.current is accessible and persisted. However, for Sidekiq jobs which change the model attributes whose activity is generated, we need to pass the current_user manually since the request’s Thread.current is not available in Sidekiq jobs.

Again we can argue here that we don’t need to pass the current_user by default. Instead we can pass it in each Sidekiq job as an argument. This will work in simple cases, although for more complex cases this will require extra effort.

For example, let’s say we’re tracking a delivery’s cost. We have three Sidekiq jobs: DeliveryDestinationChangeWorker, DeliveryRouteChangeWorker and DeliveryCostChangeWorker. We call DeliveryDestinationChangeWorker, which changes the destination of a delivery. This calls DeliveryRouteChangeWorker, which calculates the new route and calls DeliveryCostChangeWorker. Finally DeliveryCostChangeWorker changes the delivery cost, which is where the before_save callback is called.

In this example you can see that we need to pass current_user through all three Sidekiq workers and initialize Thread.current in DeliveryCostChangeWorker, as sketched below. The nesting can go much deeper.
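
To illustrate the overhead, here is a rough sketch of how the user login would have to be threaded through the chain manually. The worker bodies are simplified placeholders.

  class DeliveryDestinationChangeWorker
    include Sidekiq::Worker

    def perform(delivery_id, user_login)
      # change the destination, then pass the user along
      DeliveryRouteChangeWorker.perform_async(delivery_id, user_login)
    end
  end
  class DeliveryRouteChangeWorker
    include Sidekiq::Worker

    def perform(delivery_id, user_login)
      # recalculate the route, then pass the user along again
      DeliveryCostChangeWorker.perform_async(delivery_id, user_login)
    end
  end
  class DeliveryCostChangeWorker
    include Sidekiq::Worker

    def perform(delivery_id, user_login)
      # only here is the user actually needed, to populate Thread.current
      # before the cost update fires the before_save callback
      Thread.current[:request_user_login] = user_login
      # ... update the delivery cost ...
    end
  end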

Passing current_user by default makes sure that if the activity is being generated in a model’s before_save callback, it can access the current_user info from Thread.current no matter how deeply nested the Sidekiq call chain is.

It also makes sure that if a developer adds another Sidekiq worker class in the future which changes a model whose attribute change needs to be tracked, the developer need not remember to pass current_user explicitly to that worker.

Note the presented problem in this blog is an oversimplified version in order to better present the solution.

Creating a wrapper module to include current_user by default

The most basic solution to pass current_user by default is to create a wrapper module. This module will be responsible for adding the current_user when perform_async is invoked. Here’s an example.

  module SidekiqMediator
    def perform_async(klass, *args)
      args.push(current_user.login)
      klass.send(:perform_async, *args)
    end
  end
  class DeliveryController < ApplicationController
    include SidekiqMediator

    def update
      # update attributes
      perform_async(DeliveryUpdateWorker, params[:delivery])
      # render delivery
    end

    def destroy
      # delete attributes
      perform_async(DeliveryDeleteWorker, params[:delivery])
      # render :ok
    end
  end
  class DeliveryDeleteWorker
    include Sidekiq::Worker

    def perform(delivery, user_login)
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

Now we don’t need to pass the current_user login in each call. However we still need to remember to include SidekiqMediator wherever we need current_user in a Sidekiq job for activity generation. Another way to solve this problem is to intercept the Sidekiq job before it is pushed to Redis. Then we can include the current_user login by default.

Using Sidekiq client middleware to pass current_user by default

Sidekiq provides a client middleware to run custom logic before pushing the job in redis. We can use the client middleware to push current_user as default argument in the Sidekiq arguments. Here’s an example of Sidekiq client middleware.

  class SidekiqClientMiddleware
    def call(worker_class, job, queue, redis_pool = nil)
      # Do something before pushing the job in redis
      yield
    end
  end

We need a way to introduce current_user in the Sidekiq arguments. The job payload contains the arguments passed to the Sidekiq worker. Here’s what the job payload looks like.

  {
    "class": "DeliveryDeleteWorker",
    "jid": "b4a577edbccf1d805744efa9",
    "args": [1, "arg", true],
    "created_at": 1234567890,
    "enqueued_at": 1234567890
  }

Notice the args key, which is an array containing the arguments passed to the Sidekiq worker. We can push the current_user into the args array. This way each Sidekiq job will have current_user as the last argument by default. Here’s the modified version of the client middleware which includes current_user by default.

  class SidekiqClientMiddleware
    def call(_worker_class, job, _queue, _redis_pool = nil)
      # Push current user login as the last argument by default
      job['args'].push(current_user.login)
      yield
    end
  end

Now we don’t need to pass current_user login to Sidekiq workers in the controller. Here’s how our controller logic looks like now.

  class DeliveryController < ApplicationController
    def update
      # update attributes
      DeliveryUpdateWorker.perform_async(params[:data])
      # render delivery
    end

    def destroy
      # delete attributes
      DeliveryDeleteWorker.perform_async(params[:data])
      # render :ok
    end
  end

We don’t need SidekiqMediator anymore. The current_user will automatically be included as the last argument in every Sidekiq job.

Although there’s one issue here. We included current_user by default in every Sidekiq worker. This means workers which do not expect current_user as an argument will also get current_user as their last argument. This will raise ArgumentError: wrong number of arguments (2 for 1). Here’s an example.

  class DeliveryCreateWorker
    include Sidekiq::Worker

    def perform(data)
      # doesn't use current_user login to track activity when called
      # this will get data, current_user_login as the arguments
    end
  end

To solve this we need to extract current_user argument from job['args'] before the worker starts processing.

Using Sidekiq server middleware to extract current_user login

Sidekiq also provides server middleware which runs before processing any Sidekiq job. We used this to extract current_user from job['args'] and saved it in a global state.

This global state should persist when the server middleware execution is complete and the actual Sidekiq job processing has started. Here’s the server middleware.

  class SidekiqServerMiddleware
    def call(_worker, job, _queue)
      set_request_user(job['args'].pop)
      yield
    end

    private
    def set_request_user(request_user_login)
      RequestStore.store[:request_user_login] = request_user_login
    end
  end

Notice here we used pop to extract the last argument. Since we’re setting the last argument to current_user in the client middleware, the last argument will always be the current_user in server middleware.

Using pop also removes current_user from job['args'] which ensures the worker does not get current_user as an extra argument.

We used request_store to persist a global state. RequestStore provides a per request global storage using Thread.current which stores info as a key value pair. Here’s how we used it in Sidekiq workers to access the current_user info.

  class DeliveryDeleteWorker
    include Sidekiq::Worker

    def perform(delivery)
      user_login = RequestStore.store[:request_user_login]
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

Now we don’t need to pass current_user in the controller when calling the Sidekiq worker. Also we don’t need to add user_login as an extra argument in each Sidekiq worker every time we need to access current_user.

Configure server middleware for Sidekiq test cases

By default Sidekiq does not run server middleware in inline and fake mode.

Because of this, current_user was being added by the client middleware but was never extracted, since the server middleware was never called.

This resulted in ArgumentError: wrong number of arguments (2 for 1) failures in our test cases which used Sidekiq in inline or fake mode. We solved this by adding following config:

  Sidekiq::Testing.server_middleware do |chain|
    chain.add SidekiqServerMiddleware
  end

This ensures that SidekiqServerMiddleware is called in inline and fake mode in our test cases.

However, we found an alternative which was much simpler and cleaner. We noticed that the job payload is a plain hash which is pushed to Redis as-is and is also available in the server middleware.

Instead of adding the current_user as an argument in job['args'], we could add another key to the job payload itself to hold the current_user. Here’s the modified logic.

  class SidekiqClientMiddleware
    def call(_worker_class, job, _queue, _redis_pool = nil)
      # Set current user login in job payload
      job['request_user_login'] = current_user.login if defined?(current_user)
      yield
    end
  end

  class SidekiqServerMiddleware
    def call(_worker, job, _queue)
      if job.key?('request_user_login')
        set_request_user(job['request_user_login'])
      end
      yield
    end

    private
    def set_request_user(request_user_login)
      RequestStore.store[:request_user_login] = request_user_login
    end
  end

We used a unique key, request_user_login, which would not conflict with the other keys in the job payload. Additionally, we added a check for whether the request_user_login key is present in the job payload. This is necessary because if the worker is called from the console, current_user will not be set.

Apart from this, we noticed that we had multiple API services talking to each other. These services also generated user activity. A few of them didn’t use Devise for authentication; instead, the requesting user’s info was passed to them as a param in each request.

For these services we set the requesting user’s info in RequestStore.store in our BaseApiController and changed the client middleware to use RequestStore.store instead of the current_user method.
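
Here’s a minimal sketch of how such a controller hook could look. The filter name and the requesting_user_login param are assumptions for illustration, not our exact code.

  class BaseApiController < ApplicationController
    before_action :set_request_user

    private

    # The requesting user's login is assumed to arrive as a param;
    # store it so the Sidekiq client middleware can read it later.
    def set_request_user
      RequestStore.store[:request_user_login] = params[:requesting_user_login]
    end
  end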

We also initialized RequestStore.store in the services where we used Devise, to make the middleware completely independent of current_user. Here’s how our client middleware looks now.

  class SidekiqClientMiddleware
    def call(_worker_class, job, _queue, _redis_pool = nil)
      # Set current user login in job payload
      if RequestStore.store[:request_user_login]
        job['request_user_login'] = RequestStore.store[:request_user_login]
      end
      yield
    end
  end

Lastly we needed to register the client and server middleware in Sidekiq.

Configuring Sidekiq middleware

To enable the middleware with Sidekiq, we need to register the client middleware and the server middleware in config/initializers/sidekiq.rb. Here’s how we did it.

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add SidekiqClientMiddleware
  end
end

Sidekiq.configure_server do |config|
  config.client_middleware do |chain|
    chain.add SidekiqClientMiddleware
  end
  config.server_middleware do |chain|
    chain.add SidekiqServerMiddleware
  end
end

Notice that we added SidekiqClientMiddleware in both the configure_server block and the configure_client block. This is because a Sidekiq job can enqueue another Sidekiq job, in which case the Sidekiq server itself acts as the client.

To sum it up, here’s how our client middleware and server middleware finally looked.

  class SidekiqClientMiddleware
    def call(_worker_class, job, _queue, _redis_pool = nil)
      # Set current user login in job payload
      if RequestStore.store[:request_user_login]
        job['request_user_login'] = RequestStore.store[:request_user_login]
      end
      yield
    end
  end

  class SidekiqServerMiddleware
    def call(_worker, job, _queue)
      if job.key?('request_user_login')
        set_request_user(job['request_user_login'])
      end
      yield
    end

    private
    def set_request_user(request_user_login)
      RequestStore.store[:request_user_login] = request_user_login
    end
  end

The controller example we mentioned initially now looks like this:

  class DeliveryController < ApplicationController
    def update
      # update attributes
      DeliveryUpdateWorker.perform_async(params[:delivery])
      # render delivery
    end

    def destroy
      # delete attributes
      DeliveryDeleteWorker.perform_async(params[:delivery])
      # render :ok
    end
  end

  class DeliveryDeleteWorker
    include Sidekiq::Worker

    def perform(delivery)
      user_login = RequestStore.store[:request_user_login]
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

  class DeliveryUpdateWorker
    include Sidekiq::Worker

    def perform(delivery)
      user_login = RequestStore.store[:request_user_login]
      user = User.find_by(login: user_login)
      ActivityGenerationService.new(delivery, user) if user
    end
  end

Now we don’t need to explicitly pass current_user to each Sidekiq job. It’s available out of the box without any changes in all Sidekiq jobs.

As an alternative we can also use ActiveSupport::CurrentAttributes.
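
Here’s a minimal sketch of that alternative, assuming a Current class with a user_login attribute; it’s illustrative rather than a drop-in replacement for the code above.

  class Current < ActiveSupport::CurrentAttributes
    attribute :user_login
  end

  # Set once per request, for example in a controller before_action (assumed):
  #   Current.user_login = current_user.login
  #
  # The client middleware could then read Current.user_login, and the server
  # middleware could set it back before the job runs, instead of using
  # RequestStore.store[:request_user_login].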



Optimize loading multiple routes on Google map using B-spline

Applications use Google Maps for showing routes from point A to point B. For one of our clients we needed to show delivery routes on Google Maps so that users can select multiple deliveries and consolidate them into a single delivery. This meant we needed to show around 30 to 500 deliveries on a single map.

Using Google Map polylines

We used polylines to draw individual routes on Google Maps.

A polyline is composed of line segments connecting a list of points on the map. The more points we use for drawing a polyline, the more detailed the final curve will be. Here’s how we added route points to the map.

// List of latitude and longitude
let path = points.map((point) => [point.lat, point.lng]);

let route_options = {
  path: path,
  strokeColor: color,
  strokeOpacity: 1.0,
  strokeWeight: mapAttributes.strokeWeight || 3,
  map: map, // google.maps.Map
};

new google.maps.Polyline(route_options);

Here’s an example of a polyline on a Google map. We used 422 latitude and longitude points to draw these routes, which makes them look contiguous.

Polyline example

We needed to show 200 deliveries on that map. On average a delivery contains around 500 route points. This means we need to load 200 * 500 = 100,000 route points. Let’s measure how much time the whole process takes.

Loading multiple routes on a map

Plotting a single route on a map can be done in less than a second. However, as we increase the number of routes to plot, the payload size increases, which affects the load time. This is because we have around 500 route points per delivery. If we want to show 500 deliveries on the map then we need to load 500 * 500 = 250,000 route points. Let’s benchmark the load time it takes to show deliveries on a map.

No. of deliveries Load Time Payload Size
500 8.77s 12.3MB
400 7.76s 10.4MB
300 6.68s 7.9MB
200 5.88s 5.3MB
100 5.47s 3.5MB

The load time is more than 5 seconds, which is high. What if we could decrease the payload size and still be able to plot the routes?

Sampling route points for decreased payload size

For each delivery we have around 500 route points. If we drop a few route points at regular intervals, we’ll be able to decrease the payload size. Latitude and longitude values have at least 5 decimal places. We rounded them off to 1 decimal place and then picked the unique values.

  def route_lat_lng_points
    return '' unless delivery.route_lat_lng_points

    delivery.route_lat_lng_points.
        chunk{ |point| [point.first.round(1), point.second.round(1)] }.
        map(&:first).join(',')
  end

Now let’s check the payload size and the load time.

No. of deliveries Load Time Payload Size
500 6.52s 6.7MB
400 5.97s 5.5MB
300 5.68s 4.2MB
200 4.88s 2.9MB
100 4.07s 2.0MB

The payload size decreased by about 50 percent. However, since we sampled the data, the routes are not contiguous anymore. Here’s how it looks now.

Sampled routes vs. contiguous routes

Note that we sampled route points down to a single decimal place, so the sampled routes appear jagged instead of contiguous. We can solve this by using a curve fitting method to create a smooth curve from the discrete points we have.

Curve fitting using B-spline function

A B-spline, or basis spline, is a spline function which can be used for creating smooth curves that best fit a set of control points. Here’s an example of a B-spline curve created from a set of control points.

Bspline example

We changed our previous example to use a B-spline function to generate the latitude and longitude points.

// List of latitude and longitude
let lats = points.map((point) => point.lat);
let lngs = points.map((point) => point.lng);
let path = bspline(lats, lngs);

let route_options = {
  path: path,
  strokeColor: color,
  strokeOpacity: 1.0,
  strokeWeight: mapAttributes.strokeWeight || 3,
  map: map, // instance of google.maps.Map
};

new google.maps.Polyline(route_options);

function bspline(lats, lngs) {
  let i, t, ax, ay, bx, by, cx, cy, dx, dy, lat, lng, points;
  points = [];

  for (i = 2; i < lats.length - 2; i++) {
    for (t = 0; t < 1; t += 0.2) {
      ax = (-lats[i - 2] + 3 * lats[i - 1] - 3 * lats[i] + lats[i + 1]) / 6;
      ay = (-lngs[i - 2] + 3 * lngs[i - 1] - 3 * lngs[i] + lngs[i + 1]) / 6;

      bx = (lats[i - 2] - 2 * lats[i - 1] + lats[i]) / 2;
      by = (lngs[i - 2] - 2 * lngs[i - 1] + lngs[i]) / 2;

      cx = (-lats[i - 2] + lats[i]) / 2;
      cy = (-lngs[i - 2] + lngs[i]) / 2;

      dx = (lats[i - 2] + 4 * lats[i - 1] + lats[i]) / 6;
      dy = (lngs[i - 2] + 4 * lngs[i - 1] + lngs[i]) / 6;

      lat = (ax * Math.pow(t + 0.1, 3)) +
            (bx * Math.pow(t + 0.1, 2)) +
            (cx * (t + 0.1)) + dx;

      lng = (ay * Math.pow(t + 0.1, 3)) +
            (by * Math.pow(t + 0.1, 2)) +
            (cy * (t + 0.1)) + dy;

      points.push(new google.maps.LatLng(lat, lng));
    }
  }
  return points;
}

Source: https://johan.karlsteen.com/2011/07/30/improving-google-maps-polygons-with-b-splines

After the change the plotted routes are much better. Here’s how it looks now.

B-spline routes vs. contiguous routes

The only downside here is that if we zoom into the map, we’ll notice that the routes do not exactly overlap the Google Maps paths. Otherwise, we’re able to plot almost the same routes with the sampled route points. However, we still need 6.5 seconds to load 500 deliveries. How do we fix that?

Loading deliveries in batches

Sometimes users have up to 500 deliveries but want to change only a few of them and then continue using the application. The way the application was set up, users had no choice but to wait until all 500 deliveries were loaded before they could change anything. This is not ideal.

We want to show deliveries as soon as they’re loaded. We added a polling mechanism that loads batches of 20 deliveries and plots each batch on the map as soon as it’s loaded. This way the user can interact with the loaded deliveries while the remaining deliveries are still being fetched.

  loadDeliveriesWindow(updatedState = {}, lastPage = 0, currentWindow = 1) {
    // windowSize: Size of the batch to be loaded
    // perPage: No of deliveries per page
    const { perPage, windowSize } = this.state;

    if (currentWindow > perPage / windowSize) {
      // Streaming deliveries ended
      this.setState($.extend(updatedState, { windowStreaming: false }));
      return;
    }

    if (currentWindow === 1) {
      // Streaming deliveries started
      this.setState({ windowStreaming: true });
    }

    // Gets delivery data from backend
    this.fetchDeliveries(currentWindow + (lastPage * windowSize), queryParams).complete(() => {
      // Plots deliveries on map
      this.loadDeliveries();

      // Load the next batch of deliveries
      setTimeout((() => {
        this.loadDeliveriesWindow(updatedState, lastPage, currentWindow + 1);
      }).bind(this, currentWindow, updatedState, lastPage), 100);
    });
  }

Here’s a comparison of how the user experience changed.

Streaming deliveries vs. normal loading

Notice that loaded deliveries are instantly plotted and the user can start interacting with them, whereas if we load all the deliveries before plotting them, the user has to wait for everything to finish loading. This made the user experience much better when more than 100 deliveries were loaded.

Serializing route points list without brackets

One more optimization we did was to change how the route points were serialized.

The serialized route points contained opening and closing square brackets. So let’s say the route points are

[[25.57167, -80.421271], [25.676544, -80.388611], [25.820025, -80.386488],...].

After serialization they looked like

[[25.57167,-80.421271], [25.676544,-80.388611], [25.820025,-80.386488],...].

For each route point we have an extra pair of opening and closing square brackets, which can be avoided.

We could get rid of the brackets by concatenating the route points array and converting it to a string. After conversion it looked like this.

"25.57167,-80.421271|25.676544,-80.388611|25.820025,-80.386488|..."

On the client side we converted it back to an array. This reduced the payload size by 0.2MB for dense routes.
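
Here’s a minimal sketch of the serialization described above. The method names are illustrative, and in our app the client-side conversion was done in JavaScript; the Ruby version of the reverse mapping is shown here only to illustrate the idea.

  def serialize_route_points(points)
    # [[25.57167, -80.421271], [25.676544, -80.388611]]
    # => "25.57167,-80.421271|25.676544,-80.388611"
    points.map { |point| point.join(',') }.join('|')
  end

  def deserialize_route_points(string)
    # "25.57167,-80.421271|25.676544,-80.388611"
    # => [[25.57167, -80.421271], [25.676544, -80.388611]]
    string.split('|').map { |pair| pair.split(',').map(&:to_f) }
  end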

Note that this is a trade-off between client-side processing and network bandwidth. On modern computers the extra client-side processing is negligible. For our clients, network bandwidth was the more critical resource, so we optimized for it.


Deploying feature branches to have a review app

BigBinary has been working with Gumroad for a while. The following blog post has been published with permission from Gumroad, and we are very grateful to Sahil for allowing us to discuss the work in such an open environment.

A staging environment helps us test code before pushing it to production. However, it becomes hard to manage the staging environment when more people work on different parts of the application. This can be solved by implementing a system where each feature branch gets its own individual staging environment.

Heroku has a Review Apps feature which can deploy different branches separately. Gumroad doesn’t use Heroku, so we built a custom in-house solution.

The first step was to build the infrastructure. We created a new Auto Scaling Group, an Application Load Balancer and a route in AWS for the review apps. The load balancer and route are common to all review apps, but a new EC2 instance is created in the ASG when a new review app is commissioned.

review app architecture

The main challenge was to forward incoming requests to the correct server running the review app. This was made possible using Lua in Nginx along with Consul. When a review app is deployed, it writes its IP and port to Consul along with its hostname. Each review app server runs an instance of OpenResty (Nginx + Lua modules) with the following configuration.

server {
  listen                   80;
  server_name              _;
  server_name_in_redirect  off;
  port_in_redirect         off;

  try_files $uri/index.html $uri $uri.html @app;

  location @app {
    set $upstream "";
    rewrite_by_lua '
      http   = require "socket.http"
      json   = require "json"
      base64 = require "base64"

      -- read upstream from consul
      host          = ngx.var.http_host
      body, c, l, h = http.request("http://172.17.0.1:8500/v1/kv/" .. host)
      data          = json.decode(body)
      upstream      = base64.decode(data[1].Value)

      ngx.var.upstream = upstream
    ';

    proxy_buffering   off;
    proxy_set_header  Host $host;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_redirect    off;
    proxy_pass        http://$upstream;
  }
}

It forwards all incoming requests to the correct IP:PORT after looking up the hostname in Consul.
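
For illustration, registering a review app’s upstream in Consul is a single HTTP PUT to its KV store. Here’s a minimal Ruby sketch assuming the Consul agent address from the configuration above; the hostname, IP and port are made up, and this is not the exact deployment code. Consul returns the stored value base64-encoded on reads, which is why the Lua snippet above decodes data[1].Value.

require 'net/http'

# Write hostname => "IP:PORT" into Consul's KV store so the
# Nginx/Lua lookup above can resolve the upstream.
def register_review_app(hostname, ip, port)
  uri = URI("http://172.17.0.1:8500/v1/kv/#{hostname}")
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.send_request('PUT', uri.request_uri, "#{ip}:#{port}")
  end
end

register_review_app('deploy-my-feature.example.com', '10.0.1.23', 3000)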

The next task was to build a system to deploy the review apps to this infrastructure. We were already using Docker in both the production and staging environments. We decided to extend it by building a Docker image for every branch with a deploy- prefix in its name. When such a branch is pushed to GitHub, a CircleCI job runs to build a Docker image with the code and all the necessary packages. This can be configured using a configuration template like this.

jobs:
  build_image:
    <<: *defaults
    parallelism: 2
    steps:
      - checkout
      - setup_remote_docker:
          version: 17.09.0-ce
      - run:
          command: |
            ci_scripts/2.0/build_docker_image.sh
          no_output_timeout: 20m

workflows:
  version: 2

  web_app:
    jobs:
      - build_image:
          filters:
            branches:
              only:
                - /deploy-.*/

It also pushes static assets like JavaScript, CSS and images to an S3 bucket, from where they are served directly through a CDN. After building the Docker image, another CircleCI job runs to do the following tasks.

  • Create a new database in RDS and configure the required credentials.
  • Scale up the Review App’s Auto Scaling Group by increasing the number of desired instances by 1 (a sketch of this step follows the list).
  • Run Redis, database migrations, seed-data population, unicorn and resque instances using Nomad.
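
As an example of the second step, increasing the desired capacity of the ASG can be done with the aws-sdk-autoscaling gem. This is a minimal sketch under assumed names (the region and ASG name are illustrative), not the exact code used in the pipeline.

require 'aws-sdk-autoscaling'

client = Aws::AutoScaling::Client.new(region: 'us-east-1')
group_name = 'review-apps-asg' # assumed name, for illustration

group = client.describe_auto_scaling_groups(
  auto_scaling_group_names: [group_name]
).auto_scaling_groups.first

# Bump the desired instance count by 1 so a new EC2 instance
# comes up for the freshly commissioned review app.
client.set_desired_capacity(
  auto_scaling_group_name: group_name,
  desired_capacity: group.desired_capacity + 1
)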

The ease of deploying a review app helped increase our productivity.


Skip devise trackable module for API calls to avoid users table getting locked

We use the devise gem for authentication in one of our applications. This application provides an API which uses token authentication on top of the devise gem.

We were authenticating the user using the auth token for every API call.

class Api::V1::BaseController < ApplicationController
  before_action :authenticate_user_using_x_auth_token
  before_action :authenticate_user!

  def authenticate_user_using_x_auth_token
    user_email = params[:email].presence || request.headers['X-Auth-Email']
    auth_token = request.headers['X-Auth-Token'].presence

    @user = user_email && User.find_by(email: user_email)

    if @user && Devise.secure_compare(@user.authentication_token, auth_token)
      sign_in @user, store: false
    else
      render_errors('Could not authenticate with the provided credentials', 401)
    end
  end
end

Everything was working smoothly initially, but after a few months we started noticing a significant degradation in response times during peak hours.

Because of the nature of the business, the application gets an API call for every user every minute. Sometimes the application also gets concurrent API calls for the same user. We noticed that in such cases the users table was getting locked during the authentication process. This resulted in cascading holdups and timeouts, as it affected other API calls which were also accessing the users table.

After looking at the monitoring information, we found that the problem was happening due to the trackable module of the devise gem. The trackable module keeps track of the user by storing the sign in time, sign in count and IP address information. The following query was running for every API call and was resulting in exclusive locks on the users table.

UPDATE users SET last_sign_in_at = '2018-01-09 04:55:04',
current_sign_in_at = '2018-01-09 04:55:05',
sign_in_count = 323,
updated_at = '2018-01-09 04:55:05'
WHERE users.id = $1

To fix this issue, we decided to skip the user tracking for the API calls. We don’t need to track the user as every call is stateless and every request authenticates the user.

Devise provides a hook to skip tracking for certain requests through the request environment. As we were already using a separate base controller for API requests, it was easy to skip it for all API calls at once.

class Api::V1::BaseController < ApplicationController
  before_action :skip_trackable
  before_action :authenticate_user_using_x_auth_token
  before_action :authenticate_user!

  def skip_trackable
    request.env['warden'].request.env['devise.skip_trackable'] = '1'
  end
end

This fixed the issue of exclusive locks on the users table caused by the trackable module.


Ruby 2.6 Range#cover? now accepts Range object as an argument

This blog is part of our Ruby 2.6 series. Ruby 2.6.0 was released on Dec 25, 2018.

Range#cover? returns true if the object passed as an argument is in the range.

(1..10).cover?(5)
=> true

Range#cover? returns false if the object passed as an argument is non-comparable or is not in the range.
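
For example, both a non-comparable object and a value outside the range return false.

(1..10).cover?('5')
=> false

(1..10).cover?(15)
=> false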

Before Ruby 2.6, Range#cover? used to return false if a Range object was passed as an argument.

>> (1..10).cover?(2..5)
=> false

Ruby 2.6

In Ruby 2.6, Range#cover? can accept a Range object as an argument. It returns true if the argument range is equal to, or a subset of, the receiver range.

(1..100).cover?(10..20)
=> true

(1..10).cover?(2..5)
=> true

(5..).cover?(4..)
=> false

("a".."d").cover?("x".."z")
=> false

Here are the relevant commit and discussion for this change.