Ruby 2.4 introduces liberal_parsing option for parsing bad CSV data

This blog is part of our Ruby 2.4 series.

Comma-Separated Values (CSV) is a widely used data format, and almost every language has a module to parse it. In Ruby, we have the CSV class for that.

According to RFC 4180, we cannot have unescaped double quotes in CSV input since such data can’t be parsed.

We get a MalformedCSVError when the CSV data does not conform to RFC 4180.

Ruby 2.4 has added a liberal parsing option to parse such bad data. When it is set to true, Ruby will try to parse the data even when the data does not conform to RFC 4180.

# Before Ruby 2.4

> CSV.parse_line('one,two",three,four')

CSV::MalformedCSVError: Illegal quoting in line 1.


# With Ruby 2.4

> CSV.parse_line('one,two",three,four', liberal_parsing: true)

=> ["one", "two\"", "three", "four"]
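When the input comes from sources we do not control, one option is to try strict parsing first and fall back to liberal parsing only for the lines that fail. Here is a minimal sketch of that idea. parse_csv_line is a hypothetical helper name and not part of the CSV library.

```ruby
require 'csv'

# Hypothetical helper: parse strictly first, and retry with
# liberal_parsing only when the line does not conform to RFC 4180.
def parse_csv_line(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  CSV.parse_line(line, liberal_parsing: true)
end

parse_csv_line('one,two,three')        # well-formed, parsed strictly
parse_csv_line('one,two",three,four')  # malformed, parsed liberally
```

Keeping strict parsing as the default means well-formed data is still validated, and the liberal mode is used only as a fallback.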

Passing block with Enumerable#chunk is not mandatory in Ruby 2.4

This blog is part of our Ruby 2.4 series.

The Enumerable#chunk method can be used on an enumerable object to group consecutive items based on the value returned by the block passed to it.

[1, 4, 7, 10, 2, 6, 15].chunk { |item| item > 5 }.each { |values| p values }

=> [false, [1, 4]]
[true, [7, 10]]
[false, [2]]
[true, [6, 15]]

Prior to Ruby 2.4, passing a block to the chunk method was mandatory.

array = [1,2,3,4,5,6]
array.chunk

=> ArgumentError: no block given

Enumerable#chunk without block in Ruby 2.4

In Ruby 2.4, we will be able to use chunk without passing a block. It just returns an enumerator object which we can use to chain further operations.

array = [1,2,3,4,5,6]
array.chunk

=> #<Enumerator: [1, 2, 3, 4, 5, 6]:chunk>

Reasons for this change

Let’s take the case of grouping the consecutive integers of an array into ranges.

# Before Ruby 2.4

integers = [1,2,4,5,6,7,9,13]

integers.enum_for(:chunk).with_index { |x, idx| x - idx }.map do |diff, group|
  [group.first, group.last]
end

=> [[1,2],[4,7],[9,9],[13,13]]

We had to use enum_for here because chunk can’t be called without a block.

enum_for creates a new enumerator object which will enumerate by calling the method passed to it. In this case the method passed was chunk.

With Ruby 2.4, we can call the chunk method directly without enum_for, as it no longer requires a block to be passed.

# Ruby 2.4

integers = [1,2,4,5,6,7,9,13]

integers.chunk.with_index { |x, idx| x - idx }.map do |diff, group|
  [group.first, group.last]
end

=> [[1,2],[4,7],[9,9],[13,13]]
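The snippets above return pairs of first and last values. If actual Range objects are wanted, the same chain can build them directly. This is a small sketch for Ruby 2.4 or later; consecutive_ranges is a hypothetical helper name.

```ruby
# Hypothetical helper: groups runs of consecutive integers into ranges.
# For a run of consecutive integers, value - index stays constant,
# so chunk puts the whole run into one group.
def consecutive_ranges(integers)
  integers.chunk.with_index { |x, idx| x - idx }.map do |_diff, group|
    (group.first..group.last)
  end
end

consecutive_ranges([1, 2, 4, 5, 6, 7, 9, 13])
#=> [1..2, 4..7, 9..9, 13..13]
```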

Ruby 2.4 unifies Fixnum and Bignum into Integer

This blog is part of our Ruby 2.4 series.

Before Ruby 2.4, Ruby used the Fixnum class for representing small integers and the Bignum class for big integers.

# Before Ruby 2.4

1.class         #=> Fixnum
(2 ** 62).class #=> Bignum

In day-to-day work we don’t have to worry about whether the number we are dealing with is a Bignum or a Fixnum. It’s just an implementation detail.

Interestingly, Ruby also has an Integer class, which is the superclass of Fixnum and Bignum.

Starting with Ruby 2.4, Fixnum and Bignum are unified into Integer.

# Ruby 2.4

1.class         #=> Integer
(2 ** 62).class #=> Integer

Starting with Ruby 2.4, usage of the Fixnum and Bignum constants is deprecated.

# Ruby 2.4

>> Fixnum
(irb):6: warning: constant ::Fixnum is deprecated
=> Integer

>> Bignum
(irb):7: warning: constant ::Bignum is deprecated
=> Integer

How to know if a number is Fixnum, Bignum or Integer?

We don’t have to worry about this change most of the time in our application code. But libraries like Rails use the class of numbers to make certain decisions. These libraries need to support both Ruby 2.4 and previous versions of Ruby.

The easiest way to know whether the running Ruby version has integer unification is to check the class of 1.

# Ruby 2.4

1.class #=> Integer

# Before Ruby 2.4
1.class #=> Fixnum

Look at PR #25056 to see how Rails is handling this case.

Similarly, Arel also supports both Ruby 2.4 and previous versions of Ruby.
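For code that has to run on both sides of this change, a check against Integer is usually enough, since Fixnum and Bignum were always subclasses of Integer. The sketch below is only an illustration and is not taken from Rails or Arel. INTEGER_UNIFIED and whole_number? are hypothetical names.

```ruby
# True on Ruby 2.4 and later, false on earlier versions.
INTEGER_UNIFIED = 1.class == Integer

# Hypothetical helper: behaves the same before and after the unification.
def whole_number?(value)
  value.is_a?(Integer)
end

whole_number?(1)       #=> true
whole_number?(2**62)   #=> true
whole_number?(1.0)     #=> false
```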

Ruby 2.4 implements Array#min and Array#max

This blog is part of our Ruby 2.4 series.

Ruby has Enumerable#min and Enumerable#max which can be used to find the minimum and the maximum value in an Array.

(1..10).to_a.max
#=> 10
(1..10).to_a.method(:max)
#=> #<Method: Array(Enumerable)#max>

Ruby 2.4 adds Array#min and Array#max, which are faster than Enumerable#min and Enumerable#max.

The following benchmark is based on https://blog.blockscore.com/new-features-in-ruby-2-4.

require 'benchmark/ips'

Benchmark.ips do |bench|
  NUM1 = 1_000_000.times.map { rand }
  NUM2 = NUM1.dup

  ENUM_MAX = Enumerable.instance_method(:max).bind(NUM1)
  ARRAY_MAX = Array.instance_method(:max).bind(NUM2)

  bench.report('Enumerable#max') do
    ENUM_MAX.call
  end

  bench.report('Array#max') do
    ARRAY_MAX.call
  end

  bench.compare!
end

Warming up --------------------------------------
      Enumerable#max     1.000  i/100ms
           Array#max     2.000  i/100ms
Calculating -------------------------------------
      Enumerable#max     17.569  (± 5.7%) i/s -     88.000  in   5.026996s
           Array#max     26.703  (± 3.7%) i/s -    134.000  in   5.032562s

Comparison:
           Array#max:       26.7 i/s
      Enumerable#max:       17.6 i/s - 1.52x  slower

Benchmark.ips do |bench|
  NUM1 = 1_000_000.times.map { rand }
  NUM2 = NUM1.dup

  ENUM_MIN = Enumerable.instance_method(:min).bind(NUM1)
  ARRAY_MIN = Array.instance_method(:min).bind(NUM2)

  bench.report('Enumerable#min') do
    ENUM_MIN.call
  end

  bench.report('Array#min') do
    ARRAY_MIN.call
  end

  bench.compare!
end

Warming up --------------------------------------
      Enumerable#min     1.000  i/100ms
           Array#min     2.000  i/100ms
Calculating -------------------------------------
      Enumerable#min     18.621  (± 5.4%) i/s -     93.000  in   5.007244s
           Array#min     26.902  (± 3.7%) i/s -    136.000  in   5.064815s

Comparison:
           Array#min:       26.9 i/s
      Enumerable#min:       18.6 i/s - 1.44x  slower

This benchmark shows that the new methods Array#max and Array#min are about 1.5 times faster than Enumerable#max and Enumerable#min.

Like Enumerable#max and Enumerable#min, Array#max and Array#min assume that the elements implement the spaceship operator <=>, which is used to compare them. Classes that include the Comparable mixin already define it.
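To illustrate the <=> requirement, here is a small sketch with a made-up value object. ReleaseVersion and VERSIONS are hypothetical names used only for this example.

```ruby
# Hypothetical value object that defines <=> so it can be compared.
ReleaseVersion = Struct.new(:major, :minor) do
  include Comparable

  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

VERSIONS = [
  ReleaseVersion.new(2, 3),
  ReleaseVersion.new(2, 4),
  ReleaseVersion.new(1, 9)
]

VERSIONS.max.to_a #=> [2, 4]
VERSIONS.min.to_a #=> [1, 9]
```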

Hunting down a memory leak in shoryuken

This is the story of how we found and fixed a memory leak in shoryuken.

We use shoryuken to process SQS messages inside of docker containers. A while back we noticed that memory was growing without bound. Every few days, we had to restart all the docker containers as a temporary workaround.

Since the workers were inside of a docker container we had limited tools. So we went ahead with the UNIX way of investigating the issue.

First we noticed that the number of threads inside the worker was high, 115 in our case. shoryuken boots up all the worker threads at startup.

# ps --no-header uH p <PID> | wc -l
#=> 115

The proc filesystem exposes a lot of useful information about all the running processes. The /proc/[pid]/task directory has information about all the threads of a process.

Some of the threads with lower IDs were executing syscall 23 (select) and 271 (ppoll). These threads were waiting for a message to arrive in the SQS queue. Most of the threads, however, were executing syscall 202 (futex).

At this point we had an idea about the root cause of the memory leak - it was due to the worker starting a lot of threads which were not getting terminated. We wanted to know how and when these threads are started.

Ruby 2.0.0 introduced TracePoint, which provides an interface to a lot of internal Ruby events, such as when an exception is raised, when a method is called or when a method returns.

We added the following code to our workers.

tc = TracePoint.new(:thread_begin, :thread_end) do |tp|
  puts tp.event
  puts tp.self.class
end

tc.enable

Executing the Ruby workers with tracing enabled revealed that a new Celluloid::Thread was being created before each message was processed and that the thread was never terminated. Hence the number of zombie threads in the worker was growing with the number of messages processed.

thread_begin
Celluloid::Thread
[development] [306203a5-3c07-4174-b974-77390e8a4fc3] SQS Message: ...snip...

thread_begin
Celluloid::Thread
[development] [2ce2ed3b-d314-46f1-895a-f1468a8db71e] SQS Message: ...snip...

Unfortunately TracePoint didn’t pinpoint the place where the thread was started, hence we added a couple of puts statements to investigate the issue further.

After a lot of debugging, we were able to find that a new thread was started to increase the visibility time of the SQS message in a shoryuken middleware when auto_visibility_timeout was true.

The fix was to terminate the thread after the work is done.
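The general shape of such a fix can be sketched in plain Ruby. This is not the actual shoryuken code. with_visibility_extender is a hypothetical helper, and the body of the background thread is only a placeholder for the call that would extend the visibility timeout.

```ruby
# Hypothetical helper: runs the given block while a background thread
# is alive, and always stops that thread once the work is done.
def with_visibility_extender(interval: 0.01)
  extender = Thread.new do
    loop do
      sleep interval
      # A real worker would extend the message's visibility timeout here.
    end
  end

  yield
ensure
  # Without this cleanup, one thread would be left behind per message.
  if extender
    extender.kill
    extender.join
  end
end

with_visibility_extender { :message_processed }
```

Putting the cleanup in ensure means the thread is terminated even when processing the message raises an exception.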

Ruby 2.4 adds better support for extracting captured data from Regexp match results

This blog is part of our Ruby 2.4 series.

Ruby has the MatchData type, which is returned by Regexp#match and Regexp.last_match.

It has methods #names and #captures to return the names used for capturing and the actual captured data respectively.

pattern = /(?<number>\d+) (?<word>\w+)/
match_data = pattern.match('100 thousand')
#=> #<MatchData "100 thousand" number:"100" word:"thousand">

>> match_data.names
=> ["number", "word"]
>> match_data.captures
=> ["100", "thousand"]

If we want all named captures as key-value pairs, we have to combine the results of names and captures.

match_data.names.zip(match_data.captures).to_h
#=> {"number"=>"100", "word"=>"thousand"}

Ruby 2.4 adds #named_captures which returns both the name and data of the capture groups.

pattern = /(?<number>\d+) (?<word>\w+)/
match_data = pattern.match('100 thousand')

match_data.named_captures
#=> {"number"=>"100", "word"=>"thousand"}
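One detail worth knowing is that a named group which did not take part in the match still shows up in the hash, with a nil value. The sketch below shows this with an optional group. QUANTITY and parse_quantity are hypothetical names.

```ruby
# The word part is optional, so the "word" group may not participate.
QUANTITY = /\A(?<number>\d+)(?: (?<word>\w+))?\z/

# Hypothetical helper: returns the named captures, or nil on no match.
def parse_quantity(text)
  match = QUANTITY.match(text)
  match && match.named_captures
end

parse_quantity('100 thousand') #=> {"number"=>"100", "word"=>"thousand"}
parse_quantity('100')          #=> {"number"=>"100", "word"=>nil}
parse_quantity('many')         #=> nil
```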

Ruby 2.4 implements Regexp#match? without polluting global variables

This blog is part of our Ruby 2.4 series.

Ruby has many ways to match with a regular expression.

Regexp#===

It returns true/false and sets the $~ global variable.

/stat/ === "case statements"
#=> true
$~
#=> #<MatchData "stat">

Regexp#=~

It returns the integer position of the match, or nil if there is no match. It also sets the $~ global variable.

/stat/ =~ "case statements"
#=> 5
$~
#=> #<MatchData "stat">

Regexp#match

It returns match data and also sets the $~ global variable.

/stat/.match("case statements")
#=> #<MatchData "stat">
$~
#=> #<MatchData "stat">

Ruby 2.4 adds Regexp#match?

This new method just returns true/false and does not set any global variables.

/case/.match?("case statements")
#=> true

So Regexp#match? is a good option when we only care about whether the regex matches or not.
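The difference in side effects can be checked with a small sketch. $~ is local to the current method frame, so each method below starts with $~ set to nil. The method names are made up for this example.

```ruby
# match? leaves $~ untouched, so it is still nil afterwards.
def last_match_after_predicate
  /case/.match?("case statements")
  $~
end

# match sets $~ to the MatchData of the successful match.
def last_match_after_match
  /case/.match("case statements")
  $~
end

last_match_after_predicate #=> nil
last_match_after_match     #=> #<MatchData "case">
```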

Regexp#match? is also faster than its counterparts, as it reduces object allocation by not creating a back reference and not changing $~.

require 'benchmark/ips'

Benchmark.ips do |bench|

  EMAIL_ADDR = 'disposable.style.email.with+symbol@example.com'
  EMAIL_REGEXP_DEVISE = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

  bench.report('Regexp#===') do
    EMAIL_REGEXP_DEVISE === EMAIL_ADDR
  end

  bench.report('Regexp#=~') do
    EMAIL_REGEXP_DEVISE =~ EMAIL_ADDR
  end

  bench.report('Regexp#match') do
    EMAIL_REGEXP_DEVISE.match(EMAIL_ADDR)
  end

  bench.report('Regexp#match?') do
    EMAIL_REGEXP_DEVISE.match?(EMAIL_ADDR)
  end

  bench.compare!
end

#=> Warming up --------------------------------------
#=>          Regexp#===   103.876k i/100ms
#=>           Regexp#=~   105.843k i/100ms
#=>        Regexp#match    58.980k i/100ms
#=>       Regexp#match?   107.287k i/100ms
#=> Calculating -------------------------------------
#=>          Regexp#===      1.335M (± 9.5%) i/s -      6.648M in   5.038568s
#=>           Regexp#=~      1.369M (± 6.7%) i/s -      6.880M in   5.049481s
#=>        Regexp#match    709.152k (± 5.4%) i/s -      3.539M in   5.005514s
#=>       Regexp#match?      1.543M (± 4.6%) i/s -      7.725M in   5.018696s
#=>
#=> Comparison:
#=>       Regexp#match?:  1542589.9 i/s
#=>           Regexp#=~:  1369421.3 i/s - 1.13x  slower
#=>          Regexp#===:  1335450.3 i/s - 1.16x  slower
#=>        Regexp#match:   709151.7 i/s - 2.18x  slower

Ruby 2.4 implements Enumerable#sum

This blog is part of our Ruby 2.4 series.

It is a common use case to calculate the sum of the elements of an array or the values of a hash.

[1, 2, 3, 4] => 10
 
{a: 1, b: 6, c: -3} => 4

Active Support already implements Enumerable#sum

> [1, 2, 3, 4].sum
 #=> 10 

> {a: 1, b: 6, c: -3}.sum{ |k, v| v**2 }
 #=> 46 

> ['foo', 'bar'].sum # concatenation of strings
 #=> "foobar" 

> [[1], ['abc'], [6, 'qwe']].sum # concatenation of arrays
 #=> [1, "abc", 6, "qwe"]

Until Ruby 2.3, we had to use Active Support to use Enumerable#sum method or we could use #inject which is used by Active Support under the hood.

Ruby 2.4.0 implements Enumerable#sum as part of the language itself.

Let’s take a look at how sum method fares on some of the enumerable objects in Ruby 2.4.

> [1, 2, 3, 4].sum
 #=> 10 
 
> {a: 1, b: 6, c: -3}.sum { |k, v| v**2 }
 #=> 46 

> ['foo', 'bar'].sum 
 #=> TypeError: String can't be coerced into Integer

> [[1], ['abc'], [6, 'qwe']].sum 
 #=> TypeError: Array can't be coerced into Integer

As we can see, the behavior of Enumerable#sum in Ruby 2.4 is the same as that of Active Support in the case of numbers, but not in the case of string or array concatenation. Let’s see what the difference is and how we can make it work in Ruby 2.4 as well.

Understanding addition/concatenation identity

The Enumerable#sum method takes an optional argument which acts as the initial value of the accumulator. Both Active Support and Ruby 2.4 accept this argument.

When the identity argument is not passed, 0 is used as the default accumulator in Ruby 2.4, whereas Active Support uses nil as the default accumulator.

Hence in the cases of string and array concatenation, the error occurred in Ruby because the code attempts to add a string and array respectively to 0.

To overcome this, we need to pass proper addition/concatenation identity as an argument to the sum method.

The addition/concatenation identity of an object can be defined as the value with which calling + operation on an object returns the same object.

> ['foo', 'bar'].sum('')
 #=> "foobar"

> [[1], ['abc'], [6, 'qwe']].sum([])
 #=> [1, "abc", 6, "qwe"]
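The role of the identity can be seen more clearly with a thin wrapper around sum. This is only a sketch for Ruby 2.4 or later, and total is a hypothetical helper name. Note that an empty collection returns the identity itself.

```ruby
# Hypothetical helper: sums the values, starting from the given identity.
def total(values, identity = 0)
  values.sum(identity)
end

total([1, 2, 3, 4])        #=> 10
total(%w[foo bar], '')     #=> "foobar"
total([[1], [2, 3]], [])   #=> [1, 2, 3]
total([], '')              #=> ""
```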
 

What about Rails?

As we have seen earlier, Ruby 2.4 implements Enumerable#sum favouring numeric operations while also supporting non-numeric callers through the identity element. This behavior is not entirely the same as that of Active Support. Still, Active Support can make use of the native sum method whenever possible. There is already a pull request open which uses Enumerable#sum from Ruby whenever possible. This will give a performance boost, as Ruby’s method is implemented natively in C whereas the one in Active Support is implemented in Ruby.

String#concat, Array#concat and String#prepend take multiple arguments in Ruby 2.4

This blog is part of our Ruby 2.4 series.

In Ruby, we use #concat to append a string to another string or the elements of one array to another array. We can also use #prepend to add a string at the beginning of a string.

Ruby 2.3

String#concat and Array#concat

string = "Good"
string.concat(" morning")
#=> "Good morning"

array = ['a', 'b', 'c']
array.concat(['d'])
#=> ["a", "b", "c", "d"]

String#prepend

string = "morning"
string.prepend("Good ")
#=> "Good morning"

Before Ruby 2.4, we could pass only one argument to these methods. So we could not add multiple items in one shot.

string = "Good"
string.concat(" morning", " to", " you")
#=> ArgumentError: wrong number of arguments (given 3, expected 1)

Changes with Ruby 2.4

In Ruby 2.4, we can pass multiple arguments and Ruby processes each argument one by one.

String#concat and Array#concat

string = "Good"
string.concat(" morning", " to", " you")
#=> "Good morning to you"

array = ['a', 'b']
array.concat(['c'], ['d'])
#=> ["a", "b", "c", "d"]

String#prepend

string = "you"
string.prepend("Good ", "morning ", "to ")
#=> "Good morning to you"

These methods work even when no argument is passed, unlike in previous versions of Ruby.

"Good".concat
#=> "Good"

Difference between concat and shovel << operator

Though the shovel operator << can be used interchangeably with concat when we call it once, the behavior differs when it is called multiple times.

str = "Ruby"
str << str
str
#=> "RubyRuby"

str = "Ruby"
str.concat str
str
#=> "RubyRuby"

str = "Ruby"
str << str << str
#=> "RubyRubyRubyRuby"

str = "Ruby"
str.concat str, str
str
#=> "RubyRubyRuby"

So concat appends the content that the caller had at the time of the call, once per argument. Chaining << twice, on the other hand, is just a sequence of binary operations, so the second << works with the already modified output of the first one.
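One way to picture the behavior of concat with multiple string arguments is that all arguments are joined first and the result is appended in a single step. The sketch below mimics that for strings only. concat_like is a hypothetical helper and not how Ruby itself is implemented.

```ruby
# Hypothetical helper: joins all pieces into a new string first,
# then appends that string to the target in one step.
def concat_like(target, *pieces)
  target << pieces.join
end

str = "Ruby".dup # dup so that the string is mutable
concat_like(str, str, str)
str
#=> "RubyRubyRuby"
```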

Hash#compact and Hash#compact! now part of Ruby 2.4

This blog is part of our Ruby 2.4 series.

It is a common use case to remove the nil values from a hash in Ruby.

{ "name" => "prathamesh", "email" => nil} => { "name" => "prathamesh" }

Active Support already has a solution for this in the form of Hash#compact and Hash#compact!.

hash = { "name" => "prathamesh", "email" => nil}
hash.compact #=> { "name" => "prathamesh" }
hash #=> { "name" => "prathamesh", "email" => nil}

hash.compact! #=> { "name" => "prathamesh" }
hash #=> { "name" => "prathamesh" }

Now, Ruby 2.4 will have these two methods in the language itself, so even those not using Rails or Active Support will be able to use them. Additionally, they will give a performance boost over the Active Support versions, because these methods are implemented natively in C whereas the Active Support versions are in Ruby.

There is already a pull request open in Rails to use the native versions of these methods from Ruby 2.4 whenever available, so that we will be able to benefit from the performance boost.
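For code that must also run on older Rubies without Active Support, a fallback along these lines is one option. This is a sketch, not the code from the Rails pull request. compact_hash is a hypothetical helper name.

```ruby
# Hypothetical helper: uses the native Hash#compact when available
# and falls back to reject otherwise. Only nil values are removed.
def compact_hash(hash)
  if hash.respond_to?(:compact)
    hash.compact
  else
    hash.reject { |_key, value| value.nil? }
  end
end

compact_hash({ "name" => "prathamesh", "email" => nil, "admin" => false })
#=> { "name" => "prathamesh", "admin" => false }
```

As the example shows, false is kept; only nil values are dropped.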

Rails 5 blogs and the art of story telling

Between October 31, 2015 and September 5, 2016 we wrote 80 blogs on changes in Rails 5.

Producing a blog every 4 days consistently over 310 days takes persistence and time - lots of it.

We needed to go through all the commits, pick the ones worth writing about and then write about them. Going into this I knew it would be a hard task. Ruby on Rails is now a well crafted machine. In order to fully understand what’s going on in the code base we need to spend sufficient time on it.

However I was surprised by the thing that turned out to be the hardest - telling the story of the code change.

Every commit has a story. There is a reason for it. The commit itself might be minor but that code change in itself does not tell the full story.

For example take this commit. This commit is so simple that you might think it is not worth writing about. However in order to fully understand what it does we need to tell the full story which was captured in this blog.

Or take the case of Rails 5 officially supports MariaDB . The blog captures the full story and not just the code that changed.

Now you might say that I have cherry picked blog posts that favor my case. So let’s pick a blog which is simple.

You might wonder what could go wrong with a blog like this. As it turns out, plenty. That’s because writing a blog also requires defining the boundary of the blog. Deciding what to include and what to leave out is hard. One gets a feel for it only after writing it. And after having typed the words on screen, pruning is hard.

A well written article is simple writing. The problem with articles which are simple for readers is that, well, they are simple. So it feels to readers that writing them must be simple. Nothing could be further from the truth. It takes a lot of hard work to produce anything simple. It’s true in writing. And it’s true in producing software.

Coming back to the “Skipping Mailer” blog, it took quite a bit of back and forth to bring the blog to its essence. So yes, the final output is quite short, but that does not mean that it took a short amount of time to produce it.

Tell a story even if you have 10 seconds

John Lasseter was working as an animator at Disney in 1984 when he was fired for promoting computer animation there. Lasseter joined Lucasfilm. The computer graphics group at Lucasfilm was later spun off as Pixar and bought by Steve Jobs for $5 million.

Lasseter was tasked with producing a short film that would show the power of computer animation, so that Pixar could win projects such as TV commercials with cartoon characters and earn some money. He needed to produce the film for an upcoming computer graphics conference.

His initial idea was to have a short movie with a plotless character. He presented this idea at a conference in Brussels. There, Belgian animator Raoul Servais commented in a slightly harsh tone:

No matter how short it is, it should have a beginning, a middle, and an end. Don’t forget the story.

Lasseter complained that it’s a pretty short movie and there might not be time to present a story.

Raoul Servais replied

You can tell a story in ten seconds.

Lasseter started developing a character. He came up with the idea of Luxo Jr.

Here is the final production of Luxo Jr.

Luxo Jr. was a major hit at the conference. The crowd was on its feet in applause even before the two-minute film was over. Remember, this was 1986; computer animation was not very advanced at that time, and this was the first movie ever made with the use of just computer graphics.

Lasseter later said that when the audience was watching the movie, they forgot that they were watching a computer animated film because the story took over. He learned the lesson that technology should enable better storytelling, and that technology divorced from storytelling would not advance the cause of Pixar.

Later John Lasseter went on to produce hits like Toy Story, A Bug’s Life, Toy Story 2, Cars, Cars 2, Monsters, Inc., Finding Nemo and many more.

So you see, even the great John Lasseter had to be reminded to tell a story.

Actual content over bullet points

Jeff Bezos is so focused on knowing the full story that he banned the use of PowerPoint in internal meetings and discussions. According to him, it is easy to hide behind bullet points in a PowerPoint presentation.

He insisted on writing the full story in a Word document and distributing it to meeting attendees. The meetings start with everyone head down, reading the document.

He is also known for saying that if we are building a feature then we first need to know how it would be presented to the consumers when it is unveiled. We need to know the story we are going to tell them. Without the story we won’t have full picture of what we are going to build.

Learning to tell a story is a journey

I’m glad that during the last 310 days 16 people contributed to the blog posts. The process of writing the posts was at times frustrating for a bunch of them. They had done the work of digging into the code and had posted their findings. Continuously getting feedback to edit the blog to build a nice coherent story, where each paragraph is an extension of the previous paragraph, is a downer. Some were dismayed at why we were spending so much energy on a technical blog.

However in the end we all are happy that we underwent this exercise. We could see the initial draft of the blog and the final version and we all could see the difference.

By no means have we mastered the art of storytelling. It’s a long journey. However we believe we are on the right path. Hopefully in the coming months and years we at BigBinary will be able to bring you more stories from changes in Rails and other places.

Rails 5 adds ability to create module and class level variables on per thread basis

This blog is part of our Rails 5 series.

Rails already provides methods for creating class level and module level variables in the form of cattr_* and mattr_* suite of methods.

In Rails 5, we can go a step further and create thread specific class or module level variables.

Here is an example which demonstrates how to use it.

module CurrentScope
  thread_mattr_accessor :user_permissions
end

class ApplicationController < ActionController::Base

  before_action :set_permissions

  def set_permissions
    user = User.find(params[:user_id])
    CurrentScope.user_permissions = user.permissions
  end

end

Now CurrentScope.user_permissions will be available for the lifetime of the currently executing thread, and all the code after this point can use this variable.

For example, we can access this variable in any of the models without explicitly passing current_user from the controller.

class BookingsController < ApplicationController
  def create
    Booking.create(booking_params)
  end
end

class Booking < ApplicationRecord
  validate :check_permissions
      
  private

  def check_permissions
    unless CurrentScope.user_permissions.include?(:create_booking)
      self.errors.add(:base, "Not permitted to allow creation of booking")
    end
  end
end

It internally uses the Thread#[]= method on Thread.current, so all the variables are scoped to the thread currently executing. It also takes care of namespacing these variables per class or module so that CurrentScope.user_permissions and RequestScope.user_permissions will not conflict with each other.
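The idea can be sketched in plain Ruby without Rails. The module below is only an illustration of thread-scoped storage with a namespaced key. CurrentScopeSketch and the exact key name are assumptions made for this example and not the Active Support implementation.

```ruby
# Hypothetical plain Ruby version of a thread-scoped module attribute.
module CurrentScopeSketch
  # The key includes the module name so that another module using the
  # same attribute name would not collide with this one.
  KEY = :attr_CurrentScopeSketch_user_permissions

  def self.user_permissions
    Thread.current[KEY]
  end

  def self.user_permissions=(value)
    Thread.current[KEY] = value
  end
end

CurrentScopeSketch.user_permissions = [:create_booking]

# The value set above is not visible from another thread.
Thread.new { CurrentScopeSketch.user_permissions }.value #=> nil
CurrentScopeSketch.user_permissions                      #=> [:create_booking]
```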

If you have used PerThreadRegistry before for managing global variables, thread_mattr_* & thread_cattr_* methods can be used in place of it starting from Rails 5.

Globals are generally bad and should be avoided, but this change provides a nicer API if you want to fiddle with them anyway!

Rails 5 silences assets logs in development mode by default

This blog is part of our Rails 5 series.

As a Rails developer, it was a familiar sight to see asset logs flooding the whole terminal in development mode.

Started GET "/assets/application.self-4a04ce68c5ebf2d39fba46316802f17d0a73fadc4d2da50a138d7a4bf2d26a84.css?body=1" for 127.0.0.1 at 2016-09-02 10:23:04 +0530
Started GET "/assets/bootstrap/transition.self-6ad2488465135ab731a045a8ebbe3ea2fc501aed286042496eda1664fdd07ba9.js?body=1" for 127.0.0.1 at 2016-09-02 10:23:04 +0530
Started GET "/assets/bootstrap/collapse.self-2eb697f62b587bb786ff940d82dd4be88cdeeaf13ca128e3da3850c5fcaec301.js?body=1" for 127.0.0.1 at 2016-09-02 10:23:04 +0530
Started GET "/assets/jquery_ujs.self-e87806d0cf4489aeb1bb7288016024e8de67fd18db693fe026fe3907581e53cd.js?body=1" for 127.0.0.1 at 2016-09-02 10:23:04 +0530
Started GET "/assets/jquery.self-660adc51e0224b731d29f575a6f1ec167ba08ad06ed5deca4f1e8654c135bf4c.js?body=1" for 127.0.0.1 at 2016-09-02 10:23:04 +0530

Fortunately, we could include the quiet_assets gem in our application. It turns off the Rails asset pipeline log in development mode.

Started GET "/assets/application.js" for 127.0.0.1 at 2016-08-28 19:35:34

quiet_assets is part of Rails 5

Now the quiet_assets gem is folded into Rails 5 itself.

A new configuration option, config.assets.quiet, when set to true, loads a rack middleware named Sprockets::Rails::QuietAssets. This middleware checks whether the current request matches the assets prefix path and, if it does, silences that request.

This eliminates the need to add an external gem for this.

By default, config.assets.quiet is set to true in development mode. So we don’t have to do anything. It just works out of the box.
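To get a feel for what such a middleware does, here is a simplified sketch in plain Ruby. It is not the Sprockets::Rails::QuietAssets source. QuietAssetsSketch and demo_quiet_assets are hypothetical names, and the silencing is done by temporarily raising the level of a standard Logger, which is an assumption made to keep the example self-contained.

```ruby
require 'logger'
require 'stringio'

# Hypothetical middleware: silences the logger for asset requests.
class QuietAssetsSketch
  def initialize(app, logger, prefix = '/assets')
    @app = app
    @logger = logger
    @prefix = prefix.end_with?('/') ? prefix : "#{prefix}/"
  end

  def call(env)
    return @app.call(env) unless env['PATH_INFO'].to_s.start_with?(@prefix)

    # Raise the log level only for the duration of this request.
    previous_level = @logger.level
    @logger.level = Logger::ERROR
    begin
      @app.call(env)
    ensure
      @logger.level = previous_level
    end
  end
end

# Runs one asset request and one regular request, returns the log text.
def demo_quiet_assets
  output = StringIO.new
  logger = Logger.new(output)
  app = lambda do |env|
    logger.info("Started GET \"#{env['PATH_INFO']}\"")
    [200, {}, ['ok']]
  end

  stack = QuietAssetsSketch.new(app, logger)
  stack.call('PATH_INFO' => '/assets/application.js')
  stack.call('PATH_INFO' => '/bookings')
  output.string
end
```

In the log returned by demo_quiet_assets only the request to /bookings shows up. Changing the level of a shared logger like this is not thread-safe, so this is only meant to show the idea.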

Compatibility with older versions of Rails

This functionality has been backported to sprockets-rails 3.1.0 and is available in Rails 4.2.7 as well.

Rails 5 disables autoloading after booting the app in production

This blog is part of our Rails 5 series.

This blog requires an understanding of what autoloading is. If you are not familiar with it, please refer to the Autoloading and Reloading Constants article in the Rails Guides.

Eagerload paths

Autoloading is not thread-safe, and hence we need to make sure that all constants are loaded when the application boots. The concept of loading all the constants even before they are actually needed is called “Eager loading”. In a way it is the opposite of “Autoloading”. In the case of “Autoloading”, the application does not load a constant until it is needed. When a class is needed and it is missing, the application starts looking in the “autoload paths” to load the missing class.

eager_load_paths contains a list of directories. When the application boots in production, it loads all constants found in all directories listed in eager_load_paths.

We can add directories to eager_load_paths as shown below.

# config/application.rb

config.eager_load_paths << Rails.root.join('lib')

In Rails 5 autoloading is disabled for production environment by default

With this commit Rails will no longer do Autoloading in production after it has booted.

Rails will load all the constants from eager_load_paths but if a constant is missing then it will not look in autoload_paths and will not attempt to load the missing constant.

This is a breaking change for some applications. For the vast majority of applications this should not be an issue.

In the rare situation where our application still needs autoloading in the production environment, we can enable it by setting enable_dependency_loading to true as follows:

# config/application.rb

config.enable_dependency_loading = true
config.autoload_paths << Rails.root.join('lib')

Rails 5 adds more control to fine tuning SSL usage

This blog is part of our Rails 5 series.

Adding HTTPS support is one of the first steps towards enhancing the security of a web application.

Even when a web app is available over https, some users may end up visiting the http version of the app, losing the security https provides.

It is important to redirect users to the https URLs whenever possible.

Forcing HTTPS in Rails

We can force users to use HTTPS by setting config.force_ssl = true.

If we look at the Rails source code, we can see that when we set config.force_ssl = true, a middleware, ActionDispatch::SSL, is inserted into our app’s middleware stack:

if config.force_ssl
  middleware.use ::ActionDispatch::SSL, config.ssl_options
end

This middleware, ActionDispatch::SSL, is responsible for doing three things:

  1. Redirect all http requests to their https equivalents.

  2. Set secure flag on cookies to tell browsers that these cookies must not be sent for http requests.

  3. Add HSTS headers to response.

Let us go through each of these.

Redirect all http requests to their https equivalents

In Rails 5, we can configure the behavior of redirection using the redirect key in the config.ssl_options configuration.

In previous versions of Rails, whenever an http request was redirected to https request, it was done with an HTTP 301 redirect.

Browsers cache 301 redirects. When forcing https redirects, if at any point we want to test the http version of the page, it would be hard to browse it, since the browser would redirect to the https version. Although this is the desired behavior, this is a pain during testing and deploying.

Rails 5 lets us specify the status code for redirection, which can be set to 302 or 307 for testing, and later to 301 when we are ready for deployment to production.

We can specify the options for redirection in Rails 5 as follows:

...
  config.force_ssl = true
  config.ssl_options = { redirect: { status: 307, port: 81 } }
...

If a redirect status is not specified, requests are redirected with a 301 status code.

There is an upcoming change to make 307 the default status code for redirecting non-GET, non-HEAD http requests.

Other options accepted by ssl_options under the redirect key are host and body.

Set secure flags on cookies

By setting the Secure flag on a cookie, the application can instruct the browser not to send the cookie in clear text. Browsers which support this flag will send such cookies only through HTTPS connections.

Setting secure flag on cookies is important to prevent cookie hijacking by man in the middle attacks.

In case of a “man in the middle” attack, the attacker places themselves between the user and the server. By doing this, the attacker aims to collect cookies which are sent from the user to the server on every request. However, if we mark the cookies with sensitive information as Secure, those cookies won’t be sent on http requests. This ensures that the browser never sends cookies to an attacker who is impersonating the webserver at an http end point.

Upon enabling config.force_ssl = true, the ActionDispatch::SSL middleware sets the Secure flag on all cookies by default.
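
The effect on the response can be pictured with a small plain-Ruby sketch. The flag_cookies_as_secure helper below is hypothetical and only illustrates the idea of appending the Secure attribute to each cookie in a Set-Cookie header value. It should not be read as the middleware's actual code.

```ruby
# Hypothetical helper (not part of Rails): appends "; secure" to every
# cookie in a newline-separated Set-Cookie header value, unless the
# cookie already carries the attribute.
def flag_cookies_as_secure(set_cookie)
  set_cookie.split("\n").map { |cookie|
    cookie.match?(/;\s*secure\s*(;|$)/i) ? cookie : "#{cookie}; secure"
  }.join("\n")
end

puts flag_cookies_as_secure("session=abc123; path=/\ntheme=dark; Secure")
# session=abc123; path=/; secure
# theme=dark; Secure
```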

Set HSTS Headers on Responses

HSTS or “HTTP Strict Transport Security” is a security enhancement by which applications can specify themselves as HTTPS-only to complying browsers.

HSTS capabilities of a browser can be used by sending appropriate response headers from the server. When a domain is added to the HSTS list of a browser, the browser redirects to the https version of the URL without the help of the server.

Chrome maintains an HSTS Preload List, a list of domains which are hardcoded into Chrome as HTTPS-only. This list is also used by Firefox and Safari.

Rails 5 has a configuration flag to set the preload directive in the HSTS header, which can be used as follows:

config.ssl_options = { hsts: { preload: true } }

We can also specify a max-age for the HSTS header.

By default, Rails 5 sets the max-age of the HSTS header to 180 days, which is considered the lower bound by SSL Labs’ SSL Test. This period is also above the 18-week minimum max-age required for inclusion in the browser preload list.

We can specify a custom max-age as follows:

  config.ssl_options = { hsts: { expires: 10.days } }

In Rails 5, if we disable HSTS by setting:

config.ssl_options = { hsts: false }

Rails 5 will set the max-age of the HSTS header to 0, so that browsers immediately stop treating the domain as HTTPS-only.
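
To make the relationship between these options and the resulting Strict-Transport-Security header concrete, here is a minimal plain-Ruby sketch. The hsts_header_value helper and its constants are hypothetical. Durations are given in plain seconds because ActiveSupport (and therefore 10.days) is not loaded here, and the 180-day default is the one mentioned above.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_MAX_AGE = 180 * SECONDS_PER_DAY # 15_552_000 seconds

# Hypothetical helper (not part of Rails): builds a header value from a
# hash shaped like the :hsts key of config.ssl_options.
def hsts_header_value(hsts = {})
  return "max-age=0" if hsts == false # disabling HSTS clears it in browsers

  value = "max-age=#{hsts.fetch(:expires, DEFAULT_MAX_AGE)}"
  value += "; includeSubDomains" if hsts[:subdomains]
  value += "; preload" if hsts[:preload]
  value
end

puts hsts_header_value({ preload: true })                 # max-age=15552000; preload
puts hsts_header_value({ expires: 10 * SECONDS_PER_DAY }) # max-age=864000
puts hsts_header_value(false)                             # max-age=0
```

The default of 15,552,000 seconds is indeed larger than 18 weeks, which is 18 * 7 * 86,400 = 10,886,400 seconds.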

With custom redirect status and greater control over the HSTS header, Rails 5 lets us roll out HTTPS in a controlled manner, and makes rolling back of these changes easier.

Rails 5 trims session storage by discarding some flash messages

This blog is part of our Rails 5 series.

Rails, by default, stores session data in cookies.

Cookies have a storage limit of 4K, and a cookie overflow exception is raised if we attempt to store more than 4K of data in one.

Flash messages are persisted across requests with the help of session storage.

Flash messages set with flash.now are marked as discarded for the next request. On the next request, they are deleted before the flash values are reconstituted.

Before Rails 5, these discarded flash messages were still written to the session, which consumed extra space in the cookie store. When the data exceeds the 4K limit, Rails raises ActionDispatch::Cookies::CookieOverflow.

Let us see an example below to demonstrate this.

class TemplatesController < ApplicationController
  def search
    @templates = Template.search(params[:search])
    flash.now[:notice] = "Your search results for #{params[:search]}"
    flash[:alert] = "Alert message"
    p session[:flash]
  end
end

#logs

{"discard"=>["notice"],
"flashes"=>{"notice"=>"Your search results for #{Value of search params}",
"alert"=>"Alert message"}}

In the above example, params[:search] might hold a large amount of data and cause Rails to raise CookieOverflow, because the session persists both flash.now[:notice] and flash[:alert].

Rails 5 removes discarded flash messages

In Rails 5, discarded flash messages are removed before persisting into the session leading to less consumption of space and hence, fewer chances of CookieOverflow being raised.

class TemplatesController < ApplicationController
  def search
    @templates = Template.search(params[:search], params[:template])
    flash.now[:notice] = "Your search results for #{params[:search]} with template #{params[:template]}"
    flash[:alert] = "Alert message"
    p session[:flash]
  end
end

#logs

{"discard"=>[], "flashes"=>{"alert"=>"Alert message"}}

We can see from the above example that the flash.now value is not added to the session in Rails 5, which leaves fewer chances of raising ActionDispatch::Cookies::CookieOverflow.
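
The saving can be sketched in plain Ruby. The flash_for_session helper below is hypothetical and only imitates the idea of dropping discarded entries before the flash is stored. The JSON byte size is a stand-in for the real cookie size, which also depends on how the cookie is encoded.

```ruby
require "json"

# Hypothetical helper (not part of Rails): drops flash entries that are
# marked as discarded before the flash is written to the session.
def flash_for_session(flashes, discard)
  { "discard" => [], "flashes" => flashes.reject { |key, _| discard.include?(key) } }
end

flashes = {
  "notice" => "Your search results for #{'x' * 5000}", # simulated large search term
  "alert"  => "Alert message"
}
discard = ["notice"]

stored_before = { "discard" => discard, "flashes" => flashes } # everything persisted
stored_after  = flash_for_session(flashes, discard)            # discarded entries dropped

puts stored_before.to_json.bytesize > 4096 # true
puts stored_after.to_json.bytesize > 4096  # false
```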

Rails 5 deprecates alias_method_chain in favor of module prepend

This blog is part of our Rails 5 series.

Rails 5 has deprecated usage of alias_method_chain in favor of Ruby’s built-in method Module#prepend.

What is alias_method_chain and when to use it

A lot of good articles have been written by some very smart people on the topic of “alias_method_chain”. So we will not be attempting to describe it here.

Ernie Miller wrote When to use alias_method_chain more than five years ago but it is still worth a read.
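
For readers who have not seen the pattern, the following plain-Ruby sketch shows roughly what alias_method_chain :hello, :flanders sets up. It is written by hand with alias_method so that it runs without Rails. The Person class and the flanders feature name match the example used in the next section.

```ruby
class Person
  def hello
    "Hello"
  end
end

# Reopen the class and wrap the original method by hand.
class Person
  def hello_with_flanders
    "#{hello_without_flanders}-diddly"
  end

  alias_method :hello_without_flanders, :hello # keep the original reachable
  alias_method :hello, :hello_with_flanders    # point hello at the wrapper
end

puts Person.new.hello                  # Hello-diddly
puts Person.new.hello_without_flanders # Hello
```

Note that this approach leaves two extra public methods on Person, which is one of the things the prepend-based version avoids.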

Using Module#prepend to solve the problem

Ruby 2.0 introduced Module#prepend which allows us to insert a module before the class in the class ancestor hierarchy.

Let’s try to solve the same problem using Module#prepend.

module Flanderizer
  def hello
    "#{super}-diddly"
  end
end

class Person
  def hello
    "Hello"
  end
end

# In ruby 2.0
Person.send(:prepend, Flanderizer)

# In ruby 2.1
Person.prepend(Flanderizer)

flanders = Person.new
puts flanders.hello #=> "Hello-diddly"

Now we are back to being nice to our neighbor which should make Ernie happy.

Let’s see what the ancestors chain looks like.

flanders.class.ancestors # => [Flanderizer, Person, Object, Kernel, BasicObject]

In Ruby 2.1, both Module#include and Module#prepend became public methods. In the above example we have shown both the Ruby 2.0 and Ruby 2.1 versions.

New framework defaults in Rails 5 to make upgrade easier

When a new version of Rails comes out, one of the pain points is upgrading existing apps to the latest version.

A Rails upgrade can be boiled down to the following essential steps:

  1. Have a green build
  2. Update the Rails version in Gemfile and bundle
  3. Run the update task to update configuration files
  4. Run tests and sanity checks to see if anything is broken by the upgrade and fix the issues
  5. Repeat step 4!

Rails 5 comes with a lot of new features. Some of them, like not halting the callback chain when a callback returns false, are breaking changes for older apps.

To keep the upgrade process easier, Rails 5 has added feature flags for all of these breaking changes.

When we create a brand new Rails 5 app, all of the feature flags will be turned on. We can see these feature flags in the config/initializers/new_framework_defaults.rb file.

But when we upgrade an app to Rails 5, just updating the Gemfile and bundling is not enough.

We need to run the bin/rails app:update task, which will update a few configurations and also add the config/initializers/new_framework_defaults.rb file.

Rails will turn off all the feature flags in the config/initializers/new_framework_defaults.rb file while upgrading an older app. In this way our app won’t break due to the breaking features.

Let’s take a look at these configuration flags one by one.

Enable per-form CSRF tokens

Starting from Rails 5, each form will get its own CSRF token. This change is controlled by the following feature flag.

Rails.application.config.action_controller.per_form_csrf_tokens

For new apps, it will be set to true and for older apps upgraded to Rails 5, it will be set to false. Once we are ready to use this feature in our upgraded app, we just need to change it to true.
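
One way to picture a per-form token is as a value derived from a per-session secret plus the form's action path and HTTP method, so that a token issued for one form is useless for another. The sketch below is an illustration under that assumption. The per_form_token helper and the secret value are hypothetical, and this is not presented as the exact computation Rails performs.

```ruby
require "openssl"

# Hypothetical helper (not part of Rails): derives a token that is only
# valid for one action path and HTTP method, from a per-session secret.
def per_form_token(session_secret, action_path, http_method)
  OpenSSL::HMAC.hexdigest(
    OpenSSL::Digest.new("SHA256"),
    session_secret,
    "#{action_path}##{http_method.downcase}"
  )
end

secret = "per-session-secret" # placeholder value for illustration

posts_token    = per_form_token(secret, "/posts", "POST")
comments_token = per_form_token(secret, "/comments", "POST")

puts posts_token == per_form_token(secret, "/posts", "post") # true
puts posts_token == comments_token                           # false
```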

Enable HTTP Origin Header checking for CSRF mitigation

For additional defense against CSRF attacks, Rails 5 has a feature to check HTTP Origin header against the site’s origin. This will be disabled by default in upgraded apps using the following configuration option:

Rails.application.config.action_controller.forgery_protection_origin_check

We can set it to true to enable HTTP origin header check when we are ready to use this feature.
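
The check itself is simple to express. In this minimal sketch, valid_request_origin? is a hypothetical standalone helper that takes the Origin header and the site's own origin as strings. A request passes when it sends no Origin header or when the header matches the site's origin.

```ruby
# Hypothetical helper: a request passes when it carries no Origin header
# or when the Origin header equals the site's own origin.
def valid_request_origin?(origin_header, base_url)
  origin_header.nil? || origin_header == base_url
end

puts valid_request_origin?(nil, "https://example.com")                   # true
puts valid_request_origin?("https://example.com", "https://example.com") # true
puts valid_request_origin?("https://evil.test", "https://example.com")   # false
```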

Make Ruby 2.4 preserve the timezone of the receiver

In Ruby 2.4, the to_time method for both DateTime and Time will preserve the timezone of the receiver when converting to an instance of Time. For upgraded apps, this feature is disabled by setting the following configuration option to false:

ActiveSupport.to_time_preserves_timezone

To use the Ruby 2.4+ default of to_time, set this to true.
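
The underlying Ruby behavior can be checked without Rails. The following snippet uses only the standard library's DateTime, so it shows what plain Ruby 2.4 and later do, not the ActiveSupport version of to_time that this flag controls.

```ruby
require "date"

# A DateTime with a +05:30 offset.
datetime = DateTime.new(2016, 8, 6, 0, 39, 44, "+05:30")
time = datetime.to_time

puts time.utc_offset # 19800 (5.5 hours) on Ruby 2.4 and later
puts time.hour       # 0, the hour is kept in the receiver's offset
```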

Require belongs_to associations by default

In Rails 5, when we define a belongs_to association, the association record is required to be present.

In upgraded apps, this validation is not enabled. It is disabled using the following option:

Rails.application.config.active_record.belongs_to_required_by_default

We can update our code to use this feature and turn this on by changing the above option to true.

Do not halt callback chain when a callback returns false

In Rails 5, the callback chain is not halted when a callback returns false. This change is turned off for backward compatibility with the following option set to true:

ActiveSupport.halt_callback_chains_on_return_false

We can use the new behavior of not halting the callback chain after making sure that our code does not break due to this change and changing the value of this config to false.
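
The difference between the two settings can be sketched in plain Ruby. The run_callbacks helper below is hypothetical and far simpler than the real callback machinery. It assumes that an explicit throw :abort always halts the chain, which is how Rails 5 callbacks are halted on purpose, while a false return value halts only when halt_on_false is true.

```ruby
# Hypothetical helper (not part of Rails): runs named callbacks in order
# and returns the names of the callbacks that were executed.
def run_callbacks(callbacks, halt_on_false:)
  ran = []
  callbacks.each do |name, callback|
    aborted = true
    result = catch(:abort) do
      value = callback.call
      aborted = false # only reached when the callback did not throw
      value
    end
    ran << name
    break if aborted || (halt_on_false && result == false)
  end
  ran
end

callbacks = {
  normalize: -> { true },
  check:     -> { false }, # returns false without throwing
  notify:    -> { true }
}

p run_callbacks(callbacks, halt_on_false: true)  # [:normalize, :check]
p run_callbacks(callbacks, halt_on_false: false) # [:normalize, :check, :notify]

aborting = callbacks.merge(check: -> { throw :abort })
p run_callbacks(aborting, halt_on_false: false)  # [:normalize, :check]
```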

Configure SSL options to enable HSTS with subdomains

HTTP Strict Transport Security (HSTS) is a web security policy mechanism which helps to protect websites against protocol downgrade attacks and cookie hijacking. Using HSTS, we can ask browsers to make connections using only HTTPS. In upgraded apps, HSTS is not enabled on subdomains. In new apps, HSTS is enabled on subdomains using the following option:

Rails.application.config.ssl_options = { hsts: { subdomains: true } }

Having all these backward incompatible features which can be turned on one by one after the upgrade, in one file, eases the upgrade process. This initializer also has helpful comments explaining the features!
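
As a quick recap, the flags discussed above can be collected into one plain-Ruby structure. The FRAMEWORK_DEFAULTS constant and the flags_to_review helper are hypothetical names used only for this summary. The values are the ones described in this post for new and upgraded apps.

```ruby
# Summary of the flags described above (hypothetical constant).
FRAMEWORK_DEFAULTS = {
  "action_controller.per_form_csrf_tokens"             => { new_app: true,  upgraded_app: false },
  "action_controller.forgery_protection_origin_check"  => { new_app: true,  upgraded_app: false },
  "ActiveSupport.to_time_preserves_timezone"           => { new_app: true,  upgraded_app: false },
  "active_record.belongs_to_required_by_default"       => { new_app: true,  upgraded_app: false },
  "ActiveSupport.halt_callback_chains_on_return_false" => { new_app: false, upgraded_app: true },
  "ssl_options hsts subdomains"                        => { new_app: true,  upgraded_app: false }
}.freeze

# Lists the flags whose value in an upgraded app still differs from the
# value a brand new app would get.
def flags_to_review(defaults)
  defaults.select { |_, values| values[:new_app] != values[:upgraded_app] }.keys
end

puts flags_to_review(FRAMEWORK_DEFAULTS)
```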

Happy Upgrading!

Rails 5 allows wildcard for specifying template dependencies for cache digests

This blog is part of our Rails 5 series.

Cache Digests

After cache digests were introduced in Rails, all calls to #cache in views automatically append a digest of that template and all of its dependencies to the cache key.

So developers no longer need to manually discard cache for the specific templates they make changes to.

# app/views/users/show.html.erb
<% cache user do %>
  <h1>All Posts</h1>
  <%= render user.posts %>
<% end %>

# app/views/posts/_post.html.erb
<% cache post do %>
  <p> <%= post.content %></p>
  <p> <%= post.created_at.to_s %></p>
  <%= render 'posts/completed' %>
<% end %>

This creates a cache key like views/users/605416233-20129410191209/d9fb66b12bx8edf46707c67ab41d93cb2, whose trailing digest depends on the template and its dependencies.

So if we now change posts/_completed.html.erb, the cache key changes and the cached fragment expires automatically.
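
The idea of a digest that covers a template and everything it renders can be sketched in plain Ruby. The template_digest helper, the in-memory sources hash, and the dependencies hash are all hypothetical stand-ins. Rails works on real template files and detects render calls itself.

```ruby
require "digest"

# Hypothetical helper: digest of a template's source combined with the
# digests of all templates it depends on, computed recursively.
def template_digest(name, sources, dependencies)
  nested = dependencies.fetch(name, []).map { |dep| template_digest(dep, sources, dependencies) }
  Digest::MD5.hexdigest([sources.fetch(name), *nested].join("-"))
end

sources = {
  "users/show"       => "<h1>All Posts</h1>",
  "posts/_post"      => "<p>content</p>",
  "posts/_completed" => "<span>done</span>"
}
dependencies = {
  "users/show"  => ["posts/_post"],
  "posts/_post" => ["posts/_completed"]
}

before = template_digest("users/show", sources, dependencies)
sources["posts/_completed"] = "<span>completed</span>" # edit the innermost partial
after = template_digest("users/show", sources, dependencies)

puts before == after # false, so the outer cache key changes too
```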

Explicit dependencies

As we saw in our earlier example, Rails was able to determine template dependencies implicitly. But sometimes it cannot determine the dependencies at all, for instance when a partial is rendered from a helper.

Let’s see an example below.

# app/views/users/show.html.erb
<% cache user do %>
  <h1>All Posts</h1>
  <%= render user.posts %>
<% end %>

# app/views/posts/_post.html.erb
<% cache post do %>
  <p> <%= post.content %></p>
  <p> <%= post.created_at.to_s %></p>
  <%= render_post_complete_or_not(post) %>
<% end %>

# app/helpers/posts_helper.rb

module PostsHelper
  def render_post_complete_or_not(post)
    if post.completed?
      render 'posts/complete'
    else
      render 'posts/incomplete'
    end
  end
end

To explicitly add a dependency on these templates, we need to add a comment in a special format, as follows.

  <%# Template Dependency: posts/complete %>
  <%# Template Dependency: posts/incomplete %>

If we have multiple dependencies, we need to add a special comment for each dependency, one by one.

# app/views/posts/_post.html.erb
<% cache post do %>
  <p> <%= post.content %></p>
  <p> <%= post.created_at.to_s %></p>

  <%# Template Dependency: posts/complete %>
  <%# Template Dependency: posts/incomplete %>
  <%= render_post_complete_or_not(post) %>
<% end %>

Using Wildcard in Rails 5

In Rails 5, we can now use a wildcard to add dependencies on multiple files in a directory. So, instead of adding files one by one, we can add the dependency using a wildcard.

# app/views/posts/_post.html.erb
<% cache post do %>
  <p> <%= post.content %></p>
  <p> <%= post.created_at.to_s %></p>

  <%# Template Dependency: posts/* %>
  <%= render_post_complete_or_not(post) %>
<% end %>
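
To show what a wildcard dependency stands for, here is a small plain-Ruby sketch. The expand_dependency helper and the list of template names are hypothetical. The sketch matches only the direct children of the directory, which is an assumption of this example and not a statement about how Rails treats nested directories.

```ruby
# Hypothetical helper: expands a dependency pattern against a list of
# known template names. With FNM_PATHNAME, "*" does not match "/".
def expand_dependency(pattern, template_names)
  template_names.select { |name| File.fnmatch(pattern, name, File::FNM_PATHNAME) }
end

templates = ["posts/complete", "posts/incomplete", "posts/comments/list", "users/show"]

p expand_dependency("posts/*", templates)        # ["posts/complete", "posts/incomplete"]
p expand_dependency("posts/complete", templates) # ["posts/complete"]
```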

Rails 5 supports passing collection of records to 'fresh_when' and 'stale?'

This blog is part of our Rails 5 series.

Rails has powerful tools to control caching of resources via HTTP such as fresh_when and stale?.

Previously we could only pass a single record to these methods but now Rails 5 adds support for accepting a collection of records as well. For example,

def index
  @posts = Post.all
  fresh_when(etag: @posts, last_modified: @posts.maximum(:updated_at))
end

or, written more simply,

def index
  @posts = Post.all
  fresh_when(@posts)
end

This works with the stale? method too; we can pass a collection of records to it. For example,

def index
  @posts = Post.all

  if stale?(@posts)
    render json: @posts
  end
end

To see this in action, let’s begin by making a request at /posts.

$ curl -I http://localhost:3000/posts

HTTP/1.1 200 OK
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
ETag: W/"a2b68b7a7f8c67f1b88848651a86f5f5"
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 7c8457e7-9d26-4646-afdf-5eb44711fa7b
X-Runtime: 0.074238

In the second request, we would send the ETag in If-None-Match header to check if the data has changed.

$ curl -I -H 'If-None-Match: W/"a2b68b7a7f8c67f1b88848651a86f5f5"' http://localhost:3000/posts

HTTP/1.1 304 Not Modified
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
ETag: W/"a2b68b7a7f8c67f1b88848651a86f5f5"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 6367b2a5-ecc9-4671-8a79-34222dc50e7f
X-Runtime: 0.003756

Since there’s no change, the server returned HTTP/1.1 304 Not Modified. If these requests were made from a browser, it would automatically use the version in its cache on the second request.

The second request was faster because the server skipped loading the records and rendering the view. This can be seen in the Rails log:

Started GET "/posts" for ::1 at 2016-08-06 00:39:44 +0530
Processing by PostsController#index as HTML
   (0.2ms)  SELECT MAX("posts"."updated_at") FROM "posts"
   (0.1ms)  SELECT COUNT(*) AS "size", MAX("posts"."updated_at") AS timestamp FROM "posts"
  Rendering posts/index.html.erb within layouts/application
  Post Load (0.2ms)  SELECT "posts".* FROM "posts"
  Rendered posts/index.html.erb within layouts/application (2.0ms)
Completed 200 OK in 31ms (Views: 27.1ms | ActiveRecord: 0.5ms)


Started GET "/posts" for ::1 at 2016-08-06 00:39:46 +0530
Processing by PostsController#index as HTML
   (0.2ms)  SELECT MAX("posts"."updated_at") FROM "posts"
   (0.1ms)  SELECT COUNT(*) AS "size", MAX("posts"."updated_at") AS timestamp FROM "posts"
Completed 304 Not Modified in 2ms (ActiveRecord: 0.3ms)

The cache expires when the collection of records is updated. For example, adding a new record to the collection or changing any of the records (which changes updated_at) would change the ETag.
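
The log above suggests that the ETag is tied to the number of records and the latest updated_at. The plain-Ruby sketch below illustrates that idea under this assumption. The Post struct, collection_etag, and conditional_get are hypothetical names, and the exact cache key Rails builds is not reproduced here.

```ruby
require "digest"

Post = Struct.new(:id, :updated_at) # stand-in for an ActiveRecord model

# Hypothetical helper: a weak ETag derived from the collection's size
# and its most recent updated_at timestamp.
def collection_etag(records)
  latest = records.map(&:updated_at).max
  digest = Digest::MD5.hexdigest("#{records.size}-#{latest.to_i}")
  "W/\"#{digest}\""
end

# Hypothetical helper: returns [status, etag] for a conditional GET.
def conditional_get(records, if_none_match)
  etag = collection_etag(records)
  status = if_none_match == etag ? 304 : 200
  [status, etag]
end

posts = [Post.new(1, Time.utc(2016, 8, 6, 0, 39, 44))]

status, etag = conditional_get(posts, nil)
puts status                             # 200
puts conditional_get(posts, etag).first # 304

posts << Post.new(2, Time.utc(2016, 8, 7)) # a new record changes the ETag
puts conditional_get(posts, etag).first # 200
```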

Now that Rails 5 supports collection of records in fresh_when and stale?, we have an improved system to cache resources and make our applications faster. This is more helpful when we have controller actions with time consuming data processing logic.