Ruby 2.4 has optimized lstrip and strip methods for ASCII strings

This blog is part of our Ruby 2.4 series.

Ruby has lstrip and rstrip methods, which remove leading and trailing whitespace respectively from a string.

Ruby also has a strip method, which combines lstrip and rstrip and removes both leading and trailing whitespace from a string.

"    Hello World    ".lstrip    #=> "Hello World    "
"    Hello World    ".rstrip    #=> "    Hello World"
"    Hello World    ".strip     #=> "Hello World"

Prior to Ruby 2.4, the rstrip method was optimized for performance, but lstrip and strip were not. In Ruby 2.4, String#lstrip and String#strip have been optimized as well, so they get the same performance benefit as String#rstrip.

Let’s run the following snippet in Ruby 2.3 and Ruby 2.4 to benchmark and compare the performance.

require 'benchmark/ips'

Benchmark.ips do |bench|
  str1 = " " * 10_000_000 + "hello world" + " " * 10_000_000
  str2 = str1.dup
  str3 = str1.dup

  bench.report('String#lstrip') do
    str1.lstrip
  end

  bench.report('String#rstrip') do
    str2.rstrip
  end

  bench.report('String#strip') do
    str3.strip
  end
end

Result for Ruby 2.3

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.989  (± 0.0%) i/s -     55.000  in   5.010903s
       String#rstrip     92.514  (± 5.4%) i/s -    464.000  in   5.032208s
        String#strip     10.170  (± 0.0%) i/s -     51.000  in   5.022118s

Result for Ruby 2.4

Warming up --------------------------------------
       String#lstrip    14.000  i/100ms
       String#rstrip     8.000  i/100ms
        String#strip     6.000  i/100ms
Calculating -------------------------------------
       String#lstrip    143.424  (± 4.2%) i/s -    728.000  in   5.085311s
       String#rstrip     89.150  (± 5.6%) i/s -    448.000  in   5.041301s
        String#strip     67.834  (± 4.4%) i/s -    342.000  in   5.051584s

From the above results, we can see that in Ruby 2.4, String#lstrip is around 13x faster (143.4 i/s vs 11.0 i/s) while String#strip is nearly 7x faster (67.8 i/s vs 10.2 i/s). String#rstrip, as expected, has nearly the same performance, as it was already optimized in previous versions.

Performance remains same for multi-byte strings

Strings can have single-byte or multi-byte characters.

For example, Lé Hello World is a multi-byte string because it contains é, which is a multi-byte character.

'e'.bytesize        #=> 1
'é'.bytesize        #=> 2

Let’s repeat the benchmark with the string Lé hello world instead of hello world.

Result for Ruby 2.3

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     11.147  (± 9.0%) i/s -     56.000  in   5.034363s
       String#rstrip      8.693  (± 0.0%) i/s -     44.000  in   5.075011s
        String#strip      5.020  (± 0.0%) i/s -     26.000  in   5.183517s

Result for Ruby 2.4

Warming up --------------------------------------
       String#lstrip     1.000  i/100ms
       String#rstrip     1.000  i/100ms
        String#strip     1.000  i/100ms
Calculating -------------------------------------
       String#lstrip     10.691  (± 0.0%) i/s -     54.000  in   5.055101s
       String#rstrip      9.524  (± 0.0%) i/s -     48.000  in   5.052678s
        String#strip      4.860  (± 0.0%) i/s -     25.000  in   5.152804s

As we can see, the performance for multi-byte strings is almost the same across Ruby 2.3 and Ruby 2.4.

Explanation

The optimization is related to how strings are scanned to detect whitespace. Checking for whitespace in a multi-byte string carries additional overhead, because each character has to be decoded before it can be tested. So the patch adds an initial check for whether the string is a single-byte string and, if so, processes it separately.

In most cases strings are single-byte, so the performance improvement will be visible and helpful.
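
The idea can be illustrated in plain Ruby. This is only a sketch of the technique, not the actual C patch: sketch_lstrip is a hypothetical helper, and ascii_only? is assumed here as a stand-in for the interpreter's internal single-byte check.

```ruby
# Hypothetical illustration of a single-byte fast path; not MRI's actual code.
ASCII_WHITESPACE = [9, 10, 11, 12, 13, 32].freeze # \t \n \v \f \r and space

def sketch_lstrip(str)
  if str.ascii_only?
    # Fast path: every character is exactly one byte, so scan raw bytes.
    offset = 0
    offset += 1 while offset < str.bytesize && ASCII_WHITESPACE.include?(str.getbyte(offset))
    str.byteslice(offset, str.bytesize - offset)
  else
    # Slow path: let the regexp engine decode the string character by character.
    str.sub(/\A\s+/, '')
  end
end

sketch_lstrip("   hello world  ")   #=> "hello world  "
sketch_lstrip("   Lé hello  ")      #=> "Lé hello  "
```

Both branches return the same result; the only difference is how much work is needed to find the first non-whitespace character.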

IO#readlines now accepts chomp flag as an argument

Consider the following file which needs to be read in Ruby. We can use the IO#readlines method to get the lines in an array.

# lotr.txt

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.

Ruby 2.3

IO.readlines('lotr.txt')
#=> ["Three Rings for the Elven-kings under the sky,\n", "Seven for the Dwarf-lords in their halls of stone,\n", "Nine for Mortal Men doomed to die,\n", "One for the Dark Lord on his dark throne\n", "In the Land of Mordor where the Shadows lie."]

As we can see, the lines in the array keep their trailing newline character \n, which is not stripped while reading. In most cases the newline needs to be chomped. Prior to Ruby 2.4, it could be done in the following way.

IO.readlines('lotr.txt').map(&:chomp)
#=> ["Three Rings for the Elven-kings under the sky,", "Seven for the Dwarf-lords in their halls of stone,", "Nine for Mortal Men doomed to die,", "One for the Dark Lord on his dark throne", "In the Land of Mordor where the Shadows lie."]

Ruby 2.4

Since this was a common requirement, the Ruby team decided to add an optional chomp parameter to the readlines method. The same result can now be achieved in Ruby 2.4 in the following way.

IO.readlines('lotr.txt', chomp: true)
#=> ["Three Rings for the Elven-kings under the sky,", "Seven for the Dwarf-lords in their halls of stone,", "Nine for Mortal Men doomed to die,", "One for the Dark Lord on his dark throne", "In the Land of Mordor where the Shadows lie."]

Additionally, the IO#gets, IO#readline, IO#each_line and IO.foreach methods have also been modified to accept an optional chomp flag.
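
As a small sketch of the same flag on two of those methods, the snippet below writes a throwaway file (its name and contents are made up for the example) and reads it back with IO.foreach and IO#gets.

```ruby
require 'tempfile'

lines_via_foreach = []
first_line = nil

Tempfile.create('lotr') do |file|
  file.write("Three Rings\nSeven\nNine")
  file.flush

  # IO.foreach yields each line with the trailing newline already removed.
  File.foreach(file.path, chomp: true) { |line| lines_via_foreach << line }

  # IO#gets accepts the same keyword.
  file.rewind
  first_line = file.gets(chomp: true)
end

lines_via_foreach  #=> ["Three Rings", "Seven", "Nine"]
first_line         #=> "Three Rings"
```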

open-uri in Ruby 2.4 allows http to https redirection

In Ruby 2.3, if the URL passed to open-uri uses http and the host redirects to https, open-uri raises an error.

> require 'open-uri'
> open('http://www.google.com/gmail')

RuntimeError: redirection forbidden: http://www.google.com/gmail -> https://www.google.com/gmail/

To get around this issue, we could use the open_uri_redirections gem.

> require 'open-uri'
> require 'open_uri_redirections'
> open('http://www.google.com/gmail/', :allow_redirections => :safe)

=> #<Tempfile:/var/folders/jv/fxkfk9_10nb_964rvrszs2540000gn/T/open-uri20170228-41042-2fffoa>

Ruby 2.4

In Ruby 2.4, this issue is fixed. So now http to https redirection is possible using open-uri.

> require 'open-uri'
> open('http://www.google.com/gmail')
=> #<Tempfile:/var/folders/jv/fxkfk9_10nb_964rvrszs2540000gn/T/open-uri20170228-41077-1bkm1dv>

Note that redirection from https to http will raise an error, like it did in previous versions, since that has possible security concerns.

Ruby 2.4 now has Dir.empty? and File.empty? methods

Until now, to check whether a given directory is empty, we would write something like this:

Dir.entries("/usr/lib").size == 2       #=> false
Dir.entries("/home").size == 2          #=> true

Every directory in a Unix filesystem contains at least two entries: . (current directory) and .. (parent directory).

Hence, the code above checks whether there are only two entries and, if so, considers the directory empty.

However, this code only works on UNIX filesystems and fails on Windows machines, as Windows directories don’t have . or .. entries.

Dir.empty?

Considering all this, Ruby has finally included a new method, Dir.empty?, that takes a directory path as an argument and returns a boolean.

Here is an example.

Dir.empty?('/Users/rtdp/Documents/posts')   #=> true

Most importantly, this method works correctly on all platforms.

File.empty?

To check if a file is empty, Ruby has the File.zero? method. It checks whether the file exists and has zero size.

File.zero?('/Users/rtdp/Documents/todo.txt')    #=> true

After introducing Dir.empty?, it made sense to add File.empty? as an alias of File.zero?.

File.empty?('/Users/rtdp/Documents/todo.txt')    #=> true
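
The two methods can be tried out together without touching real files. The following is a minimal sketch that works inside a temporary directory; the file names are invented for the example.

```ruby
require 'tmpdir'
require 'fileutils'

results = {}

Dir.mktmpdir do |dir|
  results[:fresh_dir] = Dir.empty?(dir)           # nothing inside yet

  path = File.join(dir, 'todo.txt')
  FileUtils.touch(path)
  results[:dir_with_file] = Dir.empty?(dir)       # now holds one entry
  results[:new_file] = File.empty?(path)          # exists and has zero bytes

  File.write(path, 'buy milk')
  results[:written_file] = File.empty?(path)      # no longer zero bytes

  # Like File.zero?, File.empty? is false for a file that does not exist.
  results[:missing_file] = File.empty?(File.join(dir, 'nope.txt'))
end

results[:fresh_dir]      #=> true
results[:dir_with_file]  #=> false
results[:new_file]       #=> true
results[:written_file]   #=> false
results[:missing_file]   #=> false
```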

Ruby 2.4 implements Integer#digits for extracting digits in place-value notation

If we want to extract all the digits of an integer from right to left, the newly added Integer#digits method will come in handy.

567321.digits
#=> [1, 2, 3, 7, 6, 5]

567321.digits[3]
#=> 7

We can also supply a different base as an argument.

0123.digits(8)
#=> [3, 2, 1]

0xabcdef.digits(16)
#=> [15, 14, 13, 12, 11, 10]

Use case of digits

We can use Integer#digits to sum all the digits in an integer.

123.to_s.chars.map(&:to_i).sum
#=> 6

123.digits.sum
#=> 6

Integer#digits also helps when calculating checksums such as Luhn and Verhoeff, because it avoids allocating intermediate strings.
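
As an illustration, here is a minimal sketch of a Luhn check built on Integer#digits. The method name luhn_valid? is our own, and the sample number is the commonly used Luhn test value.

```ruby
# Luhn check using Integer#digits, which returns the least significant digit first.
def luhn_valid?(number)
  total = number.digits.each_with_index.sum do |digit, index|
    # The check digit (index 0) and every second digit after it are added as-is.
    next digit if index.even?

    # The remaining digits are doubled; two-digit results are reduced by 9.
    doubled = digit * 2
    doubled > 9 ? doubled - 9 : doubled
  end
  (total % 10).zero?
end

luhn_valid?(79927398713)   #=> true
luhn_valid?(79927398710)   #=> false
```

Because digits already returns the number from right to left, no string conversion or reversal is needed.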

Ruby 2.4 adds Set#compare_by_identity and Set#compare_by_identity? methods

In Ruby, the Object#equal? method compares two objects by identity, that is, it checks whether they are exactly the same object. Ruby also has the Object#eql? method, which returns true if two objects have the same value.

For example:

str1 = "Sample string"
str2 = str1.dup

str1.eql?(str2)     #=> true

str1.equal?(str2)   #=> false

We can see that the object ids of the two objects are not the same.

str1.object_id      #=> 70334175057920

str2.object_id      #=> 70334195702480

In Ruby, a Set does not allow duplicate items in its collection. To determine whether two items are equal, Set uses Object#eql? and not Object#equal?.

So adding two different objects with the same value to a set was not possible prior to Ruby 2.4.

Ruby 2.3

require 'set'

set = Set.new           #=> #<Set: {}>

str1 = "Sample string"  #=> "Sample string"
str2 = str1.dup         #=> "Sample string"

set.add(str1)           #=> #<Set: {"Sample string"}>
set.add(str2)           #=> #<Set: {"Sample string"}>

With the new Set#compare_by_identity method introduced in Ruby 2.4, a set can compare its values using Object#equal? and so distinguish objects that are equal in value but not identical.

Ruby 2.4

require 'set'

set = Set.new.compare_by_identity           #=> #<Set: {}>

str1 = "Sample string"                      #=> "Sample string"
str2 = str1.dup                             #=> "Sample string"

set.add(str1)                               #=> #<Set: {"Sample string"}>
set.add(str2)                               #=> #<Set: {"Sample string", "Sample string"}>

Set#compare_by_identity?

Ruby 2.4 also provides the compare_by_identity? method to check whether a set compares its elements by identity.

require 'set'

set1 = Set.new                         #=> #<Set: {}>
set2 = Set.new.compare_by_identity     #=> #<Set: {}>

set1.compare_by_identity?              #=> false

set2.compare_by_identity?              #=> true
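
A short sketch of how the two kinds of set differ in practice. The strings are created with dup so that each one is guaranteed to be a separate object.

```ruby
require 'set'

a = "draft".dup
b = a.dup            # equal value, different object

by_value = Set.new([a, b])

by_identity = Set.new.compare_by_identity
by_identity << a << b << a   # adding the very same object again has no effect

by_value.size                        #=> 1
by_identity.size                     #=> 2
by_identity.include?(a)              #=> true
by_identity.include?("draft".dup)    #=> false, this is a brand-new object
```

An identity-based set is handy for tracking which exact objects have already been visited, for example while walking an object graph.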

Ruby 2.4 adds support for extracting named capture groups using MatchData#values_at

Ruby 2.3

We can use MatchData#[] to extract named capture and positional capture groups.

pattern=/(?<number>\d+) (?<word>\w+)/
pattern.match('100 thousand')[:number]
#=> "100"

pattern=/(\d+) (\w+)/
pattern.match('100 thousand')[2]
#=> "thousand"

Positional capture groups could also be extracted using MatchData#values_at.

pattern=/(\d+) (\w+)/
pattern.match('100 thousand').values_at(2)
#=> ["thousand"]

Changes in Ruby 2.4

In Ruby 2.4, we can also pass strings or symbols to #values_at to extract named capture groups.

pattern=/(?<number>\d+) (?<word>\w+)/
pattern.match('100 thousand').values_at(:number)
#=> ["100"]
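
Since values_at accepts several arguments, a few named groups can be pulled out in one call. A small sketch, using a made-up date pattern:

```ruby
pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
match = pattern.match('released on 2016-12-25')

# Symbols and strings both work; results follow the order of the arguments.
day, month, year = match.values_at(:day, 'month', :year)

day     #=> "25"
month   #=> "12"
year    #=> "2016"
```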

Ruby 2.4 adds infinite? and finite? methods to Numeric

Prior to Ruby 2.4

Prior to Ruby 2.4, Float and BigDecimal responded to methods infinite? and finite?, whereas Fixnum and Bignum did not.

Ruby 2.3

#infinite?

5.0.infinite?
=> nil

Float::INFINITY.infinite?
=> 1

5.infinite?
NoMethodError: undefined method `infinite?' for 5:Fixnum

#finite?

5.0.finite?
=> true

5.finite?
NoMethodError: undefined method `finite?' for 5:Fixnum

Ruby 2.4

To make the behavior consistent across all numeric values, infinite? and finite? were added to Fixnum and Bignum, even though for integers infinite? always returns nil and finite? always returns true.

This lets us call these methods on any number, whether it is an integer or a floating point number.

#infinite?

5.0.infinite?
=> nil

Float::INFINITY.infinite?
=> 1

5.infinite?
=> nil

#finite?

5.0.finite?
=> true

5.finite?
=> true
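
The practical benefit is that mixed collections of numbers can be handled uniformly. A minimal sketch, where finite_values is a hypothetical helper of our own:

```ruby
# Hypothetical helper: keep only values that are safe to sum or average.
def finite_values(numbers)
  numbers.select(&:finite?)
end

readings = [5, 5.0, Float::INFINITY, -Float::INFINITY, 2**70, Rational(1, 3)]

finite_values(readings)      #=> [5, 5.0, 1180591620717411303424, (1/3)]
readings.map(&:infinite?)    #=> [nil, nil, 1, -1, nil, nil]
```

Before Ruby 2.4 the same code would raise NoMethodError as soon as it reached an integer.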

Ruby 2.4 adds Comparable#clamp method

In Ruby 2.4, the clamp method is added to the Comparable module. This method can be used to clamp an object within a specific range of values.

The clamp method takes two arguments, min and max, which define the range of values within which the receiver should be clamped.

Clamping numbers

clamp can be used to keep a number within the range of min, max.

10.clamp(5, 20)
=> 10

10.clamp(15, 20)
=> 15

10.clamp(0, 5)
=> 5

Clamping strings

Similarly, strings can also be clamped within a range.

"e".clamp("a", "s")
=> "e"

"e".clamp("f", "s")
=> "f"

"e".clamp("a", "c")
=> "c"

"this".clamp("thief", "thin")
=> "thin"

Internally, this method relies on applying the spaceship <=> operator between the object and the min & max arguments.

if    (x <=> min) < 0 then min
elsif (x <=> max) > 0 then max
else  x
end
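
Because clamp lives in Comparable, it is available on any class that defines <=> and includes the module. A minimal sketch with a hypothetical Version class:

```ruby
# Hypothetical value object used only for illustration.
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Arrays compare element by element, so this orders by major, then minor.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

oldest = Version.new(2, 2)
newest = Version.new(2, 4)

Version.new(2, 3).clamp(oldest, newest).to_s  #=> "2.3"
Version.new(1, 9).clamp(oldest, newest).to_s  #=> "2.2"
Version.new(3, 0).clamp(oldest, newest).to_s  #=> "2.4"
```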

Ruby 2.4 introduces liberal_parsing option for parsing bad CSV data

Comma-Separated Values (CSV) is a widely used data format and almost every language has a module to parse it. In Ruby, we have the CSV class to do that.

According to RFC 4180, CSV input cannot contain unescaped double quotes, since such data can’t be parsed unambiguously.

We get a MalformedCSVError when the CSV data does not conform to RFC 4180.

Ruby 2.4 has added a liberal_parsing option to parse such bad data. When it is set to true, Ruby will try to parse the data even when it does not conform to RFC 4180.

# Before Ruby 2.4

> CSV.parse_line('one,two",three,four')

CSV::MalformedCSVError: Illegal quoting in line 1.


# With Ruby 2.4

> CSV.parse_line('one,two",three,four', liberal_parsing: true)

=> ["one", "two\"", "three", "four"]
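
One possible way to use the option is as a fallback, so that well-formed rows still go through the strict parser. This is a sketch only; parse_line_leniently is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical helper: try strict parsing first and fall back to
# liberal parsing only when the row is malformed.
def parse_line_leniently(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  CSV.parse_line(line, liberal_parsing: true)
end

parse_line_leniently('one,two,three')         #=> ["one", "two", "three"]
parse_line_leniently('one,two",three,four')   #=> ["one", "two\"", "three", "four"]
```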

Passing block with Enumerable#chunk is not mandatory in Ruby 2.4

The Enumerable#chunk method can be used on an enumerable object to group consecutive items based on the value returned from the block passed to it.

[1, 4, 7, 10, 2, 6, 15].chunk { |item| item > 5 }.each { |values| p values }

=> [false, [1, 4]]
[true, [7, 10]]
[false, [2]]
[true, [6, 15]]

Prior to Ruby 2.4, passing a block to the chunk method was mandatory.

array = [1,2,3,4,5,6]
array.chunk

=> ArgumentError: no block given

Enumerable#chunk without block in Ruby 2.4

In Ruby 2.4, we can call chunk without passing a block. It simply returns an enumerator object which we can use to chain further operations.

array = [1,2,3,4,5,6]
array.chunk

=> #<Enumerator: [1, 2, 3, 4, 5, 6]:chunk>

Reasons for this change

Let’s take the case of grouping runs of consecutive integers in an array into ranges.

# Before Ruby 2.4

integers = [1,2,4,5,6,7,9,13]

integers.enum_for(:chunk).with_index { |x, idx| x - idx }.map do |diff, group|
  [group.first, group.last]
end

=> [[1,2],[4,7],[9,9],[13,13]]

We had to use enum_for here as chunk couldn’t be called without a block.

enum_for creates a new enumerator object which will enumerate by calling the method passed to it. In this case the method passed was chunk.

With Ruby 2.4, we can use chunk method directly without using enum_for as it does not require a block to be passed.

# Ruby 2.4

integers = [1,2,4,5,6,7,9,13]

integers.chunk.with_index { |x, idx| x - idx }.map do |diff, group|
  [group.first, group.last]
end

=> [[1,2],[4,7],[9,9],[13,13]]

Ruby 2.4 unifies Fixnum and Bignum into Integer

Before Ruby 2.4, Ruby used the Fixnum class to represent small integers and the Bignum class for big integers.

# Before Ruby 2.4

1.class         #=> Fixnum
(2 ** 62).class #=> Bignum

In day-to-day work we don’t have to worry about whether the number we are dealing with is a Bignum or a Fixnum. It’s just an implementation detail.

Interestingly, Ruby also has an Integer class, which is the superclass of Fixnum and Bignum.

Starting with Ruby 2.4, Fixnum and Bignum are unified into Integer.

# Ruby 2.4

1.class         #=> Integer
(2 ** 62).class #=> Integer

Starting with Ruby 2.4, usage of the Fixnum and Bignum constants is deprecated.

# Ruby 2.4

>> Fixnum
(irb):6: warning: constant ::Fixnum is deprecated
=> Integer

>> Bignum
(irb):7: warning: constant ::Bignum is deprecated
=> Integer

How to know if a number is Fixnum, Bignum or Integer?

Most of the time we don’t have to worry about this change in our application code. But libraries like Rails use the class of a number to make certain decisions. These libraries need to support both Ruby 2.4 and previous versions of Ruby.

The easiest way to know whether the running Ruby version has integer unification is to check the class of 1.

# Ruby 2.4

1.class #=> Integer

# Before Ruby 2.4
1.class #=> Fixnum

Look at PR #25056 to see how Rails is handling this case.

Similarly, Arel also supports both Ruby 2.4 and previous versions of Ruby.
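
To make the check concrete, here is a minimal sketch of a class-keyed lookup table that works on both sides of the change. TYPE_NAMES and type_name_for are hypothetical names for this example and are not taken from Rails or Arel.

```ruby
# Hypothetical lookup table keyed by class, built once at load time.
TYPE_NAMES = {}

if 1.class == Integer
  # Ruby 2.4+: Fixnum and Bignum are never referenced, so no deprecation warning.
  TYPE_NAMES[Integer] = :integer
else
  # Older Rubies: the concrete classes are Fixnum and Bignum.
  TYPE_NAMES[Fixnum] = :integer
  TYPE_NAMES[Bignum] = :integer
end

def type_name_for(value)
  TYPE_NAMES.fetch(value.class, :unknown)
end

type_name_for(1)       #=> :integer
type_name_for(2**64)   #=> :integer
type_name_for("1")     #=> :unknown
```

Only the branch that matches the running Ruby is evaluated, which is why the same file loads cleanly on both old and new versions.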

Ruby 2.4 implements Array#min and Array#max

Ruby has Enumerable#min and Enumerable#max which can be used to find the minimum and the maximum value in an Array.

(1..10).to_a.max
#=> 10
(1..10).to_a.method(:max)
#=> #<Method: Array(Enumerable)#max>

Ruby 2.4 adds Array#min and Array#max, which are faster than Enumerable#min and Enumerable#max.

The following benchmark is based on https://blog.blockscore.com/new-features-in-ruby-2-4 .

require 'benchmark/ips'

Benchmark.ips do |bench|
  NUM1 = 1_000_000.times.map { rand }
  NUM2 = NUM1.dup

  ENUM_MAX = Enumerable.instance_method(:max).bind(NUM1)
  ARRAY_MAX = Array.instance_method(:max).bind(NUM2)

  bench.report('Enumerable#max') do
    ENUM_MAX.call
  end

  bench.report('Array#max') do
    ARRAY_MAX.call
  end

  bench.compare!
end

Warming up --------------------------------------
      Enumerable#max     1.000  i/100ms
           Array#max     2.000  i/100ms
Calculating -------------------------------------
      Enumerable#max     17.569  (± 5.7%) i/s -     88.000  in   5.026996s
           Array#max     26.703  (± 3.7%) i/s -    134.000  in   5.032562s

Comparison:
           Array#max:       26.7 i/s
      Enumerable#max:       17.6 i/s - 1.52x  slower

Benchmark.ips do |bench|
  NUM1 = 1_000_000.times.map { rand }
  NUM2 = NUM1.dup

  ENUM_MIN = Enumerable.instance_method(:min).bind(NUM1)
  ARRAY_MIN = Array.instance_method(:min).bind(NUM2)

  bench.report('Enumerable#min') do
    ENUM_MIN.call
  end

  bench.report('Array#min') do
    ARRAY_MIN.call
  end

  bench.compare!
end

Warming up --------------------------------------
      Enumerable#min     1.000  i/100ms
           Array#min     2.000  i/100ms
Calculating -------------------------------------
      Enumerable#min     18.621  (± 5.4%) i/s -     93.000  in   5.007244s
           Array#min     26.902  (± 3.7%) i/s -    136.000  in   5.064815s

Comparison:
           Array#min:       26.9 i/s
      Enumerable#min:       18.6 i/s - 1.44x  slower

This benchmark shows that the new methods Array#max and Array#min are about 1.5 times faster than Enumerable#max and Enumerable#min.

Like Enumerable#max and Enumerable#min, Array#max and Array#min assume that the objects use the Comparable mixin and define the spaceship <=> operator for comparing the elements.
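
For example, here is a minimal sketch with a hypothetical Score class that defines <=> and includes Comparable:

```ruby
# Hypothetical class used only for illustration.
class Score
  include Comparable
  attr_reader :points

  def initialize(points)
    @points = points
  end

  def <=>(other)
    points <=> other.points
  end
end

scores = [Score.new(40), Score.new(95), Score.new(70)]

scores.max.points   #=> 95
scores.min.points   #=> 40

# On Ruby 2.4 and later the method is defined on Array itself.
scores.method(:max).owner   #=> Array
```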

Hunting down a memory leak in shoryuken

This is the story of how we found and fixed a memory leak in shoryuken.

We use shoryuken to process SQS messages inside of docker containers. A while back we noticed that memory was growing without bound. Every few days, we had to restart all the docker containers as a temporary workaround.

Since the workers were inside of a docker container we had limited tools. So we went ahead with the UNIX way of investigating the issue.

First we noticed that the number of threads inside the worker was high, 115 in our case. shoryuken boots up all the worker threads at startup.

# ps --no-header uH p <PID> | wc -l
#=> 115

The proc filesystem exposes a lot of useful information about all running processes. The /proc/[pid]/task directory has information about all the threads of a process.

Some of the threads with lower IDs were executing syscall 23 (select) and 271 (ppoll). These threads were waiting for a message to arrive in the SQS queue. Most of the threads, however, were executing syscall 202 (futex).

At this point we had an idea about the root cause of the memory leak - it was due to the worker starting a lot of threads which were not getting terminated. We wanted to know how and when these threads are started.

Ruby 2.0.0 introduced TracePoint, which provides an interface to many internal Ruby events, such as when an exception is raised, when a method is called or when a method returns.

We added the following code to our workers.

tc = TracePoint.new(:thread_begin, :thread_end) do |tp|
  puts tp.event
  puts tp.self.class
end

tc.enable

Executing the Ruby workers with tracing enabled revealed that a new Celluloid::Thread was being created before each message was processed and that the thread was never terminated. Hence the number of zombie threads in the worker was growing with the number of messages processed.

thread_begin
Celluloid::Thread
[development] [306203a5-3c07-4174-b974-77390e8a4fc3] SQS Message: ...snip...

thread_begin
Celluloid::Thread
[development] [2ce2ed3b-d314-46f1-895a-f1468a8db71e] SQS Message: ...snip...

Unfortunately, TracePoint didn’t pinpoint the place where the thread was started, so we added a couple of puts statements to investigate the issue further.

After a lot of debugging, we were able to find that a new thread was started to increase the visibility time of the SQS message in a shoryuken middleware when auto_visibility_timeout was true.

The fix was to terminate the thread after the work is done.
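
The shape of such a fix can be sketched in a few lines. This is not shoryuken's actual code: with_visibility_heartbeat and the extend_visibility lambda are hypothetical stand-ins for the middleware and for the call that extends the message's visibility timeout.

```ruby
# Hypothetical stand-in for the middleware; not shoryuken's API.
def with_visibility_heartbeat(extend_visibility, interval: 0.01)
  heartbeat = Thread.new do
    loop do
      sleep interval
      extend_visibility.call
    end
  end

  yield       # process the message
  heartbeat   # returned only so the caller can inspect the thread
ensure
  # Without this cleanup the helper thread outlives the message and leaks.
  if heartbeat
    heartbeat.kill
    heartbeat.join
  end
end

extensions = 0
thread = with_visibility_heartbeat(-> { extensions += 1 }) { sleep 0.05 }
thread.alive?   #=> false
```

Putting the cleanup in an ensure clause means the thread is terminated even when processing the message raises an exception.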

Ruby 2.4 adds better support for extracting captured data from Regexp match results

Ruby has MatchData type which is returned by Regexp#match and Regexp.last_match.

It has methods #names and #captures to return the names used for capturing and the actual captured data respectively.

pattern = /(?<number>\d+) (?<word>\w+)/
match_data = pattern.match('100 thousand')
#=> #<MatchData "100 thousand" number:"100" word:"thousand">

>> match_data.names
=> ["number", "word"]
>> match_data.captures
=> ["100", "thousand"]

If we want all named captures as key-value pairs, we have to combine the results of names and captures.

match_data.names.zip(match_data.captures).to_h
#=> {"number"=>"100", "word"=>"thousand"}

Ruby 2.4 adds #named_captures which returns both the name and data of the capture groups.

pattern=/(?<number>\d+) (?<word>\w+)/
match_data = pattern.match('100 thousand')

match_data.named_captures
#=> {"number"=>"100", "word"=>"thousand"}

Ruby 2.4 implements Regexp#match? without polluting global variables

Ruby has many ways to match with a regular expression.

Regexp#===

It returns true/false and sets the $~ global variable.

/stat/ === "case statements"
#=> true
$~
#=> #<MatchData "stat">

Regexp#=~

It returns the integer position of the match, or nil if there is no match. It also sets the $~ global variable.

/stat/ =~ "case statements"
#=> 5
$~
#=> #<MatchData "stat">

Regexp#match

It returns match data and also sets the $~ global variable.

/stat/.match("case statements")
#=> #<MatchData "stat">
$~
#=> #<MatchData "stat">

Ruby 2.4 adds Regexp#match?

This new method just returns true/false and does not set any global variables.

/case/.match?("case statements")
#=> true

So Regexp#match? is a good option when we only care whether the regex matches or not.

Regexp#match? is also faster than its counterparts, as it reduces object allocation by not creating a MatchData object and not updating $~.

require 'benchmark/ips'

Benchmark.ips do |bench|

  EMAIL_ADDR = 'disposable.style.email.with+symbol@example.com'
  EMAIL_REGEXP_DEVISE = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

  bench.report('Regexp#===') do
    EMAIL_REGEXP_DEVISE === EMAIL_ADDR
  end

  bench.report('Regexp#=~') do
    EMAIL_REGEXP_DEVISE =~ EMAIL_ADDR
  end

  bench.report('Regexp#match') do
    EMAIL_REGEXP_DEVISE.match(EMAIL_ADDR)
  end

  bench.report('Regexp#match?') do
    EMAIL_REGEXP_DEVISE.match?(EMAIL_ADDR)
  end

  bench.compare!
end

#=> Warming up --------------------------------------
#=>          Regexp#===   103.876k i/100ms
#=>           Regexp#=~   105.843k i/100ms
#=>        Regexp#match    58.980k i/100ms
#=>       Regexp#match?   107.287k i/100ms
#=> Calculating -------------------------------------
#=>          Regexp#===      1.335M (± 9.5%) i/s -      6.648M in   5.038568s
#=>           Regexp#=~      1.369M (± 6.7%) i/s -      6.880M in   5.049481s
#=>        Regexp#match    709.152k (± 5.4%) i/s -      3.539M in   5.005514s
#=>       Regexp#match?      1.543M (± 4.6%) i/s -      7.725M in   5.018696s
#=>
#=> Comparison:
#=>       Regexp#match?:  1542589.9 i/s
#=>           Regexp#=~:  1369421.3 i/s - 1.13x  slower
#=>          Regexp#===:  1335450.3 i/s - 1.16x  slower
#=>        Regexp#match:   709151.7 i/s - 2.18x  slower
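
To see that match? really leaves $~ alone, we can inspect it inside a fresh method, where $~ starts out as nil. This is a small sketch and the method name is our own.

```ruby
# $~ is local to each method call, so it is nil when this method starts.
def match_data_left_behind
  /stat/.match?("case statements")
  after_predicate = $~          # still nil: match? did not touch it

  /stat/.match("case statements")
  after_match = $~              # now set by Regexp#match

  [after_predicate, after_match]
end

after_predicate, after_match = match_data_left_behind

after_predicate   #=> nil
after_match       #=> #<MatchData "stat">
```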

Ruby 2.4 implements Enumerable#sum

It is a common use case to calculate the sum of the elements of an array or of the values in a hash.

[1, 2, 3, 4] => 10
 
{a: 1, b: 6, c: -3} => 4

Active Support already implements Enumerable#sum

> [1, 2, 3, 4].sum
 #=> 10 

> {a: 1, b: 6, c: -3}.sum{ |k, v| v**2 }
 #=> 46 

> ['foo', 'bar'].sum # concatenation of strings
 #=> "foobar" 

> [[1], ['abc'], [6, 'qwe']].sum # concatenation of arrays
 #=> [1, "abc", 6, "qwe"]

Until Ruby 2.3, we had to rely on Active Support for the Enumerable#sum method, or we could use #inject, which is what Active Support uses under the hood.

Ruby 2.4.0 implements Enumerable#sum as part of the language itself.

Let’s take a look at how sum method fares on some of the enumerable objects in Ruby 2.4.

> [1, 2, 3, 4].sum
 #=> 10 
 
> {a: 1, b: 6, c: -3}.sum { |k, v| v**2 }
 #=> 46 

> ['foo', 'bar'].sum 
 #=> TypeError: String can't be coerced into Integer

> [[1], ['abc'], [6, 'qwe']].sum 
 #=> TypeError: Array can't be coerced into Integer

As we can see, the behavior of Enumerable#sum in Ruby 2.4 is the same as that of Active Support for numbers, but not for string or array concatenation. Let’s see what the difference is and how we can make it work in Ruby 2.4 as well.

Understanding addition/concatenation identity

The Enumerable#sum method takes an optional argument which is used as the initial value of the accumulator. Both Active Support and Ruby 2.4 accept this argument.

When the identity argument is not passed, Ruby 2.4 uses 0 as the default accumulator, whereas Active Support uses nil.

Hence, in the cases of string and array concatenation, the error occurs in Ruby 2.4 because the code attempts to add a string or an array to 0.

To overcome this, we need to pass the proper addition/concatenation identity as an argument to the sum method.

The addition/concatenation identity of an object is the value that, when combined with the object using +, returns an object equal to the original.

> ['foo', 'bar'].sum('')
 #=> "foobar"

> [[1], ['abc'], [6, 'qwe']].sum([])
 #=> [1, "abc", 6, "qwe"]
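
The same approach extends to our own classes: anything that defines + can be summed once a suitable identity is supplied. A minimal sketch with a hypothetical Money struct:

```ruby
# Hypothetical value object used only for illustration.
Money = Struct.new(:cents) do
  def +(other)
    Money.new(cents + other.cents)
  end
end

prices = [Money.new(250), Money.new(1_000), Money.new(99)]

# Money.new(0) plays the role of the identity, just like '' and [] above.
prices.sum(Money.new(0)).cents   #=> 1349
```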
 

What about Rails?

As we have seen, Ruby 2.4 implements Enumerable#sum favouring numeric operations while also supporting non-numeric callers through the identity element. This behavior is not entirely the same as that of Active Support. Still, Active Support can make use of the native sum method whenever possible. There is already a pull request open which uses Enumerable#sum from Ruby whenever possible. This will bring a performance boost, as Ruby’s method is implemented natively in C whereas the one in Active Support is implemented in Ruby.

String#concat, Array#concat and String#prepend take multiple arguments in Ruby 2.4

In Ruby, we use #concat to append a string to another string or the elements of one array to another array. We can also use #prepend to add a string at the beginning of a string.

Ruby 2.3

String#concat and Array#concat

string = "Good"
string.concat(" morning")
#=> "Good morning"

array = ['a', 'b', 'c']
array.concat(['d'])
#=> ["a", "b", "c", "d"]

String#prepend

string = "Morning"
string.prepend("Good ")
#=> "Good Morning"

Before Ruby 2.4, we could pass only one argument to these methods. So we could not add multiple items in one shot.

string = "Good"
string.concat(" morning", " to", " you")
#=> ArgumentError: wrong number of arguments (given 3, expected 1)

Changes with Ruby 2.4

In Ruby 2.4, we can pass multiple arguments and Ruby processes each argument one by one.

String#concat and Array#concat

string = "Good"
string.concat(" morning", " to", " you")
#=> "Good morning to you"

array = ['a', 'b']
array.concat(['c'], ['d'])
#=> ["a", "b", "c", "d"]

String#prepend

string = "you"
string.prepend("Good ", "morning ", "to ")
#=> "Good morning to you"

These methods work even when no argument is passed unlike in previous versions of Ruby.

"Good".concat
#=> "Good"

Difference between concat and shovel << operator

Though the shovel << operator can be used interchangeably with concat when we call it once, the behavior differs when it is called multiple times.

str = "Ruby"
str << str
str
#=> "RubyRuby"

str = "Ruby"
str.concat str
str
#=> "RubyRuby"

str = "Ruby"
str << str << str
#=> "RubyRubyRubyRuby"

str = "Ruby"
str.concat str, str
str
#=> "RubyRubyRuby"

So concat appends the caller's original content once for each argument. Calling << twice, on the other hand, is just a sequence of binary operations, so the second << sees the string as it was already modified by the first one.

Hash#compact and Hash#compact! now part of Ruby 2.4

It is a common use case to remove the nil values from a hash in Ruby.

{ "name" => "prathamesh", "email" => nil} => { "name" => "prathamesh" }

Active Support already has a solution for this in the form of Hash#compact and Hash#compact!.

hash = { "name" => "prathamesh", "email" => nil}
hash.compact #=> { "name" => "prathamesh" }
hash #=> { "name" => "prathamesh", "email" => nil}

hash.compact! #=> { "name" => "prathamesh" }
hash #=> { "name" => "prathamesh" }

Ruby 2.4 adds these two methods to the language itself, so even those not using Rails or Active Support will be able to use them. It also gives a performance boost over the Active Support versions, because these methods are now implemented natively in C whereas the Active Support versions are written in Ruby.

There is already a pull request open in Rails to use the native versions of these methods from Ruby 2.4 whenever available, so that Rails applications can benefit from the performance boost.
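
Two details are worth spelling out in a short sketch: only nil values are removed (false and 0 stay), and compact! returns nil when there was nothing to remove. The keys below are made up for the example.

```ruby
settings = { "name" => "prathamesh", "email" => nil, "admin" => false, "logins" => 0 }

compacted = settings.compact

# false and 0 are kept; only the nil entry is dropped.
compacted.keys   #=> ["name", "admin", "logins"]

# A plain-Ruby equivalent that also works before Ruby 2.4.
settings.reject { |_key, value| value.nil? } == compacted   #=> true

# compact! returns nil when no entries were removed.
compacted.compact!   #=> nil
```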

Rails 5 blogs and the art of story telling

Between October 31, 2015 and September 5, 2016 we wrote 80 blogs on changes in Rails 5.

Producing a blog every 4 days consistently over 310 days takes persistence and time, lots of it.

We needed to go through all the commits, pick the ones worth writing about and then write about them. Going into this I knew it would be a hard task. Ruby on Rails is now a well crafted machine. In order to fully understand what’s going on in the code base we need to spend sufficient time on it.

However, I was surprised by the thing that turned out to be the hardest: telling the story of the code change.

Every commit has a story. There is a reason for it. The commit itself might be minor but that code change in itself does not tell the full story.

For example, take this commit. The commit is so simple that you might think it is not worth writing about. However, in order to fully understand what it does we need to tell the full story, which was captured in this blog.

Or take the case of Rails 5 officially supports MariaDB. The blog captures the full story and not just the code that changed.

Now you might say that I have cherry picked blog posts that favor my case. So let’s pick a blog which is simple.

You might wonder what could go wrong with a blog like this. As it turns out, plenty. That’s because writing a blog also requires defining the boundary of the blog. Deciding what to include and what to leave out is hard. One gets a feel for it only after writing it. And after having typed the words on screen, pruning is hard.

A well-written article is simple writing. The problem with articles that are simple for readers is that, well, they are simple. So it feels to readers that writing them must be simple. Nothing could be further from the truth. It takes a lot of hard work to produce anything simple. It’s true in writing. And it’s true in producing software.

Coming back to the “Skipping Mailer” blog, it took quite a bit of back and forth to bring the blog to its essence. So yes, the final output is quite short, but that does not mean it took a short amount of time to produce it.

Tell a story even if you have 10 seconds

John Lasseter was working as an animator at Disney in 1984. He had just been fired from Disney for promoting computer animation there. Lasseter joined Lucasfilm. Lucasfilm renamed itself to Pixar Graphics Group and sold itself to Steve Jobs for $5 million.

Lasseter was tasked with producing a short film that would show the power of computer animation, so that Pixar Graphics Group could get projects like producing TV commercials with cartoon characters and earn some money. Lasseter needed to produce the short film for the upcoming computer graphics animation conference.

His initial idea was a short movie with a plotless character. He presented this idea at a conference in Brussels. There, Belgian animator Raoul Servais commented in a slightly harsh tone:

No matter how short it is, it should have a beginning, a middle, and an end. Don’t forget the story.

Lasseter complained that it was a pretty short movie and there might not be time to present a story.

Raoul Servais replied

You can tell a story in ten seconds.

Lasseter started developing a character. He came up with the idea of Luxo Jr.

Here is the final production of Luxo Jr.

Luxo Jr. was a major hit at the conference. The crowd was on its feet in applause even before the two-minute film was over. Remember, this was 1986, computer animation was not very advanced at that time, and this was the first movie ever made with the use of just computer graphics.

Lasseter later said that while the audience was watching the movie they forgot that they were watching a computer animated film, because the story took over. He learned the lesson that technology should enable better story telling, and that technology divorced from story telling would not advance the cause of Pixar.

Later, John Lasseter went on to produce hits like Toy Story, A Bug’s Life, Toy Story 2, Cars, Cars 2, Monsters, Inc., Finding Nemo and many more.

So you see, even the great John Lasseter had to be reminded to tell a story.

Actual content over bullet points

Jeff Bezos is so focused on knowing the full story that he banned the use of PowerPoint in internal meetings and discussions. According to him, it is easy to hide behind bullet points in a PowerPoint presentation.

He insisted on writing the full story in a Word document and distributing it to meeting attendees. The meetings start with everyone heads down, reading the document.

He is also known for saying that if we are building a feature then we first need to know how it will be presented to consumers when it is unveiled. We need to know the story we are going to tell them. Without the story we won’t have the full picture of what we are going to build.

Learning to tell a story is a journey

I’m glad that during the last 310 days 16 people contributed to the blog posts. The process of writing the posts was at times frustrating for a bunch of them. They had done the work of digging into the code and had posted their findings. Continuously getting feedback to edit the blog, so that it builds a nice coherent story where each paragraph is an extension of the previous one, is a downer. Some were dismayed that we were spending so much energy on a technical blog.

However, in the end we are all happy that we underwent this exercise. Comparing the initial draft of a blog with its final version, we could all see the difference.

By no means have we mastered the art of storytelling. It’s a long journey. However, we believe we are on the right path. Hopefully in the coming months and years we at BigBinary will be able to bring you more stories from changes in Rails and other places.