Ruby 2.6 adds String#split with block

This blog is part of our Ruby 2.6 series.

Before Ruby 2.6, String#split always returned an array of the split strings.

In Ruby 2.6, a block can be passed to String#split. Each split string is yielded to the block, which can operate on it directly. This avoids creating an intermediate array and therefore saves memory.

We will add a method, is_fruit?, to show how to use split with a block.

def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end

The input is a comma-separated string of fruit and vegetable names. The goal is to pick the fruit names out of the input string and store them in an array.

String#split

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
# => ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
# => ["apple", "mango", "banana", "watermelon", "grapes"]

With plain split, an intermediate array is created that contains both the fruit and the vegetable names, even though only the fruits are needed.
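To get a rough feel for what that intermediate array costs, here is a minimal sketch. The input is a hypothetical string of 100,000 generated words, not the data used in this post, and the number printed will vary with Ruby version and platform.

```ruby
require 'objspace'

# Hypothetical input: 100,000 short words joined by single spaces.
text = Array.new(100_000) { |i| "word#{i}" }.join(' ')

# split without a block materialises every substring in one array.
intermediate = text.split(' ')

# memsize_of reports the array's own storage only, not the strings in it,
# and the figure depends on the Ruby version and platform.
array_bytes = ObjectSpace.memsize_of(intermediate)

puts "elements: #{intermediate.size}"
puts "array storage: #{array_bytes} bytes"
```

The array exists only so that select can walk it once, which is the overhead the block form is meant to avoid.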

String#split with a block

fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
# => "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
# => ["apple", "mango", "banana", "watermelon", "grapes"]

When a block is passed to split, it returns the string on which split was called and does not create an array. String#split yields each split string to the block, which in our case pushes the fruit names into a separate array.
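The difference in return values can be seen side by side in a short sketch. The variable names here are illustrative and not from the example above, and the block form requires Ruby 2.6 or later. The last part assumes the optional limit argument behaves the same way with a block as it does without one.

```ruby
csv_line = "apple, mango, potato, banana"

# Without a block: split returns a new array of substrings.
without_block = csv_line.split(", ")

# With a block: each substring is yielded, and the original string comes back.
lengths = []
with_block = csv_line.split(", ") { |name| lengths << name.length }

# The limit argument caps the number of pieces in block form too.
limited = []
csv_line.split(", ", 2) { |part| limited << part }

p without_block
p with_block
p lengths
p limited
```

Because the block form returns the string rather than a collection, any results have to be gathered inside the block, as the fruits array was above.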

Update

Benchmark

We created a large random string to benchmark the performance of split against split with a block.

require 'securerandom'

test_string = ''

100_000.times do
  test_string += SecureRandom.alphanumeric(10)
  test_string += ' '
end

require 'benchmark'

Benchmark.bmbm do |bench|

  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

end

Results

Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)

We did another iteration of benchmarking using benchmark/ips.

require 'benchmark/ips'
Benchmark.ips do |bench|

  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end

Results

Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower

This benchmark shows that, for this workload, split with a block is about 2.3 times faster than split followed by select.
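As a quick sanity check, the speedup can be recomputed from the figures reported in the two benchmark outputs above. This is only arithmetic on those published numbers; the variable names are ours.

```ruby
# Figures copied from the benchmark output above.
ips_split       = 46.906   # iterations per second, split followed by select
ips_split_block = 107.301  # iterations per second, split with a block

real_split       = 0.024622  # seconds, "real" column of the bmbm run
real_split_block = 0.010991

# Higher iterations per second is better; lower elapsed time is better.
ips_speedup  = (ips_split_block / ips_split).round(2)
real_speedup = (real_split / real_split_block).round(2)

puts "benchmark/ips speedup: #{ips_speedup}x"
puts "bmbm speedup:          #{real_speedup}x"
```

Both runs land between 2.2x and 2.3x, so the two measurements agree with each other.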

Here is the relevant commit and discussion for this change.

The Chinese version of this blog is available here.
