Six Year Old Optional / Keyword Arguments bug

I recently conducted a workshop about contributing to open source at the first-ever RubyConf Philippines. In the introductory talk, I spoke about how Aaron Patterson fixed a six-year-old bug in Rails involving optional arguments.

Bug in ruby

Let’s try a small program.

class Lab

  def day
    puts 'invoked'
    'sunday'
  end

  def run
    day = day
  end

end

puts Lab.new.run

What do you think would be printed on your terminal when you run the above program?

If you are using ruby 2.1 or below then you will see nothing. Why is that? That’s because of a bug in ruby.

This is bug number 9593 in ruby issue tracker.

In the statement day = day, the variable assignment on the left-hand side stops the call to the method day. So the method day is never invoked.

Another variation of the same bug

class Lab

  def day
    puts 'invoked'
    'sunday'
  end

  def run( day: day)
  end

end

puts Lab.new.run

In the above case we are using the keyword argument feature added in Ruby 2.0. If you are unfamiliar with the keyword arguments feature of ruby then check out this excellent video by Peter Cooper.

In this case again the same behavior is exhibited. The method day is never invoked.

How this bug affects Rails community

You might be thinking, “I would never write code like that. Why would I give a variable the same name as a method?”

Well, Rails had this bug because Rails has code like this.

def has_cached_counter?(reflection = reflection)
end

In this case the method reflection was never called and the variable reflection was always assigned nil.

Fixing the bug

Nobu fixed this bug in ruby 2.2.0. By the way, Nobu is also known as the “ruby patch-monster” because of the number of patches he applies to ruby.

So this bug is fixed in ruby 2.2.0. What about the people who are not using ruby 2.2.0?

The simple solution is not to omit the parentheses. If we change the above code to

def has_cached_counter?(reflection = reflection())
end

then we are explicitly invoking the method reflection, and the variable reflection is assigned the return value of that method.
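Here is a minimal sketch of the same idea outside Rails. The Counter class and its return values are made up for illustration; the only point is that a name followed by parentheses can only be parsed as a method call, never as the parameter being defined.

```ruby
# Hypothetical class for illustration; this is not Rails code.
class Counter
  def reflection
    'from method'
  end

  # reflection() is always a method call, so the default value
  # comes from the method above and not from the new parameter.
  def has_cached_counter?(reflection = reflection())
    reflection
  end
end

puts Counter.new.has_cached_counter?            # prints "from method"
puts Counter.new.has_cached_counter?('passed')  # prints "passed"
```

When the caller supplies a value, the default expression is not evaluated at all, so the method reflection is only invoked when it is needed.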

And this is how Aaron Patterson fixed a six-year-old bug.

How to deploy jekyll site to heroku

jekyll is an excellent tool for creating static pages and blogs. Our BigBinary blog is based on jekyll. Deploying our blog to heroku took longer than I had expected. I am outlining what I did to deploy BigBinary blog to heroku.

Add exclude vendor to _config.yml

Open _config.yml and add the following line at the very bottom.

exclude: ['vendor']

Add Procfile

Create a new file called Procfile at the root of the project with following content.

web: bundle exec jekyll build && bundle exec thin start -p$PORT -V
console: echo console
rake: echo rake

Add Gemfile

Add Gemfile at the root of the project.

source 'https://rubygems.org'

gem 'jekyll', '2.4.0'
gem 'rake'
gem 'foreman'
gem 'thin'
gem 'rack-contrib'

Add config.ru

Add config.ru at the root of the project with following content.

require 'rack/contrib/try_static'

use Rack::TryStatic,
    :root => "_site",
    :urls => %w[/],
    :try => ['.html', 'index.html', '/index.html']

# Fallback app: anything TryStatic could not find gets a plain 404.
run lambda { |env|
  [404, {'Content-Type' => 'text/html'}, ['Not Found']]
}

Test on local machine first

Test locally by executing bundle exec jekyll serve.

Push code to heroku

Now run bundle install and add the Gemfile.lock to the repository and push the repository to heroku.

How to add additional directories to test

In a project we needed to write different parsers for different services. Rather than putting all those parsers in app/models or in lib, we created a new directory. We put all the parsers in app/parsers.

We put all the tests for these parsers in test/parsers directory.

We can run a parser test individually by executing rake test test/parsers/email_parser_test.rb. However, when we run rake, the tests in test/parsers are not picked up.

We added the following code to the Rakefile to make rake pick up the tests in test/parsers.

# Adding test/parsers directory to rake test.
namespace :test do
  desc "Test tests/parsers/* code"
  Rails::TestTask.new(parsers: 'test:prepare') do |t|
    t.pattern = 'test/parsers/**/*_test.rb'
  end
end

Rake::Task['test:run'].enhance ["test:parsers"]

Now when we run rake or rake test, the tests under test/parsers are also picked up.

The above code adds a rake task, rake test:parsers, which runs all tests under the test/parsers directory.

We can see this task by executing rake -T test.

$ rake -T test
rake test         # Runs test:units, test:functionals, test:integration together
rake test:all     # Run tests quickly by merging all types and not resetting db
rake test:all:db  # Run tests quickly, but also reset db
rake test:parsers # Test tests/parsers/* code
rake test:recent  # Run tests for {:recent=>["test:deprecated", "test:prepare"]} / Deprecated; Test recent changes

Configuring Log Formatting in Rails

Ideally we should be logging an exception in Rails like this.

begin
  raise "Amount must be more than zero"
rescue => exception
  Rails.logger.info exception
end

The above code would produce a one-line log message as shown below.

Amount must be more than zero

In order to get backtrace and other information about the exception we need to handle logging like this.

begin
  raise "Amount must be more than zero"
rescue => exception
  Rails.logger.info exception.class.to_s
  Rails.logger.info exception.to_s
  Rails.logger.info exception.backtrace.join("\n")
end

Above code would produce following log message.

RuntimeError
Amount must be more than zero
/Users/nsingh/code/bigbinary_llc/wheel/app/controllers/home_controller.rb:5:in `index'
/Users/nsingh/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/actionpack-4.0.2/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
/Users/nsingh/.rbenv/versions/2.0.0-p247

Now let’s look at why Rails logger does not produce detailed logging and what can be done about it.

A closer look at Formatters

When we use Rails.logger.info(exception) then the output is formatted by ActiveSupport::Logger::SimpleFormatter. It is a custom formatter defined by Rails that looks like this.

# Simple formatter which only displays the message.
class SimpleFormatter < ::Logger::Formatter
  # This method is invoked when a log event occurs
  def call(severity, timestamp, progname, msg)
    "#{String === msg ? msg : msg.inspect}\n"
  end
end

As we can see it inherits from Logger::Formatter defined by Ruby Logger . It then overrides call method which is originally defined as

#Format = "%s, [%s#%d] %5s -- %s: %s\n"
def call(severity, time, progname, msg)
  Format % [severity[0..0], format_datetime(time), $$, severity, progname,
    msg2str(msg)]
end

......
......

def msg2str(msg)
  case msg
  when ::String
    msg
  when ::Exception
    "#{ msg.message } (#{ msg.class })\n" <<
      (msg.backtrace || []).join("\n")
  else
    msg.inspect
  end
end

When an exception object is passed to SimpleFormatter, msg.inspect is called, and that’s why we see the exception message without any backtrace.

The problem is that the call method of Rails’s SimpleFormatter is less capable than the call method of Ruby’s logger.

Ruby logger’s method has a special check for exceptions. If the message it is going to print is an instance of Exception, then it prints the backtrace as well. In comparison, SimpleFormatter just prints msg.inspect for exception objects.
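The difference can be seen without Rails by running both formatters side by side. The sketch below is only an approximation: MessageOnlyFormatter is a hypothetical local copy of the SimpleFormatter quoted above (so that the example runs without ActiveSupport), and log_exception_with is a helper made up for this example.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the Rails formatter quoted above.
class MessageOnlyFormatter < ::Logger::Formatter
  def call(severity, timestamp, progname, msg)
    "#{String === msg ? msg : msg.inspect}\n"
  end
end

# Logs one rescued exception through the given formatter and
# returns everything that was written to the log.
def log_exception_with(formatter)
  io = StringIO.new
  logger = Logger.new(io)
  logger.formatter = formatter
  begin
    raise "Amount must be more than zero"
  rescue => exception
    logger.info exception
  end
  io.string
end

simple_output = log_exception_with(MessageOnlyFormatter.new)
ruby_output   = log_exception_with(Logger::Formatter.new)

puts simple_output  # a single line, no backtrace
puts ruby_output    # message, exception class and backtrace lines
```

The first output is one line long, while the second one contains the message followed by "(RuntimeError)" and the backtrace.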

Configuring logger

This problem can be solved by using config.logger.

From Rails Configuring Guides we have

config.logger accepts a logger conforming to the interface of Log4r or the default Ruby Logger class. Defaults to an instance of ActiveSupport::Logger, with auto flushing off in production mode.

So now we can configure the Rails logger to stop using SimpleFormatter and go back to Ruby’s logger formatting.

Let’s set config.logger = ::Logger.new(STDOUT) in config/application.rb and then try following code.

begin
  raise "Amount must be more than zero"
rescue => exception
  Rails.logger.info exception
end

Now above code produces following log message.

I, [2013-12-17T01:05:41.944859 #13537]  INFO -- : Amount must be more than zero (RuntimeError)
test_app/app/controllers/page_controller.rb:3:in `index'
/Users/sward/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/actionpack-4.0.2/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
/Users/sward/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/actionpack-4.0.2/lib/abstract_controller/base.rb:189:in `process_action'
/Users/sward/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/actionpack-4.0.2/lib/action_controller/metal/rendering.rb:10:in `process_action'
...<snip>...

Sending log to STDOUT is also a good practice

As per http://12factor.net/logs, an application should not concern itself much with the kind of logging framework being used. The application should write log to STDOUT and logging frameworks should operate on log streams.

Displaying non repeating random records

For one of our clients we needed to display random records from the database. That’s easy enough. We can use the random() function.

Batch.published_and_featured.order('random()')
                            .paginate(per_page: 20, page: params[:page])

Here we are using a PostgreSQL database. MySQL does not have a random() function; its equivalent is RAND().

The problem here is that if the user clicks on the next page then we will fetch the next set of 20 random records. And since these records are truly random, the user might sometimes see records that were already shown on the first page.

The fix is to make it random but not truly random. It needs to be random with a seed.
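To see why a seed helps, here is a small in-memory sketch in plain ruby. It is not the database query used in this post: RECORDS, PER_PAGE and random_page are made up for illustration, and Array#shuffle with a seeded Random object stands in for the database's seeded random function.

```ruby
# Hypothetical in-memory stand-in for the records in the database.
RECORDS = (1..50).to_a
PER_PAGE = 20

# Returns one page of records shuffled with a fixed seed. Every call
# with the same seed walks through the same shuffled order.
def random_page(seed, page)
  shuffled = RECORDS.shuffle(random: Random.new(seed))
  shuffled.each_slice(PER_PAGE).to_a[page - 1] || []
end

seed = 42
first_page  = random_page(seed, 1)
second_page = random_page(seed, 2)

overlap = first_page & second_page
puts overlap.empty?  # prints "true": no record appears on both pages
```

Because both pages are cut from the same shuffled sequence, a record that appeared on the first page cannot show up again on the second one.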

Fix in MySQL

In MySQL the function is called RAND() and we can pass an integer seed directly to it.

Batch.published_and_featured.order('RAND(3)')
                            .paginate(per_page: 20, page: params[:page])

Fix in PostgreSQL

In PostgreSQL it is a little more cumbersome. We first need to set the seed, and then the subsequent query’s usage of random() will make use of that seed value.

Batch.connection.execute "SELECT setseed(0.2)"
Batch.published_and_featured.order('random()')
                            .paginate(per_page: 20, page: params[:page])

Set seed value in before_action

For different users we should use different seed values, and each value should be random. So we set the seed value in a before_action.

def set_seed
  cookies[:random_seed] ||= SecureRandom.random_number
end

Now change the query to use the seed value and we are all set.

Active Record is still magical

WickedGoodRubyConf

I gave a talk at the Wicked Good Ruby Conference. The conference was very well organized and I had a lot of fun meeting new people.

Confreaks has put out the video. Slides are below too. I’m sorry about the bad audio.

Boston in November is just awesome. I had a lot of fun driving around and enjoying the fall color.


Getting arguments passed to command

In the previous blog post we discussed ruby code where we used ps -ocommand. In this post let’s discuss how to get the arguments passed to a command.

What is the issue

In the referenced post we were trying to find out whether the --force or -f argument was passed to the git push command.

The kernel knows the arguments that were passed to the command. So the only way to find the answer is to ask the kernel what the full command was. The tool for such questions is ps.

In order to play with ps command let’s write a simple ruby program first.

# loop.rb
puts Process.pid
puts Process.ppid
sleep 99999999

In a terminal execute ruby loop.rb a, b, c. In another terminal execute ps.

$ ps
  PID TTY           TIME CMD
82246 ttys000    0:00.51 -bash
87070 ttys000    0:00.04 ruby loop.rb a, b, c
82455 ttys001    0:00.40 -bash

So here I have two bash shells open in two different tabs in my terminal. The first terminal tab is running loop.rb. The second terminal tab is running ps. In the second terminal we can see the arguments that were passed to the program loop.rb.

By default ps lists all the processes belonging to the user executing the command and the processes started from the current terminal.

Option -p

ps -p87070 would show result only for the given process id.

$ ps -p 87070
  PID TTY           TIME CMD
87070 ttys000    0:00.04 ruby loop.rb a, b, c

We can pass more than one process id.

$ ps -o pid,command -p87070,82246
  PID COMMAND
82246 -bash
87070 ruby loop.rb a, b, c

Option -o

ps -o can be used to select the attributes that we want to be shown. For example I want only pids to be shown.

$ ps -o pid
  PID
82246
87070
82455

Now I want pid and command.

$ ps -o pid,command
  PID COMMAND
82246 -bash
87070 ruby loop.rb a, b, c
82455 -bash

I want result only for a certain process id.

$ ps -o command -p87070
COMMAND
ruby loop.rb a, b, c

Now we have the arguments that were passed to the command. This is the code that the previous article was talking about.
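Once we have the command line as a string, checking it for a force flag is straightforward. The helper below is a hypothetical sketch, not the code from the hook itself. It assumes the string is whatever ps -o command printed for the process.

```ruby
require 'shellwords'

# Hypothetical helper: given the command line reported by ps,
# tell whether --force or -f was passed.
def force_push?(command_line)
  args = Shellwords.split(command_line)
  args.include?('--force') || args.include?('-f')
end

puts force_push?('git push origin master --force')  # prints "true"
puts force_push?('git push origin master -f')       # prints "true"
puts force_push?('git push origin master')          # prints "false"
```

This sketch only looks for the two flags as separate words. Combined short options or other ways of forcing a push are not handled here.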

For the sake of completeness let’s see a few more options.

Option -e

ps -e would list all processes.

$ ps -e
  PID TTY           TIME CMD
    1 ??         2:56.20 /sbin/launchd
   11 ??         0:01.90 /usr/libexec/UserEventAgent (System)
   12 ??         0:02.11 /usr/libexec/kextd
   14 ??         0:09.00 /usr/sbin/notifyd
   15 ??         0:05.81 /usr/sbin/securityd -i
   ........................................
   ........................................

Option -f

ps -f would list a lot more attributes including ppid.

$ ps -f
  UID   PID  PPID   C STIME   TTY           TIME CMD
  501 82246 82245   0  2:06PM ttys000    0:00.51 -bash
  501 87070 82246   0  4:54PM ttys000    0:00.04 ruby loop.rb a, b, c
  501 82455 82452   0  2:07PM ttys001    0:00.42 -bash

What is ppid

In previous blog we discussed ruby code where we used two things: ppid and ps -ocommand. In this blog let’s discuss ppid. ps -ocommand is discussed in the next blog.

Parent process id is ppid

We know that every process has a process id. This is usually referred to as the pid. In the *nix world every process has a parent process. And in ruby the way to get the “process id” of the parent process is through ppid.

Let’s see it in action. Time to fire up irb.

irb(main):002:0> Process.pid
=> 83132
irb(main):003:0> Process.ppid
=> 82455

Now keep the irb session open and go to another terminal tab. In this new tab execute pstree -p 83132

$ pstree -p 83132
-+= 00001 root /sbin/launchd
 \-+= 00151 nsingh /sbin/launchd
   \-+= 00189 nsingh /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal -psn_0_45067
     \-+= 82452 root login -pf nsingh
       \-+= 82455 nsingh -bash
         \--= 83132 nsingh irb

If pstree is not available then you can easily install it using brew install pstree.

As you can see from the output the process id 83132 is at the very bottom of the tree. The parent process id is 82455 which belongs to “bash shell”.

In irb session when we did Process.ppid then we got the same value 82455.
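The same relationship can be checked from ruby itself. In this sketch, which is only an illustration, the script starts a child ruby process and asks it for its ppid. It assumes that the ruby executable reported by RbConfig.ruby can be started directly, which is the case on a normal ruby installation.

```ruby
require 'rbconfig'

parent_pid = Process.pid

# Start a child ruby process directly (no shell in between) and
# read the ppid that the child prints.
child_command = [RbConfig.ruby, '-e', 'puts Process.ppid']
reported_ppid = IO.popen(child_command) { |io| io.read }.to_i

puts reported_ppid == parent_pid  # prints "true"
```

The child reports the pid of the script that started it, just like irb reported the pid of the bash shell above.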

Do not allow force push to master

At BigBinary we create a branch for every issue. We deploy that branch and only when it is approved that branch is merged into master.

From time to time we rebase the branch. After rebasing we need to force push to send the changes to GitHub. And once in a while someone force pushes to master by mistake. We recommend setting push.default to current to avoid such issues, but force pushes to master still happen sometimes.

In order to prevent such mistakes in future we are using pre-push hook. This is a small ruby program which runs before any git push command. If you are force pushing to master then it will reject the push like this.

*************************************************************************
Your attempt to FORCE PUSH to MASTER has been rejected.
If you still want to FORCE PUSH then you need to ignore the pre_push git hook by executing following command.
git push master --force --no-verify
*************************************************************************

Requirements

The pre-push hook was added to git in version 1.8.2. So you need git 1.8.2 or higher. You can easily upgrade git by executing brew upgrade git.

$ git --version
git version 1.8.2.3

Setting up hooks

In order for these hooks to kick in they need to be set up.

The first step is to clone the repo to your local machine. Now open ~/.gitconfig and add the following lines.

[init]
  templatedir= /Users/neeraj/code/tiny_scripts/git-hooks

Change the value /Users/neeraj/code/tiny_scripts/git-hooks to match with the directory of your machine.

Making existing repositories aware of this hook

Now the pre-push hook is set up. Any new repository that you clone will be protected against force pushes to master.

But existing repositories do not know about this git-hook. To make existing repositories aware of this hook execute following command on all repositories.

$ git init
Reinitialized existing Git repository in /Users/nsingh/dev/projects/streetcommerce/.git/

Now if you look into the .git/hooks directory of your project you should see a file called pre-push.

$ ls .git/hooks/pre-push
.git/hooks/pre-push

It means this project is all set with pre-push hook.

New repositories

When you clone a repository then git init is invoked automatically and you will get pre-push already copied for you. So you are all set for all future repositories too.

How to ignore pre-push hook

To ignore pre-push hook all you need to do is

# Use following command to ignore pre-push check and to force update master.
git push master --force --no-verify

The hook is here.

How to keep your fork up-to-date

Let’s say that I’m forking repo rails/rails. After the repo has been forked to my repository I will clone it on my local machine.

git clone git@github.com:neerajdotname/rails.git

Now cd rails and execute git remote -v. This is what I see.

origin git@github.com:neerajdotname/rails.git (fetch)
origin git@github.com:neerajdotname/rails.git (push)

Now I will add upstream remote by executing following command.

git remote add upstream git://github.com/rails/rails.git

After having done that when I execute git remote -v then I see

origin git@github.com:neerajdotname/rails.git (fetch)
origin git@github.com:neerajdotname/rails.git (push)
upstream git://github.com/rails/rails.git (fetch)
upstream git://github.com/rails/rails.git (push)

Now I want to make some changes to the code. After all this is why I forked the repo.

Let’s say that I want to add exception handling to the forked code I have locally. I create a branch called exception-handling and make all my changes in this branch. The key here is not to make any changes to the master branch. I try to keep master of my forked repository in sync with the master of the original repository that I forked from.

So now let’s create a branch and I will put in all my changes there.

git checkout -b exception-handling

In the Gemfile I will use this code like this

gem 'rails', github: 'neerajdotname/rails', branch: 'exception-handling'

A month has passed. In the meantime rails master has tons of changes. I want those changes in my exception-handling branch. In order to achieve that first I need to bring my local master up-to-date with rails master.

I need to switch to master branch and then I need to execute following commands.

git checkout master
git fetch upstream
git rebase upstream/master
git push

Now the master of forked repository is in-sync with the master of rails/rails. Now that master is up-to-date I need to pull in the changes in master in my exception-handling branch.

git checkout exception-handling
git rebase master
git push -f

Now my branch exception-handling has my fix on top of rails master.

How to setup Pinch to Zoom for an image in RubyMotion

In this post we will see how to build “pinch to zoom” functionality for zooming in on an image in RubyMotion.

First let’s add a UIViewController that is initialized with an image.

class ImageViewController < UIViewController
  def initWithImage(image)
    init            # run the default initializer first
    @image = image
    self            # an initializer must return the controller itself
  end
end

UIScrollView and UIImageView

Now, we will add a UIScrollView with frame size set to full screen size and some other properties as listed below.

@scrollView = UIScrollView.alloc.initWithFrame(UIScreen.mainScreen.bounds)
@scrollView.scrollEnabled = false
@scrollView.clipsToBounds = true
@scrollView.contentSize = @image.size
@scrollView.minimumZoomScale = 1.0
@scrollView.maximumZoomScale = 4.0
@scrollView.zoomScale = 0.3

We keep the scroll view in an instance variable, @scrollView, because we will need it again when handling orientation changes. Next, create a new UIImageView that will be added to the scroll view created above.

imageView = UIImageView.alloc.initWithImage(@image)
imageView.contentMode = UIViewContentModeScaleAspectFit
imageView.userInteractionEnabled = true
imageView.frame = @scrollView.bounds

We are setting the image view’s content mode to UIViewContentModeScaleAspectFit. Content mode can be set to UIViewContentModeScaleToFill, UIViewContentModeScaleAspectFill or UIViewContentModeScaleAspectFit, depending on what suits your app. By default, the contentMode property for most views is set to UIViewContentModeScaleToFill, which causes the view’s contents to be scaled to fit the new frame size. This Apple doc explains this behavior.

We need to add the above imageView as a subview of our scroll view, and the scroll view as a subview of the controller’s view.

@scrollView.addSubview(imageView)
self.view.addSubview(@scrollView)

This is how our controller looks with all the above additions.

class ImageViewController < UIViewController

  def initWithImage(image)
    init
    @image = image
    @scrollView = UIScrollView.alloc.initWithFrame(UIScreen.mainScreen.bounds)
    @scrollView.scrollEnabled = false
    @scrollView.clipsToBounds = true
    @scrollView.contentSize = @image.size
    @scrollView.minimumZoomScale = 1.0
    @scrollView.maximumZoomScale = 4.0
    @scrollView.zoomScale = 0.3
    @scrollView.delegate = self

    imageView = UIImageView.alloc.initWithImage(@image)
    imageView.contentMode = UIViewContentModeScaleAspectFit
    imageView.userInteractionEnabled = true
    imageView.frame = @scrollView.bounds

    @scrollView.addSubview(imageView)
    self.view.addSubview(@scrollView)
    self
  end

end

ScrollView delegate

We must set a delegate for our scroll view to support zooming. The delegate object must conform to the UIScrollViewDelegate protocol. This is why we set the scroll view’s delegate to self above. The delegate class must implement the viewForZoomingInScrollView and scrollViewDidZoom methods.

def viewForZoomingInScrollView(scrollView)
  scrollView.subviews.first
end

def scrollViewDidZoom(scrollView)
  # Allow scrolling only while the image is zoomed in or out.
  scrollView.scrollEnabled = (scrollView.zoomScale != 1.0)
end

These two methods added above allow the scrollView to support pinch to zoom.

Supporting orientation changes

There is one more thing to do if we want to support orientation changes. We need to add the following methods:

def shouldAutorotateToInterfaceOrientation(*)
  true
end

def viewDidLayoutSubviews
  @scrollView.frame = self.view.bounds
end

We have to set the scrollView’s frame to view bounds in viewDidLayoutSubviews so that the scrollView frame is resized when the device orientation changes.

That’s it. With all those changes now our app supports orientation change and now we are able to pinch and zoom images.

Fix image orientation issue in RubyMotion

I’m building an app using RubyMotion. When I take a picture it all looks good. However, when the picture is posted on the web, the orientation of the picture is different.

UIImage and UIImageOrientation

UIImage in iOS has a property called imageOrientation, which is of type UIImageOrientation. Image orientation affects the way the image data is displayed when drawn. The API docs mention that by default, images are displayed in the up orientation. However, if the image has associated metadata (such as EXIF information), then this property contains the orientation indicated by that metadata.

After using UIImagePickerController to take an image with the iPhone camera, I was using BubbleWrap to send the image to a web server. Whether the image was taken in landscape or portrait mode, it appeared fine when viewed in the browser. But when the image was sent back via the API and shown on the iPhone, it was rotated by 90 degrees if it had been taken in portrait mode. In the EXIF metadata, iOS incorrectly sets the orientation to UIImageOrientationRight.

Here is how I fixed the image orientation issue:

if image.imageOrientation == UIImageOrientationUp
  return_image = image
else
  # Redraw the image into a fresh context so that the result is upright.
  UIGraphicsBeginImageContextWithOptions(image.size, false, image.scale)
  image.drawInRect([[0, 0], image.size])
  normalized_image = UIGraphicsGetImageFromCurrentImageContext()
  UIGraphicsEndImageContext()
  return_image = normalized_image
end

First, we are checking the image orientation of the image we have in hand. If the image orientation is UIImageOrientationUp, we don’t have to change anything. Otherwise we are redrawing the image and returning the normalized image.

Visitor pattern and double dispatch in ruby

Let’s say that we have an AST that holds integer nodes. We want to print double the value of all nodes. We can do something like this

class IntegerNode
  def initialize(value)
    @value = value
  end

  def double
    @value * 2
  end
end

class Ast
  def initialize
    @nodes = []
    @nodes << IntegerNode.new(2)
    @nodes << IntegerNode.new(3)
  end

  def print_double
    @nodes.each do |node|
      puts node.double
    end
  end
end

ast = Ast.new
ast.print_double # => 4 6

The above solution works. Now let’s try to print triple the value. In order to do that we need to change the class IntegerNode, and then IntegerNode has knowledge of how to triple the value. Tomorrow, if we have another node called FloatNode, that node will also need to know how to double and triple the value.

Nodes are merely storing information. And the representation of data should be separate from the data itself. So IntegerNode and FloatNode should not know about how to double and triple.

To take the data representation code out of nodes we can make use of the visitor pattern. The visitor pattern uses double dispatch.

Before we look at “double dispatch” let’s first look at “single dispatch”.

Single dispatch

When we invoke a method in ruby we are using single dispatch. In single dispatch, method invocation is done based on a single criterion: the class of the object. Most object-oriented programming languages use a single dispatch system.

In the following case method double is invoked solely based on the class of node.

node.double
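A small sketch makes this concrete. The two node classes below are defined only for this illustration. Both respond to double, and ruby picks the implementation by looking at nothing but the class of the receiver.

```ruby
# Two hypothetical node classes that both respond to `double`.
class IntegerNode
  def initialize(value)
    @value = value
  end

  def double
    @value * 2
  end
end

class StringNode
  def initialize(value)
    @value = value
  end

  def double
    @value.to_i * 2
  end
end

# Same message, different method: the receiver's class alone decides.
[IntegerNode.new(2), StringNode.new("5")].each do |node|
  puts node.double
end
# prints 4, then 10
```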

Double dispatch

As the name suggests in the case of Double dispatch dispatching depends on two things: class of the object and the class of the input object.

Ruby inherently does not support “Double dispatch”. We will see how to get around that issue shortly. First let’s see an example in Java. Java supports method overloading, which allows two methods with the same name to differ only in the type of argument they receive. (Strictly speaking, Java picks the overload at compile time from the declared type of the argument, so this only approximates double dispatch.)

// "double" is a reserved word in Java, so the methods are named twice.
class Node {
  int twice(Integer value) { return value * 2; }
  int twice(String value) { return Integer.parseInt(value) * 2; }
}

node.twice(2);
node.twice("51");

In the above case the method that would be invoked is decided based on two things: class of the object ( node ) and the class of the value (Integer or String). That’s why this is called Double dispatch.

In ruby we can’t have two methods with the same name and different signatures because the second method would override the first one. In order to get around that limitation, the class name is usually made part of the method name. Let’s try to write the above java code in ruby.

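class Node
  def accept value
    method_name = "visit_#{value.class}"
    # Pass the value along so that the visit_* method receives its argument.
    send method_name, value
  end

  # On ruby older than 2.4, 2.class is Fixnum, so this method
  # would need to be named visit_Fixnum there.
  def visit_Integer value
    value * 2
  end

  def visit_String value
    value.to_i * 2
  end
end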

If the above code is not very clear then don’t worry. We are going to look at visitor pattern in ruby and that will make the above code clearer.

Visitor pattern

Now let’s get back to the problem of traversing the AST. This time we are going to use “Double dispatch” so that node information is separate from representation information.

In the visitor pattern, nodes define a method called accept. That method accepts the visitor and then calls visit on the visitor, passing itself (self) as the argument.

Below is a concrete example of the visitor pattern. You can see that IntegerNode has a method accept, which takes an instance of a visitor as its argument and then invokes the visit method of that visitor.

class Node
  def accept visitor
    raise NotImplementedError.new
  end
end

module Visitable
  def accept visitor
    visitor.visit self
  end
end

class IntegerNode < Node
  include Visitable

  attr_reader :value
  def initialize value
    @value = value
  end
end

class Ast < Node
  def initialize
    @nodes = []
    @nodes << IntegerNode.new(2)
    @nodes << IntegerNode.new(3)
  end

  def accept visitor
    @nodes.each do |node|
      node.accept visitor
    end
  end
end

class DoublerVisitor
  def visit subject
    puts subject.value * 2
  end
end

class TriplerVisitor
  def visit subject
    puts subject.value * 3
  end
end

ast = Ast.new
puts "Doubler:"
ast.accept DoublerVisitor.new
puts "Tripler:"
ast.accept TriplerVisitor.new

# =>
Doubler:
4
6
Tripler:
6
9

Above code used only IntegerNode. In the next example I have added StringNode. Now notice how the visit method changed. Now based on the class of the argument the method to dispatch is being decided.

class Node
  def accept visitor
    raise NotImplementedError.new
  end
end

module Visitable
  def accept visitor
    visitor.visit(self)
  end
end

class IntegerNode < Node
  include Visitable

  attr_reader :value
  def initialize value
    @value = value
  end
end

class StringNode < Node
  include Visitable

  attr_reader :value
  def initialize value
    @value = value
  end
end

class Ast < Node
  def initialize
    @nodes = []
    @nodes << IntegerNode.new(2)
    @nodes << StringNode.new("3")
  end

  def accept visitor
    @nodes.each do |node|
      node.accept visitor
    end
  end
end

class BaseVisitor
  def visit subject
    method_name = "visit_#{subject.class}".intern
    send(method_name, subject )
  end
end

class DoublerVisitor < BaseVisitor
  def visit_IntegerNode subject
    puts subject.value * 2
  end

  def visit_StringNode subject
    puts subject.value.to_i * 2
  end
end

class TriplerVisitor < BaseVisitor
  def visit_IntegerNode subject
    puts subject.value * 3
  end

  def visit_StringNode subject
    puts subject.value.to_i * 3
  end
end

ast = Ast.new
puts "Doubler:"
ast.accept DoublerVisitor.new
puts "Tripler:"
ast.accept TriplerVisitor.new

# =>
Doubler:
4
6
Tripler:
6
9
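The payoff of this structure is that a new operation can be added without touching the node classes. As a sketch, here is a hypothetical SumVisitor (it is not part of the example above) that adds up the values instead of printing them. The node classes are repeated in a shortened form so that the snippet runs on its own.

```ruby
module Visitable
  def accept(visitor)
    visitor.visit(self)
  end
end

class IntegerNode
  include Visitable
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class StringNode
  include Visitable
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class BaseVisitor
  # Dispatch on the class of the node that is being visited.
  def visit(subject)
    send("visit_#{subject.class}", subject)
  end
end

# Hypothetical new operation: sum the values of all visited nodes.
class SumVisitor < BaseVisitor
  attr_reader :total

  def initialize
    @total = 0
  end

  def visit_IntegerNode(subject)
    @total += subject.value
  end

  def visit_StringNode(subject)
    @total += subject.value.to_i
  end
end

visitor = SumVisitor.new
[IntegerNode.new(2), StringNode.new("3")].each { |node| node.accept(visitor) }
puts visitor.total  # prints 5
```

The nodes still only know how to accept a visitor. Everything about summing lives in SumVisitor.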

Real world usage

Arel uses the visitor pattern to build queries tailored to the specific database. You can see that it has a visitor class for each of sqlite3, MySQL and PostgreSQL.

You can read more about “double dispatch” in this article by Aaron Patterson.

Preload, Eagerload, Includes and Joins

Rails provides four different ways to load association data. In this blog we are going to look at each of those.

Preload

Preload loads the association data in a separate query.

User.preload(:posts).to_a

# =>
SELECT "users".* FROM "users"
SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" IN (1)

This is how includes loads data in the default case.

Since preload always generates two SQL queries, we can’t use the posts table in the where condition. The following query will result in an error.

User.preload(:posts).where("posts.desc='ruby is awesome'")

# =>
SQLite3::SQLException: no such column: posts.desc:
SELECT "users".* FROM "users"  WHERE (posts.desc='ruby is awesome')

With preload, where clauses on the users table can still be applied.

User.preload(:posts).where("users.name='Neeraj'")

# =>
SELECT "users".* FROM "users"  WHERE (users.name='Neeraj')
SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" IN (3)

Includes

Includes loads the association data in a separate query just like preload.

However it is smarter than preload. Above we saw that preload failed for query User.preload(:posts).where("posts.desc='ruby is awesome'"). Let’s try same with includes.

User.includes(:posts).where('posts.desc = "ruby is awesome"').to_a

# =>
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "posts"."id" AS t1_r0,
       "posts"."title" AS t1_r1,
       "posts"."user_id" AS t1_r2, "posts"."desc" AS t1_r3
FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"
WHERE (posts.desc = "ruby is awesome")

As you can see includes switches from using two separate queries to creating a single LEFT OUTER JOIN to get the data. And it also applied the supplied condition.

So includes changes from two queries to a single query in some cases. By default for a simple case it will use two queries. Let’s say that for some reason you want to force a simple includes case to use a single query instead of two. Use references to achieve that.

User.includes(:posts).references(:posts).to_a

# =>
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "posts"."id" AS t1_r0,
       "posts"."title" AS t1_r1,
       "posts"."user_id" AS t1_r2, "posts"."desc" AS t1_r3
FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"

In the above case a single query was done.

Eager load

Eager loading loads all associations in a single query using LEFT OUTER JOIN.

User.eager_load(:posts).to_a

# =>
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "posts"."id" AS t1_r0,
       "posts"."title" AS t1_r1, "posts"."user_id" AS t1_r2, "posts"."desc" AS t1_r3
FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"

This is exactly what includes does when it is forced to make a single query because a where or order clause uses an attribute from the posts table.

Joins

Joins brings association data using inner join.

User.joins(:posts)

# =>
SELECT "users".* FROM "users" INNER JOIN "posts" ON "posts"."user_id" = "users"."id"

In the above case no posts data is selected. The above query can also produce duplicate results. To see this, let’s create some sample data.

def self.setup
  User.delete_all
  Post.delete_all

  u = User.create name: 'Neeraj'
  u.posts.create! title: 'ruby', desc: 'ruby is awesome'
  u.posts.create! title: 'rails', desc: 'rails is awesome'
  u.posts.create! title: 'JavaScript', desc: 'JavaScript is awesome'

  u = User.create name: 'Neil'
  u.posts.create! title: 'JavaScript', desc: 'Javascript is awesome'

  u = User.create name: 'Trisha'
end

With the above sample data if we execute User.joins(:posts) then this is the result we get

#<User id: 9, name: "Neeraj">
#<User id: 9, name: "Neeraj">
#<User id: 9, name: "Neeraj">
#<User id: 10, name: "Neil">

We can avoid the duplication by using distinct.

User.joins(:posts).select('distinct users.*').to_a
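To see why the inner join repeats users, here is a plain Ruby sketch with no database involved. The SketchUser and SketchPost structs, the USERS and POSTS arrays and the inner_join helper are hypothetical stand-ins for the tables and the SQL join, not ActiveRecord APIs. The id for Trisha is assumed.

```ruby
# Hypothetical in-memory stand-ins for the users and posts tables.
SketchUser = Struct.new(:id, :name)
SketchPost = Struct.new(:user_id, :title)

USERS = [
  SketchUser.new(9, 'Neeraj'),
  SketchUser.new(10, 'Neil'),
  SketchUser.new(11, 'Trisha') # assumed id; Trisha has no posts
].freeze

POSTS = [
  SketchPost.new(9, 'ruby'),
  SketchPost.new(9, 'rails'),
  SketchPost.new(9, 'JavaScript'),
  SketchPost.new(10, 'JavaScript')
].freeze

# An inner join emits one row per matching (user, post) pair, so a user
# with three posts appears three times and a user with none never appears.
def inner_join(users, posts)
  users.flat_map do |user|
    posts.select { |post| post.user_id == user.id }.map { user }
  end
end

joined = inner_join(USERS, POSTS)
puts joined.map(&:name).inspect      # names, with repeats
puts joined.uniq.map(&:name).inspect # what DISTINCT leaves behind
```

Trisha never shows up because she has no posts, which is the difference between INNER JOIN and the LEFT OUTER JOIN used by eager_load.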

Also if we want to make use of attributes from posts table then we need to select them.

records = User.joins(:posts).select('distinct users.*, posts.title as posts_title').to_a
records.each do |user|
  puts user.name
  puts user.posts_title
end

Note that using joins means if you use user.posts then another query will be performed.

Set background and header for a form created using Formotion in RubyMotion

Formotion for RubyMotion makes it a breeze to create views with forms. I am building a RubyMotion app and my login form uses Formotion. I needed to set a background color for my form, and here is how to do it.

class LoginViewController < Formotion::FormController

  def viewDidLoad
    super
    view = UIView.alloc.init
    view.backgroundColor = 0x838E61.uicolor
    self.tableView.backgroundView = view
  end
end

After the login view is done loading, I’m creating a new UIView and setting its background color. Then this UIView object is set as the background view to formotion’s table view.

Setting header image

If you want to add some branding to the login form, you can add an image to the form’s header by adding the code below to viewDidLoad:

header_image = UIImage.imageNamed('header_image_name.png')
header_view = UIImageView.alloc.initWithImage(header_image)
self.tableView.tableHeaderView = header_view

We are creating a UIImageView and initializing it with the image we want to show in the header. Now, set the tableview’s tableHeaderView value to the UIImageView we created.

Cookies on Rails

Let’s see how session data is handled in Rails 3.2 .

If you generate a Rails 3.2 application then, by default, you will see a file at config/initializers/session_store.rb. The contents of this file look something like this:

Demo::Application.config.session_store :cookie_store, key: '_demo_session'

First, this line tells Rails to use a cookie to store session information.

Second, it tells Rails to use _demo_session as the key under which the cookie data is stored.

A single site can have cookies under different keys. For example, airbnb uses 14 different keys to store cookie data.

[Screenshot: airbnb cookies]

Now let’s see how Rails 3.2.13 stores session information.

In my 3.2.13 version of Rails application I added following line to create session data.

session[:github_username] = 'neerajdotname'

Then I visited the action that executes the above code. Now if I look at the cookies for localhost:3000, this is what I see.

[Screenshot: demo session]

As you can see I have only one cookie with key _demo_session .

The cookie has following data.

BAh7CEkiD3Nlc3Npb25faWQGOgZFRkkiJTgwZGFiNzhiYWZmYTc3NjU1ZmVmMGUxM2EzYmEyMDhhBjsAVEkiFGdpdGh1Yl91c2V
ybmFtZQY7AEZJIhJuZWVyYWpkb3RuYW1lBjsARkkiEF9jc3JmX3Rva2VuBjsARkkiMU1KTCs2dXVnRFo2R2NTdG5Kb3E2dm5Bcl
ZYRGJGbjJ1TXZEU0swamxyWU09BjsARg%3D%3D--b5bcce534ceab56616d4a215246e9eb1fc9984a4

Let’s open rails console and try to decipher this information.

content = 'BAh7CEkiD3Nlc3Npb25faWQGOgZFRkkiJTgwZGFiNzhiYWZmYTc3NjU1ZmVmMGUxM2EzYmEyMDhhBjsAVEkiFGdpdGh1Yl91c2V
ybmFtZQY7AEZJIhJuZWVyYWpkb3RuYW1lBjsARkkiEF9jc3JmX3Rva2VuBjsARkkiMU1KTCs2dXVnRFo2R2NTdG5Kb3E2dm5BclZYRGJGbjJ1T
XZEU0swamxyWU09BjsARg%3D%3D--b5bcce534ceab56616d4a215246e9eb1fc9984a4'

When the content is written to cookie then it is escaped. So first we need to unescape it.

> unescaped_content = URI.unescape(content)
=> "BAh7CEkiD3Nlc3Npb25faWQGOgZFRkkiJTgwZGFiNzhiYWZmYTc3NjU1ZmVmMGUxM2EzYmEyMDhhBjsAVEkiFGdpdGh1Yl91c2V
ybmFtZQY7AEZJIhJuZWVyYWpkb3RuYW1lBjsARkkiEF9jc3JmX3Rva2VuBjsARkkiMU1KTCs2dXVnRFo2R2NTdG5Kb3E2dm5BclZYRG
JGbjJ1TXZEU0swamxyWU09BjsARg==--b5bcce534ceab56616d4a215246e9eb1fc9984a4"
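URI.unescape was removed in Ruby 3.0, so the call above only works on older Rubies. Here is a minimal sketch of the same step on a newer Ruby, assuming the cookie value was percent-encoded in the usual way. The unescape_cookie helper is hypothetical and the sample string is a shortened, made-up value, not the real cookie.

```ruby
require 'uri'

# Hypothetical helper: undoes the percent-encoding applied to a cookie value.
# decode_www_form_component turns %3D back into "=" (and "+" into a space).
def unescape_cookie(value)
  URI.decode_www_form_component(value)
end

# Shortened, made-up cookie value used only for illustration.
puts unescape_cookie('BAh7CEkiD3Nlc3Npb25faWQG%3D%3D--b5bcce53')
```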

Notice that towards the end unescaped_content has -- . That is a separation marker. The value before -- is the real payload. The value after -- is digest of data.

> data, digest = unescaped_content.split('--')
=> ["BAh7CEkiD3Nlc3Npb25faWQGOgZFRkkiJTgwZGFiNzhiYWZmYTc3NjU1ZmVmMGUxM2EzYmEyMDhhBjsAVEkiFGdpdGh1Yl91c2V
ybmFtZQY7AEZJIhJuZWVyYWpkb3RuYW1lBjsARkkiEF9jc3JmX3Rva2VuBjsARkkiMU1KTCs2dXVnRFo2R2NTdG5Kb3E2dm5BclZYRGJ
GbjJ1TXZEU0swamxyWU09BjsARg==", "b5bcce534ceab56616d4a215246e9eb1fc9984a4"]

The data is Base64 encoded. So let’s decode it.

> Marshal.load(::Base64.decode64(data))
=> {"session_id"=>"80dab78baffa77655fef0e13a3ba208a",
    "github_username"=>"neerajdotname",
    "_csrf_token"=>"MJL+6uugDZ6GcStnJoq6vnArVXDbFn2uMvDSK0jlrYM="}

So we are able to get the data that is stored in cookie. However we can’t tamper with the cookie because if we change the cookie data then the digest will not match.

Now let’s see how rails matches the digest.

In order to create the digest Rails makes use of config/initializers/secret_token.rb. In my case the file has the following content.

Demo::Application.config.secret_token = '111111111111111111111111111111'

This secret token is used to create the digest.

> secret_token =  '111111111111111111111111111111'
> OpenSSL::HMAC.hexdigest(OpenSSL::Digest.const_get('SHA1').new, secret_token, data)
=> "b5bcce534ceab56616d4a215246e9eb1fc9984a4"

Notice that the above produces a value that is the same as the digest from the earlier step. So if the cookie data is tampered with, the digest match will fail. This is why it is absolutely necessary that an attacker should not be able to get access to the secret_token value.
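The sign-and-verify round trip can be sketched with the standard library alone. This mirrors the data--digest layout shown above; it is not Rails’ own implementation (Rails delegates this to ActiveSupport::MessageVerifier). The helper names sign_cookie, same_digest? and read_signed_cookie are hypothetical.

```ruby
require 'openssl'

# Builds "<base64 payload>--<hex digest>", the layout described above.
def sign_cookie(value, secret)
  data = [Marshal.dump(value)].pack('m0') # strict Base64, no newlines
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), secret, data)
  "#{data}--#{digest}"
end

# Compares two strings without returning early on the first difference.
def same_digest?(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Returns the original value, or nil when the digest does not match.
# Marshal.load only runs after the digest has been verified.
def read_signed_cookie(cookie, secret)
  data, digest = cookie.split('--', 2)
  return nil if data.nil? || digest.nil?

  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA1'), secret, data)
  return nil unless same_digest?(expected, digest)

  Marshal.load(data.unpack1('m'))
end

secret = '111111111111111111111111111111'
cookie = sign_cookie({ 'github_username' => 'neerajdotname' }, secret)
puts read_signed_cookie(cookie, secret).inspect # the original hash

# Flip the first character of the payload to simulate tampering.
tampered = cookie.sub(/\A./) { |c| c == 'A' ? 'B' : 'A' }
puts read_signed_cookie(tampered, secret).inspect # nil
```

Anyone can still Base64-decode the payload; the secret only protects against modification.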

Did you notice that we can access the cookie data without needing secret_token? It means the data stored in the cookie is not encrypted and anyone can see it. That is why it is recommended that an application should not store any sensitive information in a cookie.

In the previous example we used session to store and retrieve data from cookie. We can directly use cookie and that gives us a little bit more control.

cookies[:github_username] = 'neerajdotname'

Now if we look at cookie stored in browser then this is what we see.

[Screenshot: update cookie]

As you can see now we have two keys in our cookie. One created by session and the other one created by code written above.

Another thing to note is that the data stored for key github_username is not Base64 encoded and it also does not have -- to separate the data from the digest. It means this type of cookie data can be tampered with by the user and the Rails application will not be able to detect it.

Now let’s try to sign the cookie data to make it tamper proof.

cookies.signed[:twitter_username] = 'neerajdotname'

Now let’s look at cookies in browser.

[Screenshot: update cookies]

This time we got data with another key twitter_username . Another thing to notice is that cookie data is signed and is tamper proof.

When we use session then behind the scene it uses cookies.signed. That’s why we end up seeing signed data for key _demo_session .

What happens when a user tampers with signed cookie data?

Rails does not raise any exception. However when you try to access cookie data then nil is returned because the data has been tampered with.

Security should be on by default

session , by default, uses signed cookies which prevents any kind of tampering of data but the data is still visible to users. It means we can’t store sensitive information in session.

It would be nice if the session data is stored in encrypted format. And that’s the topic of our next discussion.

Rails 4 stores session data in encrypted format

If you generate a Rails 4 application then, by default, you will see a file at config/initializers/session_store.rb. The contents of this file look something like this:

Demo::Application.config.session_store :cookie_store, key: '_demo_session'

Also you will notice that file at config/initializers/secret_token.rb looks like this .

Demo::Application.config.secret_key_base = 'b14e9b5b720f84fe02307ed16bc1a32ce6f089e10f7948422ccf3349d8ab586869c11958c70f46ab4cfd51f0d41043b7b249a74df7d53c7375d50f187750a0f5'

Notice that in Rails 3.2.x the key was secret_token. Now the key is secret_key_base .

session[:github_username] = 'neerajdotname'

[Screenshot: cookies and site data]

Cookie has following data.

RkxNUWo4NlBKakoyU1VqZWJIKzNaV0lQVVJwQjZhdUVTRnowVHppSVJ3Mk84TStoS1hndFZFNHlNaGw2RHBCc0ZiaEpsM0NtYTg4d
nptcjFaQWVJbUdOaFh5MVlCdWVmSHBMNWpKbkRKR0JrSU5KZFYwVjVyWTZ3aUNqSWxJM1RTMkQybEtPUFE5VDFsZVJyakx0dFh3PT
0tLTZ5NGIreU00Z0MyNnErS29SSGEyZkE9PQ%3D%3D--3f2fd67e4e7785933485a583720d29ba88bca15f

Let’s open rails console and try to decipher this information.

content = 'RkxNUWo4NlBKakoyU1VqZWJIKzNaV0lQVVJwQjZhdUVTRnowVHppSVJ3Mk84TStoS1hndFZFNHlNaGw2RHBCc0ZiaEpsM0NtYTg4d
nptcjFaQWVJbUdOaFh5MVlCdWVmSHBMNWpKbkRKR0JrSU5KZFYwVjVyWTZ3aUNqSWxJM1RTMkQybEtPUFE5VDFsZVJyakx0dFh3PT
0tLTZ5NGIreU00Z0MyNnErS29SSGEyZkE9PQ%3D%3D--3f2fd67e4e7785933485a583720d29ba88bca15f'

When the content is written to cookie then it is escaped. So first we need to unescape it.

unescaped_content = URI.unescape(content)
=> "RkxNUWo4NlBKakoyU1VqZWJIKzNaV0lQVVJwQjZhdUVTRnowVHppSVJ3Mk84TStoS1hndFZFNHlNaGw2RHBCc0ZiaEpsM0NtYTg4d
nptcjFaQWVJbUdOaFh5MVlCdWVmSHBMNWpKbkRKR0JrSU5KZFYwVjVyWTZ3aUNqSWxJM1RTMkQybEtPUFE5VDFsZVJyakx0dFh3PT0tLTZ
5NGIreU00Z0MyNnErS29SSGEyZkE9PQ==--3f2fd67e4e7785933485a583720d29ba88bca15f"

Now we need secret_key_base value. And using that let’s build key_generator .

secret_key_base = 'b14e9b5b720f84fe02307ed16bc1a32ce6f089e10f7948422ccf3349d8ab586869c11958c70f46ab4cfd51f0d41043b7b249a74df7d53c7375d50f187750a0f5'
key_generator = ActiveSupport::KeyGenerator.new(secret_key_base, iterations: 1000)
key_generator = ActiveSupport::CachingKeyGenerator.new(key_generator)

Our MessageEncryptor needs two long random strings for encryption. So let’s generate two keys for the encryptor.

secret = key_generator.generate_key('encrypted cookie')
sign_secret = key_generator.generate_key('signed encrypted cookie')
encryptor = ActiveSupport::MessageEncryptor.new(secret, sign_secret)
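The key generation step above can be approximated with the standard library alone. This is a sketch that assumes PBKDF2 with HMAC-SHA1 and a 64 byte key, which is my understanding of what ActiveSupport::KeyGenerator did in Rails 4; derive_key is a hypothetical helper and the secret is a placeholder.

```ruby
require 'openssl'

# Hypothetical helper: derives a key from a secret and a salt using
# PBKDF2 with HMAC-SHA1. Defaults are assumptions modelled on Rails 4.
def derive_key(secret, salt, iterations: 1000, key_size: 64)
  OpenSSL::PKCS5.pbkdf2_hmac_sha1(secret, salt, iterations, key_size)
end

secret_key_base = 'not-a-real-secret' # placeholder value

# The same secret with different salts gives two independent keys.
encryption_key = derive_key(secret_key_base, 'encrypted cookie')
signing_key    = derive_key(secret_key_base, 'signed encrypted cookie')

puts encryption_key.bytesize
puts encryption_key == signing_key
```

Using a distinct salt per purpose means one secret_key_base can feed both the encryption key and the signing key without the two ever being equal.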

Now we can finally decipher the data.

data =  encryptor.decrypt_and_verify(unescaped_content)
puts data
=> neerajdotname

As you can see we need the secret_key_base to make sense of the cookie data. So in Rails 4 the session data is encrypted by default.

Rails 4 will transparently upgrade cookies from unencrypted to encrypted. This is a brilliant example of trivial choices removed by Rails.

Understanding instance exec in ruby

In ruby procs have lexical scoping. What does that even mean? Let’s start with a simple example.

square = lambda { x * x }
x = 20
puts square.call()
# => undefined local variable or method `x' for main:Object (NameError)

So even though the variable x exists by the time the lambda is called, the lambda could not find it, because x did not exist yet at the point where the lambda was defined.

Let’s fix the code.

x = 2
square = lambda { x * x }
x = 20
puts square.call()
# => 400

In the above case we got the answer. But the answer is 400 instead of 4. That is because the proc binding refers to the variable x. The binding does not hold the value of the variable, it just holds the list of variables available. In this case the value of x happened to be 20 when the code was executed, so the result is 400.

x does not have to be a variable. It could be a method. Check this out.
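The same point can be checked by calling the lambda both before and after the variable changes. This is a small sketch; closure_demo is a hypothetical wrapper method used only to keep the variables local.

```ruby
# Returns what the lambda computes before and after the captured
# variable changes, plus the value its binding currently sees.
def closure_demo
  x = 2
  square = lambda { x * x }
  before = square.call                             # x is 2 at this point
  x = 20
  after = square.call                              # same lambda, x is now 20
  captured = square.binding.local_variable_get(:x) # binding sees the current x
  [before, after, captured]
end

p closure_demo # => [4, 400, 20]
```

The lambda never copied the value 2. It holds a reference to the variable, so each call reads whatever x holds at that moment.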

square = lambda { x * x }
def x
  20
end
puts square.call()
# => 400

In the above case x is a method. Since no local variable x existed when the lambda was defined, Ruby treats x inside the lambda as a method call, and the method is looked up when the lambda runs.

Another example of lexical binding in procs

def square(p)
   x = 2
   puts p.call
end
x = 20
square(lambda { x * x })
#=> 400

In the above case the lambda was created at the top level, so the x it refers to is the top-level x, which is 20 when the lambda is called. Don’t get fooled by x being 2 inside the method. A method body starts a new scope, and the x inside the method is not the same x as the one outside.

Issues because of lexical scoping

Here is a simple case.

class Person
  code = proc { puts self }

  define_method :name do
    code.call()
  end
end

class Developer < Person
end

Person.new.name # => Person
Developer.new.name # => Person

In the above case when Developer.new.name is executed the output is Person. And that can cause problems. For example, in a number of places Ruby on Rails uses self to determine whether the model being acted upon uses STI. If the model uses STI then for Developer the query will have an extra where clause like AND "people"."type" IN ('Developer') . So we need a solution in which self is reported correctly for both Person and Developer.

instance_eval can change self

instance_eval can be used to change self. Here is refactored code using instance_eval .

class Person
  code = proc { puts self }

  define_method :name do
    self.class.instance_eval &code
  end
end

class Developer < Person
end

Person.new.name #=> Person
Developer.new.name #=> Developer

The above code produces the right result. However instance_eval has one limitation. It does not let us pass arguments to the block. Let’s change the proc to accept an argument to test this out.

class Person
  code = proc { |greetings| puts greetings; puts self }

  define_method :name do
    self.class.instance_eval 'Good morning', &code
  end
end

class Developer < Person
end

Person.new.name
Developer.new.name

#=> wrong number of arguments (1 for 0) (ArgumentError)

In the above case we get an error. That’s because instance_eval does not accept arguments.

This is where instance_exec comes to rescue. It allows us to change self and it can also accept arguments.

instance_exec to rescue

Here is code refactored to use instance_exec .

class Person
  code = proc { |greetings| puts greetings; puts self }

  define_method :name do
    self.class.instance_exec 'Good morning', &code
  end
end

class Developer < Person
end

Person.new.name #=> Good morning Person
Developer.new.name #=> Good morning Developer

As you can see in the above code instance_exec reports correct self and the proc can also accept arguments .
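The same idea works on an instance rather than a class, which is how small DSLs usually use it. A minimal sketch, where Greeter and its greet method are hypothetical names invented for this example:

```ruby
class Greeter
  def initialize(name)
    @name = name
  end

  # Runs the block with self set to this Greeter and passes
  # greeting through as the block argument.
  def greet(greeting, &block)
    instance_exec(greeting, &block)
  end
end

# Inside the block, @name resolves against the Greeter instance.
message = Greeter.new('Neeraj').greet('Good morning') do |greeting|
  "#{greeting}, #{@name}"
end
puts message # => Good morning, Neeraj
```

The block is written outside the class, yet it reads the instance variable of the receiver and still gets its argument, which instance_eval alone could not provide.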

Conclusion

I hope this article helps you understand why instance_exec is useful.

I scanned the Ruby on Rails source code and found around 26 usages of instance_exec . Look at how instance_exec is used there to gain more understanding of this topic.

Rex, Rexical and Rails routing

Please read Journey into Rails routing to get some background on the Rails routing discussion.

A new language

Let’s say that the route definition looks like this.

/page/:id(/:action)(.:format)

The task at hand is to develop a new programming language which will understand the rules of the route definitions. Since this language deals with routes let’s call this language Poutes . Well Pout sounds better so let’s roll with that.

It all begins with scanner

rexical is a gem which generates scanners. Notice that rexical is not a scanner itself. It generates a scanner for the given rules. Let’s give it a try.

Create a folder called pout_language and in that folder create a file called pout_scanner.rex . Notice that the extension of the file is .rex .

class PoutScanner
end

Before we proceed any further, let’s compile to make sure it works.

$ gem install rexical
$ rex pout_scanner.rex -o pout_scanner.rb
$ ls
pout_scanner.rb pout_scanner.rex

While doing gem install do not do gem install rex . We are installing gem called rexical not rex .

Time to add rules

Now it’s time to add rules to our pout_scanner.rex file.

Let’s try to develop scanner which can detect difference between integers and strings .

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }
end

Regenerate the scanner .

$ rex pout_scanner.rex -o pout_scanner.rb

Now let’s put the scanner to test . Let’s create pout.rb .

require './pout_scanner.rb'
class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("123")
end

Running this file raises a NoMethodError: undefined method `tokenize' on the PoutScanner instance.

To fix this error open pout_scanner.rex and add inner section like this .

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Regenerate the scanner by executing rex pout_scanner.rex -o pout_scanner.rb . Now let’s try to run pout.rb file.

$ ruby pout.rb
Detected number

So this time we got some result.

Now let’s test for a string .

require './pout_scanner.rb'

class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("hello")
end

$ ruby pout.rb
Detected string

So the scanner is rightly identifying string vs integer. We are going to add a lot more testing so let’s create a test file so that we do not have to keep changing the pout.rb file.

Tests and Rake file

This is our pout_test.rb file.

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_standalone_string
    assert_equal [[:STRING, 'hello']], @scanner.tokenize("hello")
  end
end

And this is our Rakefile file .

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

Also let’s change the pout_scanner.rex file to return an array instead of puts statements . The array contains information about what type of element it is and the value .

class PoutScanner
rule
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

With all this setup now all we need to do is write test and run rake .

Tests for integer

I added following test and it passed.

def test_standalone_integer
  assert_equal [[:INTEGER, 123]], @scanner.tokenize("123")
end

However following test failed .

def test_string_and_integer
  assert_equal [[:STRING, 'hello'], [:INTEGER, 123]], @scanner.tokenize("hello 123")
end

Test is failing with following message

  1) Error:
test_string_and_integer(PoutTest):
PoutScanner::ScanError: can not match: ' 123'

Notice that in the error message before 123 there is a space. So the scanner does not know how to handle space. Let’s fix that.

Here is the updated rule. We do not want any action to be taken when a space is detected. Now test is passing .

class PoutScanner
rule
  \s+
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Back to routing business

Now that we have some background on how scanning works let’s get back to business at hand. The task is to properly parse a routing statement like /page/:id(/:action)(.:format) .

Test for slash

The simplest route is one with / . Let’s write a test and then rule for it.

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']], @scanner.tokenize("/")
  end

end

And here is the .rex file .

class PoutScanner
rule
  \/         { [:SLASH, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Test for /page

Here is the test for /page .

def test_slash_and_literal
  assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
end

And here is the rule that was added .

 [a-zA-Z]+  { [:LITERAL, text] }

Test for /:page

Here is test for /:page .

def test_slash_and_symbol
  assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
end

And here are the rules .

rule
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

Test for /(:page)

Here is test for /(:page) .

def test_symbol_with_paran
  assert_equal  [[[:SLASH, '/'], [:LPAREN, '('],  [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
end

And here is the new rule

  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }

We’ll stop here and will look at the final set of files

Final files

This is Rakefile .

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

This is pout_scanner.rex .

class PoutScanner
rule
  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

This is pout_test.rb .

require 'test/unit'
require './pout_scanner'

class PoutTest  < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']] , @scanner.tokenize("/")
  end

  def test_slash_and_literal
    assert_equal [[:SLASH, '/'], [:LITERAL, 'page']] , @scanner.tokenize("/page")
  end

  def test_slash_and_symbol
    assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']] , @scanner.tokenize("/:page")
  end

  def test_symbol_with_paran
    assert_equal  [[[:SLASH, '/'], [:LPAREN, '('],  [:SYMBOL, ':page'], [:RPAREN, ')']]] , @scanner.tokenize("/(:page)")
  end
end

How scanner works

Here we used rex to generate the scanner. Now take a look at the generated pout_scanner.rb . Here is that file . Please study the code; it is only 91 lines long.

If you look at the code it is clear that scanning is not that hard. You can hand roll it without using a tool like rex . And that’s exactly what Aaron Patterson did in Journey . He hand rolled the scanner .
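To give a feel for what hand rolling involves, here is a minimal sketch built on StringScanner from the standard library. It is not Journey’s scanner; the class name HandRolledPoutScanner, the DOT token and the underscore in the patterns are assumptions for illustration. Unlike the rex rule for /(:page) above, it emits a flat list of tokens.

```ruby
require 'strscan'

class HandRolledPoutScanner
  # Each rule pairs a pattern with the token type it produces.
  # Order matters: SYMBOL must be tried before LITERAL.
  RULES = [
    [%r{/},         :SLASH],
    [/\(/,          :LPAREN],
    [/\)/,          :RPAREN],
    [/\./,          :DOT],
    [/:[a-zA-Z_]+/, :SYMBOL],
    [/[a-zA-Z_]+/,  :LITERAL]
  ].freeze

  def tokenize(code)
    scanner = StringScanner.new(code)
    tokens = []
    until scanner.eos?
      # Find the first rule that matches at the current position.
      pattern, type = RULES.find { |regex, _| scanner.match?(regex) }
      raise ArgumentError, "can not match: '#{scanner.rest}'" unless pattern

      tokens << [type, scanner.scan(pattern)]
    end
    tokens
  end
end

p HandRolledPoutScanner.new.tokenize('/page/:id(/:action)(.:format)')
```

The loop is the whole scanner: try each rule at the current position, consume the match, and fail loudly when nothing matches.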

Conclusion

In this blog we saw how to use rex to build the scanner to read our routing statements . In the next blog we’ll see how to parse the routing statement and how to find the matching routing statement for a given url .

Journey into Rails Routing -- an under the hood look at how routing works

Following code was tested with edge rails (rails4) .

When a Rails application boots then it reads the config/routes.rb file. In your routes you might have code like this

Rails4demo::Application.routes.draw do
  root 'users#index'
  resources :users
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
  get '/logout' => 'sessions#destroy', :as => :logout
  get "/stories" => redirect("/photos")
end

In the above case there are five different routing statements. Rails needs to store all those routes in a manner such that later when url is ‘/photos/5’ then it should be able to find the right route statement that should handle the request.

In this article we are going to take a peek at how Rails handles the whole routing business.

Normalization in action

In order to compare various routing statements first all the routing statements need to be normalized to a standard format so that one can easily compare one route statement with another route statement.

Before we take a deep dive into how the normalization works lets first see some normalizations in action.

get call with defaults

Here we have following route

Rails4demo::Application.routes.draw do
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
end

After the normalization process the above routing statement is transformed into six different variables. The values of those variables are shown below.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fd05e0cf7e8
           @defaults={:format=>"jpg", :controller=>"photos", :action=>"show"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007fd05e0cf7c0
           @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/photos/:id(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:format=>"jpg", :controller=>"photos", :action=>"show"}
as: nil
anchor: true

app is the application that will be executed if conditions are met. conditions are the conditions. Pay attention to :path_info in conditions. This is used by Rails to determine the right route statement. defaults are defaults and requirements are the constraints.
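To make the role of :path_info and defaults concrete, here is a simplified sketch that turns a normalized path into a regular expression and merges the captured segments over the defaults. This is not how Rails or Journey implement matching; compile_path and recognize are hypothetical helpers.

```ruby
# Hypothetical helper: converts "/photos/:id(.:format)" into a Regexp
# with one named group per :symbol and one optional group per (...).
def compile_path(path_info)
  source = path_info.scan(/:[a-z_]+|\(|\)|[^:()]+/).map do |part|
    case part
    when '(' then '(?:'
    when ')' then ')?'
    when /\A:(.+)\z/ then "(?<#{Regexp.last_match(1)}>[^/.?]+)"
    else Regexp.escape(part)
    end
  end.join
  Regexp.new("\\A#{source}\\z")
end

# Returns the params for a matching path, or nil when it does not match.
# Values captured from the path win over the route defaults.
def recognize(path, path_info, defaults)
  match = compile_path(path_info).match(path)
  return nil unless match

  captured = match.names.each_with_object({}) do |name, params|
    params[name.to_sym] = match[name] if match[name]
  end
  defaults.merge(captured)
end

defaults = { format: 'jpg', controller: 'photos', action: 'show' }
p recognize('/photos/5', '/photos/:id(.:format)', defaults)
p recognize('/photos/5.png', '/photos/:id(.:format)', defaults)
p recognize('/users', '/photos/:id(.:format)', defaults)
```

With no extension in the url the default format jpg survives; with /photos/5.png the captured png replaces it.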

GET call with as

Here we have following route

Rails4demo::Application.routes.draw do
  get '/logout' => 'sessions#destroy', :as => :logout
end

After normalization above code gets following values

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f8ded87e740
           @defaults={:controller=>"sessions", :action=>"destroy"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007f8ded87e718 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/logout(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:controller=>"sessions", :action=>"destroy"}
as: "logout"
anchor: true

Notice that in the above case as is populated with logout .

root call

Here we have following route

Rails4demo::Application.routes.draw do
  root 'users#index'
end

After normalization above code gets following values

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fe91507f278
           @defaults={:controller=>"users", :action=>"index"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007fe91507f250 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {}
defaults: {:controller=>"users", :action=>"index"}
as: "root"
anchor: true

Notice that in the above case as is populated. And the path_info is / since this is the root url .

GET call with constraints

Here we have following route

Rails4demo::Application.routes.draw do
  get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }
end

After normalization above code gets following values

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f8158e052c8
           @defaults={:controller=>"pictures", :action=>"show"},
           @glob_param=nil,
           @controller_class_names=#<ThreadSafe::Cache:0x007f8158e05278 @backend={},
           @default_proc=nil>>
conditions: {:path_info=>"/pictures/:id(.:format)", :required_defaults=>[:controller, :action], :request_method=>["GET"]}
requirements: {:id=>/[A-Z]\d{5}/}
defaults: {:controller=>"pictures", :action=>"show"}
as: nil
anchor: true

Notice that in the above case requirements is populated with constraints mentioned in the route definition .

get with a redirect

Here we have following route

Rails4demo::Application.routes.draw do
  get "/stories" => redirect("/posts")
end

After normalization above code gets following values

app: redirect(301, /posts)
conditions: {:path_info=>"/stories(.:format)", :required_defaults=>[], :request_method=>["GET"]}
requirements: {}
defaults: {}
as: "stories"
anchor: true

Notice that in the above case app is a simple redirect .

Resources

Here we have following route

Rails4demo::Application.routes.draw do
  resources :users
end

After normalization above code gets following values

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a315c0
           @defaults={:action=>"index", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a31598 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"index", :controller=>"users"}
as: "users"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a4ef80
           @defaults={:action=>"create", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a4ef58 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users(.:format)", :required_defaults=>[:action, :controller], :request_method=>["POST"]}
defaults: {:action=>"create", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41b63790
           @defaults={:action=>"new", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41b63768 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/new(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"new", :controller=>"users"}
as: "new_user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41a10550
           @defaults={:action=>"edit", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41a10528 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id/edit(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"edit", :controller=>"users"}
as: "edit_user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41f31818
           @defaults={:action=>"show", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41f317f0 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"show", :controller=>"users"}
as: "user"

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d44a9bb70
           @defaults={:action=>"update", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d44a9bb48 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["PATCH"]}
defaults: {:action=>"update", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d41b17480
           @defaults={:action=>"update", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d41b17458 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["PUT"]}
defaults: {:action=>"update", :controller=>"users"}
as: nil

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007f9d439ddf68
           @defaults={:action=>"destroy", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007f9d439ddf40 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/:id(.:format)", :required_defaults=>[:action, :controller], :request_method=>["DELETE"]}
defaults: {:action=>"destroy", :controller=>"users"}
as: nil

In this case I omitted requirements and anchor for brevity.

Notice that a single routing statement, resources :users, created eight normalized routing statements. In other words, the resources statement is basically a shortcut for defining all eight of those routing statements.

Resources with only

Here we have the following route.

Rails4demo::Application.routes.draw do
  resources :users, only: :new
end

After normalization, the above code produces the following values.

app: #<ActionDispatch::Routing::RouteSet::Dispatcher:0x007fdf55043e40
           @defaults={:action=>"new", :controller=>"users"}, @glob_param=nil, @controller_class_names=#<ThreadSafe::Cache:0x007fdf55043e18 @backend={}, @default_proc=nil>>
conditions: {:path_info=>"/users/new(.:format)", :required_defaults=>[:action, :controller], :request_method=>["GET"]}
defaults: {:action=>"new", :controller=>"users"}
as: "new_user"

Because of the only option, just one routing statement was produced in this case.
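To make the expansion concrete, here is a minimal sketch in plain Ruby that mimics what resources produces, including the only option. It is not how Rails implements it: RESOURCE_ROUTES and expand_resources are names I made up for illustration, and the singular name is derived naively by chopping a trailing "s".

```ruby
# Hypothetical table of the eight routes a `resources` call expands to.
# Each row: [action, HTTP verb, path template, route name template or nil].
RESOURCE_ROUTES = [
  [:index,   'GET',    '/%{name}(.:format)',          '%{name}'],
  [:create,  'POST',   '/%{name}(.:format)',          nil],
  [:new,     'GET',    '/%{name}/new(.:format)',      'new_%{singular}'],
  [:edit,    'GET',    '/%{name}/:id/edit(.:format)', 'edit_%{singular}'],
  [:show,    'GET',    '/%{name}/:id(.:format)',      '%{singular}'],
  [:update,  'PATCH',  '/%{name}/:id(.:format)',      nil],
  [:update,  'PUT',    '/%{name}/:id(.:format)',      nil],
  [:destroy, 'DELETE', '/%{name}/:id(.:format)',      nil]
].freeze

# Illustrative helper (not a Rails API): returns one hash per normalized route.
def expand_resources(name, only: nil)
  name = name.to_s
  # Assumption: naive singularization, good enough for "users" -> "user".
  vars = { name: name, singular: name.chomp('s') }
  wanted = only && Array(only).map(&:to_sym)

  RESOURCE_ROUTES
    .select { |action, *| wanted.nil? || wanted.include?(action) }
    .map do |action, verb, path, as|
      {
        conditions: { path_info: path % vars, request_method: [verb] },
        defaults:   { action: action.to_s, controller: name },
        as:         as && as % vars
      }
    end
end

puts expand_resources(:users).size
expand_resources(:users, only: :new).each { |route| p route }
```

The first call yields eight route hashes and the second yields only the one for new, which mirrors the two normalized listings above.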

Mapper

In Rails, the ActionDispatch::Routing::Mapper class is responsible for normalizing all routing statements.

module ActionDispatch
  module Routing
    class Mapper
      include Base
      include HttpHelpers
      include Redirection
      include Scoping
      include Concerns
      include Resources
    end
  end
end

Now let’s look at what these included modules do.

Base

module Base
  def root(options = {})
  end

  def match
  end

  def mount(app, options = {})
  end
end

As you can see, Base handles the root, match and mount calls.

HttpHelpers

module HttpHelpers
  def get(*args, &block)
  end

  def post(*args, &block)
  end

  def patch(*args, &block)
  end

  def put(*args, &block)
  end

  def delete(*args, &block)
  end
end

HttpHelpers handles get, post, patch, put and delete.

Scoping

module Scoping
  def scope(*args)
  end

  def namespace(path, options = {})
  end

  def constraints(constraints = {})
  end
end

Resources

module Resources
  def resource(*resources, &block)
  end

  def resources(*resources, &block)
  end

  def collection
  end

  def member
  end

  def shallow
  end
end
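To see how a class built from such modules can act as a routing DSL, here is a small sketch. MiniMapper, MiniBase, MiniHttpHelpers and add_route are all hypothetical names, and the method signatures are simplified, so treat this as an illustration of the idea rather than the actual Rails code.

```ruby
# Hypothetical, heavily simplified stand-ins for the Mapper modules.
module MiniHttpHelpers
  # Define get, post, patch, put and delete in one go.
  %w[get post patch put delete].each do |verb|
    define_method(verb) do |path, to:|
      add_route(verb.upcase, path, to)
    end
  end
end

module MiniBase
  def root(to)
    add_route('GET', '/', to)
  end
end

class MiniMapper
  include MiniBase
  include MiniHttpHelpers

  attr_reader :routes

  def initialize
    @routes = []
  end

  # Evaluate the routing block with the mapper as self, so bare calls
  # such as `get` and `root` resolve to the included modules.
  def draw(&block)
    instance_exec(&block)
    self
  end

  private

  # Store a normalized route: path, verb and controller/action defaults.
  def add_route(verb, path, to)
    controller, action = to.split('#')
    @routes << { path_info: path, request_method: [verb],
                 defaults: { controller: controller, action: action } }
  end
end

mapper = MiniMapper.new.draw do
  root 'users#index'
  get '/logout', to: 'sessions#destroy'
end

mapper.routes.each { |route| p route }
```

Each module contributes a family of DSL methods, and every one of them funnels into the same normalization step, which is the role add_route plays in this sketch.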

Let’s put all the routes together

So now let’s look at all the route definitions together.

Rails4demo::Application.routes.draw do
  root 'users#index'
  get 'photos/:id' => 'photos#show', :defaults => { :format => 'jpg' }
  get '/logout' => 'sessions#destroy', :as => :logout
  get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }
  get "/stories" => redirect("/posts")
  resources :users
end

The above route definitions produce the following information. I am going to show only the path info, plus the request method for the users routes.

{ :path_info=>"/" }

{ :path_info=>"/photos/:id(.:format)" }

{ :path_info=>"/logout(.:format)" }

{ :path_info=>"/pictures/:id(.:format)" }

{ :path_info=>"/stories(.:format)" }

{ :path_info=>"/users(.:format)", :request_method=>["GET"]}

{:path_info=>"/users(.:format)", :request_method=>["POST"]}

{:path_info=>"/users/new(.:format)", :request_method=>["GET"]}

{:path_info=>"/users/:id/edit(.:format)", :request_method=>["GET"]}

{:path_info=>"/users/:id(.:format)", :request_method=>["GET"]}

{:path_info=>"/users/:id(.:format)", :request_method=>["PATCH"]}

{:path_info=>"/users/:id(.:format)", :request_method=>["PUT"]}

{:path_info=>"/users/:id(.:format)", :request_method=>["DELETE"]}

How to find the matching route definition

Now that we have normalized the routing definitions, the task at hand is to find the right route definition for a given url and request_method.

For example, if the requested page is /pictures/A12345 then the matching routing definition should be get 'pictures/:id' => 'pictures#show', :constraints => { :id => /[A-Z]\d{5}/ }.

In order to accomplish that I would do something like this.

I would convert every path info into a regular expression and push that regular expression onto an array. In this case I would end up with 13 regular expressions in the array (five for the hand-written routes plus eight for resources :users), and for the given url I would try them one by one until one matches.

This strategy works, and this is how Rails worked all the way up to Rails 3.1.
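The linear strategy described above can be sketched in a few lines of Ruby. This is my own illustration, not the code Rails used: path_to_regexp, ROUTES and recognize are hypothetical names, and the conversion only understands dynamic segments like :id and optional groups in parentheses.

```ruby
# Convert a normalized path such as "/users/:id(.:format)" into a Regexp.
# Dynamic segments (:id) become named captures, "(...)" becomes optional.
def path_to_regexp(path, constraints = {})
  source = path.gsub(/[().]|:\w+/) do |token|
    case token
    when '(' then '(?:'
    when ')' then ')?'
    when '.' then '\.'
    else
      name = token[1..-1]
      segment = constraints[name.to_sym]
      # Use the constraint if one was given, else match a single segment.
      "(?<#{name}>#{segment ? segment.source : '[^/.?]+'})"
    end
  end
  Regexp.new("\\A#{source}\\z")
end

# A few of the normalized routes, each paired with its regular expression.
ROUTES = [
  { verb: 'GET', path: '/pictures/:id(.:format)', to: 'pictures#show',
    constraints: { id: /[A-Z]\d{5}/ } },
  { verb: 'GET',    path: '/users/:id(.:format)', to: 'users#show' },
  { verb: 'DELETE', path: '/users/:id(.:format)', to: 'users#destroy' }
].map do |route|
  route.merge(regexp: path_to_regexp(route[:path], route.fetch(:constraints, {})))
end

# Try every route in order and return the first one that matches.
def recognize(verb, url)
  ROUTES.each do |route|
    next unless route[:verb] == verb
    match = route[:regexp].match(url)
    return [route[:to], match.names.zip(match.captures).to_h] if match
  end
  nil
end

p recognize('GET', '/pictures/A12345')
p recognize('DELETE', '/users/7.json')
p recognize('GET', '/pictures/12')
```

The last call returns nil because 12 does not satisfy the :id constraint. Note that the cost of a lookup grows with the number of routes, since in the worst case every regular expression is tried.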

Aaron Patterson loves computer science

Aaron Patterson noticed that finding the best matching route definition for a given url is nothing but a pattern matching task. Computer science has a more elegant solution to this problem, one that also happens to run faster: build an AST and walk over it.

So he decided to make a mini language out of the route definitions. After all, the route definitions we write follow certain rules.

And thus Journey was born.
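As a taste of what treating route definitions as a mini language means, here is a tiny tokenizer sketch built on Ruby's StringScanner. The token names and the tokenize method are my own choices for illustration and should not be read as Journey's actual scanner.

```ruby
require 'strscan'

# Split a route path into [type, text] tokens.
# Token names are hypothetical, chosen for this sketch only.
def tokenize(path)
  scanner = StringScanner.new(path)
  tokens = []

  until scanner.eos?
    if scanner.scan(%r{/})
      tokens << [:SLASH, '/']
    elsif scanner.scan(/\(/)
      tokens << [:LPAREN, '(']
    elsif scanner.scan(/\)/)
      tokens << [:RPAREN, ')']
    elsif scanner.scan(/\./)
      tokens << [:DOT, '.']
    elsif (text = scanner.scan(/:\w+/))
      tokens << [:SYMBOL, text]      # dynamic segment such as :id
    elsif (text = scanner.scan(%r{[^/().:]+}))
      tokens << [:LITERAL, text]     # static text such as users
    else
      raise ArgumentError, "unexpected input at position #{scanner.pos}"
    end
  end

  tokens
end

tokenize('/users/:id(.:format)').each { |token| p token }
```

A parser would then turn this token stream into a tree according to the grammar rules, and the tree is what gets walked to find a match.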

In the next blog we will see how to write grammar rules for routing definitions, how to parse them, and how to walk the AST to find the best match.

Life of save in ActiveRecord

The following code was tested with edge Rails (Rails 4).

In a Ruby on Rails application we save records often; save is one of the most used methods in ActiveRecord. In this blog we are going to take a look at the life cycle of a save operation.

ActiveRecord::Base

A typical model looks like this.

class Article < ActiveRecord::Base
end

Now let’s look at the ActiveRecord::Base class in its entirety.

module ActiveRecord
  class Base
    extend ActiveModel::Naming

    extend ActiveSupport::Benchmarkable
    extend ActiveSupport::DescendantsTracker

    extend ConnectionHandling
    extend QueryCache::ClassMethods
    extend Querying
    extend Translation
    extend DynamicMatchers
    extend Explain

    include Persistence
    include ReadonlyAttributes
    include ModelSchema
    include Inheritance
    include Scoping
    include Sanitization
    include AttributeAssignment
    include ActiveModel::Conversion
    include Integration
    include Validations
    include CounterCache
    include Locking::Optimistic
    include Locking::Pessimistic
    include AttributeMethods
    include Callbacks
    include Timestamp
    include Associations
    include ActiveModel::SecurePassword
    include AutosaveAssociation
    include NestedAttributes
    include Aggregations
    include Transactions
    include Reflection
    include Serialization
    include Store
    include Core
  end

  ActiveSupport.run_load_hooks(:active_record, Base)
end

The Base class extends and includes a lot of modules. Here we are going to look at the four modules that define a save method.

module ActiveRecord
  class Base
    ......................
    include Persistence
    .......................
    include Validations
    ........................
    include AttributeMethods
    ........................
    include Transactions
    ........................
  end
end

include Persistence

The Persistence module defines the save method like this.

def save(*)
  create_or_update
rescue ActiveRecord::RecordInvalid
  false
end

Now let’s see the create_or_update method.

def create_or_update
  raise ReadOnlyRecord if readonly?
  result = new_record? ? create_record : update_record
  result != false
end

So the save method invokes create_or_update, which either creates a record or updates it depending on new_record?. Dead simple.

include Validations

In module Validations the save method is defined as

def save(options={})
  perform_validations(options) ? super : false
end

In this case the save method calls perform_validations. If the validations pass it hands over to super, otherwise it returns false without going any further.

include AttributeMethods

The AttributeMethods module includes a bunch of modules like this.

module ActiveRecord
  module AttributeMethods
    extend ActiveSupport::Concern
    include ActiveModel::AttributeMethods

    included do
      include Read
      include Write
      include BeforeTypeCast
      include Query
      include PrimaryKey
      include TimeZoneConversion
      include Dirty
      include Serialization
    end
  end
end

Here we want to look at the Dirty module, which defines the save method as follows.

def save(*)
  if status = super
    @previously_changed = changes
    @changed_attributes.clear
  end
  status
end

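Since this module is all about tracking whether a record is dirty or not, the save method keeps track of the changed values: once super reports success, it remembers the changes in @previously_changed and clears @changed_attributes.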
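Here is a stripped-down sketch of that behavior outside of Rails. FakePersistence, FakeDirty and Note are hypothetical names, and the attribute bookkeeping is far simpler than what ActiveRecord does, but the save method has the same shape as the one shown above.

```ruby
# Hypothetical stand-in for the persistence layer: the write always succeeds.
module FakePersistence
  def save(*)
    true
  end
end

# Simplified dirty tracking, loosely modelled on the save method above.
module FakeDirty
  def save(*)
    if (status = super)
      @previously_changed = changes
      @changed_attributes.clear
    end
    status
  end

  # Map each changed attribute to [old value, new value].
  def changes
    @changed_attributes.each_with_object({}) do |(attr, old), result|
      result[attr] = [old, @attributes[attr]]
    end
  end
end

class Note
  include FakePersistence
  include FakeDirty

  attr_reader :previously_changed

  def initialize(title)
    @attributes = { title: title }
    @changed_attributes = {}
  end

  def title=(value)
    # Remember the original value only the first time the attribute changes.
    @changed_attributes[:title] = @attributes[:title] unless @changed_attributes.key?(:title)
    @attributes[:title] = value
  end

  def changed?
    !@changed_attributes.empty?
  end
end

note = Note.new('draft')
note.title = 'final'
puts note.changed?
note.save
puts note.changed?
p note.previously_changed
```

Before save the note reports itself as changed. After a successful save it is clean again, and the old and new title are available through previously_changed.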

include Transactions

In module Transactions the save method is defined as

def save(*) #:nodoc:
  rollback_active_record_state! do
    with_transaction_returning_status { super }
  end
end

The method rollback_active_record_state! is defined as

def rollback_active_record_state!
  remember_transaction_record_state
  yield
rescue Exception
  restore_transaction_record_state
  raise
ensure
  clear_transaction_record_state
end

And the method with_transaction_returning_status is defined as

def with_transaction_returning_status
  status = nil
  self.class.transaction do
    add_to_transaction
    begin
      status = yield
    rescue ActiveRecord::Rollback
      @_start_transaction_state[:level] = (@_start_transaction_state[:level] || 0) - 1
      status = nil
    end

    raise ActiveRecord::Rollback unless status
  end
  status
end

Together, the methods rollback_active_record_state! and with_transaction_returning_status ensure that all the operations inside save happen in a single transaction.

Why the save method needs to be in a transaction

A model can define a number of callbacks, including after_save and before_save. All those callbacks run within a transaction. It means that if an after_save callback raises an exception then the save operation is rolled back.

Not only that, a number of associations like has_many and belongs_to use callbacks to handle association manipulation. In order to ensure the integrity of the operation, the save operation is wrapped in a transaction.
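The effect of with_transaction_returning_status can be imitated with an in-memory store. In this sketch FakeDb and Rollback are made-up stand-ins for the database connection and ActiveRecord::Rollback, and unlike the real method this version takes the database as an argument.

```ruby
# Hypothetical stand-in for ActiveRecord::Rollback.
class Rollback < StandardError; end

# Hypothetical in-memory "database" whose transaction can be rolled back.
class FakeDb
  attr_reader :rows

  def initialize
    @rows = []
  end

  def transaction
    snapshot = @rows.dup
    yield
  rescue Rollback
    @rows = snapshot # undo everything written inside the block
    nil
  rescue StandardError
    @rows = snapshot # any other error also rolls back, then propagates
    raise
  end
end

# Same shape as the method above: a falsy status triggers a rollback.
def with_transaction_returning_status(db)
  status = nil
  db.transaction do
    status = yield
    raise Rollback unless status
  end
  status
end

db = FakeDb.new

ok = with_transaction_returning_status(db) do
  db.rows << 'article 1'
  true
end

failed = with_transaction_returning_status(db) do
  db.rows << 'article 2'
  false
end

p ok
p failed
p db.rows
```

The second block reports failure, so its write is undone and only article 1 remains. The same rollback happens if the block raises an exception, except that the exception is passed on to the caller.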

Reverse order of operation

In the Base class the modules are included in the following order.

module ActiveRecord
  class Base
    ......................
    include Persistence
    .......................
    include Validations
    ........................
    include AttributeMethods
    ........................
    include Transactions
    ........................
  end
end

All four modules have a save method. Because of how Ruby resolves methods, the module included last sits closest to the class in the ancestor chain, so it gets to act on the method first and reaches the others through super. So the order in which the save methods get executed is Transactions, AttributeMethods, Validations and Persistence.

To get a visual feel, I added a puts inside each of the save methods. Here is the result.

1.9.1 :001 > User.new.save
entering save in transactions
   (0.1ms)  begin transaction
entering save in attribute_methods
entering save in validations
entering save in persistence
  SQL (47.3ms)  INSERT INTO "users" ("created_at", "updated_at") VALUES (?, ?)  [["created_at", Mon, 21 Jan 2013 14:56:52 UTC +00:00], ["updated_at", Mon, 21 Jan 2013 14:56:52 UTC +00:00]]
leaving save in persistence
leaving save in validations
leaving save in attribute_methods
   (17.6ms)  rollback transaction
leaving save in transactions
 => nil

As you can see the order of operations is

entering save in transactions
entering save in attribute_methods
entering save in validations
entering save in persistence

leaving save in persistence
leaving save in validations
leaving save in attribute_methods
leaving save in transactions
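The same entering and leaving pattern can be reproduced without Rails. In this sketch the Demo modules are hypothetical stand-ins that do nothing except record when they are entered and left, which is enough to show how include order and super interact.

```ruby
# Records the order in which the save methods run.
TRACE = []

# Innermost module: a stand-in for the real persistence work, no super call.
module DemoPersistence
  def save(*)
    TRACE << 'entering save in persistence'
    TRACE << 'leaving save in persistence'
    true
  end
end

module DemoValidations
  def save(*)
    TRACE << 'entering save in validations'
    result = super
    TRACE << 'leaving save in validations'
    result
  end
end

module DemoAttributeMethods
  def save(*)
    TRACE << 'entering save in attribute_methods'
    result = super
    TRACE << 'leaving save in attribute_methods'
    result
  end
end

module DemoTransactions
  def save(*)
    TRACE << 'entering save in transactions'
    result = super
    TRACE << 'leaving save in transactions'
    result
  end
end

# Same include order as in ActiveRecord::Base.
class DemoRecord
  include DemoPersistence
  include DemoValidations
  include DemoAttributeMethods
  include DemoTransactions
end

DemoRecord.new.save
puts TRACE
p DemoRecord.ancestors.take(5)
```

The ancestors list starts with DemoRecord followed by the modules in reverse include order, which is exactly the order in which the save methods are entered.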