Infinite hash and default_proc

If you already know how this infinite hash works then you are all set. If not, read along.

Default value of Hash

If I want a hash to have a default value then that’s easy.

h = Hash.new(0)
puts h['usa'] #=> 0

The above code gives me a fixed value if the key is not found. If I want a dynamic value then I can use the block form.

h = Hash.new{|h,k| h[k] = k.upcase}
puts h['usa'] #=> USA
puts h['india'] #=> INDIA

Default value is hash

If I want the default value to be a hash then it seems easy but it falls apart soon.

h = Hash.new{|h,k| h[k] = {} }
puts h['usa'].inspect #=> {}
puts h['usa']['ny'].inspect #=> nil
puts h['usa']['ny']['nyc'].inspect #=> NoMethodError: undefined method `[]' for nil:NilClass

In the above if a key is missing for h then it returns a hash. However that returned hash is an ordinary hash which does not have a capability of returning another hash if a key is missing.

This is where default_proc comes into the picture. hash.default_proc returns the block which was passed to Hash.new.

h = Hash.new{|h,k| h[k] = Hash.new(&h.default_proc)}
puts h['usa']['ny']['nyc'].inspect #=> {}
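
To see why storing the nested hash matters, here is a minimal sketch of how I would use such a hash. The helper name infinite_hash is my own invention for this illustration. The block both stores the new hash under the missing key and passes the same block down, so an assignment several levels deep survives.

```ruby
# infinite_hash is a hypothetical helper name used only for this sketch.
def infinite_hash
  # Store the nested hash under the missing key and hand it the same block,
  # so every level behaves like the top level.
  Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
end

h = infinite_hash
h['usa']['ny']['nyc'] = 'big apple'   # intermediate hashes are created on demand

puts h['usa']['ny']['nyc']            #=> big apple
puts h['usa']['ny'].key?('nyc')       #=> true
puts h['india'].empty?                #=> true
```

Note that merely reading a missing key (like h['india'] above) also creates an empty hash for that key.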

Mime type resolution in Rails

This is a long blog. If you want a summary then José Valim has provided a summary in less than 140 characters.

It is common to see following code in Rails

respond_to do |format|
  format.html
  format.xml  { render :xml => @users }
end

If you want the output in xml format then request the URL with the .xml extension at the end, like localhost:3000/users.xml .

What we saw is only one part of the puzzle. The other side of the equation is HTTP header field Accept defined in HTTP RFC.

HTTP Header Field Accept

When a browser sends a request it also sends information about what kinds of resources it is capable of handling. Here are some examples of the Accept header a browser can send.

text/plain

image/gif, image/x-xbitmap, image/jpeg, application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, */*

text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

application/vnd.wap.wmlscriptc, text/vnd.wap.wml, application/vnd.wap.xhtml+xml,
application/xhtml+xml, text/html, multipart/mixed, */*

If you are reading this blog on a browser then you can find out what kind of Accept header your browser is sending by visiting this link. Here is the list of Accept headers sent by different browsers on my machine.

Chrome: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Firefox: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,application/json
Safari: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
IE: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif,
image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*

Let’s take a look at the Accept header sent by Safari.

Safari: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Safari is saying: I can handle documents which are xml (application/xml), html (text/html) or plain text (text/plain). I can also handle images such as image/png. If all else fails then send me whatever you can and I will try to render that document to the best of my ability.

Notice that there are also q values. That signifies the priority order. This is what HTTP spec has to say about q.

Each media-range MAY be followed by one or more accept-params, beginning with the “q” parameter for indicating a relative quality factor. The first “q” parameter (if any) separates the media-range parameter(s) from the accept-params. Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (section 3.9). The default value is q=1.

The spec is saying that each document type has a default q value of 1. When a q value is specified then take that value into account. The spec itself does not rank types that share the same q value; a common tie-breaker, and the one used here, is to give higher priority to the one that came first in the list. Based on that, this is the order in which documents should be sent to the Safari browser.

application/xml (q is 1)
application/xhtml+xml (q is 1)
image/png (q is 1)
text/html (q is 0.9)
text/plain (q is 0.8)
*/* (q is 0.5)
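
The ordering above can be reproduced with a few lines of Ruby. This is only a sketch of the rule described here (highest q first, header order for ties), not the parser Rails uses. The method name accept_order is made up for this example.

```ruby
# accept_order is a hypothetical helper, not part of Rails.
# It orders media ranges by q value (default 1.0), keeping header order for ties.
def accept_order(header)
  entries = header.split(',').each_with_index.map do |part, index|
    media_range, *params = part.strip.split(/\s*;\s*/)
    q_param = params.find { |param| param.start_with?('q=') }
    q = q_param ? q_param.split('=', 2).last.to_f : 1.0
    [media_range, q, index]
  end
  # Negate q so that the highest quality sorts first; index breaks ties.
  entries.sort_by { |_, q, index| [-q, index] }.map(&:first)
end

safari = 'application/xml,application/xhtml+xml,text/html;q=0.9,' \
         'text/plain;q=0.8,image/png,*/*;q=0.5'
puts accept_order(safari) # prints the six types in the order listed above
```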

Notice that Safari is nice enough to put a lower priority on */*. Chrome and Firefox also put */* at a lower priority, which is a good thing. Not so with IE, which does not declare any q value for */* .

Look at the order again and you can see that application/xml has higher priority than text/html. What it means is that Safari is telling Rails: I would prefer application/xml over text/html. Send me text/html only if you cannot send application/xml.

And let’s say that you have developed a RESTful app which is capable of sending output in both html and xml formats.

Rails, being a good HTTP citizen, should follow the HTTP_ACCEPT protocol and should send an xml document in this case. Again, all you did was visit a website, and Safari is telling Rails to send an xml document rather than an html document. Clearly the HTTP_ACCEPT value being sent by Safari is broken.

HTTP_ACCEPT is broken

The HTTP_ACCEPT concept is neat. It defines the order and the priority. However the implementation is broken by all the browser vendors. Given that browsers do not send a proper HTTP_ACCEPT, what can Rails do? One solution is to ignore it completely. If you want xml output then request http://localhost:3000/users.xml . Relying solely on formats makes life easier and less buggy. This is what Rails did for a long time.

Starting with this commit, Rails ignored the HTTP_ACCEPT attribute by default. The same is true for the Twitter API, where the HTTP_ACCEPT attribute is ignored and Twitter relies solely on the format to find out what kind of document should be returned.

Unfortunately this solution has its own set of problems. The web has been around for a long time and there are a lot of applications that expect the response to be an RSS feed if they send application/rss+xml in their HTTP_ACCEPT attribute. It is not nice to take a hard stand and ask all of them to request with the extension .rss .

Parsing HTTP_ACCEPT attribute

Parsing and obeying HTTP_ACCEPT attribute is filled with many edge cases. First let’s look at the code that decides what to parse and how to handle the data.

BROWSER_LIKE_ACCEPTS = /,\s*\*\/\*|\*\/\*\s*,/

def formats
  accept = @env['HTTP_ACCEPT']

  @env["action_dispatch.request.formats"] ||=
    if parameters[:format]
      Array(Mime[parameters[:format]])
    elsif xhr? || (accept && accept !~ BROWSER_LIKE_ACCEPTS)
      accepts
    else
      [Mime::HTML]
    end
end

Notice that if a format is passed like http://localhost:3000/users.xml or http://localhost:3000/users.js then Rails does not even parse the HTTP_ACCEPT values. Also note that if browser is sending */* along with other values then Rails totally bails out and just returns Mime::HTML unless the request is ajax request.
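
The three branches of formats are easier to follow with plain values. The sketch below mirrors only the decision, not the Rails objects. The method name format_source and the returned symbols are made up for illustration. Note one detail of the regular expression: it requires a comma next to */* , so a lone */* does not count as browser-like.

```ruby
BROWSER_LIKE_ACCEPTS = /,\s*\*\/\*|\*\/\*\s*,/

# format_source is a hypothetical helper; it reports which input decides the format.
def format_source(format_param, xhr, accept)
  if format_param
    :url_format      # e.g. /users.xml, HTTP_ACCEPT is not even parsed
  elsif xhr || (accept && accept !~ BROWSER_LIKE_ACCEPTS)
    :accept_header   # HTTP_ACCEPT is parsed and obeyed
  else
    :html_default    # browser-like or missing header, fall back to html
  end
end

puts format_source('xml', false, 'text/html, */*')          #=> url_format
puts format_source(nil, false, 'text/html, */*;q=0.5')      #=> html_default
puts format_source(nil, true, 'text/html, */*')             #=> accept_header
puts format_source(nil, false, 'text/javascript, text/html') #=> accept_header
puts format_source(nil, false, '*/*')                       #=> accept_header
```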

Next I am going to discuss some of the cases in greater detail which should bring more clarity around this issue.

Case 1: HTTP_ACCEPT is */*

I have following code.

respond_to do |format|
  format.html { render :text => 'this is html' }
  format.js  { render :text => 'this is js' }
end

I am assuming that HTTP_ACCEPT value is */* . In this case browser is saying that send me whatever you got. Since browser is not dictating the order in which documents should be sent Rails will look at the order in which Mime types are declared in respond_to block and will pick the first one. Here is the corresponding code

def negotiate_mime(order)
  formats.each do |priority|
    if priority == Mime::ALL
      return order.first
    elsif order.include?(priority)
      return priority
    end
  end

  order.include?(Mime::ALL) ? formats.first : nil
end

What it’s saying is that if Mime::ALL is sent then pick the first one declared in the respond_to block. So be careful with order in which formats are declared inside the respond_to block.
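
Here is the same loop with plain symbols so that it can be tried outside Rails. It is a simplified sketch: ALL stands in for Mime::ALL, pick_format is a name I made up, and the final fallback line of negotiate_mime is left out.

```ruby
ALL = :all # stand-in for Mime::ALL in this sketch

# accepted: formats the client asked for, most preferred first
# declared: formats in the order they appear in the respond_to block
def pick_format(accepted, declared)
  accepted.each do |wanted|
    return declared.first if wanted == ALL
    return wanted if declared.include?(wanted)
  end
  nil
end

p pick_format([:all], [:html, :js])              #=> :html
p pick_format([:all], [:js, :html])              #=> :js
p pick_format([:js, :html, :text], [:html, :js]) #=> :js
p pick_format([:xml], [:html, :js])              #=> nil
```

The first two calls show why the order inside respond_to matters when the client sends */* . The third call shows that the client's order wins when it names concrete types.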

The order in which formats are declared can be a real issue. Check out these two cases where the author ran into issues because of the order in which formats are declared.

So far so good. However, what if there is no respond_to block? If I don’t have a respond_to block and I have index.html.erb, index.js.erb and index.xml.builder files in my view directory, then which one will be picked up? In this case Rails will go over all the registered formats in the order in which they are declared and will try to find a match. So in this case it matters in what order Mime types are registered. Here is the code that registers Mime types.

Mime::Type.register "text/html", :html, %w( application/xhtml+xml ), %w( xhtml )
Mime::Type.register "text/plain", :text, [], %w(txt)
Mime::Type.register "text/javascript", :js, %w( application/javascript application/x-javascript )
Mime::Type.register "text/css", :css
Mime::Type.register "text/calendar", :ics
Mime::Type.register "text/csv", :csv
Mime::Type.register "application/xml", :xml, %w( text/xml application/x-xml )
Mime::Type.register "application/rss+xml", :rss
Mime::Type.register "application/atom+xml", :atom
Mime::Type.register "application/x-yaml", :yaml, %w( text/yaml )

Mime::Type.register "multipart/form-data", :multipart_form
Mime::Type.register "application/x-www-form-urlencoded", :url_encoded_form

# http://www.ietf.org/rfc/rfc4627.txt
# http://www.json.org/JSONRequest.html
Mime::Type.register "application/json", :json, %w( text/x-json application/jsonrequest )

# Create Mime::ALL but do not add it to the SET.
Mime::ALL = Mime::Type.new("*/*", :all, [])

As you can see, text/html comes first in the list, text/javascript comes later and application/xml comes after that. So Rails will look for a view file in the following order: index.html.erb, index.js.erb and index.xml.builder.
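
The rule can be sketched as a lookup against the registration order. This is not how the Rails template lookup is implemented. It only illustrates the ordering, and both REGISTRATION_ORDER and first_template_format are names I made up.

```ruby
# Symbols in the order of the first ten Mime::Type.register calls above.
REGISTRATION_ORDER = [:html, :text, :js, :css, :ics, :csv, :xml, :rss, :atom, :yaml]

# available_formats: formats for which a view file exists (hypothetical input).
def first_template_format(available_formats)
  REGISTRATION_ORDER.find { |format| available_formats.include?(format) }
end

p first_template_format([:xml, :js, :html]) #=> :html
p first_template_format([:xml, :js])        #=> :js
```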

Case 2: HTTP_ACCEPT with no */*

I am going to assume that in this case HTTP_ACCEPT sent by browser looks really simple like this

text/javascript, text/html, text/plain

I am also assuming that my respond_to block looks like this

respond_to do |format|
  format.html { render :text => 'this is html' }
  format.js  { render :text => 'this is js' }
end

So browser is saying that I prefer documents in following order

 js
 html
 plain

The order in which formats are declared is

html (format.html)
js (format.js)

In this case Rails will go through each Mime type that the browser supports, from top to bottom, one by one. If a match is found then the response is sent; otherwise Rails tries to find a match for the next Mime type. First in the list of Mime types supported by the browser is js, and Rails does find that my respond_to block supports .js . Rails executes the format.js block and the response is sent to the browser.

Case 3: Ajax requests

When an AJAX request is made, Safari, Firefox and Chrome send only one item in HTTP_ACCEPT and that is */*. So if you are making an AJAX request then HTTP_ACCEPT for these three browsers will look like

Chrome: */*
Firefox: */*
Safari: */*

and if your respond_to block looks like this

respond_to do |format|
  format.html { render :text => 'this is html' }
  format.js  { render :text => 'this is js' }
end

then the first one will be served based on the formats order. In this case an html response would be sent for an AJAX request. This is not what you want.

This is the reason why, if you are using jQuery and sending AJAX requests, you should add something like this in your application.js file

$(function() {
  $.ajaxSetup({
    'beforeSend': function(xhr) {
      xhr.setRequestHeader("Accept", "text/javascript");
    }
  });
});

If you are using a newer version of rails.js then you don’t need to add the above code since it is already taken care of for you through this commit.

Trying it out

If you want to play with HTTP_ACCEPT header then put the following line in your controller to inspect the HTTP_ACCEPT attribute.

puts request.headers['HTTP_ACCEPT']

I used following rake task to set custom HTTP_ACCEPT attribute.

require "net/http"
require "uri"

task :custom_accept do
  uri = URI.parse("http://localhost:3000/users")
  http = Net::HTTP.new(uri.host, uri.port)

  request = Net::HTTP::Get.new(uri.request_uri)
  request["Accept"] = "text/html, application/xml, */*"

  response = http.request(request)
  puts response.body
end

Thanks

I got familiar with intricacies of mime parsing while working on ticket #6022 . A big thanks to José Valim for patiently dealing with me while working on this ticket.

Variable declaration at the top is not just pretty thing

I was discussing JavaScript code with a friend and he noticed that I had declared all the variables at the top.

He likes to declare variables where they are used, to be sure that the variable being used is declared with var; otherwise that variable will become a global variable. This fear of accidentally creating a global variable makes him want to see the variable declaration next to where it is being used.

Use the right tool

var payment;
payment = soldPrice + shippingCost;

In the above case the user has declared the payment variable right where it is used so that he is sure that payment is declared. However, if there is a typo as shown below, then he has accidentally created a global variable “payment”.

var paymnet; //there is a typo
payment = soldPrice + shippingCost;

Having variable declaration next to where variable is being used is not a safe way of guaranteeing that variable is declared. Use the right tool and that would be jslint validation. I use MacVim and I use Javascript Lint. So every time I save a JavaScript file validation is done and I get warning if I am accidentally creating a global variable.

You can configure things such that JSLint validation runs when you check your code into git or when you push to github. Or you can have a custom rake task. Many solutions are available; choose the one that fits you. But do not rely on manual inspection.

Variable declaration are being moved to the top by the browser

Take a look at the following code. One might expect that the first console.log will print “Neeraj” but the output will be “undefined”. That is because even though you have declared variables next to where they are being used, browsers lift those declarations to the very top of the function.

name = 'Neeraj';
function lab(){
 console.log(name);
 var name = 'John';
 console.log(name);
};
lab();

The browser effectively treats the above code as the one shown below.

name = 'Neeraj';
function lab(){
 var name = undefined;
 console.log(name);
 name = 'John';
 console.log(name);
};
lab();

In order to avoid this kind of mistake it is preferable to declare variables at the top like this.

name = 'Neeraj';
function lab(){
 var name = 'John';
 console.log(name);
 console.log(name);
};
lab();

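Looking at the first set of code a person might think that the first console.log prints “Neeraj”. With the declaration at the top there is no such ambiguity.

Also remember that the scope of a variable declared with var in JavaScript is at the function level.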

Implications on how functions are declared

There are two ways of declaring a function.

var myfunc = function(){};
function myfunc2(){};

In the first case only the variable declaration myfunc is getting hoisted up. The definition of myfunc is NOT getting hoisted. In the second case both the variable declaration and the function definition are getting hoisted up. For more information on this refer to my previous blog on the same topic.

An inline confirmation utility powered by jQuery

I needed inline confirmation utility.

With jQuery it was easy.

After a few hours I had iconfirm.

Source code is at github.

Live Demo is also available.

Return false has changed in jquery 1.4.3

jQuery 1.4.3 was recently released. If you upgrade to jQuery 1.4.3 you will notice that the behavior of return false has changed in this version. First let’s see what return false does.

return false

$('a').click(function(){
  console.log('clicked');
  return false;
});

First ensure that the above code is executed on domready. Now if I click on any link then two things will happen.

e.preventDefault() will be called.
e.stopPropagation() will be called.

e.preventDefault()

As the name suggests, calling e.preventDefault() will make sure that the default behavior is not executed.

<a href='www.google.com'>click me</a>

If above link is clicked then the default behavior of the browser is to take you to www.google.com. However by invoking e.preventDefault() browser will not go ahead with default behavior and I will not be taken to www.google.com.

e.stopPropagation

When a link is clicked, a “click” event is created. And this event bubbles all the way up to the top. By invoking e.stopPropagation I am asking the browser not to propagate the event. In other words the event will stop bubbling.

<div class='first'>
  <div class='two'>
    <a href='www.google.com'>click me</a>
  </div>
</div>

If I click on “click me” then the click event will start bubbling. Now let’s say that I catch this event at .two and call e.stopPropagation(); then this event will never reach .first .

e.stopImmediatePropagation

First note that you can bind more than one event handler to an element. Take a look at the following case.

<a class='one'>one</a>

I am going to bind three event handlers to the above element.

$('a').bind('click', function(e){
  console.log('first');
});

$('a').bind('click', function(e){
  console.log('second');
  e.stopImmediatePropagation();
});

$('a').bind('click', function(e){
  console.log('third');
});

In this case there are three handlers bound to the same element. Notice that the second handler invokes e.stopImmediatePropagation() . Calling e.stopImmediatePropagation does two things.

Just like stopPropagation it will stop the bubbling of the event. So any parent of this element will not get this event.

In addition, stopImmediatePropagation stops the event from reaching the remaining handlers bound to the same element. It kills the event right then and there. That’s it. End of the event. In the above case “first” and “second” will be printed but not “third”.

Once again, calling stopPropagation means stop this event from going to the parent. And calling stopImmediatePropagation means also stop passing this event to the other event handlers bound to the same element.

If you are interested here is link to DOM Level 3 Events spec.

Back to original problem

Now that I have described what preventDefault, stopPropagation and stopImmediatePropagation do, let’s see what changed in jQuery 1.4.3.

In jQuery 1.4.2 when I execute “return false” then that action was same as executing:

e.preventDefault()
e.stopPropagation()
e.stopImmediatePropagation()

Now e.stopImmediatePropagation internally calls e.stopPropagation but I have added it here for visual clarity.

The fact that return false was calling e.stopImmediatePropagation was a bug. Get that. It was a bug which got fixed in jQuery 1.4.3.

So in jQuery 1.4.3 e.stopImmediatePropagation is not called. Check out this piece of code from events.js of the jQuery code base.

if ( ret !== undefined ) {
  event.result = ret;
  if ( ret === false ) {
    event.preventDefault();
    event.stopPropagation();
  }
}

As you can see when return false is invoked then e.stopImmediatePropagation is not called.

I tried to find which commit made this change but I could not go far because of this issue.

It gets complicated with live and a bug in jQuery 1.4.3

To make the case complicated, jQuery 1.4.3 has a bug in which e.stopImmediatePropagation does not work with live. Here is a link to the bug I reported.

To understand the bug take a look at following code:

<a href='' class='first'>click me</a>

$('a.first').live('click', function(e){
    alert('hello');
    e.preventDefault();
    e.stopImmediatePropagation();
});

$('a.first').live('click', function(){
    alert('world');
});

Since I am invoking e.stopImmediatePropagation I should never see alert world. However you will see that alert if you are using jQuery 1.4.3. You can play with it here .

This bug has been fixed as per this commit . Note that the commit mentioned was done after the release of jQuery 1.4.3. To get the fix you will have to wait for jQuery 1.4.4 release or use jQuery edge.

I am using rails.js (jquery-ujs). What do I do?

As I have shown, “return false” no longer behaves the same way in jQuery 1.4.3. However I would like to have as much backward compatibility in jquery-ujs as possible so that the same code base works with jQuery 1.4 through 1.4.3, since not everyone upgrades immediately.

This commit should make jquery-ujs compatible with jQuery 1.4.3. Many issues have been logged at jquery-ujs and I will take a look at all of them one by one. Please do provide your feedback.

instance_exec , changing self and params

Here is updated article on the same topic .

Following code will print 99 as the output.

class Klass
  def initialize
    @secret = 99
  end
end
puts Klass.new.instance_eval { @secret }

Nothing great there. However try passing a parameter to instance_eval .

puts Klass.new.instance_eval(self) { @secret }

You will get following error.

wrong number of arguments (1 for 0)

So instance_eval does not allow you to pass parameters to a block.

How to get around the restriction that instance_eval does not accept parameters

instance_exec was added to ruby 1.9 and it allows you to pass parameters to a proc. This feature has been backported to ruby 1.8.7 so we don’t really need ruby 1.9 to test this feature. Try this.

class Klass
  def initialize
    @secret = 99
  end
end
puts Klass.new.instance_exec('secret') { |t| instance_variable_get("@#{t}") }

Above code works. So now we can pass parameters to block. Good.
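
Here is one more sketch of the same idea, this time with a value that is actually used inside the block. The Account class and the over_limit? helper are invented for this example. The block is written outside Account, yet @balance resolves to the account's instance variable while the limit arrives as a block argument.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end
end

# over_limit? is a hypothetical helper. instance_exec runs the block with
# self set to the account and passes limit in as the block argument.
def over_limit?(account, limit)
  account.instance_exec(limit) { |max| @balance > max }
end

puts over_limit?(Account.new(150), 100) #=> true
puts over_limit?(Account.new(50), 100)  #=> false
```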

Changing value of self

Another feature of instance_exec is that it changes the value of self. To illustrate that I need to give a longer example.

module Kernel
  def singleton_class
    class << self
      self
    end
  end
end

class Human
  proc = lambda { puts 'proc says my class is ' + self.name.to_s }

  singleton_class.instance_eval do
    define_method(:lab)  do
      proc.call
    end
  end
end

class Developer < Human
end

Human.lab # class is Human
Developer.lab # class is Human ; oops

Notice that in the above case Developer.lab says “Human”. And that is the right answer from Ruby’s perspective. However that is not what I intended. Ruby stores the binding of the proc in the context in which it was created and hence it rightly reports that self is “Human” even though it is being called by Developer.

Go to http://facets.rubyforge.org/apidoc/api/core/index.html and look for instance_exec method. The doc says

Evaluate the block with the given arguments within the context of this object, so self is set to the method receiver.

It means that instance_exec evaluates the block with self set to the receiver. Now try the same code with instance_exec .

module Kernel
  def singleton_class
    class << self
      self
    end
  end
end

class Human
  proc = lambda { puts 'proc says my class is ' + self.name.to_s }

  singleton_class.instance_eval do
    define_method(:lab)  do
      self.instance_exec &proc
    end
  end
end

class Developer < Human
end

Human.lab # class is Human
Developer.lab # class is Developer

In this case Developer.lab says Developer and not Human.

You can also checkout this page which has much more detailed explanation of instance_exec and also emphasizes that instance_exec does pass a new value of self .

instance_exec is so useful that ActiveSupport needs it. And since ruby 1.8.6 does not have it ActiveSupport has code to support it.

I came across instance_exec issue while resolving #4507 rails ticket . The final solution did not need instance_exec but I learned a bit about it.

$LOADED_FEATURES and require, load, require_dependency

Rails developers know that in development mode classes are loaded on demand. In production mode all the classes are loaded as part of bootstrapping the system. Also in development mode classes are reloaded every single time the page is refreshed.

In order to reload a class, Rails first has to unload it. That unloading is done something like this.

# unload User class
Object.send(:remove_const, :User)

However a class might have other constants and they need to be unloaded too. Before you unload those constants you need to know all the constants that are defined in the class that is being loaded. Long story short, Rails keeps track of every single constant that is loaded when it loads User or UsersController.

Dependency mechanism is not perfect

Sometimes the dependency mechanism of Rails lets a few things fall through the cracks. Try the following case.

require 'open-uri'
class UsersController < ApplicationController
  def index
    open("http://www.ruby-lang.org/") {|f| }
    render :text => 'hello'
  end
end

Start the server in development mode and visit http://localhost:3000/users . The first time everything will come up fine. Now refresh the page. This time you should get an exception: uninitialized constant OpenURI .

So what’s going on?

After the page is served the very first time, at the end of the response Rails will unload all the constants that were autoloaded, including UsersController. However, while unloading UsersController Rails will also unload OpenURI.

When the page is refreshed, UsersController will be loaded again and require 'open-uri' will be called. However that require will return false.

Why require returns false

Try the following test case in irb.

step 1

irb(main):002:0> require 'ostruct'
=> true

step 2

irb(main):005:0* Object.send(:remove_const, :OpenStruct)
=> OpenStruct

step 3 : ensure that OpenStruct is truly removed

irb(main):006:0> Object.send(:remove_const, :OpenStruct)
NameError: constant Object::OpenStruct not defined
        from (irb):6:in `remove_const'
        from (irb):6:in `send'
        from (irb):6

step 4

irb(main):007:0> require 'ostruct'
=> false

step 5

irb(main):009:0> OpenStruct.new
NameError: uninitialized constant OpenStruct
        from (irb):9

Notice that in the above case require returns false in step 4. ‘require’ checks against $LOADED_FEATURES. When OpenStruct was removed, it was not removed from $LOADED_FEATURES and hence Ruby thought ostruct was already loaded.

How to get around this issue

require loads only once. However load loads every single time. Instead of ‘require’, ‘load’ could be used in this case.

irb(main):001:0> load 'ostruct.rb'
=> true

irb(main):002:0> OpenStruct.new
=> #<OpenStruct>
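
The whole require versus load story can be replayed in one script without touching a real library. The sketch below writes a throwaway file into a temp directory. The file name, the constant LoadedFeaturesDemo and the method name are all made up for this illustration.

```ruby
require 'tmpdir'

# Hypothetical demo: require a file, remove its constant, then compare
# what a second require and a load do.
def require_vs_load_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'loaded_features_demo.rb')
    File.write(path, "class LoadedFeaturesDemo; end\n")

    first = require(path)                        # file is evaluated
    Object.send(:remove_const, :LoadedFeaturesDemo)
    second = require(path)                       # path is still in $LOADED_FEATURES
    missing = !Object.const_defined?(:LoadedFeaturesDemo)
    load(path)                                   # load evaluates the file again
    restored = Object.const_defined?(:LoadedFeaturesDemo)

    { :first => first, :second => second, :missing => missing, :restored => restored }
  end
end

require_vs_load_demo.each { |step, value| puts "#{step}: #{value}" }
```

The first require returns true, the second returns false and leaves the constant missing, and load brings the constant back.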

Back to the original problem

In our Rails application a refresh of the page is failing. To get around that issue use require_dependency instead of require. require_dependency is a Rails thing. Under the hood Rails does the same trick we did in the previous step. Rails calls Kernel.load to load the constants that would fail if require were used.

I am not seeing hoptoad messages. Now I know why.

Following code has been tested with Rails 2.3.5 .

Every one knows for sure that hoptoad notifier sends exception messages to server in production environment. Between ‘development’ and ‘production’ there could be a number of environments. Some of these would have settings closer to ‘development’ environment and some would have setting closely matching the settings of ‘production’ environment.

When you have many environments and an exception occurs, one is not really sure whether that message is getting logged at hoptoad or not. Here is a rundown of which messages will get logged and why.

It all starts with Rails

When an exception occurs while rendering a page, action_controller catches the exception. The following logic is evaluated to decide whether the user should see an error page with a full stack trace or the ‘we are sorry something went wrong’ message.

if consider_all_requests_local || local_request?
  rescue_action_locally(exception)
else
  rescue_action_in_public(exception)
end

Let’s look at first part consider_all_requests_local . Open ~/config/environments/development.rb and ~/config/environments/production.rb .

# ~/config/environments/development.rb
config.action_controller.consider_all_requests_local = true

# ~/config/environments/production.rb
config.action_controller.consider_all_requests_local = false

As you can see, in development mode all requests are considered local. Be careful with what you put in your intermediary environments.

If you want to override that value then you can do it like this.

#~/app/controllers/application_controller.rb
ActionController::Base.consider_all_requests_local = true

The second part of the equation was local_request? .

Rails has following code for that method.

LOCALHOST = '127.0.0.1'.freeze

def local_request?
  request.remote_addr == LOCALHOST && request.remote_ip == LOCALHOST
end

As you can see, all requests coming from 127.0.0.1 are considered local even if RAILS_ENV is ‘production’. For testing purposes you can override this method like this.

#~/app/controllers/application_controller.rb
def local_request?
 false
end

Hoptoad has access to exception now what

If consider_all_requests_local is false and the request is not local then hoptoad will get access to the exception thanks to alias_method_chain.

def self.included(base)
  base.send(:alias_method, :rescue_action_in_public_without_hoptoad, :rescue_action_in_public)
  base.send(:alias_method, :rescue_action_in_public, :rescue_action_in_public_with_hoptoad)
end

In rescue_action_in_public_with_hoptoad there is a call to notify_or_ignore like this.

unless hoptoad_ignore_user_agent?
  HoptoadNotifier.notify_or_ignore(exception, hoptoad_request_data)
end

For majority of us there is no special handling for a particular user_agent .

def notify_or_ignore(exception, opts = {})
  notice = build_notice_for(exception, opts)
  send_notice(notice) unless notice.ignore?
end

Hoptoad treats the following exceptions as ignorable by default, and you won’t get notifications for these types of exceptions.

IGNORE_DEFAULT = ['ActiveRecord::RecordNotFound',
                   'ActionController::RoutingError',
                   'ActionController::InvalidAuthenticityToken',
                   'CGI::Session::CookieStore::TamperedWithCookie',
                   'ActionController::UnknownAction']

Next hop is method send_notice .

def send_notice(notice)
  if configuration.public?
    sender.send_to_hoptoad(notice.to_xml)
  end
end

configuration.public? is defined like this.

@development_environments = %w(development test cucumber)
def public?
  !development_environments.include?(environment_name)
end

As you can see, if Rails.env is development, test or cucumber then the exception will not be reported to the hoptoad server.
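
Putting the whole chain together, the decision can be sketched as a single predicate. This is my own summary of the checks quoted in this post, not code from Rails or from the hoptoad notifier. The method and constant names are made up, and the user agent check is left out.

```ruby
# Names below are hypothetical; they only mirror the checks quoted above.
DEVELOPMENT_ENVIRONMENTS = %w(development test cucumber)
IGNORED_BY_DEFAULT = %w(
  ActiveRecord::RecordNotFound
  ActionController::RoutingError
  ActionController::InvalidAuthenticityToken
  CGI::Session::CookieStore::TamperedWithCookie
  ActionController::UnknownAction
)

def reported_to_hoptoad?(environment, all_requests_local, local_request, exception_name)
  return false if all_requests_local || local_request         # handled by rescue_action_locally
  return false if IGNORED_BY_DEFAULT.include?(exception_name) # ignored by default
  !DEVELOPMENT_ENVIRONMENTS.include?(environment)             # configuration.public?
end

puts reported_to_hoptoad?('production', false, false, 'RuntimeError') #=> true
puts reported_to_hoptoad?('staging', true, false, 'RuntimeError')     #=> false
puts reported_to_hoptoad?('production', false, true, 'RuntimeError')  #=> false
puts reported_to_hoptoad?('cucumber', false, false, 'RuntimeError')   #=> false
puts reported_to_hoptoad?('production', false, false, 'ActiveRecord::RecordNotFound') #=> false
```

The second call is the one to watch in intermediary environments: a staging environment copied from development.rb keeps consider_all_requests_local set to true and so never reports.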

List of only the elements that contains

I was toying with simple list filter plugin and ended up with this markup.

<div id="lab">
  <ul id="list">
    <li><a href="">USA</a></li>
  <ul>
  <p>
    <a href=''>USA</a>
  </p>
</div>

I want to get all the links that contain the word USA. Simple enough. jQuery supports the contains selector.

$(":contains('USA')");

Above query results in following items.

[html, body#body, div#lab, ul#list, li, a, ul, p, a]

That is because contains looks for given string under all the descendants.

has method to rescue

jQuery has a has method which returns the list of elements which have a descendant matching the given selector.

b = $('*').has(":contains('USA')");

Above query results in following items.

[html, body#body, div#lab, ul#list, li, ul, p]

Final result

a = $(":contains('USA')");
b = $('*').has(":contains('USA')");
c = a.not(b) ;
console.log(c);

Above query results in following items.

 [a, a]

Singleton function in JavaScript

Recently I was discussing with a friend how to create a singleton function in JavaScript. I am putting the same information here in case it might help someone understand JavaScript better.

Creating an Object

Simplest solution is creating an instance of the object.

var Logger = function(path) {
	this.path = path;
};

var l1 = new Logger('/home');
console.log(l1);

var l2 = new Logger('/dev');
console.log(l2);

console.log(l1 === l2); // false

Above solution works. However l2 is a new instance of Logger .

Singleton solution using a global variable

window.global_logger = null;
var Logger = function(path) {
	if (global_logger) {
		console.log('global logger already present');
	} else {
		this.path = path;
		window.global_logger = this;
	}
	return window.global_logger;
};

l1 = new Logger('/home');
console.log(l1);

l2 = new Logger('/dev');
console.log(l2);

console.log(l1 === l2);

Above solution works. However this solution relies on creating a global variable. To the extent possible it is best to avoid polluting global namespace.

Singleton solution without polluting global namespace

var Logger = function() {

	var _instance;

	return function(path) {

		if (_instance) {
			console.log('an instance is already present');
		} else {
			this.path = path;
			_instance = this;
		}

		return _instance;
	}
} (); //note that it is self invoking function


var l1 = new Logger('/root');
console.log(l1);

var l2 = new Logger('/dev');
console.log(l2);

console.log(l1 === l2);

This solution does not pollute global namespace.
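A quick way to convince yourself that the closure version really hands back one shared object is to check the path of the second instance. The sketch below repeats the pattern under the hypothetical name SingleLogger and drops the console.log calls. It is an illustration, not a recommended logger.

```javascript
var SingleLogger = (function () {
  var instance; // private to the closure, invisible from outside

  return function (path) {
    if (!instance) {
      this.path = path;
      instance = this;
    }
    // Returning an object from a constructor overrides the freshly created `this`.
    return instance;
  };
})();

var first = new SingleLogger('/root');
var second = new SingleLogger('/dev');

console.log(first === second); // true
console.log(second.path);      // '/root', the second path is ignored
```

Note that the second path is silently dropped. Whether that is acceptable depends on the use case.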

Regular expression in JavaScript

Regular expressions are a powerful tool in any language. Here I am discussing how to use regular expressions in JavaScript.

Defining regular expressions

In JavaScript regular expression can be defined two ways.

var regex = /hello/ig; // i is for ignore case. g is for global.
var regex = new RegExp("hello", "ig");

If I am defining regular expression using RegExp then I need to add escape character in certain cases.

var regex = /hello_\w*/ig;
var regex = new RegExp("hello_\\w*", "ig"); //notice the extra backslash before \w

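When I am defining regular expression using RegExp then the backslash in \w needs to be escaped. Otherwise the string literal turns \w into a plain w before RegExp ever sees it, and the pattern would look for the letter w.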
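Here is a small check of what the missing backslash does. The variable names wrong and right are mine, chosen only for this illustration.

```javascript
// In a string literal "\w" is just "w", so the backslash never reaches RegExp.
var wrong = new RegExp("hello_\w*", "i");
var right = new RegExp("hello_\\w*", "i");

console.log(wrong.source); // hello_w*
console.log(right.source); // hello_\w*

console.log(wrong.exec('hello_you')[0]); // hello_
console.log(right.exec('hello_you')[0]); // hello_you
```

The broken version still matches, because w* is happy to match nothing, which makes this mistake easy to miss.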

test method

test method is to check if a match is found or not. This method returns true or false.

var regex = /hello/ig;
var text = 'hello_you';
var bool = regex.test(text);
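One thing to watch for, sketched below: a regular expression with the g flag remembers where the last match ended in its lastIndex property, so calling test repeatedly on the same regex object can alternate between true and false.

```javascript
var regex = /hello/ig;
var text = 'hello_you';

var first = regex.test(text);  // true, lastIndex is now 5
var second = regex.test(text); // false, search started at index 5 and lastIndex resets to 0
var third = regex.test(text);  // true again

console.log(first, second, third); // true false true

// Without the g flag every call starts from the beginning.
var plain = /hello/i;
console.log(plain.test(text), plain.test(text)); // true true
```

If all I need is a yes or no answer, leaving out the g flag avoids this surprise.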

exec method

exec method finds if a match is found or not. It returns an array if a match is found. Otherwise it returns null.

var regex = /hello_\w*/ig;
var text = 'hello_you';
var matches = regex.exec(text);
console.log(matches); //=> ['hello_you']

match method

match method acts exactly like exec method if no g parameter is passed. When global flag is turned on the match returns an Array containing all the matches.

Note that in exec the syntax was regex.exec(text) while in match method the syntax is text.match(regex) .

var regex = /hello_\w*/i;
var text = 'hello_you and hello_me';
var matches = text.match(regex);
console.log(matches); //=> ['hello_you']

Now with global flag turned on.

var regex = /hello_\w*/ig;
var text = 'hello_you and hello_me';
var matches = text.match(regex);
console.log(matches); //=> ['hello_you', 'hello_me']

Getting multiple matches

Once again both exec and match method without g option do not get all the matching values from a string. If you want all the matching values then you need to iterate through the text. Here is an example.

Get both the bug numbers in the following case.

var matches = [];
var regex = /#(\d+)/ig;
var text = 'I fixed bugs #1234 and #5678';
var match;
while ((match = regex.exec(text)) !== null) {
  matches.push(match[1]);
}
console.log(matches); // ['1234', '5678']

Note that the above case uses the global flag g. Without it exec would return the first match every time and the above code would run forever.

var matches = [];
var regex = /#(\d+)/ig;
var text = 'I fixed bugs #1234 and #5678';
matches = text.match(regex);
console.log(matches);

In the above case match is used instead of exec . Since match with global flag option brings all the matches there was no need to iterate in a loop.
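The two approaches do not return quite the same thing. A short comparison, using the same text as above:

```javascript
var text = 'I fixed bugs #1234 and #5678';

// match with g returns the full matches, so the '#' is included.
var full = text.match(/#(\d+)/g);
console.log(full); // ['#1234', '#5678']

// exec in a loop gives access to the captured group of every match.
var regex = /#(\d+)/g;
var numbers = [];
var m;
while ((m = regex.exec(text)) !== null) {
  numbers.push(m[1]);
}
console.log(numbers); // ['1234', '5678']
```

So if only the captured group is needed, the exec loop is still the way to go.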

match attributes

When a match is made then an array is returned. That array has two extra properties.

  • index: This tells where in the string match was done
  • input: the original string

var regex = /#(\d+)/i;
var text = 'I fixed bugs #1234 and #5678';
var match = text.match(regex);
console.log(match.index); //13
console.log(match.input); //I fixed bugs #1234 and #5678

replace

replace method takes both regexp and string as argument.

var text = 'I fixed bugs #1234 and #5678';
var output = text.replace('bugs', 'defects');
console.log(output); //I fixed defects #1234 and #5678

Example of using a function to replace text.

var text = 'I fixed bugs #1234 and #5678';
var output = text.replace(/\d+/g, function(match){ return match * 2});
console.log(output); //I fixed bugs #2468 and #11356

Another case.

// requirement is to change all like within <b> </b> to love.
var text = ' I like JavaScript. <b> I like JavaScript</b> ';
var output = text.replace(/<b>.*?<\/b>/g, function(match) { return match.replace(/like/g, "love") } );
console.log(output); //I like JavaScript. <b> I love JavaScript</b>

Example of using special variables.

$& -  the matched substring.
$` -  the portion of the string that precedes the matched substring.
$' -  the portion of the string that follows the matched substring.
$n -  $1, $2 etc where the number refers to the captured group.

var regex = /(\w+)\s(\w+)/;
var text = "John Smith";
var output = text.replace(regex, "$2, $1");
console.log(output);//Smith, John

var regex = /JavaScript/;
var text = "I think JavaScript is awesome";
var output = text.replace(regex, "before:$` after:$' full:$&");
console.log(output);//I think before:I think  after: is awesome full:JavaScript is awesome

Replace method also accepts captured groups as parameters in the function. Here is an example.

var regex  = /#(\d*)(.*)@(\w*)/;
var text = 'I fixed bug #1234 and twitted to @javascript';
text.replace(regex,function(_,a,b,c){
  log(_); //#1234 and twitted to @javascript
  log(a); //1234
  log(b); //  and twitted to
  log(c); // javascript
});

As you can see the very first argument to function is the fully matched text. Other captured groups are subsequent arguments. With the global flag the function is called once for every match, so this strategy can be used to collect all the matches.

var bugs = [];
var regex = /#(\d+)/g;
var text = 'I fixed bugs #1234 and #5678';
text.replace(regex, function(_,f){
    bugs.push(f);
});
log(bugs); //["1234", "5678"]

Split method

split method can take either a string or a regular expression.

An example of split using a string.

var text = "Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec";
var output = text.split(',');
log(output); // ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

An example of split using regular expression.

var text = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ";
var regex = /\s*;\s*/;
var output = text.split(regex);
log(output); // ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]

Non capturing Group

The requirement given to me states that I should strictly look for word java, ruby or rails within word boundary. This can be done like this.

var text = 'java';
var regex = /\bjava\b|\bruby\b|\brails\b/;
text.match(regex);

Above code works. However notice the code duplication. This can be refactored to the one given below.

var text = 'rails';
var regex = /\b(java|ruby|rails)\b/;
text.match(regex);

Above code works and there is no code duplication. However in this case I am asking regular expression engine to create a captured group which I’ll not be using. Regex engines need to do extra work to keep track of captured groups. It would be nice if I could say to regex engine do not capture this into a group because I will not be using it.

?: is a special symbol that tells regex engine to create non capturing group. Above code can be refactored into the one given below.

var text = 'rails';
var regex = /\b(?:java|ruby|rails)\b/;
text.match(regex);

Here is one more example, this time using a lookahead.

text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.]*(?=\))/g);
log(matches);
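Going back to the non capturing group, the difference shows up in the array returned by match. A small comparison, with variable names of my own choosing:

```javascript
var capturing = 'rails'.match(/\b(java|ruby|rails)\b/);
var nonCapturing = 'rails'.match(/\b(?:java|ruby|rails)\b/);

console.log(capturing.length);    // 2, full match plus one captured group
console.log(capturing[1]);        // 'rails'
console.log(nonCapturing.length); // 1, only the full match
console.log(nonCapturing[0]);     // 'rails'
```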

Get started with nodejs in steps

nodejs is awesome. To get people started with nodejs, node-chat has been developed. Source code for node-chat app is here .

When I looked at source code for the first time, it looked intimidating. In order to get started with nodejs, I have developed a small portion of the node-chat application in 13 incremental steps.

The first step is as simple as 15 lines of code .

If you want to follow along then go through README and you can get a feel of nodejs very quickly. How to checkout each step and other information is mentioned in README.

Enjoy nodejs.

Two ways of declaring functions and impact on variable hoisting

All the JavaScript books I read so far do not distinguish between following two ways of declaring a function.

var foo = function(){};
function foo(){};

Thanks to Ben today I learned that there is a difference .

When a var is used to declare a function then only the variable declaration gets hoisted up

function test(){
  foo();
  var foo = function(){ console.log('foo'); };
};
test();

Above code is same as one given below.

function test(){
  var foo;
  foo();
  foo = function(){ console.log('foo'); };
};
test();

When a function is declared without var then both the declaration and the body get hoisted up

function test(){
  foo();
  function foo(){ console.log('foo'); };
};
test();

Above code is same as one given below.

function test(){
  var foo;
  foo = function(){ console.log('foo'); };
  foo();
};
test();

Conclusion

Now it will be clear why foo() does not work in the following case while bar() does work.

function test() {
    foo(); // TypeError "foo is not a function"
    bar(); // "this will run!"
    var foo = function () { // function expression assigned to local variable 'foo'
        alert("this won't run!");
    }
    function bar() { // function declaration, given the name 'bar'
        alert("this will run!");
    }
}
test();
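In the code above the TypeError from foo() stops the function, so bar() is never actually reached. The sketch below is my own variation that wraps the failing call in try/catch and records what it sees, so both behaviours can be observed in one run.

```javascript
function test() {
  var seen = {};

  seen.foo = typeof foo; // 'undefined', only the declaration of foo was hoisted
  seen.bar = typeof bar; // 'function', the whole declaration of bar was hoisted

  try {
    foo();
  } catch (e) {
    seen.error = e instanceof TypeError; // true
  }
  seen.barResult = bar();

  var foo = function () { return 'foo ran'; };
  function bar() { return 'bar ran'; }

  return seen;
}

var result = test();
console.log(result);
```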

Lessons learned from JavaScript quizzes

Nicholas answered three JavaScript quizzes in his blog. I am not interested in quiz like the one given below

var num1 = 5,
    num2 = 10,
    result = num1+++num2;

However some of the questions helped me learn a few things.

Questions from quiz

Recently there was a quiz out.

This was question #5 in the original blog. I have modified the quiz a little bit to suit my needs.

var x = 10;
var foo = {
  x: 20,
  bar: function () {
    var x = 30;
    return this.x;
  }
};

// 1
console.log(foo.bar());

// 2
console.log((foo.bar)());

// 3
console.log(foo.bar.call());

I got the first two answers wrong. In JavaScript a variable and a property are two different things. When this.xyz is invoked then JavaScript engine is looking for property called xyz.

var  bar = function () {
  var x = 30;
  return this.x;
};
console.log(bar()); //=> undefined

In the above case output is undefined. This is because this.x refers to a property named x on this, not to the local variable x, and since no such property was found undefined is the answer.

var foo = {
  x: 20,
  bar: function () {
    return x;
  }
};
console.log(foo.bar());

Above code causes ReferenceError because x is not defined. Same theory applies here. In this case x is a variable and since no such variable was found code failed.

Coming back to the third part of the original question. This one uses call.

console.log(foo.bar.call());

First argument of call or apply method determines what this would be inside the function. If no argument is passed then, outside strict mode, JavaScript engine uses the global object as this, which in a browser is window. Hence the answer is 10 in this case.
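To see the first argument at work without relying on the global object, here is a sketch that passes an explicit object. The object other is hypothetical and exists only for this example.

```javascript
var foo = {
  x: 20,
  bar: function () {
    var x = 30; // a local variable, never consulted by this.x
    return this.x;
  }
};
var other = { x: 'from other' };

console.log(foo.bar());            // 20, this is foo
console.log(foo.bar.call(other));  // 'from other', this is other
console.log(foo.bar.apply(other)); // 'from other'
```

In strict mode a call() with no argument leaves this as undefined, so this.x would throw instead of falling back to the global object.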

Questions from another quiz

There was another quiz .

In the original blog this is question #2.

var x = 5,
    o = {
        x: 10,
        doIt: function doIt(){
            var x = 20;
            setTimeout(function(){
                alert(this.x);
            }, 10);
        }
    };
o.doIt();

The key thing to remember here is that all functions passed into setTimeout() are executed in the global scope, so this.x inside the callback does not refer to o.x .

In the original blog this is question #5.

var o = {
        x: 8,

        valueOf: function(){
            return this.x + 2;
        },
        toString: function(){
            return this.x.toString();
        }
    },
    result = o < "9";

alert(o);

The thing to remember here is that when comparison is done then valueOf method is called on the object.
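The following sketch spells out which of the two methods is picked in a few situations. It reuses the object from the quiz and leaves out the alert.

```javascript
var o = {
  x: 8,
  valueOf: function () { return this.x + 2; },
  toString: function () { return this.x.toString(); }
};

console.log(o < "9");   // false, valueOf gives 10 and "9" is converted to 9
console.log(o + "");    // '10', + also prefers valueOf
console.log(String(o)); // '8', an explicit string conversion prefers toString
```

alert converts its argument to a string, which is why alert(o) goes through toString.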

Questions from quiz

This is question #1 in the original blog.

if (!("a" in window)) {
  var a = 1;
}
alert(a);

I knew that all the variable declarations are hoisted up but somehow failed to apply that logic here. Please see the original blog for a detailed answer.

This is question #5 in the original blog.

function a() {
  alert(this);
}
a.call(null);

I knew that if nothing is passed to call method then this becomes global but did not know that if null is passed then also this becomes global.
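This behaviour applies to non strict code only. The sketch below compares the two modes. It uses the Function constructor to get a non strict function regardless of how the surrounding file is run, and globalThis, which is window in a browser. The names sloppy and strict are mine.

```javascript
// Function bodies built with the Function constructor are non strict by default.
var sloppy = new Function('return this;');

function strict() {
  'use strict';
  return this;
}

console.log(sloppy.call(null) === globalThis); // true
console.log(sloppy.call() === globalThis);     // true
console.log(strict.call(null));                // null
console.log(strict.call());                    // undefined
```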

Practical example of need for prototypal inheritance

Alex Sexton wrote a wonderful article on how to use inheritance pattern to manage large piece of code. His code also has a practical need for prototypal inheritance for writing modular code.

Creating standard jQuery plugin

Given below is code that does exactly what Alex’s code does.

$(function() {
	$.fn.speaker = function(options) {
		if (this.length) {
			return this.each(function() {

				var defaultOptions = {
					name: 'No name'
				};
				options = $.extend({},
				defaultOptions, options);

				var $this = $(this);

				$this.html('<p>' + options.name + '</p>');

				var fn = {};
				fn.speak = function(msg) {
					$this.append('<p>' + msg + '</p>' );
				};

				$.data(this, 'speaker', fn);
			});
		}
	};
});

For smaller plugins this code is not too bad. However if the plugin is huge then it presents one big problem. The code for business problem and the code that deals with jQuery is all mixed in. What it means is that if tomorrow same functionality needs to be implemented for Prototype framework then it is not clear what part of code deals with framework and what part deals with business logic.

Separating business logic and framework code

Given below is code that separates business logic and framework code.

var Speaker = function(opts, elem) {

	this._build = function() {
		this.$elem.html('<h1>' + options.name + '</h1>');
	};

	this.speak = function(msg) {
		this.$elem.append('<p>' + msg + '</p>');
	};

	var defaultOptions = {
		name: 'No name'
	};

	var options = $.extend({},
	defaultOptions, opts);

	this.$elem = $(elem);

	this._build();

};

$(function() {
	$.fn.speaker = function(options) {
		if (this.length) {
			return this.each(function() {
				var mySpeaker = new Speaker(options, this);
				$.data(this, 'speaker', mySpeaker);
			});
		}
	};
});

This code is an improvement over first iteration. However the whole business logic is captured inside a function. This code can be further improved by embracing object literal style of coding.

Final Improvement

Third and final iteration of the code is the code presented by Alex.

var Speaker = {
	init: function(options, elem) {
		this.options = $.extend({},
		this.options, options);

		this.elem = elem;
		this.$elem = $(elem);

		this._build();
	},
	options: {
		name: "No name"
	},
	_build: function() {
		this.$elem.html('<h1>' + this.options.name + '</h1>');
	},
	speak: function(msg) {
		this.$elem.append('<p>' + msg + '</p>');
	}
};

// Make sure Object.create is available in the browser (for our prototypal inheritance)
if (typeof Object.create !== 'function') {
	Object.create = function(o) {
		function F() {}
		F.prototype = o;
		return new F();
	};
}

$(function() {
	$.fn.speaker = function(options) {
		if (this.length) {
			return this.each(function() {
				var mySpeaker = Object.create(Speaker);
				mySpeaker.init(options, this);
				$.data(this, 'speaker', mySpeaker);
			});
		}
	};
});

Notice the Object.create pattern Alex used. The business logic code was converted from a function to a JavaScript object. However the problem is that you can’t call new on that object. And you need to create a new object for each element. Object.create pattern comes to rescue.

This pattern takes a plain object and returns a new object created from an empty function. That function has the input object set as its prototype. So you get a brand new object for each element and you get to have all your business logic in object literal way and not in a function. If you want to know more about prototypal inheritance then you can read more about it in previous blog .

Object.create is now part of ECMAScript 5 .
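To try the pattern without jQuery or a DOM, here is a stripped down sketch. Greeter is a hypothetical stand-in for Speaker, and the manual copy loop stands in for $.extend.

```javascript
// Hypothetical stand-in for Speaker without any DOM or jQuery dependency.
var Greeter = {
  options: { name: 'No name' },
  init: function (options) {
    var merged = {};
    var key;
    // Copy defaults first, then let the passed options override them.
    for (key in this.options) { merged[key] = this.options[key]; }
    for (key in options) { merged[key] = options[key]; }
    this.options = merged;
    return this;
  },
  greet: function () {
    return 'Hello ' + this.options.name;
  }
};

var a = Object.create(Greeter).init({ name: 'Alex' });
var b = Object.create(Greeter).init();

console.log(a.greet()); // Hello Alex
console.log(b.greet()); // Hello No name
console.log(a === b);   // false, each call created a new object
console.log(Object.getPrototypeOf(a) === Greeter); // true
console.log(a.hasOwnProperty('greet'));            // false, greet lives on Greeter
```

Each object gets its own options while the methods are shared through the prototype.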

Prototypal inheritance in JavaScript

One of the key features of JavaScript language is its support for prototypes. This feature could be used to bring inheritance in JavaScript.

In the beginning there was duplication

function Person(dob){
  this.dob = dob;
  this.votingAge = 21;
}
function Developer(dob, skills){
  this.dob = dob;
  this.skills = skills || '';
  this.votingAge = 21;
}
// create a Person instance
var person = new Person('02/02/1970');

//create a Developer instance
var developer = new Developer('02/02/1970', 'JavaScript');

As you can see both Person and Developer objects have votingAge property. This is code duplication. This is an ideal case where inheritance can be used.

prototype method

Whenever you create a function, that function instantly gets a property called prototype. The initial value of this prototype property is empty JavaScript object {} .

var fn = function(){};
fn.prototype //=> {}

When looking up a property on an object created by a function, JavaScript engine first searches the object itself. Then the engine looks for that property in the function’s prototype object.

Since prototype itself is a JavaScript object, more methods could be added to this JavaScript object.

var fn = function(){};
fn.prototype.author_name = 'John';
var f = new fn();
f.author_name; //=> John
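The lookup order can be made visible with hasOwnProperty and the in operator. A short sketch, continuing the same example:

```javascript
var fn = function () {};
fn.prototype.author_name = 'John';

var f = new fn();

console.log(f.author_name);                   // 'John'
console.log(f.hasOwnProperty('author_name')); // false, not stored on f itself
console.log('author_name' in f);              // true, found through the prototype

// Assigning on the instance creates an own property that shadows the shared one.
f.author_name = 'Jane';
console.log(f.author_name);            // 'Jane'
console.log(fn.prototype.author_name); // 'John'
```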

Refactoring code to make use of prototype method

Currently Person function is defined like this.

function Person(dob){
  this.dob = dob;
  this.votingAge = 21;
}

Problem with above code is that every time a new instance of Person is created, two new properties are created and they take up memory. If a million objects are created then every instance will have its own votingAge property even though the value of votingAge is going to be same. All the million person instances can refer to same votingAge property if that property is defined in prototype. This will save a lot of memory.

function Person(dob){
  this.dob = dob;
}
Person.prototype.votingAge = 21;

The modified solution will save memory if a lot of objects are created. However notice that now it will take a bit longer for JavaScript engine to find votingAge. Previously JavaScript engine would have looked for property named votingAge inside the person object and would have found it. Now the engine will not find votingAge property inside the person object. Then engine will look at Person.prototype and will search for votingAge property there. It means, in the modified code engine will find votingAge in the second hop instead of first hop.

Bringing inheritance using prototype property

Currently Person is defined like this.

function Person(dob){
  this.dob = dob;
}
Person.prototype.votingAge = 21;

If Developer Object wants to extend Person then all that needs to be done is this.

function Developer (dob, skills) {
 this.skills = skills || '';
 this.dob = dob;
}
Developer.prototype = new Person();

Now Developer instance will have access to votingAge method. This is much better. Now there is no code duplication between Developer and Person.

However notice that looking for votingAge from a Developer instance will take an extra hop.

  • JavaScript engine will first look for votingAge property in the Developer instance object.
  • Next engine will look for votingAge property in the prototype of the Developer instance, which is an instance of Person. votingAge is not declared in the Person instance.
  • Next engine will look for votingAge property in the prototype of the Person instance, which is Person.prototype, and there the property would be found.

Since only the methods that are common to both Developer and Person are present in the Person.prototype there is nothing to be gained by looking for methods in the Person instance. Next implementation will be removing the middle man.
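The hops described above can be counted by walking the prototype chain by hand. The helper hopsTo below is hypothetical, written only to illustrate the lookup. It is not how an engine is implemented.

```javascript
function Person(dob) {
  this.dob = dob;
}
Person.prototype.votingAge = 21;

function Developer(dob, skills) {
  this.skills = skills || '';
  this.dob = dob;
}
Developer.prototype = new Person();

// Hypothetical helper: how many prototype links are followed before
// an object that owns the property is reached (-1 if never found).
function hopsTo(obj, prop) {
  var hops = 0;
  while (obj !== null) {
    if (Object.prototype.hasOwnProperty.call(obj, prop)) {
      return hops;
    }
    obj = Object.getPrototypeOf(obj);
    hops += 1;
  }
  return -1;
}

var developer = new Developer('02/02/1970', 'JavaScript');
console.log(hopsTo(developer, 'skills'));    // 0, own property
console.log(hopsTo(developer, 'votingAge')); // 2, via the Person instance, then Person.prototype
console.log(hopsTo(new Person('02/02/1970'), 'votingAge')); // 1
```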

Remove the middle man

Here is the revised implementation of Developer function.

function Developer (dob, skills) {
 this.skills = skills || '';
 this.dob = dob;
}
Developer.prototype = Person.prototype;

In the above case Developer.prototype directly refers to Person.prototype. This will reduce the number of hops needed to get to method votingAge by one compared to previous case.

However there is a problem. If Developer changes the common property then instances of person will see the change. Here is an example.

Developer.prototype.votingAge = 18;
var developer = new Developer('02/02/1970', 'JavaScript');
developer.votingAge; //=> 18

var person = new Person();
person.votingAge; //=> 18. Notice that votingAge for Person has changed from 21 to 18

In order to solve this problem Developer.prototype should point to an empty object. And that empty object should refer to Person.prototype .

Solving the problem by adding an empty object

Here is revised implementation for Developer object.

function Developer(dob, skills) {
  this.dob = dob;
  this.skills = skills;
}
var F = function(){};
F.prototype = Person.prototype;
Developer.prototype = new F();

Let’s test this code.

Developer.prototype.votingAge = 18;
var developer = new Developer('02/02/1970', 'JavaScript');
developer.votingAge; //=> 18

var person = new Person();
person.votingAge; //=> 21

As you can see with the introduction of empty object, Developer instances have votingAge of 18 while Person instances have votingAge of 21.

Accessing super

If child wants to access super object then that should be allowed. That can be accomplished like this.

function Person(dob){
 this.dob = dob;
}
Person.prototype.votingAge = 21;

function Developer(dob, skills) {
  this.dob = dob;
  this.skills = skills;
}
var F = function(){};
F.prototype = Person.prototype;
Developer.prototype = new F();
Developer.prototype.__super = Person.prototype;
Developer.prototype.votingAge = 18;

Capturing it as a pattern

The whole thing can be captured in a helper method that would make it simple to create inheritance.

var extend = function(parent, child){
  var F = function(){};
  F.prototype = parent.prototype;
  child.prototype = new F();
  child.prototype.__super = parent.prototype;
};
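Here is a usage sketch for the helper, reusing Person and Developer from earlier. The one thing to keep in mind is the order of the calls.

```javascript
var extend = function (parent, child) {
  var F = function () {};
  F.prototype = parent.prototype;
  child.prototype = new F();
  child.prototype.__super = parent.prototype;
};

function Person(dob) {
  this.dob = dob;
}
Person.prototype.votingAge = 21;

function Developer(dob, skills) {
  this.dob = dob;
  this.skills = skills;
}

// extend replaces Developer.prototype, so call it before adding anything to it.
extend(Person, Developer);
Developer.prototype.votingAge = 18;

var developer = new Developer('02/02/1970', 'JavaScript');
var person = new Person('02/02/1970');

console.log(developer.votingAge);         // 18
console.log(developer.__super.votingAge); // 21
console.log(person.votingAge);            // 21
console.log(developer instanceof Person); // true
```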

Pure prototypal inheritance

A simpler form of pure prototypal inheritance can be structured like this.

if (typeof Object.create !== 'function') {
    Object.create = function (o) {
        function F() {}
        F.prototype = o;
        return new F();
    };
}

Before adding the create method to object, I checked if this method already exists or not. That is important because Object.create is part of ECMAScript 5 and slowly more and more browsers will start adding that method natively to JavaScript.

You can see that this version of Object.create takes only one parameter. This method does not necessarily create a parent child relationship . But it can be a very good tool for creating new objects that inherit from an object literal.

return false considered harmful in live

Check out the following jQuery code written with jQuery 1.4.2. What do you think will happen when the first link is clicked?

$('a:first').live('click', function(){
  log('clicked once');
  return false;
});
$('a:first').live('click', function(){
  log('clicked twice');
  return false;
});

I was expecting that I would see both the messages. However jQuery only invokes the very first handler, so only the first message is logged.

return false does two things. It stops the default behavior which is go and fetch the link mentioned in the href of the anchor tags. Also it stops the event from bubbling up. Since live method relies on event bubbling, it makes sense that second message does not appear.

Fix is simple. Just block the default action but let the event bubble up.

$('a:first').live('click', function(e){
  log('clicked once');
  e.preventDefault();
});
$('a:first').live('click', function(e){
  log('clicked twice');
  e.preventDefault();
});

Simplest jQuery slideshow code explanation

Jonathan Snook wrote a blog titled Simplest jQuery SlideShow. Check out the demo page. The full JavaScript code in its entirety is given below. If you understand this code then you don’t need to read rest of the article.

$(function(){
    $('.fadein img:gt(0)').hide();
    setInterval(function(){
      $('.fadein :first-child').fadeOut()
         .next('img').fadeIn()
         .end().appendTo('.fadein');},
      3000);
});

appendTo removes and attaches elements

In order to understand what’s going on above, I am constructing a simple test page. Here is the html markup.

<div id='container'>
  <div class='lab'>This is div1 </div>
  <div class='lab'>This is div2 </div>
</div>

Open this page in browser and execute following command in firebug.

$('.lab:first').appendTo('#container');

Run the above command five or six times to see its effect. Every single time you run the command the order changes.

The order of div elements with class lab is changing because if a jQuery element is already part of document and that element is being added somewhere else then jQuery will do cut and paste and not copy and paste. In other words, elements that already exist in the document get plucked out of document and then they are inserted somewhere else in the document.

Back to the original problem

In the original code the very first image is being plucked out of document and that image is being added to set again. In simpler terms this is what is happening. Initially the order is like this.

Image1
Image2
Image3

After the code is executed the order becomes this.

Image2
Image3
Image1

After the code is executed again then the order becomes this.

Image3
Image1
Image2

After the code is executed again then the order becomes this.

Image1
Image2
Image3

And this cycle continues forever.
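The cycle is just a rotation of a list. Here is a model of it with a plain array. The helper rotate is hypothetical and has nothing to do with jQuery itself.

```javascript
// Hypothetical helper: take the first item and put it at the end.
function rotate(items) {
  var copy = items.slice();
  copy.push(copy.shift());
  return copy;
}

var order = ['Image1', 'Image2', 'Image3'];
var afterOne = rotate(order);
var afterTwo = rotate(afterOne);
var afterThree = rotate(afterTwo);

console.log(afterOne);   // ['Image2', 'Image3', 'Image1']
console.log(afterTwo);   // ['Image3', 'Image1', 'Image2']
console.log(afterThree); // ['Image1', 'Image2', 'Image3']
```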

How jQuery selects elements using Sizzle

jQuery’s motto is to select something and do something with it. As jQuery users, we provide the selection criteria and then we get busy with doing something with the result. This is a good thing. jQuery provides extremely simple API for selecting elements. If you are selecting ids then just prefix the name with ‘#’. If you are selecting a class then prefix it with ‘.’.

However it is important to understand what goes on behind the scene for many reasons. And one of the important reasons is the performance of Rich Client. As more and more web pages use more and more jQuery code, understanding of how jQuery selects elements will speed up the loading of pages.

What is a selector engine

HTML documents are full of html markups. It’s a tree like structure. Ideally speaking all the html documents should be 100% valid xml documents. However if you miss out on closing a div then browsers forgive you (unless you have asked for strict parsing). Ultimately browser engine sees a well formed xml document. Then the browser engine renders that xml on the browser as a web page.

After a page is rendered then those xml elements are referred as DOM elements.

JavaScript is all about manipulating this tree structure (DOM elements) that browser has created in memory. A good example of manipulating the tree is command like the one given below which would hide the header element. However in order to hide the header tag, jQuery has to get to that DOM element.

jQuery('#header').hide()

The job of a selector engine is to get all the DOM elements matching the criteria provided by a user. There are many JavaScript selector engines in the market. Paul Irish has a nice article about JavaScript CSS Selector Engine timeline .

Sizzle is JavaScript selector engine developed by John Resig and is used internally in jQuery. In this article I will be showing how jQuery in conjunction with Sizzle finds elements.

Browsers help you to get to certain elements

Browsers do provide some helper functions to get to certain types of elements. For example if you want to get DOM element with id header then document.getElementById function can be used like this

document.getElementById('header')

Similarly if you want to collect all the p elements in a document then you could use following code .

document.getElementsByTagName('p')

However if you want something complex like the one given below then browsers were not much help. It was possible to walk up and down the tree however traversing the tree was tricky because of two reasons: a) DOM spec is not very intuitive b) Not all the browsers implemented DOM spec in same way.

jQuery('#header a')

Later selector API came out.

The latest version of all the major browsers support this specification including IE8. However IE7 and IE6 do not support it. This API provides querySelectorAll method which allows one to write complex selector query like document.querySelectorAll("#score>tbody>tr>td:nth-of-type(2)") .

It means that if you are using IE8 or current version of any other modern browser then jQuery code jQuery('#header a') will not even hit Sizzle. That query will be served by a call to querySelectorAll .

However if you are using IE6 or IE7, Sizzle will be invoked for jQuery(‘#header a’). This is one of the reasons why some apps perform much slower on IE6/7 compared to IE8 since a native browser function is much faster than element retrieval by Sizzle.

Selection process

jQuery has a lot of optimization baked in to make things run faster. In this section I will go through some of the queries and will try to trace the route jQuery follows.

$(‘#header’)

When jQuery sees that the input string is just one word and is looking for an id then jQuery invokes document.getElementById . Straight and simple. Sizzle is not invoked.

$(‘#header a’) on a modern browser

If the browser supports querySelectorAll then querySelectorAll will satisfy this request. Sizzle is not invoked.

$(‘.header a[href!=”hello”]’) on a modern browser

In this case jQuery will try to use querySelectorAll but the result would be an exception (at least on Firefox). The browser will throw an exception because the querySelectorAll method does not support certain selection criteria. In this case when browser throws an exception, jQuery will pass on the request to Sizzle. Sizzle not only supports css 3 selector but it goes above and beyond that.
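The general shape of "try the native API first and fall back to the engine" can be sketched like this. Everything here is hypothetical: select, fakeNative and fakeEngine are stand-ins I made up, and this is not jQuery's actual source.

```javascript
// Hypothetical sketch of "try the native API, fall back to the engine".
// nativeQuery and engineQuery are stand-ins passed in by the caller.
function select(selector, nativeQuery, engineQuery) {
  try {
    return { by: 'native', result: nativeQuery(selector) };
  } catch (e) {
    return { by: 'engine', result: engineQuery(selector) };
  }
}

// A fake native query that rejects the non standard != attribute operator.
function fakeNative(selector) {
  if (selector.indexOf('!=') !== -1) {
    throw new SyntaxError('unsupported selector: ' + selector);
  }
  return ['native result for ' + selector];
}

function fakeEngine(selector) {
  return ['engine result for ' + selector];
}

console.log(select('#header a', fakeNative, fakeEngine).by);                // native
console.log(select('.header a[href!="hello"]', fakeNative, fakeEngine).by); // engine
```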

$(‘.header a’) on IE6/7

On IE6/7 querySelectorAll is not available so jQuery will pass on this request to Sizzle. Let’s see a little bit in detail how Sizzle will go about handling this case.

Sizzle gets the selector string ‘.header a’. It splits the string into two parts and stores in variable called parts.

parts = ['.header', 'a']

Next step is the one which sets Sizzle apart from other selector engines. Instead of first looking for elements with class header and then going down, Sizzle starts with the right-most part of the selector string. As per this presentation from Paul Irish YUI3 and NWMatcher also go right to left.

So in this case Sizzle starts looking for all a elements in the document. Sizzle invokes the method find. Inside the find method Sizzle attempts to find out what kind of pattern this string matches. In this case Sizzle is dealing with string a .

Here is snippet of code from Sizzle.find .

match: {
     ID: /#((?:[\w\u00c0-\uFFFF-]|\\.)+)/,
     CLASS: /\.((?:[\w\u00c0-\uFFFF-]|\\.)+)/,
     NAME: /\[name=['"]*((?:[\w\u00c0-\uFFFF-]|\\.)+)['"]*\]/,
     ATTR: /\[\s*((?:[\w\u00c0-\uFFFF-]|\\.)+)\s*(?:(\S?=)\s*(['"]*)(.*?)\3|)\s*\]/,
     TAG: /^((?:[\w\u00c0-\uFFFF\*-]|\\.)+)/,
     CHILD: /:(only|nth|last|first)-child(?:\((even|odd|[\dn+-]*)\))?/,
     POS: /:(nth|eq|gt|lt|first|last|even|odd)(?:\((\d*)\))?(?=[^-]|$)/,
     PSEUDO: /:((?:[\w\u00c0-\uFFFF-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\2\))?/
},

One by one Sizzle will go through all the match definitions. In this case since a is a valid tag, a match will be found for TAG. Next following function will be called.

TAG: function(match, context){
     return context.getElementsByTagName(match[1]);
}

Now result consists of all a elements.

Next task is to find if each of these elements has a parent element matching .header. In order to test that a call will be made to method dirCheck. In short this is what the call looks like.

dir = 'parentNode';
cur = ".header"
checkSet = [ a www.neeraj.name, a www.google.com ] // object representation

dirCheck( dir, cur, doneName, checkSet, nodeCheck, isXML )

The dirCheck method reports, for each element of checkSet, whether it passed the test. After that a call is made to the method preFilter. The key code in this method is below.

if ( not ^ (elem.className && (" " + elem.className + " ").replace(/[\t\n]/g, " ").indexOf(match) >= 0) )

For our example this is what is being checked

" header ".indexOf(" header ")

This operation is repeated for all the elements in checkSet. Elements not matching the criteria are rejected.
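Putting the pieces together, the right-to-left check for '.header a' can be sketched as below. This is not Sizzle's dirCheck, only a minimal model of the same idea. The helper names and the plain objects standing in for DOM nodes are assumptions made so the example runs without a browser.

```javascript
// Does this element carry the given class? Same padded-string trick as above:
// padding with spaces keeps 'header' from matching inside 'subheader'.
function hasClass(elem, name) {
  var padded = (' ' + (elem.className || '') + ' ').replace(/[\t\n]/g, ' ');
  return padded.indexOf(' ' + name + ' ') >= 0;
}

// Walk up the parentNode links looking for an ancestor with the class.
function hasAncestorWithClass(elem, name) {
  for (var cur = elem.parentNode; cur; cur = cur.parentNode) {
    if (hasClass(cur, name)) {
      return true;
    }
  }
  return false;
}

// Right to left for '.header a': start from every a, keep those under .header.
function selectHeaderLinks(allLinks) {
  return allLinks.filter(function (link) {
    return hasAncestorWithClass(link, 'header');
  });
}

// Hypothetical stand-in nodes.
var header = { nodeName: 'DIV', className: 'header main', parentNode: null };
var footer = { nodeName: 'DIV', className: 'subheader', parentNode: null };
var linkInHeader = { nodeName: 'A', className: '', parentNode: header };
var linkInFooter = { nodeName: 'A', className: '', parentNode: footer };

console.log(selectHeaderLinks([linkInHeader, linkInFooter]).length);
```

The link under the subheader element is rejected, which is exactly what the space padding in the className check is for.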

More methods in Sizzle

If you dig more into the Sizzle code you will see functions defined as +, > and ~. You will also see methods like these.

enabled: function(elem) {
          return elem.disabled === false && elem.type !== "hidden";
    },
disabled: function(elem) {
          return elem.disabled === true;
     },
checked: function(elem) {
          return elem.checked === true;
     },
selected: function(elem) {
          elem.parentNode.selectedIndex;
          return elem.selected === true;
     },
parent: function(elem) {
          return !!elem.firstChild;
     },
empty: function(elem) {
          return !elem.firstChild;
     },
has: function(elem, i, match) {
          return !!Sizzle( match[3], elem ).length;
     },
header: function(elem) {
          return /h\d/i.test( elem.nodeName );
     },
text: function(elem) {
          return "text" === elem.type;
     },
radio: function(elem) {
          return "radio" === elem.type;
     },
checkbox: function(elem) {
          return "checkbox" === elem.type;
     },
file: function(elem) {
          return "file" === elem.type;
     },
password: function(elem) {
          return "password" === elem.type;
     },
submit: function(elem) {
          return "submit" === elem.type;
     },
image: function(elem) {
          return "image" === elem.type;
     },
reset: function(elem) {
          return "reset" === elem.type;
     },
button: function(elem) {
          return "button" === elem.type || elem.nodeName.toLowerCase() === "button";
     },
input: function(elem) {
          return /input|select|textarea|button/i.test(elem.nodeName);
     }
},

first: function(elem, i) {
          return i === 0;
     },
last: function(elem, i, match, array) {
          return i === array.length - 1;
     },
even: function(elem, i) {
          return i % 2 === 0;
     },
odd: function(elem, i) {
          return i % 2 === 1;
     },
lt: function(elem, i, match) {
          return i < match[3] - 0;
     },
gt: function(elem, i, match) {
          return i > match[3] - 0;
     },
nth: function(elem, i, match) {
          return match[3] - 0 === i;
     },
eq: function(elem, i, match) {
          return match[3] - 0 === i;
     }

I use all these methods almost daily and it was good to see how these methods are actually implemented.
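To see how the positional filters behave, here is a small sketch that applies three of them to a plain array. The filter bodies are copied from the listing above. applyFilter and the rows array are hypothetical names for this example, and the match array is built by hand with the argument in slot 3.

```javascript
// Three of the positional filters from the listing above.
var setFilters = {
  even: function (elem, i) { return i % 2 === 0; },
  lt: function (elem, i, match) { return i < match[3] - 0; },
  eq: function (elem, i, match) { return match[3] - 0 === i; }
};

// Apply one filter by name. `arg` plays the role of match[3], the text inside
// the parentheses of a selector such as :lt(2). It arrives as a string, which
// is why the filters subtract 0 to coerce it to a number.
function applyFilter(name, items, arg) {
  var match = [null, null, null, arg];
  return items.filter(function (elem, i) {
    return setFilters[name](elem, i, match, items);
  });
}

var rows = ['row0', 'row1', 'row2', 'row3'];

console.log(applyFilter('even', rows));     // indexes 0 and 2
console.log(applyFilter('lt', rows, '2'));  // indexes below 2
console.log(applyFilter('eq', rows, '3'));  // index 3 only
```

Note that even keeps the items at indexes 0 and 2, because the index is zero based.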

Performance Implications

Now that I have a little more understanding of how Sizzle works, I can better optimize my selector queries. Here are two selectors that target the same elements in my markup.

$('p.about_me .employment');

$('.about_me p.employment');

Since Sizzle goes from right to left, in the first case Sizzle will pick up all the elements with the class employment and then try to filter that list. In the second case Sizzle will pick up only the p elements with class employment and then filter that list. In the second case the rightmost selection criterion is more specific, so it will perform better.

So the rule with Sizzle is to be more specific on the right-hand side and less specific on the left-hand side. Here is another example.

$('.container :disabled');

$('.container input:disabled');

The second query will perform better because the right side query is more specific.
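The reasoning can be made concrete with a toy count. This is not a benchmark and does not call Sizzle. It only counts how many elements the rightmost part selects, on the assumption that each of them costs one walk up the ancestors. The page contents, buildPage and seedCount are all made up for the illustration.

```javascript
// Toy page: each entry is one element with a tag name and a class list.
function buildPage() {
  var page = [];
  for (var i = 0; i < 50; i++) {
    page.push({ tag: 'div', classes: ['employment'] });
  }
  for (var j = 0; j < 3; j++) {
    page.push({ tag: 'p', classes: ['employment'] });
  }
  return page;
}

// Count the elements matched by the rightmost part of a selector.
// Pass null as the tag to match on the class alone.
function seedCount(page, tag, className) {
  return page.filter(function (el) {
    var tagOk = tag === null || el.tag === tag;
    return tagOk && el.classes.indexOf(className) >= 0;
  }).length;
}

var page = buildPage();
console.log(seedCount(page, null, 'employment')); // rightmost part: .employment
console.log(seedCount(page, 'p', 'employment'));  // rightmost part: p.employment
```

On this made-up page the first selector starts from 53 candidates and the second from 3, so the second has far fewer parents to check.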

Understanding jQuery effects queue

Recently I tried the following code in jQuery and it did not work.

$('#lab')
  .animate({height: '200px'})
  .hide();

When I passed a duration to hide, it started working.

$('#lab')
  .animate({height: '200px'})
  .hide(1);

As it turns out, I did not have a proper understanding of how effects work in jQuery.

The animate method uses a queue internally. This is the queue to which all the pending activities are added.

$('#lab').animate({height: '200px'}).animate({width: '200px'});

In the above code the element is animated twice. However, the second animation will not start until the first animation is done. While the first animation is happening, the second animation is added to a queue. The name of this default queue is fx. This is the queue to which jQuery adds all the pending activities while one activity is in progress. You can ask an element how many pending activities there are in the queue.

$('#lab')
  .animate({height: '200px'})
  .animate({width: '200px'})
  .animate({width: '800px'})
  .queue(function(){ console.log( $(this).queue('fx').length); $(this).dequeue(); })
  .animate({width: '800px'})
  .queue(function(){ console.log( $(this).queue('fx').length);$(this).dequeue(); }) ;

In the above code the queue is asked twice for the number of pending activities. The first time the number of pending activities is 3 and the second time it is 1.
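To make those two numbers less mysterious, here is a toy model of a single element's fx queue in plain JavaScript. It is not jQuery's source. createFxQueue, tick and the 'inprogress' marker are assumptions of this sketch, based on my understanding that the queue keeps a marker at the front while a step is running, so the marker is counted in the length.

```javascript
// Toy model of one element's fx queue.
function createFxQueue() {
  var items = [];            // pending steps, possibly led by the marker
  var finishCurrent = null;  // completes the animation that is "running"

  var q = {
    length: function () { return items.length; },

    // Add a step; start it right away if nothing is in progress.
    queue: function (step) {
      items.push(step);
      if (items[0] !== 'inprogress') {
        q.dequeue();
      }
      return q;
    },

    // Drop the marker, take the next step, and run it.
    dequeue: function () {
      var step = items.shift();
      if (step === 'inprogress') {
        step = items.shift();
      }
      if (step) {
        items.unshift('inprogress');
        step();
      }
      return q;
    },

    // An animation is a step that finishes later and then dequeues.
    animate: function () {
      return q.queue(function () {
        finishCurrent = function () {
          finishCurrent = null;
          q.dequeue();
        };
      });
    },

    // Pretend the running animation completed. Returns false when idle.
    tick: function () {
      if (!finishCurrent) { return false; }
      finishCurrent();
      return true;
    }
  };
  return q;
}

// Same shape as the chain above: three animations, a probe, one more
// animation, and a second probe.
var seen = [];
var lab = createFxQueue();
lab.animate().animate().animate()
  .queue(function () { seen.push(lab.length()); lab.dequeue(); })
  .animate()
  .queue(function () { seen.push(lab.length()); lab.dequeue(); });

while (lab.tick()) {} // let every animation finish in turn
console.log(seen);
```

In this model the first probe sees the marker, the remaining animation and the second probe, which gives 3. The second probe sees only the marker, which gives 1.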

The show and hide methods also accept a duration. If a duration is passed then the operation is added to the queue. If no duration is passed, or if the duration is zero, then the operation is not added to the queue.

$('#lab').hide(); // this action is not added to fx queue


$('#lab').hide(0); // this action is not added to fx queue


$('#lab').hide(1); // this action is added to fx queue

Coming back to the original question

When the show or hide method is invoked without a duration, the action is not added to the queue.

$('#lab')
  .animate({height: '200px'})
  .hide();

In the above code, since the hide method is not added to the queue, it does not wait for the animation: hide runs right away while animate is still in progress. Hence the end result is that the element is not hidden.

It can be fixed in a number of ways. One way is to pass a duration to the hide method.

$('#lab')
  .animate({height: '200px'})
  .hide(1);

Another way to fix it is to pass the hiding action as a callback function to the animate method.

$('#lab')
  .animate({height: '200px'}, function(){
    $(this).hide();
  });

Another way is to explicitly put the hide method in the queue.

$('#lab')
  .animate({height: '200px'})
  .queue(function(){
    $(this).hide();
  });

Since the hide method is not added to the queue by default, in this case I have explicitly put it in the queue.

Note that inside a function passed to queue you must explicitly call dequeue for the next activity in the queue to be picked up.

$('#lab')
    .animate({height: '200px'})
    .queue(function(){
      $(this).hide().dequeue();
    })
    .animate({width: '200px'});

In the above code, if dequeue is not called then the second animation will never take place.

Also note that methods like fadeTo, fadeIn, fadeOut, slideDown, slideUp and animate are, by default, added to the default queue.
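The effect of a missing dequeue can be shown with a tiny model that does not need jQuery. runSteps and the step labels are hypothetical. Each step gets a next function that plays the role of dequeue, and nothing after a step runs unless that step calls it.

```javascript
// Run steps in order; each step must call next() to release the one after it.
function runSteps(steps) {
  var ran = [];
  var index = 0;
  function next() {
    var step = steps[index++];
    if (step) {
      step(ran, next);
    }
  }
  next();
  return ran;
}

var withDequeue = runSteps([
  function (ran, next) { ran.push('animate height'); next(); },
  function (ran, next) { ran.push('hide'); next(); },
  function (ran, next) { ran.push('animate width'); next(); }
]);

var withoutDequeue = runSteps([
  function (ran, next) { ran.push('animate height'); next(); },
  function (ran) { ran.push('hide'); }, // never releases the next step
  function (ran, next) { ran.push('animate width'); next(); }
]);

console.log(withDequeue);
console.log(withoutDequeue);
```

In the second run the width animation never appears, because the hide step did not hand control back to the queue.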

Turning off all animations

If for some reason you don’t want animation then just set $.fx.off = true.

$.fx.off = true;
$('#lab')
   .animate({height: '200px'}, function(){
     $(this).hide();
   });

The above code tells jQuery to turn off all animations, which results in the element being hidden instantly.