Solr, Sunspot, Websolr and Delayed Job
Solr is an open source search platform from Apache. It has a very powerful full-text search capability among other things.
Solr is written in Java and runs as a standalone search server within a servlet container such as Tomcat. When you are working on a Ruby on Rails application, you do not want to maintain a Tomcat server. This is where Websolr comes into the picture. Websolr manages the index, and the Rails application interacts with the index using a gem called sunspot_rails.

    # Gemfile
    gem 'sunspot_rails', '= 1.3.3' # search feature
Here I am interested in searching products.
    class Product < ActiveRecord::Base
      searchable do
        text :name, boost: 1.5
        text :description
      end
    end
Using sunspot gem
rails g sunspot_rails:install
The above command creates the config/sunspot.yml file. By default this file looks like the following.

    production:
      solr:
        hostname: localhost
        port: 8983
        log_level: WARNING

    development:
      solr:
        hostname: localhost
        port: 8982
        log_level: INFO

    test:
      solr:
        hostname: localhost
        port: 8981
        log_level: WARNING
The way sunspot works is that after every single web request it updates Solr about the changes that took place in the request. This is not desirable. To turn that off, set the auto_commit_after_request option to false in the config/sunspot.yml file.

I would also change the log_level for development to DEBUG. The revised config/sunspot.yml file would look like this.
    production:
      solr:
        hostname: localhost
        port: 8983
        log_level: WARNING
      auto_commit_after_request: false

    development:
      solr:
        hostname: localhost
        port: 8980
        log_level: DEBUG
      auto_commit_after_request: false

    test:
      solr:
        hostname: localhost
        port: 8981
        log_level: DEBUG
      auto_commit_after_request: false
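To make the per-environment layout concrete, here is a minimal sketch that parses a trimmed copy of the settings above with Ruby's standard YAML library and reads the values for one environment. The `solr_settings` helper and the `/solr` path in the URL are assumptions made for illustration; this is not how sunspot itself loads the file.

```ruby
require 'yaml'

# Trimmed copy of the revised config/sunspot.yml, embedded so the sketch runs on its own.
SUNSPOT_CONFIG = <<~YAML
  production:
    solr:
      hostname: localhost
      port: 8983
      log_level: WARNING
    auto_commit_after_request: false
  development:
    solr:
      hostname: localhost
      port: 8980
      log_level: DEBUG
    auto_commit_after_request: false
YAML

# Hypothetical helper: collect the settings for one environment into a flat hash.
def solr_settings(yaml_text, environment)
  config = YAML.safe_load(yaml_text).fetch(environment)
  solr = config.fetch('solr')
  {
    # The "/solr" path is an assumption; adjust it to match your server.
    url: "http://#{solr['hostname']}:#{solr['port']}/solr",
    log_level: solr['log_level'],
    # Treat a missing key as "commit after every request".
    auto_commit: config.fetch('auto_commit_after_request', true)
  }
end

settings = solr_settings(SUNSPOT_CONFIG, 'development')
puts settings[:url]
puts settings[:auto_commit]
```

Using `fetch` for the environment name means a typo such as `'devlopment'` raises a `KeyError` right away instead of silently falling back to defaults.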
Taking care of callbacks
In the above case, any time I create, update or destroy a product, Solr commit commands are issued as part of the after_save callback. Since after_save callbacks are part of the ActiveRecord transaction, this slows down the create, update and destroy operations. I would like all these operations to happen in the background.
Here is how I handled it
    class Product < ActiveRecord::Base
      searchable do
        text :name, boost: 1.5
        text :description
      end

      handle_asynchronously :solr_index, queue: 'indexing', priority: 50
      handle_asynchronously :solr_index!, queue: 'indexing', priority: 50
      handle_asynchronously :remove_from_index, queue: 'indexing', priority: 50
    end
In the above case I used Delayed Job but you can use any background job processing tool.
In Delayed Job, a higher priority value means a lower priority. By bumping the priority value to 50, I'm making sure that emails and other background jobs are processed before Solr work is taken up.
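The ordering rule can be illustrated with a small standalone sketch. It does not use Delayed Job itself: the `Job` struct and `processing_order` helper are hypothetical stand-ins, and insertion order is used here as a rough substitute for the time at which a job becomes runnable.

```ruby
# Simplified stand-in for a queued job; only the fields relevant here.
Job = Struct.new(:name, :priority)

# Lower priority numbers are picked up first; ties keep their insertion order.
def processing_order(jobs)
  jobs.each_with_index
      .sort_by { |job, index| [job.priority, index] }
      .map { |job, _index| job.name }
end

queue = [
  Job.new('solr_index', 50),    # indexing work, deliberately deprioritized
  Job.new('welcome_email', 0),  # 0 is Delayed Job's default priority
  Job.new('report', 10)
]

puts processing_order(queue).inspect
```

Even though the indexing job was enqueued first, it comes out last because 50 is the largest priority value in the queue.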
In the above case the call to remove_from_index has been deferred to Delayed Job. However, the record has already been destroyed by the time the job runs. When Delayed Job takes up the work, it first tries to retrieve the record. The record is missing, so the background job fails.
Here is how we solved this problem.
    class Product < ActiveRecord::Base
      searchable do
        text :name, boost: 1.5
        text :description
      end

      handle_asynchronously :solr_index, queue: 'indexing', priority: 50
      handle_asynchronously :solr_index!, queue: 'indexing', priority: 50

      def remove_from_index_with_delayed
        Delayed::Job.enqueue RemoveIndexJob.new(record_class: self.class.to_s, attributes: self.attributes), queue: 'indexing', priority: 50
      end
      alias_method_chain :remove_from_index, :delayed
    end
Add another worker named RemoveIndexJob.

    class RemoveIndexJob < Struct.new(:options)
      def perform
        return if options.nil?
        options.symbolize_keys!
        # Rebuild an unsaved copy of the destroyed record from its saved attributes.
        record = options[:record_class].constantize.new options[:attributes].except("id")
        record.id = options[:attributes]["id"]
        record.remove_from_index_without_delayed
      end
    end
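The difference between the failing approach and the fix can be shown without Rails or Solr. In the sketch below, plain hashes stand in for the database table and the search index, and `LookupRemovalJob`, `SnapshotRemovalJob` and `perform_safely` are hypothetical names invented for this illustration. The first job carries only an id and looks the row up when it runs; the second carries a snapshot of the attributes taken before the row was deleted.

```ruby
# In-memory stand-ins for the database table and the Solr index.
DATABASE = { 1 => { 'id' => 1, 'name' => 'Chair' } }
SEARCH_INDEX = { 1 => 'Chair' }

# Naive job: stores only the id and looks the record up when it runs.
class LookupRemovalJob < Struct.new(:id)
  def perform
    record = DATABASE.fetch(id) # raises KeyError once the row is gone
    SEARCH_INDEX.delete(record['id'])
  end
end

# Snapshot job: carries the attributes captured before the row was deleted.
class SnapshotRemovalJob < Struct.new(:attributes)
  def perform
    SEARCH_INDEX.delete(attributes['id'])
  end
end

# Run a job the way a worker would, reporting failure instead of crashing.
def perform_safely(job)
  job.perform
  :ok
rescue KeyError
  :failed
end

naive = LookupRemovalJob.new(1)
snapshot = SnapshotRemovalJob.new(DATABASE.fetch(1).dup)
DATABASE.delete(1) # the record is destroyed before any job runs

puts perform_safely(naive)
puts perform_safely(snapshot)
puts SEARCH_INDEX.key?(1)
```

The trade-off is a larger job payload, since the whole attribute hash is serialized into the queue rather than a single id.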
Connecting to websolr
From the Websolr documentation it was not clear that the sunspot gem first looks for an environment variable called WEBSOLR_URL. If that environment variable has a value, sunspot assumes that the Solr index is at that URL. If no value is found, it assumes that it is dealing with a local Solr instance.
So if you are using Websolr, make sure that your application has the environment variable WEBSOLR_URL properly configured in the staging and production environments.
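The lookup described above amounts to a simple fallback. The sketch below is an illustration of that behavior, not sunspot's actual source: the `solr_url` helper, the `LOCAL_SOLR_URL` constant and the example hostname are all assumptions made for this example.

```ruby
# Assumed fallback for a local Solr instance; adjust to match config/sunspot.yml.
LOCAL_SOLR_URL = 'http://localhost:8983/solr'

# Hypothetical helper: prefer WEBSOLR_URL when it is set to a non-blank value.
def solr_url(env = ENV)
  url = env['WEBSOLR_URL'].to_s.strip
  url.empty? ? LOCAL_SOLR_URL : url
end

puts solr_url({ 'WEBSOLR_URL' => 'http://index.example.com/solr/abc123' })
puts solr_url({})
```

Because a missing variable silently falls back to localhost, a check like this at boot time in staging and production can surface a misconfigured environment early.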