DevZona

Rails, Heroku, AWS and other awesome technologies

Using Sunspot for Full Text Search With Pagination and Faceting on API


Overview

Virtually every app sooner or later needs some kind of search, and when it comes to searching your database, a simple SQL LIKE is usually not enough for a number of reasons (no word stemming, poor speed, etc.). In that case, the obvious solution is full text search.

When it comes to selecting a full text search engine, you have a variety of options. The most popular open source engines are Apache Solr, Sphinx, and Elasticsearch. Heck, Postgres has its own full text search implementation (Ryan Bates has an excellent Railscast on it; I will touch on it in later posts). As a Rails developer, I often look for solutions that integrate easily into a Rails app, i.e. I look for Ruby gems. And when it comes to full text search, you are fortunate: there are plenty of options. sunspot_rails is powered by Apache Solr, thinking_sphinx by Sphinx, and tire by Elasticsearch. Moreover, there is at least one Heroku addon for each of these.

I am planning to devote this and the next post to sunspot_rails, then write a post on Elasticsearch with tire, and finally a post on full text search with Postgres.

Personally, I’ve been using sunspot_rails for more than two years to power full text search on a couple of relatively big APIs with hundreds of thousands of documents each. So far, I’m happy with the results: Sunspot makes it very easy to set up, customize, and run complex search queries. And, with more than 300,000 downloads, it is the most popular Solr client for Ruby applications.

Setting up Search

I’m not going to go into detail on how to set up sunspot_rails in your app, since it can all be found on Sunspot’s README page. I’ve been using Sunspot to run search for an API containing job postings data. Here is the searchable block of my main model:

app/models/job.rb
class Job < ActiveRecord::Base
  belongs_to :employer

  FACETS = [ :state, :city_state, :employer_id ]

  searchable do
    text    :title, boost: 2.0
    text    :description, boost: 1.5
    location :coordinates
    string  :city
    string  :state
    string  :city_state, using: :city_state, stored: true
    string  :zip
    boolean :active
    time    :posted_at
    time    :activated_at
    integer :employer_id, references: Employer, stored: true
    boolean :featured
  end
end

Nothing extraordinary. I run fulltext search on two fields, title and description, giving title a little more weight in the relevancy calculation. I use the other fields for scoping. A sample search routine looks like this:

app/models/job.rb
@search = Job.solr_search(include: :employer) do
  with(:employer_id).equal_to(@employer_id) unless @employer_id.blank?
  with(:city).equal_to(@city) unless @city.blank?
  with(:state).equal_to(@state) unless @state.blank?
  with(:active).equal_to(@active) unless @active.nil?
  with(:featured).equal_to(@featured) unless @featured.nil?
  fulltext @q
  paginate page: @page, per_page: @@per_page
  FACETS.each do |symbol|
    facet(symbol)
  end
end
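
Note the split between `blank?` and `nil?` in the routine above: string filters are skipped when they are empty, while boolean filters are skipped only when they are absent, because `false` is a legitimate value to filter on (and `false.blank?` is true in ActiveSupport). Here is a minimal plain-Ruby sketch of that rule. The `active_filters` helper and the two constant names are hypothetical and not part of Sunspot:

```ruby
# Hypothetical helper (not part of Sunspot): decides which filters apply.
STRING_FILTERS  = [:employer_id, :city, :state].freeze
BOOLEAN_FILTERS = [:active, :featured].freeze

def active_filters(params)
  filters = {}

  STRING_FILTERS.each do |name|
    value = params[name]
    # Skip nil and empty or whitespace-only values (what blank? covers here).
    next if value.nil? || value.to_s.strip.empty?
    filters[name] = value
  end

  BOOLEAN_FILTERS.each do |name|
    value = params[name]
    # Keep false: it is a real filter value. Only nil means "not requested".
    next if value.nil?
    filters[name] = value
  end

  filters
end

filters = active_filters({ city: "", state: "VA", active: false, featured: nil })
# filters is { state: "VA", active: false }
```

Inside the search block, each remaining pair could then be applied with something like `with(name, value)`.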

Returning JSON response

The code above returns the first page of results, available as @search.results. In addition, @search carries other useful meta information about the search: faceting and pagination data.

Since this data lives on a REST API that backs several apps, it has to be returned as JSON. That is pretty straightforward for the @search.results collection. I use the RABL templating engine to help me with that:

app/controllers/jobs_controller.rb
...
def index
  ...
  @jobs = @search.results
end
app/views/jobs/base.json.rabl
attributes :id, :employer_id, :title, :source_id, :employment_type,
           :description, :zip, :city, :state, :latitude, :longitude, :active, :employer_name, :created_at, :featured
app/views/jobs/index.json.rabl
extends "jobs/base"
collection @jobs

This gives me a nice JSON response like this:

"jobs": [
  {
    "id": 174,
    "employer_id": 3,
    "title": "Account Executives Financial Sales",
    "source_id": 779314,
    "employment_type": "Full Time",
    "description": "Job description",
    "zip": "23453",
    "city": "Virginia Beach",
    "state": "VA",
    "latitude": 36.8527778,
    "longitude": -75.9783333,
    "active": true,
    "employer_name": "Acme Corp.",
    "created_at": "2012-01-03T21:52:01Z",
    "featured": true
  },
  {...}
]

Faceting

However, returning faceting and pagination data in JSON is a little trickier. @search responds to the facets method, which in my case contains three collections of facet data. But if I just try to return @search.facets, I get a very complex collection of objects like this:

[
    [0] {
                 "options" => {},
                  "search" => #<Sunspot::Search::StandardSearch:0x7f80d2919538
            @connection = #<RSolr::Client:0x7f80cfb0ab38
                attr_reader :connection = #<RSolr::Connection:0x7f80cfb0ab88
                    attr_reader :http = #<Net::HTTP:0x7f80d129f3e8
                        @compression = nil,
                        @curr_http_version = "1.1",
                        @debug_output = nil,
                        @enable_post_connection_check = true,
                        @no_keepalive_server = false,
                        @socket = nil,
                        @ssl_context = nil,
                        @sspi_enabled = false,
                        @started = false,
                        attr_accessor :ca_file = nil,
                        attr_accessor :ca_path = nil,
                        attr_accessor :cert = nil,
                        attr_accessor :cert_store = nil,
                        attr_accessor :ciphers = nil,
                        attr_accessor :close_on_empty_response = false,
                        attr_accessor :continue_timeout = nil,
                        attr_accessor :key = nil,
                        attr_accessor :open_timeout = nil,
                        attr_accessor :read_timeout = 60,
                        attr_accessor :ssl_timeout = nil,
                        attr_accessor :ssl_version = nil,
                        attr_accessor :verify_callback = nil,
                        attr_accessor :verify_depth = nil,
                        attr_accessor :verify_mode = nil,
                        attr_reader :address = "localhost",
                        attr_reader :port = 8982,
                        attr_writer :use_ssl = false
                    >
                >,
                attr_reader :options = {
                    :url => "http://localhost:8982/solr"
                },
                attr_reader :proxy = nil,
                attr_reader :uri = #<URI::HTTP:0x7f80cfb16bb8
                    attr_accessor :fragment = nil,
                    attr_accessor :host = "localhost",
                    attr_accessor :opaque = nil,
                    attr_accessor :password = nil,
                    attr_accessor :path = "/solr/",
                    attr_accessor :port = 8982,
                    attr_accessor :query = nil,
                    attr_accessor :registry = nil,
                    attr_accessor :scheme = "http",
                    attr_accessor :user = nil,
                    attr_reader :parser = nil
                >
            >, ...

If I just returned @search.facets.as_json, it would only confuse client apps, especially since all a client app needs is simple value-count pairs like this:

{
  "state": {
    "VA": 3,
    "IL": 1,
    "MA": 1
  },
  "city_state": {
    "Virginia Beach, VA": 3,
    "Chicago, IL": 1,
    "Pittsfield, MA": 1
  },
  "employer_id": {
    "14": 6,
    "3": 2,
    "4": 1,
    "20": 1
  }
}

To generate JSON like this, I had to monkey-patch Sunspot::Search::StandardSearch and add a method like this:

models/sunspot/search/standard_search.rb
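# Defined inside the reopened Sunspot::Search::StandardSearch class.
# Returns { "facet_name" => { value => count, ... }, ... }.
def job_facets
  all_facets = {}
  facets.each do |facet|
    counts = {}
    facet.rows.each do |row|
      counts[row.value] = row.count
    end
    all_facets[facet.name.to_s] = counts
  end
  all_facets
end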

This method, when invoked on a @search object, returns a hash of the facets shown above, which can easily be serialized into JSON.
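
To try the transformation without a running Solr instance, here is a self-contained sketch. `Facet` and `FacetRow` are stand-in structs that model only the `name`/`rows` and `value`/`count` readers used above (an assumption for illustration, not Sunspot's real classes), and `facets_to_hash` is a hypothetical standalone version of the method:

```ruby
require "json"

# Stand-ins for Sunspot's facet objects; only the readers used here exist.
FacetRow = Struct.new(:value, :count)
Facet    = Struct.new(:name, :rows)

# Same transformation as job_facets, written as a standalone function.
def facets_to_hash(facets)
  facets.each_with_object({}) do |facet, all_facets|
    all_facets[facet.name.to_s] = facet.rows.each_with_object({}) do |row, counts|
      counts[row.value] = row.count
    end
  end
end

facets = [
  Facet.new(:state, [FacetRow.new("VA", 3), FacetRow.new("IL", 1)]),
  Facet.new(:employer_id, [FacetRow.new(14, 6), FacetRow.new(3, 2)])
]

summary = facets_to_hash(facets)
puts JSON.generate(summary)
# {"state":{"VA":3,"IL":1},"employer_id":{"14":6,"3":2}}
```

Integer facet values such as employer ids become string keys once serialized, so a client has to convert them back if it needs numbers.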

Pagination

All sunspot_rails results are returned by Sunspot as WillPaginate::Collection objects, and therefore they contain all the necessary pagination information that may be needed to generate pagination links on a client:

@search.results.current_page # => 1
@search.results.per_page # => 20
@search.results.total_entries # => 130
@search.results.total_pages # => 7

Since the client app consuming this data generates pagination links using the will_paginate gem, it is a good idea to return this information in JSON in a “will_paginate-friendly” format, like so:

"pagination": {
  "current_page": 1,
  "per_page": 20,
  "total_entries": 130,
  "total_pages": 7
}

To do that, I added another method while monkey-patching the Sunspot::Search::StandardSearch class:

def pagination_info
  { current_page: results.current_page,
    per_page: results.per_page,
    total_entries: results.total_entries,
    total_pages: results.total_pages }
end
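
The same idea can be checked in isolation. In this sketch, `PagedResults` is a hypothetical stand-in for the paginated results collection, modelling only the four readers the method calls, and `pagination_info` takes the collection as an argument instead of reading `results` from the search object:

```ruby
# Stand-in for the paginated results collection (assumption: only the
# readers used by pagination_info are modelled).
PagedResults = Struct.new(:current_page, :per_page, :total_entries) do
  # Number of pages needed to hold all entries, rounding up.
  def total_pages
    (total_entries.to_f / per_page).ceil
  end
end

def pagination_info(results)
  { current_page:  results.current_page,
    per_page:      results.per_page,
    total_entries: results.total_entries,
    total_pages:   results.total_pages }
end

info = pagination_info(PagedResults.new(1, 20, 130))
# info[:total_pages] is 7, because 130 / 20 = 6.5 rounds up
```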

Now, to return the first page of results, faceting, and pagination data in the same response, the controller action becomes:

app/controllers/jobs_controller.rb
...
def index
  ...
  @jobs = @search.results
  @facets = @search.job_facets
  @pagination = @search.pagination_info
end

And then, using RABL magic, I craft the response that I need:

app/views/jobs/index.json.rabl
object false
child(@jobs => :jobs) do
  extends "jobs/base"
end

node(:facets) do
  @facets
end

node(:pagination) do
  @pagination
end
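
For readers not using RABL, the shape this template produces can be sketched in plain Ruby. This only illustrates the three-key envelope and is not a description of what RABL does internally. The `search_response` helper and the sample data are hypothetical:

```ruby
require "json"

# Hypothetical sample data standing in for @jobs, @facets and @pagination.
jobs       = [{ id: 174, title: "Account Executives Financial Sales", state: "VA" }]
facets     = { "state" => { "VA" => 3, "IL" => 1, "MA" => 1 } }
pagination = { current_page: 1, per_page: 20, total_entries: 130, total_pages: 7 }

# Three top-level keys, mirroring the child and node calls in the template.
def search_response(jobs, facets, pagination)
  { jobs: jobs, facets: facets, pagination: pagination }
end

body = JSON.generate(search_response(jobs, facets, pagination))
```

Keeping facets and pagination as siblings of the jobs array means a client can read the metadata without touching the result list.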

The resulting JSON response looks like this:

{
  "facets": {
    "state": {
      "VA": 3,
      "IL": 1,
      "MA": 1
    },
    "city_state": {
      "Virginia Beach, VA": 3,
      "Chicago, IL": 1,
      "Pittsfield, MA": 1
    },
    "employer_id": {
      "14": 6,
      "3": 2,
      "4": 1,
      "20": 1
    }
  },
  "pagination": {
    "current_page": 1,
    "per_page": 20,
    "total_entries": 130,
    "total_pages": 7
  },
  "jobs": [...]
}
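
On the consuming side, a client might read this response roughly as follows. This is a sketch under assumptions: the body is a trimmed, hypothetical copy of the response above, and `previous_page` and `next_page` are illustrative helpers, not will_paginate methods:

```ruby
require "json"

# Hypothetical response body, shaped like the example above.
body = <<~JSON
  {
    "facets": { "state": { "VA": 3, "IL": 1, "MA": 1 } },
    "pagination": { "current_page": 1, "per_page": 20,
                    "total_entries": 130, "total_pages": 7 },
    "jobs": []
  }
JSON

response   = JSON.parse(body)
pagination = response["pagination"]

# Neighbouring page numbers, or nil at either end of the range.
def previous_page(pagination)
  page = pagination["current_page"]
  page > 1 ? page - 1 : nil
end

def next_page(pagination)
  page = pagination["current_page"]
  page < pagination["total_pages"] ? page + 1 : nil
end

# Facet values by descending count (ties broken by name), e.g. for a sidebar.
top_states = response["facets"]["state"]
  .sort_by { |name, count| [-count, name] }
  .map(&:first)
```

With will_paginate on the client, the same four numbers could be used to build a paginated collection for the view helpers.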
