I’m using ElasticSearch along with the tire gem to power the search
functionality of my site. I’m having trouble figuring out how to map and
query the data to get the results I need.
The relevant code is below, followed by an explanation of the output I want.
# models/product.rb
class Product < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  has_many :categorizations
  has_many :categories, :through => :categorizations
  has_many :product_traits
  has_many :traits, :through => :product_traits

  mapping do
    indexes :id, type: 'integer'
    indexes :name, boost: 10
    indexes :description, analyzer: 'snowball'
    indexes :categories do
      indexes :id, type: 'integer'
      indexes :name, type: 'string', index: 'not_analyzed'
    end
    indexes :product_traits, type: 'string', index: 'not_analyzed'
  end

  def self.search(params={})
    out = tire.search(page: params[:page], per_page: 12, load: true) do
      query do
        boolean do
          must { string params[:query], default_operator: "OR" } if params[:query].present?
          must { term 'categories.id', params[:category_id] } if params[:category_id].present?
          # if we aren't browsing a category, search results are "drill-down"
          unless params[:category_id].present?
            must { term 'categories.name', params[:categories] } if params[:categories].present?
          end
          params.select { |p| p[0,2] == 't_' }.each do |name,value|
            must { term :product_traits, "#{name[2..-1]}##{value}" }
          end
        end
      end
      # don't show the category facets if we are browsing a category
      facet("categories") { terms 'categories.name', size: 20 } unless params[:category_id].present?
      facet("traits") {
        terms :product_traits, size: 1000 #, all_terms: true
      }
      # raise to_curl
    end

    # process the trait facet results into a hash of arrays
    if out.facets['traits']
      facets = {}
      out.facets['traits']['terms'].each do |f|
        split = f['term'].partition('#')
        facets[split[0]] ||= []
        facets[split[0]] << { 'term' => split[2], 'count' => f['count'] }
      end
      out.facets['traits']['terms'] = facets
    end

    out
  end

  def to_indexed_json
    {
      id: id,
      name: name,
      description: description,
      categories: categories.all(:select => 'categories.id, categories.name, categories.keywords'),
      product_traits: product_traits.includes(:trait).collect { |t| "#{t.trait.name}##{t.value}" }
    }.to_json
  end
end
As you can see above, I’m doing some pre- and post-processing of the data
going to and from ElasticSearch in order to get what I want from the
‘product_traits’ field. This is the part that doesn’t feel right, and it is where my
questions come from.
I have a large catalog of products, each with a handful of ‘traits’ such
as color, material and brand. Since these traits are so varied, I
modeled the data to include a Trait model which relates to the Product
model via a ProductTrait model, which holds the value of the trait for
the given product.
First question: How can I create the ElasticSearch mapping to index
these traits properly? I assume that this involves a nested type but I
can’t make enough sense of the docs to figure it out.
Second question: I want the facets to come back in groups (in the
manner that I am processing them at the end of the search method
above) but with counts that reflect how many matches there are without
taking into account the currently selected value for each trait. For
example: If the user searches for ‘Glitter’ and then clicks the link
corresponding to the ‘Blue Color’ facet, I want all the ‘Color’ facets
to remain visible and show counts corresponding to the query results
without the ‘Blue Color’ filter. I hope that explanation is clear;
sorry if it needs more clarification.
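If you index your traits as an array of objects, each with a ‘trait’ and a ‘value’ field (eg trait ‘color’ with value ‘green’ and trait ‘material’ with value ‘plastic’), ElasticSearch flattens the array internally into two multi-valued fields: ‘trait’ holding ‘color’ and ‘material’, and ‘value’ holding ‘green’ and ‘plastic’.
This means that you could only ever query for docs which have a ‘trait’ with value ‘color’ and a ‘value’ with value ‘green’. There is no relationship between the ‘trait’ and the ‘value’, so a product whose color is red but which has some other trait with the value ‘green’ would match too. You have a few choices to solve this problem.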
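Before looking at those choices, the flattening problem itself can be illustrated in plain Ruby. This is only a simulation of the behavior described above, not ElasticSearch code: the helper names are hypothetical, and ‘trim’ is an invented second trait that happens to have the value ‘green’.

```ruby
require 'json'

# Simulation only (not ElasticSearch code): an array of inner objects is
# flattened into one multi-valued field per key, losing which value
# belonged to which trait.
def flatten_objects(objects)
  fields = Hash.new { |hash, key| hash[key] = [] }
  objects.each do |obj|
    obj.each { |key, value| fields[key] << value }
  end
  fields
end

# Two independent term matches, like a bool "must" on the flattened fields
def flat_match?(fields, trait, value)
  fields['trait'].include?(trait) && fields['value'].include?(value)
end

# Hypothetical product: the color is red, the (invented) trim trait is green
traits = [
  { 'trait' => 'color', 'value' => 'red' },
  { 'trait' => 'trim',  'value' => 'green' }
]

flat = flatten_objects(traits)
puts flat.to_json                        # {"trait":["color","trim"],"value":["red","green"]}
puts flat_match?(flat, 'color', 'green') # true: a false positive
```

The match succeeds because each condition is checked against its own flattened list, and nothing ties ‘color’ to ‘red’.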
As single terms
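The first you are already doing, and it is a good solution: storing each trait/value pair as a single not_analyzed term, like the ‘color#green’ style strings your to_indexed_json builds for ‘product_traits’.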
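A minimal sketch of that single-term encoding in plain Ruby, assuming the same ‘#’ separator as the question’s to_indexed_json and the same not_analyzed ‘product_traits’ field. The helper names are hypothetical; the point is that each term carries both halves of the pair, so one term filter matches exactly one trait/value combination.

```ruby
require 'json'

SEPARATOR = '#' # same separator as the question's to_indexed_json

# Encode one trait/value pair as a single not_analyzed term
def trait_term(name, value)
  "#{name}#{SEPARATOR}#{value}"
end

# Decode a term into [name, value]. String#partition splits on the FIRST
# separator, so values may contain '#' but trait names must not.
def split_trait_term(term)
  name, _sep, value = term.partition(SEPARATOR)
  [name, value]
end

# Term filter for one selected trait, as a plain Ruby hash
def trait_filter(name, value)
  { term: { product_traits: trait_term(name, value) } }
end

puts trait_filter('color', 'blue').to_json # {"term":{"product_traits":"color#blue"}}
p split_trait_term('color#blue')           # ["color", "blue"]
```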
As objects
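An alternative (assuming you have a limited number of trait names) would be to store them as an object with one field per trait name, eg a ‘traits’ object whose ‘color’ field holds ‘green’ and whose ‘material’ field holds ‘plastic’.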
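A sketch of building that object shape from trait/value pairs, plus a term query on one sub-field, as plain Ruby hashes. The helper name is hypothetical, ‘plastic’ is an invented example value, and each ‘traits.<name>’ sub-field would still need a not_analyzed mapping for exact matching of multi-word values. Every new trait name adds a field to the mapping, which is why this only suits a limited set of names.

```ruby
require 'json'

# Hypothetical helper: turn [name, value] pairs into { name => value }.
# A trait with several values becomes an array (a multi-valued field).
def traits_object(pairs)
  pairs.group_by(&:first).transform_values do |group|
    values = group.map(&:last)
    values.size == 1 ? values.first : values
  end
end

doc = { traits: traits_object([%w[color green], %w[material plastic]]) }
query = { query: { term: { 'traits.color' => 'green' } } }

puts doc.to_json   # {"traits":{"color":"green","material":"plastic"}}
puts query.to_json # {"query":{"term":{"traits.color":"green"}}}
```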
Then you could run queries against ‘traits.color’ or ‘traits.material’.
As nested
If you want to keep your array structure, then you can map ‘traits’ with the nested type, giving it not_analyzed ‘trait’ and ‘value’ string fields.
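A sketch of such a mapping, written as a Ruby hash and printed as JSON so it could be sent when creating the index. The ‘product’ type and ‘traits’ field names are assumptions, and the ‘string’ / ‘not_analyzed’ syntax matches the ElasticSearch versions of this era (5.0 and later use the ‘keyword’ type instead).

```ruby
require 'json'

# Assumed names: type "product", nested field "traits".
# Each element of "traits" is indexed as its own hidden nested document,
# so a trait and its value stay paired.
PRODUCT_MAPPING = {
  product: {
    properties: {
      name: { type: 'string' },
      traits: {
        type: 'nested',
        properties: {
          trait: { type: 'string', index: 'not_analyzed' },
          value: { type: 'string', index: 'not_analyzed' }
        }
      }
    }
  }
}.freeze

puts JSON.pretty_generate(PRODUCT_MAPPING)
```

The matching document field would then be an array of objects, which to_indexed_json could build with something like product_traits.includes(:trait).map { |t| { trait: t.trait.name, value: t.value } } (an untested adaptation of the question’s code).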
Each trait/value pair would be indexed internally as a separate (but related) document, meaning that there would be a relationship between the trait and its value. You’d need to use nested queries or nested filters to query them, eg a nested clause on the ‘traits’ path that requires both ‘trait’ == ‘color’ and ‘value’ == ‘green’.
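A sketch of the query side as a Ruby hash: a nested query (one of the two options mentioned above) on the assumed ‘traits’ path, whose bool clause requires the trait name and the value to match within the same nested object. The helper name is hypothetical.

```ruby
require 'json'

# Hypothetical helper: text search plus a nested condition that only
# matches when ONE nested traits object has both the trait and the value.
def nested_trait_search(text, trait, value)
  {
    query: {
      bool: {
        must: [
          { query_string: { query: text } },
          {
            nested: {
              path: 'traits',
              query: {
                bool: {
                  must: [
                    { term: { 'traits.trait' => trait } },
                    { term: { 'traits.value' => value } }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(nested_trait_search('Glitter', 'color', 'green'))
```

Because both term clauses sit inside the same nested clause, they must be satisfied by a single trait/value object, which removes the false positive that the flattened structure allowed.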
Combining facets, filtering and nested docs
You state that, when a user filters on eg color == green, you want to show results only where color == green, but you still want to show the counts for all colors.
To do that, you need to use the ‘filter’ param to the search API rather than a filtered query. A filtered query filters out the results BEFORE calculating the facets. The ‘filter’ param is applied to the query results AFTER calculating the facets.
So with a top-level ‘filter’ on color == green, the final query results are limited to docs where color == green, but the facets are calculated for all colors.
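A sketch of such a search body as a Ruby hash, assuming the single-term ‘product_traits’ field from the question (‘name#value’ terms). The helper name is hypothetical. It uses the facets API of this era; in ElasticSearch 1.0 the top-level ‘filter’ was renamed ‘post_filter’, and facets were later replaced by aggregations.

```ruby
require 'json'

# Hypothetical helper. selected is e.g. { 'color' => 'Blue' }.
def faceted_search_body(text, selected)
  body = {
    # The query decides which docs are counted by the facets
    query: { query_string: { query: text, default_operator: 'OR' } },
    facets: { traits: { terms: { field: 'product_traits', size: 1000 } } }
  }
  unless selected.empty?
    terms = selected.map { |name, value| { term: { product_traits: "#{name}##{value}" } } }
    # Top-level filter: trims the returned hits only, after the facets
    # have been calculated, so facet counts ignore the selection
    body[:filter] = { bool: { must: terms } }
  end
  body
end

puts JSON.pretty_generate(faceted_search_body('Glitter', { 'color' => 'Blue' }))
```

With a single shared facet like this, the counts ignore every selected trait, not only the one whose group is being displayed.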