I’m looking for some help and wisdom on how to properly design a schema for indexing documents in my situation. Basically, I have products that can belong to multiple categories. Within those categories, the products may or may not be sequenced. Ideally, I’d like to keep just one unique document per product.
I’m using Solr 3.4.0 and currently have documents with this structure:
{
  productId : "1",
  sku : "ABC123",
  productName : "My Product",
  categorySequence : ["123-1", "456-7", "789-noseq", "000-noseq"],
  description : "Product description",
  rating : "4.36"
}
The categorySequence field is where I’m having trouble. It’s a multivalued field containing strings that combine the category id and the sequence of my product within that category, separated by a dash. Where the product is not sequenced in a category, I’ve arbitrarily appended “noseq”.
Since my product can exist in multiple categories, I do a filter query on the categorySequence field like this:
fq=categorySequence:123-*
which is working for me to bring back only products which are in the category with the id “123”.
However, I have since discovered that you can’t sort on multivalued fields. I was initially hoping this would be a quick way to sort the filtered products in the appropriate sequence.
I’ve seen some other suggestions on here regarding grouping and having multiple documents for the same product. However, my products can exist in lots of categories, which, as you can imagine, would create a lot of documents.
I’m hoping to stick with a single document representing a single product. Can someone help point me in the right direction? I guess I’m basically looking at doing a filter and a sort on a two-dimensional field?
We faced a similar issue, and here is what we implemented:
Field –
data fed to Solr –
You do not need to store a value for categories in which the product has no sequence. The position of those products can be controlled with the sortMissingLast and sortMissingFirst field attributes.
These fields will maintain the position/sequence of products for the categories.
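As a rough illustration of what those per-category fields could look like, here is a minimal Python sketch. It assumes the schema has a dynamic field matching *_sort_seq with a single-valued integer type (the suffix comes from the sort example in this answer; the dynamic field declaration itself is an assumption). The helper name to_solr_doc is hypothetical. It takes the document from the question and adds one sort field per sequenced category, skipping the "noseq" entries.

```python
def to_solr_doc(product, suffix="_sort_seq"):
    """Copy a product dict and add one single-valued sort field per sequenced category.

    Hypothetical helper: the "<categoryId>_sort_seq" naming is assumed to match
    a dynamic field pattern (*_sort_seq) declared in the Solr schema.
    """
    doc = dict(product)
    for entry in product.get("categorySequence", []):
        # Entries look like "123-1" (sequenced) or "789-noseq" (not sequenced).
        category_id, _, sequence = entry.partition("-")
        if sequence.isdigit():
            doc[category_id + suffix] = int(sequence)
        # Unsequenced categories get no sort field at all.
    return doc


product = {
    "productId": "1",
    "sku": "ABC123",
    "categorySequence": ["123-1", "456-7", "789-noseq", "000-noseq"],
}
doc = to_solr_doc(product)
print(doc)
```

The categorySequence field is kept as-is so the existing filter query still works. Only the sorting moves to the new single-valued fields.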
Since you know the category id at query time, you can easily filter and sort the products.
fq=categorySequence:123-*&sort=123_sort_seq asc
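For what it's worth, the parameters above could be assembled per category along these lines. This is only a sketch: the function name category_query_params is hypothetical, and the match-all main query (q=*:*) is an assumption that is not part of the original example.

```python
from urllib.parse import urlencode


def category_query_params(category_id, suffix="_sort_seq"):
    """Build filter and sort parameters for one category (hypothetical helper)."""
    return {
        "q": "*:*",  # assumption: match everything, then narrow with fq
        "fq": f"categorySequence:{category_id}-*",
        "sort": f"{category_id}{suffix} asc",
    }


params = category_query_params("123")
query_string = urlencode(params)  # URL-encoded form, ready to append to the select URL
print(query_string)
```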
This way, you won’t need to maintain multiple copies of the products.