If you have used indeed.com before, you may know that for the keywords you search for, it returns traditional search results as well as multiple search refinement options on the left side of the screen.
For example, when searching for the keyword “designer”, the refinement options are:
Salary Estimate
$40,000+ (45982)
$60,000+ (29795)
$80,000+ (15966)
$100,000+ (6896)
$120,000+ (2828)
Title
Floral Design Specialist (945)
Hair Stylist (817)
GRAPHIC DESIGNER (630)
Hourly Associates/Co-managers (589)
Web designer (584)
more »
Company
Kelly Services (1862)
Unlisted Company (1133)
CyberCoders Engineering (1058)
Michaels Arts & Crafts (947)
ULTA (818)
Elance (767)
Location
New York, NY (2960)
San Francisco, CA (1633)
Chicago, IL (1184)
Houston, TX (1057)
Seattle, WA (1025)
more »
Job Type
Full-time (45687)
Part-time (2196)
Contract (8204)
Internship (720)
Temporary (1093)
How does it gather this statistical information so quickly (e.g. the number of job offers in each salary range)? It looks like the refinement options are created in real time, since less common keywords load fast too.
Is there a specific SQL technique for building such a feature? Or is there a manual on the web explaining the tech behind this?
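The technology used by Indeed.com and other search engines (e.g. Google) is known as inverted indexing, which is at the core of how search engines work. An inverted index maps each term to the set of documents that contain it, so the matching documents for a keyword can be looked up directly instead of scanning every row. The filters you refer to (“refinement options”) are known as facets.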
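To make the idea concrete, here is a minimal sketch of an inverted index with facet counts. It is only an illustration, not how Indeed.com actually implements it; the `JOBS` data, the field names, and the function names are all made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data; a real index would hold millions of postings.
JOBS = [
    {"id": 0, "title": "Graphic Designer", "company": "Kelly Services", "job_type": "Full-time"},
    {"id": 1, "title": "Web Designer", "company": "Elance", "job_type": "Contract"},
    {"id": 2, "title": "Hair Stylist", "company": "ULTA", "job_type": "Part-time"},
    {"id": 3, "title": "Floral Designer", "company": "Kelly Services", "job_type": "Full-time"},
]


def build_inverted_index(docs):
    """Map each lowercased title term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["title"].lower().split():
            index[term].add(doc["id"])
    return index


def search_with_facets(index, docs_by_id, term, facet_fields):
    """Return the matching ids plus, per facet field, a count of each value."""
    matches = index.get(term.lower(), set())
    facets = {
        field: Counter(docs_by_id[doc_id][field] for doc_id in matches)
        for field in facet_fields
    }
    return sorted(matches), facets


if __name__ == "__main__":
    index = build_inverted_index(JOBS)
    docs_by_id = {doc["id"]: doc for doc in JOBS}
    ids, facets = search_with_facets(index, docs_by_id, "designer", ["company", "job_type"])
    print(ids)
    for field, counts in facets.items():
        print(field, dict(counts))
```

The counts are computed only over the documents that matched the keyword, which is why the refinement numbers change with every search.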
You can use Apache Solr, a full-fledged search server built on Lucene that integrates easily into your application through its REST-like HTTP API. It comes out of the box with features such as faceting, caching, scaling, and spell-checking. It is also used by sites such as Netflix, CNET, and AOL, so it is stable, scalable, and battle-tested.
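As a rough sketch of what a faceted request to Solr could look like, the snippet below only builds the query URL. The `facet`, `facet.field`, and `facet.mincount` parameters are standard Solr faceting parameters; the host, the core name `jobs`, and the field names `company` and `job_type` are assumptions for illustration and depend on your own schema.

```python
from urllib.parse import urlencode


def solr_facet_url(base_url, core, keyword, facet_fields, rows=10):
    """Build a Solr /select URL that asks for results plus facet counts."""
    params = [
        ("q", keyword),
        ("rows", rows),
        ("facet", "true"),       # turn faceting on
        ("facet.mincount", 1),   # hide facet values with zero matches
    ]
    # One facet.field parameter per field to count on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance with a core named "jobs".
    print(solr_facet_url("http://localhost:8983", "jobs", "designer", ["company", "job_type"]))
```

Solr returns the facet counts alongside the regular results, so a single request can fill both the result list and the refinement sidebar.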
If you want to dig deeper into how facet-based filtering works, look up bitsets (bit arrays); the approach is described in this article.
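Here is a minimal sketch of the bitset idea, using plain Python integers as bitsets. It assumes one bit per document position and one precomputed bitset per facet value; counting a facet is then a bitwise AND with the search result's bitset followed by a population count. The data and function names are hypothetical, and real engines use more compact and optimized structures.

```python
def build_bitsets(docs, field):
    """For each distinct value of `field`, set bit i if document i has that value."""
    bitsets = {}
    for position, doc in enumerate(docs):
        value = doc[field]
        bitsets[value] = bitsets.get(value, 0) | (1 << position)
    return bitsets


def bits_from_positions(positions):
    """Turn a list of matching document positions into a single bitset."""
    bits = 0
    for position in positions:
        bits |= 1 << position
    return bits


def facet_counts(result_bits, facet_bitsets):
    """Count, per facet value, how many matching documents carry that value."""
    counts = {}
    for value, bits in facet_bitsets.items():
        count = bin(result_bits & bits).count("1")  # popcount of the intersection
        if count:
            counts[value] = count
    return counts


if __name__ == "__main__":
    # Hypothetical documents; position in the list is the document's bit index.
    docs = [
        {"job_type": "Full-time"},
        {"job_type": "Contract"},
        {"job_type": "Part-time"},
        {"job_type": "Full-time"},
        {"job_type": "Contract"},
    ]
    job_type_bits = build_bitsets(docs, "job_type")
    # Pretend the keyword search matched every document except the one at position 2.
    result_bits = bits_from_positions([0, 1, 3, 4])
    print(facet_counts(result_bits, job_type_bits))
```

Because the per-value bitsets are built once at index time, each search only pays for a handful of AND and count operations per facet value, which is what keeps the sidebar fast.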