I have two fields in a MySQL table.
One is a VARCHAR that holds the “headline” of a classified ad (this is a classifieds website).
The other is a TEXT field that contains the “text” of the classified.
Two Questions:
How should I determine how to index these two fields? (Which field type, which classes to use, etc.)
Currently I have an “ad_id” as a unique identifier for each ad, for example “bmw_m3_82398292”.
How can I make SOLR return this identifier whenever a ‘query match’ is found by SOLR?
(The first part of the identifier is actually the headline field's content; the second part is a randomly chosen number.)
Thanks
1. Schema
Your Solr schema is very much determined by your intended search behavior. In your schema.xml file, you’ll see a bunch of choices like “text” and “string”. They behave differently.
The `string` field type is a literal string match. It would operate like `=` in a SQL statement.
The `text_ws` field type does tokenization. However, a big difference in the `text` field type is its filters for stop words, delimiters, and lower-casing. Note that these filters are declared for both the index analyzer and the query analyzer, so when you search a text field, the query terms are run through the same filters to help find a match.
When indexing things like news stories, for example, you probably want to search for company names and headlines differently.
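As a rough illustration of the string-versus-text distinction, here is a toy model in Python. It is not how Solr implements its analyzers; the stop-word list, the function names, and the "every query term must appear" rule are all simplifying assumptions made for this sketch.

```python
import re

# Hypothetical stop-word list; Solr reads its own from a configured file.
STOP_WORDS = {"a", "an", "the", "for", "of"}


def string_match(indexed_value, query):
    """Mimic a `string` field: the whole value must match literally."""
    return indexed_value == query


def analyze(text):
    """Mimic a `text` analyzer: lower-case, split on delimiters, drop stop words."""
    tokens = re.split(r"[^0-9A-Za-z]+", text.lower())
    return {t for t in tokens if t and t not in STOP_WORDS}


def text_match(indexed_value, query):
    """Apply the same analysis to both sides, then require every query term."""
    query_terms = analyze(query)
    return bool(query_terms) and query_terms <= analyze(indexed_value)


headline = "BMW M3 for sale"
print(string_match(headline, "bmw m3"))  # literal comparison fails
print(text_match(headline, "bmw m3"))    # analyzed comparison succeeds
```

The point to take away is that the query goes through the same analysis as the indexed value, which is why a lower-case partial query can still hit a mixed-case headline.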
For example, with `coname` indexed as a `string` field and `headline` as a `text` field, you could run a search like `q=coname:Intel AND headline:(processor specifications)` and retrieve matches hitting exactly Intel stories.
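If you build the request from code, a minimal sketch could look like the following. The host, port, and path are assumptions (a local Solr install at its usual example address), and `build_select_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; adjust the base URL to your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, **params):
    """Return a select URL with the query and any extra parameters URL-encoded."""
    all_params = {"q": query, **params}
    return SOLR_SELECT_URL + "?" + urlencode(all_params)


# Exact match on the string field, analyzed match on the text field.
url = build_select_url(
    "coname:Intel AND headline:(processor specifications)", rows=10
)
print(url)
```

Letting `urlencode` handle the escaping avoids hand-writing `%3A` and `+` in the query string.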
2. Result Fields
You can define a standard set of return fields in your RequestHandler (as an `fl` default in solrconfig.xml).
You may also define the desired fields in your query string, using the `fl` parameter. In your case that would be something like `fl=ad_id`, provided `ad_id` is a stored field.
You can also select ranges in your query terms using the `field:[x TO *]` syntax. If you wanted to select certain ads by their date, you might build a range clause within your query terms. (There are many ways to search ranges; I'm presenting a method that uses integers instead of the Date class.)
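Putting the `fl` parameter and the range syntax together, a query for your ads might be assembled roughly like this. The field name `posted` (an integer holding the date as YYYYMMDD) is purely hypothetical, as are the helper names; only `ad_id`, `fl`, and the `[x TO *]` syntax come from the discussion above.

```python
from urllib.parse import urlencode


def range_clause(field, low=None, high=None):
    """Build a range clause such as posted:[20100101 TO *]; None means open-ended."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"


def ads_query_string(terms, since_yyyymmdd):
    """Query the headline, restrict by date, and ask Solr to return only ad_id."""
    # `posted` is an assumed integer field storing dates as YYYYMMDD.
    q = f"headline:({terms}) AND " + range_clause("posted", low=since_yyyymmdd)
    return urlencode({"q": q, "fl": "ad_id", "rows": 10})


print(ads_query_string("bmw m3", 20100101))
```

Because only `ad_id` is requested, each matching document in the response carries just the identifier, which you can then use to look up the full ad in MySQL.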