I have news-article content which is being indexed using Lucene and queried using Zend_Lucene in PHP.
The content frequently makes reference to UK television channels (e.g. BBC One) but I know that our users will often enter a search term of “BBC 1” or “BBC1” rather than “BBC One”.
Is there any “standard” approach to dealing with this numbers-as-words vs. numbers-as-numerals search issue?
My options seem to be either to amend the search term whenever it contains numbers (for example, changing a search term of “BBC1” to “BBC 1 One” or something similar), or to amend the indexed content so that numerals are converted to words and vice versa, with both versions stored in the index.
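As a rough illustration of the first option (rewriting the query), here is a minimal sketch in Python rather than PHP, so treat it as pseudocode to port. The `NUMBER_WORDS` table and the `expand_query` name are hypothetical. It splits glued letter/digit runs such as “BBC1” and appends the spelled-out word after each known numeral:

```python
import re

# Hypothetical lookup table: extend it for whatever numbers your content uses.
NUMBER_WORDS = {
    "1": "one", "2": "two", "3": "three", "4": "four", "5": "five",
    "6": "six", "7": "seven", "8": "eight", "9": "nine", "10": "ten",
}

def expand_query(query: str) -> str:
    """Rewrite a user query so numerals also match their spelled-out form.

    "BBC1" -> "BBC 1 one": letter and digit runs are split apart, and each
    known numeral is followed by its word equivalent. Punctuation is dropped.
    """
    out = []
    for word in query.split():
        # Separate runs of letters from runs of digits, e.g. "BBC1" -> ["BBC", "1"].
        for part in re.findall(r"[A-Za-z]+|\d+", word):
            out.append(part)
            if part in NUMBER_WORDS:
                out.append(NUMBER_WORDS[part])
    return " ".join(out)
```

Whether “BBC 1 one” then matches “BBC One” depends on how your query parser combines the extra terms (OR vs. AND). The downside of this approach is that it only works in one direction: content that contains the numeral is not found by a query using the word.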
See this Lucene FAQ entry, which suggests using a token filter to make one word an alias of another:
26. How can I make ‘pig’ also match ‘hog’?
That information is fairly old, and there may be more convenient ways to do this nowadays, but it points in the right direction.
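The idea behind that FAQ entry can be sketched as follows. This is Python, not the Zend_Search_Lucene API. `Token`, `ALIASES`, `tokenize`, `alias_filter` and `phrase_match` are all hypothetical names. At index time the filter emits each token and then its aliases at the *same* position (a position increment of 0 in Lucene terms). That way “BBC One”, “BBC 1” and “BBC1” all match the same indexed text, including as phrases:

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class Token:
    text: str
    position: int  # word position in the field, used for phrase matching

# Hypothetical two-way alias table: numerals <-> words.
ALIASES = {
    "1": ["one"], "one": ["1"],
    "2": ["two"], "two": ["2"],
    "3": ["three"], "three": ["3"],
}

def tokenize(text: str) -> Iterator[Token]:
    """Lower-case and split glued letter/digit runs ("bbc1" -> "bbc", "1")."""
    for pos, word in enumerate(re.findall(r"[a-z]+|\d+", text.lower())):
        yield Token(word, pos)

def alias_filter(tokens: Iterable[Token]) -> Iterator[Token]:
    """Emit each token, then any aliases at the same position."""
    for tok in tokens:
        yield tok
        for alias in ALIASES.get(tok.text, []):
            yield Token(alias, tok.position)

def phrase_match(indexed: list[Token], phrase: str) -> bool:
    """True if the phrase's words occur at consecutive positions in `indexed`."""
    at: dict[int, set[str]] = {}
    for tok in indexed:
        at.setdefault(tok.position, set()).add(tok.text)
    words = [t.text for t in tokenize(phrase)]
    return any(
        all(w in at.get(start + i, set()) for i, w in enumerate(words))
        for start in at
    )
```

For example, indexing “Tonight on BBC One” with `list(alias_filter(tokenize(...)))` stores both “one” and “1” at position 3, so `phrase_match` accepts “BBC 1”, “BBC1” and “BBC One” but rejects “BBC Two”. Doing the expansion at index time costs a slightly larger index, but queries need no rewriting.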