I’ve got a UUID field I’m adding to my document in the following format: 372d325c-e01b-432f-98bd-bc4c949f15b8. However, when I try to query for documents by the UUID it will not return them no matter how I try to escape the expression. For example:
+uuid:372d325c-e01b-432f-98bd-bc4c949f15b8
+uuid:"372d325c-e01b-432f-98bd-bc4c949f15b8"
+uuid:372d325c\-e01b\-432f\-98bd\-bc4c949f15b8
+uuid:(372d325c-e01b-432f-98bd-bc4c949f15b8)
+uuid:("372d325c-e01b-432f-98bd-bc4c949f15b8")
And even skipping the QueryParser altogether using TermQuery like so:
new TermQuery(new Term("uuid", uuid.toString()))
Or
new TermQuery(new Term("uuid", QueryParser.escape(uuid.toString())))
None of these searches will return a document, but if I search for portions of the UUID it will return a document. For example these will return something:
+uuid:372d325c
+uuid:e01b
+uuid:432f
What should I do to index these documents so I can pull them back by their UUID? I’ve considered reformatting the UUID to remove the hyphens, but I haven’t implemented it yet.
The only way I got this to work is to use WhitespaceAnalyzer instead of StandardAnalyzer. Then using a TermQuery like so:
Then searching:
WhitespaceAnalyzer prevented Lucene from splitting apart the UUID by the hyphens. Another option could be to eliminate the dashes from the UUID, but using the WhitespaceAnalyzer works just as well for my purposes.