I spent several hours reading through docs and forums, trying to find a solution for the following problem:
In a MongoDB database, I have a collection with some unstructured data:
{"data" : "some data" , "_id" : "497ce96f395f2f052a494fd4"}
{"more_data" : "more data here" , "recursive_data" : {"some_data" : "even more data here" , "_id" : "497ce96f395f2f052a4323"}}
{"more_unknown_data" : "string or even dictionaries" , "_id" : "497ce96f395f2f052a494fsd2"}
…
The catch is that the elements in this collection don’t have a predefined structure, and they can be nested to any number of levels.
My goal is to create a query that searches through the collection and finds all the elements that match a regular expression (in both the keys and the values).
For example, given the regex ‘^even more’, it should return all the elements that contain a string starting with "even more" somewhere in the structure. In this case, that would be the second one.
Simply add an array to each object and populate it with the strings you want to be able to search on. Typically I’d lowercase those values to make case-insensitive search easy.
e.g. Tags : ["copy of string 1", "copy of string 2", …]
You can extend this technique to index every word of every element. Sometimes I also add the value with its field name in front of it, e.g. “genre:rock”, which allows searches for values in specific fields (choose the ‘:’ separator character carefully, so it can’t be confused with the data itself).
Add an index on this array, and you can search for any word or phrase in any document in the collection, or search for “genre:rock” to match that value in a specific field.
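Here is a rough sketch in Python of how that array could be built for arbitrarily nested documents. It is only an illustration under a few assumptions: the helper names (collect_tags, with_tags, matches) are made up for this example, "_id" keys are skipped on the guess that you don't want to search on them, and matches() only imitates in memory what a regex query against the array field would do on the server.

```python
import re


def collect_tags(value, field=None):
    """Recursively gather lowercased keys and string values from a document.

    String values are added twice: once as-is and once prefixed with the
    name of the field that holds them ("field:value").
    """
    tags = set()
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "_id":  # assumption: ids are not meant to be searchable
                continue
            tags.add(key.lower())
            tags |= collect_tags(child, key)
    elif isinstance(value, list):
        for item in value:
            tags |= collect_tags(item, field)
    elif isinstance(value, str):
        text = value.lower()
        tags.add(text)
        if field is not None:
            tags.add(f"{field.lower()}:{text}")
    return tags


def with_tags(doc):
    """Return a copy of doc with a "Tags" array added, ready to be stored."""
    enriched = dict(doc)
    enriched["Tags"] = sorted(collect_tags(doc))
    return enriched


def matches(doc, pattern):
    """In-memory stand-in for a regex query on the Tags array:
    the document matches if any single tag matches the pattern."""
    rx = re.compile(pattern)
    return any(rx.search(tag) for tag in doc["Tags"])


first = with_tags({"data": "some data", "_id": "497ce96f395f2f052a494fd4"})
second = with_tags({
    "more_data": "more data here",
    "recursive_data": {
        "some_data": "even more data here",
        "_id": "497ce96f395f2f052a4323",
    },
})

hits = [doc for doc in (first, second) if matches(doc, r"^even more")]
```

With PyMongo, the server-side equivalent would be something along the lines of collection.create_index("Tags") followed by collection.find({"Tags": {"$regex": "^even more"}}), where collection is whatever collection object you already have. Since the tags are lowercased when stored, lowercase the search pattern as well.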