How can I search and filter across many XML files (about 200K files, each roughly 20-40 KB)? This has to run inside a web app, so I need the fastest method possible.
I need to search not only for a specified XML tag, but also:
- filter the results based on the content of some tags (e.g. display only files that are newer than <created>timestamp</created>)
- or do a full-text search in some tags, like <content>full text here</content>
- the data must stay in the XML files (I can only use some sort of caching)
My thoughts on a working solution so far:
- using something like XML::Simple or XML::Twig in a loop over 200K files is slow, therefore
- I need to cache them for fast access, i.e. convert all XML files into some “DB” (probably NoSQL)
- What should I look for and learn? Does MongoDB, or something else, have good Perl support for this search/filter task?
- What should I avoid or beware of, and what do I need to pay attention to?
I am not sure about this, but I think you are looking for some kind of XML Database.
If the above doesn’t fit your needs, you could always parse your new/updated XML files, store them as indexed documents in a Sphinx server, and serve documents from that index when your users search. It is really fast and works smoothly with millions of documents.
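
To make the "parse only new/updated files, keep an index, search the index" idea concrete, here is a minimal sketch. It is not Sphinx and not Perl: it uses Python with the standard-library sqlite3 module as a stand-in cache, purely to illustrate the flow. The table layout, the function names (open_index, index_file, refresh, search) and the assumption that <created> holds an integer Unix timestamp are all mine, not something stated in the question. Deleted files are not handled.

```python
import os
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical cache layout: one row per file, plus a simple inverted index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path    TEXT PRIMARY KEY,
    mtime   REAL NOT NULL,
    created INTEGER
);
CREATE INDEX IF NOT EXISTS files_created ON files (created);
CREATE TABLE IF NOT EXISTS terms (
    term TEXT NOT NULL,
    path TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS terms_term ON terms (term);
"""


def open_index(db_path):
    """Open (or create) the cache database."""
    db = sqlite3.connect(db_path)
    db.executescript(SCHEMA)
    return db


def index_file(db, path):
    """Parse one XML file and (re)store its searchable fields."""
    root = ET.parse(path).getroot()
    # Assumption: <created> is an integer Unix timestamp.
    created = root.findtext(".//created")
    content = root.findtext(".//content") or ""
    words = set(re.findall(r"\w+", content.lower()))
    db.execute("DELETE FROM terms WHERE path = ?", (path,))
    db.execute(
        "INSERT OR REPLACE INTO files (path, mtime, created) VALUES (?, ?, ?)",
        (path, os.path.getmtime(path), int(created) if created else None),
    )
    db.executemany(
        "INSERT INTO terms (term, path) VALUES (?, ?)",
        [(w, path) for w in words],
    )


def refresh(db, xml_dir):
    """Reindex only files that are new or whose mtime changed."""
    known = dict(db.execute("SELECT path, mtime FROM files"))
    changed = 0
    for name in os.listdir(xml_dir):
        if not name.endswith(".xml"):
            continue
        path = os.path.join(xml_dir, name)
        if known.get(path) != os.path.getmtime(path):
            index_file(db, path)
            changed += 1
    db.commit()
    return changed


def search(db, word, newer_than=0):
    """Return paths whose <content> contains word and whose <created> is newer."""
    rows = db.execute(
        "SELECT f.path FROM files f JOIN terms t ON t.path = f.path "
        "WHERE t.term = ? AND f.created > ? ORDER BY f.created DESC",
        (word.lower(), newer_than),
    )
    return [r[0] for r in rows]
```

The web app would then call something like search(db, "word", newer_than=some_timestamp) per request, while refresh(db, xml_dir) runs periodically or after uploads, so the XML files stay the source of truth and the database is only a cache. A dedicated engine such as Sphinx would replace the terms table with a proper full-text index (stemming, ranking, phrase search), which this sketch does not attempt.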