I’m trying to use Sphinx Search Server to index a really huge file (around 14 GB).
The file is whitespace separated, one entry per line.
To use it with Sphinx, I need to provide an XML file to the Sphinx server.
How can I do that without killing my computer?
What is the best strategy? Should I split the main file into several smaller files, and if so, what’s the best way to do it?
Note: I’m doing it in Ruby, but I’m totally open to other hints.
Thanks for your time.
I hate it when people don’t post the solution after asking a question, so I’ll try not to be one of them. Hopefully it will help somebody.
I added a simple reader method to the File class, then used it to loop over the file with a chunk size of my choice. It’s quite simple, and it works like a charm with Sphinx.
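A reader of that kind could look roughly like the sketch below. The method name `each_chunk` and the 1 MB default are assumptions made for illustration, not necessarily the original code. It relies only on `IO#read(length)`, which returns `nil` at end of file, so only one chunk is in memory at a time.

```ruby
# Hypothetical helper: the name each_chunk and the 1 MB default are assumptions.
class File
  # Yields the file's contents in pieces of at most chunk_size bytes.
  # Returns an Enumerator when called without a block.
  def each_chunk(chunk_size = 1024 * 1024)
    return enum_for(:each_chunk, chunk_size) unless block_given?

    # read(length) returns nil once the end of the file is reached.
    while (chunk = read(chunk_size))
      yield chunk
    end
  end
end
```

Note that a chunk boundary can fall in the middle of a line, so code that works on whole entries has to carry the unfinished tail of one chunk over to the next.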
Then just use it like this:
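One possible usage, sketched under assumptions: each line becomes one document in Sphinx's xmlpipe2 format, the document id is a running counter, and the single full-text field is called `content`. The names `each_chunk`, `write_xmlpipe2` and `sphinx_document` are hypothetical, and the helper is defined inside the snippet so it runs on its own.

```ruby
# Hypothetical helper, defined here so the snippet runs on its own.
class File
  def each_chunk(chunk_size = 1024 * 1024)
    while (chunk = read(chunk_size))
      yield chunk
    end
  end
end

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Builds one xmlpipe2 document element; the field name "content" is an assumption.
def sphinx_document(id, text)
  escaped = text.gsub(/[&<>"]/, XML_ESCAPES)
  %(<sphinx:document id="#{id}"><content>#{escaped}</content></sphinx:document>)
end

# Streams every line of the file at path to out as an xmlpipe2 docset,
# holding only one chunk plus one unfinished line in memory.
def write_xmlpipe2(path, out = $stdout, chunk_size = 1024 * 1024)
  out.puts '<?xml version="1.0" encoding="utf-8"?>'
  out.puts '<sphinx:docset>'
  out.puts '<sphinx:schema><sphinx:field name="content"/></sphinx:schema>'

  id = 0
  leftover = String.new # unfinished line carried over between chunks
  File.open(path, 'rb') do |file|
    file.each_chunk(chunk_size) do |chunk|
      leftover << chunk
      lines = leftover.split("\n", -1)
      leftover = lines.pop # last piece may be an incomplete line
      lines.each { |line| out.puts sphinx_document(id += 1, line) }
    end
  end
  # A final line without a trailing newline is still a document.
  out.puts sphinx_document(id += 1, leftover) unless leftover.empty?
  out.puts '</sphinx:docset>'
end
```

Calling `write_xmlpipe2('entries.txt')` (file name hypothetical) prints the docset to standard output, which is the form Sphinx expects from the `xmlpipe_command` of an `xmlpipe2` source, so the 14 GB of XML never has to exist as a file or sit in memory.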