I’m using the Linux split command to split huge XML files into node-sized ones. The problem is that I now have a directory with hundreds of thousands of files.
I want a way to get a file from the directory (to pass to another process for import into our database) without needing to list everything in it. Is this how Dir.foreach already works? Any other ideas?
You can use Dir.glob to find the files you need. The documentation has more details, but basically, you pass it a pattern like Dir.glob 'dir/*.rb' and get back filenames matching that pattern. I assume it’s done in a reasonably good way, but it will depend on your platform and implementation.

As to Dir.foreach, this should be efficient too. The concern would be if it had to process the entire directory for every pass around the loop, but that would be an awful implementation, and is not the case.
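
For what it's worth, here is a minimal sketch of grabbing a single file with Dir.foreach and stopping as soon as one is found. The helper name next_file and the chunk_aa / chunk_ab filenames are made up for illustration, and it assumes the directory holds plain files, skipping the "." and ".." entries and any subdirectories.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: return the path of the first regular file found
# in +dir+, or nil if there is none. Dir.foreach yields one entry at a
# time, so returning from inside the block ends the scan early.
def next_file(dir)
  Dir.foreach(dir) do |name|
    next if name == "." || name == ".."   # skip the self/parent entries
    path = File.join(dir, name)
    return path if File.file?(path)       # ignore subdirectories
  end
  nil
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "chunk_aa"))
  FileUtils.touch(File.join(dir, "chunk_ab"))
  puts next_file(dir)   # prints one of the two paths; order is not guaranteed
end
```

If you go this way, you would probably want to move or delete each file once it has been imported, so that the next call picks up a different one. By contrast, Dir.glob without a block returns an array of every match, so it may be the less attractive option for a directory this large.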