I have written a c++ prog to query a 100 GB dictionary. I have

Question


Asked: June 11, 2026


I have written a C++ program to query a 100 GB dictionary. I have split the dictionary into n files of equal size. All split-files are placed in the same directory. The dictionary is fully indexed, i.e., once a query comes in I know which split-file to open and where to seek. My question is, for better performance, which split will be better:
(a) a small number of large files, or (b) a large number of small files?
Also, what would be an ideal split?


1 Answer

Editorial Team · Answer 1 · 2026-06-11T16:13:54+00:00

I don’t think there’s a direct answer to that question; only experimenting can tell you. The cost of opening a file for reading should be roughly constant regardless of its size; the cost of reading the contents of the file is then, of course, dependent on the file size.
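One way to run that experiment is to time a full open, seek, read, and close cycle against split-files of different sizes. The following is only a minimal sketch: the struct `TimedRead` and the function `timed_open_seek_read` are hypothetical names made up for illustration, and a real measurement would need many repetitions and would have to account for the operating system's file cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Result of one timed lookup: the bytes read and the elapsed wall time.
// (Hypothetical helper type, not part of the original program.)
struct TimedRead {
    std::vector<char> bytes;
    std::chrono::nanoseconds elapsed;
};

// Opens `path`, seeks to `offset`, reads up to `length` bytes and closes
// the file again, timing the whole sequence. If the file cannot be opened,
// the returned byte vector is empty.
TimedRead timed_open_seek_read(const std::string& path,
                               std::streamoff offset,
                               std::size_t length) {
    assert(offset >= 0);
    const auto start = std::chrono::steady_clock::now();

    std::vector<char> buffer(length);
    std::size_t got = 0;
    std::ifstream in(path, std::ios::binary);
    if (in) {
        in.seekg(offset);
        in.read(buffer.data(), static_cast<std::streamsize>(length));
        got = static_cast<std::size_t>(in.gcount());  // bytes actually read
    }
    buffer.resize(got);
    in.close();

    const auto stop = std::chrono::steady_clock::now();
    return {std::move(buffer),
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Calling this in a loop over random (file, offset) pairs taken from the index, once per candidate split size, and comparing the summed `elapsed` values would give a rough picture of which split behaves better on a given machine.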

There are other hints, though.
I will assume that when you get a query, you open the file, read/parse it completely or until you find the word, then close the file and return the result. In that case there are several improvements you can make; maybe you already have them, maybe not, but here goes:

If you get a lot of queries, opening files can be expensive; in this case you might need to cache your files, or your search queries, for better performance.
When you open a file and read it, you are doing so sequentially, which means that more or less the whole file is being loaded into memory. I once came across a SAX XML parser for Java, which is able to load only the desired chunks of XML into memory, for handling really huge XML files; maybe there’s something similar for C++ (see the SAX project).

Check when a file is loaded into memory.
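The two hints above (keep files open between queries, and read only the part you need) can be combined. Below is a minimal sketch, assuming the index yields a file path, a byte offset and a record length for each query. The class name `FileHandleCache` and its members are hypothetical, and the least-recently-used eviction policy is just one reasonable choice, not something the original program is known to use.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Keeps up to `capacity` split-files open and evicts the least recently
// used one when a new file has to be opened. (Hypothetical helper class.)
class FileHandleCache {
public:
    explicit FileHandleCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Seeks to `offset` in `path` and reads up to `length` bytes.
    // Returns the bytes read; empty if the file cannot be opened.
    std::string read_at(const std::string& path, std::streamoff offset,
                        std::size_t length) {
        std::ifstream* in = acquire(path);
        if (in == nullptr) return {};
        in->clear();  // reset EOF/fail state left by an earlier short read
        in->seekg(offset);
        std::string record(length, '\0');
        in->read(&record[0], static_cast<std::streamsize>(length));
        record.resize(static_cast<std::size_t>(in->gcount()));
        return record;
    }

    // Number of files currently held open.
    std::size_t open_count() const { return entries_.size(); }

private:
    using Entry = std::pair<std::string, std::ifstream>;

    std::ifstream* acquire(const std::string& path) {
        auto found = index_.find(path);
        if (found != index_.end()) {
            // Cache hit: move the entry to the front (most recently used).
            entries_.splice(entries_.begin(), entries_, found->second);
            return &found->second->second;
        }
        std::ifstream in(path, std::ios::binary);
        if (!in) return nullptr;
        if (entries_.size() == capacity_) {
            index_.erase(entries_.back().first);
            entries_.pop_back();  // the stream's destructor closes the file
        }
        entries_.emplace_front(path, std::move(in));
        index_[path] = entries_.begin();
        return &entries_.front().second;
    }

    std::size_t capacity_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a cache like this, a query becomes a single `cache.read_at(path, offset, length)` call, and only the requested bytes are copied into the program's own buffer. The capacity has to stay below the per-process limit on open files, which is one practical argument against a very large number of very small split-files.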

A totally different approach would be to use a database with an index. This way you don’t have to deal with file-opening problems yourself.
