I have a 1.7G file with the following format:
String Long String Long String Long String Long ... etc
Essentially, each String is a key and each Long is a value in a HashMap that I want to initialise before running anything else in my application.
My current code is:
    RandomAccessFile raf = new RandomAccessFile("/home/map.dat", "r");
    raf.seek(0);
    while (raf.getFilePointer() != raf.length()) {
        String name = raf.readUTF();
        long offset = raf.readLong();
        map.put(name, offset);
    }
This takes about 12 minutes to complete. I'm sure there are better ways of doing this, so I would appreciate any help or pointers.
thanks
Update, following EJP's suggestion:
EJP, thank you for your suggestion. I hope this is what you meant; correct me if it is wrong.
    DataInputStream dis = null;
    try {
        dis = new DataInputStream(new BufferedInputStream(new FileInputStream("/home/map.dat")));
        while (true) {
            String name = dis.readUTF();
            long offset = dis.readLong();
            map.put(name, offset);
        }
    } catch (EOFException eofe) {
        try {
            dis.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
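For what it's worth, the buffered loop above could be tidied along the lines of the sketch below. It assumes the same file layout (a `writeUTF` string followed by a `writeLong` value, repeated). The names `MapLoader`, `load` and `expectedEntries`, and the 64 KB buffer size, are made up for illustration. Try-with-resources closes the stream on every exit path, not only on EOF, and pre-sizing the map avoids repeated rehashing while it fills.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class MapLoader {
    // expectedEntries is a hypothetical sizing hint, not something the file format provides.
    static Map<String, Long> load(Path file, int expectedEntries) throws IOException {
        // Capacity chosen so expectedEntries fit under the default 0.75 load factor.
        Map<String, Long> map = new HashMap<>((int) (expectedEntries / 0.75f) + 1);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file), 1 << 16))) {
            while (true) {
                String name;
                try {
                    name = in.readUTF();
                } catch (EOFException endOfData) {
                    // EOF while reading a key is treated as the end of the data
                    // (this also covers a file truncated inside a key).
                    break;
                }
                // EOF here means a key without its value; the exception propagates.
                map.put(name, in.readLong());
            }
        }
        return map;
    }
}
```

Usage would be something like `Map<String, Long> map = MapLoader.load(Paths.get("/home/map.dat"), 50_000_000);`, where the entry count is a guess. This still creates one String and one Long per record, so it only trims the I/O overhead.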
I would construct the file so it can be used in place, i.e. without loading it this way. As you have variable-length records, you can construct an array of the location of each record, then place the keys in order so you can perform a binary search for the data. (Or you can use a custom hash table.) You can then wrap this with methods which hide the fact that the data is actually stored in a file instead of being turned into data objects.
If you do all this the “load” phase becomes redundant and you won’t need to create so many objects.
This is a long example, but it hopefully shows what is possible.
It generates 2 GB of raw data and performs a million lookups. It is written in such a way that the loading and lookup use very little heap (<< 1 MB).
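A far smaller sketch of the same in-place idea (not the 2 GB benchmark described above) might look like the following. Everything specific in it is an assumption made for illustration: the class name `SortedFileMap`, the file layout (an `int` record count, then one `int` offset per record, then records of `short` key length, UTF-8 key bytes and a `long` value, sorted by key bytes), and the `missing` sentinel. It maps the whole file in one go, so it only works for files under 2 GB, and the `write` helper builds the file on the heap, which is fine for a demo but would need to be streamed for real data. It needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class SortedFileMap {
    private final ByteBuffer buf; // whole file, memory-mapped read-only (off heap)
    private final int count;      // number of records

    SortedFileMap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        count = buf.getInt(0);
    }

    // Binary search over the offset table; returns `missing` if the key is absent.
    long get(String key, long missing) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int rec = buf.getInt(4 + 4 * mid);     // start of record `mid`
            int len = buf.getShort(rec) & 0xFFFF;  // key length in bytes
            int cmp = compare(k, rec + 2, len);
            if (cmp == 0) return buf.getLong(rec + 2 + len);
            if (cmp < 0) hi = mid - 1; else lo = mid + 1;
        }
        return missing;
    }

    // Unsigned lexicographic comparison of k with the stored bytes at [pos, pos + len).
    private int compare(byte[] k, int pos, int len) {
        int n = Math.min(k.length, len);
        for (int i = 0; i < n; i++) {
            int c = (k[i] & 0xFF) - (buf.get(pos + i) & 0xFF);
            if (c != 0) return c;
        }
        return k.length - len;
    }

    // Demo-only writer: sorts by UTF-8 key bytes and builds the file in memory.
    // Keys must encode to fewer than 65536 bytes and values must be non-null.
    static void write(Path file, Map<String, Long> entries) throws IOException {
        List<Map.Entry<byte[], Long>> recs = new ArrayList<>();
        int size = 4 + 4 * entries.size();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            recs.add(Map.entry(k, e.getValue()));
            size += 2 + k.length + 8;
        }
        recs.sort((a, b) -> Arrays.compareUnsigned(a.getKey(), b.getKey()));
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(recs.size());
        int pos = 4 + 4 * recs.size();             // first record follows the offset table
        for (Map.Entry<byte[], Long> r : recs) {
            out.putInt(pos);
            pos += 2 + r.getKey().length + 8;
        }
        for (Map.Entry<byte[], Long> r : recs) {
            out.putShort((short) r.getKey().length).put(r.getKey()).putLong(r.getValue());
        }
        Files.write(file, out.array());
    }
}
```

Opening the map is just a `FileChannel.map` call, so there is no load phase to speak of. Each lookup allocates only the encoded key, and the operating system pages in the parts of the file that are actually touched.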
Using a hash table would make each lookup faster, as it is O(1) instead of O(log N), but it is more complex to implement.