I found out that the memory growth in my program is caused by the code below. I am reading a file that is about 7GB, and I believe the data that would be stored in the HashSet is less than 10M, but my program's memory keeps increasing to 300MB and then it crashes with an OutOfMemoryError. If the HashSet is the problem, which data structure should I choose?
```java
if (tagsStr != null) {
    if (tagsStr.contains("a") || tagsStr.contains("b") || tagsStr.contains("c")) {
        maTable.add(postId);
    }
} else {
    if (maTable.contains(parentId)) {
        // do something else; no memory is added here
    }
}
```
You’ve either got a memory leak or your understanding of the amount of string data that you are storing is incorrect. We can’t tell which without seeing more of your code.
The scientific solution is to run your application using a memory profiler, and analyze the output to see which of your data structures is using an unexpectedly large amount of memory.
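As one way to get data for that analysis, here is a minimal sketch, assuming a HotSpot-based JVM (OpenJDK or Oracle), that writes a heap dump from inside the program via `HotSpotDiagnosticMXBean.dumpHeap`. The resulting `.hprof` file can be opened in a heap analyser such as VisualVM or Eclipse MAT to see which objects dominate memory. The `HeapDumpSketch` class, its `dumpHeap` helper and the file name are hypothetical. HotSpot can also write a dump automatically when the error occurs if started with `-XX:+HeapDumpOnOutOfMemoryError`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumpSketch {
    // Writes a heap dump to 'path' (which should end in ".hprof").
    // liveOnly = true dumps only reachable objects, i.e. what is actually leaking.
    // Assumes a HotSpot-based JVM, where HotSpotDiagnosticMXBean is available.
    public static void dumpHeap(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        File f = new File(path);
        // dumpHeap fails if the output file already exists, so remove any old dump
        if (f.exists() && !f.delete()) {
            throw new IOException("cannot delete existing dump " + path);
        }
        bean.dumpHeap(path, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // e.g. call this after processing part of the input file
        dumpHeap("partial-run.hprof", true);
    }
}
```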
If I had to guess, I'd say your application (at some level) is doing something like this:
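A minimal sketch of that pattern, assuming a hypothetical tab-separated line format (`<id>\t<tag>\t<long body>`) and a hypothetical `readTags` method; only the names `line`, `tagStr` and the `substring(...)` call come from the explanation that follows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class SubstringRetention {
    // Hypothetical input format: "<id>\t<tag>\t<long post body>"
    public static Set<String> readTags(BufferedReader reader) throws IOException {
        Set<String> tags = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int start = line.indexOf('\t') + 1;
            int end = line.indexOf('\t', start);
            if (start == 0 || end < 0) {
                continue; // skip lines that don't match the hypothetical format
            }
            // On Java 6 / early Java 7, this short substring shares line's
            // entire char[], so every stored tag keeps the whole line alive.
            String tagStr = line.substring(start, end);
            tags.add(tagStr);
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        String input = "1\ta\ta very long post body ...\n2\tb\tanother long body ...\n";
        Set<String> tags = readTags(new BufferedReader(new StringReader(input)));
        System.out.println(tags); // the set holds the two short tags
    }
}
```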
This uses a lot more memory than you'd expect. The `substring(...)` call creates a `tagStr` object that refers to the backing array of the original `line` string. Your tag strings, which you expect to be short, actually refer to a `char[]` object that holds all the characters in the original line.

The fix is to do this:
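A minimal sketch of the fix, using the same hypothetical `<id>\t<tag>\t<body>` format; `SubstringCopyFix` and `extractTag` are illustrative names, and the only essential part is wrapping the substring in `new String(...)` before storing it.

```java
import java.util.HashSet;
import java.util.Set;

public class SubstringCopyFix {
    // Hypothetical helper: returns the tag between the first two tabs
    // without pinning the whole line in memory.
    public static String extractTag(String line) {
        int start = line.indexOf('\t') + 1;
        int end = line.indexOf('\t', start);
        if (start == 0 || end < 0) {
            return null; // line does not match the hypothetical format
        }
        // On Java 6, new String(String) allocates a right-sized char[] when the
        // argument is a view onto a larger array, so the line can be collected.
        // On Java 7u6+, substring(...) already copies and this wrapper is redundant.
        return new String(line.substring(start, end));
    }

    public static void main(String[] args) {
        Set<String> tags = new HashSet<>();
        String tag = extractTag("42\tjava\ta very long post body ...");
        if (tag != null) {
            tags.add(tag);
        }
        System.out.println(tags.contains("java")); // true
    }
}
```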
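This creates a `String` whose backing array holds only the substring's characters, instead of one that shares the original line's much larger array. (The sharing behaviour applies to Java 6 and early Java 7 releases; since Java 7 update 6, `substring(...)` copies the characters itself, so the extra `new String(...)` is no longer needed there.)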
UPDATE – this or something similar is an increasingly likely explanation … given your latest data.
To expand on another of Jon Skeet's points, the overhead of a small `String` is surprisingly high. For instance, on a typical 32-bit JVM, the memory usage of a one-character `String` is:
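Assuming the Java 6 `String` layout (fields `value`, `offset`, `count` and `hash`), 4-byte words, 2-word object headers and 8-byte object alignment, the words break down roughly like this:

$$
\underbrace{2 + 4}_{\text{String: header + 4 fields}} \;+\; \underbrace{2 + 1 + 1}_{\text{char[1]: header + length + padded data}} \;=\; 10 \text{ words} \;=\; 10 \times 4 \text{ bytes} \;=\; 40 \text{ bytes}
$$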
Total: 10 words (40 bytes) to hold one `char` of data, or one `byte` of data if your input is in an 8-bit character set.

(This is not sufficient to explain your problem, but you should be aware of it anyway.)