Here is my code:
public void mapTrace(String Path) throws FileNotFoundException, IOException {
    FileReader arq = new FileReader(new File(Path));
    BufferedReader leitor = new BufferedReader(arq, 41943040);
    Integer page;
    String std;
    Integer position = 0;
    while ((std = leitor.readLine()) != null) {
        position++;
        page = Integer.parseInt(std, 16);
        LinkedList<Integer> values = map.get(page);
        if (values == null) {
            values = new LinkedList<>();
            map.put(page, values);
        }
        values.add(position);
    }
    for (LinkedList<Integer> referenceList : map.values()) {
        Collections.reverse(referenceList);
    }
}
This is the HashMap structure:
Map<Integer, LinkedList<Integer>> map = new HashMap<>();
For 50 MB to 100 MB trace files I don’t have any problem, but for bigger files I get:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
I don’t know whether the reverse method is increasing the memory use, whether LinkedList uses more space than other List implementations, or whether the way I’m adding the lists to the map takes more space than it should. Can anyone tell me what’s using so much space?
The short answer is that the space is probably being consumed by the overheads of the data structure you have chosen.
By my reckoning, a LinkedList<Integer> on a 64 bit JVM uses about 48 bytes of storage per integer in the list, including the integers themselves.
By my reckoning, a Map<?, ?> on a 64 bit machine will use in the region of 48 bytes of storage per entry, excluding the space needed to represent the key and the value objects.
Now, your trace size estimates are rather too vague for me to plug the numbers in, but I’d expect a 1.5Gb trace file to need a LOT more than 2Gb of heap.
Given the numbers you’ve provided, a reasonable rule-of-thumb is that a trace file will occupy roughly 10 times its file size in heap memory … using the data structure that you are currently using.
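To make the arithmetic behind these estimates concrete, here is a rough sketch. The two 48-byte constants are the per-element estimates given above, not measurements, and the reference and page counts in main are made-up illustrative values rather than numbers from your trace.

```java
public class HeapEstimate {
    // Estimated cost of one LinkedList<Integer> element (node plus boxed Integer), per the answer.
    static final long BYTES_PER_LIST_ELEMENT = 48;
    // Estimated cost of one map entry, excluding the key and value objects, per the answer.
    static final long BYTES_PER_MAP_ENTRY = 48;

    // One list element per trace line, one map entry per distinct page.
    static long estimateHeapBytes(long references, long distinctPages) {
        return references * BYTES_PER_LIST_ELEMENT + distinctPages * BYTES_PER_MAP_ENTRY;
    }

    public static void main(String[] args) {
        long references = 100_000_000L;  // hypothetical: number of lines in the trace
        long distinctPages = 1_000_000L; // hypothetical: number of distinct page numbers
        long bytes = estimateHeapBytes(references, distinctPages);
        System.out.println("Estimated heap bytes: " + bytes);
    }
}
```

The list elements dominate: unless almost every page is referenced only once, the map entries are a small fraction of the total.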
You don’t want to configure a JVM to try to use more memory than the physical RAM available. Otherwise, you are liable to push the machine into thrashing … and the operating system is liable to start killing processes. So for an 8Gb machine, I wouldn’t advise going over -Xmx8g.
Putting that together, with an 8Gb machine you should be able to cope with a 600Mb trace file (assuming my estimates are correct), but a 1.5Gb trace file is not feasible. If you really need to handle trace files that big, my advice would be to either:
design and implement custom collection types for your specific use-case that use memory more efficiently,
rethink your algorithms so that you don’t need to hold the entire trace files in memory, or
get a bigger machine.
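As an illustration of the first option, here is a minimal sketch of a more compact structure. CompactTraceMap and IntList are hypothetical names invented for this example; the idea is simply to store positions in a growable int[] instead of a LinkedList<Integer>, so each position costs about 4 bytes (plus some slack from array growth) rather than a node and a boxed Integer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CompactTraceMap {
    /** Growable list of primitive ints: no per-element node object, no boxing. */
    static final class IntList {
        private int[] data = new int[4];
        private int size = 0;

        void add(int value) {
            if (size == data.length) {
                // Doubling keeps appends amortized O(1); a sketch, so no overflow handling.
                data = Arrays.copyOf(data, size * 2);
            }
            data[size++] = value;
        }

        int size() {
            return size;
        }

        int get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index + ", size " + size);
            }
            return data[index];
        }

        /** Reads from the end, so no separate Collections.reverse pass is needed. */
        int getFromEnd(int index) {
            return get(size - 1 - index);
        }
    }

    private final Map<Integer, IntList> map = new HashMap<>();

    /** Records that the given page was referenced at the given trace position. */
    void record(int page, int position) {
        map.computeIfAbsent(page, k -> new IntList()).add(position);
    }

    /** Returns the positions for a page, or null if the page never appeared. */
    IntList positionsOf(int page) {
        return map.get(page);
    }
}
```

In the loop from the question, values.add(position) would become a call to record(page, position). The map itself still boxes its keys, so this only removes the per-reference overhead, which is where most of the estimated space goes.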
The -Xmx14g option sets the maximum heap size. Based on the observed behaviour, I expect that the JVM didn’t need anywhere near that much memory … and didn’t request it from the OS. And if you’d looked at memory usage in the task manager, I expect you’d have seen numbers consistent with that.
Yes, that is what it does.
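If you want to see the difference between the -Xmx ceiling and what the JVM has actually claimed, the standard Runtime methods report both. This is a small sketch; HeapReport is a name made up for the example, and the numbers it prints will vary by JVM and by run.

```java
public class HeapReport {
    /** Upper bound the heap may grow to (what -Xmx controls). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently reserved by the JVM; can be far below the maximum. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Portion of the committed heap that is currently occupied by objects (including garbage). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap: " + committedHeapBytes() / mb + " MB");
        System.out.println("used heap:      " + usedHeapBytes() / mb + " MB");
    }
}
```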
Yes, each page of your process’s virtual address space corresponds to a page on the hard disc.
If you’ve got more virtual pages than physical memory pages, at any given time some of those virtual memory pages will live on disc only. When your application tries to use one of those non-resident pages, the VM hardware generates an interrupt, and the operating system finds an unused physical page, populates it from the disc copy, and then hands control back to your program. But if your application is busy, the operating system will have had to free up that physical memory page by evicting another page. And that may have involved writing the contents of the evicted page to disc.
The net result is that when you try to use significantly more virtual address pages than you have physical memory, the application generates lots of interrupts that result in lots of disc reads and writes. This is known as thrashing. If your system thrashes too badly, it will spend most of its time waiting for disc reads and writes to finish, and performance will drop dramatically. And on some operating systems, the OS will attempt to “fix” the problem by killing processes.