I am busy with an exercise to plot nodes on a pane. My first goal is to work with 1 million nodes and then ramp it up to 15 million.
I have a custom Graph object to which I can add edges and nodes. Each node object has an ellipse that I can call and plot, and the same goes for the edge objects.
At the moment I have a function that generates a random position for the nodes.
I am using a scroll pane at the moment to enable panning around the pane and to view all the nodes.
What I thought was a good idea was to use a pair of hashmaps, one keyed by the floored x position and one by the floored y position:
Map<String, ArrayList<Node>> mapX = new HashMap<String, ArrayList<Node>>();
Map<String, ArrayList<Node>> mapY = new HashMap<String, ArrayList<Node>>();
I use the following code to add nodes to the hashmap:
int tempXFloor = (int)Math.floor(tempX);
ArrayList<Node> tempList = mapX.get(tempXFloor+"");
if(tempList == null){
    tempList = new ArrayList<>();
}
tempList.add(node);
mapX.put(tempXFloor+"",tempList);
Then, while I am panning, I get the current position, floor it and check whether an entry exists in the map. If an entry exists, I add all the nodes in its ArrayList to nodesOnScreen.
nodesOnScreen is an ArrayList. I add nodes to it while I am panning, and likewise remove the nodes that have moved off the screen.
I only plot the nodes that are in nodesOnScreen.
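The bucket lookup described above can be sketched as a single two-dimensional grid rather than separate mapX and mapY maps. This is only a minimal sketch under stated assumptions: the class name GridIndex, the cellSize parameter and the use of integer node ids in place of Node objects are all hypothetical and not taken from the code above. It also swaps the String keys for one packed long key per cell, so a viewport query walks only the cells it overlaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a uniform grid that buckets node ids by (x, y) cell.
final class GridIndex {
    private final double cellSize;
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    GridIndex(double cellSize) {
        this.cellSize = cellSize;
    }

    // Pack both cell coordinates into one long so a single map covers both axes.
    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }

    // Convert a world coordinate to the index of the cell that contains it.
    private int cell(double v) {
        return (int) Math.floor(v / cellSize);
    }

    void add(int nodeId, double x, double y) {
        cells.computeIfAbsent(key(cell(x), cell(y)), k -> new ArrayList<>()).add(nodeId);
    }

    // Collect ids from every cell that overlaps the viewport rectangle.
    // Results are cell-granular, so ids just outside the rectangle may be included.
    List<Integer> query(double minX, double minY, double maxX, double maxY) {
        List<Integer> result = new ArrayList<>();
        for (int cx = cell(minX); cx <= cell(maxX); cx++) {
            for (int cy = cell(minY); cy <= cell(maxY); cy++) {
                List<Integer> bucket = cells.get(key(cx, cy));
                if (bucket != null) {
                    result.addAll(bucket);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GridIndex index = new GridIndex(100.0);
        index.add(0, 10, 10);
        index.add(1, 150, 20);
        index.add(2, 950, 950);
        // Only the first two nodes fall into cells touched by this viewport.
        System.out.println(index.query(0, 0, 199, 99));
    }
}
```

With a grid like this, the list of on-screen nodes can be rebuilt from one query per pan step instead of being patched entry by entry, at the cost of choosing a cell size that suits the viewport.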
I would appreciate some guidance in this matter, and how to handle such big data structures. Am I going in the right direction or am I missing an obvious “trick” to do it.
There are several points to think about:
1. How complex are your nodes? If they are just dots, consider drawing them on a WritableImage, which saves a lot of memory. For more complex cases you may want to use a Canvas. Either way you save on event handlers, properties and other small things that add up at this scale.
2. Another important matter is how relevant the data is to the current view. If you present a map or something similar, the user only cares about the visible part. The rest can be stored in a disk cache, and by the Pareto principle only about 20% of that data will be of much use. So you can plan accordingly and create real graphical nodes only for the visible part (and maybe preload some adjacent parts for the sake of user experience).
3. “Divide and conquer”. Even if you don’t want to restrict the user's view as in point 2, you can’t constantly need all 15 million nodes, at least not in a UI library: no monitor is big enough for that, I’m afraid. So split your data into segments and load one segment at a time. If you need to perform any kind of calculation on the whole set, do not work with nodes; use the simplest representation and perform the calculation in a background process.
4. Always investigate existing solutions before building something big. For example, there are a lot of caching libraries, such as PojoCache, that seamlessly allow you to work with only the relevant data once you have split your billions of nodes into groups.
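To illustrate the idea of keeping the data out of the scene graph and drawing only what is visible, here is a minimal sketch in plain Java. The class name PointRasterizer, its render method and the flat float arrays are assumptions made for illustration only. The sketch fills an int[] ARGB buffer; in JavaFX such a buffer could then be copied into a WritableImage through its PixelWriter (setPixels with PixelFormat.getIntArgbInstance()), but that step is left out so the example runs without JavaFX.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: node positions live in flat arrays, not scene-graph objects.
final class PointRasterizer {
    private final float[] xs;
    private final float[] ys;

    PointRasterizer(float[] xs, float[] ys) {
        this.xs = xs;
        this.ys = ys;
    }

    // Fill an ARGB buffer of width * height pixels with the points that fall inside
    // the viewport whose top-left corner is (viewX, viewY).
    // Returns the number of points that landed inside the viewport.
    int render(int[] pixels, int width, int height, float viewX, float viewY, int argb) {
        Arrays.fill(pixels, 0); // clear to fully transparent
        int drawn = 0;
        for (int i = 0; i < xs.length; i++) {
            int px = (int) Math.floor(xs[i] - viewX);
            int py = (int) Math.floor(ys[i] - viewY);
            if (px >= 0 && px < width && py >= 0 && py < height) {
                pixels[py * width + px] = argb;
                drawn++;
            }
        }
        return drawn;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Random random = new Random(42);
        float[] xs = new float[n];
        float[] ys = new float[n];
        for (int i = 0; i < n; i++) {
            xs[i] = random.nextFloat() * 100_000f;
            ys[i] = random.nextFloat() * 100_000f;
        }
        PointRasterizer rasterizer = new PointRasterizer(xs, ys);
        int[] pixels = new int[800 * 600];
        int drawn = rasterizer.render(pixels, 800, 600, 5_000f, 5_000f, 0xFF000000);
        System.out.println("points in view: " + drawn);
    }
}
```

This version scans every point on each render, which is the simplest thing that works; combined with the segment splitting from point 3, only the segments that overlap the viewport would need to be scanned.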