We generate graphs for huge datasets: 4096 samples per second and 10 minutes per graph. A simple calculation gives 4096 * 60 * 10 = 2,457,600 samples per line graph. Each sample is a double-precision floating-point value (8 bytes). Furthermore, we render multiple line graphs on one screen, up to about a hundred. This means we render about 25M samples on a single screen. Using common sense and simple tricks, we can make this code performant on the CPU, drawing on a 2D canvas. By performant, we mean that render times fall below one minute. As this is scientific data, we cannot omit any samples. Seriously, this is not an option. Do not even start thinking about it.
Naturally, we want to improve render times using all techniques available. Multicore rendering, pre-rendering and caching are all quite interesting, but they do not cut it. We want to render these datasets at 30 FPS at minimum, with 60 FPS preferred. We know this is an ambitious goal.
A natural way to offload graphics rendering is to use the system's GPU. GPUs are made to work with huge datasets and process them in parallel. Some simple HelloWorld tests showed us a night-and-day difference in rendering speed when using the GPU.
Now the problem is: GPU APIs such as OpenGL, DirectX and XNA are made with 3D scenes in mind. Thus, using them to render 2D line graphs is possible, but not ideal. In the proofs of concept we developed, we found that we need to transform the 2D world into a 3D world. Suddenly we have to work with an XYZ coordinate system with polygons, vertices and more of the goodness. That is far from ideal from a development perspective. Code gets unreadable, maintenance is a nightmare, and more issues boil up.
What would your suggestion or idea be to do this in 3D? Is the only way to do this to actually convert between the two systems (2D coordinates versus 3D coordinates & entities)? Or is there a sleeker way to achieve this?
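To make the 2D-versus-3D conversion concrete: in most GPU pipelines it can be reduced to a single orthographic matrix that maps data coordinates (time, value) onto clip space, with z held constant. The following is only a minimal sketch in plain Python, not tied to OpenGL, DirectX or XNA. The names `ortho_2d` and `to_clip` and the 600-second, -1..1 data window are assumptions made up for illustration.

```python
def ortho_2d(left, right, bottom, top):
    """Build a 4x4 row-major matrix that maps the data rectangle
    [left, right] x [bottom, top] onto clip space [-1, 1] x [-1, 1].
    z is passed through unchanged, so 2D points can simply use z = 0."""
    sx = 2.0 / (right - left)
    sy = 2.0 / (top - bottom)
    tx = -(right + left) / (right - left)
    ty = -(top + bottom) / (top - bottom)
    return [
        [sx, 0.0, 0.0, tx],
        [0.0, sy, 0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def to_clip(matrix, t, value):
    """Apply the matrix to the 2D point (t, value), padded to (t, value, 0, 1)."""
    v = (t, value, 0.0, 1.0)
    return tuple(sum(m * c for m, c in zip(row, v)) for row in matrix)


# Hypothetical window: 600 seconds of data with values between -1 and 1.
window = ortho_2d(0.0, 600.0, -1.0, 1.0)
print(to_clip(window, 300.0, 0.0))  # centre of the data window
```

With a matrix like this, the application code can stay in 2D data coordinates, and the "3D" part is confined to one transform that a vertex shader would apply.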
Why is it useful to render multiple samples on one pixel? Because it represents the dataset better. Say that on one pixel you have the values 2, 5 and 8. Due to some sample-omitting algorithm, only the 5 is drawn. The line would only go to 5, and not to 8, hence the data is distorted. You could argue for the opposite too, but the fact of the matter is that the first argument holds for the datasets we work with. This is exactly the reason why we cannot omit samples.
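The 2, 5, 8 example can be reproduced in a few lines. The sketch below contrasts a naive decimator, which keeps one sample per pixel column, with a per-column min/max envelope that reads every sample and keeps the extremes. It is only an illustration of the distortion described above, not the rendering code in question. The function names are hypothetical, and it assumes at least one sample per pixel column.

```python
def decimate_stride(samples, width):
    """Naive decimation: keep only the middle sample of each pixel column."""
    if len(samples) < width:
        raise ValueError("need at least one sample per pixel column")
    per_col = len(samples) // width
    return [samples[c * per_col + per_col // 2] for c in range(width)]


def minmax_columns(samples, width):
    """Read every sample and keep the (min, max) pair of each pixel column."""
    n = len(samples)
    if n < width:
        raise ValueError("need at least one sample per pixel column")
    envelope = []
    for c in range(width):
        chunk = samples[c * n // width:(c + 1) * n // width]
        envelope.append((min(chunk), max(chunk)))
    return envelope


samples = [2, 5, 8, 1, 1, 1]        # the first three values share one pixel
print(decimate_stride(samples, 2))  # [5, 1]
print(minmax_columns(samples, 2))   # [(2, 8), (1, 1)]
```

The decimated line only reaches 5 in the first column, while the envelope spans 2 to 8, which is the range that drawing all three samples would cover on that pixel.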
A really popular toolkit for scientific visualization is VTK, and I think it suits your needs:
It’s a high-level API, so you won’t have to use OpenGL (VTK is built on top of OpenGL). There are interfaces for C++, Python, Java, and Tcl. I think this would keep your codebase pretty clean.
You can import all kinds of datasets into VTK (there are tons of examples from medical imaging to financial data).
VTK is pretty fast, and you can distribute VTK graphics pipelines across multiple machines if you want to do very large visualizations.
Regarding the requirement that you cannot omit any samples:
You can render large datasets in VTK by sampling and by using level-of-detail (LOD) models. That is, you’d have a model where you see a lower-resolution version from far out, but when you zoom in you see a higher-resolution version. This is how a lot of large-dataset rendering is done.
You don’t need to eliminate points from your actual dataset, but you can surely incrementally refine it when the user zooms in. It does you no good to render 25 million points to a single screen when the user can’t possibly process all that data. I would recommend that you take a look at both the VTK library and the VTK user guide, as there’s some invaluable information in there on ways to visualize large datasets.
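As a rough illustration of the LOD idea for a single line graph, the sketch below precomputes a pyramid of (min, max) pairs and picks a level based on how many samples are visible per pixel column. This is plain Python and not VTK's API. The names `build_minmax_pyramid` and `pick_level` are hypothetical, and a real implementation would likely use packed arrays rather than lists of tuples.

```python
def build_minmax_pyramid(samples):
    """Level 0 keeps every raw sample as (v, v); each following level merges
    neighbouring pairs into one (min, max) entry, halving the resolution."""
    levels = [[(v, v) for v in samples]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        merged = []
        for i in range(0, len(prev), 2):
            pair = prev[i:i + 2]  # the last pair may hold a single entry
            merged.append((min(p[0] for p in pair), max(p[1] for p in pair)))
        levels.append(merged)
    return levels


def pick_level(levels, visible_samples, width):
    """Return the coarsest level that still provides at least one entry
    per pixel column for the currently visible range."""
    level = 0
    while level + 1 < len(levels) and (visible_samples >> (level + 1)) >= width:
        level += 1
    return level


levels = build_minmax_pyramid([2, 5, 8, 1, 7, 3, 4, 6])
print(pick_level(levels, 8, 2))  # zoomed out: 2, a coarse level
print(pick_level(levels, 8, 8))  # zoomed in: 0, the raw samples
```

The raw samples stay available at level 0, and because the coarser levels store extremes rather than a subsample, peaks remain visible when zoomed out.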