This question is about the best strategy for implementing the following simulation in C++.
I’m trying to make a simulation as part of a physics research project, which basically tracks the dynamics of a chain of nodes in space. Each node contains a position together with certain parameters (local curvature, velocity, distance to neighbors etc…) which all evolve through time.
Each time step can be broken down to these four parts:
- Calculate local parameters. The values are dependent on the nearest neighbors in the chain.
- Calculate global parameters.
- Evolving. The position of each node is moved a small amount, depending on global and local parameters, and some random force fields.
- Padding. New nodes are inserted if the distance between two consecutive nodes reaches a critical value.
(In addition, nodes can get stuck, which makes them inactive for the rest of the simulation. The local parameters of inactive nodes with inactive neighbors will not change, and do not need any more calculation.)
Each node contains ~ 60 bytes, I have ~ 100 000 nodes in the chain, and I need to evolve the chain for ~ 1 000 000 time steps. I would however like to maximize these numbers, as it would increase the accuracy of my simulation, but under the restriction that the simulation is done in reasonable time (~hours). (~30 % of the nodes will be inactive.)
I have started to implement this simulation as a doubly linked list in C++. This seems natural, as I need to insert new nodes in between existing ones, and because the local parameters depend on the nearest neighbors. (I added an extra pointer to the next active node, to avoid unnecessary calculation whenever I loop over the whole chain.)
I’m no expert when it comes to parallelization (or coding for that matter), but I have played around with OpenMP, and I really like how I can speed up for loops of independent operations with two lines of code. I do not know how to make my linked list do stuff in parallel, or if it even works (?). So I had this idea of working with an STL vector, where instead of having pointers to the nearest neighbors, I could store the indices of the neighbors and access them by standard lookup. I could also sort the vector by position in the chain (every x’th time step) to get better locality in memory. This approach would allow for looping the OpenMP way.
I’m kind of tempted by the idea, as I wouldn’t have to deal with memory management. And I guess that the STL vector implementation is way better than my simple ‘new’ and ‘delete’ way of dealing with nodes in the list. I know I could have done the same with STL lists, but I don’t like the way I have to access the nearest neighbors with iterators.
So I ask you, 1337 h4x0r and skilled programmers, what would be a better design for my simulation? Is the vector approach sketched above a good idea? Or are there tricks to play on linked lists to make them work with OpenMP? Or should I consider a totally different approach?
The simulation is going to run on a computer with 8 cores and 48G RAM, so I guess I can trade a lot of memory for speed.
Thanks in advance
Edit:
I need to add 1-2 % new nodes each time step, so storing them as a vector without indices to nearest neighbors won’t work unless I sort the vector every time step.
This is a classic tradeoff question. Using an array or std::vector will make the calculations faster and the insertions slower; using a doubly linked list or std::list will make the insertions faster and the calculations slower.
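For what it's worth, the index-linked vector idea from the question sits somewhere in between: a new node is appended at the end of the vector and only the neighbor indices are rewired, so nothing is shifted, at the price of the storage order drifting away from the chain order until you re-sort. Here is a minimal sketch of the padding step under that scheme. The `Node` layout, the 1D position, and the names `insert_after` and `pad` are all made up for illustration, and `critical` is assumed to be positive.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical node: 1D position for brevity, neighbors stored as indices.
struct Node {
    double x = 0.0;
    std::ptrdiff_t prev = -1;  // index of previous node, -1 at the chain start
    std::ptrdiff_t next = -1;  // index of next node, -1 at the chain end
};

// Insert a midpoint node between node i and its successor.
// The new node is appended to the vector; only indices are rewired.
std::ptrdiff_t insert_after(std::vector<Node>& nodes, std::ptrdiff_t i) {
    const std::ptrdiff_t j = nodes[i].next;
    assert(j >= 0 && "node i must have a successor");
    Node mid;
    mid.x = 0.5 * (nodes[i].x + nodes[j].x);
    mid.prev = i;
    mid.next = j;
    const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(nodes.size());
    nodes.push_back(mid);  // may reallocate, so keep indices, not references
    nodes[i].next = k;
    nodes[j].prev = k;
    return k;
}

// Walk the chain from `head` and split every gap that is >= critical.
// Assumes critical > 0. Returns the number of nodes added.
std::size_t pad(std::vector<Node>& nodes, std::ptrdiff_t head, double critical) {
    std::size_t added = 0;
    for (std::ptrdiff_t i = head; i >= 0 && nodes[i].next >= 0; i = nodes[i].next) {
        const std::ptrdiff_t j = nodes[i].next;
        if (std::fabs(nodes[j].x - nodes[i].x) >= critical) {
            insert_after(nodes, i);  // the halved gap is re-checked next iteration
            ++added;
        }
    }
    return added;
}
```

This padding walk follows the chain links, so it is serial, just like the linked-list version; only the loops that visit nodes independently benefit from the flat storage.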
The only way to judge tradeoff questions is empirically: which will work faster for your particular application? All you can really do is try it both ways and see. The more intense the computation and the shorter the stencil (i.e., the higher the computational intensity: how many flops you have to do per amount of memory you have to bring in), the less important a standard array will be. But basically you should mock up an implementation of your basic computation both ways and see if it matters. I’ve hacked together a very crude go at something with both std::vector and std::list; it is probably wrong in any of a number of ways, but you can give it a go and play with some of the parameters and see which wins for you. On my system, for the sizes and amount of computation given, list is faster, but it can go either way pretty easily.
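To give an idea of the shape such a mock-up can take, here is a small sketch (an illustration written for this discussion, not the crude test program mentioned above). It runs the same 3-point stencil and the same "insert a midpoint after every stride-th element" padding on any bidirectional container and times the result. The names `stencil_sum`, `pad_every` and `time_steps` are invented, and the stencil is far cheaper than a real local-parameter calculation, so treat any timings as a starting point only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Sum of a 3-point average over the container (stands in for the local step).
template <class Container>
double stencil_sum(const Container& c) {
    if (c.size() < 3) return 0.0;
    double total = 0.0;
    auto prev = c.begin();
    auto cur = std::next(prev);
    auto nxt = std::next(cur);
    for (; nxt != c.end(); ++prev, ++cur, ++nxt) {
        total += (*prev + *cur + *nxt) / 3.0;
    }
    return total;
}

// Insert a midpoint after every `stride`-th original element (padding step).
template <class Container>
void pad_every(Container& c, std::size_t stride) {
    std::size_t k = 0;  // counts original elements visited
    for (auto it = c.begin(); it != c.end(); ++it, ++k) {
        auto nxt = std::next(it);
        if (nxt == c.end()) break;
        if (k % stride == 0) {
            // insert() returns an iterator to the new element, which stays
            // valid even when a vector reallocates; ++it then skips past it.
            it = c.insert(nxt, 0.5 * (*it + *nxt));
        }
    }
}

// Run `steps` compute+pad rounds; returns elapsed seconds.
// `checksum` accumulates results so the work cannot be optimized away.
template <class Container>
double time_steps(Container& c, int steps, std::size_t stride, double& checksum) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int s = 0; s < steps; ++s) {
        checksum += stencil_sum(c);
        pad_every(c, stride);
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Call `time_steps` once on a `std::vector<double>` and once on a `std::list<double>` filled with the same values and compare the returned times. A stride of 100 adds roughly 1 % new elements per step, so the container grows geometrically; keep `steps` modest, or cap the size, when experimenting.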
W/rt OpenMP, yes, if that’s the way you’re going to go, your hands are somewhat tied; you’ll almost certainly have to go with the vector structure, but first you should make sure that the extra cost of the insertions won’t blow away any benefit of multiple cores.
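As a rough sketch of what the vector structure buys you with OpenMP: with neighbors stored as indices, the local-parameter pass becomes a plain loop over the storage, and each iteration writes only to its own node. The `Node` fields and the `update_local` name below are assumptions for illustration, with the distance to the next node standing in for the real local parameters. Skipping a pair when both nodes are inactive is a simplification of the rule described in the question.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical node layout; field names are illustrative only.
struct Node {
    double x = 0.0, y = 0.0, z = 0.0;
    double dist_next = 0.0;    // local parameter: distance to the next node
    std::ptrdiff_t prev = -1;  // neighbor indices into the same vector
    std::ptrdiff_t next = -1;  // -1 marks the end of the chain
    bool active = true;
};

// Recompute dist_next for every node. Positions are only read and each
// iteration writes to nodes[i] alone, so the iterations are independent.
void update_local(std::vector<Node>& nodes) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(nodes.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        Node& a = nodes[i];
        if (a.next < 0) continue;              // last node has no successor
        const Node& b = nodes[a.next];
        if (!a.active && !b.active) continue;  // stuck pair: value cannot change
        const double dx = b.x - a.x;
        const double dy = b.y - a.y;
        const double dz = b.z - a.z;
        a.dist_next = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Compile with your compiler's OpenMP flag (for example `-fopenmp` with GCC); without it the pragma is ignored and the loop simply runs serially. The evolve step needs more care, since it writes positions that neighboring iterations read; one option is to write the new positions into a separate buffer and swap afterwards.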