I have an interesting problem coming up soon and I’ve started to think about the algorithm. The more I think about it, the more I get frightened because I think it’s going to scale horribly (O(n^4)), unless I can get smart. I’m having trouble getting smart about this one. Here’s a simplified description of the problem.
I have N polygons (where N can be huge >10,000,000) that are stored as a list of M vertices (where M is on the order of 100). What I need to do is for each polygon create a list of any vertices that are shared among other polygons (Think of the polygons as surrounding regions of interest, sometimes the regions but up against each other). I see something like this
Polygon i | Vertex | Polygon j | Vertex 1 1 2 2 1 2 2 3 1 5 3 1 1 6 3 2 1 7 3 3
This mean that vertex 1 in polygon 1 is the same point as vertex 2 in polygon 2, and vertex 2 in polygon 1 is the same point as vertex 3 in polygon 2. Likewise vertex 5 in polygon 1 is the same as vertex 1 in polygon 3….
For simplicity, we can assume that polygons never overlap, the closest they get is touching at the edge, and that all the vertices are integers (to make the equality easy to test).
The only thing I can thing of right now is for each polygon I have to loop over all of the polygons and vertices giving me a scaling of O(N^2*M^2) which is going to be very bad in my case. I can have very large files of polygons, so I can’t even store it all in RAM, so that would mean multiple reads of the file.
Here’s my pseudocode so far
for i=1 to N Pi=Polygon(i) for j = i+1 to N Pj=Polygon(j) for ii=1 to Pi.VertexCount() Vi=Pi.Vertex(ii) for jj=1 to Pj.VertexCount() Vj=Pj.Vertex(jj) if (Vi==Vj) AddToList(i,ii,j,jj) end for end for end for end for
I’m assuming that this has come up in the graphics community (I don’t spend much time there, so I don’t know the literature). Any Ideas?
This is a classic iteration-vs-memory problem. If you’re comparing every polygon with every other polygon, you’ll run into a O(n^2) solution. If you build a table as you step through all the polygons, then march through the table afterwards, you get a nice O(2n) solution. I ask a similar question during interviews.
Assuming you have the memory available, you want to create a multimap (one key, multiple entries) with each vertex as the key, and the polygon as the entry. Then you can walk each polygon exactly once, inserting the vertex and polygon into the map. If the vertex already exists, you add the polygon as an additional entry to that vertex key.
Once you’ve hit all the polygons, you walk the entire map once and do whatever you need to do with any vertex that has more than one polygon entry.