I have a potentially very large data set that I am trying to visualize all at once. The set consists of hundreds of thousands of segments, each of which is mapped to an ID.
I have received a second data source that gives more real-time information for each segment, but its IDs do not correspond to the IDs I have.
I have a 1:1 mapping from the data source's IDs (9-character strings) to my current IDs (long integers). The problem is that there are a lot of IDs, and the incoming data arrives in no specific order.
The solution I came up with is a hash map that maps the strings to the road IDs. The problem is that I don't know whether a hash map will be efficient enough to hold all 166k entries.
Does anyone have any suggestions and/or hashing algorithms that I can use for this?
If you're only dealing with hundreds of thousands of data points, the naive approach of a plain hash map is unlikely to be a problem.
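As a rough illustration of the plain hash-map approach, here is a minimal Python sketch. The sample IDs, the `translate` helper, and the shape of the incoming updates are all made up for the example; the real mapping would be loaded from wherever your 1:1 table lives.

```python
# Hypothetical sample of the 1:1 table: 9-character feed IDs -> integer segment IDs.
feed_to_segment = {
    "A1B2C3D4E": 1000000001,
    "F5G6H7I8J": 1000000002,
}


def translate(updates, mapping):
    """Yield (segment_id, payload) for every update whose feed ID is in the mapping.

    Updates with an unknown feed ID are skipped. Arrival order does not matter,
    because each lookup is an independent hash-table probe.
    """
    for feed_id, payload in updates:
        segment_id = mapping.get(feed_id)
        if segment_id is not None:
            yield segment_id, payload


# Updates arriving in no particular order, including one unknown ID.
incoming = [
    ("F5G6H7I8J", {"speed": 42}),
    ("ZZZZZZZZZ", {"speed": 10}),
    ("A1B2C3D4E", {"speed": 55}),
]

result = list(translate(incoming, feed_to_segment))
print(result)
```

Whether unknown IDs should be skipped, logged, or treated as errors depends on your feed; skipping is just the simplest choice for a sketch.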
Even if you have 500,000 9-character strings and an equal number of longs, that's still only roughly 16 bytes per item, or 8,000,000 bytes total. Even if you double that for overhead, 16 MB is hardly too big to have in memory at one time.
Basically, try the easy way first, and only worry about it when your profiling tells you it's taking too long.
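If you would rather measure than estimate, a quick experiment along these lines may help. It is only a sketch: the keys are randomly generated stand-ins for the real 9-character IDs, the integer values are arbitrary, and the entry count of 166,000 is taken from the question. Python stores each string and integer as a full object, so expect the measured size to come out above the raw per-item estimate; a language with more compact storage would land closer to it.

```python
import random
import string
import sys
import time

N = 166_000  # entry count taken from the question
random.seed(0)
alphabet = string.ascii_uppercase + string.digits

# Generate N unique random 9-character keys (stand-ins for the real IDs).
unique_keys = set()
while len(unique_keys) < N:
    unique_keys.add("".join(random.choices(alphabet, k=9)))
keys = list(unique_keys)

# Arbitrary large integers standing in for the long segment IDs.
mapping = {key: 10_000_000_000 + i for i, key in enumerate(keys)}

# Rough memory footprint: the hash table itself plus the key and value objects.
table_bytes = sys.getsizeof(mapping)
object_bytes = sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in mapping.items())

# Look every key up once, in random order, to mimic unordered incoming data.
random.shuffle(keys)
start = time.perf_counter()
hits = sum(1 for key in keys if key in mapping)
elapsed = time.perf_counter() - start

print(f"entries:        {len(mapping)}")
print(f"approx. memory: {(table_bytes + object_bytes) / 1e6:.1f} MB")
print(f"{hits} lookups in {elapsed * 1000:.1f} ms")
```

If the numbers on your machine look comfortable, the hash map is fine as it is; if they do not, you at least have a baseline to compare alternatives against.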