I have about 90000 IPv4 address ranges, with data associated with each range, e.g.:
1.0.0.0 - 1.1.0.0 -> "foo"
2.0.0.0 - 10.0.0.0 -> "bar"
Given an IP address, I need to retrieve the associated data. How can I do this efficiently?
I guess I can make things easier by converting each address to a single integer, but what data structure would be best for storing the ranges so that lookups are fast?
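For the integer conversion mentioned above, Python's standard-library ipaddress module can do the work. A minimal sketch follows; the helper names ip_to_int and int_to_ip are illustrative, not from the question. Because the integer order matches the address order, ranges can then be compared with plain integer comparisons.

```python
import ipaddress

def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(addr))

def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))

print(ip_to_int("1.1.0.0"))      # 16842752
print(int_to_ip(3232235521))     # 192.168.0.1
```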
Clarification: I’m searching with a single IP, not a range (e.g. “192.168.0.1”).
Thanks
Sort the ends of the non-overlapping intervals into a single array, and mark each end with a flag indicating whether it is the beginning or the end of an interval. For the ranges above, that gives:
1.0.0.0 start
1.1.0.0 end
2.0.0.0 start
10.0.0.0 end
Now run a binary search for the target address and look at its insertion point, i.e. the last endpoint that is less than or equal to the target. Say the target is 3.2.1.0: its insertion point falls on 2.0.0.0, which is marked as start, so 3.2.1.0 lies inside one of the intervals (2.0.0.0 - 10.0.0.0).
Now consider searching for 1.2.3.4. Its insertion point falls on 1.1.0.0, which is marked as end. Since 1.2.3.4 is not equal to 1.1.0.0, we know that 1.2.3.4 falls in a gap and is not in any of the intervals.
The cost of a search is O(log N), where N is the number of intervals.
If you feel adventurous, consider implementing an interval tree. This is probably not worth it for non-overlapping intervals, though.
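Here is a minimal Python sketch of the flagged-endpoint search described above, using the standard-library bisect and ipaddress modules. It assumes both ends of each range are inclusive and that the ranges do not overlap. The class name RangeLookup and its layout (parallel keys / is_start / data lists) are illustrative choices, not something taken from the answer.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

def ip_to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))

class RangeLookup:
    """Flagged-endpoint array over non-overlapping IPv4 ranges.

    Assumption: both ends of each range are inclusive.
    """

    def __init__(self, ranges: List[Tuple[str, str, str]]):
        # Convert to integers and sort the intervals by their start address.
        intervals = sorted(
            ((ip_to_int(lo), ip_to_int(hi), data) for lo, hi, data in ranges),
            key=lambda t: t[0],
        )
        self.keys: List[int] = []        # all endpoints, ascending
        self.is_start: List[bool] = []   # True = range start, False = range end
        self.data: List[str] = []        # payload of interval i is data[i]
        prev_end = -1
        for lo, hi, payload in intervals:
            if lo > hi or lo <= prev_end:
                raise ValueError("ranges must be well-formed and non-overlapping")
            self.keys += [lo, hi]
            self.is_start += [True, False]
            self.data.append(payload)
            prev_end = hi

    def lookup(self, addr: str) -> Optional[str]:
        target = ip_to_int(addr)
        # Index of the last endpoint <= target (the "insertion point" entry).
        i = bisect.bisect_right(self.keys, target) - 1
        if i < 0:
            return None  # below every range
        if self.is_start[i] or self.keys[i] == target:
            # Endpoints come in (start, end) pairs, so i // 2 is the interval.
            return self.data[i // 2]
        return None  # past an end marker: the address is in a gap

table = RangeLookup([("1.0.0.0", "1.1.0.0", "foo"),
                     ("2.0.0.0", "10.0.0.0", "bar")])
print(table.lookup("3.2.1.0"))  # bar
print(table.lookup("1.2.3.4"))  # None
```

Keeping the flags in a parallel list leaves keys as a plain sorted list of ints, so bisect can search it directly. With 90000 ranges that is 180000 endpoints, or roughly 18 comparisons per lookup.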