If you were trying to create a domain object in a database schema, and in your code that domain object had a hashtable/list member, like so:
    public class SpaceQuadrant : PersistentObject
    {
        public SpaceQuadrant()
        {
        }

        public virtual Dictionary<SpaceCoordinate, SpaceObject> Space { get; set; }
    }
A Dictionary is just a hashtable/list mapping keys to values. I've come up with multiple ways to do this, using various join tables or loading techniques, but they all kind of suck in terms of getting the O(1) access time that you get in a hashtable.
How would you represent SpaceQuadrant, SpaceCoordinate, and SpaceObject in a database schema? A simple schema description in code would be nice, e.g.
    table SpaceQuadrant
    {
        ID int not null primary key,
        EntryName varchar(255) not null,
        SpaceQuadrantJoinTableId int not null foreign key references ...anothertable...
    }
but any thoughts at all would be nice as well, thanks for reading!
More Information:
Thanks for the great answers already; I've only skimmed them, and I want to take some time thinking about each before I respond.
If you think there is a better way to define these classes, then by all means show me an example; any language you're comfortable with is cool.
First, dedicated support for geo-located data exists in many databases: different algorithms can be used (a spatial variant of the B-Tree exists, for instance), and support for proximity searches will probably exist.
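For instance (just a sketch, assuming PostgreSQL with the PostGIS extension; the table and column names here are made up for illustration), a spatial column plus a GiST index gives you indexed proximity queries:

    CREATE TABLE SpaceObject (
        Id         serial PRIMARY KEY,
        QuadrantId int NOT NULL,
        Location   geometry(Point)    -- PostGIS spatial type
    );

    -- spatial (GiST) index, the kind of structure that supports "what's near X?" queries
    CREATE INDEX SpaceObject_Location_idx ON SpaceObject USING gist (Location);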
Since you have a different hash table for each SpaceQuadrant, you’d need something like (edited from S.Lott’s post):
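Something like this, as a sketch (the table name and the X/Y columns are illustrative; flatten SpaceCoordinate into whatever columns it actually holds):

    table SpaceCoordinateToObject
    {
        QuadrantId    int not null foreign key references SpaceQuadrant(ID),
        X             int not null,
        Y             int not null,
        SpaceObjectId int not null foreign key references SpaceObject(ID),
        primary key (QuadrantId, X, Y)
    }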
This is a (SpaceCoordinate, Quadrant) -> SpaceObjectId dictionary.
Now, about your O(1) performance concern: there are several reasons why it's misplaced.
In many DBs you can use a hash index for memory-based tables, as somebody told you. But if you need persistent storage, you'd have to update two tables (the in-memory one and the persistent one) instead of one (if there is no built-in support for this). To discover whether that's worth it, you'd need to benchmark on the actual data (with actual data sizes).
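As a concrete sketch (assuming MySQL, whose MEMORY engine supports HASH indexes; names are illustrative), the in-memory lookup table might look like:

    CREATE TABLE SpaceLookup (
        QuadrantId    INT NOT NULL,
        X             INT NOT NULL,
        Y             INT NOT NULL,
        SpaceObjectId INT NOT NULL,
        KEY idx_coord (QuadrantId, X, Y) USING HASH  -- equality lookups only, no range scans
    ) ENGINE = MEMORY;  -- contents don't survive a restart, hence the second, persistent table

Every insert/update/delete then has to hit both this table and the disk-based one, which is exactly the extra bookkeeping mentioned above.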
Also, forcing a table into memory can have worse implications.
If anything ever gets swapped out, you're dead: if you had used a B-Tree (i.e. a normal disk-based index), its algorithms would have minimized the needed I/O. If that weren't the case, all DBMSs would use hash tables and rely on swapping instead of B-Trees. You can try to anticipate whether you'll fit in memory, but…
Moreover, B-Trees are not O(1), but O(log_512(N)) or something like that (I know that collapses to O(log N), but bear with me on this). You'd need (2^9)^4 = 2^36, roughly 64 billion entries, before that depth reaches 4, and with that much data you'd need a big-iron server anyway for it to fit in memory. So it's almost O(1), and the constant factors are what actually matter.
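To spell out the arithmetic (assuming a fanout of roughly 512 keys per node, which is the right ballpark for a typical page size):

\[
\text{depth} \approx \log_{512} N, \qquad 512^{4} = (2^{9})^{4} = 2^{36} \approx 6.9 \times 10^{10} \text{ entries},
\]

so the tree only reaches 4 levels once you're storing tens of billions of entries.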
Ever heard of algorithms with low asymptotic complexity but big constant factors, which beat simpler ones only on impractically large data sizes?
Finally, I think DB authors are smarter than you and me. Especially given the declarative nature of SQL, hand-optimizing it this way isn't going to pay off. If an index fits in memory, they may well build and use a hash-table version of the disk index as needed, if it's worth it. Investigate your docs for that.
But the bottom line is that premature optimization is evil, especially when it's of this kind (odd optimizations we think up on our own, as opposed to standard SQL optimizations), and with a declarative language.