Some scripting languages, such as Python and JavaScript, have arrays (aka lists) as a separate datatype from hash tables (aka dictionaries, maps, objects). In other scripting languages, such as PHP and Lua, an array is merely a hash table whose keys happen to be integers. (The implementation may be optimized for that special case, as is done in the current version of Lua, but that’s transparent to the language semantics.)
Which is the better approach?
- The unified approach is more elegant in the sense of having one thing rather than two, though the gain isn’t quite as large as it might seem at first glance, since you still need to have the notion of iterating over the numeric keys specifically.
- The unified approach is arguably more flexible. You can start off with nested arrays, find you need to annotate them with other stuff, and just add the annotations, without having to rework the data structures to interleave the arrays with hash tables.
- In terms of efficiency, it seems to be pretty much a wash (provided the implementation optimizes for the special case, as Lua does).
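To make the first two points concrete, here is a rough Python sketch that emulates a unified table with a plain dict. The helper name `array_part` and the sample data are made up for illustration; it only mimics the idea behind Lua's `ipairs` (walk the keys 1, 2, 3, ... until one is missing) and says nothing about how any real implementation works.

```python
def array_part(table):
    """Yield the values stored under consecutive integer keys 1, 2, 3, ...

    Iteration stops at the first missing key, similar in spirit to
    Lua's ipairs. (Hypothetical helper, not part of any library.)
    """
    i = 1
    while i in table:
        yield table[i]
        i += 1


# Start with a plain "array": a table whose keys happen to be integers.
samples = {1: 10, 2: 20, 3: 30}

# Later, annotate it in place; no restructuring is needed.
samples["unit"] = "ms"

# The numeric part still has to be iterated separately from the annotation.
values = list(array_part(samples))
print(values)           # [10, 20, 30]
print(samples["unit"])  # ms
```

The annotation costs nothing to add, but note that the "iterate over the numeric keys only" operation still has to exist as its own concept.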
What am I missing? Does the separate approach have any advantages?
An array is more than a table intentionally restricted to consecutive integer keys. It’s a sequence, a collection of n items (not key-value pairs, just the values) with a well-defined order. This is, in my opinion, a data structure that has no place for additional data in the form of non-integer keys. It’s conceptually simpler.
Also, implementing the two separately may be simpler, especially once you factor in the optimization that makes arrays perform well (which is apparently obscure enough that a performance-oriented language like Lua didn’t implement it for many years).
Also, the flexibility point is arguable. If the need for more complex annotation arises, it’s quite possible that you’ll soon also need polymorphism, in which case you should just switch to objects with an array among other attributes.
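As a rough illustration of what I mean, here is a minimal Python sketch. The class names, attributes, and data are all hypothetical; the point is only that the sequence stays a plain list while the annotations and the polymorphic behaviour live on the object that owns it.

```python
from dataclasses import dataclass, field


@dataclass
class Polyline:
    # The sequence itself remains an ordinary list of values.
    points: list = field(default_factory=list)
    # Annotations are regular attributes next to the list, not extra keys in it.
    label: str = ""

    def describe(self):
        return f"{self.label}: {len(self.points)} points"


@dataclass
class ClosedPolyline(Polyline):
    # A subclass can change behaviour without touching the data layout.
    def describe(self):
        return f"{self.label}: closed loop of {len(self.points)} points"


shapes = [
    Polyline([(0, 0), (1, 1)], "path"),
    ClosedPolyline([(0, 0), (1, 0), (0, 1)], "triangle"),
]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)
# ['path: 2 points', 'triangle: closed loop of 3 points']
```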