I need to process data that has a set of attributes where the number of attributes will be determined at run time. For example, a data set may contain animals, and attributes might include gender, species, age, etc. where each attribute can be represented by an integer (or an enum). I want to be able to iterate along any dimension so that I can, say, quickly calculate the total number of males, or number of dogs, etc.
I am thinking of a Java interface like this:
public interface DynamicMultidimensionalStore<T>
{
Object getPoint(List<Integer> coordinates);
void setPoint(List<Integer> coordinates, T item);
Iterator<T> iterate(int dimension, List<Integer> remainingCoordinates);
DynamicMultidimensionalStore<T> getSlice(int dimension, int offset);
}
First, there must be a name for this; Cube? I see it is similar to http://en.wikipedia.org/wiki/Spatial_index#Spatial_index but these seem more focused on spatial relationships rather than iterating over arbitrary axes.
The only structure I can think of is a class that stores the data in a linear array and performs pointer arithmetic to calculate the offsets.
Are there better solutions? I think my approach would becomes less efficient as the array becomes more sparse (or as dimensions increase).
If I’ve understood your question correctly, a “sparse solution” that would work is as follows. Represent your data set as a list of dictionaries, one for each variable. Store an item by inserting a reference to it into each dictionary, keyed by the relevant property. So you’d end up with data like
Here,
<slug>should be read as a reference / pointer to a single instance of your object type. I don’t know much about Java so I can’t be specific there, but an implementation in C++, say, could usestd::mapkeyed on possible values of the parameters. Then the values would be stored as some general collection: eitherstd::listor maybestd::set. If you were being fancier, maybestd::multimapwould be more suited – I’m not entirely sure.Counting objects with a given property should be extremely fast: you would be querying the length of some container that you look up in a hash table. The major downside is that you have
n*kpointers (or references or or …) wherenis the number of objects andkis the number of axes. This may or may not be OK for you.