Given the following HBase schema scenario (from the official FAQ)…
How would you design an HBase table for many-to-many association between two entities, for example Student and Course?

I would define two tables:

Student: student id, student data (name, address, …), courses (use course ids as column qualifiers here)

Course: course id, course data (name, syllabus, …), students (use student ids as column qualifiers here)

This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
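To make the layout concrete, here is a rough sketch that models the two tables as nested Python dictionaries (row key, then column family, then qualifier). It is only an in-memory stand-in, not HBase client code; the family name `info`, the ids, and the helper functions are made up for illustration, and the derived Course table leaves out the course data family.

```python
# In-memory stand-in for the Student table:
# row key -> column family -> qualifier -> value.
# All ids and names are invented sample data.
student_table = {
    "s1": {"info": {"name": "Ann"}, "courses": {"c1": "", "c2": ""}},
    "s2": {"info": {"name": "Bob"}, "courses": {"c1": "", "c2": "", "c3": ""}},
    "s3": {"info": {"name": "Cid"}, "courses": {"c3": ""}},
}


def build_course_table(students):
    """Derive the mirror Course table (students family only)."""
    courses = {}
    for student_id, row in students.items():
        for course_id in row["courses"]:
            family = courses.setdefault(course_id, {"students": {}})
            family["students"][student_id] = ""
    return courses


course_table = build_course_table(student_table)


def courses_for(student_id):
    """All classes for a student: one row read, courses family."""
    return sorted(student_table[student_id]["courses"])


def students_for(course_id):
    """All students for a class: one row read, students family."""
    return sorted(course_table[course_id]["students"])
```

Both lookups touch a single row, which is why the FAQ schema handles these two queries well. Neither table says anything directly about which students overlap with each other.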
How would you satisfy the request: “Give me all the students that share at least two courses in common”? Can you build a “query” in HBase that will return that set, or do you have to retrieve all the pertinent data and crunch it yourself in code?
The query as described is better suited to a relational database, because HBase has no joins and this question is essentially a self-join on enrollments. You can still answer it quickly by precomputing the result. For example, you might maintain a table where the row key is the number of classes in common, and each column is a student (or group of students) with that many classes in common.
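As a rough illustration of the precomputation, the sketch below counts shared courses for every pair of students and groups the pairs under the count, which plays the role of the row key. It uses plain dictionaries rather than the HBase client, and the sample data, the `a:b` qualifier format, and the function names are my own assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Sample data (invented): course id -> enrolled student ids,
# i.e. what you would read from the students family of each Course row.
enrollments = {
    "c1": ["s1", "s2"],
    "c2": ["s1", "s2", "s3"],
    "c3": ["s2", "s3"],
}


def shared_counts(course_to_students):
    """Count, for every pair of students, how many courses they share."""
    counts = defaultdict(int)
    for students in course_to_students.values():
        for a, b in combinations(sorted(students), 2):
            counts[(a, b)] += 1
    return counts


def build_common_table(course_to_students):
    """Precomputed table: row key = shared-course count, columns = pairs."""
    table = defaultdict(dict)
    for (a, b), n in shared_counts(course_to_students).items():
        table[n][f"{a}:{b}"] = ""
    return dict(table)


def pairs_sharing_at_least(table, minimum):
    """Read every row whose key is >= minimum and collect its columns."""
    return sorted(q for n, row in table.items() if n >= minimum for q in row)


common_table = build_common_table(enrollments)
```

With this data, `pairs_sharing_at_least(common_table, 2)` returns the pairs `s1:s2` and `s2:s3`. In a real table the count would need a fixed-width encoding (zero-padded, say) so that a range scan over the row keys follows numeric order, and the table has to be refreshed whenever enrollments change.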
You could use a variant on this to answer questions like “which students are in class X and class Y”: build the row key from the class identifiers, concatenated in a consistent order (alphabetical, for example) so that the same pair of classes always maps to the same row, and again, each column is a student.
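A minimal sketch of that composite-key variant, again with dictionaries standing in for the table. The `|` separator, the sample data, and the helper names are assumptions for illustration; any separator that cannot appear inside a class id would do.

```python
from collections import defaultdict
from itertools import combinations

# Sample data (invented): student id -> course ids,
# i.e. the courses family of each Student row.
student_courses = {
    "s1": ["c2", "c1"],
    "s2": ["c1", "c2", "c3"],
    "s3": ["c3", "c2"],
}


def pair_key(course_a, course_b):
    """Row key for a pair of classes, always in alphabetical order."""
    return "|".join(sorted((course_a, course_b)))


def build_pair_table(student_to_courses):
    """Row key = ordered class pair, columns = students in both classes."""
    table = defaultdict(dict)
    for student_id, courses in student_to_courses.items():
        for a, b in combinations(sorted(set(courses)), 2):
            table[pair_key(a, b)][student_id] = ""
    return dict(table)


pair_table = build_pair_table(student_courses)


def students_in_both(course_x, course_y):
    """Single row lookup; argument order does not matter."""
    return sorted(pair_table.get(pair_key(course_x, course_y), {}))
```

Because the key is ordered, `students_in_both("c2", "c1")` and `students_in_both("c1", "c2")` hit the same row. The cost is write amplification: a student enrolled in n classes appears in n(n-1)/2 rows, so this trades storage and update work for a cheap read.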