Let’s say I have rows of data retrieved from relational database tables (perhaps by joining them). Each row has several columns (such as A, B, C, D, ...), and these are the rows I have:
A1, B1, C1, D1
A1, B2, C1, D1
A1, B2, C1, D2
If I were to draw a network graph among the entities, I could save the information in RDF by creating multiple triples such as
A1 connectsTo B1
B1 connectsTo C1
C1 connectsTo D1
A1 connectsTo B2
B2 connectsTo C1
C1 connectsTo D2 (and in the opposite direction as well)
So in a bidirectional graph they would be
A1 --- B1 --- C1 --- D1
 '---- B2 ---'  '--- D2
A problem with this approach is that I have now introduced an ambiguity: looking at the above graph, I can also derive the connection A1 - B1 - C1 - D2, which I did not have in the original rows. My first question: in general, is this what happens when saving database rows into a network graph, or am I doing something wrong?
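The ambiguity can be checked mechanically. The following is a minimal sketch in plain Python (no RDF library; triples are modelled as tuples, and the variable names are only illustrative) that builds the pairwise connectsTo edges from the three rows and then lists every A-B-C-D path the edges allow but the rows do not contain.

```python
from itertools import product

# The three original rows from the question.
rows = [
    ("A1", "B1", "C1", "D1"),
    ("A1", "B2", "C1", "D1"),
    ("A1", "B2", "C1", "D2"),
]

# Pairwise "connectsTo" edges between adjacent columns of each row.
edges = set()
for row in rows:
    for left, right in zip(row, row[1:]):
        edges.add((left, right))

# Distinct values per column, then every A-B-C-D path permitted by the edges.
columns = [sorted({row[i] for row in rows}) for i in range(4)]
paths = {
    path
    for path in product(*columns)
    if all((x, y) in edges for x, y in zip(path, path[1:]))
}

# Paths the edge graph allows that were never present as a row.
spurious = sorted(paths - set(rows))
print(spurious)
```

The only path reported is the A1, B1, C1, D2 combination, which confirms that the pairwise edges alone cannot tell which values appeared together in a row.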
In order to preserve the original information, I could group the four entities in each row under a bnode, but my concern is whether this will give me the same flexibility (and performance) in creating the graph connections as before. I may later need to grab just the connections between As and Bs, or other combinations of subsets. It also won’t be as space-efficient as before, because the same information would have to be stored repeatedly across different bnodes.
So my second question is: what is the best way to store the rows in RDF while still maintaining flexibility and performance? I’ve looked at the W3C’s recommendations for mapping relational databases to RDF ( http://www.w3.org/TR/r2rml/ and also http://www.w3.org/TR/rdb-direct-mapping/ ), but it seems I’d have to group the data under the same row id in order to preserve it. Is this the only way?
Thanks.
Your rows represent n-ary relations (with n=4): four things that stand in some relationship to each other. RDF is based on binary relations (n=2), so a single triple can only express that two things are related. To represent an n-ary relationship in RDF, you always have to introduce an additional node and connect the n members to it. W3C has a long best-practices document on this topic: Defining N-ary Relations on the Semantic Web.
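As a rough illustration of the extra-node pattern, here is a small sketch in plain Python that models triples as tuples. The blank-node labels (_:row1, ...) and the predicate names (hasA, hasB, hasC, hasD) are hypothetical and chosen only for this example; a real dataset would use IRIs from its own vocabulary.

```python
# The three original rows from the question.
rows = [
    ("A1", "B1", "C1", "D1"),
    ("A1", "B2", "C1", "D1"),
    ("A1", "B2", "C1", "D2"),
]

# Hypothetical predicate names, one per column.
predicates = ("hasA", "hasB", "hasC", "hasD")

triples = []
for index, row in enumerate(rows, start=1):
    # One extra node per row stands for the 4-ary relation instance.
    row_node = f"_:row{index}"
    for predicate, value in zip(predicates, row):
        triples.append((row_node, predicate, value))

for subject, predicate, value in triples:
    print(subject, predicate, value)
```

Each row becomes four binary triples hanging off its own node, so the grouping of values per row stays explicit in the graph.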
This approach doesn’t lose flexibility (you can easily query those relationships with SPARQL), and it doesn’t store duplicate information either. In fact, the reason your proposed representation doesn’t work is that it dropped essential information: which values occurred together in the same row.
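To make the flexibility point concrete, the sketch below emulates in plain Python what a SPARQL pattern along the lines of `?r :hasA ?a ; :hasB ?b` would do: it joins on the row node and projects any subset of columns. The helper name project, the predicate names, and the _:row labels are assumptions made for this example, not part of any standard.

```python
from collections import defaultdict

rows = [
    ("A1", "B1", "C1", "D1"),
    ("A1", "B2", "C1", "D1"),
    ("A1", "B2", "C1", "D2"),
]
predicates = ("hasA", "hasB", "hasC", "hasD")  # hypothetical names

# One row node per row, connected to each of its four members.
triples = [
    (f"_:row{i}", predicate, value)
    for i, row in enumerate(rows, start=1)
    for predicate, value in zip(predicates, row)
]

def project(triples, *wanted):
    """Return the distinct value combinations for the wanted predicates,
    joined on the shared row node."""
    by_row = defaultdict(dict)
    for subject, predicate, value in triples:
        by_row[subject][predicate] = value
    return sorted({tuple(props[p] for p in wanted) for props in by_row.values()})

# Just the A-B connections.
ab_pairs = project(triples, "hasA", "hasB")
# The full rows, recovered without any spurious combination.
full_rows = project(triples, "hasA", "hasB", "hasC", "hasD")

print(ab_pairs)
print(full_rows)
```

Projecting onto two predicates yields the A-B connections, and projecting onto all four yields exactly the original three rows, with no A1, B1, C1, D2 combination.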