I’m reading a paper on SCOPE that discusses SQL-like query semantics for big data applications. It departs from the way SQL handles null values and discusses “null-extended” variables, which I have not encountered before. Consider the pseudo-query:
SELECT * FROM DATA WHERE A != B
What does “the predicate A != B is satisfied only for rows that are null extended on B” mean?
The term “null extended” is generally used to refer to the set algebra of a modern DBMS. That is, it “extends” regular relational algebra by introducing NULL values, or rather a single universal NULL value. Every predicate involving a NULL has a defined result that is logically consistent with the rest of the algebra.
I’ve also seen the term used to refer to outer joins. For example, this query:
Might give you the following results:
What’s happening here is that for id 12, A is being “null extended” with the columns from B because B has no row that matches id 12. In general, when you join two relations, A and B, and you want to include the tuples of A that have no matching tuple in B (a left outer join), those tuples must be null-extended with the attributes of B, that is, padded with NULL in every column that comes from B, so that they form complete rows of the result set.
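A minimal sketch of that null extension in Python, with hypothetical tables A and B (the column names and values are invented for illustration; only the fact that id 12 has no partner in B mirrors the example). Python's None stands in for NULL, and the helper `left_outer_join` is a hypothetical name, not part of any library.

```python
from typing import Any

Row = dict[str, Any]

def left_outer_join(left: list[Row], right: list[Row],
                    left_key: str, right_key: str) -> list[Row]:
    """LEFT OUTER JOIN of `left` and `right` on left[left_key] == right[right_key].

    Left rows without a matching right row are null-extended: every column
    of `right` is added with the value None (standing in for NULL).
    Assumes the two tables have distinct column names.
    """
    right_columns = list(right[0]) if right else []
    result: list[Row] = []
    for l_row in left:
        key = l_row[left_key]
        # As in SQL, a NULL join key never matches anything.
        matches = [r for r in right if key is not None and r[right_key] == key]
        if matches:
            result.extend({**l_row, **r} for r in matches)
        else:
            # No partner in `right`: pad with None for each of right's columns.
            result.append({**l_row, **{c: None for c in right_columns}})
    return result

# Hypothetical data: ids 10 and 11 have partners in B, id 12 does not.
A = [{"a_id": 10, "a_name": "x"}, {"a_id": 11, "a_name": "y"}, {"a_id": 12, "a_name": "z"}]
B = [{"b_id": 10, "b_val": 1.5}, {"b_id": 11, "b_val": 2.5}]

rows = left_outer_join(A, B, "a_id", "b_id")
for row in rows:
    print(row)
```

The row for id 12 comes back with `b_id` and `b_val` set to None, which is exactly the "null extension" described above; the matched rows simply merge the columns of both tables.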
As for the specific line you quoted:
…doesn’t really make sense when taken out of context. You have to look at the whole thing:
And a little bit later:
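Now with some context, it’s easier to understand what they’re trying to say. Since the join condition in M2 is Rc == Sc, it follows that the condition Rc != Sc can only be true if Sc is NULL; otherwise, Sc would be equal to Rc, because that’s how the rows were joined. In other words, the condition Rc != Sc can only be true for the rows in M2 where M1 was null-extended with the columns from SQ because it did not match any rows in SQ. Note that this relies on the comparison of a non-null Rc with a NULL Sc evaluating to true, which is presumably the departure from standard SQL that you mention: under SQL’s three-valued logic that comparison yields UNKNOWN, and those rows would be filtered out as well.

Hopefully that clears up some of the confusion.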
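Here is a minimal sketch of the M2 argument, using small hypothetical tables R and S that have only the columns Rc and Sc (the data, and the reduction of M1 and SQ to plain tables, are assumptions for illustration). It builds the left outer join on Rc == Sc by hand and filters it with Python's `!=`, which, like C#'s lifted `!=` on nullable values, evaluates `3 != None` as true. It then runs the same query in SQLite through Python's standard `sqlite3` module; SQLite is only a stand-in for a standard SQL engine here, not SCOPE.

```python
import sqlite3

# Hypothetical data: R.Rc = 3 has no partner in S.
R = [{"Rc": 1}, {"Rc": 2}, {"Rc": 3}]
S = [{"Sc": 1}, {"Sc": 2}]

# Left outer join on Rc == Sc; unmatched R rows are null-extended (Sc = None).
M2 = []
for r in R:
    matches = [s for s in S if s["Sc"] == r["Rc"]]
    if matches:
        M2.extend({**r, **s} for s in matches)
    else:
        M2.append({**r, "Sc": None})

# Two-valued (C#-style) filter: 3 != None is True. Every matched row has
# Rc == Sc by construction, so only null-extended rows can pass.
csharp_style = [row for row in M2 if row["Rc"] != row["Sc"]]
print(csharp_style)  # [{'Rc': 3, 'Sc': None}]

# SQL three-valued logic: 3 != NULL is UNKNOWN, and WHERE keeps only TRUE rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (Rc INTEGER)")
conn.execute("CREATE TABLE S (Sc INTEGER)")
conn.executemany("INSERT INTO R VALUES (?)", [(r["Rc"],) for r in R])
conn.executemany("INSERT INTO S VALUES (?)", [(s["Sc"],) for s in S])
sql_rows = conn.execute(
    "SELECT Rc, Sc FROM R LEFT JOIN S ON Rc = Sc WHERE Rc != Sc"
).fetchall()
print(sql_rows)  # []

# In standard SQL the null-extended rows have to be selected explicitly.
null_extended = conn.execute(
    "SELECT Rc, Sc FROM R LEFT JOIN S ON Rc = Sc WHERE Sc IS NULL"
).fetchall()
print(null_extended)  # [(3, None)]
```

Under the two-valued comparison, Rc != Sc picks out exactly the null-extended row; under standard SQL the same predicate selects nothing, and you would write `Sc IS NULL` to get those rows.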