I’ve been working on my own SQL library/query builder for a little while (https://github.com/aviat4ion/Query). For the most part, I’m pretty happy with how things work.
The one problem is with query joins.
Say I have something like:
$db->join($table, 'table1.field1=table2.field2', 'inner');
I’m rather stumped as to how to parse the second argument, in which the table identifiers need to be properly escaped.
I want to be able to handle functions in the condition as well.
My current implementation is rather naive: it splits the conditional on spaces, so 'table1.field1=table2.field2' would fail, and 'table1.field1 = table2.field2' would work.
Each database driver has a function to abstract the identifier escaping, which works on table identifiers like database.table.field, so that it is escaped as "database"."table"."field".
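As a rough illustration of that escaping step (a sketch only: quote_ident is a hypothetical name, not the library's actual driver method, and doubling embedded quote characters is an assumption about how a driver would handle them):

```php
<?php
// Hypothetical helper, NOT the Query library's real driver method.
// Quotes each dot-separated part of an identifier on its own.
function quote_ident(string $ident, string $q = '"'): string
{
    $parts = explode('.', $ident);
    $quoted = array_map(
        // Assumption: an embedded quote character is escaped by doubling it.
        fn(string $p): string => $q . str_replace($q, $q . $q, $p) . $q,
        $parts
    );
    return implode('.', $quoted);
}

echo quote_ident('database.table.field'), "\n"; // "database"."table"."field"
echo quote_ident('table1.field1', '`'), "\n";   // `table1`.`field1`
```

The quote character is a parameter because MySQL conventionally uses backticks while the other three databases accept double quotes.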
So my basic problem is this: how to parse out identifiers to escape them in the join conditional.
Edit:
I need to do this in a way that can be used for MySQL, Postgres, SQLite, and Firebird.
If you only want to parse the where expression, a simple operator-precedence parser will do the trick. You have to apply a few checks on the parse tree to ensure the expression is valid, but this is not hard.
An excellent general guide to parsing, "Parsing Techniques - A Practical Guide", can be downloaded from http://dickgrune.com/Books/PTAPG_1st_Edition/ and covers precedence parsing in section 9.2, "Precedence Parsing", page 187.
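The technique assumes you have two things: a tokenizer that splits the input into tokens, and a precedence table that lists the operators with their relative precedence.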
You read tokens from the tokenizer, one by one. When the token is an operator (you know it is because operators are stored in the precedence table), you determine whether it has a higher or lower precedence than the previous operator. If the current operator's precedence is lower than the previous operator's precedence, you write the previous operator, along with its operands, to the parse tree, and then look back from there to find the operator that preceded it and compare again. These operations work best if the tokenizer delivers the tokens as a doubly linked list, so that you can easily traverse them in both directions.
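The steps above can be sketched roughly as follows. This is a minimal illustration rather than a complete SQL expression parser: it uses two stacks (a shunting-yard style variant of operator-precedence parsing) instead of a doubly linked token list, it ignores parentheses and function calls, and it would wrongly quote keywords such as NULL. The precedence table, the function names and the $quote callback are all assumptions made for the example.

```php
<?php
// Illustrative precedence table: a higher number binds more tightly.
const PRECEDENCE = [
    'OR' => 1, 'AND' => 2,
    '=' => 3, '<>' => 3, '!=' => 3, '<' => 3, '>' => 3, '<=' => 3, '>=' => 3,
    '+' => 4, '-' => 4, '*' => 5, '/' => 5,
];

// Split the condition into identifiers/keywords, numbers, string literals and operators.
function tokenize(string $sql): array
{
    $pattern = "~[A-Za-z_][A-Za-z0-9_.]*|\\d+(?:\\.\\d+)?|'(?:[^']|'')*'|<=|>=|<>|!=|[-=<>+*/]~";
    preg_match_all($pattern, $sql, $m);
    return $m[0];
}

// Build a parse tree; leaves are token strings, inner nodes are arrays.
function parse(array $tokens)
{
    $operands = [];  // finished leaves and subtrees
    $operators = []; // operators still waiting for their right operand

    // Write the most recent operator and its two operands to the tree.
    $reduce = function () use (&$operands, &$operators): void {
        if (count($operands) < 2) {
            throw new InvalidArgumentException('Operator is missing an operand');
        }
        $op = array_pop($operators);
        $right = array_pop($operands);
        $left = array_pop($operands);
        $operands[] = ['op' => $op, 'left' => $left, 'right' => $right];
    };

    foreach ($tokens as $token) {
        $key = strtoupper($token);
        if (!array_key_exists($key, PRECEDENCE)) {
            $operands[] = $token; // not in the table, so it is an operand
            continue;
        }
        // The previous operator binds at least as tightly: reduce it first.
        // (>= rather than > makes equal-precedence operators left-associative.)
        while ($operators && PRECEDENCE[end($operators)] >= PRECEDENCE[$key]) {
            $reduce();
        }
        $operators[] = $key;
    }
    while ($operators) {
        $reduce();
    }
    if (count($operands) !== 1) {
        throw new InvalidArgumentException('Malformed expression');
    }
    return $operands[0];
}

// Turn the tree back into SQL, quoting only bare identifiers.
function render($node, callable $quote): string
{
    if (is_array($node)) {
        return render($node['left'], $quote) . ' ' . $node['op'] . ' ' . render($node['right'], $quote);
    }
    return preg_match('/^[A-Za-z_][A-Za-z0-9_.]*$/', $node) ? $quote($node) : $node;
}

// Stand-in for the driver's escaping function.
$quote = fn(string $id): string => '"' . str_replace('.', '"."', $id) . '"';

echo render(parse(tokenize('table1.field1=table2.field2 AND table1.n > 5')), $quote), "\n";
// "table1"."field1" = "table2"."field2" AND "table1"."n" > 5
```

Because the tokenizer works on character classes rather than whitespace, 'table1.field1=table2.field2' and 'table1.field1 = table2.field2' produce the same tokens.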
If this all sounds too hard, then either:
Regarding option #2: instead of allowing people to specify expressions as raw text, you could require them to pass the condition in as an array, or in an easily parsable format like JSON or even XML.
For instance, you could have it like this:
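One possible shape, purely as an illustration: the condition is passed as a three-element array of left operand, operator and right operand. The array layout, the build_condition name and the operator whitelist are all hypothetical, and the sketch assumes both operands are identifiers.

```php
<?php
// Hypothetical input shape: [left identifier, operator, right identifier].
// Neither this shape nor build_condition() comes from the Query library.
function build_condition(array $cond, callable $quote): string
{
    [$left, $op, $right] = $cond;
    $allowed = ['=', '<>', '!=', '<', '>', '<=', '>='];
    if (!in_array($op, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported operator: $op");
    }
    // No parsing needed: each part arrives already separated.
    return $quote($left) . ' ' . $op . ' ' . $quote($right);
}

// Stand-in for the driver's escaping function.
$quote = fn(string $id): string => '"' . str_replace('.', '"."', $id) . '"';

echo build_condition(['table1.field1', '=', 'table2.field2'], $quote), "\n";
// "table1"."field1" = "table2"."field2"
```

The trade-off is a more verbose call site in exchange for never having to guess where an identifier starts or ends.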