Last year, my group developed a web service that included basic search functionality.
All search conditions were combined with a boolean AND:
<conditions>
<condition name="name1">value1</condition>
<condition name="name2">value2</condition>
</conditions>
… is equivalent to name1=value1 AND name2=value2 etc.
Now, we have been asked to expand the search feature to allow for more complex searches.
I see two plausible approaches:
OPTION #1: Let users pass in their own SQL query (either the full statement, or just the WHERE clause).
Examples:
<where>Cost = 5000.00 OR Cost > 5000.00</where>
<query>SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE '%test%'</query>
Precedent:
- SearchSQL.SetWhereClause in IBM’s FileNet API
- Content Management Interoperability Services (CMIS) spec
- ADO uses this approach in various places. For example, recordset.Filter
Advantages:
- Our schema stays simple. We could leave the “<conditions>” approach in place for the simple use cases, and add an alternative syntax.
- We’d just use the WHERE clause directly server-side (after scrubbing for SQL injection), which means cleaner server-side code
- Follows industry standards (does it? CMIS, Microsoft … anything from the Java world?)
Disadvantages:
- Not exactly “elegant xml” (is there any such thing?). Potentially forces consumers of the service to do some hackish string manipulation on their side, rather than providing them with something more elegant.
OPTION #2: Revamp our <conditions> approach to allow more granular queries in the SOAP request.
Example (from FetchXML):
<filter type='and'>
<condition attribute='lastname' operator='ne' value='Cannon' />
</filter>
Precedent:
- FetchXML
- Ant gets close with <if> / <else>
Advantages:
- Arguably more consistent with what an end user would expect (often the mark of a good API)
- Potential to give end users cleaner code
- Doesn’t create dependency on SQL language / backend. Keeps it abstract
Disadvantages:
- More server-side code required to reconstruct the XML into the SQL statement the user meant in the first place
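To give a feel for how much server-side code that translation might take, here is a minimal Python sketch, not a definitive implementation. It assumes a FetchXML-like request with nested <filter> elements and `?` placeholders for bound parameters. The function name `filter_to_sql`, the attribute whitelist, and the operator table are all hypothetical and only loosely modeled on the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelists: a real service would derive these from its own schema.
ALLOWED_ATTRIBUTES = {"lastname", "cost"}
OPERATORS = {"eq": "=", "ne": "<>", "gt": ">", "ge": ">=",
             "lt": "<", "le": "<=", "like": "LIKE"}
JOINERS = {"and": " AND ", "or": " OR "}


def filter_to_sql(node, params):
    """Translate a <filter> or <condition> element into a SQL fragment.

    Values are appended to `params` and replaced by `?` placeholders,
    so user-supplied text never becomes part of the SQL string.
    """
    if node.tag == "condition":
        attribute = node.get("attribute")
        operator = node.get("operator")
        value = node.get("value")
        if attribute not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute!r}")
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        if value is None:
            raise ValueError("condition is missing a value")
        params.append(value)
        return f"{attribute} {OPERATORS[operator]} ?"
    if node.tag == "filter":
        joiner = JOINERS.get(node.get("type", "and"))
        if joiner is None:
            raise ValueError(f"unknown filter type: {node.get('type')!r}")
        parts = [filter_to_sql(child, params) for child in node]
        if not parts:
            raise ValueError("empty filter")
        return "(" + joiner.join(parts) + ")"
    raise ValueError(f"unexpected element: {node.tag!r}")


request = """
<filter type='and'>
  <condition attribute='lastname' operator='ne' value='Cannon' />
  <filter type='or'>
    <condition attribute='cost' operator='eq' value='5000.00' />
    <condition attribute='cost' operator='gt' value='5000.00' />
  </filter>
</filter>
"""
params = []
sql = filter_to_sql(ET.fromstring(request), params)
print(sql)     # (lastname <> ? AND (cost = ? OR cost > ?))
print(params)  # ['Cannon', '5000.00', '5000.00']
```

Only whitelisted attribute names and operators ever reach the SQL text, and every value travels as a bound parameter, so the extra code also serves as the validation layer.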
I hope that the examples, precedent, advantages, and disadvantages give enough background to avoid subjective answers. I am looking for answers grounded in standards and best practices.
My question is: are there definitive reasons for choosing one approach over the other in expanding an API?
Option #2, if only for one reason: security.
Allowing end users to pass arbitrary SQL to your database is an invitation to disaster. You either trust your users to NEVER make mistakes in SQL, or you have to write code to determine which SQL you’re going to accept and which SQL you’re going to reject.
Option #2 will be harder to design and implement, but Option #1 guarantees that you will hate yourself at some point when some user updates every record in an important table.
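To make the risk concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory table. The table, its columns, and both helper functions are invented for illustration. It contrasts splicing a caller-supplied WHERE fragment into the statement (Option #1 style) with binding the caller's input as a parameter.

```python
import sqlite3

# Illustrative in-memory table; names and rows are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT, owner TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [("a.txt", "alice"), ("b.txt", "bob")])


def search_spliced(where_clause):
    # Option #1 style: the caller's text becomes part of the SQL statement.
    query = "SELECT name FROM docs WHERE owner = 'alice' AND " + where_clause
    return conn.execute(query).fetchall()


def search_bound(name):
    # The caller's text is bound as a value and is never parsed as SQL.
    query = "SELECT name FROM docs WHERE owner = 'alice' AND name = ?"
    return conn.execute(query, (name,)).fetchall()


hostile = "name = 'x' OR 1=1"
print(search_spliced(hostile))  # both rows come back, including bob's document
print(search_bound(hostile))    # no rows: the input is compared as a plain string
```

Because AND binds more tightly than OR, the spliced fragment bypasses the intended owner restriction. A read-only leak is the mild case; the same mechanism is what makes the accidental mass update possible.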