I’m doing a raw query on a Django model using a string of SQL with two parameters. The head scratcher is that if I fill in the parameters using a string substitution, I get more results than if I do so using parameter substitution. The results are correct in the latter case, just incomplete. The code looks like this, with the only change being that I’ve omitted the exact SQL:
# I have a long and ornate SQL statement that looks basically like this.
sql = "SELECT blah blah WHERE something = %s AND something_else in (%s)"
# If I do a raw query with string substitution I get more results (22) ...
sqlInsecureFilled = sql % (divID, storeRestrictStr)
promos_insecured = Promotions.objects.raw(sqlInsecureFilled)
# ... than if I use a parameterized raw query, which produces (10)
promos_secure = Promotions.objects.raw(sql, [divID, storeRestrictStr])
But it gets weirder. If I do this from the command line, take the SQL from the raw_query objects (i.e., promos_secure.query) and copy it into the Sequel Pro terminal, then both queries produce the same number of results — 22! And yet:
In [35]: [len(list(promos_insecured)), len(list(promos_secure))]
Out[35]: [22, 10]
So to summarize: the queries appear to be the same by eyeball (it’s rather long, so it’s hard to tell exactly) and, when the promos_xx.query strings are copied into the MySQL terminal, they produce the full result set. And yet, when executed as shown above, the parameterized version returns 10 results, whereas the other version returns the full 22.
For completeness, here is promos_secure.query (promos_insecured is the same):
SELECT DISTINCT promotion_id,
promotion_name,
promotion_up_date,
promotion_down_date,
promotion_asset_id,
promotion_notes,
promotion_promo_id
FROM promotions,
promo_detail
WHERE promotion_id = promo_detail_promotion_id
AND promotion_start_date < now()
AND promotion_end_date > now()
AND promo_detail_cust_division_id = 1
AND promo_detail_not_expired = 1
AND promo_detail_store_id in
(8214, 8217, 4952, 8194, 8198, 8162, 5010, 5011, 5012, 8219, 8182, 5048, 5076, 5095, 5096, 5102, 5109, 5131, 5156,
5160, 5161, 5165, 5166, 5173, 5182, 5198, 5200, 5201, 5202, 5203, 5227, 5228, 5229, 5230, 5232, 5233, 5234,
<bunch of other comma-separated numbers omitted>, 9281)
ORDER BY promotion_end_date ASC
EDIT: maybe this is the most succinct way to show what’s going on and why it’s weird:
promo_u = promotions.models.Promotions.objects.raw(sql % (1, storeRestrictStr))
promo_s = promotions.models.Promotions.objects.raw(sql, (1, storeRestrictStr))
pid_u = [s.promotion_id for s in promo_u]
pid_s = [s.promotion_id for s in promo_s]
In [76]: [len(list(pid_u)), len(list(pid_s))]
Out[76]: [22, 10]
# You can see the smaller number of results is a subset of the larger.
In [77]: [pid in pid_u for pid in pid_s]
Out[77]: [True, True, True, True, True, True, True, True, True, True]
# The larger number results shows no obvious pattern as to why they're missing.
In [87]: [pid in pid_s for pid in pid_u]
Out[87]: [False, False, False, False, True, False, False, False, True, True, True, True,
True, False, True, False, False, True, False, True, False, True]
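The problem with string substitution is that values are not quoted or escaped, as they are in parameter substitution. That said, the quoting is likely doing something you don’t expect here: storeRestrictStr is a single string, so with parameter substitution it most likely reaches the database as one quoted value inside IN (...), not as a list of IDs. Also note that, as far as I know, promos_secure.query is rendered by plain string interpolation of the parameters, so it can look identical to the unparameterized SQL even though the statement actually executed is different. Some things to try: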
Right now, you’re passing the ID list as a string (storeRestrictStr). Try passing it as a list instead.
Install Django Debug Toolbar and use it to look at the actual SQL being generated as well as its output.
You haven’t said whether the 10 results are correct and the 22 results have 12 false matches, or if the 22 results are correct and the 10 are missing 12, or something in between.
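One way to pass the IDs as separate values is to generate one %s placeholder per ID and hand the database a flat parameter list. The following is a minimal sketch, not the asker's actual code: build_in_query and the {placeholders} marker in the template are hypothetical names introduced for illustration, and it assumes storeRestrictStr is a comma-separated string of integer IDs and that the real SQL contains no other curly braces.

```python
def build_in_query(sql_template, div_id, store_restrict_str):
    """Return (sql, params) with one %s placeholder per store ID.

    Hypothetical helper: assumes the template marks the IN list with
    {placeholders} and that the IDs arrive as "8214, 8217, ...".
    """
    # int() tolerates surrounding whitespace; skip empty fragments.
    store_ids = [int(part) for part in store_restrict_str.split(",") if part.strip()]
    if not store_ids:
        # An empty IN () is a syntax error in MySQL, so fail early.
        raise ValueError("no store IDs supplied")

    # Only placeholders are formatted into the SQL text, never the values.
    placeholders = ", ".join(["%s"] * len(store_ids))
    sql = sql_template.format(placeholders=placeholders)

    # Parameter order must match placeholder order in the statement.
    params = [div_id] + store_ids
    return sql, params


sql_template = (
    "SELECT blah blah WHERE something = %s "
    "AND something_else IN ({placeholders})"
)
sql, params = build_in_query(sql_template, 1, "8214, 8217, 4952")

# With Django available, the query would then be run as:
# promos = Promotions.objects.raw(sql, params)
```

Because each ID is its own parameter, the driver quotes them individually and the IN list should match every store rather than a single string value.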