I am trying to fetch, in a single query, a fixed set of rows plus some other rows found by a subquery. The query generated by my SQLAlchemy code is incorrect. It is as follows:
SELECT tbl.id AS tbl_id
FROM tbl
WHERE tbl.id IN
(
SELECT t2.id AS t2_id
FROM tbl AS t2, tbl AS t1
WHERE t2.id =
(
SELECT t3.id AS t3_id
FROM tbl AS t3, tbl AS t1
WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1 OFFSET 0
)
AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)
The correct query should not have the second tbl AS t1, the one in the innermost sub-query. (The goal of this query is to select IDs 0 and 8, as well as the IDs just before 4 and 8.)
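To make the effect of the extra tbl AS t1 concrete, here is a minimal sketch that runs both versions of the SQL through the standard-library sqlite3 module, bypassing SQLAlchemy entirely. The names GENERATED, INTENDED and ids are made up for this illustration; the table contents match the example (IDs 0, 2, 4, 6, 8).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER)")
conn.executemany("INSERT INTO tbl (id) VALUES (?)", [(i,) for i in range(0, 10, 2)])

# The query as generated: the innermost sub-query has its own "tbl AS t1",
# which shadows the outer t1, so it is evaluated once for the whole table.
GENERATED = """
SELECT tbl.id FROM tbl
WHERE tbl.id IN (
    SELECT t2.id FROM tbl AS t2, tbl AS t1
    WHERE t2.id = (
        SELECT t3.id FROM tbl AS t3, tbl AS t1
        WHERE t3.id < t1.id ORDER BY t3.id DESC LIMIT 1
    )
    AND t1.id IN (4, 8)
)
OR tbl.id IN (0, 8)
"""

# The intended query: the innermost sub-query refers to the outer t1,
# so it is evaluated once per outer t1 row (a correlated sub-query).
INTENDED = GENERATED.replace("FROM tbl AS t3, tbl AS t1", "FROM tbl AS t3")


def ids(sql):
    """Run a query and return the selected IDs in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


generated_ids = ids(GENERATED)
intended_ids = ids(INTENDED)
print(generated_ids)  # [0, 6, 8]
print(intended_ids)   # [0, 2, 6, 8]
```

With the shadowing t1, the innermost sub-query always yields 6 (the largest ID that has any larger ID), which is why ID 2 is missing from the generated query's result.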
Unfortunately, I can’t find how to get SQLAlchemy to generate the correct one (see the code below).
Suggestions for achieving the same result with a simpler query are also welcome. They need to be efficient, though: I tried a few variants and some were a lot slower on my real use case.
The code producing the query:
from sqlalchemy import create_engine, or_
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.orm import sessionmaker
engine = create_engine('sqlite:///:memory:', echo=True)
meta = MetaData(bind=engine)
table = Table('tbl', meta, Column('id', Integer))
session = sessionmaker(bind=engine)()
meta.create_all()
# Insert IDs 0, 2, 4, 6, 8.
i = table.insert()
i.execute(*[dict(id=i) for i in range(0, 10, 2)])
print session.query(table).all()
# output: [(0,), (2,), (4,), (6,), (8,)]
# Subquery of interest: look for the row just before IDs 4 and 8.
sub_query_txt = (
'SELECT t2.id '
'FROM tbl t1, tbl t2 '
'WHERE t2.id = ( '
' SELECT t3.id from tbl t3 '
' WHERE t3.id < t1.id '
' ORDER BY t3.id DESC '
' LIMIT 1) '
'AND t1.id IN (4, 8)')
print session.execute(sub_query_txt).fetchall()
# output: [(2,), (6,)]
# Full query of interest: get the rows mentioned above, as well as more rows.
query_txt = (
'SELECT * '
'FROM tbl '
'WHERE ( '
' id IN (%s) '
'OR id IN (0, 8))'
) % sub_query_txt
print session.execute(query_txt).fetchall()
# output: [(0,), (2,), (6,), (8,)]
# Attempt at an SQLAlchemy translation (from innermost sub-query to full query).
t1 = table.alias('t1')
t2 = table.alias('t2')
t3 = table.alias('t3')
q1 = session.query(t3.c.id).filter(t3.c.id < t1.c.id).order_by(t3.c.id.desc()).\
    limit(1)
q2 = session.query(t2.c.id).filter(t2.c.id == q1, t1.c.id.in_([4, 8]))
q3 = session.query(table).filter(
    or_(table.c.id.in_(q2), table.c.id.in_([0, 8])))
print list(q3)
# output: [(0,), (6,), (8,)]
What you are missing is a correlation between the innermost sub-query and the next level up; without the correlation, SQLAlchemy will include the t1 alias in the innermost sub-query.

Once the innermost sub-query is correlated by calling .correlate(t1) on it, tbl AS t1 is missing from that sub-query. Per the .correlate() method documentation, t1 is then assumed to be part of the enclosing query, and isn't listed in the sub-query itself. With that change, your query works.
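A sketch of the corrected construction follows. It is not the original snippet: it assumes SQLAlchemy 1.4 or later and Python 3, and it uses Core select() instead of session.query(), so .scalar_subquery() is added to make the innermost query usable as a scalar value. The alias names t1, t2, t3 and the q1, q2, q3 structure are taken from the question; the only change to the logic is the .correlate(t1) call.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, or_, select

engine = create_engine("sqlite:///:memory:")
meta = MetaData()
table = Table("tbl", meta, Column("id", Integer))
meta.create_all(engine)

# Insert IDs 0, 2, 4, 6, 8.
with engine.begin() as conn:
    conn.execute(table.insert(), [{"id": i} for i in range(0, 10, 2)])

t1 = table.alias("t1")
t2 = table.alias("t2")
t3 = table.alias("t3")

# Innermost sub-query: the row just before t1.id.
# .correlate(t1) tells SQLAlchemy that t1 belongs to the enclosing query,
# so "tbl AS t1" is left out of this sub-query's FROM clause.
q1 = (
    select(t3.c.id)
    .where(t3.c.id < t1.c.id)
    .order_by(t3.c.id.desc())
    .limit(1)
    .correlate(t1)
    .scalar_subquery()
)

# Middle query: the rows just before IDs 4 and 8.
q2 = select(t2.c.id).where(t2.c.id == q1, t1.c.id.in_([4, 8]))

# Full query: those rows plus the fixed IDs 0 and 8.
q3 = select(table.c.id).where(
    or_(table.c.id.in_(q2), table.c.id.in_([0, 8]))
)

with engine.connect() as conn:
    rows = sorted(row[0] for row in conn.execute(q3))

print(rows)  # [0, 2, 6, 8]
```

Printing q3 before executing it is a quick way to check that tbl AS t1 now appears only in the middle query's FROM clause.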