I have the following table:
class Feedback(Base):
__tablename__ = 'feedbacks'
__table_args__ = (UniqueConstraint('user_id', 'look_id'),)
id = Column(Integer, primary_key=True)
user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
look_id = Column(Integer, ForeignKey('looks.id'), nullable=False)
I am currently inserting a lot of entries into this table that will be violating that UniqueConstraint.
I am using the following code:
for comment in session.query(Comment).filter(Comment.type == Comment.TYPE_LOOK).yield_per(100):
feedback = Feedback()
feedback.user_id = User.get_or_create(comment.model_id).id
feedback.look_id = comment.commentable_id
session.add(feedback)
try: # Refer to T20
session.flush()
except IntegrityError,e:
print "IntegrityError", e
session.rollback()
session.commit()
and I am getting the following error:
IntegrityError (IntegrityError) duplicate key value violates unique constraint "feedbacks_user_id_look_id_key"
DETAIL: Key (user_id, look_id)=(140, 263008) already exists.
'INSERT INTO feedbacks (user_id, look_id, score) VALUES (%(user_id)s, %(look_id)s, %(score)s) RETURNING feedbacks.id' {'user_id': 140, 'score': 1, 'look_id': 263008}
IntegrityError (IntegrityError) duplicate key value violates unique constraint "feedbacks_user_id_look_id_key"
...
(there's about 24 of these integrity errors here)
...
DETAIL: Key (user_id, look_id)=(173, 263008) already exists.
'INSERT INTO feedbacks (user_id, look_id, score) VALUES (%(user_id)s, %(look_id)s, %(score)s) RETURNING feedbacks.id' {'user_id': 173, 'score': 1, 'look_id': 263008}
No handlers could be found for logger "sqlalchemy.pool.QueuePool"
Traceback (most recent call last):
File "load.py", line 40, in <module>
load_crawl_data_into_feedback()
File "load.py", line 21, in load_crawl_data_into_feedback
for comment in session.query(Comment).filter(Comment.type == Comment.TYPE_LOOK).yield_per(100):
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2337, in instances
fetch = cursor.fetchmany(self._yield_per)
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3230, in fetchmany
self.cursor, self.context)
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3223, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3343, in _fetchmany_impl
row = self._fetchone_impl()
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3333, in _fetchone_impl
self.__buffer_rows()
File "/Volumes/Data2/Dropbox/projects/Giordano/venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 3326, in __buffer_rows
self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
sqlalchemy.exc.ProgrammingError: (ProgrammingError) named cursor isn't valid anymore None None
Before you jump to conclusions about this error being caused by yield_per, I can assure you that yield_per is not the culprit here.
I tried the same code without the unique constraints and I did not experience any error at all.
I believe that the integrity errors are causing No handlers could be found for logger “sqlalchemy.pool.QueuePool”.
I assume that every integrity error is killing each “thread” in the queuepool.
Can someone enlighten me as to what is happening?
If I can’t do much about the data at this point, what would you recommend me to do?
That error is just from the Python
loggingmodule; your pool class is trying to log some debug message but you don’t have SQLA logging configured. Configuring logging is easy, and then you can see what it’s actually trying to say.I’m not quite sure what’s going on here, but it certainly doesn’t help that you’re rolling back your top-level transaction dozens of times. Rollback ends the transaction and invalidates every live row object. That certainly won’t interact well with
yield_per.If your database supports savepoints or nesting transactions (i.e., is Postgres or Oracle… or maybe recent MySQL?), try starting a nested transaction for each attempt:
withrolls back on error and commits on success, so a failedflushwon’t wreak havoc on the rest of your main transaction.If you don’t have the backend support, the other sensible options coming to mind are:
Complexify your query: do a
LEFT JOINwith your feedback table so you know in-app whether or not the feedback row already exists.If you’re willing to make
(user_id, look_id)be your primary key, I think you can usesession.merge(feedback). This acts like an insert-or-update based on primary key: if SQLA can find an existing row with the same pk, it’ll update that, else it’ll create a new one in the database. Would risk firing off an extraSELECTfor every new row, though.