I’m having a difficult time figuring out why a simple SELECT query is taking such a long time with sqlalchemy using raw SQL (I’m getting 14600 rows/sec, but when running the same query through psycopg2 without sqlalchemy, I’m getting 38421 rows/sec).
After some poking around, I realized that toggling sqlalchemy’s use_native_unicode parameter in the create_engine call actually makes a huge difference.
This query takes 0.5 seconds to retrieve 7300 rows:
from sqlalchemy import create_engine
engine = create_engine("postgresql+psycopg2://localhost...",
                       use_native_unicode=True)
r = engine.execute("SELECT * FROM logtable")
fetched_results = r.fetchall()
This query takes 0.19 seconds to retrieve the same 7300 rows:
engine = create_engine("postgresql+psycopg2://localhost...",
                       use_native_unicode=False)
r = engine.execute("SELECT * FROM logtable")
fetched_results = r.fetchall()
The only difference between the two queries is use_native_unicode, yet SQLAlchemy’s own docs state that it is better to keep use_native_unicode=True (http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html).
Does anyone know why use_native_unicode is making such a big performance difference? And what are the ramifications of turning off use_native_unicode?
This is something you need to decide based on how much non-ASCII data you’re dealing with. psycopg2’s method of decoding unicode is faster than SQLAlchemy’s, assuming SQLA’s C extensions are not in use, but it still adds latency to a result set compared with doing no unicode conversion at all. In the code above, SQLAlchemy’s unicode facilities are not used; they only come into play when a column is mapped to the Unicode or String types, which can only happen if you are using text(), select(), or an ORM-level equivalent, where a Unicode type is mapped to those result set columns using Table metadata or the “typemap” parameter of text().
psycopg2’s native unicode facilities, on the other hand, take effect at the cursor level, so they are always in effect and apparently add some latency overall.
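To make the cursor-level versus per-column distinction concrete, here is a minimal pure-Python sketch. It does not touch a database, psycopg2, or SQLAlchemy: the sample rows, the function names, and the choice of which column counts as "unicode" are all made up for illustration. It only mimics three strategies: no decoding, decoding every string value, and decoding only the columns that have been declared as unicode.

```python
import timeit

# Hypothetical sample data: rows as a driver might receive them, with every
# text column still held as UTF-8 encoded bytes. Column 1 is pure ASCII,
# column 2 contains a non-ASCII character.
ROWS = [
    (i, b"plain ascii value", "caf\u00e9 number {}".format(i).encode("utf-8"))
    for i in range(7300)
]


def fetch_raw(rows):
    """No unicode conversion at all: text columns come back as bytes."""
    return [tuple(row) for row in rows]


def fetch_decode_all(rows):
    """Cursor-level style: every bytes value is decoded, needed or not."""
    return [
        tuple(v.decode("utf-8") if isinstance(v, bytes) else v for v in row)
        for row in rows
    ]


def fetch_decode_typed(rows, unicode_columns):
    """Type-level style: only columns declared as unicode are decoded."""
    return [
        tuple(
            v.decode("utf-8") if i in unicode_columns else v
            for i, v in enumerate(row)
        )
        for row in rows
    ]


if __name__ == "__main__":
    candidates = [
        ("no decoding", lambda: fetch_raw(ROWS)),
        ("decode every string", lambda: fetch_decode_all(ROWS)),
        ("decode declared columns only", lambda: fetch_decode_typed(ROWS, {2})),
    ]
    for label, fn in candidates:
        seconds = timeit.timeit(fn, number=20)
        print("{:<30} {:.4f} s for 20 runs".format(label, seconds))
```

The typed variant decodes fewer values per row than the decode-everything variant, at the price of leaving undeclared columns as bytes. Any timings printed by this sketch say nothing about a real driver, where the decoding happens in C.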
Below is a series of illustrations of how the different methods work. The last one is the one most similar to SQLAlchemy’s, although when using SQLAlchemy’s C extensions we are probably just as fast as psycopg2:
The timings I get:
So what’s interesting here is that SQLA’s approach, particularly if we used the C extensions, might actually be a better choice than psycopg2’s native approach, if in fact you don’t make much use of the Unicode type and most of your string values are pure ASCII.