I’m having a difficult time figuring out why a simple SELECT query is taking

Question

Asked: 2026-06-14T17:06:23+00:00

I’m having a difficult time figuring out why a simple SELECT query is taking such a long time with sqlalchemy using raw SQL (I’m getting 14600 rows/sec, but when running the same query through psycopg2 without sqlalchemy, I’m getting 38421 rows/sec).

After some poking around, I realized that toggling sqlalchemy’s use_native_unicode parameter in the create_engine call actually makes a huge difference.

This query takes 0.5 seconds to retrieve 7300 rows:

from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://localhost...",
                       use_native_unicode=True)
r = engine.execute("SELECT * FROM logtable")
fetched_results = r.fetchall()

This query takes 0.19 seconds to retrieve the same 7300 rows:

engine = create_engine("postgresql+psycopg2://localhost...",
                       use_native_unicode=False)
r = engine.execute("SELECT * FROM logtable")
fetched_results = r.fetchall()

The only difference between the 2 queries is use_native_unicode. But sqlalchemy’s own docs state that it is better to keep use_native_unicode=True (http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html).
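
As a quick check, the two timings above line up with the throughput figures quoted at the start of the question. A minimal sketch of the arithmetic (the helper name rows_per_second is made up for illustration):

```python
ROWS = 7300  # rows returned by "SELECT * FROM logtable" in both runs


def rows_per_second(rows, seconds):
    """Throughput for a single fetchall() call."""
    return rows / seconds


native_on = rows_per_second(ROWS, 0.5)    # use_native_unicode=True
native_off = rows_per_second(ROWS, 0.19)  # use_native_unicode=False
speedup = native_off / native_on

print(int(native_on), int(native_off), round(speedup, 2))
```

So the gap between the two settings is a factor of roughly 2.6 on this table.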

Does anyone know why use_native_unicode is making such a big performance difference? And what are the ramifications of turning off use_native_unicode?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T17:06:24+00:00

This is something you need to decide based on how much non-ASCII data you’re dealing with. psycopg2’s method of decoding unicode is faster than SQLAlchemy’s, assuming SQLA’s C extensions are not in use, but it still adds latency to a result set compared with doing no unicode conversion at all. In the code above, SQLAlchemy’s unicode facilities are not used; they only come into play when a column is mapped to the Unicode or String types, which can only happen if you are using text(), select(), or an ORM-level equivalent, where a Unicode type is mapped to those result set columns using Table metadata or the “typemap” parameter of text().
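
To make the non-ASCII point concrete, here is a small sketch in plain Python with no database involved. The sample values are invented, and the byte strings stand in for what a driver hands back when no unicode conversion is applied:

```python
# Hypothetical column values as raw UTF-8 bytes (no unicode conversion).
ascii_raw = u"GET /index.html".encode("utf-8")
accented_raw = u"caf\u00e9".encode("utf-8")

# The same values after decoding to text.
ascii_text = ascii_raw.decode("utf-8")
accented_text = accented_raw.decode("utf-8")

# Pure ASCII: one byte per character, so lengths and slicing agree
# whether or not the value was decoded.
print(len(ascii_raw), len(ascii_text))

# Non-ASCII: the accented character occupies two bytes, so the raw value
# is longer than the text and must be decoded before it is treated as text.
print(len(accented_raw), len(accented_text))
```

If the strings in the table are all ASCII, skipping the conversion loses little; once accented or non-Latin characters show up, the undecoded bytes no longer match the text they represent.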

psycopg2’s native unicode facilities, on the other hand, take effect at the cursor level, so they are always in effect, and apparently add some latency overall.

Below is a series of illustrations of how the different methods work. The last one is the one most similar to SQLAlchemy’s, although when using SQLAlchemy’s C extensions we are probably just as fast as psycopg2:

# Python 2 script: uses xrange, the print statement, and byte-string results.
import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect(user='scott', password='tiger', host='localhost', database='test')

cursor = conn.cursor()
cursor.execute("""
create table data (
    id SERIAL primary key,
    data varchar(500)
)
""")

cursor.executemany("insert into data (data) values (%(data)s)", [
        {"data":"abcdefghij" * 50} for i in xrange(10000)
    ])
cursor.close()


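def one(conn):
    # Baseline: fetch rows as plain byte strings, no unicode conversion.
    cursor = conn.cursor()
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0]

def two(conn):
    # psycopg2's native unicode: register the UNICODE typecaster on the cursor.
    cursor = conn.cursor()
    extensions.register_type(extensions.UNICODE, cursor)
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0]

def three(conn):
    # Decode each value inline in Python.
    cursor = conn.cursor()
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0].decode('utf-8')

def four(conn):
    # Decode through a per-value function call, closer to what SQLAlchemy
    # does without its C extensions.
    cursor = conn.cursor()
    def conv_unicode(value):
        return value.decode('utf-8')
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        conv_unicode(row[0])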

import timeit

print "no unicode:", timeit.timeit("one(conn)", "from __main__ import conn, one", number=100)

print "native unicode:", timeit.timeit("two(conn)", "from __main__ import conn, two", number=100)

print "in Python unicode:", timeit.timeit("three(conn)", "from __main__ import conn, three", number=100)

print "more like SQLA's unicode:", timeit.timeit("four(conn)", "from __main__ import conn, four", number=100)

The timings I get:

no unicode: 2.10434007645
native unicode: 4.52875208855
in Python unicode: 4.77912807465
more like SQLA's unicode: 4.88325881958
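
To put those numbers in perspective, here is a rough sketch that turns them into a slowdown factor and a per-row overhead relative to the "no unicode" baseline. It assumes each timing covers 100 runs over the 10000 inserted rows, as in the script above; the variable names are made up for illustration:

```python
# Timings copied from the run above (seconds for 100 passes over 10000 rows).
timings = {
    "no unicode": 2.10434007645,
    "native unicode": 4.52875208855,
    "in Python unicode": 4.77912807465,
    "more like SQLA's unicode": 4.88325881958,
}

rows_fetched = 100 * 10000
baseline = timings["no unicode"]

ratios = {}       # slowdown factor relative to the baseline
overhead_us = {}  # extra microseconds spent per row

for label, seconds in timings.items():
    ratios[label] = seconds / baseline
    overhead_us[label] = (seconds - baseline) / rows_fetched * 1e6
    print("%-26s %.2fx  +%.2f us/row" % (label, ratios[label], overhead_us[label]))
```

All three decoding variants land a little above twice the baseline, and the differences between them are small next to the cost of decoding at all.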

What’s interesting here is that SQLA’s approach, perhaps with the C extensions in use, might actually be a better choice than psycopg2’s native approach if you don’t make much use of the Unicode type and most of your string values are pure ASCII.
