[Python/MySQLdb] – CentOS – Linux – VPS
I have a page that parses a large file and queries the database up to 100 times per run. The database is pretty large and I’m trying to reduce the execution time of this script.
My SQL functions are inside a class. Currently the connection object is a class variable created when the class is instantiated. I have various fetch and query functions that create a cursor from the connection object every time they are called. Would it be faster to create the cursor once, when the connection object is created, and reuse it, or is it better practice to create a new cursor on every call?
import MySQLdb as mdb

class parse:
    con = mdb.connect( server, username, password, dbname )
    #cur = con.cursor() ## create here?

    def q( self, q ):
        cur = self.con.cursor() ## it's currently here
        cur.execute( q )
Any other suggestions on how to speed up the script are welcome too. The insert statement is the same for all the queries in the script.
Opening and closing connections is never free; every new connection costs some time for setup and authentication.
The reason you wouldn’t want to just leave a single connection open is that if two requests came in at the same time, the second would have to wait until the first had finished before it could do any work.
One way to solve this is to use connection pooling. You create a bunch of open connections and then reuse them. Every time you need to do a query, you check a connection out of the pool, perform the request, and then put it back into the pool.
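To make the check-out/check-in cycle concrete, here is a minimal sketch of a fixed-size pool built on the standard library. The ConnectionPool class and its connection() method are hypothetical names made up for this illustration, and sqlite3 stands in for MySQLdb only so the example runs without a database server; with MySQLdb you would pass a function that calls mdb.connect(...) instead.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool of already-open DB-API connections."""

    def __init__(self, connect, size=5):
        # Open every connection up front so queries never pay the connect cost.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        con = self._idle.get()  # blocks until a connection is free
        try:
            yield con
        finally:
            self._idle.put(con)  # always hand it back, even after an error


# Assumption: sqlite3 is a stand-in for MySQLdb in this sketch.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

with pool.connection() as con:
    cur = con.cursor()
    cur.execute("SELECT 1 + 1")
    result = cur.fetchone()[0]
    cur.close()
```

A production pool would also need to roll back unfinished transactions and replace dead connections before handing them out again, which is part of why an existing implementation is usually the easier route.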
Setting all this up can be quite tedious, so I would recommend using SQLAlchemy. It has built-in connection pooling, relatively low overhead, and supports MySQL.
Since you care about speed, I would only use the Core part of SQLAlchemy, since the ORM part is a bit slower.
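As a rough sketch of what that could look like with SQLAlchemy Core only: the items table, the row values, and the pool settings below are made up for illustration, and a temporary SQLite file stands in for MySQL so the example runs without a server. For MySQL with MySQLdb the URL would have the form "mysql+mysqldb://user:password@host/dbname", with your own credentials.

```python
import os
import tempfile

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Assumption: a throwaway SQLite file replaces the real MySQL database here.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
engine = create_engine(
    "sqlite:///" + db_path,
    poolclass=QueuePool,  # the default pool for MySQL; set explicitly for SQLite
    pool_size=5,          # illustrative value, tune for your workload
    max_overflow=0,
)

# Hypothetical table and rows; the same INSERT is reused for every row.
insert_stmt = text("INSERT INTO items (name) VALUES (:name)")
rows = [{"name": "a"}, {"name": "b"}, {"name": "c"}]

# engine.begin() checks a connection out of the pool, opens a transaction,
# commits on success, and returns the connection to the pool on exit.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(insert_stmt, rows)  # a list of dicts runs as one executemany

with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM items")).scalar()
```

Because the insert statement in the question is the same for every query, passing all parameter sets in one execute call lets the driver batch them instead of making a separate call per row.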