Editorial Team
Asked: May 30, 2026

I need some help understanding how Python and PostgreSQL handle transactions and bulk inserts

I need some help understanding how Python and PostgreSQL handle transactions and bulk inserts, specifically when inserting several data sets in a single transaction.
Environment:

  • Windows 7 64-bit
  • Python 3.2
  • PostgreSQL 9.1
  • psycopg2

Here is my scenario:
I am converting data from one database (Oracle) into XML strings and inserting that data into a new database (PostgreSQL). This is a large dataset, so I'm trying to optimize some of my inserts. I treat much of this data as library-type objects, so I have a library table plus tables for my XML metadata and XML content; the fields for this data are text types in the database. I pull the data out of Oracle and then create dictionaries of the data I need to insert. I have 3 insert statements: the first creates a record in the library table using a serial id, and that id is needed for the relationship in the next two queries, which insert the XML into the metadata and content tables. Here is an example of what I'm talking about:

for inputKey in libDataDict.keys():
    metaString = libDataDict[inputKey][0]
    contentString = libDataDict[inputKey][1]
    # one parameter dict per row, for each of the three target tables
    insertLibDataList.append({'objIdent': "%s" % inputKey, 'objName': "%s" % inputKey, 'objType': libType})
    insertMetadataDataList.append({'objIdent': inputKey, 'objMetadata': metaString})
    insertContentDataList.append({'objIdent': inputKey, 'objContent': contentString})

dataDict['cmsLibInsert'] = insertLibDataList
dataDict['cmsLibMetadataInsert'] = insertMetadataDataList
dataDict['cmsLibContentInsert'] = insertContentDataList

sqlDict[0] = {'sqlString':"insert into cms_libraries (cms_library_ident, cms_library_name, cms_library_type_id, cms_library_status_id) \
              values (%(objIdent)s, %(objName)s, (select id from cms_library_types where cms_library_type_name = %(objType)s), \
              (select id from cms_library_status where cms_library_status_name = 'active'))", 'data':dataDict['cmsLibInsert']}

# psycopg2 quotes and escapes each parameter itself, so the values are not wrapped in $$...$$
sqlDict[1] = {'sqlString':"insert into cms_library_metadata (cms_library_id, cms_library_metadata_data) values \
              ((select id from cms_libraries where cms_library_ident = %(objIdent)s), %(objMetadata)s)", \
              'data':dataDict['cmsLibMetadataInsert']}

sqlDict[2] = {'sqlString':"insert into cms_library_content (cms_library_id, cms_library_content_data) values \
              ((select id from cms_libraries where cms_library_ident = %(objIdent)s), %(objContent)s)", \
              'data':dataDict['cmsLibContentInsert']}

bulkLoadData(myConfig['pgConn'], myConfig['pgCursor'], sqlDict)

The problem is this: when I run the first query (sqlDict[0]) on its own and commit before running the next two, everything works fine. Ideally I would like all these queries in the same transaction, but then it fails because the 2nd and 3rd queries can't find the id in the cms_libraries table.
Here is my current insert code:

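def bulkLoadData(dbConn, dbCursor, sqlDict):
    try:
        # library rows first: the later queries look up their generated ids
        libInsertSql = sqlDict.pop(0)
        dbSql = libInsertSql['sqlString']
        data = libInsertSql['data']
        dbCursor.executemany(dbSql, data)
        dbConn.commit()  # with this intermediate commit, the lookups below work
        # remaining queries in key order (metadata, then content)
        for sqlKey in sorted(sqlDict):
            dbSql = sqlDict[sqlKey]['sqlString']
            data = sqlDict[sqlKey]['data']
            dbCursor.executemany(dbSql, data)

        dbConn.commit()
    except Exception:
        dbConn.rollback()
        raise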

Previously I was interpolating the values into the query string and running a separate query for each insert. That way I can put everything in the same transaction, and the later inserts find the generated id without any problem. I don't understand why the id isn't found when I do the bulk insert with executemany(). Is there a way to do the bulk insert and the other two queries in the same transaction?
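
A minimal sketch of the single-transaction version, assuming any DB-API 2.0 connection (psycopg2 follows that spec); `bulk_load_in_one_transaction` is a hypothetical name. A transaction can see its own uncommitted rows, so the subselects in the metadata and content inserts should find the library rows from the first `executemany()` as long as all three statements run on the same connection with no rollback in between. The demo uses the standard-library `sqlite3` module as a stand-in database with simplified tables, so its placeholders are `:name` instead of psycopg2's `%(name)s`.

```python
import sqlite3


def bulk_load_in_one_transaction(conn, statements):
    """Run each (sql, rows) pair with executemany(), then commit once.

    `statements` is an ordered list; the parent (library) insert must come
    first so the later subselects can see its rows. Any error rolls back
    everything, so no half-loaded library entries are left behind.
    """
    cur = conn.cursor()
    try:
        for sql, rows in statements:
            cur.executemany(sql, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


if __name__ == "__main__":
    # sqlite3 stand-in with simplified tables (not the real cms_* schema)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lib (id INTEGER PRIMARY KEY, ident TEXT)")
    conn.execute("CREATE TABLE meta (lib_id INTEGER NOT NULL, data TEXT)")
    bulk_load_in_one_transaction(conn, [
        ("INSERT INTO lib (ident) VALUES (:objIdent)",
         [{"objIdent": "a"}, {"objIdent": "b"}]),
        # finds the parent rows even though nothing has been committed yet
        ("INSERT INTO meta (lib_id, data) VALUES "
         "((SELECT id FROM lib WHERE ident = :objIdent), :objMetadata)",
         [{"objIdent": "a", "objMetadata": "<m1/>"},
          {"objIdent": "b", "objMetadata": "<m2/>"}]),
    ])
    print(conn.execute("SELECT lib_id, data FROM meta ORDER BY lib_id").fetchall())
```

If the lookups still come back empty inside one transaction, one difference worth checking in the code above is that the first statement passes `"%s" % inputKey` for objIdent while the later statements pass `inputKey` unchanged.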

I have been reading this documentation and searching Stack Overflow and the internet, but have not found an answer to my problem:
psycopg docs
as well as PostgreSQL's:
PostgreSQL string docs

Any help, suggestions, or comments would be appreciated.
Thanks,
Mitch

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 4:22 pm

    You have two choices here. Either generate the IDs externally (which lets you keep your bulk inserts) or generate them from the serial (which means you have to do single-row inserts). Figuring out external ID generation and bulk loading is fairly straightforward (although I'd recommend looking at an ETL tool rather than hand-coding something in Python). If you need to pull IDs from the serial, then you should consider server-side prepared statements.
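
    A minimal sketch of the first choice, external ID generation, assuming psycopg2-style `%s`/`%(name)s` placeholders. `reserve_ids`, `build_rows` and the SQL constants are hypothetical, and `cms_libraries_id_seq` assumes PostgreSQL's default sequence name for a `serial` column named `id` on `cms_libraries`. Reserving the ids in one query up front means every table can be loaded with a single `executemany()`, and the dependent rows reference the id directly instead of looking it up.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical SQL; assumes the default sequence name of a serial "id" column.
RESERVE_IDS_SQL = "SELECT nextval('cms_libraries_id_seq') FROM generate_series(1, %s)"

# With ids known up front, the dependent inserts need no subselect.
METADATA_INSERT_SQL = ("INSERT INTO cms_library_metadata (cms_library_id, cms_library_metadata_data) "
                       "VALUES (%(libId)s, %(objMetadata)s)")
CONTENT_INSERT_SQL = ("INSERT INTO cms_library_content (cms_library_id, cms_library_content_data) "
                      "VALUES (%(libId)s, %(objContent)s)")


def reserve_ids(cursor, count: int) -> List[int]:
    """Fetch `count` fresh values from the sequence with a single query."""
    cursor.execute(RESERVE_IDS_SQL, (count,))
    return [row[0] for row in cursor.fetchall()]


def build_rows(lib_data: Dict[str, Sequence[str]], lib_type: str, ids: Iterable[int]):
    """Attach a pre-assigned id to every library entry.

    lib_data maps objIdent -> (metaString, contentString), like libDataDict.
    Returns three lists of parameter dicts, one per target table.
    """
    ids = list(ids)
    if len(ids) < len(lib_data):
        raise ValueError("not enough ids for the library entries")
    lib_rows, meta_rows, content_rows = [], [], []
    for new_id, (ident, values) in zip(ids, lib_data.items()):
        lib_rows.append({'id': new_id, 'objIdent': str(ident), 'objName': str(ident), 'objType': lib_type})
        meta_rows.append({'libId': new_id, 'objMetadata': values[0]})
        content_rows.append({'libId': new_id, 'objContent': values[1]})
    return lib_rows, meta_rows, content_rows
```

    Used together, this would be roughly `ids = reserve_ids(dbCursor, len(libDataDict))`, then one `executemany()` per table (the library insert listing `id` and `%(id)s` explicitly) and a single `dbConn.commit()`.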

    Your first statement should look like the following:

    dbCursor.execute("""
    PREPARE cms_lib_insert (bigint, text, text) AS 
    INSERT INTO cms_libraries (cms_library_ident, cms_library_name, cms_library_type_id, cms_library_status_id)
    VALUES ($1, $2,
        (select id from cms_library_types where cms_library_type_name = $3), 
        (select id from cms_library_status where cms_library_status_name = 'active')
    )
    RETURNING id
    """)
    

    You'll run this once per database connection, at startup time (prepared statements belong to the session and disappear when it ends). Then you'll want to run the following EXECUTE statement for each entry.

    dbCursor.execute("""
    EXECUTE cms_lib_insert(%(objIdent)s, %(objName)s, %(objType)s)
    """, {'objIdent': 345, 'objName': 'foo', 'objType': 'bar'})
    my_new_id = dbCursor.fetchone()[0]  # the id returned by RETURNING
    

    This will return the generated serial id. Going forward, I'd strongly recommend moving away from your current pattern of abstracting the database communication (your sqlDict approach) and toward a very direct coding style (cleverness is your enemy here; it makes performance tuning harder).
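
    Because prepared statements belong to the session, the PREPARE has to be run again on every new connection, and running it twice on the same connection fails because the name already exists. A minimal sketch of guarding against that with PostgreSQL's `pg_prepared_statements` system view; `prepare_once` is a hypothetical helper, and the lower-casing assumes the statement name was written unquoted in the PREPARE.

```python
# Hypothetical helper; pg_prepared_statements lists the current session's statements.
PREPARED_CHECK_SQL = "SELECT 1 FROM pg_prepared_statements WHERE name = %s"


def prepare_once(cursor, name, prepare_sql):
    """Run prepare_sql (a PREPARE ... AS ... statement) unless `name` exists.

    Returns True if PREPARE actually ran on this session, False otherwise.
    """
    # Assumes the name was unquoted in PREPARE, so PostgreSQL folded it to lower case.
    cursor.execute(PREPARED_CHECK_SQL, (name.lower(),))
    if cursor.fetchone() is not None:
        return False
    cursor.execute(prepare_sql)
    return True
```

    For example, `prepare_once(dbCursor, 'cms_lib_insert', CMS_LIB_PREPARE_SQL)` right after connecting, where the hypothetical `CMS_LIB_PREPARE_SQL` holds the PREPARE text shown earlier.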

    You'll want to batch your inserts into blocks and commit once per block, tuning BLOCK_SIZE against your actual workload. Your code should look something like the following:

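    BLOCK_SIZE = 500  # starting point; tune against your actual workload
    # metadata_insert and library_insert are assumed to be PREPAREd like cms_lib_insert
    entries = list(libDataDict.items())
    for start in range(0, len(entries), BLOCK_SIZE):
        # psycopg2 starts a transaction implicitly on the first execute()
        for objIdent, libData in entries[start:start + BLOCK_SIZE]:
            dbCursor.execute("EXECUTE cms_lib_insert(%s, %s, %s)", (objIdent, objIdent, libType))
            cms_lib_id = dbCursor.fetchone()[0]  # new serial id, used below
            dbCursor.execute("EXECUTE metadata_insert(%s, %s)", (cms_lib_id, libData[0]))
            dbCursor.execute("EXECUTE library_insert(%s, %s)", (cms_lib_id, libData[1]))
        dbConn.commit()  # commit() lives on the connection; once per block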
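
    Splitting the entries into groups of BLOCK_SIZE can also be done lazily, without first building a list of every entry. A minimal sketch using only the standard library; `chunked` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        block = list(islice(it, size))  # pull the next `size` items
        if not block:
            return
        yield block
```

    Each block would then get its own `dbConn.commit()`, e.g. `for block in chunked(libDataDict.items(), BLOCK_SIZE):` around the per-entry EXECUTE calls. Committing per block bounds how much work a failure rolls back, without paying for a commit per row.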
    

    If you need performance beyond this, the next step is writing an insert handler function that takes arrays of rows for the dependent tables. I do not recommend it, as it quickly becomes a maintenance nightmare.

Related Questions

I need some help in understanding a python concept. class TilePuzzleProblem(search.Problem): This class is
I'm brand new to python, and need some help understanding this code fragment: for
I need some help understanding how to utilize fgetcsv and arrays to manipulate data.
I need some help understanding this bit of code pre { white-space: pre; white-space:
I need some help in understanding when and why to use Remote Service instead
I need some help in understanding what is happening here .I am getting a
I have been trying out Cassandra and need some help in understanding a few
Need some help understanding why my concat() is failing and how to fix it.
I need some help understanding some of the points from Paul Graham’s What Made
I need some help understanding what's happening here. This code is from a models/log.py

© 2021 The Archive Base. All Rights Reserved