I am trying to fetch and parse an XML file into a database. The

Question


Asked: June 17, 2026


I am trying to fetch an XML file and parse it into a database. The XML is GZIP-compressed, and the GZIP file is ~8MB. When I run the code locally, the memory used by pythonw.exe builds up to a level where the entire system (Windows 7) stops responding, and when I run it online it exceeds the memory limit on Google App Engine. I am not sure whether the file is too big or whether I am doing something wrong. Any help would be very much appreciated!

from google.appengine.ext import webapp
from google.appengine.api.urlfetch import fetch
from xml.dom.minidom import parseString
import gzip
import base64
import StringIO

class ParseCatalog(webapp.RequestHandler):
    user = xxx
    password = yyy
    catalog = fetch('url',
                    headers={"Authorization":
                             "Basic %s" % base64.b64encode(user + ':' + password)}, deadline=600)
    xmlstring = StringIO.StringIO(catalog.content)
    gz = gzip.GzipFile(fileobj=xmlstring)
    gzcontent = gz.read()
    contentxml = parseString(gzcontent)
    items = contentxml.getElementsByTagName("Product")

    for item in items:
        # Use a separate name for the new entity so the parsed node is not overwritten.
        entry = DatabaseEntry()
        entry.name = str(item.getElementsByTagName("Manufacturer")[0].firstChild.data)
        entry.put()

UPDATE

So I tried to follow BasicWolf’s suggestion to switch to lxml, but I am having problems importing it. I downloaded the lxml 2.3 library and put it in the folder of my app (I know this is not ideal, but it’s the only way I know how to include a 3rd party library). I also added the following to my app.yaml:

libraries:
- name: lxml
  version: "2.3"

Then I wrote the following code to test if it parses:

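from lxml import etree

class ParseCatalog(webapp.RequestHandler):
    user = xxx
    password = yyy
    catalog = fetch('url',
                    headers={"Authorization":
                             "Basic %s" % base64.b64encode(user + ':' + password)}, deadline=600)
    # iterparse expects a file-like object, so wrap the gzipped bytes in a decompressing stream.
    items = etree.iterparse(gzip.GzipFile(fileobj=StringIO.StringIO(catalog.content)))

    def get(self):
        # iterparse yields (event, element) pairs.
        for event, elem in self.items:
            self.response.out.write(str(elem.tag))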

However, this results in the following error:

ImportError: cannot import name etree

I have checked other questions on this error and it seems that the fact that I run on Windows 7 might play a role. I also tried to install the pre-compiled binary packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml but that didn’t change anything either.


1 Answer

Editorial Team · Answer 1 · 2026-06-17T23:14:55+00:00


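What do you expect? First you read the entire response into memory as a string, then you unzip it into memory, and then you construct a DOM tree, still in memory. Each step holds a complete copy of the data, and a DOM tree usually takes several times more memory than the XML text it was built from.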
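A minimal sketch of a streaming alternative, assuming the catalog is a flat run of Product elements that each have a Manufacturer child, as in the question. It uses iterparse from the standard library's xml.etree.ElementTree rather than lxml (lxml.etree offers an iterparse with a similar interface), and the function name iter_manufacturers is made up for illustration. The compressed bytes still have to sit in memory, since that is what fetch returns, but the decompressed XML and the full tree never do.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_manufacturers(gz_bytes):
    """Yield the Manufacturer text of each Product, one Product at a time.

    Hypothetical helper: the tag names are taken from the question.
    """
    # GzipFile decompresses lazily as iterparse reads from it, so the
    # uncompressed XML is never held as one big string.
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        # The first event is the start of the root element; keep a handle on it.
        _, root = next(context)
        for event, elem in context:
            if event == "end" and elem.tag == "Product":
                yield elem.findtext("Manufacturer")
                # Drop the finished Product from the root so the tree stays small.
                root.clear()
```

In the handler, the loop would then become something like "for name in iter_manufacturers(catalog.content):", creating and saving one DatabaseEntry per iteration instead of building the list of all Product nodes first.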
