Dive into Python: XML Processing – Here I am referring to a portion of


Editorial Team

Asked: May 24, 2026


This question is about the XML Processing chapter of Dive Into Python, specifically this portion of the kgp.py program:

def getDefaultSource(self):
  xrefs = {}
  for xref in self.grammar.getElementsByTagName("xref"):
    xrefs[xref.attributes["id"].value] = 1
  xrefs = xrefs.keys()
  standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
  if not standaloneXrefs:
    raise NoSourceError, "can't guess source, and no source specified"
  return '<xref id="%s"/>' % random.choice(standaloneXrefs)

self.grammar is the parsed XML representation (using xml.dom.minidom) of:

<?xml version="1.0" ?>
<grammar>
<ref id="bit">
  <p>0</p>
  <p>1</p>
</ref>
<ref id="byte">
  <p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>

self.refs is a cache of all the <ref> elements in the above XML, keyed by their id.

I have two doubts with this code:

Doubt 1:

  for xref in self.grammar.getElementsByTagName("xref"):
    xrefs[xref.attributes["id"].value] = 1
  xrefs = xrefs.keys()

Eventually, xrefs holds the id values in a list. Couldn’t we have done this more simply with:

  xrefs = [xref.attributes["id"].value 
           for xref in self.grammar.getElementsByTagName("xref")]

Doubt 2:

  standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]
  ...
  return '<xref id="%s"/>' % random.choice(standaloneXrefs)

Here we collect the ids from self.refs that do NOT appear in the computed xrefs. But then, instead of creating a <ref> element, we create an <xref> with the same id. This takes us one step backward, since later we will have to resolve this <xref> anyway and eventually reach the <ref>. We could have just started with the <ref> in the first place.

Disclaimer

I am in no way trying to make a remark on the book. I am not even qualified for that.

I am loving every moment of reading this book. I realize a few chapters have become outdated, but I love Mark Pilgrim’s writing style and I cannot stop reading.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T05:35:42+00:00

Dive Into Python is seven years old now (published 2004), and doesn’t always contain the most modern code. So you need to go easy on it: Dive Into Python 3 might be a better bet.

Your suggestion for doubt 1 changes the meaning of the code: putting the ids into the keys of a dictionary and then getting them out again eliminates duplicates, whereas your list comprehension includes duplicates. The modern approach would be to use a set comprehension:

 xrefs = {xref.attributes["id"].value 
          for xref in self.grammar.getElementsByTagName("xref")}

but this wasn’t available in 2004.
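
To make the difference concrete, here is a minimal sketch, assuming a cut-down grammar string rather than the book's full file. The names GRAMMAR, ids_via_dict, ids_via_list and ids_via_set are illustrative and do not appear in kgp.py.

```python
from xml.dom import minidom

# Illustrative grammar: the "bit" xref appears twice inside "byte".
GRAMMAR = """<?xml version="1.0" ?>
<grammar>
<ref id="bit"><p>0</p><p>1</p></ref>
<ref id="byte"><p><xref id="bit"/><xref id="bit"/></p></ref>
</grammar>"""

grammar = minidom.parseString(GRAMMAR)
xref_nodes = grammar.getElementsByTagName("xref")

# Original approach: dictionary keys collapse repeated ids.
ids_via_dict = {}
for xref in xref_nodes:
    ids_via_dict[xref.attributes["id"].value] = 1

# Plain list comprehension: keeps one entry per <xref> element.
ids_via_list = [xref.attributes["id"].value for xref in xref_nodes]

# Set comprehension: same deduplication as the dictionary, in one expression.
ids_via_set = {xref.attributes["id"].value for xref in xref_nodes}

print(list(ids_via_dict))  # ['bit']
print(ids_via_list)        # ['bit', 'bit']
print(ids_via_set)         # {'bit'}
```

Note that for the later membership test (e not in xrefs) the duplicates do not change which ids are found; they only make the collection being searched larger.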

On your doubt 2, I’m not entirely sure I see the problem. Yes, in some sense this is a waste, but on the other hand the code already has a handler for the xref case, so it makes sense to re-use that handler rather than add an extra special case.

There are several other bits of code in that example that could be modernized. For example,

source and source or self.getDefaultSource()

would now be source or self.getDefaultSource(). And the line

standaloneXrefs = [e for e in self.refs.keys() if e not in xrefs]

would be better expressed as a set difference operation, something like:

standaloneXrefs = set(self.refs) - set(xrefs)
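
A small sketch of the set-difference version, using made-up stand-ins for self.refs and xrefs (the dictionary values here are placeholders, not real parsed nodes). One caveat worth flagging: the result is a set, and random.choice expects an indexable sequence, so the sketch converts it to a sorted list before choosing.

```python
import random

# Hypothetical stand-ins for self.refs (id -> node) and the collected xref ids.
refs = {"bit": "<ref bit>", "byte": "<ref byte>"}
xrefs = ["bit", "bit"]

# Ids of refs that are never cross-referenced elsewhere in the grammar.
standalone_xrefs = set(refs) - set(xrefs)

# random.choice needs an indexable sequence, so convert the set first.
candidates = sorted(standalone_xrefs)
source = '<xref id="%s"/>' % random.choice(candidates)
print(source)  # <xref id="byte"/>
```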

But that’s what happens as languages become more expressive: old code starts to look rather inelegant.
