I wanted to ask what known existing Python 2.x libraries there are for parsing

Asked: May 16, 2026

Which existing Python 2.x libraries can parse an XML document with a built-in DTD without automatically expanding the entities? (The file in question, for those curious: JMdict.)

It seems lxml has an option for not resolving entities, but when I last tried it, the entities just ended up as blanks. I googled this and found pxdom as another alternative, which I may try, but since it’s pure Python it seems far slower than I’d like.

Anything else out there?

1 Answer

Editorial Team · Answer 1 · 2026-05-16T11:59:24+00:00

The use case seems rather unusual: leaving entities unexpanded goes against the way parsers are generally supposed to work according to the XML spec.

So I think the easiest approach is a kludge. I manually extracted the entity declarations with re.finditer and built a dictionary of the mappings. From there, it’s just a matter of scanning the parsed output and doing the right thing for my app. Good enough for my use case, I think.
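For anyone wanting to try the same kludge, here is a rough sketch of what it might look like. It is not the exact code I used: the regex, the `SAMPLE` document, and the helper names `entity_map` and `pos_codes` are made up for illustration, and it assumes the entities are simple quoted internal declarations (as in JMdict) with unique replacement texts. It lets the standard-library parser expand the entities as usual, then maps the expanded text back to the entity name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature document in the style of JMdict (not the real file).
SAMPLE = """<!DOCTYPE JMdict [
<!ENTITY n "noun (common)">
<!ENTITY adj-i "adjective">
]>
<JMdict><entry><pos>&n;</pos></entry><entry><pos>&adj-i;</pos></entry></JMdict>
"""

# Matches internal general entity declarations such as <!ENTITY n "noun">.
# Assumption: the value is a plain quoted literal; parameter entities
# (<!ENTITY % ...>) and external entities are deliberately not matched.
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+(["\'])(.*?)\2\s*>', re.DOTALL)


def entity_map(xml_text):
    """Return a dict of entity name -> declared replacement text."""
    return dict((m.group(1), m.group(3)) for m in ENTITY_DECL.finditer(xml_text))


def pos_codes(xml_text):
    """Parse normally (entities get expanded), then map text back to names."""
    # Reverse lookup: replacement text -> entity name. This is only safe
    # when no two entities share the same replacement text.
    reverse = dict((text, name) for name, text in entity_map(xml_text).items())
    root = ET.fromstring(xml_text)
    # Fall back to the raw text when it does not correspond to any entity.
    return [reverse.get(el.text, el.text) for el in root.iter('pos')]
```

Calling `pos_codes(SAMPLE)` should give back the short entity names rather than the expanded descriptions. For a file the size of JMdict you would probably read only the DTD portion for the regex pass and use an incremental parser for the body, but the idea is the same.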
