I am trying to parse Wikipedia XML Dump using Parse-MediaWikiDump-1.0.4 along with Wikiprep.pl script.

Question

Asked: May 15, 2026, 03:54:52 UTC

I am trying to parse a Wikipedia XML dump using Parse-MediaWikiDump-1.0.4 together with the Wikiprep.pl script. I believe the script works with version 0.3 XML dumps but not with the latest version 0.4 dumps. I get the following error:

Can't locate object method "page" via package "Parse::MediaWikiDump::Pages" at wikiprep.pl line 390.

Also, the Parse-MediaWikiDump-1.0.4 documentation at http://search.cpan.org/~triddle/Parse-MediaWikiDump-1.0.4/lib/Parse/MediaWikiDump/Pages.pm says, under LIMITATIONS, Version 0.4: "This class was updated to support version 0.4 dump files from a MediaWiki instance but it does not currently support any of the new information available in those files."

Any workarounds would help me get to the next level.

Note: one may wonder why we cannot simply use a SAX or StAX parser directly. The Wikipedia dump is a single file of more than 25 GB, so stack and memory issues are obvious. The Perl script above gets around that, but I am currently stuck on this version problem.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T03:54:53+00:00

Any streaming parser should work just fine (a DOM parser would blow up on a file this size). Try XML::Twig; just remember to flush (if you want to print out the XML) or purge (if you don't care about the XML) after every major record.
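A minimal sketch of the XML::Twig approach might look like the following. It assumes each article in the dump is a `<page>` element with a `<title>` child and the wikitext under `<revision>`/`<text>`; the helper name `make_page_twig` and the tiny inline sample are made up for illustration, so adapt them to what your dump actually contains.

```perl
use strict;
use warnings;
use XML::Twig;

# Build a twig that calls $on_page->($title, $text) for every <page>
# and then purges, so memory use does not grow with the file size.
# (make_page_twig is a hypothetical helper, not part of XML::Twig.)
sub make_page_twig {
    my ($on_page) = @_;
    return XML::Twig->new(
        twig_handlers => {
            # Fired once each <page> element has been parsed completely.
            page => sub {
                my ($twig, $page) = @_;
                my $title    = $page->first_child_text('title');
                my $revision = $page->first_child('revision');
                my $text     = $revision ? $revision->first_child_text('text') : '';
                $on_page->($title, $text);
                # Throw away everything parsed so far; we do not re-emit the XML.
                $twig->purge;
            },
        },
    );
}

# Small stand-in for a real dump.
my $sample = <<'XML';
<mediawiki>
  <page><title>Alpha</title><revision><text>hello</text></revision></page>
  <page><title>Beta</title><revision><text>hi</text></revision></page>
</mediawiki>
XML

my $twig = make_page_twig(sub {
    my ($title, $text) = @_;
    printf "%s: %d characters\n", $title, length $text;
});
$twig->parse($sample);
```

For a real dump you would call `$twig->parsefile($path)` instead of `parse`, with `$path` pointing at the XML file. If you wanted to write the XML back out rather than discard it, `$twig->flush` would take the place of `purge`.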

Or just use XML::Parser directly; it is what both XML::Twig and Parse::MediaWikiDump use under the hood to parse the XML.
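Using XML::Parser directly means writing the event handlers yourself. The sketch below only pulls out page titles, assuming they live in `<title>` elements; `make_title_parser` and the sample input are hypothetical names used for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Build a parser that calls $on_title->($title) for every <title> element.
# Nothing is kept between elements, so memory use stays constant.
# (make_title_parser is a hypothetical helper, not part of XML::Parser.)
sub make_title_parser {
    my ($on_title) = @_;
    my $in_title = 0;
    my $buffer   = '';
    return XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $name) = @_;
                if ($name eq 'title') { $in_title = 1; $buffer = ''; }
            },
            # Character data can arrive in several chunks, so accumulate it.
            Char => sub {
                my ($expat, $chars) = @_;
                $buffer .= $chars if $in_title;
            },
            End => sub {
                my ($expat, $name) = @_;
                if ($name eq 'title') { $in_title = 0; $on_title->($buffer); }
            },
        },
    );
}

# Small stand-in for a real dump.
my $dump_sample = <<'XML';
<mediawiki>
  <page><title>Alpha</title><revision><text>hello</text></revision></page>
  <page><title>Beta</title><revision><text>hi</text></revision></page>
</mediawiki>
XML

my $parser = make_title_parser(sub { print "$_[0]\n" });
$parser->parse($dump_sample);
```

For a file on disk, `$parser->parsefile($path)` reads it as a stream. Extracting the article text as well would mean tracking `<text>` in the same way, which is the bookkeeping XML::Twig otherwise does for you.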
