I’m using a RegEx on an XML dump of a Wikipedia article. The Regex

Editorial Team

Asked: May 16, 2026 (2026-05-16T01:13:13+00:00)

I’m using a RegEx on an XML dump of a Wikipedia article.

The regex is: {{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}

I want to detect all the text wrapped in {{ and }}.
But instead of finding the 56 matches that a simple search for {{ gives me, it only finds 45.

A sample block it doesn’t detect is: {{cite journal | last = Heeks | first = Richard | year = 2008 | title = Meet Marty Cooper - the inventor of the mobile phone | journal = BBC | volume = 41 | issue = 6 | url = http://news.bbc.co.uk/2/hi/programmes/click_online/8639590.stm | pages = 26–33 | doi = 10.1109/MC.2008.192 }}

But it does detect: {{cite web | title = Of Cigarettes and Cellphones | last = Ulyseas | first = Mark | date = 2008-01-18 | url = http://www.thebalitimes.com/2008/01/18/of-cigarettes-and-cellphones/ | publisher = The Bali Times | accessdate = 2008-02-24 }}

Can anyone please help me find the problem?

1 Answer

Editorial Team · Answer 1 · 2026-05-16T01:13:13+00:00

Some of the escaping is superfluous, but I don’t think that’s the real problem.
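
To illustrate the point about superfluous escaping: inside a character class most punctuation is already literal, so a shorter class should accept exactly the same characters. The sketch below uses Python's re module rather than .NET (an assumption made purely for illustration; escaping rules inside classes are similar across engines but not identical), and the names ORIGINAL, SIMPLIFIED and same_behaviour are hypothetical.

```python
import re

# Character class exactly as written in the question.
ORIGINAL = re.compile(r"[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]")
# Same class with most backslashes dropped; \s, \-, \[ and \] are kept.
SIMPLIFIED = re.compile(r"[a-zA-Z0-9_()|?\s\-,/=\[\]:.]")

def same_behaviour(chars):
    """Return True if both classes accept or reject every character alike."""
    return all(
        bool(ORIGINAL.fullmatch(c)) == bool(SIMPLIFIED.fullmatch(c))
        for c in chars
    )

# Check every ASCII character plus the en dash (U+2013) from the failing sample.
candidates = [chr(i) for i in range(128)] + ["\u2013"]
print(same_behaviour(candidates))
```

Both classes reject the en dash, which is consistent with the escaping not being the real problem.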

I recommend trying \w instead of a-zA-Z0-9_, especially because in .NET regex \w also matches Unicode letters (unless the ECMAScript-compliant option is set).
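
As a rough analogue of that behaviour, here is a minimal sketch in Python rather than .NET (an assumption for illustration only). Python's \w is Unicode-aware for str patterns by default, and the re.ASCII flag plays a role loosely comparable to an ASCII-only mode. The sample name is hypothetical.

```python
import re

name = "Müller"  # hypothetical sample containing a non-ASCII letter

# Explicit ASCII ranges do not cover "ü".
explicit = re.fullmatch(r"[a-zA-Z0-9_]+", name)
# \w is Unicode-aware by default for str patterns in Python 3.
unicode_w = re.fullmatch(r"\w+", name)
# With re.ASCII, \w is restricted to [a-zA-Z0-9_] again.
ascii_w = re.fullmatch(r"\w+", name, flags=re.ASCII)

print(explicit is None, unicode_w is not None, ascii_w is None)
```

Note that \w covers only letters, digits and the underscore; punctuation such as a dash still has to be listed in the class separately.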

Alternatively, if the text between the braces cannot contain } (which your current pattern doesn’t allow anyway), you can simply use {{[^}]+}}.

[^...] is a negated character class: [^}] matches any character except }.
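
A minimal sketch comparing the two approaches, written in Python rather than .NET (an assumption for illustration). The samples are shortened versions of the two blocks from the question. In the undetected block, the page range 26–33 appears to use an en dash (U+2013) rather than a hyphen, and that character is not in the whitelist, which would explain the missing match. The names WHITELIST and NEGATED are hypothetical.

```python
import re

# Whitelist pattern from the question (braces escaped here for safety).
WHITELIST = re.compile(r"\{\{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+\}\}")
# Negated character class: anything except a closing brace.
NEGATED = re.compile(r"\{\{[^}]+\}\}")

# Shortened samples; the first contains an en dash in "26–33".
journal = "{{cite journal | last = Heeks | pages = 26–33 | doi = 10.1109/MC.2008.192 }}"
web = "{{cite web | last = Ulyseas | date = 2008-01-18 | accessdate = 2008-02-24 }}"
text = journal + " some text " + web

whitelist_hits = WHITELIST.findall(text)  # misses the journal block
negated_hits = NEGATED.findall(text)      # finds both blocks

print(len(whitelist_hits), len(negated_hits))
```

One caveat with this sketch: because [^}]+ stops at the first }, templates nested inside other templates would not be captured correctly and would need a different approach.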

References

regular-expressions.info/Character Class
