I’m using a regex on an XML dump of a Wikipedia article.
The regex is: {{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+}}
I want to detect all the text wrapped with {{ and }}.
But instead of the 56 matches I get from a simple search for {{, it only detects 45.
A sample block it doesn’t detect is: {{cite journal | last = Heeks | first = Richard | year = 2008 | title = Meet Marty Cooper - the inventor of the mobile phone | journal = BBC | volume = 41 | issue = 6 | url = http://news.bbc.co.uk/2/hi/programmes/click_online/8639590.stm | pages = 26–33 | doi = 10.1109/MC.2008.192 }}
But it does detect: {{cite web | title = Of Cigarettes and Cellphones | last = Ulyseas | first = Mark | date = 2008-01-18 | url = http://www.thebalitimes.com/2008/01/18/of-cigarettes-and-cellphones/ | publisher = The Bali Times | accessdate = 2008-02-24 }}
Can anyone please help me find the problem?
Some of the escaping is superfluous, but I don’t think that’s the real problem.
I recommend trying \w instead of a-zA-Z0-9_, especially because in .NET regex \w also recognizes Unicode letters (unless it’s in ECMAScript-compliant mode).
Another alternative: if the text part cannot contain } (which right now it can’t anyway), you can simply use {{[^}]+}}.
The [^...] construct is a negated character class; [^}] matches anything but }.
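
As a rough illustration of the difference, here is a minimal sketch in Python (the question is about .NET, so Python's re module is only a stand-in, and the two sample strings are shortened versions of the templates quoted in the question). One thing it makes visible: the undetected sample contains an en dash in pages = 26–33, and the whitelist class only allows the ASCII hyphen, so that template is likely skipped for that reason, while the negated class accepts it.

```python
import re

# Pattern from the question; the braces are escaped here only for clarity.
ORIGINAL = re.compile(r"\{\{[a-zA-Z0-9_\(\)\|\?\s\-\,\/\=\[\]\:.]+\}\}")
# Negated-character-class alternative suggested in the answer.
NEGATED = re.compile(r"\{\{[^}]+\}\}")

# Shortened versions of the two templates quoted in the question.
# The first one keeps the en dash (U+2013) from "pages = 26–33".
text = (
    "{{cite journal | last = Heeks | pages = 26–33 | doi = 10.1109/MC.2008.192 }} "
    "{{cite web | last = Ulyseas | date = 2008-01-18 | accessdate = 2008-02-24 }}"
)

original_hits = ORIGINAL.findall(text)
negated_hits = NEGATED.findall(text)

print(len(original_hits))  # 1: the template containing the en dash is skipped
print(len(negated_hits))   # 2: anything except } is accepted between the braces
```

The whitelist approach has to anticipate every character that can appear inside a template, whereas the negated class only has to name the one character that ends it.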