Question

Editorial Team

Asked: May 13, 2026

I’m having a hard time understanding this regex stuff…

I have a string like this:

<wn20schema:NounSynset rdf:about="&dn;synset-56242" rdfs:label="{saddelmageri_1}">

I want to use findall() and groups to get this:

['56242','saddelmageri']

I can match the number with something like “synset-[0-9]” and the word with something like “{(.*?)}” but how do I write it to get the above result?

And here’s a follow-up question – some lines look like this:

<wn20schema:NounSynset rdf:about="&dn;synset-2589" rdfs:label="{cykel_3: trehjulet cykel; tricykel,1_1}">

In this case I want to extract the stuff between the {} with this result:

['2589', ['cykel', 'trehjulet cykel', 'tricykel']]

so that I can drop it into a dictionary later as a key (2589) : value (['cykel', 'trehjulet cykel', 'tricykel']) pair.

Any thoughts?

1 Answer

Editorial Team · Answer 1 · 2026-05-13T09:08:29+00:00

Since this appears to be XML data, you would be better off using an XML parser; parsing XML with regular expressions is very difficult to get right.

However, since you specifically asked for a regular expression…

Your specifications are a bit imprecise, and with regular expressions you need to be very precise in what constitutes a match. For example, will the rdfs:label value always have a _1 that you want to strip off? Will there always only be one of these blocks of data per line, or multiple per line? Also, is the order of the result important?

Here’s a quick hack that might give you close to what you want:

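import re

# Sample line from the question
data = r'<wn20schema:NounSynset rdf:about="&dn;synset-56242" rdfs:label="{saddelmageri_1}">'

# Group 1: the digits after "synset-"; group 2: the label text before the trailing "_1"
matches = re.findall(r'synset-([0-9]+).*label="{(.*)_1}"', data)
print("matches:", matches)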

When I run the above, I get the following output: a list containing a single two-tuple with the two strings you wanted (a tuple rather than the flat list in your example):

matches: [('56242', 'saddelmageri')]
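For the follow-up case with several words inside the braces, one possible approach is to capture the whole brace contents first and split them in a second step. This is only a sketch: `SYNSET_RE` and `parse_line` are made-up names, and it assumes that the words are separated by `:` or `;` and that suffixes such as `_1`, `_3` or `,1_1` are sense markers to be stripped, which is a guess based on the two sample lines.

```python
import re

# Group 1: the synset number; group 2: everything between the first pair of braces after it.
SYNSET_RE = re.compile(r'synset-([0-9]+).*?\{(.*?)\}')

def parse_line(line):
    """Return (synset_id, [words]) for one line, or None if the line does not match."""
    m = SYNSET_RE.search(line)
    if m is None:
        return None
    synset_id, label = m.groups()
    words = []
    # Assumption: entries inside the braces are separated by ":" or ";".
    for part in re.split(r'[:;]', label):
        # Assumption: a trailing "_<n>" or ",<n>_<n>" is a sense marker, not part of the word.
        word = re.sub(r'(,\d+)?_\d+$', '', part.strip())
        if word:
            words.append(word)
    return synset_id, words

lines = [
    '<wn20schema:NounSynset rdf:about="&dn;synset-56242" rdfs:label="{saddelmageri_1}">',
    '<wn20schema:NounSynset rdf:about="&dn;synset-2589" rdfs:label="{cykel_3: trehjulet cykel; tricykel,1_1}">',
]

# Build the key : value dictionary described in the question, skipping lines that do not match.
synsets = {}
for line in lines:
    parsed = parse_line(line)
    if parsed is not None:
        synset_id, words = parsed
        synsets[synset_id] = words

print(synsets)
```

Compiling the pattern once with `re.compile` is convenient if you run it over many lines. If your real data uses other separators or suffix formats, the two assumption-marked expressions are the places to adjust.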
