I'm trying to understand regular expressions in Python. How can I split the following sentence?

Asked: June 5, 2026

I'm trying to understand regular expressions in Python. How can I split the following sentence with a regular expression?

"familyname, Givenname A.15.10"

This is like the phonebook example in the Python `re` documentation (http://docs.python.org/library/re.html). A person may have two or more family names and two or more given names. The family names are followed by ', ' and the given names by a space. The last part is the person's office. What I have done so far is:

    import re

    with open('file.txt', 'r') as f:
        data = f.readlines()

    for line in data[:90]:
        # split on a comma or a dot, at most two times
        person = re.split(r'[,.]', line, maxsplit=2)
        print(person)

It gives me a result like this:

 ['Wegner', ' Sven Ake G', '15.10\n']

I want to have something like:

 ['Wegner', ' Sven Ake', 'G', '15', '10']

Any idea?

1 Answer

Editorial Team · Answer 1 · 2026-06-05T19:46:23+00:00

In the regex world it’s often easier to “match” rather than “split”. When you match, you tell the RE engine directly what kinds of substrings you’re looking for, instead of concentrating on the characters that separate them. The requirements in your question are a bit unclear, but let’s assume that:

“surname” is everything before the first comma
“name” is everything between the surname and the “office”
“office” consists of the non-space characters at the end of the string

This translates to regex language like this:

rr = r"""
    ^         # begin
    ([^,]+)   # match everything but a comma
    (.+?)     # match everything, until next match occurs
    (\S+)     # non-space characters
    $         # end
"""

Testing:

import re
rr = re.compile(rr, re.VERBOSE)
print(rr.findall("de Batz de Castelmore d'Artagnan, Charles Ogier W.12.345"))
# [("de Batz de Castelmore d'Artagnan", ', Charles Ogier ', 'W.12.345')]
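If you want the individual pieces from the question, one option is to keep this generic pattern and tidy the groups with plain string methods: strip the leading ', ' and trailing space from the middle group, then break the office on its dots. This is a minimal sketch, not part of the original answer; `split_entry` is a hypothetical helper name, and it assumes the office always uses dots as separators.

```python
import re

# The first pattern from the answer: surname, rest of the name, office token
pattern = re.compile(r"""
    ^([^,]+)    # surname: everything before the first comma
    (.+?)       # rest of the name (still carries the leading ', ')
    (\S+)$      # office: trailing non-space characters
""", re.VERBOSE)

def split_entry(line):
    """Hypothetical helper: return [surname, given, *office parts] or None."""
    m = pattern.match(line.strip())  # strip() drops the trailing newline
    if m is None:
        return None
    surname, rest, office = m.groups()
    given = rest.strip(", ")         # remove ', ' prefix and trailing space
    return [surname, given] + office.split(".")

print(split_entry("de Batz de Castelmore d'Artagnan, Charles Ogier W.12.345"))
# ["de Batz de Castelmore d'Artagnan", 'Charles Ogier', 'W', '12', '345']
```

The regex only has to find the three big chunks; the finer split of the office is left to `str.split`, which keeps the pattern short.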

Update:

rr = r"""
    ^         # begin
    ([^,]+)   # match everything but a comma
    [,\s]+    # a comma and spaces
    (.+?)     # match everything until the next match
    \s*       # spaces
    ([A-Z])   # an uppercase letter
    \.        # a dot
    (\d+)     # some digits
    \.        # a dot
    (\d+)     # some digits
    \s*       # maybe some spaces or newlines
    $         # end
"""

import re
rr = re.compile(rr, re.VERBOSE)
s = 'Wegner, Sven Ake G.15.10\n'
print(rr.findall(s))
# [('Wegner', 'Sven Ake', 'G', '15', '10')]
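To apply the updated pattern to the whole file from the question, you can compile it once and match it against each line, skipping lines that don't fit. This is a minimal sketch; `parse_phonebook` and `ENTRY` are hypothetical names, and `io.StringIO` only stands in for the real `file.txt` so the example runs on its own.

```python
import io
import re

# Updated pattern from the answer, compiled once
ENTRY = re.compile(r"""
    ^([^,]+)        # surname(s): everything but a comma
    [,\s]+          # a comma and/or spaces
    (.+?)           # given name(s), as short as possible
    \s*([A-Z])      # office letter
    \.(\d+)\.(\d+)  # two dot-separated numbers
    \s*$            # optional trailing spaces or newline
""", re.VERBOSE)

def parse_phonebook(lines):
    """Return one 5-tuple per matching line; non-matching lines are skipped."""
    entries = []
    for line in lines:
        m = ENTRY.match(line)
        if m:
            entries.append(m.groups())
    return entries

sample = io.StringIO(
    "Wegner, Sven Ake G.15.10\n"
    "de Batz de Castelmore d'Artagnan, Charles Ogier W.12.345\n"
    "not an entry\n"
)
print(parse_phonebook(sample))
# [('Wegner', 'Sven Ake', 'G', '15', '10'), ("de Batz de Castelmore d'Artagnan", 'Charles Ogier', 'W', '12', '345')]
```

With the real file you would pass the open file object instead, e.g. `with open('file.txt') as f: entries = parse_phonebook(f)`, which also avoids hard-coding the number of lines.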