I need help with a regex problem involving Chinese characters in Python.
“拉柏多公园” is the correct form of the word, but in a text I found “拉柏 多公 园”. What regex should I use to remove the stray spaces?
import re
name = "拉柏多公园"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"
line2 = "whatever whatever it is then there comes another拉柏 多公 园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing"
line4 = "whatever whatever it is then there comes a拉柏 多公 园sort of thing"
firstchar = "拉"
lastchar = "园"
I need to replace the strings in the lines so that the output lines look like this:
line = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
line2 = "whatever whatever it is then there comes another 拉柏多公园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏多公园 sort of thing"
line4 = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
I tried this, but the regex is badly structured:
reline = line.replace (r"firstchar*lastchar", name) #
reline2 = reline.replace (" ", " ")
print reline2
Can someone help me correct my regex?
Thanks
(I assume you’re using Python 3, since you’re using Unicode characters in regular strings. For Python 2, add u before each string literal.)
Python 3
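So you can replace each occurrence with a call to the re library's substitution function (re.sub), using ' ' + name + ' ' as the replacement.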
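A minimal sketch of one way to do this in Python 3, assuming any amount of whitespace may appear between two characters of the name. The pattern is built by joining the characters of name with \s*, and whitespace directly around the match is swallowed so that exactly one space is put back on each side. The helper name fix_line is made up for illustration.

```python
import re

name = "拉柏多公园"

# Optional whitespace between consecutive characters of the name,
# plus any whitespace directly before and after the whole match.
# (Assumption: the broken form differs from the name only by whitespace.)
pattern = re.compile(r"\s*" + r"\s*".join(map(re.escape, name)) + r"\s*")

def fix_line(line):
    # Hypothetical helper: put the clean name back, surrounded by single spaces.
    return pattern.sub(" " + name + " ", line)

line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"
line2 = "whatever whatever it is then there comes another拉柏 多公 园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing"
line4 = "whatever whatever it is then there comes a拉柏 多公 园sort of thing"

for text in (line, line2, line3, line4):
    print(fix_line(text))
```

Because the surrounding whitespace is consumed and then replaced, all four sample lines come out with a single space on each side of the name.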
Python 2
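So you can replace each occurrence the same way, with every string literal written as a unicode literal (u' ' + name + u' ' as the replacement).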
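A sketch of the same idea written so that it should also run under Python 2: every literal carries the u prefix, the source encoding is declared, and re.UNICODE is passed so that \s also covers non-ASCII whitespace. As before, fix_line is a made-up helper name, and the pattern assumes the broken form differs from the name only by whitespace.

```python
# -*- coding: utf-8 -*-
import re

name = u"拉柏多公园"

# Same pattern as in Python 3, but built from unicode literals.
# re.UNICODE makes \s match Unicode whitespace under Python 2.
pattern = re.compile(u"\\s*" + u"\\s*".join(name) + u"\\s*", re.UNICODE)

def fix_line(line):
    # Hypothetical helper: expects a unicode string, returns a unicode string.
    return pattern.sub(u" " + name + u" ", line)

line4 = u"whatever whatever it is then there comes a拉柏 多公 园sort of thing"
print(fix_line(line4))
```

Under Python 2, lines read from a file are byte strings, so they would need to be decoded (for example with line.decode('utf-8')) before being passed in.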
Discussion
The result will be surrounded by spaces. More generally, if you want it to work at the start or end of the line, or before commas or periods, you’ll have to replace ' ' + name + ' ' with something more sophisticated.
Edit: fixed. Of course, you have to use the re library function.
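One possible "more sophisticated" replacement, as a sketch rather than a definitive answer: match only the name itself (with optional inner whitespace), leave any existing surrounding whitespace alone, and use a replacement function that adds a space only where the name directly touches an ASCII letter or digit. The names core, _is_ascii_word, _repl and fix_line are invented for this example, and the "ASCII letter or digit" rule is an assumption about what should count as a neighbouring word.

```python
import re

name = "拉柏多公园"

# Only the name itself, with optional whitespace between its characters.
core = re.compile(r"\s*".join(map(re.escape, name)))

def _is_ascii_word(ch):
    # True for a single ASCII letter or digit; False for "" or anything else.
    return bool(re.match(r"[A-Za-z0-9]", ch))

def _repl(m):
    s = m.string
    before = s[m.start() - 1] if m.start() > 0 else ""
    after = s[m.end()] if m.end() < len(s) else ""
    # Add a space only where the name is glued to an ASCII word character.
    left = " " if _is_ascii_word(before) else ""
    right = " " if _is_ascii_word(after) else ""
    return left + name + right

def fix_line(line):
    return core.sub(_repl, line)

print(fix_line("拉柏 多公 园, then there comes a拉柏 多公 园sort of thing."))
print(fix_line("it ends with 拉柏 多公 园."))
```

With this variant no space is added at the start or end of a line or next to punctuation, while a name glued to a neighbouring Latin word is still separated from it.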