Hi, take a look at the following code snippet on Python 2.7:
# -*- coding: utf-8 -*-
content = u"<p>和製英語とかカタカナ英語、<br/>ジャパングリッシュなどと呼ばれる英語っぽいけど実は英語じゃない言葉があります。</p>"
#print content
print content.replace(u"<p>",u"<div>").replace(u"</p>",u"</div>").replace(u"<br/>",u"")
print content.replace("<p>","<div>").replace("</p>","</div>").replace("<br/>","")
print content.replace(r"<p>",r"<div>").replace(r"</p>",r"</div>").replace(r"<br/>",r"")
The result is the same:
<div>和製英語とかカタカナ英語、ジャパングリッシュなどと呼ばれる英語っぽいけど実は英語じゃない言葉があります。</div>
My question is: is there any difference between the three “replace” statements (u, r, or no prefix)? Which one is best?
The first one is best. The other two have to implicitly convert their byte strings to Unicode before the replacement can be done on the Unicode content string. Apart from that, with the strings provided, the result happens to be the same. If the replacement strings contained non-ASCII characters, the other two would raise a UnicodeDecodeError, because the default codec for the conversion is ascii on Python 2.X. Note the speed difference as well:
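One way to measure this yourself is sketched below with the standard timeit module. The function names (with_u, with_plain, with_raw, ascii_decode_fails) and the repeat count are my own choices for illustration, not part of the question. The r prefix only changes how backslashes in the literal are read, so r"<p>" is the same byte string as "<p>" here. The helper at the end spells out the ASCII decode that Python 2 performs implicitly on a byte-string argument. The script also runs on Python 3, but there all three literals are already Unicode strings, so no timing gap should be expected.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import timeit

content = u"<p>和製英語とかカタカナ英語、<br/>ジャパングリッシュなどと呼ばれる英語っぽいけど実は英語じゃない言葉があります。</p>"


def with_u():
    # Unicode literals: no conversion is needed on Python 2.
    return content.replace(u"<p>", u"<div>").replace(u"</p>", u"</div>").replace(u"<br/>", u"")


def with_plain():
    # Plain literals: byte strings on Python 2, decoded implicitly on every call.
    return content.replace("<p>", "<div>").replace("</p>", "</div>").replace("<br/>", "")


def with_raw():
    # Raw literals: still byte strings on Python 2; r only affects backslashes.
    return content.replace(r"<p>", r"<div>").replace(r"</p>", r"</div>").replace(r"<br/>", r"")


def ascii_decode_fails(raw_bytes):
    # Explicit version of the implicit conversion: decode the bytes as ASCII.
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical repeat count; raise it for more stable numbers.
REPEATS = 20000
timings = dict(
    (func.__name__, timeit.timeit(func, number=REPEATS))
    for func in (with_u, with_plain, with_raw)
)

for name in sorted(timings):
    print("%-10s %.4f s for %d calls" % (name, timings[name], REPEATS))

# Non-ASCII bytes cannot be decoded with the ascii codec; pure ASCII bytes can.
print("non-ASCII bytes fail:", ascii_decode_fails(u"語".encode("utf-8")))
print("ASCII bytes fail:", ascii_decode_fails(b"<p>"))
```

The absolute numbers depend on the machine and interpreter build, so compare the three timings against each other rather than against anyone else's figures.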