I am getting different output in Python 2 and Python 3 when I run the same regular expression code.
Suppose this is the data I want, which is located somewhere in the webpage:
source = ['\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e',
'\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e',
'\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e',
'\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e']
When I run the code below in Python 2.6, it works perfectly. I get exactly the output shown above.
match = re.findall("\x1e\x1e\S+",source)
But when I execute it in Python 3.3 like this:
match = re.findall("\x1e\x1e\S+", str(source))
I get this output in the match variable:
['\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log','\x1e\x1e5.5.30-log']
Could you please tell me why it’s not taking the whole string in Python 3? Why is it skipping \x1epcofiowa@localhost\x1epcofiowa_pci\x1e each time? I want the same output as in Python 2.6.
So, I am clueless at this moment. I’m waiting for your reply. Thanks.
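It seems that \S behaves differently in Python 2 and Python 3. According to the Python 3 re module docs, for Unicode (str) patterns \S matches any character that is not a Unicode whitespace character, unless the ASCII flag is used.
Now, \x1e (equivalent to U+001E), which comes right after your \x1e\x1e5.5.30-log, is treated as a Unicode whitespace character (see the ActiveState reference), so it is not matched by \S in Python 3.
Whereas in Python 2, when the UNICODE flag is not specified, \S is equivalent to the set [^ \t\n\r\f\v].
So it only considers the ASCII character set when matching non-whitespace, and hence it matches \x1e.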
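If you want the Python 2 behaviour in Python 3, one option is to pass the re.ASCII flag so that \S goes back to ASCII-only whitespace rules. Here is a minimal sketch, assuming the data is a single str (I use one record from the question as a stand-in; the variable names are just for illustration):

```python
import re

# One record from the question; \x1e is the "record separator" control character.
source = '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e'

# Python 3 treats \x1e as whitespace for str objects.
is_space = '\x1e'.isspace()

# Default (Unicode) matching: \S+ stops at the first \x1e after "5.5.30-log".
unicode_match = re.findall(r'\x1e\x1e\S+', source)

# re.ASCII limits whitespace to [ \t\n\r\f\v], so \S+ also consumes \x1e.
ascii_match = re.findall(r'\x1e\x1e\S+', source, re.ASCII)

print(is_space)
print(unicode_match)
print(ascii_match)
```

Note that re.ASCII exists only in Python 3, so this does not run unchanged on Python 2.6. An alternative would be to write the character class out explicitly, e.g. [^ \t\n\r\f\v]+ instead of \S+.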