How can I split the following string at every ‘\x000’, ‘\x001’, ‘\x002’?
I tried a regular expression like the following, but it didn’t work:
import re
z = re.compile(r'[\x000\x001\x002\x003\x004\x005]:')
line = '114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"\x000:1342434156.712809 get_cache http://www.nownews.com/\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww\x000:1342434156.731564 new version\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]'
z.split(line)
EDIT1
The string contains \x000, \x001, \x002, and so on. I want to split the string at these sequences.
The expected output should be:
['114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"', '\x000:1342434156.712809 get_cache http://www.nownews.com/', '\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww', '\x000:1342434156.731564 new version', '\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]']
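\x000 is a two-byte string, consisting of \x00 (hex 0x00) and 0 (hex 0x30). Therefore, you can’t use it in a character class like this: a character class matches any single one of its characters, so your pattern also splits at an ordinary digit followed by a colon, as in 2012:03:22:37. But a pattern that matches the two characters in sequence, such as (\x00[0-5]:), works.
By enclosing the regex in parentheses, the delimiters also become part of the resulting list, although as separate items rather than joined to the part of the string that follows them, as they are in your edited question.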
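To make the difference concrete, here is a minimal sketch in Python 3. The shortened log line and the variable names are made up for illustration, and the [0-5] range is an assumption taken from the digits listed in the question's pattern.

```python
import re

# Shortened, hypothetical sample in the same shape as the log line in the question.
line = ('114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query HTTP/1.1" 200'
        '\x000:1342434156.712809 get_cache'
        '\x001:1342434156.732352 display')

# Pattern from the question: ONE character out of NUL, '0'..'5', followed by ':'.
char_class = re.compile(r'[\x000\x001\x002\x003\x004\x005]:')

# Sequence pattern: NUL, then one digit 0-5, then ':'.
# The capturing group makes split() return the delimiters as separate items.
delimiter = re.compile(r'(\x00[0-5]:)')

wrong = char_class.split(line)  # also cuts inside the timestamp 2012:03:22:37
right = delimiter.split(line)   # cuts only at \x000: and \x001:

print(wrong)
print(right)
```

With the character class, the timestamp is cut apart and the NUL bytes stay attached to the end of the preceding pieces. With the sequence pattern, the list alternates between text and delimiter.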
If you do want to keep the delimiters as part of the resulting strings, you can’t use .split() with this pattern. Instead, use .findall():
Explanation:
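A minimal sketch of the .findall() approach, with the pattern annotated piece by piece. This is one possible pattern written for Python 3; the names DELIM and CHUNK, the shortened sample line, and the [0-5] digit range are assumptions made for illustration.

```python
import re

# One delimiter: a NUL byte, one digit 0-5, then a colon (range assumed).
DELIM = r"\x00[0-5]:"

CHUNK = re.compile(
    rf"""
    {DELIM}              # a delimiter at the start of the chunk
    (?:(?!{DELIM}).)*    # then every character up to the next delimiter
    |                    # or
    (?:(?!{DELIM}).)+    # the leading text that has no delimiter before it
    """,
    re.VERBOSE | re.DOTALL,
)

# Shortened, hypothetical sample in the same shape as the log line in the question.
line = ('114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query HTTP/1.1" 200'
        '\x000:1342434156.712809 get_cache'
        '\x000:1342434156.717942 Cache Hits'
        '\x001:1342434156.732352 display')

parts = CHUNK.findall(line)
for part in parts:
    print(repr(part))
```

Because the pattern contains only non-capturing groups, findall() returns whole matches, so each item after the first starts with its own delimiter. The negative lookahead (?!...) stops a chunk just before the next delimiter, and joining the items reproduces the original string.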