Having a UTF-8 encoded string like bar = "hello ｡◕‿‿◕｡" and a byte offset, how can I split it at that offset?

Asked: May 25, 2026

Having a UTF-8 encoded string like this:

bar = "hello ｡◕‿‿◕｡"

and a byte offset that tells me at which byte I have to split the string:

bytes_offset = 9

how can I split bar into two parts, resulting in:

>>> first_part
'hello ｡'        # 9 bytes: 'hello \xef\xbd\xa1'
>>> second_part
'◕‿‿◕｡'
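To see where the 9 comes from, here is a small Python 3 sketch (an illustration, not part of the original question) that walks the string and records the UTF-8 byte offset at which each character starts. The dictionary name char_start is hypothetical. ASCII characters take 1 byte, while ｡, ◕ and ‿ each take 3 bytes, so the first ◕ begins at byte 9, which is character index 7.

```python
bar = "hello ｡◕‿‿◕｡"

char_start = {}  # UTF-8 byte offset -> index of the character starting there
offset = 0
for index, ch in enumerate(bar):
    size = len(ch.encode("utf-8"))  # 1 byte for ASCII, 3 bytes for ｡ ◕ ‿
    char_start[offset] = index
    print(index, ascii(ch), "starts at byte", offset, "size", size)
    offset += size

print(offset)         # 24 (total encoded length)
print(char_start[9])  # 7
```

Offsets that are missing from char_start (for example 10 or 11) fall inside a multibyte character rather than on a character boundary.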

In a nutshell: given a byte offset, how can I convert it into the corresponding character index of a UTF-8 encoded string?
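One way to answer the "in a nutshell" question, sketched for Python 3 (where str is a sequence of Unicode characters, unlike the Python 2 byte strings discussed below): encode the string, keep only the bytes before the offset, decode them, and count the characters. The function name byte_offset_to_char_index is hypothetical. Under this approach an offset that lands inside a multibyte character makes decode() raise UnicodeDecodeError instead of silently returning a wrong index.

```python
def byte_offset_to_char_index(text: str, byte_offset: int) -> int:
    """Map a byte offset into text's UTF-8 encoding to a character index.

    Raises UnicodeDecodeError if byte_offset falls inside a multibyte character.
    """
    prefix = text.encode("utf-8")[:byte_offset]
    # Decoding the prefix yields exactly the characters that lie before the
    # offset; a truncated multibyte sequence at its end makes decode() raise.
    return len(prefix.decode("utf-8"))


bar = "hello ｡◕‿‿◕｡"
index = byte_offset_to_char_index(bar, 9)
first_part, second_part = bar[:index], bar[index:]
print(index)                       # 7
print(first_part.encode("utf-8"))  # b'hello \xef\xbd\xa1'
print(len(second_part))            # 5
```

The resulting character index can then be used for ordinary str slicing.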

1 Answer

Editorial Team · Answer 1 · 2026-05-25T16:11:04+00:00

In Python 2.x, a str holding UTF-8 text is basically a byte string, so slicing it with a byte offset works directly:

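# -*- coding: utf-8 -*-

bar = "hello ｡◕‿‿◕｡"
# In Python 2 a plain str literal is a byte string, so indexes count bytes
assert isinstance(bar, str)

bytes_offset = 9
first_part = bar[:bytes_offset]   # 'hello \xef\xbd\xa1'
second_part = bar[bytes_offset:]
print first_part
print second_part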

Yields:

hello ｡
◕‿‿◕｡

I ran this with Python 2.6 on OS X, but I expect the same from 2.7. If I split at 10 or 11 instead of 9, the output contains ? characters, which means the split landed in the middle of a multibyte character's byte sequence; splitting at 12 moves the first “eyeball” (◕) into the first part of the string.

I have PYTHONIOENCODING set to utf8 in the terminal.
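For Python 3, where str slicing counts characters rather than bytes, a rough equivalent is to split the encoded bytes and decode each half. The sketch below (function names snap_to_char_boundary and split_at_byte_offset are hypothetical) also guards against the 10/11 case described above: UTF-8 continuation bytes always have the bit pattern 0b10xxxxxx, so it moves an offset back to the start of the character it falls in. Snapping backward is a design choice assumed here; raising an error instead would be equally reasonable.

```python
from typing import Tuple


def snap_to_char_boundary(data: bytes, byte_offset: int) -> int:
    """Move byte_offset back to the start of the UTF-8 character containing it.

    An offset is a character boundary when it is at either end of the data or
    the byte at that offset is not a continuation byte (0b10xxxxxx).
    """
    byte_offset = max(0, min(byte_offset, len(data)))
    while 0 < byte_offset < len(data) and data[byte_offset] & 0xC0 == 0x80:
        byte_offset -= 1
    return byte_offset


def split_at_byte_offset(text: str, byte_offset: int) -> Tuple[str, str]:
    """Split text at a UTF-8 byte offset, never in the middle of a character."""
    data = text.encode("utf-8")
    cut = snap_to_char_boundary(data, byte_offset)
    return data[:cut].decode("utf-8"), data[cut:].decode("utf-8")


bar = "hello ｡◕‿‿◕｡"
print(len(bar), len(bar.encode("utf-8")))  # 12 24
for offset in (9, 10, 11, 12):
    first, second = split_at_byte_offset(bar, offset)
    print(offset, len(first), len(second))
# 9 7 5
# 10 7 5
# 11 7 5
# 12 8 4
```

Offsets 10 and 11 both snap back to 9, so the first ◕ stays intact in the second part, matching what the Python 2 experiment suggests about where the character boundaries are.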