I’m using SimpleDB for my application. Everything goes well unless the limitation of one

Question

Asked: June 2, 2026 (2026-06-02T18:19:59+00:00)

I’m using SimpleDB for my application. Everything works well except for the limitation that a single attribute can hold at most 1024 bytes. So for a long string, I have to chop the string into chunks and save them separately.

My problem is that my strings sometimes contain Unicode characters (Chinese, Japanese, Greek), and the substr() function counts characters, not bytes.

I tried "use bytes" for byte semantics, and later
substr(encode_utf8($str), $start, $length), but neither helped.

Any help would be appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-06-02T18:20:01+00:00

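UTF-8 was engineered so that character boundaries are easy to detect: every continuation byte falls in the range 0x80 to 0xBF, and no other byte does. To split the string into chunks of valid UTF-8, encode it, then match runs of up to 1024 bytes that are not followed by a continuation byte. If a run would end in the middle of a character, the regex backtracks to the previous character boundary:

use Encode qw( encode_utf8 decode_utf8 );

my $utf8 = encode_utf8($text);
my @utf8_chunks = $utf8 =~ /\G(.{1,1024})(?![\x80-\xBF])/sg;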

Then either

# The saving code expects bytes.
store($_) for @utf8_chunks;

or

# The saving code expects decoded text.
store(decode_utf8($_)) for @utf8_chunks;
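
Reading the data back is the reverse operation. The following is a minimal sketch, assuming the chunks were stored as bytes and are fetched back in their original order; join_utf8_chunks is a hypothetical helper name, not part of Encode or of any SimpleDB client.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 decode_utf8 );

# Hypothetical helper: rebuild the original text from chunks stored as bytes.
# Assumes the chunks are passed in the same order they were produced.
sub join_utf8_chunks {
    my @utf8_chunks = @_;
    return decode_utf8(join('', @utf8_chunks));
}

# Sample text mixing 1-, 2- and 3-byte characters (13 bytes per repetition).
my $text = "\N{U+4E2D}\N{U+6587} abc \N{U+3B1}" x 200;

my $utf8 = encode_utf8($text);
my @utf8_chunks = $utf8 =~ /\G(.{1,1024})(?![\x80-\xBF])/sg;

print join_utf8_chunks(@utf8_chunks) eq $text ? "round trip ok\n" : "mismatch\n";
```

Because every chunk ends on a character boundary, decoding each chunk separately and concatenating the results should give the same text as decoding the joined bytes.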

Demonstration:

$ perl -e'
    use Encode qw( encode_utf8 );

    # This character encodes to three bytes using UTF-8.
    my $text = "\N{U+2660}" x 342;

    my $utf8 = encode_utf8($text);
    my @utf8_chunks = $utf8 =~ /\G(.{1,1024})(?![\x80-\xBF])/sg;

    CORE::say(length($_)) for @utf8_chunks;
'
1023
3
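
The backtracking is easier to see with a very small limit. The sketch below wraps the same regex in a subroutine with a configurable limit; chunk_utf8 and its $max_bytes parameter are hypothetical names introduced for illustration. It assumes the limit is at least 4, since a UTF-8 character can occupy up to 4 bytes and a smaller limit could make the match stop early.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

# Hypothetical helper (not from the original answer): split $text into
# UTF-8 byte strings of at most $max_bytes bytes, never cutting a character.
sub chunk_utf8 {
    my ($text, $max_bytes) = @_;

    # A UTF-8 character is at most 4 bytes; a smaller limit could stall the match.
    die "max_bytes must be at least 4\n" if $max_bytes < 4;

    my $utf8 = encode_utf8($text);
    my @chunks = $utf8 =~ /\G(.{1,$max_bytes})(?![\x80-\xBF])/sg;
    return @chunks;
}

# "a" is 1 byte, U+03B1 is 2 bytes, U+2660 is 3 bytes, "b" is 1 byte.
my @chunks = chunk_utf8("a\N{U+3B1}\N{U+2660}b", 4);
print join(',', map { length($_) } @chunks), "\n";   # prints 3,4
```

With a limit of 4, the first attempt would take "a", U+03B1 and the first byte of U+2660; the lookahead sees a continuation byte next, so the match backs off to 3 bytes and the 3-byte character moves to the second chunk along with "b".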
