I’m implementing a network client that sends messages to a server. The messages are

Question

Asked: May 26, 2026

I’m implementing a network client that sends messages to a server. The messages are streams of bytes, and the protocol requires that I send the length of each stream beforehand.

If the message that I am given (by the code using my module) is a byte string, then the length is given easily enough by length $string. But if it’s a string of characters, I’ll need to massage it to get the raw bytes. What I’m doing now is basically this:

my $msg = shift;   # some message from calling code
my $bytes;
if ( utf8::is_utf8( $msg ) ) { 
    $bytes = Encode::encode( 'utf-8', $msg );
} else { 
    $bytes = $msg;
}

my $length = length $bytes;

Is this the correct way to handle this? It seems to work so far, but I haven’t done any serious testing yet. What potential pitfalls are there with this approach?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:22:17+00:00

You shouldn’t really be guessing at what your input is. Define your code to accept either byte strings or Unicode character strings, and leave it to the caller to convert the input to the proper format (or provide some way for the caller to specify which kind of strings they’re providing).

If you define your code to accept byte strings, then any character above \xFF in the input is an error, and it is better to reject such input than to encode it silently.
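As a rough sketch of the byte-string option, the check could look something like the following. The name check_byte_string is made up for illustration, and the choice to die on bad input is an assumption; you may prefer to return an error some other way.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns the message
# unchanged if every character fits in a single byte, and dies otherwise.
sub check_byte_string {
    my ($msg) = @_;

    # Any character with a code point above 0xFF cannot be a byte.
    die "Wide character in message; expected a byte string\n"
        if $msg =~ /[^\x00-\xFF]/;

    return $msg;
}
```

With this contract, length $msg is already the number of bytes to announce to the server, and callers holding character strings are responsible for encoding them first.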

If you define your code to accept Unicode character strings, then you can convert them to bytes with Encode::encode_utf8() (and should do so regardless of how they’re internally represented by Perl).
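A minimal sketch of the character-string option, assuming the protocol wants UTF-8 on the wire (the question does not say which encoding the server expects). The name encode_message is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: takes a character string and returns the byte
# length followed by the UTF-8 encoded bytes. It always encodes and
# never inspects the internal representation of the string.
sub encode_message {
    my ($text) = @_;
    my $bytes = Encode::encode_utf8($text);
    return (length $bytes, $bytes);
}

# "caf\x{E9}" is four characters; U+00E9 takes two bytes in UTF-8,
# so the encoded message is five bytes long.
my ($len, $bytes) = encode_message("caf\x{E9}");
```

Note that the length is taken from the encoded bytes, not from the original string, since the two differ as soon as the text contains anything outside ASCII.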

In any case, calling utf8::is_utf8() is usually a mistake: your program should not care about the internal representation of strings, only about the actual data (a sequence of characters) they contain. Whether some of those characters (in particular, those in the range \x80 to \xFF) are internally represented by one or two bytes should not matter.
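To illustrate why the flag is a poor guide, here is a small sketch with two strings that hold exactly the same characters but are stored differently. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode ();

my $plain    = "caf\xE9";   # four characters, stored internally as four bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same four characters, now stored as UTF-8 internally

my $same_text     = $plain eq $upgraded;        # true: the strings are equal
my $flag_plain    = utf8::is_utf8($plain);      # false
my $flag_upgraded = utf8::is_utf8($upgraded);   # true

# Encoding unconditionally gives the same five bytes for both strings.
my $len_plain    = length Encode::encode_utf8($plain);      # 5
my $len_upgraded = length Encode::encode_utf8($upgraded);   # 5
```

Code that branches on utf8::is_utf8(), as in the question, would send four bytes for the first string and five for the second, even though the caller passed equal strings both times.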

P.S. Reading perldoc Encode may help to clarify issues with bytes and characters in Perl.
