GB2312 is a double-byte character set, using mb_strlen() to check a single Chinese


Asked: June 13, 2026


GB2312 is a double-byte character set. Using mb_strlen() on a single Chinese character returns 2, but with two or more characters the result is sometimes weird. Does anybody know why, and how can I get the right length?

<?php
header('Content-type: text/html; charset=utf-8');
$a = "大";
echo mb_strlen($a, 'gb2312'), PHP_EOL;       // output 2
echo mb_strlen($a . $a, 'gb2312'), PHP_EOL;  // output 3, it should be 4
echo mb_strlen($a . 'a', 'gb2312'), PHP_EOL; // output 2, it should be 3
echo mb_strlen('a' . $a, 'gb2312'), PHP_EOL; // output 3
?>

Thanks, deceze, the article you linked was very helpful. Anyone who knows as little about encodings as I do should read it: What every programmer absolutely, positively needs to know about encodings and character sets to work with text


1 Answer

Editorial Team · Answer 1 · 2026-06-13T09:40:25+00:00

Your string is probably stored as UTF-8.
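A quick way to confirm this is to look at the raw bytes of the literal. This is a minimal sketch, assuming the script file itself is saved as UTF-8 (the usual editor default):

```php
<?php
// Assumption: this file is saved as UTF-8, so "大" is stored as UTF-8 bytes.
$a = "大";
echo strlen($a), PHP_EOL;   // byte count: 3 for UTF-8, 2 for genuine GB2312
echo bin2hex($a), PHP_EOL;  // the raw bytes, written as hex
```

If this prints 3 and e5a4a7, the literal is UTF-8, not GB2312.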

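The UTF-8 encoding of "大" is the three bytes E5 A4 A7 (according to this webpage). When those bytes are read as GB2312, a byte of 0xA1 or above is taken as the first half of a two-byte character, so: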

$a       // 3 bytes: [E5 A4] [A7]            -> 2 chars (the last one incomplete)
$a . $a  // 6 bytes: [E5 A4] [A7 E5] [A4 A7] -> 3 chars
$a . 'a' // 4 bytes: [E5 A4] [A7 61]         -> 2 chars
'a' . $a // 4 bytes: the first byte is < 128, so it is read as one
         // single character: [61] [E5 A4] [A7] -> 3 chars
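The guess above can be replayed with a small, hypothetical helper. It is a simplified model, not mbstring's actual source: it assumes that for GB2312 any byte from 0xA1 to 0xFE is treated as the first byte of a two-byte character and the following byte is consumed without checking, while every other byte counts as one character.

```php
<?php
// naiveGb2312Length() is a hypothetical helper modelling a table-driven count:
// lead byte 0xA1-0xFE => 2-byte character, anything else => 1-byte character.
function naiveGb2312Length(string $bytes): int
{
    $count = 0;
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $count++) {
        $b = ord($bytes[$i]);
        // Skip the next byte blindly if this one looks like a lead byte.
        $i += ($b >= 0xA1 && $b <= 0xFE) ? 2 : 1;
    }
    return $count;
}

$a = "大"; // UTF-8 bytes E5 A4 A7, assuming this file is saved as UTF-8
foreach ([$a, $a . $a, $a . 'a', 'a' . $a] as $s) {
    echo naiveGb2312Length($s), PHP_EOL;
}
```

Fed the UTF-8 bytes, this model reproduces the question's outputs (2, 3, 2, 3); fed genuine two-byte GB2312 data, each Chinese character counts once.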

This is just a guess, but it makes perfect sense to me when I think about it this way. For background, you can refer to this Wikipedia page.

If you really want to test it, I recommend creating a separate file saved in GB2312 encoding and reading it with fopen() or a similar function. Then you can be sure the string is in the intended encoding.
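The same experiment can be sketched without leaving PHP. This assumes the mbstring extension is loaded and the script is saved as UTF-8; readAsGb2312() is a hypothetical helper name, the temporary file stands in for a file saved as GB2312 in an editor, and file_get_contents() is used where fopen()/fread() would work the same.

```php
<?php
// Assumptions: mbstring is loaded and this script is saved as UTF-8.
// readAsGb2312() is a hypothetical helper for this sketch.
function readAsGb2312(string $utf8Text): string
{
    // The temporary file stands in for a file saved as GB2312 in an editor.
    $path = tempnam(sys_get_temp_dir(), 'gb');
    file_put_contents($path, mb_convert_encoding($utf8Text, 'GB2312', 'UTF-8'));
    $bytes = file_get_contents($path); // raw GB2312 bytes, no conversion
    unlink($path);
    return $bytes;
}

foreach (['大', '大大', '大a', 'a大'] as $text) {
    $gb = readAsGb2312($text);
    // strlen() counts bytes; mb_strlen() counts characters in the given encoding.
    printf("%s: %d bytes, %d chars\n", bin2hex($gb), strlen($gb), mb_strlen($gb, 'GB2312'));
}
```

With real GB2312 data, strlen() gives the byte counts the question expected (2 per Chinese character, 1 per ASCII letter), while mb_strlen() counts characters, so a single "大" has length 1, not 2.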
