I’m trying to parse utf-8 encoded text files uploaded via a multipart/form-data form. I

Question

Asked: May 30, 2026

I’m trying to parse UTF-8 encoded text files uploaded via a multipart/form-data form. I have built a small .txt file containing some tab-delimited (meaningless) text in both Latin and Japanese characters (I copied and pasted the Japanese characters from a Japanese retail site).

All I am trying to do at this point is replace newlines with (LINE) and tabs with (TAB). Here is my code:

...
$text=file_get_contents($_FILES['upload']['tmp_name']);

$LineArray=array('\r\n','\n\r','\r','\n');
foreach ($LineArray as $value){
  $pieces=(mb_split($value,$text));
  $text=implode ("(LINE)",$pieces);
}
echo "Here is the modified text:<br/>";
echo $text;
echo "<br/>";
var_dump($text);

$tab='\t';
$pieces=(mb_split($tab,$text));
$text=implode ("(TAB)",$pieces);
echo "Here is the modified text:<br/>";
echo $text;
echo "<br/>";
var_dump($text);
...

Here is a var_dump of the text before modification:

string 'John    Fitzgerald  Kennedy

Winston     Churchill

John    Edgar   Hoover

素材の 生地を柿渋で染 めた和柄パンツです





火车票 火车票 火车票 火车票



' (length=175)

The first line of Asian characters has 2 tabs, the last line of the file has 3 tabs.

Here is a var_dump of the text after all modifications:

string 'John(TAB)Fitzgerald(TAB)Kennedy(LINE)Winston(TAB)(TAB)Churchill(LINE)John(TAB)Edgar(TAB)Hoover(LINE)素材の 生地を柿渋で染(TAB)めた和柄パンツです(LINE)(LINE)(LINE)火车票  火车票 火车票 火车票(LINE)(LINE)' (length=235)

How come my code can only identify one of the tabs in the Japanese text part?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T05:42:11+00:00


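mb_split() processes the pattern and the string in the encoding set by mb_regex_encoding(), not in the encoding of your file. That value is probably not UTF-8 in your environment, so mb_split() is grouping the bytes of your text into the wrong characters. When that happens, a tab byte that directly follows a multibyte character can be read as part of that character and never matches \t, which would explain why the Latin lines split correctly while some tabs in the Japanese text are missed. Try calling mb_regex_encoding('UTF-8') before any call to mb_split().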
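As a minimal sketch of that suggestion, assuming the mbstring extension is enabled and the upload really is valid UTF-8, setting the regex encoding before splitting might look like the following. The sample string is hypothetical and stands in for the contents of the uploaded file, and folding the line-ending variants into a single pattern is a simplification of the loop in the question, not something the original code does.

```php
<?php
// Sketch only: assumes mbstring is loaded and the input is valid UTF-8.
// Make the mb_ereg family (including mb_split) treat patterns and subjects as UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical sample standing in for file_get_contents($_FILES['upload']['tmp_name']).
$text = "John\tEdgar\tHoover\r\n素材の\t生地を柿渋で染\tめた和柄パンツです\n";

// One pattern covers every line-ending variant; the two-character forms come
// first so that \r\n is consumed as a pair rather than as two separate breaks.
$text = implode('(LINE)', mb_split('\r\n|\n\r|\r|\n', $text));

// With the regex encoding set, the tabs next to multibyte characters should match too.
$text = implode('(TAB)', mb_split('\t', $text));

echo $text, PHP_EOL;
```

If the result still looks wrong after this change, it may be worth checking with mb_check_encoding($text, 'UTF-8') that the uploaded file is actually UTF-8 and not another encoding.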
