I have a PHP script which fetches data from Salesforce via the Salesforce API and writes the output to a file using file_put_contents. The data is a mixture of Korean characters and English characters.
When I run the script on a box (1) running Red Hat Enterprise Linux ES release 4 (Nahant Update 8) with PHP 5.2.8 and a similar box (2) running PHP 5.3.6 the spaces in between the Korean characters disappear.
e.g. (Using K to represent a Korean character and E to represent an English character)
EEEEEEEEK KKK KKKK EEE KKKK is appearing as EEEEEEEEKKKKKKKK EEE KKKK
However when I run the script on a box (3) running CentOs with PHP 5.3.5 or on (4) my local windows machine with PHP 5.3.6 the text in the file is correct.
Can anyone suggest what the problem might be?
EDIT – Originally I was accessing the php script via a browser however to (hopefully) simplify the problem I am currently storing the output in a text file and downloading it to my windows machine.
EDIT – Hex version
Original text – CFD란 무엇입니까?
Hex from (1) – 43 46 44 eb 9e 80 eb ac b4 ec 97 87 ec 9e 85 eb 8b 88 ea b9 8c 3f
Hex from (3) – 43 46 44 eb 9e 80 20 eb ac b4 ec 97 87 ec 9e 85 eb 8b 88 ea b9 8c 3f
EDIT – Code used to select text (with user, pass, table, id and path omitted)
<?php
ini_set("soap.wsdl_cache_enabled", "0");
require_once ("../soapclient/SforcePartnerClient.php");
require_once ("../soapclient/SforceHeaderOptions.php");
$partner_wsdl = "../soapclient/new-partner.wsdl.xml";
$client = new SforcePartnerClient();
$client->createConnection($partner_wsdl);
$loginResult = $client->login('--user--', '--pass--');
$query = "Select Name FROM --table-- WHERE Id = '--id--'";
$response = $client->query($query);
echo'<pre>';print_r($response);echo'</pre>';
$queryResult = new QueryResult($response);
foreach ($queryResult->records as $qr) {
$content = $qr->fields->Name;
file_put_contents('--path--',$content);
}
?>
After more research I discovered a function in the SforcePartnerClient.php
is returning different values depending on the box used.
Box 1 and 2:
Box 3 and 4:
When this is combined with / parsed / converted using the XML parser (later in the file) and WSDL file the XML parser strips any white space that occurs between consecutive &#xxxxx; s – I believe this is related to a bug https://bugs.php.net/bug.php?id=33240 to avoid this I suggest commenting out line 364 of SforcePartnerClient.php
Unfortunately I do not know if this will have any adverse effects on other code using SforcePartnerClient.php.