I have a form element, called metaDescription:
//inside the form
$description = $this->createElement('text', 'metaDescription')
    ->setLabel('Description:')
    ->setRequired(false)
    ->addFilter('StringTrim')
    // second argument is $breakChainOnFailure; validator options come third
    ->addValidator('StringLength', false, array(0, 300))
    ->addErrorMessage('Invalid description.');
$this->addElement($description);
Whenever this form loads, I initialize it with a default value pulled from the database:
$form->setDefault('metaDescription', $oldPage->getMetaDescription());
This works perfectly fine.
However, I now want to htmlencode any input description when someone sends the form and html_entity_decode the default value that is pulled from the database so that the characters are shown in their original shape again.
I did this like so when handling form input:
//handle post
if ($request->isPost()) {
    if ($form->isValid($request->getPost())) {
        $page = new Application_Model_PagePainter(array(
            'metaDescription' => htmlentities($form->getValue('metaDescription'))
        ));
        $pageMapper->save($page);
        ....
And I now set the default value like so:
$form->setDefault('metaDescription', html_entity_decode($oldPage->getMetaDescription()));
At first, this seems to work fine as well. When I send, for example, woord1, woord2, me&you as the description, this is correctly saved as woord1, woord2, me&amp;you in the database and correctly displayed again as woord1, woord2, me&you. However, when I use a special character like ó, e.g. wóórd1, this is correctly saved in the database as w&oacute;&oacute;rd1, but then something strange happens: when the form is displayed again, the default value is empty. When I look at the source, it is indeed empty: <input type="text" name="metaDescription" id="metaDescription" value="" />.
This would make me believe that for some reason html_entity_decode($oldPage->getMetaDescription()) returns an empty string. However, when I echo it, it returns the correct result: wóórd1, yet the setDefault has no effect. When I remove the html_entity_decode, the setDefault works correctly again and the value is shown in the form, but without the HTML entities being decoded.
Why is this html entity decode causing the form value to be empty for such strange characters?
Reply to vstm
For debugging purposes, I disabled the view's escaping like so:
$this->view->setEscape(array($this, 'myEscape'));
public function myEscape($inputString)
{
    return $inputString;
}
Unfortunately, the problem remains the same as explained earlier. Just to clarify, I encode the value before putting it in the database like so:
'metaDescription' => htmlentities($form->getValue('metaDescription'), ENT_COMPAT, 'UTF-8')
And I decode the value after getting it out of the database like so:
$form->setDefault('metaDescription', html_entity_decode($oldPage->getMetaDescription(), ENT_COMPAT, 'UTF-8'));
Interestingly, however, it does seem related to the UTF-8 encoding, because when I change the encoding to
'metaDescription' => htmlentities($form->getValue('metaDescription'), ENT_COMPAT, 'ISO-8859-1')
while keeping the decoding at UTF-8, an input of tést results in the input box showing tést rather than an empty value, which is what happens when both calls use UTF-8.
Does this help you?
I knew it had something to do with Zend Framework doing its own escaping using htmlspecialchars and utf-8 (unless you change that with the view's setEscape/setEncoding methods). And indeed, when you decode the entities without specifying “utf-8” and then run the result through htmlspecialchars with “utf-8”, $test is empty at the end.
So you have to call html_entity_decode with “utf-8” or change the view's encoding to “iso-8859-1” (or whatever your encoding is). I think supplying “utf-8” is the better option.
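The snippet this refers to did not survive, so the following is only a minimal sketch of the effect, not the original code. It assumes the database holds the entity-encoded string t&eacute;st and passes ISO-8859-1 explicitly to html_entity_decode to mimic what older PHP versions do when no charset is given. Only the variable name $test is taken from the text above.

```php
<?php
// Value as stored in the database (assumed example).
$stored = 't&eacute;st';

// Decoding to ISO-8859-1 yields the single byte 0xE9 for "é".
$latin1 = html_entity_decode($stored, ENT_COMPAT, 'ISO-8859-1');

// The view's escape step runs htmlspecialchars with UTF-8.
// A lone 0xE9 byte is not valid UTF-8, so the result is an empty string.
$test = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');

// Decoding to UTF-8 instead gives htmlspecialchars input it can handle.
$utf8 = html_entity_decode($stored, ENT_COMPAT, 'UTF-8');
$ok   = htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8');

var_dump($test === '');           // bool(true)
var_dump($ok === "t\xC3\xA9st");  // bool(true)
```

The empty string is how htmlspecialchars reports an invalid byte sequence when neither ENT_IGNORE nor ENT_SUBSTITUTE is set, which matches the empty value attribute you are seeing.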
War against the encodings
To make this work, you also have to take care of what encoding the browser is using, because otherwise you either write garbage into your database, render garbage in your output, or both (or nothing, if you hand over the wrong charset to certain PHP functions). Bear with me.
So first you have to establish what encoding the browser is using. Check the content-type meta tag in your HTML output and see which encoding it declares. If there is no content-type meta information, or it doesn't include a charset, then you should add one, preferably with utf-8, in your layout (if you're not using a layout, now is a good time to start). This is important, because otherwise you don't know for sure what encoding your input has or what encoding you have to deliver to the browser. In practice, that means a content-type meta tag with a charset directly after the opening <head> tag of every page returned by your application.
In the following examples we assume you chose utf-8, but you can use whatever is appropriate, as long as you change the values accordingly (that is, s/UTF-8/your encoding/g).
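The tag itself is missing from the text above. As a rough sketch of what such a declaration could look like, here it is wrapped in a small PHP helper for a layout script. The function name contentTypeMeta is made up for illustration and is not part of Zend Framework.

```php
<?php
// Hypothetical helper: builds the content-type meta tag for the given charset.
function contentTypeMeta($charset = 'UTF-8')
{
    $content = 'text/html; charset=' . $charset;

    return '<meta http-equiv="Content-Type" content="'
        . htmlspecialchars($content, ENT_COMPAT, 'UTF-8') . '" />';
}

// Print this right after the opening <head> tag of the layout.
echo contentTypeMeta('UTF-8'), "\n";
// <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
```

If you already use the headMeta view helper in your layout, $this->headMeta()->appendHttpEquiv('Content-Type', 'text/html; charset=UTF-8') should achieve the same thing.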
Now, when retrieving data from the browser, you know which charset you have to supply to the htmlentities call: utf-8.
That means $form->getValue('metaDescription') returns a utf-8 encoded string, which htmlentities converts to an HTML-entities string, which is exactly what we want.
So the database now holds a non-threatening string with no umlauts, accents or whatever.
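A minimal sketch of this encoding step, assuming the form value is the string wóórd1 from the question. It is written as raw UTF-8 bytes here so the example does not depend on the encoding of the source file, and the variable names are only illustrative.

```php
<?php
// "wóórd1" as UTF-8 bytes, standing in for $form->getValue('metaDescription').
$browserData = "w\xC3\xB3\xC3\xB3rd1";

// Tell htmlentities that the input is UTF-8.
$dbData = htmlentities($browserData, ENT_COMPAT, 'UTF-8');

echo $dbData, "\n"; // w&oacute;&oacute;rd1

// The value that goes into the database contains plain ASCII only.
var_dump(preg_match('/^[\x00-\x7F]*$/', $dbData) === 1); // bool(true)
```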
Now we take a look at the editing part. There you must decode the HTML entities so the user does not have to deal with them. The output string has to be encoded with our desired charset (yes, right: utf-8).
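The decoding step could look roughly like this, assuming the database returns the entity-encoded value w&oacute;&oacute;rd1. The variable names are illustrative, and the setDefault call is shown only as a comment because it needs the form object from your controller.

```php
<?php
// Entity-encoded value as read from the database (assumed example).
$dbData = 'w&oacute;&oacute;rd1';

// Ask html_entity_decode for UTF-8 output explicitly.
$default = html_entity_decode($dbData, ENT_COMPAT, 'UTF-8');

var_dump($default === "w\xC3\xB3\xC3\xB3rd1"); // bool(true), i.e. "wóórd1" in UTF-8

// In the controller this value would then be handed to the form:
// $form->setDefault('metaDescription', $default);
```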
So now you have assigned the utf-8 encoded string returned by html_entity_decode to metaDescription. Now we only have to get past that htmlspecialchars call, which is invoked by default whenever someone uses $view->escape().
The last step is to ensure that Zend_View's escaping is aware of our encoding (this is optional if you are using utf-8, since this is already the default). Either set it for a specific view in the controller with $this->view->setEncoding('UTF-8') or for all views in the bootstrap.php.
If someone now calls $view->escape(), it also expects a utf-8 string as input. You should be able to remove the setEscape call with the “null” escape.
If you followed all these steps you should now have all special characters with umlauts, accents and graves restored as desired (or I have now disgraced myself).
So every function receives the encoding it expects, otherwise it returns the infamous empty string (pseudo flow-chart):
htmlentities($browserData, , 'UTF-8') -> expects UTF-8, returns ASCII without umlauts or other fancy stuff
html_entity_decode($dbData, , 'UTF-8') -> expects ASCII, returns UTF-8 encoded
$view->escape(): htmlspecialchars -> expects UTF-8, returns UTF-8
tl;dr / recap
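As a recap, the three steps can be chained in one small sketch. The function names toDatabase, toFormDefault and escapeForView are hypothetical stand-ins for your controller code and for what $view->escape() does, not Zend Framework APIs.

```php
<?php
// Hypothetical stand-ins for the three steps; the names are illustrative only.
function toDatabase($browserData)   // expects UTF-8, returns ASCII
{
    return htmlentities($browserData, ENT_COMPAT, 'UTF-8');
}

function toFormDefault($dbData)     // expects ASCII, returns UTF-8
{
    return html_entity_decode($dbData, ENT_COMPAT, 'UTF-8');
}

function escapeForView($value)      // expects UTF-8, returns UTF-8
{
    return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
}

$input   = "t\xC3\xA9st, me&you";   // "tést, me&you" as UTF-8 bytes
$stored  = toDatabase($input);
$default = toFormDefault($stored);
$html    = escapeForView($default);

echo $stored, "\n";                            // t&eacute;st, me&amp;you
var_dump($default === $input);                 // bool(true)
var_dump($html === "t\xC3\xA9st, me&amp;you"); // bool(true)
```

As long as every call is given the same charset, the value survives the round trip, and the only thing the last step changes is escaping the ampersand for the value attribute.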