I know this must be a relatively simple problem, but Google has failed me. Say I have the following simple PHP document (no discussion on security, SQL injection, XSS, etc. This is just a simple example to illustrate my encoding problem):
<?php
if(!empty($_POST['message'])) {
file_put_contents($filename, $_POST['message']);
}
?>
<!DOCTYPE html>
<html>
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/></head>
<body>
<form method="post" action="?">
<textarea name="message">
<?php echo htmlentities(file_get_contents($filename))?>
</textarea>
<input type="submit"/>
</form>
</body>
</html>
Now, I enter a Σ into the form and submit. When the page reloads, the textarea is filled with Î£ instead of Σ.
I understand why this is (to a degree), but I do not know how to fix the post to stop it from happening. Any ideas?
htmlentities defaults to assuming ISO-8859-1 as input (in PHP versions before 5.4), but you feed it UTF-8, so a correct way would be htmlentities($string, ENT_COMPAT, "UTF-8");
In this case I'd rather go for htmlspecialchars, though; other entities shouldn't be needed.
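To illustrate the difference, here is a minimal sketch. The helper name escape_utf8 is hypothetical and not from the question, and the ISO-8859-1 charset is passed explicitly to reproduce the old default behaviour regardless of the PHP version in use (PHP 7 or later is assumed for the type hints).

```php
<?php
// Hypothetical helper (not part of the original question): escape UTF-8 text
// for HTML output, naming the charset explicitly instead of relying on the
// version-dependent default.
function escape_utf8(string $text): string
{
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

// The two UTF-8 bytes that encode the Greek capital sigma.
$sigma = "\xCE\xA3";

// Treating those bytes as ISO-8859-1 turns each byte into its own entity,
// which the browser then renders as two unrelated characters.
$misread = htmlentities($sigma, ENT_COMPAT, 'ISO-8859-1');

// Naming UTF-8 makes htmlentities see a single character.
$asEntity = htmlentities($sigma, ENT_COMPAT, 'UTF-8');

// htmlspecialchars only touches &, <, > and quotes, so the sigma passes through.
$untouched = escape_utf8($sigma);

echo $misread, "\n";   // &Icirc;&pound;
echo $asEntity, "\n";  // &Sigma;
echo $untouched, "\n"; // the original two bytes, unchanged
```

In the form from the question, this would mean echoing htmlspecialchars(file_get_contents($filename), ENT_COMPAT, 'UTF-8') inside the textarea.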