The Archive Base Latest Questions

Editorial Team
Asked: June 6, 2026

The webserver is serving responses with utf-8 encoding, all files are saved with utf-8


The webserver is serving responses with UTF-8 encoding, all files are saved with UTF-8 encoding, and every setting I know of has been set to UTF-8.

Here’s a quick program to test whether the output works:

<?php
$html = <<<HTML
<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>Test!</title>
</head>
<body>
    <h1>☆ Hello ☆ World ☆</h1>
</body>
</html>
HTML;

$dom = new DOMDocument("1.0", "utf-8");
$dom->loadHTML($html);

header("Content-Type: text/html; charset=utf-8");
echo($dom->saveHTML());

The output of the program is:

<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Test!</title></head><body>
    <h1>&acirc;&#152;&#134; Hello &acirc;&#152;&#134; World &acirc;&#152;&#134;</h1>
</body></html>

Which renders as:

â˜† Hello â˜† World â˜†


What could I be doing wrong? How much more specific do I have to be to tell the DOMDocument to handle utf-8 properly?

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 7:23 pm

    DOMDocument::loadHTML() expects an HTML string.

    Per its specs, HTML uses the ISO-8859-1 encoding (ISO Latin Alphabet No. 1) as the default. That has been the case for a long time, see 6.1. The HTML Document Character Set. In practice, common web browsers treat that default as Windows-1252.

    I go back that far because PHP’s DOMDocument is based on libxml, whose HTMLparser is designed for HTML 4.0.

    I’d say it’s safe to assume then that you can load an ISO-8859-1 encoded string.
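
    To make that concrete, here is a minimal sketch of why the question's output contains &acirc;&#152;&#134;. It assumes the parser reads each byte of the UTF-8 input as one ISO-8859-1 character; the variable names are only illustrative.

    ```php
    <?php
    // UTF-8 encodes U+2606 (the white star) as three bytes.
    $star = "\u{2606}";
    $bytes = bin2hex($star);                          // "e29886"

    // Read as ISO-8859-1, each byte becomes its own character.
    $codePoints = array_map('ord', str_split($star)); // [226, 152, 134]

    // 226 is "â" (&acirc;); 152 and 134 have no named entity,
    // so they are written as &#152; and &#134;.
    printf("%s -> %s\n", $bytes, implode(', ', $codePoints));
    ```

    This prints "e29886 -> 226, 152, 134", which matches the three entities emitted per star.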

    Your string is UTF-8 encoded. Turn all characters higher than 127 (0x7F) into HTML entities and you’re fine. If you don’t want to do that on your own, that is what mb_convert_encoding with the HTML-ENTITIES target encoding does:

    • Characters that have a named entity get the named entity, e.g. € -> &euro;
    • The others get their numeric (decimal) entity, e.g. ☆ -> &#9734;

    The following code example makes the process a bit more visible by using a callback function:

    $html = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function($match) {
        list($utf8) = $match;
        $entity = mb_convert_encoding($utf8, 'HTML-ENTITIES', 'UTF-8');
        printf("%s -> %s\n", $utf8, $entity);
        return $entity;
    }, $html);
    

    For your string, this outputs:

    ☆ -> &#9734;
    ☆ -> &#9734;
    ☆ -> &#9734;
    

    Anyway, that’s just for looking deeper into your string. What you want is to convert it into an encoding loadHTML can deal with. That can be done by converting everything outside of US-ASCII into HTML entities:

    $us_ascii = mb_convert_encoding($utf_8, 'HTML-ENTITIES', 'UTF-8');
    

    Take care that your input is actually UTF-8 encoded. mb_convert_encoding can only handle one encoding per string, so it will not cope with mixed encodings (which can happen with some inputs). I already outlined above how to do more specific string replacements with the help of regular expressions, so I leave further details for now.
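
    As a hedged sketch of that conversion step with an input check: the helper name utf8ToAsciiHtml is hypothetical, and it swaps in mb_encode_numericentity because the HTML-ENTITIES target of mb_convert_encoding is deprecated as of PHP 8.2. It assumes the mbstring extension is loaded.

    ```php
    <?php
    // Hypothetical helper: rejects input that is not valid UTF-8, then
    // turns every non-ASCII character into a decimal numeric entity.
    function utf8ToAsciiHtml(string $utf8): string
    {
        if (!mb_check_encoding($utf8, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8.');
        }

        // Conversion map: [first code point, last code point, offset, mask].
        $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

        return mb_encode_numericentity($utf8, $map, 'UTF-8');
    }

    echo utf8ToAsciiHtml('<h1>☆ Hello ☆ World ☆</h1>'), "\n";
    ```

    Unlike HTML-ENTITIES, this variant emits numeric entities only, so € becomes &#8364; rather than &euro;. For loadHTML that makes no difference.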

    The other alternative is to hint the encoding. This can be done in your case by modifying the document and adding a

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    

    which is a Content-Type specifying a charset. That is also best practice for HTML strings that are not served by a webserver (e.g. saved on disk or held in a string like in your example). The webserver normally sets that as the response header.

    If you don’t mind the warnings about misplaced elements, you can just add it in front of the string:

    $dom = new DOMDocument();
    $dom->loadHTML('<meta http-equiv="content-type" content="text/html; charset=utf-8">'.$html);
    

    Per the HTML 2.0 specs, elements that can only appear in the <head> section of a document are automatically placed there. This is what happens here, too. The output (pretty-printed):

    <!DOCTYPE html>
    <html>
      <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8">
        <meta charset="utf-8">
        <title>Test!</title>
      </head>
      <body>
        <h1>☆ Hello ☆ World ☆</h1>    
      </body>
    </html>
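
    The hinting approach can also be wrapped up so the warnings stay out of the output. This is only a sketch under assumptions: loadUtf8Html is a hypothetical helper name, the input is a small fragment rather than your full document, and the result may vary with the libxml version PHP is built against.

    ```php
    <?php
    // Hypothetical helper: prepends a Content-Type hint so that
    // loadHTML() decodes the markup as UTF-8.
    function loadUtf8Html(string $html): DOMDocument
    {
        $hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';

        // Collect any parser warnings internally instead of emitting them.
        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($hint . $html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        return $dom;
    }

    $dom = loadUtf8Html('<h1>☆ Hello ☆ World ☆</h1>');
    $h1 = $dom->getElementsByTagName('h1')->item(0);
    echo $h1->textContent, "\n";
    ```

    Reading textContent is a quick way to check that the stars survived parsing, independent of how saveHTML() serializes them.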
    