I run the following code: $page = ‘<p>Ä</p>’; $DOM = new DOMDocument; $DOM->loadHTML($page); echo

Question

Asked: June 12, 2026

I run the following code:

$page = '<p>Ä</p>';
$DOM = new DOMDocument;
$DOM->loadHTML($page);
echo 'source:'.$page;
echo 'dom: '.$DOM->getElementsByTagName('p')->item(0)->textContent;

and it outputs the following:

source: Ä

dom: Ã

So I don't understand: why does the text's encoding get broken when it passes through DOMDocument?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T03:00:20+00:00

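DOMDocument::loadHTML() appears to be treating the input as ISO-8859-1, because the markup does not declare an encoding, while your source string is actually UTF-8. In UTF-8, Ä is stored as the two bytes 0xC3 0x84, and reading each byte as a separate character turns it into Ã„. Here's the catch: that second character (byte 0x84) is „ in Windows-1252, but in ISO-8859-1 it is only an invisible control character. This is why you are seeing no second character in your output.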
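To see the bytes involved without going through DOMDocument at all, here is a minimal sketch. It assumes the mbstring extension is available, and it writes Ä as explicit byte escapes so the encoding of the script file itself does not matter. The variable names are only illustrative.

```php
<?php
// The UTF-8 encoding of "Ä" is the two bytes C3 84.
$source = "\xC3\x84";
echo bin2hex($source), "\n"; // c384

// Read those two bytes as if they were two ISO-8859-1 characters
// (U+00C3 and U+0084) and re-encode the result as UTF-8.
$misread = mb_convert_encoding($source, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n"; // c383c284

// c3 83 is "Ã"; c2 84 is the control character U+0084, which most
// browsers and terminals do not display.
```

The second hex string is the kind of double-encoded text that shows up as "Ã" followed by nothing visible.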

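You can fix this by calling utf8_decode on the output of textContent (note that utf8_decode is deprecated as of PHP 8.2; mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8') performs the same conversion), or by telling DOMDocument that the input is UTF-8, for example with a charset declaration in the markup you pass to loadHTML.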
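One way to declare the encoding up front is to prepend a charset meta tag before parsing. The following is only a sketch: loadUtf8Html is a hypothetical helper, not part of DOMDocument, and it assumes the input really is UTF-8. Behaviour may vary with the PHP and libxml2 versions in use.

```php
<?php
// UTF-8 bytes of "<p>Ä</p>", written with escapes so the encoding
// of this script file does not matter.
$page = "<p>\xC3\x84</p>";

// Hypothetical helper (an assumption for illustration): prepend a
// charset declaration so the parser reads the markup as UTF-8.
function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );
    return $dom;
}

$dom  = loadUtf8Html($page);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// Print the bytes of the extracted text so they can be compared
// with the bytes of the original "Ä" (c3 84).
echo bin2hex($text), "\n";
```

Declaring the encoding at parse time avoids having to repair every string read back from the document, which is why it may be preferable to decoding the output afterwards.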
