HTML Tidy gives this as output for some reason:
<?xml version='1.0' encoding='utf-16'?> <?xml version='1.0' encoding='utf-16'?> <!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'> <html xmlns='http://www.w3.org/1999/xhtml'> <head> <meta name='generator' content= 'HTML Tidy for Linux/x86 (vers 11 February 2007), see www.w3.org' /> <meta name='vs_targetSchema' content='http://schemas.microsoft.com/intellisense/ie5' /> ...rest of document
So there are two XML declarations, and both specify the wrong encoding (UTF-16 instead of UTF-8). Is there a way to remove the second declaration, change the remaining one to UTF-8, and also remove the DOCTYPE with XSL?
I think that it would be better to fix the original problem. Do you use the HTML Tidy library?
Try setting output-encoding to utf8 and add-xml-decl to false, so that Tidy writes UTF-8 and does not add an XML declaration of its own. The DOCTYPE can be suppressed by setting the doctype option to omit.
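As a rough sketch, those three settings could go into a Tidy configuration file like the one below. The file name tidy.conf and the input file name in the usage note are made up for illustration, and the boolean is written as no here, which Tidy should treat the same as false. If the input document already contains its own XML declaration, Tidy may keep it regardless of add-xml-decl, so check the input as well.

```
// tidy.conf (hypothetical file name)
// Write the output as UTF-8 instead of UTF-16
output-encoding: utf8
// Do not let Tidy add an <?xml ...?> declaration
add-xml-decl: no
// Leave the DOCTYPE out of the output
doctype: omit
```

With the command-line tool this would be used as something like: tidy -config tidy.conf input.html. If you call the library from code instead, the same option names should be settable through its option API.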