The Archive Base

Editorial Team
Asked: June 15, 2026

Is there a way to locate an encoding problem within an XML file?

Is there a way to locate an encoding problem within an XML file? I’m trying to parse such a file (let’s call it doc) with the XML library in R, but there seems to be a problem with the encoding.

xmlInternalTreeParse(doc, asText=TRUE)
Error: Document labelled UTF-16 but has UTF-8 content.
Error: Input is not proper UTF-8, indicate encoding!
Error: Premature end of data in tag ...

and a list of tags with presumably premature end of data follows. However, I’m pretty sure that no premature ends exist in this document.
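One way to narrow down the first error message is to compare the encoding named in the XML declaration with what the bytes of the string actually look like. The following is a minimal sketch in base R, assuming doc is a single character string; the helper name check_declared_encoding is hypothetical and not part of the XML package.

```r
# Hypothetical helper: compare what the XML declaration claims with
# what the bytes of the string actually look like.
check_declared_encoding <- function(xml) {
  # useBytes = TRUE lets the regex run even if the string is not valid
  # in the current locale.
  m <- regmatches(
    xml,
    regexec("encoding=[\"']([^\"']+)[\"']", xml, useBytes = TRUE)
  )[[1]]
  bytes <- as.integer(charToRaw(xml))
  list(
    declared   = if (length(m) == 2L) m[2] else NA_character_,
    valid_utf8 = validUTF8(xml),      # TRUE if the bytes form valid UTF-8
    ascii_only = all(bytes < 128L)    # TRUE if no byte is above 0x7F
  )
}

sample_doc <- "<?xml version=\"1.0\" encoding=\"utf-16\"?><a>text</a>"
str(check_declared_encoding(sample_doc))
```

An R character string cannot contain embedded NUL bytes, so text that arrives as an ordinary string is unlikely to be genuine UTF-16. If declared says utf-16 while valid_utf8 or ascii_only is TRUE, the declaration is probably just a leftover label from the original file.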

Ok, so next try:

doc <- iconv(doc, to="UTF-8")
doc <- sub("utf-16", "utf-8", doc)
xmlInternalTreeParse(doc, asText = TRUE)
Error: Premature end of data in tag...

and again a list of tags follows along with line numbers. I’ve checked the lines and I can’t find any errors.

Another suspicion: the “µ”-character that occurs in the document might cause the error. So next try:

doc <- iconv(doc, to="UTF-8")
doc <- gsub("µ", "micro", doc)
doc <- sub("utf-16", "utf-8", doc)
xmlInternalTreeParse(doc, asText = TRUE)
Error: Premature end of data in tag...

Any other suggestions for debugging?

EDIT: After spending two days trying to fix the error, I still haven’t found a solution. However, I think I have narrowed down the possible causes. Here is what I’ve found:

  • copying the XML string from the source database into a file and saving it as a separate xml file in Notepad++ –> Document labelled UTF-16 but has UTF-8 content.

  • changing <?xml version="1.0" encoding="utf-16"?> to <?xml version="1.0" encoding="utf-8"?> (or encoding="latin1") within this file –> no error

  • reading XML string from database via doc <- sqlQuery(myconn, query.text, stringsAsFactors = FALSE); doc <- doc[1,1], manipulating it with str_sub(doc, 35, 36) <- "8" or str_sub(doc, 31, 36) <- "latin1" and then trying to parse it with xmlInternalTreeParse(doc) –> Premature end of data in tag...

  • reading the XML string from database as above and then trying to parse it with xmlInternalTreeParse(doc) –> Document labelled UTF-16 but has UTF-8 content. Input is not proper UTF-8, indicate encoding ! Bytes: 0xE4 0x64 0x2E 0x20 Premature end of data in tag... (list of tags follows).

  • reading the XML string from database as above and parsing with xmlInternalTreeParse(doc, encoding="latin1") –> Premature end of data in tag...

  • using doc <- iconv(doc[1,1], to="UTF-8") or to="latin1" before parsing doesn’t change anything

I would appreciate any suggestions very much.
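To locate the bytes the parser complains about (for example 0xE4 in the message above), the string can be split into lines at the byte level and each line tested with validUTF8(). This is a rough sketch in base R, assuming the document is a single character string; find_invalid_utf8 is a hypothetical helper name.

```r
# Hypothetical helper: list the lines that are not valid UTF-8, together
# with the bytes above 0x7F found on each of them.
find_invalid_utf8 <- function(xml) {
  bytes <- charToRaw(xml)
  nl <- which(bytes == as.raw(0x0a))       # positions of line feeds
  starts <- c(1L, nl + 1L)
  ends <- c(nl - 1L, length(bytes))
  line_bytes <- function(i) {
    if (ends[i] >= starts[i]) bytes[starts[i]:ends[i]] else raw(0)
  }
  lines <- vapply(seq_along(starts),
                  function(i) rawToChar(line_bytes(i)), character(1))
  bad <- which(!validUTF8(lines))
  high <- vapply(bad, function(i) {
    b <- as.integer(line_bytes(i))
    paste(sprintf("0x%02X", b[b >= 128L]), collapse = " ")
  }, character(1))
  data.frame(line = bad, high_bytes = high, stringsAsFactors = FALSE)
}

sample_doc <- "<a>\n<b>gr\xe4d. x</b>\n<c>ok</c>\n</a>"
print(find_invalid_utf8(sample_doc))
```

A lone byte such as 0xE4 followed by plain ASCII is what Latin-1 text (here "ä") looks like when it is read as UTF-8, which would hint at the source encoding to pass to iconv().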


1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 4:45 pm

    The encoding problem occurred because the encoding of the original XML file did not match the encoding used in the SQL database, where the XML content was stored as longtext. Converting the string to UTF-8 and changing the encoding named in the XML declaration to match solved the problem:

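    library(XML)  # provides xmlInternalTreeParse()
    doc <- sqlQuery(myconn, query.text, stringsAsFactors = FALSE)
    # Without a 'from' argument, iconv() converts from the session's native encoding
    doc <- iconv(doc[1, 1], to = "UTF-8")
    doc <- sub("utf-16", "utf-8", doc)  # make the XML declaration match the real encoding
    doc <- xmlInternalTreeParse(doc, asText = TRUE)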
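
The iconv() call above has no from argument, so R assumes the string is in the session's native encoding. Where that assumption is doubtful, a variation is to name the source encoding explicitly and to rewrite the declaration regardless of case. This is only a sketch: the helper name normalise_xml_encoding is hypothetical, and "latin1" as the default source encoding is an assumption suggested by the 0xE4 byte in the question, not something the answer confirms.

```r
# Hypothetical helper: convert to UTF-8 only when needed, then make the
# XML declaration say UTF-8.
normalise_xml_encoding <- function(xml, from = "latin1") {
  # Assumption: if the bytes are not valid UTF-8, they are in 'from'.
  if (!validUTF8(xml)) {
    xml <- iconv(xml, from = from, to = "UTF-8")
  }
  # Replace whatever encoding the declaration names (utf-16, UTF-16, ...).
  sub("(<\\?xml[^>]*encoding=[\"'])[^\"']+", "\\1UTF-8", xml,
      ignore.case = TRUE)
}

sample_doc <- "<?xml version=\"1.0\" encoding=\"utf-16\"?><a>gr\xe4d. </a>"
fixed_doc <- normalise_xml_encoding(sample_doc)
validUTF8(fixed_doc)
```

The result can then be passed to xmlInternalTreeParse(fixed_doc, asText = TRUE) as in the answer.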
    

    Truncation of the XML string during retrieval from the database turned out to be a separate problem. The solution is provided here: How to retrieve a very long XML-string from an SQL database with R?
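
A quick way to spot a truncated string before parsing is to look at its length in bytes and at its last non-whitespace byte. The sketch below is a heuristic in base R, not a validity check, and truncation_hint is a hypothetical helper name: a document cut off in the middle usually does not end in ">", although it could by coincidence.

```r
# Hypothetical helper: report the byte length of the string and whether
# its last non-whitespace byte is ">".
truncation_hint <- function(xml) {
  bytes <- charToRaw(xml)
  ws <- as.raw(c(0x20, 0x09, 0x0a, 0x0d))  # space, tab, LF, CR
  n <- length(bytes)
  while (n > 0L && bytes[n] %in% ws) n <- n - 1L
  list(
    n_bytes      = length(bytes),
    ends_with_gt = n > 0L && bytes[n] == as.raw(0x3e)  # 0x3e is ">"
  )
}

str(truncation_hint("<a><b>x</b></a>\n"))
str(truncation_hint("<a><b>x</b"))
```

If ends_with_gt is FALSE, or n_bytes is smaller than the length the database reports for the field, the string was probably cut short during retrieval.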


© 2021 The Archive Base. All Rights Reserved