I have a script to get an XML file from an SQL database. Here is how I do this:
library(RODBC)
library(XML)
myconn <- odbcConnect("mydsn")
query.text <- "SELECT xmlfield FROM db WHERE id = 12345"
doc <- sqlQuery(myconn, query.text, stringsAsFactors=FALSE)
doc <- iconv(doc[1,1], from="latin1", to="UTF-8")
doc <- xmlInternalTreeParse(doc, encoding="UTF-8")
However, the parsing didn’t work for a particular database row, although it worked when I copied the content of this field into a separate file and parsed from the file. After two days “trial-and-error” I identified the main problem. It seems that querying short XML files this way doesn’t cause any problems, but when I query larger files, the string gets chopped off after 65534 characters. Therefore, the end of the XML file is missing and the file can’t be parsed.
I thought this might be an overall restriction of the ODBC connections on my computer. However, another programme that also uses ODBC to get the same XML field from the same database does this without any problems. So I guess it’s an R-specific problem.
Any ideas how to fix it?
I’ve written to the package author and have finally received the following answer:
This manual can be found in the
docdirectory of theRODBCpackage. This information is not contained within the reference manual.As in the meantime I’ve found a good solution to retrieve my data without using
RODBC, I haven’t tried to recompile this package. But I hope this answer will be helpful for those having trouble with the same issue.