I’m trying to learn R’s XML package, and I want to create a data.frame from the books.xml sample XML file. Here’s what I have tried:
library(XML)
books <- "http://www.w3schools.com/XQuery/books.xml"
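doc <- xmlTreeParse(books, useInternalNodes = TRUE)  # parse into an internal document
doc
xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x))))  # each book's text as one string
xpathSApply(doc, "//book", function(x) strsplit(xmlValue(x), " "))            # that string split on spaces
xpathSApply(doc, "//book/child::*", xmlValue)                                 # every field of every book, flattened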
None of these calls gets me even close to what I want. How should I proceed toward a well-formed data.frame?
Ordinarily, I would suggest trying the xmlToDataFrame() function, but I believe that will be fairly tricky here because the data isn’t regularly structured to begin with. I would recommend working with this function:
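One function that fits this description is xmlToList() from the XML package; the sketch below assumes that is the one meant. It parses a hypothetical, trimmed inline copy of the books.xml layout (two books, so it runs without network access) and shows that each <book> becomes one list element, with repeated <author> tags appearing as repeated names.

```r
library(XML)

# Hypothetical two-book excerpt in the same layout as books.xml,
# inlined so the example runs without network access
xml_text <- '<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>'

doc <- xmlParse(xml_text, asText = TRUE)

# Convert the whole tree to nested R lists: one element per <book>
book_list <- xmlToList(doc)

names(book_list)        # one "book" entry per <book> element
names(book_list[[2]])   # "author" appears twice for the second book
```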
One problem is that some books have more than one author, so you will need to decide how to handle that when structuring your data frame (for example, collapse the names into a single column, or give each author its own column).
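As one possible way to handle it, the sketch below collapses each book’s authors into a single comma-separated string, using an XPath expression evaluated relative to each <book> node. The inline XML and the ", " separator are assumptions for illustration.

```r
library(XML)

# Hypothetical two-book excerpt in the books.xml layout
xml_text <- '<bookstore>
  <book><title>Everyday Italian</title><author>Giada De Laurentiis</author></book>
  <book><title>XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author></book>
</bookstore>'

doc <- xmlParse(xml_text, asText = TRUE)

# For each <book>, fetch its <author> children (path relative to the book)
# and paste them into one string, so every book yields exactly one value
authors <- xpathSApply(doc, "//book", function(b) {
  paste(xpathSApply(b, "./author", xmlValue), collapse = ", ")
})
authors
```

One column per author (author1, author2, ...) is the other common choice; it keeps the names separate but leaves NA gaps for books with fewer authors.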
Once you have decided how to handle multiple authors, it’s fairly straightforward to turn your book list into a data frame with the ldply() function in plyr (or just use lapply and combine the results into a data.frame with do.call("rbind", ...)). Here’s a complete example (excluding author):
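A minimal base-R sketch of the lapply + do.call("rbind", ...) route, leaving authors out. It assumes the xmlToList() structure for the books.xml layout: because each <title> carries a lang attribute, xmlToList() returns it as a list with text and .attrs entries. The txt() helper is hypothetical, added only to cope with that.

```r
library(XML)

# Hypothetical two-book excerpt in the books.xml layout
xml_text <- '<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author><author>Per Bothner</author>
    <year>2003</year><price>49.99</price>
  </book>
</bookstore>'

book_list <- xmlToList(xmlParse(xml_text, asText = TRUE))

# Hypothetical helper: an element with attributes comes back from
# xmlToList() as list(text = ..., .attrs = ...); plain elements as strings
txt <- function(x) if (is.list(x)) x$text else x

# One single-row data frame per book, authors deliberately left out
rows <- lapply(book_list, function(b) {
  data.frame(title = txt(b$title),
             year  = as.integer(b$year),
             price = as.numeric(b$price),
             stringsAsFactors = FALSE)
})

# Every row has the same columns, so a plain rbind works
books_df <- do.call("rbind", rows)
rownames(books_df) <- NULL
books_df
```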
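Here’s what it looks like with author included. You need to use ldply in this instance because the list is “jagged” (books have different numbers of authors), so the per-book data frames end up with different columns, and a plain do.call("rbind", ...) over the lapply results can’t handle that. [Otherwise you can use lapply with rbind.fill (also courtesy of Hadley), but why bother when plyr automatically does it for you?]: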
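If plyr isn’t available, the jagged case can also be sketched in base R. The sketch below spreads authors into author1, author2, ... columns (one possible layout, assumed here), so the per-book rows have different columns. It then pads each row with NA for the columns it lacks, which is roughly what plyr’s rbind.fill does, before a plain do.call("rbind", ...). The txt() helper and the column names are hypothetical.

```r
library(XML)

# Hypothetical two-book excerpt in the books.xml layout
xml_text <- '<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author><author>Per Bothner</author>
    <year>2003</year>
  </book>
</bookstore>'

book_list <- xmlToList(xmlParse(xml_text, asText = TRUE))

# Hypothetical helper: attributed elements come back as list(text, .attrs)
txt <- function(x) if (is.list(x)) x$text else x

# One row per book; authors become author1, author2, ... so rows differ in width
rows <- lapply(book_list, function(b) {
  authors <- unlist(b[names(b) == "author"], use.names = FALSE)
  row <- data.frame(title = txt(b$title), year = as.integer(b$year),
                    stringsAsFactors = FALSE)
  for (i in seq_along(authors)) row[[paste0("author", i)]] <- authors[i]
  row
})

# Base-R stand-in for rbind.fill: add NA columns each row lacks,
# put the columns in a common order, then rbind as usual
all_cols <- unique(unlist(lapply(rows, names)))
rows <- lapply(rows, function(r) {
  for (col in setdiff(all_cols, names(r))) r[[col]] <- NA
  r[all_cols]
})

books_df <- do.call("rbind", rows)
rownames(books_df) <- NULL
books_df
```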