I have a single file that contains multiple concatenated XML files like so:
<?xml version ... ?>
<!DOCTYPE ... >
...
<?xml version ... ?>
<!DOCTYPE ... >
...
<?xml version ... ?>
<!DOCTYPE ... >
...
Is there any way to parse the file as is, using Nokogiri, as opposed to slicing the file up?
You need to slice it into individual documents, but that is an easy thing to do.
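Ruby's `String.split` method makes it easy. For instance, if the variable `foo` contains the text, then `foo.split("<?xml version ... ?>\n")` will return an array you can loop over. If the text starts with the declaration, the first element of that array will be an empty string, so skip it.
Parse each of those chunks and you'll be on your way. Because `split` drops the delimiter, you might need to prepend the XML declaration to each chunk to make Nokogiri happy, but I think it'll do OK without it.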
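As a rough sketch of the idea, here is a variation that splits on a zero-width lookahead instead of the literal declaration string, so each chunk keeps its own `<?xml ... ?>` line and nothing has to be prepended afterwards. The helper name `split_xml_documents` and the sample text are made up for illustration, and it assumes `<?xml` only ever appears at the start of a document (not inside a comment or CDATA section).

```ruby
# Split text holding several concatenated XML documents into one string per
# document. The zero-width lookahead splits *before* each "<?xml", so the
# declaration stays attached to its chunk.
# Assumption: "<?xml" occurs only at the start of a document, never inside
# a comment or CDATA section.
def split_xml_documents(text)
  text.split(/(?=<\?xml\s)/).reject { |chunk| chunk.strip.empty? }
end

# Made-up sample input, for illustration only.
sample = <<~XML
  <?xml version="1.0"?>
  <a>first</a>
  <?xml version="1.0"?>
  <b>second</b>
XML

chunks = split_xml_documents(sample)

# Show the first line of each chunk to confirm the declaration was kept.
chunks.each { |chunk| puts chunk.lines.first }
```

Each element of `chunks` can then be handed to Nokogiri on its own, for example `docs = chunks.map { |chunk| Nokogiri::XML(chunk) }` after `require 'nokogiri'`.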