Editorial Team
Asked: June 2, 2026

My problem is virtually identical to this previous StackOverflow question, but I’m re-asking it because I don’t like the accepted answer.

I’ve got a file of concatenated XML documents:

<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>
...
<?xml version="1.0" encoding="UTF-8"?>
<someData>...</someData>

I’d like to parse out each one.

As far as I can tell, I can’t use scala.xml.XML, since it assumes one document per file or string.
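To see the one-document-per-input limitation concretely, here is a minimal sketch using the JDK’s built-in SAX parser (the same kind of SAXParser that scala.xml.XML.load uses). It feeds two concatenated documents to a single parse call and reports where the parser gives up. OneDocPerInput and tryParse are illustrative names, not part of any library.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object OneDocPerInput {
  // Two complete documents glued together, as in the file described above
  val concatenated: String =
    """<?xml version="1.0" encoding="UTF-8"?>
      |<someData>first</someData>
      |<?xml version="1.0" encoding="UTF-8"?>
      |<someData>second</someData>""".stripMargin

  // Parse the whole string as ONE document; return the parse error, if any
  def tryParse(xml: String): Option[SAXParseException] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException => Some(e)
    }
  }

  def main(args: Array[String]): Unit =
    tryParse(concatenated) match {
      case Some(e) => println(s"rejected at line ${e.getLineNumber}: ${e.getMessage}")
      case None    => println("parsed as a single document")
    }
}
```

The first document is read fine; the failure is reported on line 3, where the second <?xml declaration appears after the first root element has closed.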

Is there a subclass of Parser I can use to parse XML documents from an input source? Then I could just write something like many1 xmldoc.
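For comparison, here is roughly what the many1 xmldoc idea could look like with a tiny hand-rolled parser type. This is not the scala-parser-combinators API: Parser, many1 and xmldoc are hypothetical names, and xmldoc naively treats everything up to the next <?xml as one document (so it assumes that string never appears inside a comment or CDATA section), then validates each piece with the JDK’s DOM parser.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

object Many1Sketch {
  // Hypothetical parser type: consume a prefix of the input, or fail with None
  type Parser[A] = String => Option[(A, String)]

  // Apply p repeatedly; succeed only if it matched at least once
  def many1[A](p: Parser[A]): Parser[List[A]] = in =>
    p(in).map { case (first, rest0) =>
      val out = List.newBuilder[A]
      out += first
      var rest = rest0
      var next = p(rest)
      while (next.isDefined) {
        val (a, r) = next.get
        out += a
        rest = r
        next = p(rest)
      }
      (out.result(), rest)
    }

  private val Decl = "<?xml"

  // Naive: a document runs from "<?xml" up to (not including) the next "<?xml"
  val xmldoc: Parser[Document] = in => {
    val s = in.dropWhile(_.isWhitespace)
    if (!s.startsWith(Decl)) None
    else {
      val end = s.indexOf(Decl, Decl.length) match { case -1 => s.length; case i => i }
      val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      builder.setErrorHandler(new DefaultHandler) // throw on fatal errors, no stderr output
      Try(builder.parse(new InputSource(new StringReader(s.substring(0, end)))))
        .toOption
        .map(doc => (doc, s.substring(end)))
    }
  }
}
```

Usage would be Many1Sketch.many1(Many1Sketch.xmldoc)(fileContents), which returns the parsed documents plus any unconsumed input. The string split is exactly the kind of shortcut the question wants to avoid, which is why a real parser-level solution needs help from the XML parser itself.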

1 Answer

    Editorial Team
    Added an answer on June 2, 2026 at 8:05 am

    OK, I came up with an answer I’m happier with.

    Basically, I parse the XML with a SAXParser, just as scala.xml.XML.load does, but watch for a SAXParseException indicating that the parser encountered a <?xml declaration in the wrong place.
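Not every such exception means a new document has started; the parser must also have finished exactly one root element. The code in this answer checks that through the adapter’s hStack. A rough equivalent with a plain SAX DefaultHandler is to count open elements: if the exception fires when the depth is back to zero and a root element has already closed, the first document was complete. This is a minimal sketch; DepthTracker and completedRootBefore are hypothetical names, and unlike the answer it does not inspect the exception message.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Tracks how deeply nested the parser currently is
class DepthTracker extends DefaultHandler {
  var depth = 0
  var rootClosed = false
  override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
    depth += 1
  override def endElement(uri: String, localName: String, qName: String): Unit = {
    depth -= 1
    if (depth == 0) rootClosed = true
  }
}

object CompletedRoot {
  // True if parsing failed only after one complete root element was read
  def completedRootBefore(xml: String): Boolean = {
    val handler = new DepthTracker
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      parser.parse(new InputSource(new StringReader(xml)), handler)
      false // no error at all: nothing was cut off
    } catch {
      case _: SAXParseException => handler.rootClosed && handler.depth == 0
    }
  }
}
```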

    Then I grab the root element that has already been parsed, rewind the input to the start of the offending <?xml declaration, and restart the parse from there.
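For input that already fits in memory, the grab, rewind and restart idea can be sketched without a custom Reader. This is a swapped-in simplification, not the answer’s code: it uses the JDK’s DOM parser instead of scala.xml, converts the error’s line and column into a character offset (assuming \n line endings), backs up to the nearest preceding <?xml, and re-parses the text before that point as one complete document. ConcatenatedDom, parseAll and offsetOf are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object ConcatenatedDom {
  // 1-based (line, column) -> 0-based character offset; assumes '\n' line endings
  def offsetOf(text: String, line: Int, col: Int): Int = {
    var start = 0
    for (_ <- 1 until line) start = text.indexOf('\n', start) + 1
    start + math.max(col - 1, 0)
  }

  def parseAll(text: String): List[Document] = {
    val factory = DocumentBuilderFactory.newInstance()
    def parse(s: String): Document = {
      val builder = factory.newDocumentBuilder()
      builder.setErrorHandler(new DefaultHandler) // rethrow fatal errors quietly
      builder.parse(new InputSource(new StringReader(s)))
    }

    val docs = List.newBuilder[Document]
    var rest = text
    var done = false
    while (!done) {
      // Offset where the next document starts, if the parser tripped over one
      val cut: Option[Int] =
        try { docs += parse(rest); None }
        catch {
          case e: SAXParseException =>
            val at = offsetOf(rest, e.getLineNumber, e.getColumnNumber)
            val i = rest.lastIndexOf("<?xml", at)
            if (i <= 0) throw e // a real error inside the current document
            Some(i)
        }
      cut match {
        case None => done = true
        case Some(i) =>
          docs += parse(rest.substring(0, i)) // throws if this doc was incomplete
          rest = rest.substring(i)
      }
    }
    docs.result()
  }
}
```

Searching backwards for "<?xml" from the error position avoids depending on exactly where the parser reports the column, at the cost of parsing each document twice. The streaming Relocator approach avoids holding the whole input in memory.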

    // Parses a stream of concatenated XML documents, recovering from the
    // SAXParseException raised at each new <?xml ...?> declaration.
    object ConcatenatedXML {
      // A reader that can be rolled back to the location of an exception.
      // The wrapped reader must support mark/reset (e.g. BufferedReader, StringReader).
      class Relocator(val re: java.io.Reader) extends java.io.Reader {
        var marked = 0                               // chars in the most recent chunk
        var firstLine: Int = 1                       // line number of lineStarts(0)
        var lineStarts: IndexedSeq[Int] = Vector(0)  // line start offsets within the chunk

        override def read(arr: Array[Char], off: Int, len: Int): Int = {
          // forget everything but the start of the last line in the
          // previously marked area (re-expressed relative to the new chunk)
          val pos = lineStarts(lineStarts.length - 1) - marked
          firstLine += lineStarts.length - 1

          // mark the stream, then read the next chunk of data into the given array
          re.mark(len)
          val n = re.read(arr, off, len)
          marked = math.max(n, 0)                    // -1 (end of stream) counts as 0 chars

          // find the line starts for the lines in the array
          lineStarts = pos +: (for (i <- 0 until marked if arr(i + off) == '\n') yield i + 1)

          n
        }
        override def close(): Unit = re.close()
        override def markSupported(): Boolean = false

        // rewind to the last mark, then skip forward to (line, col) adjusted by off
        def relocate(line: Int, col: Int, off: Int): Unit = {
          re.reset()
          val skip = lineStarts(line - firstLine) + col + off
          re.skip(skip)
          marked = 0
          firstLine = 1
          lineStarts = Vector(0)
        }
      }
    
      def parse(str: String): List[scala.xml.Node] = parse(new java.io.StringReader(str))
      // re must support mark/reset, e.g. a BufferedReader
      def parse(re: java.io.Reader): List[scala.xml.Node] = parse(new Relocator(re))

      // parse all the concatenated XML docs out of a file
      // (relies on FactoryAdapter internals from older scala-xml versions,
      // where hStack and scopeStack are mutable stacks; mirrors XML.load)
      def parse(src: Relocator): List[scala.xml.Node] = {
        val parser = javax.xml.parsers.SAXParserFactory.newInstance().newSAXParser()
        val adapter = new scala.xml.parsing.NoBindingFactoryAdapter

        adapter.scopeStack.push(scala.xml.TopScope)
        try {
          // parse this, assuming it's the last XML doc in the input
          parser.parse(new org.xml.sax.InputSource(src), adapter)
          adapter.scopeStack.pop()
          adapter.rootElem.asInstanceOf[scala.xml.Node] :: Nil
        } catch {
          case e: org.xml.sax.SAXParseException =>
            // we found the start of another xmldoc only if the parser rejected
            // a <?xml ...?> right after exactly one complete root element
            // (the message text is parser- and locale-specific)
            if (e.getMessage != """The processing instruction target matching "[xX][mM][lL]" is not allowed."""
                || adapter.hStack.length != 1 || adapter.hStack(0) == null) {
              throw e
            }

            // tell the adapter we reached the end of a document
            adapter.endDocument()

            // grab the current root node
            adapter.scopeStack.pop()
            val node = adapter.rootElem.asInstanceOf[scala.xml.Node]

            // reset to the start of the next doc: -6 backs up from the reported
            // error position to the '<' of the offending "<?xml"
            src.relocate(e.getLineNumber, e.getColumnNumber, -6)

            // and parse the next doc (not tail-recursive: one stack frame per doc)
            node :: parse(src)
        }
      }
    }
    
    println(ConcatenatedXML.parse(new java.io.BufferedReader(
      new java.io.FileReader("temp.xml")
    )))
    println(ConcatenatedXML.parse(
      """|<?xml version="1.0" encoding="UTF-8"?>
         |<firstDoc><inner><innerer><innermost></innermost></innerer></inner></firstDoc>
         |<?xml version="1.0" encoding="UTF-8"?>
         |<secondDoc></secondDoc>
         |<?xml version="1.0" encoding="UTF-8"?>
         |<thirdDoc>...</thirdDoc>
         |<?xml version="1.0" encoding="UTF-8"?>
         |<lastDoc>...</lastDoc>""".stripMargin
    ))
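    // A doc interrupted by another <?xml ...?> must be rejected. Check the
    // result outside the try, so "That should have failed" is not swallowed
    // by the catch (the original catch-all case caught it too).
    val rejected =
      try {
        ConcatenatedXML.parse(
          """|<?xml version="1.0" encoding="UTF-8"?>
             |<firstDoc>
             |<?xml version="1.0" encoding="UTF-8"?>
             |</firstDoc>""".stripMargin
        )
        false
      } catch {
        case _: org.xml.sax.SAXParseException => true
      }
    if (!rejected) throw new Exception("That should have failed")
    println("catches really incomplete docs")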
    
© 2021 The Archive Base. All Rights Reserved