Asked by Editorial Team on May 24, 2026

I need to create an XML document from a piece of plain text and the begin and end offsets of each XML element that should be inserted. Here are a few test cases I’d like it to pass:

val text = "The dog chased the cat."
val spans = Seq(
    (0, 23, <xml/>),
    (4, 22, <phrase/>),
    (4, 7, <token/>))
val expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
assert(expected === spansToXML(text, spans))

val text = "aabbccdd"
val spans = Seq(
    (0, 8, <xml x="1"/>),
    (0, 4, <ab y="foo"/>),
    (4, 8, <cd z="42>3"/>))
val expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
assert(expected === spansToXML(text, spans))

val spans = Seq(
    (0, 1, <a/>),
    (0, 0, <b/>),
    (0, 0, <c/>),
    (1, 1, <d/>),
    (1, 1, <e/>))
assert(<a><b/><c/> <d/><e/></a> === spansToXML(" ", spans))

My partial solution (posted as a separate answer) works by string concatenation and XML.loadString. That seems hacky, and I'm also not 100% sure it handles all the corner cases correctly.

Any better solutions? (For what it’s worth, I’d be happy to switch to anti-xml if that would make this task easier.)

Updated 10 Aug 2011 to add more test cases and provide a cleaner specification.

1 Answer

    Editorial Team added an answer on May 24, 2026 at 1:33 pm

    Given the bounty you offered, I studied your problem for some time and came up with the following solution, which passes all your test cases.
    I would really like my answer to be accepted, so please tell me if there is anything wrong with my solution.

    Some comments:
    I left the commented-out print statements in the code, in case you want to follow what happens during execution.
    Going beyond your specification, I also preserve any existing children of the elements passed in; a comment marks where this is done.

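    I do not build the XML nodes manually; instead, I copy and modify the ones passed in. To avoid splitting opening and closing tags, I had to change the algorithm considerably, but the idea of sorting spans by begin and -end comes from your solution (I use the reverse of that order, so that every span is processed after the spans nested inside it).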
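
To illustrate the effect of reversing the order, here is a small sketch using the spans of the first test case, with labels as plain strings so that no scala-xml dependency is needed. It compares the `(begin, -end)` key with the begin-decreasing, end-increasing order used in the code below; the object name `SpanOrderDemo` is hypothetical.

```scala
object SpanOrderDemo {
  // Spans of the first test case; labels are plain strings instead of XML elements.
  val spans: Seq[(Int, Int, String)] = Seq((0, 23, "xml"), (4, 22, "phrase"), (4, 7, "token"))

  // Key (begin, -end): opening tags in document order, so outer elements come first.
  val documentOrder: Seq[String] =
    spans.sortBy { case (begin, end, _) => (begin, -end) }.map(_._3)

  // Begin decreasing, then end increasing, then label:
  // every span comes after all the spans nested inside it.
  val intOrdering: Ordering[Int] = implicitly[Ordering[Int]]
  val outwardOrder: Seq[String] =
    spans.sorted(Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.String)).map(_._3)
}
```

Here `documentOrder` is `List("xml", "phrase", "token")`, while `outwardOrder` is `List("token", "phrase", "xml")`: the token is built first, then wrapped by the phrase, and finally by the root element.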

    The code uses somewhat advanced Scala, especially where it builds the various Orderings it needs. I did simplify it somewhat compared to my first version.
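
As a sketch of what the composite interval ordering does, it can be written either with `Ordering.Tuple2` and `reverse`, as in the code below, or (an alternative not used in the answer) with `Ordering.by` on a key that negates the begin offset. The negation trick assumes non-negative `Int` offsets, so it cannot overflow; `IntervalOrderDemo` and `sample` are hypothetical names.

```scala
object IntervalOrderDemo {
  val intOrdering: Ordering[Int] = implicitly[Ordering[Int]]

  // As in the answer: begin decreasing, ties broken by end increasing.
  val intervalOrder: Ordering[(Int, Int)] = Ordering.Tuple2(intOrdering.reverse, intOrdering)

  // Alternative: the standard lexicographic tuple ordering on the key (-begin, end).
  // Assumes non-negative offsets, so negating begin cannot overflow.
  val viaKey: Ordering[(Int, Int)] = Ordering.by((p: (Int, Int)) => (-p._1, p._2))

  val sample: Seq[(Int, Int)] = Seq((0, 23), (4, 22), (4, 7), (0, 0))
}
```

Both orderings sort `sample` into `List((4, 7), (4, 22), (0, 0), (0, 23))`.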

    I avoided building a tree of the intervals by using a SortedMap and filtering the intervals after extracting them. This choice is somewhat suboptimal: there are better data structures for representing nested intervals, such as interval trees (studied in computational geometry), but they are quite complex to implement, and I don't think one is needed here.
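
The nested-interval lookup can be tried in isolation. The sketch below uses hypothetical processed intervals, with string labels standing in for XML nodes. `until` returns every key that precedes the query in the interval ordering, which also includes intervals lying entirely to the right of the query, so the containment filter is still required.

```scala
import scala.collection.immutable.SortedMap

object NestedIntervalsDemo {
  val intOrdering: Ordering[Int] = implicitly[Ordering[Int]]
  val intervalOrder: Ordering[(Int, Int)] = Ordering.Tuple2(intOrdering.reverse, intOrdering)

  // Hypothetical already-processed intervals; labels stand in for the XML nodes.
  val processed: SortedMap[(Int, Int), String] =
    SortedMap((0, 3) -> "a", (4, 7) -> "b", (12, 15) -> "c")(intervalOrder)

  // All keys strictly before (start, end) in intervalOrder: candidates for nesting.
  // (Scala 2.13+ deprecates `until` in favor of `rangeUntil`.)
  def candidates(start: Int, end: Int): Seq[(Int, Int)] =
    processed.until((start, end)).keys.toSeq

  // Only the candidates actually contained in (start, end).
  def nested(start: Int, end: Int): Seq[(Int, Int)] =
    candidates(start, end).filter { case (s, e) => start <= s && e <= end }
}
```

For the query `(0, 10)`, `candidates` returns `(12, 15)`, `(4, 7)` and `(0, 3)`, while `nested` keeps only `(4, 7)` and `(0, 3)`.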

    /**
     * User: pgiarrusso
     * Date: 12/8/2011
     */

    // scala.xml is a separate module (scala-xml) since Scala 2.11 and must be added as a dependency.
    import scala.collection.mutable.ArrayBuffer
    import scala.collection.immutable.SortedMap
    import scala.xml._

    object SpansToXmlTest {
        // Assumes that one span covers the whole text, i.e. (0, text.length); the node built for it is returned.
        def spansToXML(text: String, spans: Seq[(Int, Int, Elem)]): Node = {
            val intOrdering = implicitly[Ordering[Int]] // Retrieves the standard ordering on Ints.

            // Sort spans decreasingly on begin and increasingly on end and label: this processes spans outwards,
            // so every span is handled after all the spans nested inside it.
            // The sorting on labels matches the given examples.
            val spanOrder = Ordering.Tuple3(intOrdering.reverse, intOrdering, Ordering.by((_: Elem).label))

            // Same sorting, excluding labels.
            val intervalOrder = Ordering.Tuple2(intOrdering.reverse, intOrdering)
            // Maps intervals of the source string to the sequence of nodes that match them. It is a sequence
            // because multiple spans for the same interval are allowed.
            var intervalMap = SortedMap[(Int, Int), Seq[Node]]()(intervalOrder)

            for ((start, end, elem) <- spans.sorted(spanOrder)) {
                // Keep only the intervals nested in (start, end). Interval nesting is a partial order, therefore
                // it cannot be used as the ordering of intervalMap, even if that would be nice.
                // (Scala 2.13+ deprecates `until` in favor of `rangeUntil`.)
                val nestedIntervalsMap = intervalMap.until((start, end)).filter {
                    case ((intStart, intEnd), _) => start <= intStart && intEnd <= end
                }
                //println("intervalMap: " + intervalMap)
                //println("beforeMap: " + nestedIntervalsMap)

                // Here sorted uses the standard (lexicographic) ordering on pairs, i.e. left to right in the text.
                val before = nestedIntervalsMap.keys.toSeq.sorted

                // text.slice(start, end) must be split into fragments, some of which are represented by text nodes,
                // others by already computed XML nodes; the two kinds alternate.
                val intervals = start +: (for {
                    (intStart, intEnd) <- before
                    boundary <- Seq(intStart, intEnd)
                } yield boundary) :+ end

                val xmlChildren = ArrayBuffer[Node]()
                var useXmlNode = false

                for (interv <- intervals.sliding(2)) {
                    val intervStart = interv(0)
                    val intervEnd = interv(1)
                    xmlChildren ++= (
                        if (useXmlNode)
                            intervalMap((intervStart, intervEnd)) // Precomputed nodes
                        else
                            Seq(Text(text.slice(intervStart, intervEnd))))
                    useXmlNode = !useXmlNode // The next interval will be of the opposite kind.
                }
                // Remove the intervals we just processed.
                intervalMap = intervalMap -- before

                // Using elem.child preserves existing XML children; drop "elem.child ++" to discard them.
                val tree = elem.copy(child = elem.child ++ xmlChildren)
                intervalMap += ((start, end) -> (intervalMap.getOrElse((start, end), Seq.empty) :+ tree))
                //println(tree)
            }
            intervalMap((0, text.length)).head
        }

        def test(text: String, spans: Seq[(Int, Int, Elem)], expected: Node): Unit = {
            val res = spansToXML(text, spans)
            print("Text: \"%s\", expected:\n%s\nResult:\n%s\n\n".format(text, expected, res))
            assert(expected == res)
        }

        def test1(): Unit =
            test(
                text = "The dog chased the cat.",
                spans = Seq(
                    (0, 23, <xml/>),
                    (4, 22, <phrase/>),
                    (4, 7, <token/>)),
                expected = <xml>The <phrase><token>dog</token> chased the cat</phrase>.</xml>
            )

        def test2(): Unit =
            test(
                text = "aabbccdd",
                spans = Seq(
                    (0, 8, <xml x="1"/>),
                    (0, 4, <ab y="foo"/>),
                    (4, 8, <cd z="42>3"/>)),
                expected = <xml x="1"><ab y="foo">aabb</ab><cd z="42>3">ccdd</cd></xml>
            )

        def test3(): Unit =
            test(
                text = " ",
                spans = Seq(
                    (0, 1, <a/>),
                    (0, 0, <b/>),
                    (0, 0, <c/>),
                    (1, 1, <d/>),
                    (1, 1, <e/>)),
                expected = <a><b/><c/> <d/><e/></a>
            )

        def main(args: Array[String]): Unit = {
            test1()
            test2()
            test3()
        }
    }
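
The fragment-splitting step inside the loop (the `intervals` boundaries plus the alternating `useXmlNode` flag) can be restated as a small pure function. This is a sketch without scala-xml; `Fragment`, `TextFrag`, `ChildFrag` and `fragments` are hypothetical names, and it assumes the nested intervals are sorted left to right and do not overlap, as `before` is in the code above.

```scala
object FragmentDemo {
  // Hypothetical fragment type: either a slice of the text or a reference to a nested interval.
  sealed trait Fragment
  final case class TextFrag(s: String) extends Fragment
  final case class ChildFrag(start: Int, end: Int) extends Fragment

  // Splits text.slice(start, end) around the nested intervals. Even-indexed windows
  // between boundaries are text, odd-indexed ones are nested intervals, mirroring useXmlNode.
  def fragments(text: String, start: Int, end: Int, nested: Seq[(Int, Int)]): List[Fragment] = {
    val boundaries = (start +: nested.flatMap { case (s, e) => Seq(s, e) }) :+ end
    boundaries.sliding(2).zipWithIndex.map { case (w, i) =>
      if (i % 2 == 0) TextFrag(text.slice(w(0), w(1))) else ChildFrag(w(0), w(1))
    }.toList
  }
}
```

For example, `fragments("The dog chased the cat.", 0, 23, Seq((4, 22)))` yields `List(TextFrag("The "), ChildFrag(4, 22), TextFrag("."))`. Empty intervals such as `(0, 0)` simply produce empty text fragments around them, which is why the third test case needs no special handling.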
    
Related Questions

I need to manipulate an existing XML document, and create a new one from
I need to create an XML schema that looks something like this: <xs:element name=wrapperElement>
I need to create an XML document for a website service that shows products
I have an XML document that needs to pass text inside an element with
I need to create XML in Perl. From what I read, XML::LibXML is great
I need to create an XML schema that validates a tree structure of an
I need to create an XML schema definition (XSD) that describes Java objects. I
I need to create program that creates a XML schema like below using System.Xml.XmlSchema
I have an XML document generated from an external application, but that application does
Now I'm working on a web project. I need to create a XML document
