The Archive Base

Editorial Team
Asked: May 14, 2026

Here’s a simplified version of a problem I’m working on: I have a bunch of XML data that encodes information about people. Each person is uniquely identified by an ‘id’ attribute, but they may go by many names. For example, in one document, I might find

<person id="1">Paul McCartney</person>
<person id="2">Ringo Starr</person>

And in another I might find:

<person id="1">Sir Paul McCartney</person>
<person id="2">Richard Starkey</person>

I want to use XQuery to produce a new document that lists every name associated with a given id, i.e.:

<person id="1">
    <name>Paul McCartney</name>
    <name>Sir Paul McCartney</name>
    <name>James Paul McCartney</name>
</person>
<person id="2">
    ...
</person>

The way I’m doing this now in XQuery is something like this (pseudocode-esque):

(: $people stands for every person element across all input documents :)
declare variable $people as element(person)* external;

let $ids := distinct-values($people/@id)
for $id in $ids
return
    <person id="{$id}">{
        (: this inner query rescans $people once per id :)
        for $unique-name in distinct-values(
            for $name in $people
            where $name/@id = $id
            return $name
        )
        return <name>{$unique-name}</name>
    }</person>

The problem is that this is really slow. I imagine the bottleneck is the innermost loop, which executes once for every id (of which there are about 1200). I’m dealing with a fair bit of data (300 MB, spread over about 800 XML files), so even a single execution of the query in the inner loop takes about 12 seconds, which means that repeating it 1200 times will take about 4 hours (which might be optimistic: the process has been running for 3 hours so far). Not only is it slow, it’s using a whole lot of virtual memory. I’m using Saxon, and I had to set Java’s maximum heap size to 10 GB (!) to avoid out-of-memory errors, and it’s currently using 6 GB of physical memory.

So here’s how I’d really like to do this (in Pythonic pseudocode):

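persons = {}
# start every id with an empty set of names
for id in ids:
    persons[id] = set()
# one sweep over the data: file each name under its id
for person in all_the_people_in_my_xml_document:
    persons[person.id].add(person.name)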

There, I just did it in linear time, with only one sweep of the XML document. Now, is there some way to do something similar in XQuery? Surely if I can imagine it, a reasonable programming language should be able to do it (he said quixotically). The problem, I suppose, is that unlike Python, XQuery doesn’t (as far as I know) have anything like an associative array.

Is there some clever way around this? Failing that, is there something better than XQuery that I might use to accomplish my goal? Because really, the computational resources I’m throwing at this relatively simple problem are kind of ridiculous.
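
For reference, the Pythonic pseudocode above can be made concrete with the standard library. This is only a minimal sketch under stated assumptions: each document is assumed to have a single wrapper root element (here called people, a made-up name), names are assumed to be the text content of each person element, and the function names group_names_by_id and build_output are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_names_by_id(documents):
    """Collect the distinct names seen for each person id in a single pass."""
    # An inner dict is used as an insertion-ordered set of names.
    names_by_id = defaultdict(dict)
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        # iter() walks the whole tree, so nesting depth does not matter.
        for person in root.iter("person"):
            person_id = person.get("id")
            name = (person.text or "").strip()
            if person_id is not None and name:
                names_by_id[person_id][name] = None
    return {pid: list(names) for pid, names in names_by_id.items()}


def build_output(names_by_id):
    """Build one <person id="..."> element per id, with one <name> per name."""
    root = ET.Element("people")  # assumed wrapper element
    for person_id, names in names_by_id.items():
        person = ET.SubElement(root, "person", id=person_id)
        for name in names:
            ET.SubElement(person, "name").text = name
    return root
```

For files on disk, ET.parse(path).getroot() could stand in for ET.fromstring. Whether that is fast enough for 800 files is not something this sketch establishes.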

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 8:21 pm

    Unfortunately, this is a shortcoming of XQuery 1.0.

    XQuery 1.1 (later published as XQuery 3.0) adds the group by clause to the syntax to resolve this problem, and your problem would be solved with:

    for $person in /person
    let $id := $person/@id
    group by $id
    return  <person id="{$id}">{
              for $name in distinct-values($person)
              return <name>{$name}</name>
            }</person>
    

    Unfortunately XQuery 1.1 is not widely implemented, so for the moment you are stuck without the group by clause.

    As a developer on XQSharp I cannot speak for any other implementations, but we have spent a lot of time tweaking our optimizer to spot common group-by patterns in XQuery 1.1 and perform them with the algorithm you have specified.

    In particular, the following version of your query:

    declare variable $people as element(person, xs:untyped)* external;
    
    for $id in distinct-values($people/@id)
    return <person id="{$id}">{
              for $person in $people
              where $person/@id = $id
              return <name>{$person}</name>
           }</person>
    

    is spotted as a group-by, as is evidenced by the following query plan:

    library http://www.w3.org/2005/xpath-functions external;
    library http://www.w3.org/2001/XMLSchema external;
    declare variable $people external;
    
    for $distinct-person in $people
    let $id := http://www.w3.org/2005/xpath-functions:data($distinct-person/attribute::id)
    group by
      $id
    aggregate
      element {name} { fs:item-sequence-to-node-sequence($distinct-person) }
    as
      $:temp:19
    return
      element {person} { (attribute {id} { $id } , fs:item-sequence-to-node-sequence($:temp:19)) }
    

    Note that the type annotation as element(person, xs:untyped)* is required: without knowing that the nodes are untyped (not validated against a schema), the query processor has no way of knowing that $person/@id doesn’t have multiple items in its data value. XQSharp does not yet support group by expressions where each node can have more than one key. However, in this case a left outer join is still spotted, so the complexity should be roughly n log n and not quadratic as you are experiencing.
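
To see where an n log n bound can come from, here is a small Python sketch of sort-based grouping: sort the (id, name) pairs once, then collect adjacent runs in one linear scan. This is an illustration only and is not a description of XQSharp's actual join algorithm; the function name group_by_sorting is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_by_sorting(pairs):
    """Group (id, name) pairs by id using one sort plus one linear scan."""
    # sorted() is stable, so names keep their input order within each id.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        person_id: [name for _, name in group]
        for person_id, group in groupby(ordered, key=itemgetter(0))
    }
```

The sort dominates the cost, whereas rescanning every person once per id multiplies the two sizes together.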

    Unfortunately, though, adding distinct-values around the set of people in the group (to filter out duplicate names) seems to stop XQSharp from finding the join; this has been filed as a bug. For now this could be solved by doing the query in two passes: first grouping the names by id, then removing duplicate names.
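
The shape of that two-pass workaround can be sketched in Python (not XQuery, and not XQSharp-specific). The first pass groups every name under its id with duplicates left in, and the second pass removes repeats within each group. Both function names are hypothetical.

```python
from collections import defaultdict


def group_names(pairs):
    """Pass 1: group every name under its id, duplicates included."""
    grouped = defaultdict(list)
    for person_id, name in pairs:
        grouped[person_id].append(name)
    return dict(grouped)


def drop_duplicate_names(grouped):
    """Pass 2: remove repeated names in each group, keeping first-seen order."""
    # dict.fromkeys keeps one entry per name and preserves insertion order.
    return {
        person_id: list(dict.fromkeys(names))
        for person_id, names in grouped.items()
    }
```

Keeping the passes separate mirrors the suggestion above: the grouping step stays in a form the optimizer recognises, and the de-duplication runs over each small group afterwards.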

    In summary, there is not a better approach in XQuery 1.0, but some implementations (e.g. XQSharp) will be able to evaluate this efficiently. If in doubt, check the query plan.

    For a more detailed look at the join optimizations performed by XQSharp, take a look at this blog post.

© 2021 The Archive Base. All Rights Reserved