Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 98261
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 11, 20262026-05-11T00:09:43+00:00 2026-05-11T00:09:43+00:00

Is there any way to select a subset from a large set based on

  • 0

Is there any way to select a subset from a large set based on a property or predicate in less than O(n) time?

For a simple example, say I have a large set of authors. Each author has a one-to-many relationship with a set of books, and a one-to-one relationship with a city of birth.

Is there a way to efficiently do a query like ‘get all books by authors who were born in Chicago’? The only way I can think of is to first select all authors from the city (fast with a good index), then iterate through them and accumulate all their books (O(n) where n is the number of authors from Chicago).

I know databases do something like this in certain joins, and Endeca claims to be able to do this ‘fast’ using what they call ‘Record Relationship Navigation’, but I haven’t been able to find anything about the actual algorithms used or even their computational complexity.

I’m not particularly concerned with the exact data structure… I’d be jazzed to learn about how to do this in a RDBMS, or a key/value repository, or just about anything.

Also, what about third or fourth degree requests of this nature? (Get me all the books written by authors living in cities with immigrant populations greater than 10,000…) Is there a generalized n-degree algorithm, and what is its performance characteristics?

Edit:

I am probably just really dense, but I don’t see how the inverted index suggestion helps. For example, say I had the following data:

DATA 1.  Milton        England 2.  Shakespeare   England 3.  Twain         USA  4.  Milton        Paridise Lost 5.  Shakespeare   Hamlet 6.  Shakespeare   Othello 7.  Twain         Tom Sawyer 8.  Twain         Huck Finn  INDEX 'Milton'         (1, 4) 'Shakespeare'    (2, 5, 6) 'Twain'          (3, 7, 8) 'Paridise Lost'  (4) 'Hamlet'         (5) 'Othello'        (6) 'Tom Sawyer'     (7) 'Huck Finn'      (8) 'England'        (1, 2) 'USA'            (3) 

Say I did my query on ‘books by authors from England’. Very quickly, in O(1) time via a hashtable, I could get my list of authors from England: (1, 2). But then, for the next step, in order retrieve the books, I’d have to, for EACH of the set {1, 2}, do ANOTHER O(1) lookup: 1 -> {4}, 2 -> {5, 6} then do a union of the results {4, 5, 6}.

Or am I missing something? Perhaps you meant I should explicitly store an index entry linking Book to Country. That works for very small data sets. But for a large data set, the number of indexes required to match any possible combination of queries would make the index grow exponentially.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. 2026-05-11T00:09:44+00:00Added an answer on May 11, 2026 at 12:09 am

    For joins like this on large data sets, a modern RDBMS will often use an algorithm called a list merge. Using your example:

    1. Prepare a list, A, of all authors who live in Chicago and sort them by author in O(Nlog(N)) time.*
    2. Prepare a list, B, of all (author, book name) pairs and sort them by author in O(Mlog(M)) time.*
    3. Place these two lists ‘side by side’, and compare the authors from the ‘top’ (lexicographically minimum) element in each pile.
      • Are they the same? If so:
        • Output the (author, book name) pair from top(B)
        • Remove the top element of the B pile
        • Goto 3.
      • Otherwise, is top(A).author < top(B).author? If so:
        • Remove the top element of the A pile
        • Goto 3.
      • Otherwise, it must be that top(A).author > top(B).author:
        • Remove the top element of the B pile
        • Goto 3.

    * (Or O(0) time if the table is already sorted by author, or has an index which is.)

    The loop continues removing one item at a time until both piles are empty, thus taking O(N + M) steps, where N and M are the sizes of piles A and B respectively. Because the two ‘piles’ are sorted by author, this algorithm will discover every matching pair. It does not require an index (although the presence of indexes may remove the need for one or both sort operations at the start).

    Note that the RDBMS may well choose a different algorithm (e.g. the simple one you mentioned) if it estimates that it would be faster to do so. The RDBMS’s query analyser generally estimates the costs in terms of disk accesses and CPU time for many thousands of different approaches, possibly taking into account such information as the statistical distributions of values in the relevant tables, and selects the best.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 83k
  • Answers 83k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • Editorial Team
    Editorial Team added an answer I solve it myself. The problem arises when you compile… May 11, 2026 at 4:46 pm
  • Editorial Team
    Editorial Team added an answer Making a request to the server every 0.5 seconds is… May 11, 2026 at 4:46 pm
  • Editorial Team
    Editorial Team added an answer The PHP shmop functions are just wrappers for the underlying… May 11, 2026 at 4:46 pm

Related Questions

I am looking for a way to quickly compare the state of a database
I want to ask how other programmers are producing Dynamic SQL strings for execution
Has anyone tried to dynamically select which properties they want to write to an
I am asking this out of a curiosity. Basically my question is when you

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.