The Archive Base

Editorial Team
Asked: June 6, 2026 (2026-06-06T11:38:13+00:00)

I’m implementing a service where each user must have his own JSON document database. Besides letting the user query JSON documents by example, the database must also support ACID transactions involving multiple documents, so I have ruled out Couch/Mongo and other NoSQL databases (I can’t use RavenDB since the service must run on Unix systems).

With that in mind, I’ve been trying to figure out a way to implement this on top of a SQL database. Here’s what I have come up with so far:

CREATE TABLE documents (
  id INTEGER PRIMARY KEY,
  doc TEXT
);

CREATE TABLE indexes (
  id INTEGER PRIMARY KEY,
  property TEXT,
  value TEXT,
  document_id INTEGER
);

Each user would have a database with these two tables, and would have to declare which fields he needs to query so that the system can populate the ‘indexes’ table. So if user ‘A’ configures his account to enable queries by ‘name’ and ‘age’, every time that user inserts a document that has a ‘name’ or ‘age’ property, the system would also insert a record into the ‘indexes’ table, where the ‘property’ column would contain name/age, ‘value’ would contain the property value, and ‘document_id’ would point to the corresponding document.

For example, let’s say the user inserts the following doc:

'{"name": "Foo", "age": 43}'

This would result in an insert into the ‘documents’ table and two more inserts into the ‘indexes’ table:

INSERT INTO documents (id, doc) VALUES (1, '{"name": "Foo", "age": 43}');
INSERT INTO indexes (property, value, document_id) VALUES ('name', 'Foo', 1);
INSERT INTO indexes (property, value, document_id) VALUES ('age', '43', 1);
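
The insert path described above could be sketched roughly as follows. This is only a sketch under assumptions: it uses SQLite through Python's standard sqlite3 module as a stand-in for whatever SQL database is chosen, and INDEXED_PROPERTIES, open_user_db and insert_document are hypothetical names, not part of any existing system. The point it illustrates is that the document row and its index rows are written in a single transaction.

```python
import json
import sqlite3

# Hypothetical per-user configuration: the properties this user asked to query by.
INDEXED_PROPERTIES = ("name", "age")

def open_user_db(path=":memory:"):
    """Create the two tables from the post in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE documents (id INTEGER PRIMARY KEY, doc TEXT);
        CREATE TABLE indexes (
            id INTEGER PRIMARY KEY,
            property TEXT,
            value TEXT,
            document_id INTEGER
        );
    """)
    return conn

def insert_document(conn, doc):
    """Store one document and its index rows atomically."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "INSERT INTO documents (doc) VALUES (?)", (json.dumps(doc),))
        doc_id = cur.lastrowid
        # Only the declared properties that are present get an index row.
        rows = [(p, str(doc[p]), doc_id) for p in INDEXED_PROPERTIES if p in doc]
        conn.executemany(
            "INSERT INTO indexes (property, value, document_id) VALUES (?, ?, ?)",
            rows)
    return doc_id

conn = open_user_db()
doc_id = insert_document(conn, {"name": "Foo", "age": 43})
index_rows = conn.execute(
    "SELECT property, value, document_id FROM indexes ORDER BY id").fetchall()
print(index_rows)
```

Values are stored as text here, matching the TEXT column in the schema, so numeric comparisons such as age > 40 would need extra handling.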

Then let’s say that user ‘A’ sent the service the following query:

'{"name": "Foo", "age": 43}' //(the queries are also json documents).

This query would be translated to the following SQL:

SELECT doc FROM documents
WHERE id IN (SELECT document_id FROM indexes
             WHERE document_id IN (SELECT document_id FROM indexes
                                   WHERE property = 'name' AND value = 'Foo')
             AND property = 'age' AND value = '43') 

My questions:

  • Knowing that the user may use a large number of conditions in his queries (let’s say 20-30 AND conditions), which would make the subquery nesting very deep, how efficient would the above SELECT query be on most database systems (postgres, mysql…)?
  • Is the above solution viable for a database that will eventually contain millions/billions of JSON documents?
  • Is there a better way to meet my requirements?
  • Is there a scalable document database that can do ACID transactions involving multiple documents and runs on Unix systems?
1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 11:38 am (2026-06-06T11:38:15+00:00)

    Your indexes table is what is known as an Entity-Attribute-Value (EAV) table.

    EAV tables are fine for storing information and recalling it when you know the entity. (In your case, finding all the indexes rows when you know the document_id.)

    But they are terrible the other way around: Supplying Attribute-Value combinations to search for an Entity. Which is exactly what you have in your final query. As more and more entities share the same attribute-value combinations (such as name=foo) the query performance degrades.

    So, to answer your first two questions:
    1. The query, as written, requires n sub-queries when searching for n properties. This will scale very poorly as n grows.
    2. Performance will degrade as the number of records grows, especially with millions/billions of records.

    In general, if you read about EAV, people strongly recommend shying away from it.

    And, worse still, there isn’t really a good alternative within SQL. The standard way to optimise a search is with an index, which can easily be modelled as a sorted data-set. But you would then need many indexes:
    – An index on (fieldX, fieldY, fieldZ) is great if you search on all three columns.
    – But it sucks if you have to search on just fieldZ.
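
A rough way to see the composite-index point above is to ask the database for its query plan. The sketch below assumes SQLite through Python's sqlite3 module; the table name records, the index name idx_xyz and the extra notes column are made up for illustration. Other engines report plans differently, and some can still make partial use of such an index, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical fixed-column table with one composite index.
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        fieldX TEXT, fieldY TEXT, fieldZ TEXT, notes TEXT
    );
    CREATE INDEX idx_xyz ON records (fieldX, fieldY, fieldZ);
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Searching on all three columns can seek directly into the index.
all_three = plan(
    "SELECT * FROM records "
    "WHERE fieldX = 'a' AND fieldY = 'b' AND fieldZ = 'c'")

# Searching only on the last column cannot seek, because the index is
# sorted by fieldX first.
only_z = plan("SELECT * FROM records WHERE fieldZ = 'c'")

print(all_three)
print(only_z)
```

The first plan should mention idx_xyz, while the second should fall back to scanning the table.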

    If you can re-model this with a traditional table, with a fixed number of columns, and have the space to apply every index combination you would ever need, that would be your most performant model.

    If you can’t fix the number of columns (new properties coming along all the time) and/or you don’t have space for all the different combinations of index, you seem to be stuck with EAV. Which will work, but it will not scale very well in terms of ‘instantaneous’ results.

    NOTE: If you do stick with EAV, have you tested this query structure?

      SELECT
        document_id
      FROM
        indexes
      WHERE
           (property = 'name' AND value = 'Foo')
        OR (property = 'age'  AND value = '43' )
      GROUP BY
        document_id
      HAVING
        COUNT(*) = 2
    

    This assumes that (document_id, property, value) is unique. Otherwise one document could have ('name', 'Foo') twice, and so satisfy COUNT(*) = 2 without matching on 'age' at all.
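
As a minimal sketch of how a query-by-example document might be turned into that GROUP BY / HAVING form, assuming SQLite via Python's sqlite3 module: the helper names add and find are hypothetical, every property is indexed for simplicity, and the UNIQUE constraint is an added assumption that both enforces the uniqueness mentioned above and gives the database an index led by (property, value).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, doc TEXT);
    CREATE TABLE indexes (
        id INTEGER PRIMARY KEY,
        property TEXT,
        value TEXT,
        document_id INTEGER,
        -- Assumption: enforce uniqueness so COUNT(*) cannot be inflated.
        UNIQUE (property, value, document_id)
    );
""")

def add(doc_id, doc):
    """Insert a document and one index row per property (all indexed here)."""
    with conn:
        conn.execute("INSERT INTO documents (id, doc) VALUES (?, ?)",
                     (doc_id, json.dumps(doc)))
        conn.executemany(
            "INSERT INTO indexes (property, value, document_id) VALUES (?, ?, ?)",
            [(k, str(v), doc_id) for k, v in doc.items()])

def find(query):
    """Return ids of documents matching every property/value pair in query."""
    if not query:
        raise ValueError("query must contain at least one condition")
    pairs = [(k, str(v)) for k, v in query.items()]
    where = " OR ".join("(property = ? AND value = ?)" for _ in pairs)
    sql = (f"SELECT document_id FROM indexes WHERE {where} "
           "GROUP BY document_id HAVING COUNT(*) = ? ORDER BY document_id")
    params = [x for pair in pairs for x in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]

add(1, {"name": "Foo", "age": 43})
add(2, {"name": "Foo", "age": 30})
add(3, {"name": "Bar", "age": 43})
matches = find({"name": "Foo", "age": 43})
print(matches)
```

The statement stays flat no matter how many conditions are supplied: each extra condition adds one OR branch and raises the expected count by one, instead of adding another nesting level.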
