Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 751017
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 14, 20262026-05-14T14:37:04+00:00 2026-05-14T14:37:04+00:00

How to design data storage for huge tagging system (like digg or delicious)? There

  • 0

How to design data storage for huge tagging system (like digg or delicious)?

There is already discussion about it, but it is about centralized database. Since the data is supposed to grow, we’ll need to partition the data into multiple shards soon or later. So, the question turns to be: How to design data storage for partitioned tagging system?

The tagging system basically has 3 tables:

Item (item_id, item_content)

Tag (tag_id, tag_title)

TagMapping(map_id, tag_id, item_id)

That works fine for finding all items for given tag and finding all tags for given item, if the table is stored in one database instance. If we need to partition the data into multiple database instances, it is not that easy.

For table Item, we can partition its content with its key item_id. For table Tag, we can partition its content with its key tag_id. For example, we want to partition table Tag into K databases. We can simply choose number (tag_id % K) database to store given tag.

But, how to partition table TagMapping?

The TagMapping table represents the many-to-many relationship. I can only image to have duplication. That is, same content of TagMappping has two copies. One is partitioned with tag_id and the other is partitioned with item_id. In scenario to find tags for given item, we use partition with tag_id. If scenario to find items for given tag, we use partition with item_id.

As a result, there is data redundancy. And, the application level should keep the consistency of all tables. It looks hard.

Is there any better solution to solve this many-to-many partition problem?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-14T14:37:05+00:00Added an answer on May 14, 2026 at 2:37 pm

    I doubt there is a single approach that optimizes all possible usage scenarios. As you said, there are two main scenarios that the TagMapping table supports: finding tags for a given item, and finding items with a given tag. I think there are some differences in how you will use the TagMapping table for each scenario that may be of interest. I can only make reasonable assumptions based on typical tagging applications, so forgive me if this is way off base!

    Finding Tags for a Given Item

    A1. You’re going to display all of the tags for a given item at once

    A2. You’re going to ensure that all of an item’s tags are unique

    Finding Items for a Given Tag

    B1. You’re going to need some of the items for a given tag at a time (to fill a page of search results)

    B2. You might allow users to specify multiple tags, so you’d need to find some of the items matching multiple tags

    B3. You’re going to sort the items for a given tag (or tags) by some measure of popularity

    Given the above, I think a good approach would be to partition TagMapping by item. This way, all of the tags for a given item are on one partition. Partitioning can be more granular, since there are likely far more items than tags and each item has only a handful of tags. This makes retrieval easy (A1) and uniqueness can be enforced within a single partition (A2). Additionally, that single partition can tell you if an item matches multiple tags (B2).

    Since you only need some of the items for a given tag (or tags) at a time (B1), you can query partitions one at a time in some order until you have as many records needed to fill a page of results. How many partitions you will have to query will depend on how many partitions you have, how many results you want to display and how frequently the tag is used. Each partition would have its own index on tag_id to answer this query efficiently.

    The order you pick partitions in will be important as it will affect how search results are grouped. If ordering isn’t important (i.e. B3 doesn’t matter), pick partitions randomly so that none of your partitions get too hot. If ordering is important, you could construct the item id so that it encodes information relevant to the order in which results are to be sorted. An appropriate partitioning scheme would then be mindful of this encoding. For example, if results are URLs that are sorted by popularity, then you could combine a sequential item id with the Google Page Rank score for that URL (or anything similar). The partitioning scheme must ensure that all of the items within a given partition have the same score. Queries would pick partitions in score order to ensure more popular items are returned first (B3). Obviously, this only allows for one kind of sorting and the properties involved should be constant since they are now part of a key and determine the record’s partition. This isn’t really a new limitation though, as it isn’t easy to support a variety of sorts, or sorts on volatile properties, with partitioned data anyways.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Is there an reference about how to properly design Table Storage data? What to
How to add a FunctionImportMapping programmatically using classes from the System.Data.Entity.Design namespace? This blog
Data Access Objects (DAOs) are a common design pattern, and recommended by Sun. But
Let's say i have a database design like this: for the data: id |
I am trying to find what is the usual design/approach for static/global! data access/storage
I am trying to figure out how you design data storage in a document
I need to design a data structure that supports the following operations: search element
I have to design a data structure that is to be used in a
I have a problem here that requires to design a data structure that takes
I have this data in cell A1: Majestic Properties Design District, LLC, Ste. 101,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.