The Archive Base


Editorial Team
Asked: May 25, 2026 (2026-05-25T14:16:00+00:00)

I was asked by an interviewer to design a system that stores gigabytes of data and also supports certain kinds of queries.

Description:

A massive number of records is generated in an IDC. Each record consists of a URL, the IP that visited the URL, and the time of the visit. A record can probably be expressed as a struct like the one below, but I’m not sure which data types I should pick for the fields:

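#include <stdint.h>   // uint32_t
#include <time.h>     // time_t

struct Record {
    char    *url;         // visited URL, as a C string
    uint32_t ip;          // visitor's IP; 32 bits fits IPv4 only
    time_t   visit_time;  // time of the visit; a plain integer would also work
};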
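
To make the type question concrete, here is a minimal sketch of a fixed-width layout. It is not part of the original question and rests on assumptions: IPv4 addresses, 32-bit timestamps in seconds, and each URL replaced by a 64-bit ID (for example, an index into a separate URL dictionary). The names `PackedRecord` and `total_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-width record: 8 + 4 + 4 = 16 bytes. */
typedef struct {
    uint64_t url_id;      /* assumed: ID of the URL in a separate dictionary */
    uint32_t ip;          /* assumed: IPv4 address */
    uint32_t visit_time;  /* assumed: seconds since the Unix epoch */
} PackedRecord;

/* Raw bytes needed for n_records, before any index or replication. */
static uint64_t total_bytes(uint64_t n_records)
{
    /* Guard against overflow of the 64-bit product. */
    assert(n_records <= UINT64_MAX / sizeof(PackedRecord));
    return n_records * (uint64_t)sizeof(PackedRecord);
}
```

Under these assumptions, 100 billion records take 1.6 × 10^12 bytes, roughly 1.6 TB of raw data before indexes, which is far more than "gigabytes" and more than a single typical machine holds in memory.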

Requirements:

Design a system to store 100 billion records. The system must support at least 2 kinds of query:

First, given a time period (t1, t2) and an IP, query how many URLs this IP visited in the given period.

Second, given a time period (t1, t2) and a URL, query how many times this URL was visited.

I was stumped, and here is my stupid solution:

Analysis:

Because every query is performed over a given period of time:

1. Create a set, put all visit times into it, and keep the set ordered by time from oldest to latest.

2. Create a hash table keyed on hash(visit_time); call it the time-hash-table. Each node in a bucket has 2 pointers, each pointing to another hash table.

3. Those 2 other hash tables are an ip-hash-table and a url-hash-table.

The ip-hash-table uses hash(ip) as the key, and all the IPs in the same ip-hash-table have the same visit time;

the url-hash-table uses hash(url) as the key, and all the URLs in the same url-hash-table have the same visit time.

Here is a drawing:

time_hastbl
  []
  []
  []-->[visit_time_i]-->[visit_time_j]...[visit_time_p]-->NIL
  []                     |          |
  []               ip_hastbl       url_hastbl
                      []               []
                      :                :
                      []               []
                      []               []

So, when running a query over (t1, t2):

  1. Find the closest match in the time set, say (t1′, t2′); all valid visit times then fall in the part of the set from t1′ to t2′.

  2. For each visit time t in the time set[t1′:t2′], compute hash(t) to find t’s ip_hastbl or url_hastbl, then count how many times the given IP or URL appears.
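
Step 1 above can be sketched as two binary searches over a sorted array of visit times. This is only an illustration under assumptions: timestamps are 32-bit integers, the period is treated as the closed interval [t1, t2], and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first element >= key in sorted times[0..n). */
static size_t lower_bound(const uint32_t *times, size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Index of the first element > key in sorted times[0..n). */
static size_t upper_bound(const uint32_t *times, size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Number of visit times inside the closed period [t1, t2]. */
static size_t count_in_period(const uint32_t *times, size_t n,
                              uint32_t t1, uint32_t t2)
{
    assert(t1 <= t2);
    return upper_bound(times, n, t2) - lower_bound(times, n, t1);
}
```

Finding the slice costs O(log n), but step 2 still visits every timestamp in that slice, so the total query time grows with the length of the period.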

Questions:

1. My solution is stupid; I hope you can suggest a better one.

2. Any advice on how to store such a massive number of records on disk? I thought of a B-tree, but is a B-tree applicable in this system, and if so, how would it be used?
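
One hedged way to think about the B-tree question: a B-tree keeps its keys sorted, so an index keyed on (ip, visit_time) would place all visits by one IP in a given period next to each other. The sketch below stands in for that ordering with a plain sorted array and binary search; it is not a B-tree implementation. It assumes the first query counts visits rather than distinct URLs, treats the period as closed, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index entry, one per record, sorted by (ip, visit_time). */
typedef struct {
    uint32_t ip;
    uint32_t visit_time;
} IpTimeKey;

/* Lexicographic comparison: first by ip, then by visit_time. */
static int key_cmp(IpTimeKey a, IpTimeKey b)
{
    if (a.ip != b.ip)
        return a.ip < b.ip ? -1 : 1;
    if (a.visit_time != b.visit_time)
        return a.visit_time < b.visit_time ? -1 : 1;
    return 0;
}

/* Index of the first entry >= k, or > k when strict is nonzero. */
static size_t first_at_least(const IpTimeKey *idx, size_t n,
                             IpTimeKey k, int strict)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = key_cmp(idx[mid], k);
        if (c < 0 || (strict && c == 0))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Visits made by `ip` within the closed period [t1, t2]. */
static size_t count_visits_by_ip(const IpTimeKey *idx, size_t n,
                                 uint32_t ip, uint32_t t1, uint32_t t2)
{
    assert(t1 <= t2);
    IpTimeKey lo = { ip, t1 };
    IpTimeKey hi = { ip, t2 };
    return first_at_least(idx, n, hi, 1) - first_at_least(idx, n, lo, 0);
}
```

A second index ordered by (url, visit_time) would serve the second query in the same way. Unlike the per-timestamp hash tables, the cost here does not depend on how many timestamps fall inside the period.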

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 2:16 pm (2026-05-25T14:16:01+00:00)

    I believe the interviewer was expecting a solution based on distributed computing, especially since “100 billion records” are involved. With my limited knowledge of distributed computing, I would suggest looking into distributed hash tables and MapReduce (for parallel query processing).
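
    As a rough, single-process illustration of that idea (not a real distributed hash table or MapReduce framework), the sketch below assigns records to shards by a hash of the IP, lets each shard count its own matches for a URL query, and then sums the partial counts. The `Visit` layout, the function names, and the choice of hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; url_id stands in for the URL string. */
typedef struct {
    uint64_t url_id;
    uint32_t ip;
    uint32_t visit_time;
} Visit;

/* Pick the shard that owns an IP (multiplicative hash, then modulo). */
static size_t shard_for_ip(uint32_t ip, size_t n_shards)
{
    assert(n_shards > 0);
    uint32_t h = ip * 2654435761u;  /* unsigned multiply wraps mod 2^32 */
    return (size_t)(h % n_shards);
}

/* "Map" step: one shard counts its visits to url_id within [t1, t2]. */
static uint64_t shard_count_url(const Visit *shard, size_t n,
                                uint64_t url_id, uint32_t t1, uint32_t t2)
{
    uint64_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (shard[i].url_id == url_id &&
            shard[i].visit_time >= t1 && shard[i].visit_time <= t2)
            count++;
    }
    return count;
}

/* "Reduce" step: add up the partial counts returned by all shards. */
static uint64_t reduce_sum(const uint64_t *partials, size_t n_shards)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n_shards; i++)
        total += partials[i];
    return total;
}
```

    With this partitioning, a query by IP only needs the one shard that owns that IP, while a query by URL has to ask every shard and combine the answers.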

© 2021 The Archive Base. All Rights Reserved