The Archive Base

Editorial Team
Asked: June 7, 2026 (2026-06-07T16:08:36+00:00)

I am trying to write code for our server in which I have to find users' access type by URL.

At the beginning, we saw about 100 million distinct URLs accessed per day. Over time, this has grown to nearly 600 million distinct URLs per day.

For 100 million URLs, we did the following:

1) Built a HashMap using parallel arrays, whose keys are one part of the URL (represented as a long) and whose values are the other part of the URL (represented as an int). A key can have multiple values.

2) Then searched the HashMap to find how many times a URL was accessed.
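The question does not show the table's code, so the following is only a minimal sketch of what a parallel-array hash table of this kind could look like. The class name `ParallelArrayUrlTable`, the separate `counts` array, the hash function, and the linear-probing scheme are all assumptions made for illustration; they are not the asker's actual implementation.

```java
// Hypothetical sketch: open-addressing hash table stored in parallel primitive arrays.
// Each slot holds one (key, value) pair, so a key may appear with several values.
final class ParallelArrayUrlTable {
    private final long[] keys;   // first part of each URL
    private final int[] values;  // second part of each URL
    private final int[] counts;  // access count; 0 marks an empty slot
    private final int mask;
    private int size;

    ParallelArrayUrlTable(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new long[capacity];
        values = new int[capacity];
        counts = new int[capacity];
        mask = capacity - 1;
    }

    // Linear probing: returns the slot holding (key, value) or the first empty slot.
    // At least one slot is always kept empty, so the loop terminates.
    private int slotFor(long key, int value) {
        long h = (key * 31 + value) * 0x9E3779B97F4A7C15L; // simple mixing, an assumption
        int i = (int) (h >>> 32) & mask;
        while (counts[i] != 0 && (keys[i] != key || values[i] != value)) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void recordAccess(long key, int value) {
        int i = slotFor(key, value);
        if (counts[i] == 0) {
            if (size >= mask) {
                throw new IllegalStateException("table is full");
            }
            keys[i] = key;
            values[i] = value;
            size++;
        }
        counts[i]++;
    }

    int accessCount(long key, int value) {
        return counts[slotFor(key, value)];
    }
}
```

Because everything lives in primitive arrays, there is no per-entry object overhead, which is presumably why a custom table of this kind was chosen.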

Now, as the hash table has become larger, we do the following:

1) Build two or three separate hash tables, and load and store them (on a general file system) to find how many times a URL was accessed.

Now, the issue is:

1) Although the hash table performance is quite good, the code spends a lot of time loading and storing the hash table (we are using FileChannel; it takes 16-19 seconds to load/store a hash table with 200 million entries, as the load factor is 0.5).
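For reference, a bulk load/store of one of the table's arrays through a FileChannel might look roughly like the sketch below. The load/store code is not shown in the question, so the class name `ArrayFileIo`, the 1 MiB chunk size, and the use of a direct buffer are assumptions. The point of the sketch is that every byte passes through a copy into or out of the Java heap on each load and store.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: copy a long[] to and from a file in fixed-size chunks.
final class ArrayFileIo {
    private static final int CHUNK_BYTES = 1 << 20; // assumed chunk size

    static void store(Path file, long[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK_BYTES).order(ByteOrder.nativeOrder());
            int pos = 0;
            while (pos < data.length) {
                buf.clear();
                LongBuffer longs = buf.asLongBuffer();
                int n = Math.min(longs.remaining(), data.length - pos);
                longs.put(data, pos, n);       // heap array -> direct buffer
                buf.limit(n * Long.BYTES);
                while (buf.hasRemaining()) {
                    ch.write(buf);             // direct buffer -> file
                }
                pos += n;
            }
        }
    }

    static long[] load(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long[] data = new long[(int) (ch.size() / Long.BYTES)];
            ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK_BYTES).order(ByteOrder.nativeOrder());
            int pos = 0;
            while (pos < data.length) {
                buf.clear();
                long wanted = (long) (data.length - pos) * Long.BYTES;
                buf.limit((int) Math.min(buf.capacity(), wanted));
                while (buf.hasRemaining()) {
                    if (ch.read(buf) < 0) {
                        throw new EOFException("file shorter than expected");
                    }
                }
                buf.flip();
                LongBuffer longs = buf.asLongBuffer();
                int n = longs.remaining();
                longs.get(data, pos, n);       // direct buffer -> heap array
                pos += n;
            }
            return data;
        }
    }
}
```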

What we are trying to ask is:

1) Any comments on how to solve this issue?

2) How can we reduce the load/store time? (I asked before, but it seems FileChannel is the best way.)

3) Would storing a large hash table (larger than memory) and caching it repeatedly be a good solution? If so, how can we do that (at least some pointers)? We tried it using:

RandomAccessFile raf = new RandomAccessFile("array.dat", "rw");
IntBuffer map = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1 << 30).order(ByteOrder.nativeOrder()).asIntBuffer();

However, this gives worse performance than the previous approach.

Thanks.

NB:

1) As per previous suggestions on Stack Overflow, we tried a NoSQL DB such as TokyoCabinet, but in our experience a custom hash table gives better performance on 100 million key-value pairs.

2) Pre-reading data for disk caching is not possible, because our application starts working as soon as the system starts, and again the next day when the system starts.

What we forgot to mention:

1) As our application is part of a project to be applied on a small campus, we assume the number of URLs accessed is no more than 800 million. So you can treat the 600/700 million data values as fixed.

2) Our main concern is performance.

3) We have to run our application locally.

Edit: code of our hashmap can be found here.

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 4:08 pm (2026-06-07T16:08:39+00:00)

    It might be best to access the table as a memory-mapped buffer. That way, you can simply implement random access to the file without worrying about loading and storing, and leave caching to the operating system. I see that your current implementation already uses memory-mapped access for reading and writing, but it still loads things into the Java heap in between. Avoid this data duplication and copying! Treat the backing file itself as the data structure, and access only the portions of it that you actually need, only when you need them.
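As a rough illustration of treating the backing file itself as the data structure, here is a minimal sketch of a counting hash table that reads and writes its slots directly in a mapped buffer, with no copy into heap arrays. The class name `MappedCountTable`, the 12-byte slot layout (8-byte key plus 4-byte count), and the hash function are assumptions for illustration. A single `MappedByteBuffer` is limited to `Integer.MAX_VALUE` bytes, so a table of the size discussed here would need several mappings; the sketch ignores that.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: linear-probing hash table living entirely in a mapped file.
final class MappedCountTable implements AutoCloseable {
    private static final int SLOT_BYTES = 12; // assumed layout: long key, int count
    private final FileChannel channel;
    private final MappedByteBuffer buf;
    private final int slots;

    MappedCountTable(Path file, int slots) throws IOException {
        if (slots < 1 || (long) slots * SLOT_BYTES > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot count does not fit one mapping");
        }
        this.slots = slots;
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Mapping past the end of the file grows it; new bytes are assumed to read as zero.
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) slots * SLOT_BYTES);
        buf.order(ByteOrder.nativeOrder());
    }

    // Returns the slot holding the key, the first empty slot, or -1 if the table is full.
    private int slotFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        int i = (int) ((h >>> 1) % slots);
        for (int probes = 0; probes < slots; probes++) {
            int base = i * SLOT_BYTES;
            if (buf.getInt(base + 8) == 0 || buf.getLong(base) == key) {
                return i;
            }
            i = (i + 1) % slots;
        }
        return -1;
    }

    void increment(long key) {
        int slot = slotFor(key);
        if (slot < 0) {
            throw new IllegalStateException("table is full");
        }
        int base = slot * SLOT_BYTES;
        buf.putLong(base, key);
        buf.putInt(base + 8, buf.getInt(base + 8) + 1);
    }

    int count(long key) {
        int slot = slotFor(key);
        return slot < 0 ? 0 : buf.getInt(slot * SLOT_BYTES + 8);
    }

    @Override
    public void close() throws IOException {
        buf.force();      // ask the OS to write dirty pages back to the file
        channel.close();
    }
}
```

With this layout there is no separate load or store step: opening the table is just a mapping call, and the operating system pages in only the slots that are actually touched.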

    Within that file, hash maps will work if you are really sure that hash collisions are not an issue. Otherwise I’d go for a B+ tree there, with nodes about the size of your hard disk pages. That way, each disk access will yield a lot more usable data than just a single key, resulting in a shallower tree and fewer individual disk operations.
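To make the "shallower tree" argument concrete, the following sketch estimates the fan-out of a page-sized node and the number of levels needed for a given number of keys. The 4096-byte page, 8-byte keys, and 8-byte child offsets are assumptions, and node headers are ignored, so treat the result as a rough estimate only.

```java
// Hypothetical back-of-the-envelope sizing for a B+ tree with page-sized nodes.
final class BPlusTreeSizing {
    // Number of (key, child offset) entries that fit in one node, ignoring headers.
    static int fanout(int pageBytes, int keyBytes, int pointerBytes) {
        return pageBytes / (keyBytes + pointerBytes);
    }

    // Smallest number of levels whose combined fan-out covers the given entry count.
    static int levels(long entries, int fanout) {
        if (fanout < 2) {
            throw new IllegalArgumentException("fanout must be at least 2");
        }
        long capacity = fanout;
        int levels = 1;
        while (capacity < entries) {
            capacity *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        int f = fanout(4096, 8, 8);
        System.out.println("fan-out per node: " + f);
        System.out.println("levels for 800 million keys: " + levels(800_000_000L, f));
    }
}
```

Under these assumptions a node holds 256 entries, and four levels are enough for the 800 million keys mentioned in the question, so a lookup touches only a handful of pages.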

    I guess others will have implemented stuff like this, but if you prefer your own hash map implementation, you might prefer to write your own memory-mapped B+ trees as well.

