The Archive Base


Editorial Team
Asked: June 2, 2026 (2026-06-02T04:17:29+00:00)


Question: Which data structure is more efficient for finding the n most frequent words in a text file: hash tables or priority queues?

I previously asked a question on this subject, but the creative responses left me confused, so I've narrowed it down to two data structures that I can actually implement easily: hash tables and priority queues.

Priority queue confusion: I've watched a YouTube lecture on priority queues and understood every component, but I get confused when it comes to applying them. I can easily implement a priority queue using a binary heap; my challenge is mapping its components onto the word-frequency problem.

My hash table idea: Since I was unsure how to choose the hash table's size, I decided to go with what made the most sense to me: 26, the number of letters in the alphabet. With a good hash function it should be efficient. However, repeatedly walking the linked lists (I use separate chaining for collisions) to increment a word's integer count by 1 wouldn't, in my opinion, be efficient.

Sorry for the long post, but as fellow programmers, which one would you recommend? If a priority queue, can you give me some ideas for relating it to my problem? If a hash table, could anything be done to make it even more efficient?


1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 4:17 am

    A hash table would be the faster of the two choices offered, and it also makes more sense. Rather than choosing a size of 26, start from an estimate of the total number of unique words. Outside of specialized technical terms, most people's vocabularies are not much bigger than 10,000 words; 20,000 is really big, and 30,000 is for people who make a hobby of collecting words. Make the table big enough that you never expect to fill it, so that the probability of a collision stays low, say no more than 25%. If you want to be more conservative, implement a function that rehashes the contents into a table of roughly twice the original size (roughly, because the new size should be a prime).
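The sizing and rehashing advice above could be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the class name `WordCounter`, its member names, the use of `std::hash`, and the reading of "25%" as a load factor of at most one quarter are all assumptions made for the example.

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical word-count table using separate chaining.
// Assumption: "25%" is read as "keep unique words / buckets <= 1/4".
class WordCounter {
public:
    // The requested bucket count is rounded up to the next prime.
    explicit WordCounter(std::size_t initial_buckets = 20000)
        : buckets_(next_prime(initial_buckets)) {}

    // Record one occurrence of `word`.
    void add(const std::string& word) {
        auto& chain = buckets_[index_of(word, buckets_.size())];
        for (auto& entry : chain) {
            if (entry.first == word) { ++entry.second; return; }
        }
        chain.emplace_front(word, 1);
        ++unique_;
        // Grow once the table is more than one quarter full.
        if (unique_ * 4 > buckets_.size()) rehash();
    }

    // Occurrences recorded for `word` (0 if it was never added).
    std::size_t count(const std::string& word) const {
        for (const auto& entry : buckets_[index_of(word, buckets_.size())]) {
            if (entry.first == word) return entry.second;
        }
        return 0;
    }

    std::size_t unique_words() const { return unique_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    using Entry = std::pair<std::string, std::size_t>;

    static bool is_prime(std::size_t n) {
        if (n < 2) return false;
        for (std::size_t d = 2; d * d <= n; ++d) {
            if (n % d == 0) return false;
        }
        return true;
    }

    static std::size_t next_prime(std::size_t n) {
        while (!is_prime(n)) ++n;
        return n;
    }

    static std::size_t index_of(const std::string& word, std::size_t buckets) {
        return std::hash<std::string>{}(word) % buckets;
    }

    // Move every entry into a table of about twice the size (next prime up).
    void rehash() {
        std::vector<std::forward_list<Entry>> bigger(next_prime(buckets_.size() * 2));
        for (auto& chain : buckets_) {
            for (auto& entry : chain) {
                bigger[index_of(entry.first, bigger.size())].push_front(std::move(entry));
            }
        }
        buckets_.swap(bigger);
    }

    std::vector<std::forward_list<Entry>> buckets_;
    std::size_t unique_ = 0;
};
```

With chains this short, each increment touches only a node or two, which addresses the worry about repeatedly walking long linked lists in a 26-bucket table.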

    Since this is tagged C++, you might ask yourself why you aren't just using a std::multiset straight out of the standard library. Its count() member tells you how many copies of each word you have inserted.
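As a small sketch of the standard-library route, the first function below fills a `std::multiset` and relies on `count()`. The second function is an alternative the answer does not mention, added here as an assumption about what may suit the problem better: `std::unordered_map` is the standard hash table and keeps one counter per unique word instead of storing every duplicate. The function names and the whitespace-based tokenizing are illustrative only.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>

// Store every occurrence of every whitespace-separated word.
// words.count(w) then reports how many copies of w were inserted.
std::multiset<std::string> collect_words(const std::string& text) {
    std::multiset<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) words.insert(word);
    return words;
}

// Alternative (not from the answer): one counter per unique word,
// kept in the standard library's hash table.
std::unordered_map<std::string, std::size_t> count_words(const std::string& text) {
    std::unordered_map<std::string, std::size_t> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++counts[word];  // operator[] starts new words at 0
    return counts;
}
```

The multiset is an ordered tree, so each insert costs logarithmic time, and it holds one node per word occurrence. The map version holds one node per unique word and has average constant-time updates.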

    In either case you'll need to make a separate pass to find the n most frequent words, because you only have the frequencies, not their rank order.
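That separate pass is one place where a priority queue fits the problem. The sketch below is an illustration and not part of the original answer: it assumes the counts are already in a `std::unordered_map`, and the name `top_n` and the `WordCount` alias are hypothetical. It keeps a min-heap of at most n candidates, so the least frequent candidate is always the one evicted.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using WordCount = std::pair<std::size_t, std::string>;  // (frequency, word)

// Return up to n (frequency, word) pairs, most frequent first.
std::vector<WordCount> top_n(
    const std::unordered_map<std::string, std::size_t>& counts, std::size_t n) {
    // Min-heap: the smallest (frequency, word) pair among the candidates is on top.
    std::priority_queue<WordCount, std::vector<WordCount>, std::greater<WordCount>> heap;
    for (const auto& kv : counts) {
        heap.emplace(kv.second, kv.first);
        if (heap.size() > n) heap.pop();  // evict the current least frequent
    }
    // The heap pops in ascending order, so fill the result back to front.
    std::vector<WordCount> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

For U unique words this pass costs O(U log n) time and O(n) extra space, which is cheaper than sorting all U entries when n is small.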

