Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7895341
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 3, 20262026-06-03T07:37:12+00:00 2026-06-03T07:37:12+00:00

Just to clarify that I also think the title is a bit silly. We

  • 0

Just to clarify that I also think the title is a bit silly. We all know that most built-in functions of the language are really well written and fast (there are ones even written by assembly). Though may be there still are some advices for my situation. I have a small project which demonstrates the work of a search engine. In the indexing phase, I have a filter method to filter out unnecessary things from the keywords. It’s here:

bool Indexer::filter(string &keyword)
{
    // Remove all characters defined in isGarbage method
    keyword.resize(std::remove_if(keyword.begin(), keyword.end(), isGarbage) - keyword.begin());

    // Transform all characters to lower case
    std::transform(keyword.begin(), keyword.end(), keyword.begin(), ::tolower);

    // After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
    if (keyword.size() == 0 || stopwords_.find(keyword) != stopwords_.end())
        return false;

    return true;
}

At first sign, these functions (alls are member functions of STL container or standard function) are supposed to be fast and not take many time in the indexing phase. But after profiling with Valgrind, the inclusive cost of this filter is ridiculous high: 33.4%. There are three standard functions of this filter take most of the time for that percentage: std::remove_if takes 6.53%, std::set::find takes 15.07% and std::transform takes 7.71%.

So if there are any thing I can do (or change) to reduce the instruction times cost by this filter (like using parallellizing or something like that), please give me your advice. Thanks in advance.

UPDATE: Thanks for all your suggestion. So in brief, I’ve summarize what I need to do is:
1) Merge tolower and remove_if into one by construct my own loop.
2) Use unordered_set instead of set for faster find method.
Thus I’ve chosen Mark_B‘s as the right answer.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-03T07:37:13+00:00Added an answer on June 3, 2026 at 7:37 am

    First, are you certain that optimization and inlining are enabled when you compile?

    Assuming that’s the case, I would first try writing my own transformer that combines removing garbage and lower-casing into one step to prevent iterating over the keyword that second time.

    There’s not a lot you can do about the find without using a different container such as unordered_set as suggested in a comment.

    Is it possible for your application that doing the filtering really just is a really CPU-intensive part of the operation?

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I think the title is rather self explanatory but just to clarify I am
This might be a dumb question. I think I already know the answer, just
Just to clarify, when I say multiple assigment, parallel assignment, destructuring bind I mean
Just to clarify, by specifying something like VARCHAR(45) means it can take up to
I just want to clarify one thing. This is not a question on which
Just getting started with backbone.js, and one of the things I've noticed is that
Just a note upfront: I'm a bit of a regex newbie. Perhaps a good
Sorry for the rather vague title, I couldn't think up a better description for
Think the title describes my thoughts pretty well :) I've seen a lot of
Hello to all that read I am self learning C++ from a text book.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.