In one of my programs for school, I use the following function to count how often each identifier appears in a string whose entries are separated by newlines and # lines:
Input:
dog
cat
mouse
#
rabbit
snake
#
Function:
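//assume I have the proper includes, a splitString() helper, and am using namespace std
vector< pair<string,int> > getFreqcounts(string input) {
    vector<string> items = splitString(input,"\n");   // one item per line
    map<string,int> counts;
    // first pass: create an entry with count 0 for every identifier
    for (size_t i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;   // '#' lines are separators, not identifiers
        counts[items[i]] = 0;
    }
    // second pass: count each occurrence
    for (size_t i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]]++;
    }
    // copy the (key-sorted) map entries into a vector
    return vector< pair<string,int> > (counts.begin(),counts.end());
}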
I would like to, at the very least:
- remove the double for loop
- find a better way to get a vector< pair<string,int> >

Any ideas?
BTW, this is NOT homework. The real homework will use this function, but this is purely out of my own curiosity and desire to have “better” code.
You can get rid of the first for loop by simply deleting it; it accomplishes nothing useful. Whenever subscripting the map creates a new item, that item gets the key you used, and its associated int is value-initialized to zero automatically.
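As a minimal sketch of the single-loop version: since the asker's splitString() helper isn't shown, this takes the already-split items as a vector<string>, and the name countItems is hypothetical. The only change from the question's code is that the initialization loop is gone, relying on map::operator[] to insert a zero-valued int the first time a key is seen.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: count each non-"#" item in a single pass.
// map::operator[] inserts a value-initialized int (0) the first time a
// key is seen, so no separate initialization loop is needed.
std::vector<std::pair<std::string, int>>
countItems(const std::vector<std::string>& items) {
    std::map<std::string, int> counts;
    for (const std::string& item : items) {
        if (item == "#") continue;  // '#' lines are separators, not identifiers
        ++counts[item];
    }
    // The map is already sorted by key; copy its entries into a vector.
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```

Called with the result of splitString(input, "\n"), this should give the same vector as the original two-loop function.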
Personally, I’d probably do things a bit differently, using a stringstream instead of your splitString(). I’m hesitant about posting code, but I guess I’ll trust you…

Edit: I honestly didn’t pay a whole lot of attention to efficiency as I was writing this, but I think Steve Jessop’s comment on it is pretty accurate. As long as the input is small, it won’t make any real difference. If the input is really big, the fact that this only holds an extra copy of one word at a time (rather than a vector holding a copy of every line, as splitString() produces) could save enough memory to be meaningful.
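A minimal sketch of the stringstream approach, written for illustration rather than as the answer's original code; the function name countWordsStream is hypothetical. It assumes identifiers never contain spaces, because operator>> splits on any whitespace, not just newlines. Each word is extracted and counted immediately, so only the current word is copied out of the input.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stringstream-based variant: pull one whitespace-delimited
// word at a time and count it right away, instead of building a vector
// of every line first.
std::vector<std::pair<std::string, int>>
countWordsStream(const std::string& input) {
    std::istringstream in(input);
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {     // operator>> skips newlines and other whitespace
        if (word != "#")     // skip the '#' separators
            ++counts[word];
    }
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```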
The solution Steve gave in his reply looks pretty nice too, though. Since it also processes words as they’re produced, I’d expect it to have characteristics similar to the code above. If you can break the string into words faster than stringstream does, it’ll undoubtedly be faster. Given the number of virtual functions that get in the way with iostreams, there’s a pretty good chance of that, but unless you’re dealing with a lot of text there’s not much chance of it making a significant difference. Of course, exactly what qualifies as significant is open to question. To put it in perspective, I ran some similar code across a word list I had handy: using code pretty close to what’s above, it processes text at a little over 10 megabytes a second.
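One way to break the string into words without going through iostreams is to scan for '\n' by hand with std::string::find and count each line as soon as it is found. This is a sketch under that assumption, not Steve's code; the name countLines is hypothetical, and whether it actually beats the stringstream version would need measuring on real input. Unlike the stringstream sketch, it splits only on '\n', so it matches splitString(input, "\n") more closely (it does not strip a trailing '\r').

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant: split on '\n' manually with std::string::find
// and count each line as it is produced, bypassing iostreams entirely.
std::vector<std::pair<std::string, int>>
countLines(const std::string& input) {
    std::map<std::string, int> counts;
    std::string::size_type start = 0;
    while (start < input.size()) {
        std::string::size_type end = input.find('\n', start);
        if (end == std::string::npos) end = input.size();  // last line has no '\n'
        std::string line = input.substr(start, end - start);
        if (!line.empty() && line != "#")  // skip blank lines and '#' separators
            ++counts[line];
        start = end + 1;  // move past the newline
    }
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```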