I have been given an exercise about anagrams, and it looked so easy that I suspect I am missing something.
I describe the solution I implemented below, and I would like to know whether you can think of any optimization, change of approach, or problem with it.
I implemented the algorithm in Java.
Now, the exercise.
As input I have a text, and as output I should return whether every line of this text is an anagram of every other line.
That is, for input:
A Cab Deed Huffiest Minnows Loll
A Cab Deed Huffiest Minnow Lolls
A Cab Deed Shuffles Million Wont
A Cab Deed Shuffles Million Town
The program should return True.
For input:
A Cab Deed Huffiest Minnows Loll
A Cab Deed Huffiest Minnow Lolls hi
A Cab Deed Shuffles Million Wont
A Cab Deed Shuffles Million Town
the output will have to be False (because of the second line, of course).
Now, what I thought is pretty straightforward:
- I create 2 HashMap: ref and cur.
- I parse the first line of the text, filling ref. I will count only alphabetical letters.
- for each other line, I parse the line into cur and check whether cur.equals(ref): if not, I return false
- if I get to the end of the text, it means that each line is an anagram of each other line, so I return true.
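The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the exercise: the class name `AnagramLines` and its method names are made up, and it assumes that case is ignored (otherwise the first sample input would not come out as True) and that an empty text counts as True.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramLines {

    // Count the letters of one line, ignoring case and every non-letter character.
    public static Map<Character, Integer> letterCounts(String line) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isLetter(c)) {
                char key = Character.toLowerCase(c); // assumption: case-insensitive
                counts.put(key, counts.getOrDefault(key, 0) + 1);
            }
        }
        return counts;
    }

    // True if every line has exactly the same letter counts as the first line.
    public static boolean allAnagrams(List<String> lines) {
        if (lines.isEmpty()) {
            return true; // assumption: an empty text is trivially fine
        }
        Map<Character, Integer> ref = letterCounts(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            if (!letterCounts(line).equals(ref)) {
                return false; // first mismatch settles the answer
            }
        }
        return true;
    }
}
```

With the two sample inputs from the exercise, `allAnagrams` returns true for the first and false for the second (the extra "hi" changes the counts for h and i).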
And…this would be it.
I tried it with an input text of 88000 lines, and it works pretty fast.
Any comments? Suggestions? Optimizations?
Thank you very much for the help.
Another option is:
.equals)
I suspect your way is faster though.
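One alternative that ends in a plain `.equals` comparison is to sort each line's letters and compare the sorted strings. The sketch below assumes that this is the kind of option meant here. The class name `SortedAnagramLines` and its methods are hypothetical, and it ignores case and non-letters in the same way as the counting approach.

```java
import java.util.Arrays;
import java.util.List;

class SortedAnagramLines {

    // Reduce a line to its letters, lowercased and in sorted order.
    public static String sortedLetters(String line) {
        StringBuilder letters = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isLetter(c)) {
                letters.append(Character.toLowerCase(c));
            }
        }
        char[] chars = letters.toString().toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // True if every line sorts to the same string as the first line.
    public static boolean allAnagrams(List<String> lines) {
        if (lines.isEmpty()) {
            return true; // assumption: an empty text is trivially fine
        }
        String ref = sortedLetters(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            if (!sortedLetters(line).equals(ref)) {
                return false;
            }
        }
        return true;
    }
}
```

Sorting costs O(n log n) per line instead of O(n), but it works on a primitive `char[]` and needs no hashing or boxing.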
EDIT:
Since @nibot disagrees with my even suggesting this, and I’m not one to argue back and forth without proof, here are three solutions.
They’re all implemented very similarly:
The ? part is one of:
- HashMap of character counts
I ran them all with this:
My file is similar to the one the OP posted, but made significantly longer, with a non-anagram about 20 lines from the end to ensure that the algorithms all work.
I consistently get results like this:
The HashMap one could be significantly improved if:
- HashMap<char, int>
- HashMap and a way to get-and-increment (so there would only be one lookup instead of 2)
But, these aren’t in the standard library, so I’m ignoring them (just like most programmers using Java would).
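If only the 26 ASCII letters matter, a plain `int[]` indexed by letter is one way to get primitive counts and a single increment per character while staying inside the standard library. This is a sketch under that assumption, not one of the measured solutions. The class name `ArrayAnagramLines` is hypothetical, and any character outside a-z (after lowercasing) is simply skipped.

```java
import java.util.Arrays;
import java.util.List;

class ArrayAnagramLines {

    // Count ASCII letters only; counts[0] is 'a', counts[25] is 'z'.
    public static int[] letterCounts(String line) {
        int[] counts = new int[26];
        for (int i = 0; i < line.length(); i++) {
            char c = Character.toLowerCase(line.charAt(i));
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++; // one array access, no boxing, no hashing
            }
        }
        return counts;
    }

    // True if every line has the same 26 counts as the first line.
    public static boolean allAnagrams(List<String> lines) {
        if (lines.isEmpty()) {
            return true; // assumption: an empty text is trivially fine
        }
        int[] ref = letterCounts(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            if (!Arrays.equals(letterCounts(line), ref)) {
                return false;
            }
        }
        return true;
    }
}
```

The comparison uses `Arrays.equals` because calling `equals` on an array only compares references.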
The moral of the story is that big O isn’t everything. You need to consider the overhead and the size of n. In this case, n is fairly small, and the overhead of a HashMap is significant. With longer lines, that would likely change, but unfortunately I don’t feel like figuring out where the break-even point is.
And if you still don’t believe me, consider that GCC uses insertion sort in some cases in its C++ standard library.