I have been working on the problems presented in Python Challenge. One of the problems asks you to sift through a mess of characters and pick out the rarest character(s).
My approach was to read the characters from a text file and store each character and its number of occurrences as a key/value pair in a dictionary. I then sort the dictionary by value and invert it, so that the occurrence count is the key and the characters with that count are the value. Assuming that the rarest character occurs only once, I return the value whose key in this inverted dictionary equals one.
The input (funkymess.txt) looks like this:
%%$@$^_#)^)&!_+]!*@&^}@@%%+$&[(_@%+%$*^@$^!+]!&#)*}{}}!}]$[%}@[{@#_^{*……
The code is as follows:
from operator import itemgetter

characterDict = dict()

#put the characters in a dictionary
def putEncounteredCharactersInDictionary(lineStr):
    for character in lineStr:
        if character in characterDict:
            characterDict[character] = characterDict[character]+1
        else:
            characterDict[character] = 1

#Sort the character dictionary
def sortCharacterDictionary(characterDict):
    sortCharDict = dict()
    sortsortedDictionaryItems = sorted(characterDict.iteritems(),key = itemgetter(1))
    for key, value in sortsortedDictionaryItems:
        sortCharDict[key] = value
    return sortCharDict

#invert the sorted character dictionary
def inverseSortedCharacterDictionary(sortedCharDict):
    inv_map = dict()
    for k, v in sortedCharDict.iteritems():
        inv_map[v] = inv_map.get(v, [])
        inv_map[v].append(k)
    return inv_map

f = open('/Users/Developer/funkymess.txt','r')
for line in f:
    #print line
    processline = line.rstrip('\n')
    putEncounteredCharactersInDictionary(processline)
f.close()

sortedCharachterDictionary = sortCharacterDictionary(characterDict)
#print sortedCharachterDictionary
inversedSortedCharacterDictionary = inverseSortedCharacterDictionary(sortedCharachterDictionary)
print inversedSortedCharacterDictionary[1]
Can somebody take a look and tell me whether I am on the right track here? If possible, I would also appreciate feedback on optimizations, best practices, and potential refactorings, both from a language and from an algorithmic standpoint.
Thanks
Refactoring: A Walkthrough
I want to walk you through the process of refactoring. Learning to program is not just about knowing the end result, which is what you usually get when you ask a question on Stack Overflow. It’s about how to get to that answer yourself. When people post short, dense answers to a question like this, it’s not always obvious how they arrived at their solutions.
So let’s do some refactoring and see what we can do to simplify your code. We’ll rewrite, delete, rename, and rearrange code until no more improvements can be made.
Simplify your algorithms
Python need not be so verbose. It is usually a code smell when you have explicit loops operating over lists and dicts in Python, rather than using list comprehensions and functions that operate on containers as a whole.
Use defaultdict to store character counts
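As a minimal sketch of the counting step, assuming Python 3 (count_chars is a hypothetical name, not a function from your code):

```python
from collections import defaultdict

def count_chars(text):
    """Count how often each character occurs in text."""
    counts = defaultdict(int)   # a missing key starts out at 0
    for ch in text:
        counts[ch] += 1         # no if/else branch needed
    return counts

counts = count_chars("%%$@$^_#")
print(counts["%"])  # 2
print(counts["@"])  # 1
```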
A defaultdict(int) will generate entries when they are accessed if they do not exist. This lets us eliminate the if/else branch when counting characters.
Sorting dicts
In the Python 2 your code targets (and in Python 3 before 3.7), dictionaries don’t guarantee any ordering on their keys. You cannot assume that the items are stored in the same order that you insert them. So sorting the dict entries and then putting them right back into another dict just scrambles them right back up.
This means that your sortCharacterDictionary function is basically a no-op. After you sort the items, you will need to keep them as a list of tuples to retain their sorting order. Once the code that rebuilds the dict is removed, the function reduces to a single line.
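A one-line version might look like the sketch below. It assumes Python 3, where items() takes the place of the iteritems() call in your code, and sort_by_count is a hypothetical name:

```python
from operator import itemgetter

def sort_by_count(counts):
    """Return (character, count) pairs ordered from rarest to most common."""
    # sorted() returns a list, so the ordering is kept.
    return sorted(counts.items(), key=itemgetter(1))

pairs = sort_by_count({"a": 3, "b": 1, "c": 2})
print(pairs)  # [('b', 1), ('c', 2), ('a', 3)]
```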
Inverting dicts
Given the previous comment you won’t actually have a dict any more after sorting. But assuming you did, this function is one of those cases where explicit looping is discouraged. In Python, always be thinking how you can operate over collections all at once rather than one item at a time.
All in one line we (1) iterate over the key/value pairs in the dict; (2) switch them and create inverted value/key tuples; (3) create a dict out of these inverted tuples.
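A sketch of such a one-liner, assuming Python 3 and a hypothetical function name invert_dict:

```python
def invert_dict(d):
    """Swap keys and values; on a clash, a later pair overwrites an earlier one."""
    return dict((value, key) for key, value in d.items())

print(invert_dict({"a": 1, "b": 2}))  # {1: 'a', 2: 'b'}
```

Unlike the list-building loop in your version, this keeps only one key per value: if two characters share a count, only one of them survives. That is acceptable here only if exactly one character has the count you look up.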
Comment and name wisely
Your method names are long and descriptive. There’s no need to repeat the same information in comments. Use comments only when your code isn’t self-descriptive, such as when you have a complex algorithm or an unusual construct that isn’t immediately obvious.
On the naming front, your names are unnecessarily long. I would use shorter names and also make them more generic. Instead of inverseSortedCharacterDictionary, try just invertedDict. That’s all that method does: it inverts a dict. It doesn’t actually matter if it’s passed a sorted character dict or any other type of dict.
As a rule of thumb, try to use the most generic names possible so that your methods and variables can be as generic as possible. More generic means more reusable.
Reduce volume
Using temporary variables and helper methods is a good programming practice, and I applaud you for doing so in your program. However, now that we have made them simple enough that each one is only one or two lines, we probably don’t even need them any more.
Here’s your program body after changing the functions as above:
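A sketch of what the program could look like at this stage, assuming Python 3. The helper names (count_chars, sort_by_count, invert) are hypothetical, and an in-memory io.StringIO stands in for funkymess.txt so the example runs on its own:

```python
import io
from collections import defaultdict
from operator import itemgetter

def count_chars(lines):
    counts = defaultdict(int)
    for line in lines:
        for ch in line.rstrip("\n"):
            counts[ch] += 1
    return counts

def sort_by_count(counts):
    return sorted(counts.items(), key=itemgetter(1))

def invert(pairs):
    return dict((count, ch) for ch, count in pairs)

# Stand-in for the real input file (an assumption made for this example).
f = io.StringIO("%%$@$\n@a\n")
counts = count_chars(f)
f.close()

sorted_pairs = sort_by_count(counts)
inverted = invert(sorted_pairs)
print(inverted[1])  # a
```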
And then let’s just go ahead and inline those helper methods since they’re so simple. Here’s the final program after all the refactoring:
Final program
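One way the fully inlined program might read, assuming Python 3. An in-memory io.StringIO stands in for the real file so the sketch is self-contained; to run it against the puzzle input, open your own funkymess.txt instead:

```python
import io
from collections import defaultdict
from operator import itemgetter

# Stand-in for the opened input file (an assumption made for this example).
f = io.StringIO("%%$@$\n@a\n")

counts = defaultdict(int)
for line in f:
    for ch in line.rstrip("\n"):
        counts[ch] += 1
f.close()

# Sort (character, count) pairs by count, then flip them into count -> character.
sorted_pairs = sorted(counts.items(), key=itemgetter(1))
inverted = dict((count, ch) for ch, count in sorted_pairs)

print(inverted[1])  # a
```

As in your original, this relies on the assumption that the rarest character occurs exactly once.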