I am writing a search engine for the experience and the knowledge. Right now I am building the crawler and its accompanying utilities, one of which is a URL normalizer. I am stuck on a method that takes a URL and capitalizes the letters that follow a ‘%’ sign. My code so far:
def escape_sequence_capitalization(url):
    ''' The method that capitalizes letters in escape sequences.
    All letters within a percent-encoding triplet (e.g. '%2C') are case
    insensitive and should be capitalized.
    '''
    next_encounter = None
    url_list = []
    while True:
        next_encounter = url.find('%')
        if next_encounter == -1:
            break
        for letter in url[:next_encounter]:
            url_list.append(letter)
        new_character = url[next_encounter + 1].upper()
        url_list.append(new_character)
        url = url[next_encounter:]
    for letter in url:
        url_list.append(letter)
    return ''.join(url_list)
Can someone please guide me to where my error is? I would be grateful. Thank you.
EDIT: this is what I am trying to achieve:
http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b
By static analysis, it loops forever because your while True never breaks. So where can it break? Only at the break statement, and only if next_encounter becomes equal to -1; so you can deduce that it never does.
Why doesn’t it? Try a print next_encounter after url.find. You’ll quickly see that url = url[next_encounter:] does almost what you hope it will, only it gives you one character more than you hoped: the slice still starts with the '%' itself, so the next find returns 0 and the loop never advances.
Why did I present it this way? Mostly because the value of print is often underrated by people learning the language.
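For reference, here is one way the loop could be patched once the print has shown you the problem. This is only a sketch, not necessarily what the asker ended up with. It assumes Python 3 and that every '%' in the input starts a triplet, so it also uppercases whatever two characters follow a stray '%'. The second function, escape_sequence_capitalization_re, is a hypothetical name for a regex variant that only touches '%' followed by two hex digits.

```python
import re


def escape_sequence_capitalization(url):
    """Uppercase the two characters that follow every '%' in url."""
    pieces = []
    while True:
        next_encounter = url.find('%')
        if next_encounter == -1:
            break
        # Copy everything before the '%' unchanged.
        pieces.append(url[:next_encounter])
        # Keep the '%' and uppercase the (up to) two characters after it.
        pieces.append('%' + url[next_encounter + 1:next_encounter + 3].upper())
        # Skip past the whole triplet so the next find() makes progress.
        url = url[next_encounter + 3:]
    # Whatever is left contains no '%' and is copied as-is.
    pieces.append(url)
    return ''.join(pieces)


# Assumption: only '%' followed by two hex digits counts as a triplet.
_TRIPLET = re.compile(r'%[0-9a-fA-F]{2}')


def escape_sequence_capitalization_re(url):
    """Regex variant (hypothetical name): uppercase valid triplets only."""
    return _TRIPLET.sub(lambda match: match.group(0).upper(), url)


print(escape_sequence_capitalization('http://www.example.com/a%c2%b1b'))
# prints http://www.example.com/a%C2%B1b
```

The key change from the question's loop is the slice: advancing by next_encounter + 3 drops the '%' and its two digits from the remaining string, so find eventually returns -1 and the break is reached.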