For experimental and learning purposes. I was trying to create a sorting algorithm from a hash function that gives a value biased on alphabetical sequence of the string, it then would ideally place it in the right place from that hash. i tryed looking for a hash-biased sorting function but only found one for integers and would be a memory hog if adapted for my purposes.
The reasoning is that theoretically if done right this algorithm can achieve O(n) speeds or nearly so.
So here is what i have worked out in python so far:
letters = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,
'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,
's':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25,
'A':0,'B':1,'C':2,'D':3,'E':4,'F':5,'G':6,'H':7,'I':8,'J':9,
'K':10,'L':11,'M':12,'N':13,'O':14,'P':15,'Q':16,'R':17,
'S':18,'T':19,'U':20,'V':21,'W':22,'X':23,'Y':24,'Z':25}
def sortlist(listToSort):
listLen = len(listToSort)
newlist = []
for i in listToSort:
k = letters[i[0]]
for j in i[1:]:
k = (k*26) + letters[j]
norm = k/pow(26,len(i)) # get a float hash that is normalized(i think thats what it is called)
# 2nd part
idx = int(norm*len(newlist)) # get a general of where it should go
if newlist: #find the right place from idx
if norm < newlist[idx][1]:
while norm < newlist[idx][1] and idx > 0: idx -= 1
if norm > newlist[idx][1]: idx += 1
else:
while norm > newlist[idx][1] and idx < (len(newlist)-1): idx += 1
if norm > newlist[idx][1]: idx += 1
newlist.insert(idx,[i,norm])# put it in the right place with the "norm" to ref later when sorting
return newlist
i think that the 1st part is good, but the 2nd part needs help. so the Qs would be what would be the best way to do something like this or is it even possible to get O(n) time (or near that) out of this?
the testing i did with an 88,000 word list took prob about 5 min, 10,000 took about 30 sec it got a lot worse as the list count went up.
if this idea actually works out then i would recode it in C to get some real speed and optimizations.
The 2nd part is there only because it works – even if slow, and i cant think of a better way to do it for the life of me, i would like to replace it with something that would not have to do the other loops if at all possible.
thank for any advice or ideas that you could give.
On sorting in O(n): you can’t do it generally for all inputs, period. It is simply, fundamentally, mathematically impossible.
Here’s the nice, short information-theoretic proof of impossibility: to sort, you have to be able to distinguish among the n! possible orderings of the input; to do so, you have to get log2(n!) bits of data; to do that, you need to do O(log (n!)) comparisons, which is O(n log n). Any sorting algorithm that claims to run in O(n) is either running on specialized data (e.g. data with a fixed number of bits), or is not correct.
Implementing a sorting algorithm is a good learning exercise, but you may want to stick to existing algorithms until you are comfortable with the concepts and methods commonly employed. It might be rather frustrating otherwise if the algorithm doesn’t work.
Have fun learning!
P.S. Python’s built-in
timsortalgorithm is really good on a lot of real-world data. So, if you need a general sorting algorithm for production code, you can usually rely on.sort/sortedto be fast enough for your needs. (And, if you can understandtimsort, you’ll do better than 90% of the Python-wielding population 🙂