So-called natural sort is meant to address the following problem: when users expect
file1.txt
file2.txt
file3.txt
file10.txt
file11.txt
“usual” sort instead produces:
file1.txt
file10.txt
file11.txt
file2.txt
file3.txt
which is inconvenient and isn’t “natural”.
We recently faced a situation where users complained about this very problem, and we considered employing natural sort. However, the following problem arose. Consider the following set of strings:
file1file100.txt
file2file99.txt
...
file99file2.txt
file100file1.txt
in which there’s more than one identifiable number sequence and those sequences run in opposite directions. How should natural sort deal with such sets (I mean what should the result be, not how to implement that)?
The one that comes first wins, surely.
Usual sort lexicographically sorts filenames as sequences of characters (well, perhaps with special treatment of file extensions, although that might be implemented just by ordering '.' first among characters):
'f', 'i', 'l', 'e', '1', 'f', 'i', 'l', 'e', '1', '0', '0'.
Natural sort lexicographically sorts filenames as sequences of tokens, where each token is either a character or a number:
'f', 'i', 'l', 'e', 1, 'f', 'i', 'l', 'e', 100.
Comparison between characters is normal character order, comparison between numbers is normal integer order, and comparison between a character and a number places numbers before any character (except '.'). Finally you need to break the tie between file1 and file01, so the “numbers” aren’t quite just numbers, they do need to “know” their original representation in case it gets that far.
I’d actually sort of advise against asking the users. If they have a really strong opinion how they want their files sorted then OK, fair enough. Otherwise they might not actually know exactly what they “should” expect, so it makes more sense for an analyst/programmer to figure out what’s “normal” than for a user to do so. Of course you can “ask” them indirectly via usability testing, if it’s a big enough deal to be worth it. I find that if you ask users the wrong questions, they feel pressured to guess answers, and there’s no point coding something arbitrary just because it’s what the user representative thought of on the spot.
Whatever users think the rules should be, chances are what they’ll actually get on with best is whatever their OS does by default when listing files in its file manager, file dialogs, and that sort of thing. So I’d offer them that (or perhaps the closest to that I can code without wasting a lot of their money on minor edge cases), and if they’re still not happy find out why.
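For what it’s worth, here is a rough Python sketch of the token-based comparison described earlier, in case it helps to see the rules written down. The names natural_key and _TOKEN are made up for illustration, and it makes a few assumptions of its own: only ASCII digits count as numbers, '.' sorts before everything else, and ties such as file1 vs file01 are broken by plain string comparison of the original names. It is not what any particular OS does.

```python
import re

# Hypothetical helper: one token is either a run of ASCII digits or a single
# non-digit character.
_TOKEN = re.compile(r"([0-9]+)|([^0-9])")


def natural_key(name):
    """Build a sort key that compares names token by token."""
    tokens = []
    for match in _TOKEN.finditer(name):
        digits, char = match.groups()
        if digits is not None:
            # Numbers compare by integer value and sort before ordinary characters.
            tokens.append((1, int(digits), ""))
        elif char == ".":
            # Assumption: '.' sorts before both numbers and other characters.
            tokens.append((0, 0, char))
        else:
            # Ordinary characters compare by normal character order.
            tokens.append((2, 0, char))
    # The original name is kept as the last key component so that file1 and
    # file01, whose token sequences are equal, still get a deterministic order.
    return (tuple(tokens), name)


names = ["file100file1.txt", "file2file99.txt", "file99file2.txt", "file1file100.txt"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

With this key the first number sequence decides the order, so the set from the question comes out as file1file100.txt, file2file99.txt, file99file2.txt, file100file1.txt; the second number would only matter if the first numbers (and the text between them) were equal.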