I want to sort a text column so that the numeric components are sorted as numbers. The sorted result should look like this:
chr1
chr1,chr1
chr1,chr2
chr1,chr10
chr2
chr2,chr1
chr2,chr2
chr2,chr10
chr6
chr6,chr1
chr6_ux9
chr6_ux9,chr1
chr7
chr10
chr10,chr1
chr10,chr2
chr10,chr10
chr21
chr21,chr1
chr21,chr2
chr21,chr10
chrx
chrx,chr1
chrx,chr2
chrx,chr10
chry
chry,chr1
chry,chr2
chry,chr10
chrmt
chrmt,chr1
chrmt,chr2
chrmt,chr10
chr25
chr25,chr1
chr25,chr2
chr25,chr10
The following rules apply:
- chrx is treated as chr22
- chry is treated as chr23
- chrmt is treated as chr24
- chr6_ux9 is a special case that should come after chr6
I tried different approaches but was not able to find a working solution. Please help if anyone has an idea.
I think I understand now what you are looking for. You want the numeric components sorted as numbers, not as strings. This should work for you:
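As a rough illustration of the sort key this answer describes (in Python, not the original SQL), here is a minimal sketch. The helper name `natural_key` is hypothetical, and replacing `_ux9` with `.5` is an assumption: any value strictly between 6 and 7 would do. Python floats stand in for `real[]`, and Python compares lists element by element with a shorter prefix sorting first, so `[6] < [6, 1] < [6.5] < [7]`.

```python
from typing import List

# Substitutions from the question's rules, plus the space-to-comma fix.
# Assumption: '_ux9' -> '.5' places chr6_ux9 between chr6 and chr7.
SUBSTITUTIONS = (
    ("chrx", "chr22"),
    ("chry", "chr23"),
    ("chrmt", "chr24"),
    ("_ux9", ".5"),
    (" ", ","),
)


def natural_key(value: str) -> List[float]:
    """Turn e.g. 'chr6_ux9,chr1' into [6.5, 1.0] for use as a sort key."""
    for old, new in SUBSTITUTIONS:
        value = value.replace(old, new)
    # The 'chr' text is redundant once the substitutions are done.
    value = value.replace("chr", "")
    # Only comma-separated numbers remain; float stands in for real.
    return [float(part) for part in value.split(",")]


if __name__ == "__main__":
    rows = ["chr10", "chr6_ux9", "chrx,chr1", "chr2,chr10", "chr6", "chr1"]
    print(sorted(rows, key=natural_key))
    # ['chr1', 'chr2,chr10', 'chr6', 'chr6_ux9', 'chr10', 'chrx,chr1']
```

Sorting the values from the question with `sorted(values, key=natural_key)` reproduces the order shown there.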
This sorts the values as depicted in the question. The textual component (‘chr’) turns out to be redundant noise. After applying all replacements, I strip the noise and cast to a numeric array, which can be used in the ORDER BY clause.

Among the listed substitutions, the special case for chr6_ux9 forces the use of real[] instead of the simpler and faster int[], because the integer type leaves no room between 6 and 7. You also have one value with a space instead of a comma. I added a substitution for that, too, but that's probably just a typo. After removing the irrelevant string chr, only comma-separated numbers remain, which can be cast to real[].

BTW, replace() is very fast. I have functions with dozens of replace() operations in a row that still perform fast. (regexp_replace() is much slower.)

Alternative answer for sorting individual elements
For a sorted output of all values as strings:
chr6_ux9 comes after chr6 automatically in this scenario.
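One possible reading of this alternative, sketched in Python rather than SQL: split every value on commas, keep the distinct elements, and sort them by their number (with chrx, chry and chrmt mapped to 22, 23 and 24 as in the question), breaking ties by the string itself. Because `'chr6'` is a prefix of `'chr6_ux9'`, the string tie-break puts chr6_ux9 right after chr6 without any special rule. The names `SPECIAL`, `element_key` and `sorted_elements` are hypothetical.

```python
import re
from typing import List, Tuple

# Numbers for the named chromosomes, following the question's rules.
SPECIAL = {"chrx": 22, "chry": 23, "chrmt": 24}


def element_key(elem: str) -> Tuple[int, str]:
    """Sort key for a single element: (number, original string)."""
    if elem in SPECIAL:
        return (SPECIAL[elem], elem)
    # Assumption: every other element starts with 'chr' followed by digits.
    match = re.match(r"chr(\d+)", elem)
    if match is None:
        raise ValueError(f"unexpected element: {elem!r}")
    # Ties (chr6 vs. chr6_ux9) fall back to plain string comparison.
    return (int(match.group(1)), elem)


def sorted_elements(rows: List[str]) -> List[str]:
    """Split all rows on commas and return the distinct elements in order."""
    elements = {part.strip() for row in rows for part in row.split(",")}
    return sorted(elements, key=element_key)


if __name__ == "__main__":
    print(sorted_elements(["chr10,chr2", "chr6_ux9", "chr6", "chrx,chr1"]))
    # ['chr1', 'chr2', 'chr6', 'chr6_ux9', 'chr10', 'chrx']
```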