I have data in this format (tab separated columns)…
Name_00001 Annotation_1 gene 329412 330509 . - . ID=TAXON_00029;Name=TAXON_00029
Name_00001 Annotation_1 mRNA 329412 330509 . - . ID=TAXON_00029T0;Parent=TAXON_00029
Name_00001 Annotation_1 exon 193957 194252 . + . ID=exon:TAXON_00022T0:1;Parent=TAXON_00022T0
Name_00001 Annotation_1 exon 195500 197118 . + . ID=exon:TAXON_00022T0:2;Parent=TAXON_00022T0
…and I would like to sort by the first 5 digits in column 9 in ascending order, i.e. so that the file looks like this:
Name_00001 Annotation_1 exon 193957 194252 . + . ID=exon:TAXON_00022T0:1;Parent=TAXON_00022T0
Name_00001 Annotation_1 exon 195500 197118 . + . ID=exon:TAXON_00022T0:2;Parent=TAXON_00022T0
Name_00001 Annotation_1 gene 329412 330509 . - . ID=TAXON_00029;Name=TAXON_00029
Name_00001 Annotation_1 mRNA 329412 330509 . - . ID=TAXON_00029T0;Parent=TAXON_00029
I have tried to use Unix or Perl, but my skills are very limited…
If you really want to sort by the first 5 digits in column 9, that is something you cannot literally explain to sort, but as you suggested, perl is more flexible: it can extract those digits from each line with a regular expression and sort on them numerically.
This assumes the five digits always exist and are always consecutive, but you can take it from there.
It’s better to put the code into a file, add "use warnings; use strict;" and increase legibility.