I need some help with text manipulation.
I have data like this:
29554 31109 “ENSG00000243485.1” 1555
29554 31097 “ENSG00000243485.1” 1543
29554 30039 “ENSG00000243485.1” 485
30564 30667 “ENSG00000243485.1” 103
30267 30667 “ENSG00000243485.1” 400
30976 31109 “ENSG00000243485.1” 133
89295 133566 “ENSG00000238009.2” 44271
89295 120932 “ENSG00000238009.2” 31637
120775 120932 “ENSG00000238009.2” 157
112700 112804 “ENSG00000238009.2” 104
92091 92240 “ENSG00000238009.2” 149
28269867 28269929 “ENSG00000248451.1” 62
28270383 28270486 “ENSG00000248451.1” 103
28273195 28273372 “ENSG00000248451.1” 177
28275308 28275354 “ENSG00000248451.1” 46
…………………
I have to print the line with the biggest value per group.
The group name is in column 4 and the values are in column 5.
As I imagine it, it should go like this:
1. Separating groups from each other;
2. Selecting biggest value;
3. Printing the whole line.
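The three steps above could be sketched in awk roughly as follows. This is only a minimal sketch under some assumptions: the function name `max_per_group` is made up for illustration, the value is taken to be the last whitespace-separated field and the group name the field just before it (which fits the sample as shown, whatever the exact column count), and ties keep the first line seen.

```shell
# Hypothetical helper: print, for each group, the line with the largest value.
# Assumptions: value = last field, group name = second-to-last field.
max_per_group() {
  awk '
    NF < 2 { next }                       # skip lines without both fields
    {
      group = $(NF - 1)
      value = $NF + 0                     # force numeric comparison
      if (!(group in max)) {
        order[++n] = group                # remember first-seen order of groups
      }
      if (!(group in max) || value > max[group]) {
        max[group]  = value               # biggest value so far for this group
        line[group] = $0                  # keep the whole line for printing
      }
    }
    END {
      for (i = 1; i <= n; i++) print line[order[i]]
    }
  ' "$@"
}
```

It could be called as `max_per_group data.txt` (with `data.txt` standing in for the real file name) or fed from a pipe. Groups are printed in the order they first appear in the input, so the input does not need to be sorted.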
Preferred output for the example should be:
29554 31109 “ENSG00000243485.1” 1555
89295 133566 “ENSG00000238009.2” 44271
28273195 28273372 “ENSG00000248451.1” 177
Hope someone could help me with this in awk or sed.
This should do it in bash and awk: