I have a text manipulation problem that I need to solve with awk, sed and shell.
My text looks like this:
>Sample_1
100 101
aaattattacaaaaataattacaaattattacaaaaagaattattacaaaaagaattacaaaa
-1.60 .(((((((.....)))))))........................................... []
>Sample_2
1 35
aattattacaaaaagaattattacaaaaagaatta
0.00 ................................... _
>Sample_3
1 123
gctcacacctgtaatcccagcactttgggaggctgagg
-27.80 ((((.....))))......((((((.(((...))))))).)[][][[][]]
-26.40 (((((.((...(((((..((((((....))......... [[][]][]
-25.80 ((((.....)))).....((((((............... [][][][[][]]
123 145
ctgaggcaggcagatcacgaggtcacgagatcaa
-26.20 (((.....)))))) [][][[][]]
-25.90 ....((((..((....)) [][[][]]
-25.70 ..(((..((....))..(()) [[][]][[][]]
145 256
gtaatcccagcactttgggaggctgaggcaggcaga
0.00 ........................................... _
256 342
-25.00 ..((....((((.....((((((...)))....))... [[][]]
-24.00 ..((.((((.((((())... [[][][]]
-23.70 .((((((...(((((..((.. [[][]][]
I want to:
- Extract the sample name (>Sample_1);
- Extract the numeric value that follows the sample name (it’s either 0 or a negative value);
- From each group of negative values (e.g. -27.80; -26.40; -25.80), extract the number that comes first (it’s the most negative value).
Perfect output would look like this:
>Sample_1
-1.60
>Sample_2
0.00
>Sample_3
-27.80
-26.20
0.00
-25.00
I tried to do this in awk by printing $1 and grepping for ‘>’, 0.00 and negative values, but I wasn’t able to split the column into groups and extract the most negative value from each group.
awk '{print $1}' file | egrep -i '>|0.00|-'
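One possible way to get the grouping in awk alone is to treat every coordinate line (two integers, such as 1 123) as the start of a new group and print only the first energy value seen after it. This is a minimal sketch, not a tested solution for every input: it assumes coordinate lines always consist of exactly two integer fields, that energy lines always start with a decimal number such as -27.80 or 0.00, and that the values inside a group are already sorted with the most negative first, as in the sample. The function name first_energy_per_block is made up for illustration.

```shell
# Hypothetical helper: prints each sample header and the first
# energy value of every coordinate block.
first_energy_per_block() {
  awk '
    # Sample header: print it unchanged.
    /^>/ { print; next }

    # Coordinate line (assumed: exactly two integer fields) opens a new group.
    NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { want = 1; next }

    # First energy line in the group (assumed: first field is a decimal number).
    want && $1 ~ /^-?[0-9]+\.[0-9]+$/ { print $1; want = 0 }
  ' "$@"
}
```

Called as first_energy_per_block file, it reads the file; with no argument it reads standard input. Sequence lines are ignored because they match none of the three patterns, so a group without a sequence line (like 256 342 in the sample) should still be handled.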
You tagged your question with sed and awk, but if you’re O.K. with Perl instead, you could write: