I am a beginner in Python (I am a biologist). I have a file containing the results from a particular piece of software, and I would like to parse it using Python. From the following output, I would like to extract just the score and split each sequence into individual amino acids.
no. score Sequence
1 0.273778 FFHH-YYFLHRRRKKCCNNN-CCCK---HQQ---HHKKHV-FGGGE-EDDEDEEEEEEEE-EE--
2 0.394647 IIVVIVVVVIVVVVVVVVVV-CCCVA-IVVI--LIIIIIIIIYYYA-AVVVVVVVAAAAV-AST-
3 0.456667 FIVVIVVVVIXXXXIGGGGT-CCCCAV -------------IVBBB-AAAAAA--------AAAA-
4 0.407581 MMLMILLLLMVVAIILLIII-LLLIVLLAVVVVVAAAVAAVAIIII-ILIIIIIILVIMKKMLA-
5 0.331761 AANSRQSNAAQRRQCSNNNR-RALERGGMFFRRKQNNQKQKKHHHY-FYFYYSNNWWFFFFFFR-
6 0.452381 EEEEDEEEEEEEEEEEEEEE-EEEEESSTSTTTAEEEEEEEEEEEE-EEEEEEEEEEEEEEEEE-
7 0.460385 LLLLLLLLMMIIILLLIIII-IIILLVILMMEEFLLLLILIVLLLM-LLLLLLLLLLVILLLVL-
8 0.438680 ILILLVVVVILVVVLQLLMM-QKQLIVVLLVIIMLLLLMLLSIIIS-SMMMILFFLLILIIVVL-
9 0.393291 QQQDEEEQAAEEEDEKGSSD-QQEQDDQDEEAAAHQLESSATVVQR-QQQQQVVYTHSTVTTTE-
From the above table, I would like to get tables with the same number and score, but with the sequences split into individual amino acids (column-wise), so the first one should look like:
no. score amino acid(1st column)
1 0.273778 F
2 0.394647 I
3 0.456667 F
Then another table representing the second column of amino acids:
no score amino acid (2nd column)
1 0.273778 F
2 0.394647 I
3 0.456667 I
A third table would represent the third column of amino acids, a fourth table the fourth column, and so on.
Thanks in advance for the help
From your example I guess that:
Here is my code sample; it reads data from input.dat and writes results to result-column-<number>.dat:
Notable functions used in this example:
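The approach described above (read input.dat, then write one result-column-<number>.dat file per sequence position) could be sketched roughly as follows. This is only a minimal sketch, not necessarily the code originally posted. The function names (parse_rows, column_tables, write_tables, split_file) are made up for illustration. It assumes the fields are whitespace-separated, that the header is the only line whose first field is not a number, that a stray space inside a sequence is an artifact and can be rejoined, and that gap characters ("-") should be kept as they are.

```python
def parse_rows(lines):
    """Return a list of (number, score, sequence) tuples from the input lines."""
    rows = []
    for line in lines:
        parts = line.split()
        # Skip blank lines and the header (its first field is not a number).
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        # Assumption: whitespace inside the sequence is an artifact, so rejoin it.
        rows.append((parts[0], parts[1], "".join(parts[2:])))
    return rows


def column_tables(rows):
    """Return one table per sequence position: lists of (number, score, residue)."""
    width = max((len(seq) for _, _, seq in rows), default=0)
    tables = []
    for i in range(width):
        # Rows whose sequence is shorter than position i are simply left out.
        tables.append([(no, score, seq[i]) for no, score, seq in rows if i < len(seq)])
    return tables


def write_tables(tables, prefix="result-column-"):
    """Write table k to <prefix>k.dat, numbering the columns from 1."""
    for k, table in enumerate(tables, start=1):
        with open(f"{prefix}{k}.dat", "w") as out:
            out.write("no. score amino_acid\n")
            for no, score, residue in table:
                out.write(f"{no} {score} {residue}\n")


def split_file(in_path="input.dat", prefix="result-column-"):
    """Read in_path and write one output file per sequence column."""
    with open(in_path) as handle:
        rows = parse_rows(handle)
    write_tables(column_tables(rows), prefix)


# Small in-memory demonstration, using the first rows truncated to four residues.
sample = [
    "no. score Sequence",
    "1 0.273778 FFHH",
    "2 0.394647 IIVV",
    "3 0.456667 FIVV",
]
for no, score, residue in column_tables(parse_rows(sample))[0]:
    print(no, score, residue)
```

With the real file in the working directory, calling split_file("input.dat") should produce result-column-1.dat, result-column-2.dat, and so on. Keeping the parsing and the column splitting as separate functions makes it possible to check the tables in memory before anything is written to disk.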