I’m trying to write a Python script that takes a particular type of file as input.
This file contains information about multiple genes. The information about each gene is spread over several lines, and the number of lines differs from gene to gene. An example:
gene join(373616..374161,1..174)
/locus_tag="AM1_A0001"
/db_xref="GeneID:5685236"
CDS join(373616..374161,1..174)
/locus_tag="AM1_A0001"
/codon_start=1
/transl_table=11
/product="glutathione S-transferase, putative"
/protein_id="YP_001520660.1"
/db_xref="GI:158339653"
/db_xref="GeneID:5685236"
/translation="MKIVSFKICPFVQRVTALLEAKGIDYDIEYIDLSHKPQWFLDLS
PNAQVPILITDDDDVLFESDAIVEFLDEVVGTPLSSDNAVKKAQDRAWSYLATKHYLV
QCSAQRSPDAKTLEERSKKLSKAFGKIKVQLGESRYINGDDLSMVDIAWLPLLHRAAI
IEQYSGYDFLEEFPKVKQWQQHLLSTGIAEKSVPEDFEERFTAFYLAESTCLGQLAKS
KNGEACCGTAECTVDDLGCCA"
gene 241..381
/locus_tag="AM1_A0002"
/db_xref="GeneID:5685411"
CDS 241..381
/locus_tag="AM1_A0002"
/codon_start=1
/transl_table=11
/product="hypothetical protein"
/protein_id="YP_001520661.1"
/db_xref="GI:158339654"
/db_xref="GeneID:5685411"
/translation="MLINPEDKQVEIYRPGQDVELLQSPSTISGADVLPEFSLNLEWI
WR"
gene 388..525
/locus_tag="AM1_A0003"
/db_xref="GeneID:5685412"
CDS 388..525
/locus_tag="AM1_A0003"
/codon_start=1
/transl_table=11
/product="hypothetical protein"
/protein_id="YP_001520662.1"
/db_xref="GI:158339655"
/db_xref="GeneID:5685412"
/translation="MKEAGFSENSRSREGQPKLAKDAAIAKPYLVAMTAELQIMATET
L"
What I want is to create a list of dictionaries, where each dictionary contains the information about one gene, like this:
gene_1 = {"locus": /locus_tag, "product": /product, ...}
gene_2 = {"locus": /locus_tag, "product": /product, ...}
I have no idea how to make Python recognize when one gene (and its dictionary) is finished and the next one should start.
Can someone please help me? Is there a way to do this?
For clarification: I know how to extract the information I want, save it in variables and get it into the dictionary. I just don’t know how to tell Python to create one dictionary per gene.
In case someone is interested in the beginner’s solution I found with the help of the comments I received, here it is:
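The snippet below is an illustrative sketch of one such approach, not necessarily the exact code referred to above. It assumes that every record begins with a line whose first word is `gene`, that qualifier lines start with `/`, and that a quoted value left open on one line continues on the following lines until the closing quote. The function name `parse_genes` and the `"location"` key are hypothetical names chosen for the example.

```python
def parse_genes(lines):
    """Group qualifier lines into one dictionary per gene.

    Assumptions (not confirmed by the file format itself):
    - a line whose first word is "gene" starts a new record;
    - qualifiers look like /name=value or /name="value";
    - a quoted value may span several lines until the closing quote.
    """
    genes = []
    current = None   # dictionary of the gene being filled
    open_key = None  # qualifier whose quoted value is still open

    for raw in lines:
        line = raw.strip()
        if not line:
            continue

        if open_key is not None:
            # Continuation of a multi-line value such as /translation.
            current[open_key] += line.rstrip('"')
            if line.endswith('"'):
                open_key = None
            continue

        parts = line.split(None, 1)
        if parts[0] == "gene":
            # A new gene begins: start a fresh dictionary.
            current = {"location": parts[1] if len(parts) > 1 else ""}
            genes.append(current)
        elif line.startswith("/") and current is not None:
            name, _, value = line[1:].partition("=")
            closed = len(value) > 1 and value.endswith('"')
            if value.startswith('"') and not closed:
                open_key = name
            # Repeated qualifiers (e.g. db_xref) overwrite earlier ones.
            current[name] = value.strip('"')
        # Any other line (e.g. the "CDS" header) is ignored here.

    return genes


# Usage with a file (the path is a placeholder):
# with open("genes.txt") as handle:
#     genes = parse_genes(handle)
```

The key idea is that the dictionary is created when the `gene` line is seen, and every following qualifier is written into that same dictionary until the next `gene` line replaces it. Because the `CDS` block repeats `locus_tag` and `db_xref`, the last value seen wins in this sketch; collecting repeated qualifiers into a list would be a straightforward variation.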