I am new to using regular expressions in Python, and I am having trouble figuring out how to do the following.
I have a bunch of text descriptions stored as strings that look like this:
FX0XST001ALF89 OLIGO: Bacillus_cand1=ATGCGGTTCAAAATGTTATC
FILE:/home/AAFC-AAC/fungs/biodiversity/pipelines/454PipelineOutput/v7_newest_testrun_full/rs75/plate1/FX0XST001.MID13/FX0XST001.MID13.sff.trim.fasta
Project: SAGES SFF: FX0XST001 SFF.MID: FX0XST001.MID13
Plate: 1.1 MID_all: MID13 MID: 13 Sample: BK104
Collector: BK Year: 2008 Week: Year_Week:
Location: Ottawa_ON City: Ottawa Province: ON Crop:
Treatment: Substrate_all: Air Substrate: Air Target: Bacteria
Forward Primer: Bac16S27F Reverse Primer: Bac16S690R Taq: T
I want to be able to extract the fields inside this large string and store them in a database or a similar structure, for example:
Year: 2008
Sample: BK104
Collector: BK
etc...
How can I use regular expressions in Python to achieve this?
I am thinking of using re.search:
import re
match = re.search(r'Sample: \w\w\w\w\w', theTextDescription)
The problem is that the length of the value in each field is different, and I don't know how to take that into account.
You can use \w+ to match one or more word characters, so the pattern works whatever the length of the value. Alternatively, you could store the extracted fields in a dictionary.
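A minimal sketch of both ideas, assuming the description is held in a single Python string (abbreviated here to a few of the lines from the question) and that only the Year, Sample and Collector fields are wanted. extract_fields is a hypothetical helper written for illustration; it is not part of the re module.

```python
import re

# Assumed: part of the description from the question, held in one string.
text = """Project: SAGES SFF: FX0XST001 SFF.MID: FX0XST001.MID13
Plate: 1.1 MID_all: MID13 MID: 13 Sample: BK104
Collector: BK Year: 2008 Week: Year_Week:
Location: Ottawa_ON City: Ottawa Province: ON Crop:"""

# \w+ matches one or more word characters, so the value may be any length.
# The parentheses capture the value so it can be read back with group(1).
match = re.search(r'\bSample:\s*(\w+)', text)
if match:
    print(match.group(1))  # BK104


def extract_fields(text, names):
    """Hypothetical helper: return {name: value} for each field name found."""
    fields = {}
    for name in names:
        # re.escape guards against names containing regex metacharacters;
        # \b stops a name from matching in the middle of a longer word.
        m = re.search(r'\b' + re.escape(name) + r':\s*(\w+)', text)
        if m:
            fields[name] = m.group(1)
    return fields


fields = extract_fields(text, ['Year', 'Sample', 'Collector'])
print(fields)  # {'Year': '2008', 'Sample': 'BK104', 'Collector': 'BK'}
```

Because \w does not match characters such as '.', '/' or '=', a value like Plate: 1.1 is cut short to 1, and FILE:/home/... is not matched at all; using \S+ instead of \w+ captures those values whole. Empty fields such as Week: are a limitation of this sketch: the pattern picks up the next key (Year_Week) as the value. From the dictionary, each key/value pair can then be written to a database.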