I have already read the text file ‘gos.txt’ into a cell array using the following code:
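s = {};                      % cell array that collects one line per row
fid = fopen('gos.txt');
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not a char) at end of file
    s = [s; {tline}];        % append the current line as a new row
    tline = fgetl(fid);
end
fclose(fid);                 % release the file handle when done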
I got the result as follows:
s =
'[Term]'
'id: GO:0008150'
'name: biological_process'
'namespace: biological_process'
'alt_id: GO:0000004'
'alt_id: GO:0007582'
[1x243 char]
[1x445 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_yeast'
'subset: gosubset_prok'
'synonym: "biological process" EXACT []'
'synonym: "biological process unknown" NARROW []'
'synonym: "physiological process" EXACT []'
'xref: Wikipedia:Biological_process'
'[Term]'
'id: GO:0016740'
'name: transferase activity'
'namespace: molecular_function'
[1x326 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_metagenomics'
'subset: goslim_pir'
'subset: goslim_plant'
'subset: gosubset_prok'
'xref: EC:2'
'xref: Reactome:REACT_25050 "Molybdenum ion transfer onto molybdopterin, Homo sapiens"'
'//is_a: GO:0003674 ! molecular_function'
'is_a: GO:0008150 ! molecular_function (added by Zaid, To be Removed Later)'
'//relationship: part_of GO:0008150 ! biological_process'
'[Term]'
'id: GO:0016787'
'name: hydrolase activity'
'namespace: molecular_function'
[1x186 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'subset: goslim_metagenomics'
'subset: goslim_plant'
'subset: gosubset_prok'
'xref: EC:3'
'//is_a: GO:0003674 ! molecular_function'
'is_a: GO:0016740 ! molecular_function (added by Zaid, to be removed later)'
'relationship: part_of GO:0008150 ! biological_process'
'[Term]'
'id: GO:0006810'
'name: transport'
'namespace: biological_process'
'alt_id: GO:0015457'
'alt_id: GO:0015460'
[1x255 char]
'subset: goslim_aspergillus'
'subset: goslim_candida'
'synonym: "small molecule transport" NARROW []'
'synonym: "solute:solute exchange" NARROW []'
'synonym: "transport accessory protein activity" RELATED [GOC:mah]'
'is_a: GO:0016787 ! biological_process'
'relationship: part_of GO:0008150 ! biological_process'
.
.
.
.
The next step is to pick out certain strings and put them in a vector. For example, I want to collect all lines that contain ‘id: GO:*******’ in one vector, and all lines that contain ‘is_a: GO:*******’ in another. Note that I don’t want the characters that come after that on the same line.
You can easily use regexp here; it works on cell arrays of strings. Matching every string in s against the pattern ^id: GO and keeping only the non-empty results extracts all lines that start with id: GO. The cellfun call that tests each regexp result for emptiness gives you, on its own, a vector of 0/1, where 1 means that the corresponding string in s matches your query.
A similar call finds the lines that contain is_a: GO:. Cutting unnecessary characters from the strings can also be done with regexp.
Extracting parts of the strings can be done using the 'tokens' option of regexp.
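The approach described above might look roughly like the following sketch. The original code of the answer is not preserved here, so the exact patterns and the variable names (isId, idLines, isaLines, ids, isaIds) are assumptions chosen for illustration, and the sample cell array only stands in for the s read from gos.txt.

```matlab
% Sample lines standing in for the cell array s read from gos.txt
s = {'[Term]'
     'id: GO:0016740'
     'name: transferase activity'
     '//is_a: GO:0003674 ! molecular_function'
     'is_a: GO:0008150 ! molecular_function (added by Zaid, To be Removed Later)'
     '[Term]'
     'id: GO:0016787'
     'is_a: GO:0016740 ! molecular_function (added by Zaid, to be removed later)'};

% regexp on a cell array returns one result per string; a non-empty
% result means the pattern matched, so this yields a logical 0/1 vector
isId = ~cellfun(@isempty, regexp(s, '^id: GO:'));
idLines = s(isId);           % lines that start with 'id: GO:'

% Same idea for 'is_a: GO:'. Anchoring with ^ skips the commented-out
% '//is_a:' lines; drop the ^ to match them as well (assumed preference)
isIsA = ~cellfun(@isempty, regexp(s, '^is_a: GO:'));
isaLines = s(isIsA);

% 'tokens' returns only the parenthesised part of the match, which cuts
% off everything after the GO identifier. With 'once', each result is a
% 1x1 cell holding the token, so t{1} unwraps it.
idTok  = regexp(idLines,  '^id: (GO:\d+)',   'tokens', 'once');
isaTok = regexp(isaLines, '^is_a: (GO:\d+)', 'tokens', 'once');
ids    = cellfun(@(t) t{1}, idTok,  'UniformOutput', false);
isaIds = cellfun(@(t) t{1}, isaTok, 'UniformOutput', false);
```

Here ids and isaIds are column cell arrays of strings such as 'GO:0016740'. The tokens are taken from the already filtered lines so that every string is known to match, which avoids having to handle empty results.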