How can I tokenize strings like these in C:
char str1[] = " property :: content | label ";
char str2[] = "property::content";
char str3[] = "content";
[edit]
I have tried the following:
char str[] = " property :: content | label ";
char *property, *content, *label;
property = strtok(str, "::");
content = strtok(NULL, "|");
label = strtok(NULL, "|");
printf ("%s %s %s\n", property, content, label);
but strtok splits strings on individual characters, so it works for the pipe character that delimits the label. However, the “::” delimiter is a string, not a single character, and I don’t know how to deal with it.
[edit 2]
I also have this code:
char sentence [] = "property :: content | label";
char property [30];
char content [30];
char label [30];
sscanf (sentence, "%s :: %s | %s", property, content, label);
printf ("<span property=\"%s\" content=\"%s\">%s</span>\n", property, content, label);
I’m just wondering how I can dynamically set the size of each char array…
Thanks.
What you need is a basic lexer.
The best way to learn how one works is to pick up a compiler book and read up on lexical analysis.
In short, you need a set of regular expressions, one per token type. You match the input against them and, at each position, take the longest match: the one whose corresponding DFA is in a final state after consuming the most characters.
Alternatively, if every token is separated by whitespace, you can simply use strtok and strcmp to distinguish between special words (such as ::) and the rest of the input.
After the lexical analysis is done, you’d need a parser. I don’t know your application, so your parser could turn out to be really simple; otherwise, this answer might help you get started.