I am trying to write lex code which will take a string as input, and parse through a long dictionary file to find the longest word in that dictionary which is made up of only the letters in that string. Each letter in the string can be used zero or more times, meaning the word “in” would be valid for “input”. Here is what I have so far:
%{
#include <stdio.h>
%}
%option noyywrap
%%
[input]+ {
    printf("This is the longest I think: %s\n", yytext);
}
.|\n {}
%%
int main(void)
{
    yylex();
    return 0;
}
However, this really does not do what I expect it to do. This code goes through and prints the matching portions of every word in the dictionary, so I get output like “i”, “iu”, “inu”, etc., and these obviously aren’t valid words. Anyone know how to fix this?
You could use the beginning-of-line and end-of-line anchors in your regular expression to require that the entire line match, not just part of it. Try changing your regex from
[input]+
to
^[input]+$
You will then need some separate logic to track the longest string you've found so far, but judging from the code you have above, I think this addresses your question most directly.
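For the tracking part, here is a rough sketch of what that separate logic might look like in C. The names update_longest, report_longest, and MAX_WORD are made up for illustration and are not part of lex. It assumes one word per line and that no word is longer than MAX_WORD - 1 characters.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 256              /* assumed upper bound on word length */

static char longest[MAX_WORD];    /* longest fully matching word seen so far */
static size_t longest_len = 0;    /* its length; 0 means nothing matched yet */

/* Hypothetical helper: remember `word` if it is strictly longer than the
   best one so far. Words that would not fit in the buffer are ignored. */
void update_longest(const char *word, size_t len)
{
    if (len > longest_len && len < MAX_WORD) {
        memcpy(longest, word, len);
        longest[len] = '\0';
        longest_len = len;
    }
}

/* Hypothetical helper: print the result once scanning has finished. */
void report_longest(void)
{
    if (longest_len > 0)
        printf("Longest word: %s\n", longest);
    else
        printf("No matching word found\n");
}
```

With these in the definitions section of the lex file, the action for the anchored pattern would call update_longest(yytext, yyleng) instead of printf, and main would call report_longest() after yylex() returns. Because the comparison is strict, the first of several equally long words is the one that gets kept.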
Hope this helps!