I am trying to build a scanner for AWK source code using (F)lex. I have been able to identify AWK keywords, comments, string literals, and digits; however, I am stuck on how to write a regular expression for matching variable names, since these are quite dynamic.
Could someone please help me develop a regular expression for matching AWK variables?
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html provides the definition of the AWK language.
Variables must start with a letter but can otherwise be alphanumeric, without regard to case. The only special character that can be used is an underscore (“_”). I apologize; I am not very experienced with regular expressions in general, let alone regular expressions for Flex.
Thank you for your help.
Alphabetic or underscore to start, followed by zero or more alphanumerics or underscore.
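As a Flex rule, that description could be written roughly as below. This is a minimal sketch, not a complete AWK scanner: the token code NAME and its value are assumptions made up for illustration (a real scanner would normally take token codes from the parser's generated header), and the catch-all rule simply discards everything else.

```lex
%{
/* Sketch only: NAME is a hypothetical token code chosen for this example. */
#include <stdio.h>
enum { NAME = 258 };
%}

%option noyywrap

%%
[A-Za-z_][A-Za-z0-9_]*   { printf("NAME(%s)\n", yytext); return NAME; }
[ \t\r\n]+               { /* skip whitespace */ }
.                        { /* other tokens are omitted from this sketch */ }
%%

int main(void)
{
    /* yylex() returns 0 at end of input; keep scanning until then. */
    while (yylex() != 0)
        ;
    return 0;
}
```

Since keywords such as BEGIN or function also match this pattern, the keyword rules should appear before the identifier rule in the rules section. Flex prefers the longest match and, when two rules match the same text, picks the one listed first.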
Special cases will be fields, which are prefixed by $:
and also
You’ll have to decide how you’re going to deal with those.
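One possible way to deal with fields, sketched below, is to return $ as its own one-character token and let the parser apply it to whatever expression follows, which is how the POSIX grammar treats it. The token codes NAME and NUMBER are hypothetical and exist only for this example; matching a whole field reference in the scanner instead would be an equally valid choice for simple cases.

```lex
%{
/* Sketch only: NAME and NUMBER are hypothetical token codes. */
#include <stdio.h>
enum { NAME = 258, NUMBER = 259 };
%}

%option noyywrap

%%
"$"                      { return '$'; /* field operator, resolved by the parser */ }
[0-9]+                   { return NUMBER; }
[A-Za-z_][A-Za-z0-9_]*   { return NAME; }
[ \t\r\n]+               { /* skip whitespace */ }
.                        { /* other tokens are omitted from this sketch */ }
%%

int main(void)
{
    int tok;
    /* Print each token code with the text that produced it. */
    while ((tok = yylex()) != 0)
        printf("token %d: %s\n", tok, yytext);
    return 0;
}
```

With this split, input such as $1 or $NF reaches the parser as a $ token followed by a NUMBER or NAME token, so the scanner needs no separate rule per kind of field reference.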