The Archive Base


Editorial Team
Asked: June 11, 2026

I am putting together the last pattern for my flex scanner for parsing AWK


I am putting together the last pattern for my flex scanner for parsing AWK source code.

I cannot figure out how to match the regular expressions used in the AWK source code as seen below:

{if ($0 ~ /^\/\// ){ #Match for "//" (Comment)

or more simply:

else if ($0 ~ /^Department/){

where the AWK regular expression is encapsulated within “/ /”.

All of the Flex patterns I have tried so far match my entire input file. I have tried changing the precedence of the regex pattern, but have had no luck. Help would be greatly appreciated!!


1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 7:05 pm

    Regexing regexen must be a meme somewhere. Anyway, let’s give it a try.

    A gawk regex consists of:

    • /

    • any number of regex components

    • /

    A regex component (simplified form — Note 1) is one of the following:

    • any character other than /, [ or \

    • a \ followed by any single character (we won’t get into linefeeds just now, though)

    • a character class (see below)

    Up to here it’s easy. Now for the fun part.

    A character class is:

    • [ or [^ or [] or [^] (Note 2)

    • any number of character class components

    • ]

    A character class component is (theoretically, but see below for the gawk bug) one of the following:

    • any single character other than ] or \ (Note 3)

    • a \ followed by any single character

    • a named character class, such as [:alpha:]

    • a collation class

    A named character class (the [:…:] form, not the bracketed character class described above) is: (Note 5)

    • [:

    • a valid class name, which afaik is always a sequence of alpha characters, but it’s maybe safer not to make assumptions.

    • :]

    Collation classes are mostly unimplemented but partially parsed. You could probably ignore them, because it seems that gawk doesn’t get them right yet (Note 4). But for what it’s worth, a collation class is:

    • [.

    • some multicharacter collation character, like ‘ij’ in Dutch locale (I think).

    • .]

    or an equivalence class:

    • [=

    • some character, or maybe also a multicharacter collation character

    • =]

    An important point is that the / in [/] does not terminate the regex. You don’t need to write [\/]. (You don’t need to do anything to implement that. I’m just mentioning it.)
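The grammar above can be assembled piece by piece into one pattern. What follows is only a minimal sketch in Python's `re` syntax, not a flex rule: the names (`delimited`, `ESCAPE`, `CLASS_ITEM`, `BRACKET`, `COMPONENT`, `AWK_REGEX`, `regex_lexeme`) are made up for illustration, newlines are simply excluded (the text leaves linefeeds aside), and the caller is assumed to already know that the `/` at the start position opens a regex rather than being a division operator.

```python
import re

def delimited(p):
    """Pattern for [p ... p], ending at the first 'p]' (p is one punctuation character)."""
    q = re.escape(p)
    return rf"\[{q}(?:[^{q}]|{q}+[^\]{q}])*{q}+\]"

ESCAPE = r"\\."                                  # a \ followed by any single character
INNER = "|".join(delimited(p) for p in ":.=")    # [:alpha:], [.ij.], [=e=]
# A class component: escape pair, inner class, or any character except ] and \
# (newline is excluded too, as an assumption of this sketch).
CLASS_ITEM = rf"(?:{ESCAPE}|{INNER}|[^\]\\\n])"
# A character class: one of the four openers, components, then the closing ].
BRACKET = rf"\[\^?\]?{CLASS_ITEM}*\]"
# A regex component: escape pair, character class, or any character except /, [ and \.
COMPONENT = rf"(?:{ESCAPE}|{BRACKET}|[^/\[\\\n])"
AWK_REGEX = re.compile(rf"/{COMPONENT}*/")

def regex_lexeme(source, start=0):
    """Return the regex literal beginning at source[start], or None."""
    m = AWK_REGEX.match(source, start)
    return m.group(0) if m else None

line = r'{if ($0 ~ /^\/\// ){ #Match for "//" (Comment)'
print(regex_lexeme(line, line.index("/")))       # the lexeme from the question
print(regex_lexeme("/[/]/ { print }"))           # the / inside [] is not a terminator
```

Because a / inside a character class is consumed as a class component, nothing special is needed to keep `[/]` from ending the lexeme. The same structure should carry over to flex definitions, though flex's pattern syntax differs in its details.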


    Note 1:

    Actually, the interpretation of \ and character classes, when we get to them, is a lot more complicated. I’m just describing enough of it for lexing. If you actually want to parse the regexen into their bits and pieces, it’s a lot more irritating.

    For example, you can specify an arbitrary octet with \ddd or \xHH (e.g. \203 or \x4F). However, we don’t need to care, because nothing in the escape sequence is special, so for lexing purposes it doesn’t matter; we’ll get the right end of the lexeme. Similarly, I didn’t bother describing character ranges and the peculiar rules for - inside a character class, nor do I worry about regex metacharacters (){}?*+. at all, since they don’t enter into lexing. You do have to worry about [] because it can implicitly hide a / from terminating the regex. (I once wrote a regex parser which let you hide / inside parenthesized expressions, which I thought was cool, since it cuts down a lot on the kilroy-was-here noise (\/), but nobody else seems to think this is a good idea.)
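To illustrate why the longer escapes need no special treatment, here is a small sketch in Python's `re` syntax. It is deliberately cut down: character classes are left out, newlines are excluded as an assumption, and the names `SIMPLE_REGEX` and `lexeme` are hypothetical.

```python
import re

# Escape pair (\ plus one character), or any plain character other than
# /, [ and \. The digits or hex digits after \2 or \x are just plain characters.
SIMPLE_REGEX = re.compile(r"/(?:\\.|[^/\[\\\n])*/")

def lexeme(source):
    """Return the regex literal at the start of source, or None."""
    m = SIMPLE_REGEX.match(source)
    return m.group(0) if m else None

print(lexeme(r"/\203\x4F/ { print }"))   # ends at the closing /
print(lexeme(r"/a\/b/ { print }"))       # the escaped / does not end the lexeme
```

The scanner only pairs each \ with the character after it; whatever follows is consumed as ordinary text, so the end of the lexeme comes out the same as if the escape had been fully decoded.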


    Note 2:

    Although gawk does \ wrong inside character classes (see Note 3 below), it doesn’t require that you use them, so you can still use Posix behaviour. Posix behaviour is that the ] does not terminate the character class if it is the first character in the character class, possibly following the negating ^. The easiest way to deal with this is to let character classes start with any of the four possible sequences, which is summarized as:

    \[^?]?
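As a quick check of that opener, here is a sketch in Python's `re` syntax, where the ^ and ] have to be escaped. The body of the class is simplified to "anything up to the next ]", with no backslash or [:…:] handling, and the names `CLASS_OPEN`, `POSIX_BRACKET` and `bracket_at_start` are hypothetical.

```python
import re

CLASS_OPEN = re.compile(r"\[\^?\]?")                 # [  [^  []  [^]
POSIX_BRACKET = re.compile(r"\[\^?\]?[^\]\n]*\]")    # opener, body, closing ]

def bracket_at_start(text):
    """Return the bracket expression at the start of text, or None."""
    m = POSIX_BRACKET.match(text)
    return m.group(0) if m else None

for opener in ("[", "[^", "[]", "[^]"):
    assert CLASS_OPEN.fullmatch(opener)

print(bracket_at_start("[]/]x"))    # leading ] is part of the class
print(bracket_at_start("[^]/]x"))   # same, after the negating ^
```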
    

    Note 3:

    gawk differs from Posix ERE’s (Extended Regular Expressions) in that it interprets \ inside a character class as an escape character. Posix mandates that \ loses its special meaning inside character classes. I find it annoying that gawk does this (and so do many other regex libraries, equally annoying.) It’s particularly annoying that the gawk info manual says that Posix requires it to do this, when it actually requires the reverse. But that’s just me. Anyway, in gawk:

    /[\]/]/
    

    is a regular expression which matches either ] or /. In Posix, stripping the enclosing /s out of the way, it would be a regular expression which matches a \ followed by a / followed by a ]. (Both gawk and Posix require that ] not be special when it’s not being treated as a character class terminator.)
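The difference matters for lexing because it changes where the character class, and therefore the lexeme, ends. The sketch below, in Python's `re` syntax, compares the two readings of \ inside a class. It only illustrates the lexing rules described here and makes no claim about what any particular awk does; the names `GAWK_BRACKET`, `POSIX_BRACKET` and `regex_lexeme` are hypothetical, and [:…:] handling is omitted.

```python
import re

GAWK_BRACKET = r"\[\^?\]?(?:\\.|[^\]\\\n])*\]"   # \ escapes the next character
POSIX_BRACKET = r"\[\^?\]?[^\]\n]*\]"            # \ is an ordinary character

def regex_lexeme(bracket, source):
    """Lex a /.../ literal at the start of source using the given class pattern."""
    pattern = re.compile(rf"/(?:\\.|{bracket}|[^/\[\\\n])*/")
    m = pattern.match(source)
    return m.group(0) if m else None

source = r"/[\]/]/ { print }"
print(regex_lexeme(GAWK_BRACKET, source))    # class is [\]/], lexeme runs to the last /
print(regex_lexeme(POSIX_BRACKET, source))   # class is [\], so the next / ends the lexeme
```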


    Note 4:

    There’s a bug in the version of gawk installed on my machine where the regex parser gets confused at the end of a collating class. So it thinks the regex is terminated by the second / in:

    /[[.a.]/]/
    

    although it gets this right:

    /[[:alpha:]/]/
    

    and, of course, putting the slash first always works:

    /[/[:alpha:]]/
    

    Note 5:

    Character classes and collating classes and friends are a bit tricky to parse because they have two-character terminators. “Write a regex to recognize C /* */ comments” used to be a standard interview question, but I suppose it no longer is. Anyway, here’s a solution (for [:…:], but just substitute the other punctuation for : if you want to):

    [[]:([^:]|:*[^]:])*:+[]]   // Yes, I know it's unreadable. Stare at it a while.
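To convince yourself that the pattern stops at the first `:]`, it can be tried out directly. The sketch below transcribes it into Python's `re` syntax, with the brackets escaped and otherwise unchanged; the name `NAMED_CLASS` is hypothetical. Python's engine backtracks where flex does not, so this is only meant for short inputs.

```python
import re

# [:  then any mix of non-colons, or colons followed by something that is
# neither ] nor :  then one or more colons and the closing ].
NAMED_CLASS = re.compile(r"\[:([^:]|:*[^\]:])*:+\]")

def named_class_at_start(text):
    """Return the [:...:] lexeme at the start of text, or None."""
    m = NAMED_CLASS.match(text)
    return m.group(0) if m else None

print(named_class_at_start("[:alpha:]/]"))   # stops at the first :]
print(named_class_at_start("[:a:b:]"))       # a colon not followed by ] stays inside
print(named_class_at_start("[:alpha]"))      # no :] terminator, so no match
```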
    


© 2021 The Archive Base. All Rights Reserved