Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 5937343
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 22, 20262026-05-22T15:31:43+00:00 2026-05-22T15:31:43+00:00

I need help from the Regex wizards out there. I am trying to write

  • 0

I need help from the Regex wizards out there. I am trying to write a simple parser that can tokenize the options list of a Snort rule (Snort, the IDS/IPS software). Problem is, I can’t seem to find a workable formula that breaks apart the individual rule options based on their terminating semi-colon. The formulas that I have cooked up grab all options between parenthesis into a single capture group.

I am using the excellent RegExr tool at the GSkinner site with some of the below sample rule options from Emerging Threats (I parsed off the rule header — that’s easy to tokenize):

(msg:"ET DELETED Majestic-12 Spider Bot User-Agent (MJ12bot)"; flow:to_server,established; content:"|0d 0a|User-Agent\: MJ12bot|0d 0a|"; classtype:trojan-activity; reference:url,www.majestic12.co.uk/; reference:url,doc.emergingthreats.net/2003409; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Majestic-12; sid:2003409; rev:4;)
(msg:"ET DELETED Majestic-12 Spider Bot User-Agent Inbound (MJ12bot)"; flow:to_server,established; content:"|0d 0a|User-Agent\: MJ12bot"; classtype:trojan-activity; reference:url,www.majestic12.co.uk/; reference:url,doc.emergingthreats.net/2007762; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Majestic-12; sid:2007762; rev:4;)
(msg:"ET POLICY McAfee Update User Agent (McAfee AutoUpdate)"; flow:to_server,established; content:"User-Agent|3a| "; http_header; nocase; content:"McAfee AutoUpdate"; http_header; pcre:"/User-Agent\x3a[^\n]+McAfee AutoUpdate/i"; classtype:not-suspicious; reference:url,doc.emergingthreats.net/2003381; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_McAffee; sid:2003381; rev:6;)
(msg:"ET DELETED Metacafe.com family filter off"; flow:established,to_server; content:"POST"; http_method; content:"Host|3a| www.metacafe.com"; http_header; fast_pattern:6,16; content:"submit=Continue+-+I%27m+over+18"; classtype:policy-violation; reference:url,doc.emergingthreats.net/2006367; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Metacafe; sid:2006367; rev:7;)

And this is the formula:

([a-zA-Z0-9_:]+(?:[\w\s.,\-/=<>+!\[\]\(\)\{\}\"|\\;'?`~@#$%^&*])+;)

The problem is, it doesn’t handle colons. So two of the rules above will not have their ‘content’ options properly parsed. But on RegExr, each option will be highlighted in blue, including the terminating semi-colon, but NOT the space after the semi-colon. If I fed this into .NET, I should be able to do a Regex.Split and break apart all the tokens correctly.

If I add the colon to the character list, then on RegExr, the entire set of rules will get tokenized as a single blob of text, which is not what I want. Further attempts to tweak the formula result in Adobe Flash crashing, indicating I’m hitting a bug in either Flash or RegExr.

I’ve not ruled out writing my own string tokenizer, but I was hoping regex could save me from dealing with things like counting my open quotations, escaped characters, whitespace, etc.

Snort rule options typically come in the following format:

option:value;
option:"string value";
option:!"negated string value";
option:>num;
option:param1,param2,param3;

But several options tend to have more ‘exotic’ formats for their value, like byte_test. And everyone’s favourite, ‘pcre’, which is basically an option for performing perl-compatible regex’s. So any such tokenizer has to avoid getting confused if it runs into the ‘pcre’ keyword with regex in it.

Thoughts?

Edit:
This below is REALLY close:

([\w]+:?(?:[\x20]|)?(?:[\x00-\xff])*?;)

But, according to RegExr, it gets messed by pcre syntax:

(msg:"ET WEB_SPECIFIC_APPS Horde 3.0.9-3.1.0 Help Viewer Remote PHP Exploit"; flow:established,to_server; content:"/services/help/"; nocase; http_uri; pcre:"/module=[^\;]*\;.*\"/UGi"; classtype:web-application-attack; reference:url,www.milw0rm.com/exploits/1660; reference:cve,2006-1491; reference:bugtraq,17292; reference:url,doc.emergingthreats.net/2002867; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/WEB_SPECIFIC_APPS/WEB_Horde; sid:2002867; rev:9; http_method;)

In the above, every single option is highlighted as a distinct grouping, except ]*\;.*\"/. I would think that \x00-\xff would get it all, but it appears that I am using a lazy match. A greedy match gets everything, including all the spaces between options, which I do not want. So I need to somehow modify the regex to handle tokenizing pcre text.

Edit2:This does the trick:

([\w]+:?(?:[\x20]|)?(?<!\\)\"?.*?(?<!\\)\"?;)

I had to play with a few example regex’s that work with quoted strings. Finally realized that I am staring at negative look-behinds that avoid quotes that are escaped. This seems to solve any other escaped character, too, because escaped characters only appear inside unescaped quotes.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-22T15:31:43+00:00Added an answer on May 22, 2026 at 3:31 pm

    No need for lookaround. Just carefully write the regex to precisely match what you need. This is made much clearer (and easier to maintain) by writing this in verbose free-spacing mode like so: (Although VB.NET syntax makes it awkward to do so)

    Dim RegexObj As New Regex(
        "# Match set of Snort rules enclosed within parentheses." & chr(10) & _
        "\(                              # Literal opening parentheses." & chr(10) & _
        "(?:                             # Group for one or more rules." & chr(10) & _
        "  \w+                           # Required rule name." & chr(10) & _
        "  (?:                           # Group for optional rule value." & chr(10) & _
        "    :                           # Rule name/values separated by :" & chr(10) & _
        "    (?:                         # Group for rule value alternatives." & chr(10) & _
        "      ""                        # Either a double quoted string," & chr(10) & _
        "      [^""\\]*                  # {normal} Use ""Unrolling the Loop""." & chr(10) & _
        "      (?:                       # Begin {(special normal*)*} construct." & chr(10) & _
        "        \\.                     # {special} == escaped anything." & chr(10) & _
        "        [^""\\]*                # More {normal*} non-quote, non-escapes." & chr(10) & _
        "      )*                        # Finish {(special normal*)*} construct." & chr(10) & _
        "      ""                        # Closing quote." & chr(10) & _
        "    | '[^'\\]*(?:\\.[^'\\]*)*'  # or a single quoted string," & chr(10) & _
        "    | [^;]+                     # or one or more non semi-colons." & chr(10) & _
        "    )                           # End group for rule value options." & chr(10) & _
        "  )?                            # Rule value is optional." & chr(10) & _
        "  ; \s*                         # Rule ends with ;, optional ws." & chr(10) & _
        ")+                              # One or more rules." & chr(10) & _
        "\)                              # LiteraL closing parentheses.", 
        RegexOptions.IgnorePatternWhitespace)
    Dim MatchResults As Match = RegexObj.Match(SubjectString)
    While MatchResults.Success
        ' matched text: MatchResults.Value
        ' match start: MatchResults.Index
        ' match length: MatchResults.Length
        MatchResults = MatchResults.NextMatch()
    End While
    

    This regex demonstrates use of Jeffrey Friedl’s “Unrolling the Loop” efficiency technique for correctly matching quoted strings which may contain escaped characters. (See: MRE3)

    Oh yeah, one more thing… Icarus has found you!

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.