Editorial Team
Asked: June 11, 2026

I used the regex given in perlfaq6 to match and remove JavaScript comments, but it results in a segmentation fault when the string is too long. The regex is:

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

Can it be improved to avoid the segmentation fault?

[EDIT]

Long input:

<ent r=\"6\" t=\"259\" w=\"252\" /><ent r=\"6\" t=\"257\" w=\"219\" />

repeated about 1000 times.

  1. Editorial Team
    Added an answer on June 11, 2026 at 12:30 pm

    I suspect the trouble is partly that your ‘C code’ isn’t very much like C code. In C, you can’t have the sequence \" outside a pair of quotes, single or double, for example.

    I adapted the regex to make it readable and wrapped it in a trivial script that slurps its input and applies the regex to it:

    #!/usr/bin/env perl
    
    ### Original regex from PerlFAQ6.
    ### s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
    
    undef $/;  # Slurp input
    
    while (<>)
    {
        print "raw: $_";
    
        s%
            /\*[^*]*\*+([^/*][^*]*\*+)*/    # Simple C comments
         |  //([^\\]|[^\n][\n]?)*?\n        # C++ comments, allowing for backslash-newline continuation
         |  (
                "(\\.|[^"\\])*"             # Double-quoted strings
            |   '(\\.|[^'\\])*'             # Single-quoted characters
            |   .[^/"'\\]*                  # Anything else
            )
         %    defined $3 ? $3 : ""
         %egsx;
    
        print "out: $_";
    }
    

    I took your line of non-C code, and created files data.1, data.2, data.4, data.8, …, data.1024 with the appropriate number of lines in each. I then ran a timing loop.
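A minimal sketch of how such files could be generated, assuming the sample line is exactly the one shown in the question and that data.N holds N copies of it. The helper name `make_data_files` is hypothetical and used only for illustration:

```shell
#!/bin/sh
# Sample line copied from the question; single quotes keep the backslashes literal.
line='<ent r=\"6\" t=\"259\" w=\"252\" /><ent r=\"6\" t=\"257\" w=\"219\" />'

# make_data_files DIR: write DIR/data.N containing N copies of $line
# for N = 1, 2, 4, ..., 1024. (Hypothetical helper, not part of the original answer.)
make_data_files() {
    dir=$1
    n=1
    while [ "$n" -le 1024 ]; do
        i=0
        while [ "$i" -lt "$n" ]; do
            printf '%s\n' "$line"
            i=$((i + 1))
        done > "$dir/data.$n"
        n=$((n * 2))
    done
}
```

Calling `make_data_files .` would then create data.1 through data.1024 in the current directory.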

    $ for x in 1 2 4 8 16 32 64 128 256 512 1024
    > do
    >     echo
    >     echo $x
    >     time perl xx.pl data.$x > /dev/null
    > done
    $
    

    I’ve munged the output to give just the real time for the different file sizes:

       1    0m0.022s
       2    0m0.005s
       4    0m0.007s
       8    0m0.013s
      16    0m0.035s
      32    0m0.130s
      64    0m0.523s
     128    0m2.035s
     256    0m6.756s
     512    0m28.062s
    1024    1m36.134s
    

    I did not get a core dump (Perl 5.16.0 on Mac OS X 10.7.4; 8 GiB main memory). It does begin to take a significant amount of time. While it was running, the process was not growing; during the 1024-line run, it was using about 13 MiB of ‘real’ memory and 23 MiB of ‘virtual’ memory.
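A quick way to read the growth rate off the timings above is to divide each time by the one before it. The sketch below assumes input lines of the form `N XmY.YYYs`, as in the table; `growth_ratios` is a hypothetical helper name:

```shell
#!/bin/sh
# growth_ratios: read "LINES TIME" pairs (TIME like 1m36.134s) on stdin and
# print how much the time grew from each row to the next.
# (Hypothetical helper for illustration only.)
growth_ratios() {
    awk '
    {
        split($2, t, "m")             # "1m36.134s" -> t[1]="1", t[2]="36.134s"
        secs = t[1] * 60 + (t[2] + 0) # numeric conversion drops the trailing "s"
        if (NR > 1 && prev > 0)
            printf "%d -> %d lines: x%.1f\n", prev_n, $1, secs / prev
        prev = secs
        prev_n = $1
    }'
}
```

Fed the rows from 32 lines upward, the ratios fall between about 3.3 and 4.2 each time the input doubles, which is consistent with roughly quadratic growth on this input.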

    I tried Perl 5.10.0 (the oldest version I have compiled on my machine), and it used slightly less ‘real’ memory, essentially the same ‘virtual’ memory, and was noticeably slower (33.3s for 512 lines; 1m 53.9s for 1024 lines).

    Just for comparison purposes, I collected some C code that I had lying around in the test directory to create a file of about 88 KiB, with 3100 lines of which about 200 were comment lines. This compares with the size of the data.1024 file which was about 77 KiB. Processing that took between 10 and 20 milliseconds.

    Summary

    The non-C source you have makes a very nasty test case. Perl shouldn’t crash on it.

    Which version of Perl are you using, and on which platform? How much memory does your machine have? That said, the total amount of memory is unlikely to be the issue (24 MiB is not a problem on most machines that run Perl). If you have a very old version of Perl, the results might be different.


    I also note that the regex does not handle some pathological C comments that a C compiler must handle, such as:

    /\
    \
    * Yes, this is a comment *\
    \
    /
    /\
    \
    / And so is this
    

    Yes, you’d be right to reject any code submitted for review that contained such comments.
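One possible way to cope with such comments, offered as an untested assumption rather than part of the regex above, is to splice backslash-newline pairs before stripping comments, much as a C compiler does in an early translation phase. The helper name `splice_continuations` is hypothetical; the `sed` idiom joins every line ending in a backslash with the line that follows it:

```shell
#!/bin/sh
# splice_continuations: remove each backslash-newline pair from stdin,
# joining continued lines into one. (Hypothetical helper for illustration.)
splice_continuations() {
    # :a          label to loop back to
    # /\\$/N      if the line ends in a backslash, append the next line
    # s/\\\n//    delete the backslash-newline pair
    # ta          if a pair was deleted, check the joined line again
    sed -e ':a' -e '/\\$/N; s/\\\n//; ta'
}
```

Run on the two examples above, this should yield `/* Yes, this is a comment */` and `// And so is this`, which the comment-stripping regex can then recognise. Note that splicing changes line numbers and also joins continuation lines inside macros and strings, so it suits analysis better than rewriting source in place.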
