The Archive Base

Editorial Team
Asked: May 26, 2026

Let’s say you have an HTML file with a couple of duplicate scripts, meaning multiple external script tags for the same resource, such as jQuery being loaded three times on the page. Is there an efficient regular expression that can remove the duplicates but keep the first one in place? The duplicates will all have exactly the same src value.

The language is PHP. Here is an example:

Before:

<script src="js/jquery.js" type="text/javascript"></script>
    some content
<script src="js/jquery.js" type="text/javascript"></script>
    more content
<script src="js/jquery.js" type="text/javascript"></script>

After:

<script src="js/jquery.js" type="text/javascript"></script>
    some content
    more content

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 9:15 am

    Disclaimer:

    Many will rightly point out that using regular expressions to parse non-regular languages such as HTML is fraught with peril, and they are correct. The only reliable way to parse these languages is with a parser designed specifically for the task. A regex solution will typically fail on many special cases of subject text, producing both false positives and missed matches.
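
    For comparison, here is a minimal sketch of the parser-based route using PHP's DOM extension. This swaps in a different technique from the regex this answer goes on to present, and it assumes ext-dom is enabled (it usually is). The helper name stripDuplicateScriptsDom is hypothetical.

```php
<?php
// Hypothetical helper: remove SCRIPT elements whose src repeats an earlier one,
// using a real HTML parser instead of a regex. Assumes ext-dom is available.
function stripDuplicateScriptsDom(string $html): string
{
    if ($html === '') {
        return $html; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings (sloppy or HTML5 markup) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first so that removals do not skip elements.
    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $scripts[] = $script;
    }

    $seen = [];
    foreach ($scripts as $script) {
        $src = $script->getAttribute('src'); // '' when there is no src
        if ($src === '') {
            continue; // inline script: leave it alone
        }
        if (isset($seen[$src])) {
            $script->parentNode->removeChild($script); // later duplicate
        } else {
            $seen[$src] = true; // first occurrence stays in place
        }
    }
    return $doc->saveHTML();
}

$s = '<script src="js/jquery.js" type="text/javascript"></script>';
echo stripDuplicateScriptsDom($s . "\n    some content\n" . $s . "\n    more content\n" . $s);
```

    Because saveHTML() re-serializes the whole document, the output may gain a doctype and html/body wrappers and will not be byte-for-byte identical to the input; that is the trade-off for parsing rather than pattern matching.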

    That said…

    If one insists on using regular expressions to process HTML/XML markup, and is aware of the inherent limitations, it is possible to craft a regex solution that minimizes these pitfalls and does a “pretty good” job (depending on the specific requirements of the question). However, correctly handling many of the rare but valid edge cases (for example, HTML tag attributes whose values contain <> angle brackets) frequently requires a regex that is rather complex and not for the faint of heart.
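
    To illustrate the angle-bracket edge case, the snippet below compares a naive [^>]* tag pattern with a quote-aware attribute pattern modeled on the attribute sub-pattern of the function that follows. It is a small demonstration only; the data-note attribute is a made-up example.

```php
<?php
// Made-up tag with a '>' inside a quoted attribute value.
$tag = '<script data-note="a>b" src="js/jquery.js"></script>';

// Naive pattern: [^>]* stops at the first '>', even inside quotes.
preg_match('%<script[^>]*>%i', $tag, $naive);

// Quote-aware pattern: each attribute value is consumed as a whole
// quoted (or unquoted) token, so a '>' inside quotes does not end the tag.
$attr = '\s+[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[\w\-.:]+))?';
preg_match('%<script(?:' . $attr . ')*\s*>%i', $tag, $aware);

echo $naive[0], "\n"; // <script data-note="a>
echo $aware[0], "\n"; // <script data-note="a>b" src="js/jquery.js">
```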

    Understanding the following regex solution requires a fairly deep understanding of the regex language and the underlying mechanics of the regex engine. There are certainly examples of markup that will cause it to fail, but it should do a pretty good job on most typical markup.

    Here is a tested PHP function that removes SCRIPT elements having duplicate SRC attribute values:

    // Strip all SCRIPT elements having duplicate SRC URLs.
    function stripDuplicateScripts($text) {
        $re = '%
            # Match duplicate SCRIPT element having same SRC attribute URL.
            (                   # $1: Everything up to duplicate SCRIPT element.
              <script           # literal start of script open tag
              (?:               # Zero or more attributes before SRC.
                \s+             # Whitespace required before attribute.
                (?!src\b)       # Assert this attribute is not "SRC".
                [\w\-.:]+       # Non-SRC attribute name.
                (?:             # Attribute value is optional.
                  \s*=\s*       # Value separated by =, optional ws.
                  (?:           # Group attribute value alternatives.
                    "[^"]*"     # Either a double quoted value,
                  | \'[^\']*\'  # or a single quoted value,
                  | [\w\-.:]+   # or an unquoted value.
                  )             # End group of value alternatives.
                )?              # Attribute value is optional.
              )*                # Zero or more attributes before SRC.
              \s+               # Whitespace required before SRC attrib.
              src               # Required SRC attribute name.
              \s*=\s*           # Value separated by =, optional ws.
              ([\'"])           # $2: Attrib value opening quote.
              ((?:(?!\2).)+)    # $3: SRC attribute value (a URL).
              \2                # Attrib value closing quote.
              (?:               # Zero or more attributes after SRC.
                \s+             # Whitespace required before attribute.
                [\w\-.:]+       # Attribute name.
                (?:             # Attribute value is optional.
                  \s*=\s*       # Value separated by =, optional ws.
                  (?:           # Group attribute value alternatives.
                    "[^"]*"     # Either a double quoted value,
                  | \'[^\']*\'  # or a single quoted value,
                  | [\w\-.:]+   # or an unquoted value.
                  )             # End group of value alternatives.
                )?              # Attribute value is optional.
              )*                # Zero or more attributes after SRC.
              \s*               # Optional whitespace before tag close.
              >                 # End of SCRIPT open tag.
              </script\s*>      # SCRIPT close tag.
              .*?               # Stuff up to duplicate script element.
            )                   # End $1: Everything up to duplicate SCRIPT.
            <script             # literal start of script open tag
            (?:                 # Zero or more attributes before SRC.
              \s+               # Whitespace required before attribute.
              (?!src\b)         # Assert this attribute is not "SRC".
              [\w\-.:]+         # Non-SRC attribute name.
              (?:               # Attribute value is optional.
                \s*=\s*         # Value separated by =, optional ws.
                (?:             # Group attribute value alternatives.
                  "[^"]*"       # Either a double quoted value,
                | \'[^\']*\'    # or a single quoted value,
                | [\w\-.:]+     # or an unquoted value.
                )               # End group of value alternatives.
              )?                # Attribute value is optional.
            )*                  # Zero or more attributes before SRC.
            \s+                 # Whitespace required before SRC attrib.
            src                 # Required SRC attribute name.
            \s*=\s*             # Value separated by =, optional ws.
            ([\'"])             # $4: Attrib value opening quote.
            \3                  # This script must have duplicate SRC URL.
            \4                  # Attrib value closing quote.
            (?:                 # Zero or more attributes after SRC.
              \s+               # Whitespace required before attribute.
              [\w\-.:]+         # Attribute name.
              (?:               # Attribute value is optional.
                \s*=\s*         # Value separated by =, optional ws.
                (?:             # Group attribute value alternatives.
                  "[^"]*"       # Either a double quoted value,
                | \'[^\']*\'    # or a single quoted value,
                | [\w\-.:]+     # or an unquoted value.
                )               # End group of value alternatives.
              )?                # Attribute value is optional.
            )*                  # Zero or more attributes after SRC.
            \s*                 # Optional whitespace before tag close.
            >                   # End of SCRIPT open tag.
            </script\s*>        # SCRIPT close tag.
            \s*                 # Strip whitespace following duplicate.
            %six';
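        // Repeat until no duplicate SCRIPT elements remain.
        while (preg_match($re, $text)) {
            $result = preg_replace($re, '$1', $text);
            if ($result === null) {  // PCRE error (e.g. backtrack limit hit).
                break;               // Keep the last successfully processed text.
            }
            $text = $result;
        }
        return $text;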
    }
    

    The function above applies a single regex repeatedly until no matches are found. The loop is needed because each match consumes everything from the first SCRIPT element through its duplicate, so a third copy later in the text cannot be matched in the same pass. Although at first glance the regex looks like a monster, it's actually quite straightforward (if you are well versed in regex syntax), and most of the text consists of descriptive comments. The complexity is required to handle the variety of attribute/value formats allowed by HTML. For example, the SCRIPT tags may have any number of attributes before and after the SRC attribute. The SRC attribute value must be quoted, with either single or double quotes. All other attributes may have quoted or unquoted values, or no value at all, and quoted values may contain <> angle brackets.
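
    To see the multi-pass behavior in isolation, here is a deliberately simplified variant of the same capture-and-backreference technique. It is a sketch, not a replacement for the function above: the name stripDuplicateScriptsSimple and the $passes counter are illustrative additions, and its [^>]* shortcuts mean it does not handle > inside quoted attribute values.

```php
<?php
// Simplified sketch of the capture-and-backreference technique.
// Assumption: attributes other than SRC are skipped with [^>]*, so a '>'
// inside a quoted attribute value will break this version.
function stripDuplicateScriptsSimple(string $text, ?int &$passes = null): string
{
    $re = '%
        (                                  # $1: first SCRIPT element and what follows it
          <script\b[^>]*?\ssrc\s*=\s*
          ([\'"])((?:(?!\2).)+)\2          # $2: opening quote, $3: SRC URL
          [^>]*>\s*</script\s*>
          .*?                              # lazily, up to the nearest duplicate
        )
        <script\b[^>]*?\ssrc\s*=\s*
        ([\'"])\3\4                        # $4: quote; backreference demands same URL
        [^>]*>\s*</script\s*>
        \s*                                # drop whitespace after the duplicate
        %six';

    $passes = 0;
    do {
        $result = preg_replace($re, '$1', $text, -1, $count);
        if ($result === null) {            // PCRE error: keep the last good text
            break;
        }
        $text = $result;
        if ($count > 0) {
            $passes++;
        }
    } while ($count > 0);
    return $text;
}

$s = '<script src="js/jquery.js" type="text/javascript"></script>';
$html = $s . "\n    some content\n" . $s . "\n    more content\n" . $s;
echo stripDuplicateScriptsSimple($html, $passes);
echo "\n", $passes, "\n"; // 2: the third copy is only removed on the second pass
```

    As in the full function, the trailing \s* also removes the indentation in front of "more content", so the result differs slightly in whitespace from the question's "After" listing.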

Related Questions

Let say I have some code HTML code: <ul> <li> <h1>Title 1</h1> <p>Text 1</p>
Let's say I have the following models class Photo(models.Model): tags = models.ManyToManyField(Tag) class Tag(models.Model):
Let's say you create a wizard in an HTML form. One button goes back,
Let's say I'm building a data access layer for an application. Typically I have
Let's say you have a class called Customer, which contains the following fields: UserName
Let's say we have a simple function defined in a pseudo language. List<Numbers> SortNumbers(List<Numbers>
Let's say I have a drive such as C:\ , and I want to
Let's say that we have an ARGB color: Color argb = Color.FromARGB(127, 69, 12,
Let's say I have a dataset in an ASP.NET website (.NET 3.5) with 5
Let's say I have a simple Login servlet that checks the passed name and

© 2021 The Archive Base. All Rights Reserved