
The Archive Base

Editorial Team
Asked: June 1, 2026


I’m trying to insert a space after each semi-colon, unless the semi-colon is part of an HTML entity. The examples here are short, but my strings can be quite long, with several semi-colons (or none).

Coca&#8209;Cola => Coca&#8209;Cola  (&#8209; is a non-breaking hyphen)
Beverage;Food;Music => Beverage; Food; Music

I found the following regular expression that does the trick for short strings:

<?php
$a[] = 'Coca&#8209;Cola';
$a[] = 'Beverage;Food;Music';
$regexp = '/(?:&#?\w+;|[^;])+/';
foreach ($a as $str) {
    echo ltrim(preg_replace($regexp, ' $0', $str)).'<br>';
}
?>

However, if the string is somewhat large, the preg_replace above crashes my Apache server (the browser reports "The connection to the server was reset while the page was loading."). To reproduce it, add the following to the sample code above:

$a[] = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '.
   'In blandit metus arcu. Fusce eu orci nulla, in interdum risus. '.
   'Maecenas ut velit turpis, eu pretium libero. Integer molestie '.
   'faucibus magna sagittis posuere. Morbi volutpat luctus turpis, '.
   'in pretium augue pellentesque quis. Cras tempor, sem suscipit '.
   'dapibus lacinia, dolor sapien ultrices est, eget laoreet nibh '.
   'ligula at massa. Cum sociis natoque penatibus et magnis dis '.
   'parturient montes, nascetur ridiculus mus. Phasellus nulla '.
   'dolor, placerat non sem. Proin tempor tempus erat, facilisis '.
   'euismod lectus pharetra vel. Etiam faucibus, lectus a '.
   'scelerisque dignissim, odio turpis commodo massa, vitae '.
   'tincidunt ante sapien non neque. Proin eleifend, lacus et '.
   'luctus pellentesque;odio felis.';

The code above (with the large string) crashes Apache but works if I run PHP on the command line.

Elsewhere in my program I use preg_replace on much larger strings without problem, so I’m guessing something in the regular expression overwhelms PHP/Apache.
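One way to check that guess, sketched here under the assumption that the crash is PCRE running out of stack (not something else in the Apache setup), is to lower `pcre.recursion_limit` so PCRE gives up cleanly, then inspect the result: `preg_replace()` returns `NULL` on failure and `preg_last_error()` reports why. The helper name `safe_replace` and the limit of 10000 are illustrative assumptions, not tuned values.

```php
<?php
// Hypothetical helper: run preg_replace() and report PCRE failures
// instead of silently getting NULL back.
function safe_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() is available from PHP 5.2.0 onwards.
        $code = preg_last_error();
        if ($code === PREG_RECURSION_LIMIT_ERROR) {
            echo "PCRE recursion limit reached\n";
        } elseif ($code === PREG_BACKTRACK_LIMIT_ERROR) {
            echo "PCRE backtrack limit reached\n";
        } else {
            echo "PCRE error code $code\n";
        }
    }
    return $result;
}

// Assumed value: low enough that PCRE stops before the Apache worker
// thread's stack overflows. Tune it for your own server.
ini_set('pcre.recursion_limit', '10000');

$regexp = '/(?:&#?\w+;|[^;])+/';
$out = safe_replace($regexp, ' $0', 'Beverage;Food;Music');
if ($out !== null) {
    echo ltrim($out), "\n"; // Beverage; Food; Music
}
```

If the large string now prints "PCRE recursion limit reached" under Apache instead of resetting the connection, the regex is the culprit.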

So, is there a way to ‘fix’ the regex so that it works on Apache with large strings, or is there another, safer way to do this?

I’m using PHP 5.2.17 with Apache 2.0.64 on Windows XP SP3, if it’s any help. (Unfortunately, upgrading either PHP or Apache is not an option for now.)

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 6:58 am

    I would suggest this match expression:

    \b(?<!&)(?<!&#)\w+;
    

    …which matches a run of word characters (letters, digits, and underscore) that is not preceded by an ampersand (or by an ampersand followed by a hash symbol) and that is followed by a semicolon.
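To see exactly which substrings the pattern picks out, here is a quick sketch using `preg_match_all()`. The helper name `semicolon_words` and the sample strings are illustrative, not part of the answer.

```php
<?php
// Hypothetical helper: list every substring the answer's pattern matches.
function semicolon_words($str)
{
    // \b       word boundary
    // (?<!&)   not right after "&"  (skips named entities such as &amp;)
    // (?<!&#)  not right after "&#" (skips numeric entities such as &#8209;)
    // \w+;     word characters followed by a semicolon
    preg_match_all('/\b(?<!&)(?<!&#)\w+;/', $str, $matches);
    return $matches[0];
}

$samples = array('Coca&#8209;Cola', 'Beverage;Food;Music', 'a &amp; b;c');
foreach ($samples as $s) {
    echo $s, ' => [', implode(', ', semicolon_words($s)), "]\n";
}
```

For these samples the entity string yields no matches, the list yields `Beverage;` and `Food;`, and the last string yields only `b;`, because `amp;` sits right after an ampersand.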

    It breaks down as follows:

    \b          # assert that this is a word boundary
    (?<!        # look behind and assert that you cannot match
     &          # an ampersand
    )           # end lookbehind
    (?<!        # look behind and assert that you cannot match
     &#         # an ampersand followed by a hash symbol
    )           # end lookbehind
    \w+         # match one or more word characters
    ;           # match a semicolon
    

    Replace each match with the string '$0 ' (the whole match followed by a space).
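Put together, the replacement might look like the sketch below; `add_spaces` is a hypothetical wrapper name.

```php
<?php
// Insert a space after each "word;" that is not part of an HTML entity,
// using the answer's pattern and the replacement '$0 '.
function add_spaces($str)
{
    return preg_replace('/\b(?<!&)(?<!&#)\w+;/', '$0 ', $str);
}

echo add_spaces('Coca&#8209;Cola'), "\n";      // Coca&#8209;Cola (unchanged)
echo add_spaces('Beverage;Food;Music'), "\n";  // Beverage; Food; Music
```

Unlike the original approach, no ltrim() is needed. Two things to be aware of: a semicolon at the very end of the string also gains a trailing space, and a semicolon directly after punctuation (as in `x);y`) is left alone, because the pattern needs a word character immediately before the semicolon.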

    Let me know if this doesn’t work for you.

    Of course, you could also use [a-zA-Z0-9] instead of \w to avoid matching an underscore, but I don’t think that would ever give you any trouble.
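The sketch below compares the two character classes on an illustrative input. One subtlety: `\b` still uses PCRE's own definition of a word character (which includes `_`), so the alphanumeric version does not match the part after the underscore; it skips such a word entirely.

```php
<?php
// The only difference between \w and [a-zA-Z0-9] is the underscore.
$withWord  = '/\b(?<!&)(?<!&#)\w+;/';
$alnumOnly = '/\b(?<!&)(?<!&#)[a-zA-Z0-9]+;/';

$s = 'snake_case;next';
echo preg_replace($withWord, '$0 ', $s), "\n";   // snake_case; next
echo preg_replace($alnumOnly, '$0 ', $s), "\n";  // snake_case;next
```

Since HTML entity names never contain underscores, either class keeps entities intact; they only differ on text like the sample above.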

    Also, if you use the x (extended) modifier, you need to escape the hash symbol, because in that mode an unescaped # starts a comment (escaping it is harmless otherwise), like so:

    \b(?<!&)(?<!&\#)\w+;
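Here is a sketch of the same pattern written in extended mode, so the breakdown comments can live inside the regex. In this mode the `#` in `&#` has to be escaped; left bare, it would start a comment and leave the lookbehind unclosed.

```php
<?php
// The answer's pattern in x (extended) mode: whitespace is ignored and
// an unescaped "#" starts a comment, so "&#" is written as "&\#".
$pattern = '/
    \b          # word boundary
    (?<!&)      # not preceded by an ampersand
    (?<!&\#)    # not preceded by an ampersand and hash (hash escaped)
    \w+         # one or more word characters
    ;           # followed by a semicolon
/x';

echo preg_replace($pattern, '$0 ', 'Beverage;Food;Music'), "\n"; // Beverage; Food; Music
```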
    

    EDIT: Not sure, but I’m guessing that putting the word boundary at the beginning makes the match a bit more efficient (and thus less likely to crash your server), so I changed that in the expressions and the breakdown.

    EDIT 2: A bit more background on why your expression might be crashing your server: Catastrophic Backtracking. I’m not sure it applies here, but it’s good information nonetheless.
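If you would rather keep your original run-matching approach, here is a sketch of a rewrite. It assumes (not confirmed here) that the crash comes from PCRE recursing once per repetition of the group in `(?:&#?\w+;|[^;])+`, which repeats once per character. Letting each repetition consume a whole run of ordinary characters makes the repetition count track the number of entities and stray ampersands instead of the string length; the atomic group `(?>...)` also stops the engine from revisiting alternatives it has already chosen. The helper name `spaced` is hypothetical.

```php
<?php
// Original: /(?:&#?\w+;|[^;])+/  one group repetition per character.
// Rewrite:  each repetition consumes an entity, a run of characters that
//           are neither ";" nor "&", or a single stray "&".
$regexp = '/(?>&#?\w+;|[^;&]+|&)+/';

function spaced($regexp, $str)
{
    // Prefix each run with a space, then drop the leading one.
    return ltrim(preg_replace($regexp, ' $0', $str));
}

echo spaced($regexp, 'Coca&#8209;Cola'), "\n";      // Coca&#8209;Cola
echo spaced($regexp, 'Beverage;Food;Music'), "\n";  // Beverage; Food; Music
```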

    FINAL EDIT: If you only want to add a space after a semicolon when there is no whitespace after it already (i.e. add one in the case of pellentesque;odio but not in the case of pellentesque; odio), add a negative lookahead at the end, which prevents unnecessary extra spaces:

    \b(?<!&)(?<!&\#)\w+;(?!\s)
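A short sketch of the final version applied to the two cases mentioned above; `add_missing_spaces` is a hypothetical wrapper name.

```php
<?php
// Only add a space when the semicolon is not already followed by whitespace.
function add_missing_spaces($str)
{
    return preg_replace('/\b(?<!&)(?<!&\#)\w+;(?!\s)/', '$0 ', $str);
}

echo add_missing_spaces('pellentesque;odio felis.'), "\n";   // pellentesque; odio felis.
echo add_missing_spaces('pellentesque; odio felis.'), "\n";  // pellentesque; odio felis.
```

At the very end of the string the lookahead `(?!\s)` still succeeds, so a trailing semicolon gains a trailing space; using `(?!\s|$)` instead would skip that case if it matters.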
    

Related Questions

I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
Basically, what I'm trying to create is a page of div tags, each has
link Im having trouble converting the html entites into html characters, (&# 8217;) i
I am trying to understand how to use SyndicationItem to display feed which is
I'm new to using the Perl treebuilder module for HTML parsing and can't figure
For some reason, after submitting a string like this Jack’s Spindle from a text
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
I am trying to render a haml file in a javascript response like so:
I have this code to decode numeric html entities to the UTF8 equivalent character.
I'm parsing an RSS feed that has an &#8217; in it. SimpleXML turns this
