The Archive Base

Editorial Team
Asked: May 18, 2026


I’m looking for a way to split a string containing HTML into two halves. Requirements:

  • Split a string by a number of chars
  • Must not split in the middle of a word
  • Must not include HTML chars when calculating where to split the string

For example, take the following string:

<p>This is a test string that contains <strong>HTML</strong> tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>

Say I want to split at character position 39, which falls in the middle of the word HTML (not counting the tags). I would want the function to split the string into the following two parts:

<p>This is a test string that contains <strong>HTML</strong></p>

and

<p>tags and text content. This string needs to be split without slicing through the <em>middle</em> of a word and must preserve the validity of the HTML, i.e. not split in the middle of a tag, and make sure closing tags are respected correctly.</p>

Notice that in the two example results above the HTML must stay valid, so the closing </strong> and </p> tags were added to the first half. A starting <p> tag was also added to the second half, because that half ends with a closing </p>.

I found this function on StackOverflow to truncate a string by a number of text chars and preserve HTML, but it only goes halfway to what I need, as I need to split into two halves.

function printTruncated($maxLength, $html)
{
    $printedLength = 0;
    $position = 0;
    $tags = array();

    while ($printedLength < $maxLength && preg_match('{</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;}', $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag.
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + strlen($str) > $maxLength)
        {
            print(substr($str, 0, $maxLength - $printedLength));
            $printedLength = $maxLength;
            break;
        }

        print($str);
        $printedLength += strlen($str);

        if ($tag[0] == '&')
        {
            // Handle the entity.
            print($tag);
            $printedLength++;
        }
        else
        {
            // Handle the tag.
            $tagName = $match[1][0];
            if ($tag[1] == '/')
            {
                // This is a closing tag.

                $openingTag = array_pop($tags);
                assert($openingTag == $tagName); // check that tags are properly nested.

                print($tag);
            }
            else if ($tag[strlen($tag) - 2] == '/')
            {
                // Self-closing tag.
                print($tag);
            }
            else
            {
                // Opening tag.
                print($tag);
                $tags[] = $tagName;
            }
        }

        // Continue after the tag.
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html))
        print(substr($html, $position, $maxLength - $printedLength));

    // Close any open tags.
    while (!empty($tags))
        printf('</%s>', array_pop($tags));
}
1 Answer

  1. Editorial Team added an answer on May 18, 2026 at 6:46 am

    The general rule you’ll be quoted by almost all other answers is “do not process HTML with regex: you can’t capture all the edge cases”.

    I believe this to be quite true.

    If anything in your string is even slightly malformed, even the best-crafted regular expression will still mess it up.

    Setting aside that you want to split some tags and not others (p tags are tags, after all, and you’re looking to split one into two), you may need to rethink the process and get very specific about what you want to achieve. For example: is splitting in the middle of a paragraph tag okay? What about divs? If the middle point is inside a tag, do you want the first string to be longer, or the second?
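    The same "longer first half or longer second half" decision comes up for words. As a small sketch only: the helper below (`wordBoundariesAround` is a made-up name, not an existing function) takes tag-free text and an offset and returns the two candidate split positions, one before and one after the word the offset lands in. It assumes single-byte text and that entities have not been decoded.

```php
<?php
// Hypothetical helper for illustration; byte-based, so multibyte text needs mb_* functions.
function wordBoundariesAround(string $text, int $offset): array
{
    $length = strlen($text);
    $offset = max(0, min($offset, $length));
    $before = $after = $offset;

    // Only move if the offset has word characters on both sides of it.
    $insideWord = $offset > 0 && $offset < $length
        && !ctype_space($text[$offset - 1])
        && !ctype_space($text[$offset]);

    if ($insideWord) {
        // Walk back to the start of the word.
        while ($before > 0 && !ctype_space($text[$before - 1])) {
            $before--;
        }
        // Walk forward to the end of the word.
        while ($after < $length && !ctype_space($text[$after])) {
            $after++;
        }
    }

    return ['before' => $before, 'after' => $after];
}

$sample = strip_tags('<p>This is a test string that contains <strong>HTML</strong> tags.</p>');
$bounds = wordBoundariesAround($sample, 39);
echo substr($sample, 0, $bounds['before']), "|\n"; // shorter first half, ends before "HTML"
echo substr($sample, 0, $bounds['after']), "|\n";  // longer first half, ends after "HTML"
```

    Picking `after` matches the example in the question, where the word HTML stays in the first half.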

    Assuming that splitting paragraph tags is okay but splitting others isn’t, try an approach as follows (no copy-paste code here, sorry):
    * Strip the target string twice: once of all tags, and once of just paragraph tags
    * Find the middle point in the no-tags-at-all string
    * Split the no-tags-at-all string at the first space after the middle point
    * Find the spot in the just-p-tags-stripped string that matches the word or words just after the middle point from the previous step. This should tell you where ‘the middle’ is in the just-p-tags-stripped string when tags are ignored
    * Check to see if you’re inside a tag.

    ...actually, just as I got to this point I realised that 90% of what I wrote is pretty darned obvious, and that the last dot-point is precisely where the problem is.

    I’m going to leave my half-finished rant here as a warning to others, and to myself.
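    For what it’s worth, one way around the "am I inside a tag" problem is to keep a stack of open tags while walking the string, much as the truncate function in the question does, then close that stack at the end of the first half and reopen it at the start of the second. The sketch below is a rough illustration under loud assumptions, not a robust parser: `splitHtmlAtTextOffset` is a made-up name, the input must be well nested, there must be no comments or stray `<` / `>` in the text, entities are counted as several characters, text is single-byte, and a word broken up by inline tags is not detected.

```php
<?php
// Hypothetical helper for illustration only; assumes well-nested, simple HTML.
function splitHtmlAtTextOffset(string $html, int $offset): array
{
    $offset = max(0, $offset);
    // Split into alternating tag and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta']; // partial list of void elements
    $open = [];   // stack of [tag name, full opening tag]
    $first = '';
    $rest = '';   // unused part of the text run that contains the split
    $seen = 0;    // text characters consumed so far
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if ($token[0] === '<') {
            $first .= $token;
            if (preg_match('~^<(/?)([a-zA-Z][a-zA-Z0-9]*)~', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    array_pop($open); // trusts that tags are properly nested
                } elseif (substr($token, -2) !== '/>' && !in_array($name, $void, true)) {
                    $open[] = [$name, $token];
                }
            }
            continue;
        }
        $len = strlen($token);
        if ($seen + $len < $offset) {
            $first .= $token;
            $seen += $len;
            continue;
        }
        // The split lands in this text run; if it is mid-word, move to the end of the word.
        $cut = $offset - $seen;
        $midWord = $cut > 0 && !ctype_space($token[$cut - 1]);
        while ($midWord && $cut < $len && !ctype_space($token[$cut])) {
            $cut++;
        }
        $first .= substr($token, 0, $cut);
        $rest = substr($token, $cut);
        $i++;
        break;
    }

    // If the cut used up the whole text run, keep directly following closing
    // tags in the first half so the second half does not start with an empty element.
    while ($rest === '' && $i < $count && strncmp($tokens[$i], '</', 2) === 0) {
        $first .= $tokens[$i++];
        array_pop($open);
    }

    $second = ltrim($rest . implode('', array_slice($tokens, $i)));
    if ($second === '') {
        return [$first, ''];
    }

    // Close still-open tags on the first half, reopen them on the second.
    $openers = '';
    $closers = '';
    foreach ($open as [$name, $tag]) {
        $openers .= $tag;
        $closers = "</$name>" . $closers;
    }
    return [$first . $closers, $openers . $second];
}

$example = '<p>This is a test string that contains <strong>HTML</strong> tags and text content.</p>';
[$firstHalf, $secondHalf] = splitHtmlAtTextOffset($example, 39);
echo $firstHalf, "\n";  // <p>This is a test string that contains <strong>HTML</strong></p>
echo $secondHalf, "\n"; // <p>tags and text content.</p>
```

    Anything messier than this (malformed markup, comments, attributes containing `>`) is where the regex warning above applies, and a real HTML parser such as PHP’s DOM extension is probably the safer route.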
