The Archive Base

Editorial Team
Asked: May 14, 2026

Does anybody have an example of taking an HTML string (coming from a TinyMCE editor) and splitting it into N parts using C#?

I need to split the string evenly without splitting words.
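One way to split evenly without breaking words is to choose the cut points in the plain text first. A minimal sketch, assuming "evenly" means roughly equal character counts and that words are separated by whitespace: each cut goes at the word boundary closest to an equal share of the total length. PlainTextSplitter and SplitOnWords are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class PlainTextSplitter
{
    // Splits plain text into at most `parts` chunks without breaking a word.
    // Each cut is placed at the word boundary nearest to an even share of the
    // total length (words re-joined with single spaces).
    public static List<string> SplitOnWords(string text, int parts)
    {
        if (parts < 1)
            throw new ArgumentOutOfRangeException(nameof(parts));

        // a null separator array splits on any whitespace
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        if (words.Length == 0)
            return chunks;

        // ends[i] = length of the text up to and including word i
        var ends = new int[words.Length];
        ends[0] = words[0].Length;
        for (int i = 1; i < words.Length; i++)
            ends[i] = ends[i - 1] + 1 + words[i].Length;
        int total = ends[words.Length - 1];

        int start = 0; // index of the first word of the current chunk
        for (int k = 1; k < parts && start < words.Length - 1; k++)
        {
            double target = (double)total * k / parts;

            // choose the cut (after word `best`) closest to the target; the
            // last word is never a candidate, so the final chunk is never empty
            int best = start;
            for (int i = start + 1; i < words.Length - 1; i++)
                if (Math.Abs(ends[i] - target) < Math.Abs(ends[best] - target))
                    best = i;

            chunks.Add(string.Join(" ", words, start, best - start + 1));
            start = best + 1;
        }

        // everything after the last cut forms the final chunk
        chunks.Add(string.Join(" ", words, start, words.Length - start));
        return chunks;
    }
}
```

On the plain text of the sample in the update below, three parts come out as "Lorem ipsum dolor", "sit amet, consectetur" and "adipiscing elit.". If fewer words than parts are available, fewer chunks are returned.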

I was thinking of just splitting the HTML and using the HtmlAgilityPack to try to fix the broken tags. However, I’m not sure how to find the split point, as ideally it should be based purely on the text rather than on the HTML markup.

Anybody got any ideas on how to go about this?

UPDATE

As requested, here is an example of input and desired output.

INPUT:

<p><strong>Lorem ipsum dolor sit amet, <em>consectetur adipiscing</em></strong> elit.</p>

OUTPUT (when split into 3 columns):

Part1: <p><strong>Lorem ipsum dolor</strong></p>
Part2: <p><strong>sit amet, <em>consectetur</em></strong></p>
Part3: <p><strong><em>adipiscing</em></strong> elit.</p>

UPDATE 2

I’ve just had a play with HTML Tidy, and it seems to work well at fixing broken tags, so it may be a good option if I can find a way to locate the split points.

UPDATE 3

Using a method similar to the one in the question “Truncate string on whole words in .NET C#”, I’ve now managed to get a list of the plain-text words that will make up each part. So, assuming HTML Tidy gives me a valid XML structure for the HTML, and given this list of words, does anybody have an idea of the best way to split it?
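Given a valid XML tree (for example the output of HTML Tidy) and the number of words in each part, one option is to clone the tree once per part, trim every text node down to that part's words, and then drop the elements left without text. A rough sketch using System.Xml.Linq, assuming words are counted by splitting on whitespace exactly as when the word lists were built; XmlWordSplitter and its methods are hypothetical names. Two assumed limitations: elements that never contain text (such as <br/> or <img/>) are removed by the pruning step, and a word split across tags (<b>Lo</b>rem) is counted as two words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlWordSplitter
{
    // Returns a copy of `root` that keeps only words [first, first + count),
    // counting whitespace-separated words across all text nodes in document
    // order. The original tree is not modified.
    public static XElement KeepWords(XElement root, int first, int count)
    {
        var copy = new XElement(root); // deep clone
        int index = 0;                 // position of the next word in document order

        foreach (var text in copy.DescendantNodes().OfType<XText>().ToList())
        {
            string[] words = text.Value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            var kept = new List<string>();
            foreach (var word in words)
            {
                if (index >= first && index < first + count)
                    kept.Add(word);
                index++;
            }

            if (kept.Count == 0)
                text.Remove();
            else
                text.Value = string.Join(" ", kept) + " ";
        }

        // remove elements left without any text, children before parents
        // (reverse document order); this also drops elements such as <br/>
        foreach (var element in copy.Descendants().Reverse().ToList())
        {
            if (!element.DescendantNodes().OfType<XText>().Any())
                element.Remove();
        }

        return copy;
    }

    // Produces one element per entry in `wordCounts`, taking consecutive words.
    public static List<XElement> SplitByWordCounts(XElement root, IEnumerable<int> wordCounts)
    {
        var parts = new List<XElement>();
        int first = 0;
        foreach (int count in wordCounts)
        {
            parts.Add(KeepWords(root, first, count));
            first += count;
        }
        return parts;
    }
}
```

For the sample input, word counts of 3, 3 and 2 produce <p><strong>Lorem ipsum dolor </strong></p>, <p><strong>sit amet, <em>consectetur </em></strong></p> and <p><strong><em>adipiscing </em></strong>elit. </p> when each part is serialized with ToString(SaveOptions.DisableFormatting).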

UPDATE 4

Can anybody see an issue with using a regex to find the indices within the HTML in the following way:

Given the plain text string “sit amet, consectetur”, replace all spaces with the regex “(\s|<(.|\n)+?>)*”; in theory, this finds that string with any combination of spaces and/or tags between the words.

Could I then just use HTML Tidy to fix the broken HTML tags?
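Two issues with that pattern: the words themselves are not regex-escaped, so characters such as '.' or '(' in the text change the pattern's meaning; and <(.|\n)+?> can extend past its own '>' when a later part of the match fails, so "dolor<b>xxx</b>sit" would match "dolor sit" with the stray text swallowed as a "tag". A sketch of the same idea with both addressed, assuming the HTML contains no entities (e.g. &amp;) and no tags inside words; HtmlTextLocator is a hypothetical name. It also requires at least one whitespace character or tag between words (+ rather than *), so "sitamet" does not match "sit amet".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlTextLocator
{
    // Allowed between two words: one or more whitespace characters and/or tags.
    // <[^>]*> stops at the first '>', so a "tag" can never swallow real text.
    private const string Gap = @"(?:\s|<[^>]*>)+";

    // Builds a pattern matching `plainText` inside HTML, tolerating markup
    // wherever the plain text has whitespace.
    public static Regex BuildPattern(string plainText)
    {
        string[] words = plainText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        // escape each word so characters such as '.', '(' or '+' match literally
        string pattern = string.Join(Gap, words.Select(w => Regex.Escape(w)));
        return new Regex(pattern);
    }

    // Returns the start index and length of the first match in `html`,
    // or (-1, 0) when the text is not found.
    public static (int Index, int Length) Locate(string html, string plainText)
    {
        Match match = BuildPattern(plainText).Match(html);
        return match.Success ? (match.Index, match.Length) : (-1, 0);
    }
}
```

On the sample input, locating “sit amet, consectetur” returns the span "sit amet, <em>consectetur"; cutting the HTML at that span still leaves unbalanced tags for HTML Tidy to repair.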

Many thanks

Matt

1 Answer

    Editorial Team
    Added an answer on May 14, 2026 at 6:43 pm

    A Proposed Solution

    Man, this is a curse of mine! I apparently cannot walk away from a problem without spending up-to-and-including an unreasonable amount of time on it.

    I thought about this. I thought about HTML Tidy, and maybe it would work, but I had trouble wrapping my head around it.

    So, I wrote my own solution.

    I tested this on your input and on some other input that I threw together myself. It seems to work pretty well. Surely there are holes in it, but it might provide you with a starting point.

    Anyway, my approach was this:

    1. Encapsulate the notion of a single word in an HTML document using a class that includes information about that word’s position in the HTML document hierarchy, up to a given “top”. This I have implemented in the HtmlWord class below.
    2. Create a class that is capable of writing a single line composed of these HTML words above, such that start-element and end-element tags are added in the appropriate places. This I have implemented in the HtmlLine class below.
    3. Write a few extension methods to make these classes immediately and intuitively accessible straight from an HtmlAgilityPack.HtmlNode object. These I have implemented in the HtmlHelper class below.

    Am I crazy for doing all this? Probably, yes. But, you know, if you can’t figure out any other way, you can give this a try.

    Here’s how it works with your sample input:

    var document = new HtmlDocument();
    document.LoadHtml("<p><strong>Lorem ipsum dolor sit amet, <em>consectetur adipiscing</em></strong> elit.</p>");
    
    var nodeToSplit = document.DocumentNode.SelectSingleNode("p");
    var lines = nodeToSplit.SplitIntoLines(3);
    
    foreach (var line in lines)
        Console.WriteLine(line.ToString());
    

    Output:

    <p><strong>Lorem ipsum dolor </strong></p>
    <p><strong>sit amet, <em>consectetur </em></strong></p>
    <p><strong><em>adipiscing </em></strong>elit. </p>
    

    And now for the code:

    HtmlWord class

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    using HtmlAgilityPack;
    
    public class HtmlWord {
        public string Text { get; private set; }
        public HtmlNode[] NodeStack { get; private set; }
    
        // convenience property to display list of ancestors cleanly
        // (for ease of debugging)
        public string NodeList {
            get { return string.Join(", ", NodeStack.Select(n => n.Name).ToArray()); }
        }
    
        internal HtmlWord(string text, HtmlNode node, HtmlNode top) {
            Text = text;
            NodeStack = GetNodeStack(node, top);
        }
    
        private static HtmlNode[] GetNodeStack(HtmlNode node, HtmlNode top) {
            var nodes = new Stack<HtmlNode>();
    
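            // walk up from the text node's parent to (but not including) top
            while (node != null && !node.Equals(top)) {
                nodes.Push(node);
                node = node.ParentNode;
            }

            // Stack<T>.ToArray() returns items in pop order, so the
            // array is ordered outermost ancestor first
            return nodes.ToArray();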
        }
    }
    

    HtmlLine class

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Xml;
    
    using HtmlAgilityPack;
    
    [Flags()]
    public enum NodeChange {
        None = 0,
        Dropped = 1,
        Added = 2
    }
    
    public class HtmlLine {
        private List<HtmlWord> _words;
        public IList<HtmlWord> Words {
            get { return _words.AsReadOnly(); }
        }
    
        public int WordCount {
            get { return _words.Count; }
        }
    
        public HtmlLine(IEnumerable<HtmlWord> words) {
            _words = new List<HtmlWord>(words);
        }
    
        private static NodeChange CompareNodeStacks(HtmlWord x, HtmlWord y, out HtmlNode[] droppedNodes, out HtmlNode[] addedNodes) {
            var droppedList = new List<HtmlNode>();
            var addedList = new List<HtmlNode>();
    
            // traverse x's NodeStack backwards to see which nodes
            // do not include y (and are therefore "finished")
            foreach (var node in x.NodeStack.Reverse()) {
                if (!Array.Exists(y.NodeStack, n => n.Equals(node)))
                    droppedList.Add(node);
            }
    
            // traverse y's NodeStack forwards to see which nodes
            // do not include x (and are therefore "new")
            foreach (var node in y.NodeStack) {
                if (!Array.Exists(x.NodeStack, n => n.Equals(node)))
                    addedList.Add(node);
            }
    
            droppedNodes = droppedList.ToArray();
            addedNodes = addedList.ToArray();
    
            NodeChange change = NodeChange.None;
            if (droppedNodes.Length > 0)
                change |= NodeChange.Dropped;
            if (addedNodes.Length > 0)
                change |= NodeChange.Added;
    
            // could maybe use this in some later revision?
            // not worth the effort right now...
            return change;
        }
    
        public override string ToString() {
            if (WordCount < 1)
                return string.Empty;
    
            var lineBuilder = new StringBuilder();
    
            using (var lineWriter = new StringWriter(lineBuilder))
            using (var xmlWriter = new XmlTextWriter(lineWriter)) {
                var firstWord = _words[0];
                foreach (var node in firstWord.NodeStack) {
                    xmlWriter.WriteStartElement(node.Name);
                    foreach (var attr in node.Attributes)
                        xmlWriter.WriteAttributeString(attr.Name, attr.Value);
                }
                xmlWriter.WriteString(firstWord.Text + " ");
    
                for (int i = 1; i < WordCount; ++i) {
                    var previousWord = _words[i - 1];
                    var word = _words[i];
    
                    HtmlNode[] droppedNodes;
                    HtmlNode[] addedNodes;
    
                    CompareNodeStacks(
                        previousWord,
                        word,
                        out droppedNodes,
                        out addedNodes
                    );
    
                    foreach (var dropped in droppedNodes)
                        xmlWriter.WriteEndElement();
                    foreach (var added in addedNodes) {
                        xmlWriter.WriteStartElement(added.Name);
                        foreach (var attr in added.Attributes)
                            xmlWriter.WriteAttributeString(attr.Name, attr.Value);
                    }
    
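                    xmlWriter.WriteString(word.Text + " ");
                }

                // close every element still open after the last word; this
                // also covers single-word lines, where the loop above never runs
                foreach (var node in _words[WordCount - 1].NodeStack)
                    xmlWriter.WriteEndElement();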
            }
    
            return lineBuilder.ToString();
        }
    }
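    Because each NodeStack is a path from the top node downwards, ordered outermost first, the nodes that CompareNodeStacks reports as dropped and added are exactly those after the two stacks' longest common prefix. The sketch below shows that alternative formulation in isolation as a generic helper (StackDiff is a hypothetical name). Within HtmlLine you would pass the HtmlNode arrays themselves rather than tag names, so that two sibling elements with the same name are not mistaken for one.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StackDiff
{
    // Both stacks are ancestor paths ordered outermost first, so they agree on
    // a common prefix and differ only after it. Everything after the prefix in
    // `previous` must be closed (innermost first); everything after it in
    // `next` must be opened (outermost first).
    public static (List<T> ToClose, List<T> ToOpen) Compare<T>(
        IReadOnlyList<T> previous, IReadOnlyList<T> next)
    {
        var comparer = EqualityComparer<T>.Default;

        int common = 0;
        while (common < previous.Count && common < next.Count
               && comparer.Equals(previous[common], next[common]))
            common++;

        List<T> toClose = previous.Skip(common).Reverse().ToList();
        List<T> toOpen = next.Skip(common).ToList();
        return (toClose, toOpen);
    }
}
```

    Going from "p, strong" to "p, strong, em" opens em and closes nothing; going from "p, strong, em" back to "p" closes em and then strong. This also replaces the per-node Array.Exists scan with a single pass, though for the shallow stacks typical of editor HTML the difference is unlikely to matter.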
    

    HtmlHelper static class

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    using HtmlAgilityPack;
    
    public static class HtmlHelper {
        public static IList<HtmlLine> SplitIntoLines(this HtmlNode node, int wordsPerLine) {
            var lines = new List<HtmlLine>();
    
            var words = node.GetWords(node.ParentNode);
    
            for (int i = 0; i < words.Count; i += wordsPerLine) {
                lines.Add(new HtmlLine(words.Skip(i).Take(wordsPerLine)));
            }
    
            return lines.AsReadOnly();
        }
    
        public static IList<HtmlWord> GetWords(this HtmlNode node, HtmlNode top) {
            var words = new List<HtmlWord>();
    
            if (node.HasChildNodes) {
                foreach (var child in node.ChildNodes)
                    words.AddRange(child.GetWords(top));
            } else {
                var textNode = node as HtmlTextNode;
                if (textNode != null && !string.IsNullOrEmpty(textNode.Text)) {
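                    // decode entities such as &amp; (XmlTextWriter escapes them
                    // again on output) and split on any whitespace, not only ' '
                    string[] singleWords = HtmlEntity.DeEntitize(textNode.Text).Split(
                        new[] { ' ', '\t', '\r', '\n' },
                        StringSplitOptions.RemoveEmptyEntries
                    );
                    words.AddRange(
                        singleWords.Select(w => new HtmlWord(w, node.ParentNode, top))
                    );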
                }
            }
    
            return words.AsReadOnly();
        }
    }
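    SplitIntoLines takes a number of words per line, whereas the question asks for N parts. A minimal sketch of turning a part count into group sizes, assuming "even" means word counts that differ by at most one; PartSplitter and SplitEvenly are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartSplitter
{
    // Splits `items` into `parts` consecutive groups (fewer only when there are
    // fewer items than parts) whose sizes differ by at most one. The first
    // (Count % groups) groups receive the extra item.
    public static List<List<T>> SplitEvenly<T>(IReadOnlyList<T> items, int parts)
    {
        if (parts < 1)
            throw new ArgumentOutOfRangeException(nameof(parts));

        int groups = Math.Min(parts, items.Count);
        var result = new List<List<T>>(groups);
        if (groups == 0)
            return result;

        int baseSize = items.Count / groups;
        int remainder = items.Count % groups;
        int index = 0;
        for (int g = 0; g < groups; g++)
        {
            int size = baseSize + (g < remainder ? 1 : 0);
            result.Add(items.Skip(index).Take(size).ToList());
            index += size;
        }
        return result;
    }
}
```

    For the eight words of the sample this gives groups of 3, 3 and 2, matching the output above. With the classes in this answer, each group could be wrapped as new HtmlLine(group), taking PartSplitter.SplitEvenly(node.GetWords(node.ParentNode).ToList(), 3) as the source; the ToList() call is needed because GetWords is declared to return IList<HtmlWord>, which does not implement IReadOnlyList<T>.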
    

    Conclusion

    Just to reiterate: this is a thrown-together solution; I’m sure it has problems. I present it only as a starting point for you to consider — again, if you’re unable to get the behavior you want through other means.

