The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T15:05:44+00:00)

I have a string that represents an html document. I’m trying to replace text

I have a string that represents an HTML document. I’m trying to replace text in that document with some replacement HTML, leaving the markup and attribute values untouched, of course. I thought it would be simple, but it is incredibly tedious when you want to replace the text with markup. For example, to replace somekeyword with <a href="link">somekeyword</a>.

from lxml.html import fragments_fromstring, fromstring, tostring
from re import compile
def markup_aware_sub(pattern, repl, text):
    exp = compile(pattern)
    root = fromstring(text)

    els = [el for el in root.getiterator() if el.text]
    els = [el for el in els if el.text.strip()]
    for el in els:
        text = exp.sub(repl, el.text)
        if text == el.text:
            continue
        parent = el.getparent()
        new_el = fromstring(text)
        new_el.tag = el.tag
        for k, v in el.attrib.items():
            new_el.attrib[k] = v
        parent.replace(el, new_el)
    return tostring(root)

markup_aware_sub('keyword', '<a>blah</a>', '<div><p>Text with keyword here</p></div>')

It works but only if the keyword is exactly two “nestings” down. There has to be a better way to do it than the above, but after googling for many hours I can’t find anything.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 3:05 pm (2026-05-26T15:05:45+00:00)

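    This might be the solution you are looking for. It subclasses HTMLParser and rewrites only the text nodes:

    # Python 3 module name; in Python 2 this was "from HTMLParser import HTMLParser".
    from html.parser import HTMLParser

    class MyParser(HTMLParser):
        def __init__(self, link, keyword):
            HTMLParser.__init__(self)
            self.__html = []
            self.link = link
            self.keyword = keyword

        def handle_data(self, data):
            # Only text nodes arrive here, so tags and attribute values are never rewritten.
            # Character references such as &amp; are decoded by the parser and not re-escaped here.
            anchor = '<a href="' + self.link + '">' + self.keyword + '</a>'
            self.__html.append(data.replace(self.keyword, anchor))

        def handle_starttag(self, tag, attrs):
            # get_starttag_text() returns the tag as written, attributes included.
            self.__html.append(self.get_starttag_text())

        def handle_startendtag(self, tag, attrs):
            # Self-closing tags such as <br/> are copied once, with no separate end tag.
            self.__html.append(self.get_starttag_text())

        def handle_endtag(self, tag):
            self.__html.append("</" + tag + ">")

        def new_html(self):
            return ''.join(self.__html).strip()


    parser = MyParser("blah", "keyword")
    parser.feed("<div><p>Text with keyword here</p></div>")
    parser.close()
    print(parser.new_html())

    This will give you the following output:

    <div><p>Text with <a href="blah">keyword</a> here</p></div>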
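Since the question uses a regular expression rather than a fixed keyword, here is a minimal sketch of the same HTMLParser idea with the signature of markup_aware_sub. The class name RegexLinker and the choice to leave script and style contents alone are my own assumptions, not something from the question. It turns off convert_charrefs so that entities are copied through unchanged.

```python
import re
from html.parser import HTMLParser


class RegexLinker(HTMLParser):
    """Copy HTML through unchanged, applying a regex substitution to text nodes only."""

    SKIP = {"script", "style"}  # assumption: text inside these elements is left alone

    def __init__(self, pattern, repl):
        # convert_charrefs=False routes entities to handle_entityref/handle_charref.
        super().__init__(convert_charrefs=False)
        self.exp = re.compile(pattern)
        self.repl = repl
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</" + tag + ">")

    def handle_data(self, data):
        # Substitute only in ordinary text, never in tags or attribute values.
        self.out.append(data if self.skip_depth else self.exp.sub(self.repl, data))

    def handle_entityref(self, name):
        self.out.append("&" + name + ";")

    def handle_charref(self, name):
        self.out.append("&#" + name + ";")

    def handle_comment(self, data):
        self.out.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.out.append("<!" + decl + ">")


def markup_aware_sub(pattern, repl, text):
    parser = RegexLinker(pattern, repl)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


html = '<div title="keyword"><p>Text with keyword &amp; more</p></div>'
print(markup_aware_sub(r"keyword", r'<a href="link">\g<0></a>', html))
```

The title attribute keeps its value and only the keyword in the paragraph text is wrapped. One caveat of this sketch: a match that spans an entity or a tag boundary is not found, because each text chunk is substituted on its own.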
    

    The problem with your lxml approach seems to occur only when the keyword has a single level of nesting, that is, when the text sits directly in the root element. It seems to work fine with multiple nestings, so I added an if condition to handle that special case.

    from re import compile
    from lxml.html import fromstring, tostring

    def markup_aware_sub(pattern, repl, text):
        exp = compile(pattern)
        root = fromstring(text)
        # Collect the elements that carry non-blank leading text.
        els = [el for el in root.iter() if el.text and el.text.strip()]

        def rebuild(el):
            # Parse the substituted text as HTML, then give the new element
            # the tag and attributes of the element it stands in for.
            new_el = fromstring(exp.sub(repl, el.text))
            new_el.tag = el.tag
            for k, v in el.attrib.items():
                new_el.attrib[k] = v
            return new_el

        # Special case: a single text-bearing element is returned on its own.
        if len(els) == 1:
            return tostring(rebuild(els[0]))

        for el in els:
            if exp.sub(repl, el.text) == el.text:
                continue
            el.getparent().replace(el, rebuild(el))
        return tostring(root)

    # tostring() returns bytes; pass encoding='unicode' to it if you want a str.
    print(markup_aware_sub('keyword', '<a>blah</a>', '<p>Text with keyword here</p>'))
    

    Not very elegant, but seems to work. Please check it out.
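One limitation of the lxml versions above is worth knowing about: they only inspect el.text, which holds the text before an element's first child. Text that follows a child element is stored in that child's tail, so a keyword placed after an inline tag is never seen. Here is a minimal sketch of handling both cases. It is written against the standard library's xml.etree.ElementTree so it runs without extra packages, and it assumes well-formed (XHTML-style) input. The function name link_keyword is made up for illustration. lxml elements expose the same text, tail and insert() interface, so the logic should carry over, but treat that as an assumption rather than something tested here.

```python
import re
import xml.etree.ElementTree as ET


def link_keyword(root, keyword, href):
    """Wrap every occurrence of keyword found in text or tail in an <a> element."""
    pattern = re.compile(re.escape(keyword))

    def make_anchor(tail):
        a = ET.Element("a", {"href": href})
        a.text = keyword
        a.tail = tail  # the text that followed this occurrence
        return a

    # Snapshot the tree first so the anchors added below are not revisited.
    for el in list(root.iter()):
        children = list(el)

        # Text before the first child lives in el.text.
        if el.text:
            parts = pattern.split(el.text)
            el.text = parts[0]
            for i, part in enumerate(parts[1:]):
                el.insert(i, make_anchor(part))

        # Text after a child lives in that child's tail.
        for child in children:
            if not child.tail:
                continue
            parts = pattern.split(child.tail)
            child.tail = parts[0]
            pos = list(el).index(child)
            for j, part in enumerate(parts[1:], start=1):
                el.insert(pos + j, make_anchor(part))
    return root


root = ET.fromstring('<div><p title="keyword">Text with <b>bold</b> keyword here</p></div>')
link_keyword(root, "keyword", "link")
print(ET.tostring(root, encoding="unicode"))
```

This prints <div><p title="keyword">Text with <b>bold</b> <a href="link">keyword</a> here</p></div>. The keyword sits in the tail of the b element, and the title attribute is left untouched because attributes are never part of text or tail.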

© 2021 The Archive Base. All Rights Reserved