
The Archive Base Latest Questions

Editorial Team
Asked: June 4, 2026

By default, lxml doesn’t understand the wbr tag, which is used to mark word-break opportunities in long words. It serializes the tag as <wbr></wbr> when it should be written simply as <wbr>, like the br tag.

How do I add this behavior to lxml?


1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 11:57 am

    Actually, it is not difficult to patch libxml2. (This walkthrough was done on Ubuntu 11.04 with Python 2.7.3.)

    First define a test program wbr_test.py:

    from lxml import etree
    from cStringIO import StringIO
    
    wbr_html = """\
    <html>
      <head>
        <title>wbr test</title>
      </head>
    <body>
      Test for a breakable<wbr>word implementation change
    </body>
    </html>
    """
    
    parser = etree.HTMLParser()
    tree   = etree.parse(StringIO(wbr_html), parser)
    
    result = etree.tostring(tree.getroot(),
                             pretty_print=True, method="html")
    if result.split() != wbr_html.split(): # split, as we are not interested in whitespace differences
        print(result)
        print("not ok")
    else:
        print("OK")
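
The program above targets Python 2 (cStringIO no longer exists in Python 3). A minimal sketch of the same check for Python 3 follows, assuming a recent lxml in which tostring accepts encoding="unicode". The names WBR_HTML, render_html and same_ignoring_whitespace are illustrative and not part of the original answer.

```python
WBR_HTML = """\
<html>
  <head>
    <title>wbr test</title>
  </head>
<body>
  Test for a breakable<wbr>word implementation change
</body>
</html>
"""


def render_html(source):
    """Parse source with lxml's HTML parser and serialize it back to text."""
    # Imported lazily so same_ignoring_whitespace() is usable without lxml.
    from lxml import etree

    root = etree.fromstring(source, etree.HTMLParser())
    # encoding="unicode" makes tostring() return str instead of bytes.
    return etree.tostring(root, pretty_print=True, method="html",
                          encoding="unicode")


def same_ignoring_whitespace(left, right):
    """Compare two documents token by token, ignoring whitespace layout."""
    return left.split() == right.split()
```

Going by the walkthrough, same_ignoring_whitespace(render_html(WBR_HTML), WBR_HTML) would be expected to return False before the patch and True after it, mirroring the not ok / OK output of wbr_test.py.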
    

    Make sure that it fails by running python wbr_test.py. It should insert a </wbr> before
    </body> and print not ok at the end.

    Download, extract and compile libxml2:

    wget ftp://xmlsoft.org/libxml2/libxml2-2.8.0.tar.gz
    tar xvf libxml2-2.8.0.tar.gz 
    cd libxml2-2.8.0/
    ./configure --prefix=/usr
    make -j8  # adjust number to match your number of cores
    

    Install libxml2, then install its Python bindings:

    sudo make install
    cd python  # the directory holding the Python bindings in the libxml2 source tree
    sudo python setup.py install
    

    Test your wbr_test.py once more, to make sure it fails with the latest libxml2 version.
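
One way to confirm that lxml actually picked up the rebuilt library is to look at the version tuples lxml exposes. This is a small sketch, assuming lxml's etree.LIBXML_VERSION (the library loaded at run time) and etree.LIBXML_COMPILED_VERSION (the library lxml was built against). The helper names format_version and libxml2_versions are illustrative.

```python
def format_version(version_tuple):
    """Turn a version tuple such as (2, 8, 0) into the string '2.8.0'."""
    return ".".join(str(part) for part in version_tuple)


def libxml2_versions():
    """Return (runtime, compiled) libxml2 version strings as seen by lxml."""
    # Imported lazily so format_version() can be used without lxml installed.
    from lxml import etree

    return (format_version(etree.LIBXML_VERSION),
            format_version(etree.LIBXML_COMPILED_VERSION))
```

If the runtime version reported by libxml2_versions() is not 2.8.0, lxml is probably still loading a different copy of libxml2, in which case a patch to the freshly built copy would not affect it.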

    First make a backup copy of HTMLparser.c, e.g. in /var/tmp.

    Now edit the file HTMLparser.c at the top level of the libxml2 source. Search for the word forced (there is only one occurrence). You will be at the <br> tag definition, which spans three lines starting with the line you just found. Add a copy of that entry at the end of the table, after the definition of <var>. To get the commas right, the <var> entry must now close with '},' and the new, last entry must close with a bare '}' on the line before '};'.

    In the newly inserted code, replace br with wbr and change DECL clear_attrs to NULL (assuming that a new tag does not have deprecated attributes). The description string can be adjusted as well.

    The result should differ from the copy in /var/tmp (diff -u /var/tmp/HTMLparser.c HTMLparser.c) as follows:

    @@ -1039,6 +1039,9 @@
     },
     { "var",   0, 0, 0, 0, 0, 0, 1, "instance of a variable or program argument",
    DECL html_inline, NULL, DECL html_attrs, NULL, NULL
    +},
    +{ "wbr",   0, 2, 2, 1, 0, 0, 1, "possible line break ",
    +   EMPTY , NULL , DECL core_attrs, NULL , NULL
     }
     };
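
The manual edit can also be sketched as a small script. The sketch below works on a short excerpt modeled on the diff above, not on the real file, and it assumes the layout described in this answer: a three-line <br> entry found via the word forced, and a table that ends with a line '}' followed by a line '};'. The names EXCERPT and add_wbr_entry are hypothetical.

```python
# Shortened stand-in for the element table in HTMLparser.c (an assumption
# modeled on the diff above; the real table has many more entries).
EXCERPT = """\
{ "br",    0, 2, 2, 1, 0, 0, 1, "forced line break ",
    EMPTY , NULL , DECL core_attrs, DECL clear_attrs , NULL
},
{ "var",   0, 0, 0, 0, 0, 0, 1, "instance of a variable or program argument",
    DECL html_inline , NULL , DECL html_attrs, NULL, NULL
}
};
"""


def add_wbr_entry(source_text):
    """Return source_text with a wbr entry appended to the element table."""
    lines = source_text.splitlines()
    # The <br> entry is located through its unique description text.
    start = next(i for i, line in enumerate(lines) if "forced" in line)
    wbr_entry = [
        lines[start].replace('"br"', '"wbr"').replace("forced", "possible"),
        lines[start + 1].replace("DECL clear_attrs", "NULL"),
        "}",  # the new entry is now the last one, so no trailing comma
    ]
    # First '}' line after the <br> entry that is directly followed by '};'.
    end = next(i for i in range(start, len(lines) - 1)
               if lines[i].strip() == "}" and lines[i + 1].strip() == "};")
    lines[end] = "},"  # the former last entry now needs a comma
    lines[end + 1:end + 1] = wbr_entry
    return "\n".join(lines) + "\n"


print(add_wbr_entry(EXCERPT))
```

To apply it to the real source, the same function could be given the contents of HTMLparser.c and its result written back, after checking the output with diff as shown above.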
    

    Make and install:

    make && sudo make install
    

    Test your wbr_test.py once more. It should now print OK.

© 2021 The Archive Base. All Rights Reserved