I'm using lxml.html.clean to clean HTML from input text. How can I change \n to <br /> in lxml.html?

Question

Editorial Team

Asked: May 17, 2026 (2026-05-17T17:25:11+00:00)

I'm using lxml.html.clean to clean HTML from input text. How can I change \n to <br /> in lxml.html?

1 Answer

Editorial Team · Answer 1 · 2026-05-17T17:25:12+00:00

A fairly easy, slightly hacky way is a two-step process, assuming you have already built the DOM with lxml.html.parse or a similar method:

1. Iterate over the nodes and apply string replacements to their text and tail attributes. The iterdescendants method walks the whole tree for you.
2. Run lxml.html.clean as usual.
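
The two steps could be sketched roughly as follows. This is only an illustration, not the one true way: the helper names newlines_to_br and clean_with_breaks are made up for this example. One deviation from a plain string replacement: writing "<br />" into a text or tail string would be escaped on output, so the sketch splits the string on "\n" and inserts real br elements instead. It also uses iter() rather than iterdescendants() so that the root element's own text is covered.

```python
import lxml.html


def newlines_to_br(root):
    """Replace every "\n" in text/tail strings with a <br> element, in place.

    Hypothetical helper for illustration; not part of lxml.
    """
    # Snapshot the nodes first, because the tree is modified while walking it.
    for el in list(root.iter()):
        # Comments and processing instructions have a non-string tag;
        # their .text is not document text, so only their tail is handled.
        is_element = isinstance(el.tag, str)

        if is_element and el.text and "\n" in el.text:
            first, *rest = el.text.split("\n")
            el.text = first
            # The text precedes any children, so the new <br> nodes go first.
            for offset, chunk in enumerate(rest):
                br = el.makeelement("br")
                br.tail = chunk
                el.insert(offset, br)

        parent = el.getparent()
        if parent is not None and el.tail and "\n" in el.tail:
            first, *rest = el.tail.split("\n")
            el.tail = first
            # The tail follows the element, so insert right after it.
            start = parent.index(el) + 1
            for offset, chunk in enumerate(rest):
                br = parent.makeelement("br")
                br.tail = chunk
                parent.insert(start + offset, br)


def clean_with_breaks(html_text):
    """Step 1 (newlines to <br>) followed by step 2 (the usual cleaning)."""
    # Imported lazily as an assumption about the environment: recent lxml
    # releases need the separate lxml_html_clean package for this import.
    from lxml.html.clean import Cleaner

    root = lxml.html.fromstring(html_text)
    newlines_to_br(root)
    cleaned = Cleaner().clean_html(root)
    return lxml.html.tostring(cleaned, encoding="unicode")
```

Note that this treats every newline as significant, including the ones used only to indent markup, and it does not skip elements such as pre or script; filter on el.tag if that matters for your input. If you need the self-closing form of the tag in the output, the method argument of lxml.html.tostring is worth checking.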

A more complex way would be to monkey patch the lxml.html.clean module. Unlike much of lxml, this module is written in Python and is fairly easy to read. For example, it currently contains a _substitute_whitespace function.
