I use hxt to parse some html. It has unescaped html inside <textarea>.

Question

Asked: June 12, 2026

I use hxt to parse some html that has unescaped html inside a <textarea>. hxt gives invalid results: it stumbles on any tag with content inside the textarea (in this case, <a>). A minimal test case (for GHCi) is

let doc = parseHtml "<textarea>before<a>link</a>after</textarea>"
runX . xshow $ doc //> hasName "textarea"

which gives [<textarea>before</textarea><textarea/>] as a result.

It looks like tags with no contents (e.g. <tag/>) do not break parsing.

Is there any way to parse such html with hxt?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T05:47:17+00:00

The problem is that HandsomeSoup (which I’m assuming is where your parseHtml comes from) is picky about HTML validity: a textarea can’t contain an a element in valid HTML, so the parser tries to “fix” any such errors it sees.

Can you switch to hxt-tagsoup? It will still accept messy HTML (unclosed elements, etc.), but isn’t so fussy about adherence to the HTML schema. Specifically, it will let you have an a in a textarea:

import Text.XML.HXT.Core
import Text.XML.HXT.TagSoup

let content = "<textarea>before<a>link</a>after</textarea>"
let doc = readString [ withTagSoup ] content
runX . xshow $ doc //> hasName "textarea"

This prints the following:

["<textarea>before<a>link</a>after</textarea>"]

Which I think is what you want.
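
If you want this outside GHCi, the same idea can be sketched as a small standalone program. This is only a sketch, assuming the hxt and hxt-tagsoup packages are installed; the names content and textareas are made up for illustration and are not part of either library.

```haskell
module Main where

import Text.XML.HXT.Core
import Text.XML.HXT.TagSoup (withTagSoup)

-- The same markup as in the question.
content :: String
content = "<textarea>before<a>link</a>after</textarea>"

-- Hypothetical helper: parse the input with the TagSoup backend and
-- serialise every <textarea> element found anywhere in the tree.
textareas :: String -> IO [String]
textareas html =
  runX . xshow $ readString [withTagSoup] html //> hasName "textarea"

main :: IO ()
main = textareas content >>= mapM_ putStrLn
```

The only real difference from the GHCi session is that the parsing arrow is wrapped in a function, so you can feed it whatever string you actually need to parse.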
