I’m using Web-Harvest to scrape a website and generate an XML file with data.

Question

Asked: June 9, 2026 (2026-06-09T10:16:33+00:00)

I’m using Web-Harvest to scrape a website and generate an XML file with data.

I’m getting ugly nodes like <name> </name>. Using normalize-space() didn’t help, so I opened the file in a hex viewer and found that the character corresponds to ‘c2a0’. I looked around for a solution, but found nothing that helped…

To sum up, what I want is to remove that weird space (using XQuery or XPath 1.0/2.0), so that I get an empty node <name/>

PS: the encoding used is ‘iso-8859-1’

1 Answer

Editorial Team · Answer 1 · 2026-06-09T10:16:35+00:00

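You can use translate() to remove specific characters. The UTF-8 byte sequence c2 a0 encodes the character U+00A0 (no-break space), and hexadecimal A0 is decimal 160, so codepoints-to-string(160) gives you a string containing just that character. normalize-space() does not help here because it only handles the XML whitespace characters (space, tab, carriage return, line feed), and U+00A0 is not one of them. Note that codepoints-to-string() is available in XPath 2.0 and XQuery, but not in XPath 1.0.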

Together, with . standing for the node whose text you want to clean:

translate(., codepoints-to-string(160), "")
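
For an XPath 1.0 processor, where codepoints-to-string() is not available, the same idea can be tried by putting the U+00A0 character directly into the string literal. Below is a minimal sketch using the JDK's built-in javax.xml.xpath engine (XPath 1.0). The class name NbspDemo, its helper methods, and the sample <name> documents are made up for illustration and are not part of Web-Harvest. Wrapping the call in normalize-space() is an extra step on top of the answer above, so that ordinary spaces left over after the removal are trimmed as well.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Web-Harvest.
class NbspDemo {
    // XPath 1.0 has no codepoints-to-string(), so the U+00A0 character is
    // written directly inside the XPath string literal (via a Java escape).
    static final String STRIP_NBSP =
            "normalize-space(translate(/name, '\u00A0', ''))";

    // What the question tried: normalize-space() alone.
    static final String NORMALIZE_ONLY = "normalize-space(/name)";

    // Evaluates an XPath 1.0 expression against an XML string and
    // returns the result converted to a string.
    static String eval(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes every U+00A0 from the text of <name>, then trims it.
    static String cleanName(String xml) {
        return eval(STRIP_NBSP, xml);
    }

    // Applies only normalize-space(), which leaves U+00A0 in place.
    static String normalizeOnly(String xml) {
        return eval(NORMALIZE_ONLY, xml);
    }

    public static void main(String[] args) {
        String xml = "<name>\u00A0</name>";
        System.out.println("normalize-space only, length: " + normalizeOnly(xml).length());
        System.out.println("with translate, length: " + cleanName(xml).length());
    }
}
```

If the no-break spaces separate words you want to keep apart, replacing them with a regular space (third argument ' ' instead of '') before normalize-space() is probably the safer variant, since removing them outright joins the neighbouring words.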
