I am parsing a table of an HTML page but when I display the data, random characters are added


Asked: May 27, 2026


I am parsing a table on an HTML page, but when I display the data, random characters are added, as in this example:

Preowiveding, but it should be Preding.

I don't know if this is a security feature to prevent people from parsing their data.
It is strange because some text is shown correctly and other text is shown wrong…

The page I get the data from is this one.
The HTML code of the table looks a bit strange:

<a target='_blank' href='#' class='draggableVerein' >&#76;<span style='display:none;'>&#105;<span style='display:none;'>&#115;&#105;&#118;&#98;&#97;</span><u></u>&#118;&#98;&#97;&#111;</span><u></u>&#105;&#101;&#98;&#101;&#110;&#97;&#117;</a>

Between the characters there are span and u tags that seem to do nothing in the browser but produce these errors when parsing.

I use Ben Reeves' HTML Parser.
Example:

HTMLNode *node = [rowNode findChildWithAttribute:@"class" matchingName:@"rang" allowPartial:TRUE];
team.rang = [node allContents];

edit:

Now I tried libxml2 with hpple:

NSArray *elements = [xpathParser searchWithXPathQuery:@"//table[2]/tr[5]/td/a"];
// Access the first cell
TFHppleElement *element = [elements objectAtIndex:0];

NSString *content = [element content];  
NSLog(@"content: %@",content);

The output is ersdorf instead of Eggersdorf.
The HTML of this example:

<a target='_blank' href='/datenservice/portal/verein/aktuelles.ds?vereinsNr=8070&sektionsId=485215725|665233118344931246&awVerband=ST_' class='draggableVerein' drag_img='/netzwerk/imagedownload/379402779304830775_383470150383145150-60-60-EfcSAtkX.jpg'>&#69;&#103;&#103;&#101;&#114;&#115;&#100;&#111;&#114;&#102;</a>

It is really strange code.
Any tips?



1 Answer

Editorial Team · Answer 1 · 2026-05-27T02:48:31+00:00

It looks like there are two things going on here.

1. It uses HTML character entities to specify ordinary characters (e.g. &#76; instead of L). This may be an attempt at obfuscation.
2. It uses <span style='display:none'>…</span> to tell the browser not to display certain text. This may be an attempt to introduce invisible garbage into the text: the browser will not display it, but an HTML parser will still return that text.
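The first point can be illustrated with a small decoder for numeric character references. This is a sketch in plain C, so it also compiles unchanged as Objective-C. `decode_numeric_entities` is a hypothetical helper name. It only decodes code points below 128 and leaves named entities such as &amp; untouched. A real HTML parser such as libxml2 normally decodes these references for you, so a helper like this is mainly useful if you are working on the raw page source as a string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode decimal (&#69;) and hexadecimal (&#x45;) numeric character
 * references in `in`, writing the result to `out`. `out` must hold at least
 * strlen(in) + 1 bytes; decoding never makes the string longer.
 * Assumption: only code points below 128 are decoded. Anything else,
 * including named entities such as &amp;, is copied through unchanged. */
void decode_numeric_entities(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '&' && in[1] == '#') {
            const char *p = in + 2;
            int base = 10;
            if (*p == 'x' || *p == 'X') {
                base = 16;
                p++;
            }
            /* strtol would skip spaces and accept signs, so require a digit first. */
            if (base == 16 ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p)) {
                char *end;
                long cp = strtol(p, &end, base);
                if (*end == ';' && cp > 0 && cp < 128) {
                    *out++ = (char)cp;
                    in = end + 1; /* continue after the ';' */
                    continue;
                }
            }
        }
        *out++ = *in++; /* ordinary character or unrecognized reference */
    }
    *out = '\0';
}
```

For example, the link text from the Eggersdorf row, &#69;&#103;&#103;&#101;&#114;&#115;&#100;&#111;&#114;&#102;, decodes to Eggersdorf.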

If you want to discard the garbage text, your code will have to track <span> and </span> tags and discard any text inside a span whose style is set to display:none, including text in spans nested inside it.
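Here is a minimal sketch of that idea in plain C, which also compiles as Objective-C. It scans the raw HTML string rather than using a real parser. `visible_text` and `tag_hides_content` are hypothetical helper names. The sketch assumes lowercase, well-formed tags, no '>' inside attribute values, and that the page only hides text with <span> elements. It treats hiding as inherited: once a display:none span opens, everything until its matching </span> is dropped, which matches how browsers render display:none.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the tag text in [start, end) contains "display:none",
 * ignoring case and whitespace (so style='display: none;' also counts). */
static int tag_hides_content(const char *start, const char *end)
{
    static const char needle[] = "display:none";
    for (const char *s = start; s < end; s++) {
        const char *p = s;
        size_t i = 0;
        while (p < end && needle[i] != '\0') {
            if (isspace((unsigned char)*p)) {
                p++;
                continue;
            }
            if (tolower((unsigned char)*p) != needle[i])
                break;
            p++;
            i++;
        }
        if (needle[i] == '\0')
            return 1;
    }
    return 0;
}

/* Copy the text a browser would show for an HTML fragment into `out`,
 * which must hold at least strlen(html) + 1 bytes. Text inside a <span>
 * whose tag contains display:none is dropped, together with everything
 * nested in it. All other tags are skipped. Decimal &#NNN; references
 * below 128 are decoded. */
void visible_text(const char *html, char *out)
{
    int depth = 0;      /* number of currently open <span> elements */
    int hidden_at = -1; /* depth of the outermost hiding <span>, or -1 */

    assert(html != NULL && out != NULL);
    while (*html != '\0') {
        if (*html == '<') {
            const char *close = strchr(html, '>');
            if (close == NULL)
                break; /* unterminated tag: stop copying */
            if (strncmp(html, "<span", 5) == 0 &&
                (isspace((unsigned char)html[5]) || html[5] == '>')) {
                depth++;
                if (hidden_at < 0 && tag_hides_content(html, close))
                    hidden_at = depth;
            } else if (strncmp(html, "</span", 6) == 0 && depth > 0) {
                if (depth == hidden_at)
                    hidden_at = -1; /* leaving the hiding span */
                depth--;
            }
            html = close + 1;
            continue;
        }
        if (html[0] == '&' && html[1] == '#' && isdigit((unsigned char)html[2])) {
            char *end;
            long cp = strtol(html + 2, &end, 10);
            if (*end == ';' && cp > 0 && cp < 128) {
                if (hidden_at < 0)
                    *out++ = (char)cp;
                html = end + 1;
                continue;
            }
        }
        if (hidden_at < 0)
            *out++ = *html; /* visible ordinary character */
        html++;
    }
    *out = '\0';
}
```

Applied to the link from the question, this returns Liebenau. Applied to the Eggersdorf link, it returns Eggersdorf. A parser that keeps the hidden spans produces Lisivbavbaoiebenau for the first link instead. From Objective-C you could pass `[html UTF8String]` and wrap the result with `[NSString stringWithUTF8String:buffer]`. With libxml2 or hpple, the same rule could instead be applied while walking the parsed tree, skipping span nodes whose style attribute contains display:none.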

NB: The source for the page you linked to has a copyright statement (in German).
IANAL, but you may need a translator and a lawyer to make sure you are not violating their terms of service by scraping the page.
