The Archive Base

Editorial Team
Asked: May 13, 2026


I think I read every single web page relating to this problem but I still cannot find a solution to it, so here I am.

I have an HTML web page which is not under my control and I need to parse it from my iPhone application. Here is a sample of the web page I’m talking about:

<HTML>
  <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  </HEAD>
  <BODY>
    <LI class="bye bye" rel="hello 1">
      <H5 class="onlytext">
        <A name="morning_part">morning</A>
      </H5>
      <DIV class="mydiv">
        <SPAN class="myclass">something about you</SPAN> 
        <SPAN class="anotherclass">
          <A href="http://www.google.it">Bye Bye &egrave; un saluto</A>
        </SPAN>
      </DIV>
    </LI>
  </BODY>
</HTML>

I’m using NSXMLParser, and it works well until it reaches the &egrave; HTML entity. It calls parser:foundCharacters: for "Bye Bye " and then calls parser:resolveExternalEntityName:systemID: with an entityName of "egrave".
In this method I’m just returning the character "è" converted to NSData. parser:foundCharacters: is then called again, appending the string "è" to the previous "Bye Bye ", and after that the parser raises the NSXMLParserUndeclaredEntityError error.

I have no DTD and I cannot change the html file I’m parsing. Do you have any ideas on this problem?

Update (12/03/2010). After the suggestion of Griffo I ended up with something like this:

data = [self replaceHtmlEntities:data];
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
[parser setDelegate:self];
[parser parse];

where replaceHtmlEntities:(NSData *) is something like this:

- (NSData *)replaceHtmlEntities:(NSData *)data {

    // The page declares charset=ISO-8859-1, so decode it as Latin-1.
    NSString *htmlCode = [[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding];
    NSMutableString *temp = [NSMutableString stringWithString:htmlCode];

    // &amp; is deliberately left alone: it is a predefined XML entity, and
    // replacing it with a bare "&" would make the document ill-formed.
    [temp replaceOccurrencesOfString:@"&nbsp;" withString:@" " options:NSLiteralSearch range:NSMakeRange(0, [temp length])];
    ...
    [temp replaceOccurrencesOfString:@"&Agrave;" withString:@"À" options:NSLiteralSearch range:NSMakeRange(0, [temp length])];

    // Re-encode with the same encoding the document declares.
    NSData *finalData = [temp dataUsingEncoding:NSISOLatin1StringEncoding];
    return finalData;

}

But I am still looking for the best way to solve this problem. I will try TouchXML in the next few days, but I still think there should be a way to do this with the NSXMLParser API, so if you know how, feel free to write it here.

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 11:41 pm

    After exploring several alternatives, it appears that NSXMLParser does not support entities other than the five predefined XML entities: &lt;, &gt;, &apos;, &quot; and &amp;.

    The code below fails with an NSXMLParserUndeclaredEntityError.

    
    // Create a dictionary to hold the entities and NSString equivalents
    // A complete list of entities and unicode values is described in the HTML DTD
    // which is available for download http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
    
    
    NSDictionary *entityMap = [NSDictionary dictionaryWithObjectsAndKeys: 
                         [NSString stringWithFormat:@"%C", 0x00E8], @"egrave",
                         [NSString stringWithFormat:@"%C", 0x00E0], @"agrave", 
                         ...
                         ,nil];
    
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    [parser setDelegate:self];
    [parser setShouldResolveExternalEntities:YES];
    [parser parse];
    
    // NSXMLParser delegate method
    - (NSData *)parser:(NSXMLParser *)parser resolveExternalEntityName:(NSString *)entityName systemID:(NSString *)systemID {
        return [[entityMap objectForKey:entityName] dataUsingEncoding: NSUTF8StringEncoding];
    }
    

    Declaring the entities by prepending ENTITY declarations to the HTML document lets the parse succeed; however, the expanded entities are not passed back to parser:foundCharacters:, so the è and à characters are dropped.

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
    [
      <!ENTITY agrave "à">
      <!ENTITY egrave "è">
    ]>
    

    In another experiment, I created a completely valid XML document with an internal DTD:

    <?xml version="1.0" standalone="yes" ?>
    <!DOCTYPE author [
        <!ELEMENT author (#PCDATA)>
        <!ENTITY js "Jo Smith">
    ]>
    <author>&lt; &js; &gt;</author>
    

    I implemented the parser:foundInternalEntityDeclarationWithName:value: delegate method, and it is clear that the parser is getting the entity data; however, parser:foundCharacters: is only called for the predefined entities.

    2010-03-20 12:53:59.871 xmlParsing[1012:207] Parser Did Start Document
    2010-03-20 12:53:59.873 xmlParsing[1012:207] Parser foundElementDeclarationWithName: author model: 
    2010-03-20 12:53:59.873 xmlParsing[1012:207] Parser foundInternalEntityDeclarationWithName: js value: Jo Smith
    2010-03-20 12:53:59.874 xmlParsing[1012:207] didStartElement: author type: (null)
    2010-03-20 12:53:59.875 xmlParsing[1012:207] parser foundCharacters Before: 
    2010-03-20 12:53:59.875 xmlParsing[1012:207] parser foundCharacters After: <
    2010-03-20 12:53:59.876 xmlParsing[1012:207] parser foundCharacters Before: <
    2010-03-20 12:53:59.876 xmlParsing[1012:207] parser foundCharacters After: < 
    2010-03-20 12:53:59.877 xmlParsing[1012:207] parser foundCharacters Before: < 
    2010-03-20 12:53:59.878 xmlParsing[1012:207] parser foundCharacters After: <  
    2010-03-20 12:53:59.879 xmlParsing[1012:207] parser foundCharacters Before: <  
    2010-03-20 12:53:59.879 xmlParsing[1012:207] parser foundCharacters After: <  >
    2010-03-20 12:53:59.880 xmlParsing[1012:207] didEndElement: author with content: <  >
    2010-03-20 12:53:59.880 xmlParsing[1012:207] Parser Did End Document
    

    I found a link to a tutorial on Using the SAX Interface of LibXML. The xmlSAXHandler that is used by NSXMLParser allows for a getEntity callback to be defined. After calling getEntity, the expansion of the entity is passed to the characters callback.

    NSXMLParser is missing functionality here. What should happen is that the NSXMLParser or its delegate store the entity definitions and provide them to the xmlSAXHandler getEntity callback. This is clearly not happening. I will file a bug report.

    In the meantime, the earlier answer of performing a string replacement is perfectly acceptable if your documents are small. Check out the SAX tutorial mentioned above along with the XMLPerformance sample app from Apple to see if implementing the libxml parser on your own is worthwhile.
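If the string-replacement route is kept, one possible variation is to rewrite each named HTML entity as a numeric character reference (for example &egrave; becomes &#232;) in a single scan, instead of running one replacement pass per entity. Numeric character references are part of XML itself and need no DTD, so the parser should resolve them on its own, and the output stays plain ASCII regardless of the document's encoding. The sketch below is plain C, which compiles unchanged inside an Objective-C target. The names replace_named_entities, lookup_entity, entity_name_length and the ENTITIES table are hypothetical, and the table is only an illustrative subset of the list in xhtml-lat1.ent.

```c
#include <stdio.h>
#include <string.h>

/* Longest entity name we bother to scan for (assumption for this sketch). */
#define MAX_NAME 8

/* Illustrative subset only; the full list is in xhtml-lat1.ent.
 * The five predefined XML entities are intentionally absent, so
 * &amp; &lt; &gt; &quot; &apos; pass through untouched. */
struct entity { const char *name; unsigned code; };
static const struct entity ENTITIES[] = {
    { "nbsp",   160 },
    { "Agrave", 192 },
    { "agrave", 224 },
    { "egrave", 232 },
};

/* Length of the name at s if a ';' closes it within range, else 0. */
static size_t entity_name_length(const char *s)
{
    size_t len = 0;
    while (len <= MAX_NAME && s[len] != '\0' && s[len] != ';')
        len++;
    return (s[len] == ';') ? len : 0;
}

/* Code point for a name of the given length, or 0 if it is not in the table. */
static unsigned lookup_entity(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof ENTITIES / sizeof ENTITIES[0]; i++) {
        if (strlen(ENTITIES[i].name) == len &&
            memcmp(ENTITIES[i].name, name, len) == 0)
            return ENTITIES[i].code;
    }
    return 0;
}

/* Copies the NUL-terminated string in to out, rewriting known named
 * entities as numeric character references. Returns the number of bytes
 * written (excluding the NUL), or (size_t)-1 if out is too small. */
size_t replace_named_entities(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    while (*in) {
        size_t len = 0;
        unsigned code = 0;
        if (*in == '&') {
            len = entity_name_length(in + 1);
            code = lookup_entity(in + 1, len);
        }
        if (code) {
            int w = snprintf(out + n, cap - n, "&#%u;", code);
            if (w < 0 || (size_t)w >= cap - n)
                return (size_t)-1;
            n += (size_t)w;
            in += len + 2;              /* skip '&', the name, and ';' */
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = *in++;           /* ordinary byte: copy as-is */
        }
    }
    if (n >= cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

The function expects a NUL-terminated input and a caller-supplied output buffer. Unknown names and unterminated references are copied through unchanged, so the parser would still report them as undeclared entities rather than have them silently altered.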

    This has been fun.

© 2021 The Archive Base. All Rights Reserved