Possible Duplicate:
Using an NSXMLParser to parse HTML
I am trying to parse the following XML data, but the structure is messed up and it doesn’t have closing tags. This is not an XML file I have made, but one I am trying to parse from a webserver.
<FORM ACTION="/prod/bwckgens.p_proc_term_date" METHOD="POST" onSubmit="return checkSubmit()">
<INPUT TYPE="hidden" NAME="p_calling_proc" VALUE="bwckschd.p_disp_dyn_sched">
<TABLE CLASS="dataentrytable" summary="This layout table is used for term selection."width="100%"><CAPTION class="captiontext">Search by Term: </CAPTION>
<TR>
<TD CLASS="dedefault"><LABEL for=term_input_id><SPAN class="fieldlabeltextinvisible">Term</SPAN></LABEL>
<SELECT NAME="p_term" SIZE="1" ID="term_input_id">
<OPTION VALUE="">None
<OPTION VALUE="201320">Spring 2013
<OPTION VALUE="201315">STAR/BGR: New Admits Fall 2012 (View only)
<OPTION VALUE="201310">Fall 2012 (View only)
<OPTION VALUE="201230">Summer 2012 (View only)
<OPTION VALUE="201220">Spring 2012 (View only)
<OPTION VALUE="201210">Fall 2011 (View only)
<OPTION VALUE="201130">Summer 2011 (View only)
<OPTION VALUE="201120">Spring 2011 (View only)
<OPTION VALUE="201110">Fall 2010 (View only)
<OPTION VALUE="201030">Summer 2010 (View only)
<OPTION VALUE="201020">Spring 2010 (View only)
<OPTION VALUE="201010">Fall 2009 (View only)
<OPTION VALUE="200930">Summer 2009 (View only)
<OPTION VALUE="200920">Spring 2009 (View only)
<OPTION VALUE="200910">Fall 2008 (View only)
<OPTION VALUE="200830">Summer 2008 (View only)
<OPTION VALUE="200820">Spring 2008 (View only)
</SELECT>
</TD>
</TR>
</TABLE>
<BR>
<BR>
<INPUT TYPE="submit" VALUE="Submit">
<INPUT TYPE="reset" VALUE="Reset">
</FORM>
There is a lot more to the HTML file, but I am only including the relevant part. I want to get every OPTION VALUE (the number inside the quotes) along with the term name that follows the closing bracket, e.g. 201320 and Spring 2013.
How do I use NSXMLParser to get these values when the OPTION elements have no closing tags? I tried printing every element the parser encounters with
NSLog(@"Current start element: %@\n", elementName);
NSLog(@"Current attr:%@\n", attributeDict.description);
but I don’t see OPTION or VALUE anywhere. This is the result from the NSLog statements:
2012-10-28 13:58:47.638 Purdue Course Finder[32890:c07] Current start element: HTML
2012-10-28 13:58:47.638 Purdue Course Finder[32890:c07] Current attr:{
lang = en;
}
2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current start element: HEAD
2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current attr:{
}
2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current start element: META
2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current attr:{
content = "text/html; charset=UTF-8";
"http-equiv" = "Content-Type";
}
2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current start element: META
2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current attr:{
CONTENT = "no-cache";
"HTTP-EQUIV" = Pragma;
NAME = "Cache-Control";
}
2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current start element: META
2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current attr:{
CONTENT = "no-cache";
"HTTP-EQUIV" = "Cache-Control";
NAME = "Cache-Control";
}
2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current start element: LINK
2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current attr:{
HREF = "/css/web_defaultapp.css";
REL = stylesheet;
TYPE = "text/css";
}
2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current start element: LINK
2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current attr:{
HREF = "/css/web_defaultprint.css";
REL = stylesheet;
TYPE = "text/css";
media = print;
}
2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current start element: TITLE
2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current attr:{
}
2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current end element: TITLE
2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current start element: META
2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current attr:{
CONTENT = "text/javascript";
"HTTP-EQUIV" = "Content-Script-Type";
NAME = "Default_Script_Language";
}
2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current start element: SCRIPT
2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current attr:{
LANGUAGE = JavaScript;
TYPE = "text/javascript";
}
2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current end element: SCRIPT
2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current start element: SCRIPT
2012-10-28 13:58:47.646 Purdue Course Finder[32890:c07] Current attr:{
LANGUAGE = JavaScript;
TYPE = "text/javascript";
}
2012-10-28 13:58:47.646 Purdue Course Finder[32890:c07] Current end element: SCRIPT
I even tried logging everything passed to the - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string method, but these tags do not show up anywhere. Could someone help me parse this poorly constructed XML file? Thanks!
It looks to me like you’re not even getting to the main chunk of data you’re interested in, possibly because the HEAD is malformed as well (though I don’t know for sure since that part of the document isn’t included in the question).
I would suggest making small, targeted fixes to the document after it is received until it parses correctly. You don’t have to fix every error, just the ones that prevent the parser from reaching your OPTION data. Once you know which fixes are required, apply them automatically (string replacement or regular expressions) each time the file is received, then parse as normal.
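As a rough illustration of that replace-then-parse idea, here is a minimal sketch. It assumes the page contains exactly one SELECT block like the one in the question, that each OPTION label sits on a single line, and that labels contain no characters needing XML escaping (such as &). The helper name WellFormedSelectFragment is hypothetical, not part of any framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: cuts the SELECT block out of the raw HTML and adds the
// missing </OPTION> tags so the fragment is well-formed XML. Returns nil if
// no SELECT block is found.
static NSString *WellFormedSelectFragment(NSString *html) {
    // Isolate <SELECT ...> ... </SELECT> so that unrelated malformed markup
    // elsewhere in the page (META, INPUT, BR, ...) never reaches the parser.
    NSRange start = [html rangeOfString:@"<SELECT" options:NSCaseInsensitiveSearch];
    if (start.location == NSNotFound) return nil;
    NSRange tail = NSMakeRange(start.location, html.length - start.location);
    NSRange end = [html rangeOfString:@"</SELECT>"
                              options:NSCaseInsensitiveSearch
                                range:tail];
    if (end.location == NSNotFound) return nil;
    NSRange blockRange = NSMakeRange(start.location,
                                     NSMaxRange(end) - start.location);
    NSString *block = [html substringWithRange:blockRange];

    // Group 1 is the opening OPTION tag, group 2 is the label text up to the
    // end of the line; the template appends the missing closing tag.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(<OPTION\\b[^>]*>)([^<\\r\\n]*)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    return [regex stringByReplacingMatchesInString:block
                                           options:0
                                             range:NSMakeRange(0, block.length)
                                      withTemplate:@"$1$2</OPTION>"];
}
```

The returned string can be converted with dataUsingEncoding:NSUTF8StringEncoding and handed to -[NSXMLParser initWithData:]. With the existing delegate, each OPTION should then arrive in didStartElement with VALUE in attributeDict, and the term name in foundCharacters. Implementing parser:parseErrorOccurred: in the delegate may also help confirm where the original document stops parsing.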