I’m trying to parse some information from an HTML page. The only problem is that the information I need is not in a tag so it can’t be easily found. Here is an example of what I am talking about.
<span class="fieldlabeltext">Levels: </span>Undergraduate
<br>
<span class="fieldlabeltext">Attributes: </span>Online Course
<br>
<span class="fieldlabeltext">Instructors: </span>N/A
<br>
I need to extract “Online Course” from the example above, but the “Attributes” value is not the same throughout the HTML file. It could be “Critical Thinking”, “Capstone”, or any number of other titles. What would be the best way to go about extracting this data? I am using the PHP Simple HTML DOM Parser – http://simplehtmldom.sourceforge.net/
Marc B’s comment is right on. SimpleHTMLDOM has the following traversal functions that you can call on an element to move from the label span to the nodes around it.
element $e->parent() – Returns the parent of element.
element $e->first_child() – Returns the first child of element, or null if not found.
element $e->last_child() – Returns the last child of element, or null if not found.
element $e->next_sibling() – Returns the next sibling of element, or null if not found.
element $e->prev_sibling() – Returns the previous sibling of element, or null if not found.
Source: http://simplehtmldom.sourceforge.net/manual.htm#section_traverse
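One thing to check: the functions above are documented as returning elements, and the value in question is a bare text node sitting after the label span, so it is worth confirming what next_sibling() actually hands back in your version. As an alternative, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM, where nextSibling does include text nodes. The helper name extractField() and the wrapping html/body tags in the sample are assumptions made for illustration, not part of the original page.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not Simple HTML DOM.
// extractField() is a hypothetical helper name chosen for this example.

function extractField(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Visit every label span and look for the one whose text matches.
    foreach ($xpath->query('//span[@class="fieldlabeltext"]') as $span) {
        if (trim($span->textContent) !== $label) {
            continue;
        }
        // The value is the text node that directly follows the span.
        $next = $span->nextSibling;
        if ($next instanceof DOMText) {
            return trim($next->nodeValue);
        }
        return null; // Label found, but no text node follows it.
    }
    return null; // Label not present in the document.
}

// Sample markup from the question, wrapped in html/body for the parser.
$html = <<<HTML
<html><body>
<span class="fieldlabeltext">Levels: </span>Undergraduate
<br>
<span class="fieldlabeltext">Attributes: </span>Online Course
<br>
<span class="fieldlabeltext">Instructors: </span>N/A
<br>
</body></html>
HTML;

echo extractField($html, 'Attributes:'), PHP_EOL;
```

Matching on the label text rather than on position means the lookup keeps working when the value changes to “Critical Thinking” or “Capstone”, or when the fields appear in a different order.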