I don’t mean to be a bother, and I know this has been asked a thousand times before, but I’m just not understanding the concept. I was wondering if somebody could walk me through it. Here is what I’m trying to do:
I have a set of information inside an HTML file. The file is uploaded to the server, and I need to parse information out of it within set boundaries (demo code to follow). I have been reading about parsing for over a week and understand some of it, but I’m still not grasping the concept. I think I just need somebody to work through this demo so I can understand it and, if you could, break down the search variables, please. Here’s the demo:
<hr>
<a id="Operating_System"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
<CAPTION ALIGN="TOP"><FONT size="5">Operating System</FONT></CAPTION>
<tr><td><a href="#TOC">Top</a></td></tr>
<TR ALIGN="LEFT" BGCOLOR="#00FF00">
<TH>Property</TH>
<TH>Value</TH>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Name</TD>
<TD>Windows 7 Professional x64 Service Pack 1</TD>
</TR>
<TR>
<TD>Features</TD>
<TD>Terminal Services in Remote Admin Mode, 64 Bit Edition, Media Center Edition, Multiprocessor Free</TD>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Up Time</TD>
<TD>5 Days 22 Hours 4 Minutes 26 seconds</TD>
</TR>
<!-- Operating System Duration: 1.853 seconds -->
</table>
<hr>
<a id="Installed_Updates"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
<CAPTION ALIGN="TOP"><FONT size="5">Installed Updates</FONT></CAPTION>
And here is what I’m trying to accomplish. In this demo, I need the information parsed, but only certain pieces returned. There is a lot more information in the real file, but I only need about 30 items in total from each document. First, I need to search from Operating_System to Installed_Updates; this gives me the first area to gather information from (there are other groups too, so I’ll make one search for each group of information). Then I need to make the search more specific, such as from <TR> to </TR>, which gives me the actual rows I need. After that, I just grab the first ‘name’ and ‘value’ to store in a database.
Again, I know it’s out there, but I’m just not getting the whole concept of regular expressions. After I do it a few times on an actual document, I think I’ll get the hang of it.
Thank you all so much for the help, I really appreciate it.
Regular expressions only work for fixed HTML with little variation. But if you just want a simple example, here is one:
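A minimal sketch of the two-step search described in the question, written in Python as an assumption, since the question does not name a language. `SAMPLE`, `extract_section`, and `extract_rows` are hypothetical names made up for illustration, and the patterns assume the markup looks exactly like the demo (two plain-text `<TD>` cells per row, anchors written as `<a id="..."></a>`).

```python
import re

# Shortened copy of the demo from the question.
SAMPLE = """<hr>
<a id="Operating_System"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
<CAPTION ALIGN="TOP"><FONT size="5">Operating System</FONT></CAPTION>
<tr><td><a href="#TOC">Top</a></td></tr>
<TR ALIGN="LEFT" BGCOLOR="#00FF00">
<TH>Property</TH>
<TH>Value</TH>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Name</TD>
<TD>Windows 7 Professional x64 Service Pack 1</TD>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Up Time</TD>
<TD>5 Days 22 Hours 4 Minutes 26 seconds</TD>
</TR>
</table>
<hr>
<a id="Installed_Updates"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
"""


def extract_section(html, start_id, end_id):
    """Step 1: return the text between two <a id="..."></a> anchors, or None."""
    pattern = re.compile(
        r'<a id="' + re.escape(start_id) + r'"></a>'  # literal start anchor
        r'(.*?)'                                      # shortest possible run of anything
        r'<a id="' + re.escape(end_id) + r'"></a>',   # literal end anchor
        re.DOTALL | re.IGNORECASE,                    # DOTALL lets "." cross line breaks
    )
    match = pattern.search(html)
    return match.group(1) if match else None


# Step 2: one table row holding exactly two plain-text cells.
ROW = re.compile(
    r'<TR\b[^>]*>\s*'       # opening <TR>, with or without attributes
    r'<TD>([^<]*)</TD>\s*'  # first cell: any text that contains no tag
    r'<TD>([^<]*)</TD>\s*'  # second cell
    r'</TR>',
    re.IGNORECASE,          # the demo mixes <tr> and <TR>
)


def extract_rows(section):
    """Return (property, value) pairs found in one section."""
    return [(name.strip(), value.strip()) for name, value in ROW.findall(section)]


section = extract_section(SAMPLE, "Operating_System", "Installed_Updates")
for name, value in extract_rows(section or ""):
    print(name, "=", value)
```

Using `[^<]*` for the cell text instead of `.*?` keeps a match from spilling across rows: the "Top" link row and the `<TH>` header row simply fail to match and are skipped. Each further group would need its own pair of anchor ids passed to `extract_section`.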
See also https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world for some tools, and http://regular-expressions.info/ to learn the syntax.
But as said, if you want to extract a lot of values, there are easier options.
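One such option, sketched here as an assumption rather than as the approach the answer had in mind, is an actual HTML parser. The example below uses Python's standard-library `html.parser`; `SectionTableParser` and `SAMPLE` are hypothetical names, and the logic assumes every group starts with an `<a id="...">` anchor and every wanted row has exactly two `<TD>` cells, as in the demo.

```python
from html.parser import HTMLParser

# Shortened copy of the demo from the question.
SAMPLE = """<a id="Operating_System"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
<tr><td><a href="#TOC">Top</a></td></tr>
<TR ALIGN="LEFT" BGCOLOR="#00FF00">
<TH>Property</TH>
<TH>Value</TH>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Name</TD>
<TD>Windows 7 Professional x64 Service Pack 1</TD>
</TR>
<TR BGCOLOR="#F0F0F0">
<TD>Up Time</TD>
<TD>5 Days 22 Hours 4 Minutes 26 seconds</TD>
</TR>
</table>
<hr>
<a id="Installed_Updates"></a>
<table WIDTH="100%" BORDER="0" CELLSPACING="0" ALIGN="CENTER">
"""


class SectionTableParser(HTMLParser):
    """Collect two-cell table rows, grouped by the preceding <a id="..."> anchor."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # anchor id -> list of (property, value) tuples
        self._section = None  # id of the anchor seen most recently
        self._cells = None    # cells of the current row, or None outside a row
        self._text = None     # text pieces of the current cell, or None outside <td>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <TR> arrives as "tr".
        attrs = dict(attrs)
        if tag == "a" and "id" in attrs:
            self._section = attrs["id"]
            self.sections.setdefault(self._section, [])
        elif tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # Keep only property/value rows; header and "Top" rows are dropped.
            if len(self._cells) == 2 and self._section is not None:
                self.sections[self._section].append(tuple(self._cells))
            self._cells = None
            self._text = None


parser = SectionTableParser()
parser.feed(SAMPLE)
for name, value in parser.sections["Operating_System"]:
    print(name, "=", value)
```

A single pass fills `parser.sections` for every group in the file, so picking out the roughly 30 wanted values becomes a dictionary lookup per group instead of a separate pattern per group.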