I have some Java code that prints the HTML of a website of my choosing. I would like it to print only the dates that appear in HTML like this:
<tr class="bgWhite">
<td align="center" width="50"><nobr>GD </nobr></td>
<td align="center">Q3 2012</td>
<td align="left" width="*">Q3 2012 General Dynamics Earnings Release</td>
<td align="center">$ 1.83 </td>
<td align="center">n/a </td>
<td align="center">$ 1.83 </td>
<td align="center"><nobr>24-Oct-12</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>GD </nobr></td>
<td align="center">Q2 2012</td>
<td align="left" width="*">Q2 2012 General Dynamics Earnings Release</td>
<td align="center">$ 1.75 </td>
<td align="center">n/a </td>
<td align="center">$ 1.79 </td>
<td align="center"><nobr>25-Jul-12 BMO</nobr></td>
</tr>
So I want it to print only:
24-Oct-12
25-Jul-12
How do I do that?
Here is the code that I have:
    String nextLine;
    URL url = null;
    URLConnection urlConn = null;
    InputStreamReader inStream = null;
    BufferedReader buff = null;
    try {
        // Create the URL object that points
        // at the earnings page
        url = new URL("http://www.earnings.com/company.asp?client=cb&ticker=gd");
        urlConn = url.openConnection();
        inStream = new InputStreamReader(
                urlConn.getInputStream());
        buff = new BufferedReader(inStream);
        // Read and print the lines of the page
        while (true) {
            nextLine = buff.readLine();
            if (nextLine != null) {
                System.out.println(nextLine);
            }
            else {
                break;
            }
        }
    } catch (MalformedURLException e) {
        System.out.println("Please check the URL:" +
                e.toString());
    } catch (IOException e1) {
        System.out.println("Can't read from the Internet: " +
                e1.toString());
    }
It’s easier to use a full-fledged HTML parser for this job than the low-level java.net.URLConnection. However, the targeted website generates completely non-semantic HTML (nothing but tables, without any semantic identifiers or classes, like the average 90’s website (yuck)), so parsing it properly is tricky even for a decent HTML parser. Still, Jsoup can print exactly the information you need, with no need to hassle with the low-level java.net.URLConnection or a verbose SAX parser.
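As a rough illustration of the Jsoup approach, here is a minimal sketch. It is not the original example. It assumes the date always sits in the last cell of each `tr.bgWhite` row, as in the sample rows from the question. The class name `EarningsDates`, the helper `extractDates`, and the date regex are made up for this sketch. The sample rows are wrapped in a `<table>` element because Jsoup's HTML parser discards `<tr>`/`<td>` tags that appear outside a table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EarningsDates {

    // Hypothetical pattern for dates such as "24-Oct-12"; anything after
    // the date in the cell (e.g. " BMO") is left out of the match.
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}-[A-Za-z]{3}-\\d{2}");

    // Collects the date found in the last cell of every tr.bgWhite row.
    // Assumption: the row layout matches the sample in the question.
    static List<String> extractDates(Document document) {
        List<String> dates = new ArrayList<>();
        for (Element row : document.select("tr.bgWhite")) {
            Element lastCell = row.select("td").last();
            if (lastCell == null) {
                continue; // row without cells, nothing to read
            }
            Matcher matcher = DATE.matcher(lastCell.text());
            if (matcher.find()) {
                dates.add(matcher.group());
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        // Two abbreviated rows from the question, wrapped in a table.
        String html = "<table>"
                + "<tr class=\"bgWhite\">"
                + "<td align=\"center\">Q3 2012</td>"
                + "<td align=\"center\"><nobr>24-Oct-12</nobr></td>"
                + "</tr>"
                + "<tr class=\"bgWhite\">"
                + "<td align=\"center\">Q2 2012</td>"
                + "<td align=\"center\"><nobr>25-Jul-12 BMO</nobr></td>"
                + "</tr>"
                + "</table>";

        for (String date : extractDates(Jsoup.parse(html))) {
            System.out.println(date);
        }
    }
}
```

To run this against the live page instead of the inline sample, the document could be fetched with `Jsoup.connect("http://www.earnings.com/company.asp?client=cb&ticker=gd").get()`, which throws `IOException`. Whether the real page uses `bgWhite` for every data row is not confirmed here, so the selector may need adjusting.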