I’m having a little problem in Java. I want to search an HTML file for the href and src attributes, and then get the URL associated with each of them.
What is the best way to do it?
Thanks for the help. Best regards.
This is the code I used to accomplish exactly what you’d like to do, but first let me give you a few tips.
If you’re in a Java Swing environment, make sure to use the classes in the javax.swing.text.html and javax.swing.text.html.parser packages. Unfortunately, they’re mostly intended for use with a JEditorPane, but I’d still strongly recommend that you take a look at them.
There’s a class in the Java 6 API called HTML.Tag (in the javax.swing.text.html package) that represents the HTML tags the parser reports as start and end tags. You can use it to determine where the links are that you’d like your program to follow: http://java.sun.com/javase/6/docs/api/javax/swing/text/html/HTML.Tag.html
When I wrote a program very similar to this, I used 3 main methods:
If you need more help writing these methods, you can message me. Basically, you look for a start tag and its end tag; from that you will have identified the URL, and then you can proceed to the next step, which is following the URL.
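To make the tag-scanning step more concrete, here is a minimal sketch. It is not the original program mentioned above; it assumes you are happy to use Swing’s built-in parser (ParserDelegator with an HTMLEditorKit.ParserCallback), and the names LinkExtractor and extractLinks are made up for illustration. It simply collects every href and src value it sees, in document order.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named for illustration only.
class LinkExtractor {

    /** Returns the value of every href and src attribute found in the HTML. */
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> urls = new ArrayList<String>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // tags with an end tag, e.g. <a href="...">
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // empty tags, e.g. <img src="...">
            }

            private void collect(MutableAttributeSet attrs) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    urls.add(href.toString());
                }
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src != null) {
                    urls.add(src.toString());
                }
            }
        };

        // The third argument tells the parser to ignore charset declarations
        // in the document instead of failing on them.
        new ParserDelegator().parse(html, callback, true);
        return urls;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<img src=\"img/logo.png\"></body></html>";
        for (String url : extractLinks(new StringReader(page))) {
            System.out.println(url);
        }
    }
}
```

For a file on disk you could pass a java.io.FileReader instead of the StringReader. Note that the values come back exactly as written in the page, so relative URLs such as img/logo.png still need to be resolved against the page’s base URL before you follow them.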
To follow the URL, I advise you to use a JEditorPane object. The javax.swing.event.HyperlinkListener interface defines only one method, hyperlinkUpdate(HyperlinkEvent e). Inside that method you can read the URL from the event and call .setPage(e.getURL()) on your JEditorPane object. This will then update the pane with the new page and allow you to start the process again.
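A rough sketch of that listener could look like the following. The class name LinkFollower and the createBrowserPane helper are hypothetical, and the error handling is deliberately minimal; treat it as a starting point under those assumptions rather than a finished browser.

```java
import java.io.IOException;
import javax.swing.JEditorPane;
import javax.swing.event.HyperlinkEvent;
import javax.swing.event.HyperlinkListener;

// Hypothetical listener class, named for illustration only.
class LinkFollower implements HyperlinkListener {
    private final JEditorPane pane;

    LinkFollower(JEditorPane pane) {
        this.pane = pane;
    }

    @Override
    public void hyperlinkUpdate(HyperlinkEvent e) {
        // ENTERED and EXITED fire when the mouse moves over a link;
        // only ACTIVATED means the link was actually clicked.
        if (e.getEventType() != HyperlinkEvent.EventType.ACTIVATED) {
            return;
        }
        // getURL() is null when the link could not be turned into a valid URL.
        if (e.getURL() == null) {
            return;
        }
        try {
            pane.setPage(e.getURL()); // load the linked page into the pane
        } catch (IOException ex) {
            System.err.println("Could not load " + e.getURL() + ": " + ex.getMessage());
        }
    }

    /** Builds a read-only HTML pane that follows the links the user clicks. */
    static JEditorPane createBrowserPane() {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html");
        pane.setEditable(false); // hyperlink events are only generated when not editable
        pane.addHyperlinkListener(new LinkFollower(pane));
        return pane;
    }
}
```

You would then put the pane returned by createBrowserPane() into a JScrollPane inside your frame and call setPage(...) once with the starting URL. The setEditable(false) call matters: an editable JEditorPane does not generate hyperlink events at all.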
Message me if you have any problems, and please vote for this answer!