I am looking for a way to extract the text from a web page (initially HTML) using the JDK or another library. Please help.
Thanks.
Use an HTML parser if at all possible; there are many available for Java.
Or you can use a regex, like many people do. This is generally not advisable, however, unless you're doing very simple processing.
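To make the two options concrete, here is a minimal sketch using only the JDK. It relies on the Swing HTML parser (`javax.swing.text.html.parser.ParserDelegator`), which ships with the JDK in the `java.desktop` module. The class name `HtmlTextExtractor` and the method names are made up for this example, and joining text chunks with a single space is my own simplification, not something the parser requires. The regex method is included only for comparison.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration; not part of any library.
class HtmlTextExtractor {

    // Parser-based extraction: the JDK's Swing HTML parser calls
    // handleText() for each run of character data it finds.
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separate chunks with a space, so text split by
                // inline tags in mid-word ("Hel<b>lo</b>") gains a space.
                out.append(data).append(' ');
            }
        };
        try {
            // Third argument 'true' tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Collapse runs of whitespace and trim the ends.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Naive regex tag stripping, for very simple input only. It does not
    // decode entities and is confused by '>' inside attribute values,
    // comments, and script or style content.
    static String stripTagsNaive(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
        System.out.println(stripTagsNaive(html));
    }
}
```

If a third-party library is acceptable, jsoup does the same thing in one line with `Jsoup.parse(html).text()`, and it copes with real-world markup better than the old Swing parser.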