I am trying to read a website using the java.net package classes. The site has content, and i see it manually in html source utilities in the browser. When I get its response code and try to view the site using java, it connects successfully but interprets the site as one without content(204 code). What is going on and is it possible to get around this to view the html automatically.
thanks for your responses:
Do you need the URL?
here is the code:
URL hef=new URL(the website);
BufferedReader kj=null;
int kjkj=((HttpURLConnection)hef.openConnection()).getResponseCode();
System.out.println(kjkj);
String j=((HttpURLConnection)hef.openConnection()).getResponseMessage();
System.out.println(j);
URLConnection g=hef.openConnection();
g.connect();
try{
kj=new BufferedReader(new InputStreamReader(g.getInputStream()));
while(kj.readLine()!=null)
{
String y=kj.readLine();
System.out.println(y);
}
}
finally
{
if(kj!=null)
{
kj.close();
}
}
}
Suggestions:
I’m not sure that the java.net package is robot-aware, but that could be a factor as well (can you check if the underlying site has robot.txt files).Edit:
assuming you are using the java.net package’s HttpURLConnection class, the “robot” hypothesis doesn’t apply.
On the other hand you’ll probably want to use the connection’s setRequestProperty() method to prepare the desired HTTP header for the request (so they match these from the web browser client)
Maybe you can post the relevant portions of your code.