I have a program that pulls the source code of a webpage and saves it to a .txt file. It works when I fetch one page at a time. When I loop over, say, 100 pages, each page's source suddenly starts getting cut off somewhere between 1/4 and 3/4 of the way through, and the cutoff point seems arbitrary. Any ideas on why this happens or how I would go about solving it?
My initial thought was that the loop was going too fast for Java (I am running this Java program from a PHP script). Then I realized that it technically shouldn't move on to the next item until the current one has finished anyway.
Here is the code I’m using:
import java.io.*;
import java.net.URL;

public class selectout {
    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
                new InputStreamReader(
                        new URL(url).openStream()));
    }

    public static void main(String[] args) throws Exception {
        BufferedReader reader = read(args[0]);
        String line = reader.readLine();
        String thenum = args[1];
        FileWriter fstream = new FileWriter(thenum + ".txt");
        BufferedWriter out = new BufferedWriter(fstream);
        while (line != null) {
            out.write(line);
            out.newLine();
            //System.out.println(line);
            line = reader.readLine();
        }
    }
}
The PHP side is a basic mysql_query with a while(fetch_assoc) loop: for each row it grabs the URL from the database, then runs system("java -jar crawl.jar $url $filename").
Then it opens the new file with fopen, reads it with fread, and finally saves the source to the database (after escaping strings and such).
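You need to close your output streams after you finish writing each file. A BufferedWriter (and the FileWriter underneath it) holds text in memory and only writes it to disk when its buffer fills or when you call flush() or close(). Your program never calls either, so whatever is still sitting in the buffer when main returns is never written, and the file ends part-way through the page. The speed of the PHP loop is not the cause: system() waits for the Java process to exit before it returns. After your while loop, call out.close(); closing the BufferedWriter flushes it and also closes the underlying FileWriter, so an extra fstream.close(); is harmless but not strictly needed. It is also good practice to close reader when you are done with it.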
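A minimal sketch of the corrected program, assuming Java 7 or later for try-with-resources. It keeps the original class name selectout and the original argument order (URL first, then the output file name) so it should be usable from the same system("java -jar crawl.jar $url $filename") call; like the original, it reads and writes with the platform default charset.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class selectout {

    // Opens the URL and wraps its stream in a line-oriented reader
    // (platform default charset, same as the original code).
    public static BufferedReader read(String url) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new URL(url).openStream()));
    }

    public static void main(String[] args) throws IOException {
        String thenum = args[1];

        // try-with-resources closes out, then reader, when the block exits,
        // even if an exception is thrown. Closing out flushes its buffer and
        // closes the FileWriter underneath it, so the last chunk of the page
        // reaches the file before the JVM exits.
        try (BufferedReader reader = read(args[0]);
             BufferedWriter out = new BufferedWriter(new FileWriter(thenum + ".txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

Using try-with-resources instead of a bare out.close() after the loop also means the file is closed if reading fails part-way through, rather than leaving that to the JVM shutting down.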