I’m new to concurrent programming in Java.
I need to read, analyze and process an extremely fast-growing log file, so I have to be fast.
My idea was to read the file line by line and, upon matching a relevant line, pass that line to a separate thread that does further processing on it.
I called these threads “IOThread” in the following example code.
My problem is that BufferedReader.readLine() in IOThread.run() apparently never returns.
What is a working way to read the stream inside the thread?
Are there any better approaches than the one below?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class IOThread extends Thread {
    private InputStream is;
    private int t;

    public IOThread(InputStream is, int t) {
        this.is = is;
        this.t = t;
        System.out.println("iothread<" + t + ">.init");
    }

    public void run() {
        try {
            System.out.println("iothread<" + t + ">.run");
            String line;
            BufferedReader streamReader = new BufferedReader(new InputStreamReader(is));
            while ((line = streamReader.readLine()) != null) {
                System.out.println("iothread<" + t + "> got line " + line);
            }
            System.out.println("iothread " + t + " end run");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public class Stm {
    public Stm(String filePath) {
        System.out.println("start");
        try {
            BufferedReader reader = new BufferedReader(new FileReader(filePath));
            PipedOutputStream po1 = new PipedOutputStream();
            PipedOutputStream po2 = new PipedOutputStream();
            PipedInputStream pi1 = new PipedInputStream(po1);
            PipedInputStream pi2 = new PipedInputStream(po2);
            IOThread it1 = new IOThread(pi1,1);
            IOThread it2 = new IOThread(pi2,2);
            it1.start();
            it2.start();
            // it1.join();
            // it2.join();
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("got line " + line);
                if (line.contains("aaa")) {
                    System.out.println("passing to thread 1: " + line);
                    po1.write(line.getBytes());
                } else if (line.contains("bbb")) {
                    System.out.println("passing to thread 2: " + line);
                    po2.write(line.getBytes());
                }
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new Stm(args[0]);
    }
}
An example input file would be:
line 1
line 2
line 3 aaa ...
line 4
line 5 bbb ...
line 6 aaa ...
line 7
line 8 bbb ...
line 9 bbb ...
line 10
Call the above code with the filename of the input file as argument.
IMHO you have got it backwards. Create multiple threads for processing the data, not for reading it from the file. Reading from a file is I/O-bound anyway, so additional reader threads won’t make any difference. The simplest solution is to read lines as fast as you can in a single thread and put them into a shared queue. Any number of worker threads can then take lines from this queue and do the relevant processing.
This way, the processing runs concurrently while the reader thread is busy reading or waiting for data. If possible, keep the logic in the reader thread to a minimum: just read the lines and let the worker threads do the heavy lifting (pattern matching, further processing, etc.). Go with a thread-safe queue and you should be fine.
EDIT: Use some variant of BlockingQueue, either array-based (ArrayBlockingQueue) or linked-list-based (LinkedBlockingQueue).
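To make the idea concrete, here is a minimal sketch of one reader feeding several workers through a BlockingQueue. It is only an illustration under a few assumptions of mine: the class name LogPipeline, the POISON end-of-input marker, the queue capacity of 1024 and the "aaa"/"bbb" matching (borrowed from the question) are placeholders, and the "processing" is just collecting the matched lines.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

class LogPipeline {
    // Hypothetical end-of-input marker ("poison pill"). It is compared by
    // identity, so a real line with the same text cannot be mistaken for it.
    private static final String POISON = new String("<<EOF>>");

    // Reads all lines from 'reader' on the calling thread and hands them to
    // 'workerCount' worker threads. Returns the lines the workers matched.
    static List<String> run(BufferedReader reader, int workerCount)
            throws IOException, InterruptedException {
        // Bounded queue: put() blocks when the workers fall behind.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        Queue<String> results = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take(); // blocks until a line arrives
                        if (line == POISON) {
                            break; // no more input for this worker
                        }
                        // Placeholder for the real matching and processing.
                        if (line.contains("aaa") || line.contains("bbb")) {
                            results.add(line);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        // Reader side: no logic here, just move lines into the queue.
        String line;
        while ((line = reader.readLine()) != null) {
            queue.put(line);
        }
        // One marker per worker so every worker terminates.
        for (int i = 0; i < workerCount; i++) {
            queue.put(POISON);
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            for (String matched : run(reader, 2)) {
                System.out.println("processed: " + matched);
            }
        }
    }
}
```

Run it with the input file name as the only argument, as in the question. With more than one worker the order of the processed lines is not guaranteed. The sketch reads the file once up to its current end; following a log file that keeps growing would need extra handling that is not shown here.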