I am still new to the Java language and its libraries. I often use this kind of pattern in Python and wonder how I should implement it in Java.
I need to read a huge file line by line. The file contains some XML-like markup (I produce the input myself, so I am sure there will be no ambiguity).
I want to iterate over selected parts of the huge file, as in the Python code below.
(It uses the yield / Python iterator pattern. Is there an equivalent in Java? I really like being able to write for item in my_collection: yield something_about(many items).)
What is the best (Java) way to implement this kind of behaviour?
Thanks.
First EDIT: by the way, I would also be interested in a Java counterpart to the mapping between a list and a file that Python offers (a file can be handled much like a list of lines), if that is possible in Java. Answer: see Jeff Foster's suggestion of using Apache IOUtils.
def myAcc(instream, start, end):
    acc = []
    inside = False
    for line in instream:
        line = line.rstrip()
        if line.startswith(start):
            inside = True
        if inside:
            acc.append(line)
        if line.startswith(end):
            if acc:
                yield acc
                acc = []
            inside = False
f = open("c:/test.acc.txt")  # the real input file; replaced below by an in-memory sample
s = """<c>
<a>
this is a test
</a>
<b language="en" />
</c>
<c>
<a>
ceci est un test
</a>
<b language="fr" />
</c>
<c>
<a>
esta es una prueba
</a>
<b language="es" />
</c>"""
f = s.split("\n")  # mimic an input file with a list of lines
print("Reading block from <c> tag!")
for buf in myAcc(f, "<c>", "</c>"):
    print(buf)  # actually process this inner part; printing is a simplification
    print("-" * 10)
print("Reading block from <a> tag!")
for buf in myAcc(f, "<a>", "</a>"):
    print(buf)  # actually process this inner part
    print("-" * 10)
Output:
Reading block from <c> tag!
['<c>', '<a>', 'this is a test', '</a>', '<b language="en" />', '</c>']
----------
['<c>', '<a>', 'ceci est un test', '</a>', '<b language="fr" />', '</c>']
----------
['<c>', '<a>', 'esta es una prueba', '</a>', '<b language="es" />', '</c>']
----------
Reading block from <a> tag!
['<a>', 'this is a test', '</a>']
----------
['<a>', 'ceci est un test', '</a>']
----------
['<a>', 'esta es una prueba', '</a>']
----------
Directly inspired by Jeff Foster's answer below, here is my attempt to do the same kind of thing as my Python code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Callback invoked once for every accumulated block of lines.
interface WorkerThing {
    public void doSomething(List<String> acc);
}
class ThatReadsLargeFiles {
    public void readAHugeFile(BufferedReader input, String start, String end, WorkerThing action) throws IOException {
        // Accumulate the lines found between a start marker and an end marker.
        List<String> acc = new ArrayList<String>();
        String line;
        boolean inside = false;
        while ((line = input.readLine()) != null) {
            if (line.startsWith(start)) { // startsWith mirrors the Python version
                inside = true;
            }
            if (inside) {
                acc.add(line);
            }
            if (line.startsWith(end)) {
                if (!acc.isEmpty()) { // acc is never null here, so the emptiness check is enough
                    // Here you are yielding control to something else
                    action.doSomething(acc);
                    // Start a fresh list rather than calling acc.clear(), so the callback can keep the block it received.
                    acc = new ArrayList<String>();
                }
                inside = false;
            }
        }
        input.close();
    }
}
public class YieldLikeTest {
    public static void main(String[] args) throws IOException {
        String path = "c:/test.acc.txt";
        File myFile = new File(path);
        BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(myFile), "UTF-8"));
        //BufferedReader in = new BufferedReader(new FileReader(path, "utf8"));
        new ThatReadsLargeFiles().readAHugeFile(in, "<a>", "</a>", new WorkerThing() {
            public void doSomething(List<String> acc) {
                System.out.println(acc.toString());
            }
        });
    }
}
Second EDIT: I accepted this answer too quickly. There is still something I am missing: I do not know how to get and keep track of the content of acc at the top level (outside the anonymous class), so that the caller can use it for something other than printing, for example to instantiate a class and do further processing. yield allows this kind of usage, and I do not see how to adapt the proposed answer to get the same behaviour. Sorry, my Python sample was too simple.
So here is the answer, derived from Jeff Foster's explanation, for keeping hold of acc:
class betweenWorker implements WorkerThing {
    private List<String> acc;

    // Must be named doSomething to implement WorkerThing; keeps the most recent block.
    public void doSomething(List<String> acc) {
        this.acc = acc;
    }

    public List<String> getAcc() { return this.acc; }
}
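A worker that stores only one field keeps just the most recent block. To keep every block available at the top level, one option is a worker that appends each block to a list. The following is a minimal sketch under assumptions: CollectingDemo, CollectingWorker, readBlocks and collect are hypothetical names, the reading loop is a condensed copy of readAHugeFile, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CollectingDemo {
    // Same callback shape as the WorkerThing interface in the question.
    interface WorkerThing {
        void doSomething(List<String> acc);
    }

    // Hypothetical worker that remembers every block it is handed.
    static class CollectingWorker implements WorkerThing {
        private final List<List<String>> blocks = new ArrayList<>();

        @Override
        public void doSomething(List<String> acc) {
            blocks.add(acc);
        }

        List<List<String>> getBlocks() {
            return blocks;
        }
    }

    // Condensed copy of readAHugeFile: hands each start..end block to the worker.
    static void readBlocks(BufferedReader input, String start, String end,
                           WorkerThing action) throws IOException {
        List<String> acc = new ArrayList<>();
        boolean inside = false;
        String line;
        while ((line = input.readLine()) != null) {
            if (line.startsWith(start)) {
                inside = true;
            }
            if (inside) {
                acc.add(line);
            }
            if (line.startsWith(end)) {
                if (!acc.isEmpty()) {
                    action.doSomething(acc);
                    acc = new ArrayList<>(); // fresh list, so stored blocks are never overwritten
                }
                inside = false;
            }
        }
    }

    // Hypothetical convenience wrapper: parse a string and return all blocks.
    static List<List<String>> collect(String text, String start, String end) {
        CollectingWorker worker = new CollectingWorker();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            readBlocks(in, start, end, worker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return worker.getBlocks();
    }

    public static void main(String[] args) {
        String sample = "<c>\n<a>\nthis is a test\n</a>\n<b language=\"en\" />\n</c>\n"
                      + "<c>\n<a>\nceci est un test\n</a>\n<b language=\"fr\" />\n</c>\n";
        // The blocks are available here, outside the callback.
        for (List<String> block : collect(sample, "<a>", "</a>")) {
            System.out.println(block);
        }
    }
}
```

This only works because the reading loop allocates a new list after each callback; with acc.clear() every stored reference would point to the same, emptied list. Note that collecting everything holds all blocks in memory, which may not suit a truly huge file.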
Java doesn't support anything like yield, but you can achieve the same sort of thing by creating an interface that encapsulates the action you'll perform on the individual lines. When you use it, you can use anonymous interface implementations to make things slightly more bearable.
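The callback interface pushes each block to the caller. If the calling code should instead pull blocks one at a time, which is closer in feel to a Python generator, a java.util.Iterator is another option. The following is only a sketch under assumptions: BlockIteratorDemo and BlockIterator are hypothetical names, markers are matched with startsWith as in the Python code, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class BlockIteratorDemo {
    // Hypothetical pull-style reader: each next() returns one start..end block.
    static class BlockIterator implements Iterator<List<String>> {
        private final BufferedReader input;
        private final String start;
        private final String end;
        private List<String> nextBlock; // block read ahead by hasNext(), or null

        BlockIterator(BufferedReader input, String start, String end) {
            this.input = input;
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean hasNext() {
            if (nextBlock == null) {
                nextBlock = readBlock();
            }
            return nextBlock != null;
        }

        @Override
        public List<String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            List<String> result = nextBlock;
            nextBlock = null;
            return result;
        }

        // Reads lines until one complete block is found; returns null at end of input.
        private List<String> readBlock() {
            List<String> acc = new ArrayList<>();
            boolean inside = false;
            try {
                String line;
                while ((line = input.readLine()) != null) {
                    if (line.startsWith(start)) {
                        inside = true;
                    }
                    if (inside) {
                        acc.add(line);
                    }
                    if (line.startsWith(end)) {
                        if (!acc.isEmpty()) {
                            return acc;
                        }
                        inside = false;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return null; // an unterminated trailing block is dropped, as in the Python version
        }
    }

    public static void main(String[] args) {
        String sample = "<c>\n<a>\nthis is a test\n</a>\n<b language=\"en\" />\n</c>\n"
                      + "<c>\n<a>\nceci est un test\n</a>\n<b language=\"fr\" />\n</c>\n";
        Iterator<List<String>> blocks =
                new BlockIterator(new BufferedReader(new StringReader(sample)), "<a>", "</a>");
        while (blocks.hasNext()) {
            List<String> buf = blocks.next(); // the caller owns buf, much like a yielded value
            System.out.println(buf);
        }
    }
}
```

Only one block is held in memory at a time, and the loop body runs in the caller's scope, so buf can be passed to constructors or other processing directly. The trade-off is that the reading state has to be kept in fields rather than in local variables as a generator would allow.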