I am working on a large scale dataset and after building a model, I

Asked: May 21, 2026 (2026-05-21T02:14:06+00:00)

I am working on a large-scale dataset, and after building a model I use multithreading (the whole project is in Java) as follows:

OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));

int i=0;
Collection<Track1Callable> callables = new ArrayList<Track1Callable>();

// For each entry in the test file, do whatever needs to be done.
// Track1Callable actually processes that entry and returns a double value.
for (Pair<PreferenceArray, long[]> tests : new DataFileIterable(
        KDDCupDataModel.getTestFile(dataFileDirectory))) {
    PreferenceArray userTest = tests.getFirst();
    callables.add(new Track1Callable(recommender, userTest));
    i++;
}

ExecutorService executor = Executors.newFixedThreadPool(cores); //24 cores
List<Future<byte[]>> results = executor.invokeAll(callables);
executor.shutdown();

for (Future<byte[]> result : results) {
    for (byte estimate : result.get()) {
        out.write(estimate);
    }
}
out.flush();
out.close();

When I receive the result from each callable, I write it to a file. Is the output written in exactly the same order as the initial list of Callables, even though some tasks complete before others? It seems it should be, but I am not sure.

Also, I expect a total of 6.2 million bytes to be written to the output file, but I get an additional 2000 bytes (yeah, for free). That messes up my submission, and I think it is caused by some concurrency issue. I tested this on a small dataset and it seems to work fine there (264 bytes expected and received).

Is there anything wrong with how I am using the Executor framework or Futures?


1 Answer

Editorial Team · Answer 1 · 2026-05-21T02:14:06+00:00

Q: Is the order of the results the same as the order in which the tasks were specified? Yes.

From the API (ExecutorService.invokeAll):

Returns: A list of Futures representing the tasks, in the same sequential order as produced by the iterator for the given task list. If the operation did not time out, each task will have completed. If it did time out, some of these tasks will not have completed.
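The ordering guarantee can be checked with a small experiment. The sketch below is not the asker's code: the class name InvokeAllOrderDemo and the sleep-based tasks are made up for illustration. Each task sleeps for a time that shrinks with its index, so later tasks tend to finish first, yet the Futures still come back in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllOrderDemo {

    // Builds tasks that each return their own index, runs them on a fixed
    // pool, and collects the results by iterating the list from invokeAll.
    static List<Integer> runInOrder(int taskCount, int threads) throws Exception {
        List<Callable<Integer>> callables = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            callables.add(() -> {
                // Earlier tasks sleep longer, so completion order differs
                // from submission order.
                Thread.sleep((taskCount - id) * 5L);
                return id;
            });
        }

        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> ids = new ArrayList<>();
            // invokeAll blocks until all tasks are done and returns the
            // Futures in the iteration order of the task list.
            for (Future<Integer> future : executor.invokeAll(callables)) {
                ids.add(future.get());
            }
            return ids;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInOrder(8, 4)); // prints [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

Because the write loop in the question iterates over the same list returned by invokeAll, the bytes reach the file in task order regardless of which task finishes first.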

As for the "extra" bytes: have you tried running all of this sequentially (i.e., without an executor) and checking whether you get a different result? It seems that your problem lies outside the code provided (and is probably not due to concurrency).
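One way to run that comparison is sketched below. It is only an illustration under assumptions: SequentialBaselineCheck and its task(int) helper are hypothetical stand-ins for Track1Callable, and the output goes to an in-memory buffer rather than a file. The same list of tasks is run once in a plain loop and once through invokeAll, and the two byte arrays are compared.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SequentialBaselineCheck {

    // Hypothetical stand-in for Track1Callable: a deterministic task that
    // returns two bytes derived from its input.
    static Callable<byte[]> task(int seed) {
        return () -> new byte[] {(byte) seed, (byte) (seed * 31)};
    }

    static List<Callable<byte[]>> tasks(int count) {
        List<Callable<byte[]>> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            list.add(task(i));
        }
        return list;
    }

    // Baseline: call every task on the current thread, in list order.
    static byte[] runSequentially(List<Callable<byte[]>> tasks) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Callable<byte[]> t : tasks) {
            out.write(t.call());
        }
        return out.toByteArray();
    }

    // Same tasks through an executor, written in the order of the Futures.
    static byte[] runInParallel(List<Callable<byte[]>> tasks, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> future : executor.invokeAll(tasks)) {
                out.write(future.get());
            }
            return out.toByteArray();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<byte[]>> tasks = tasks(1000);
        byte[] sequential = runSequentially(tasks);
        byte[] parallel = runInParallel(tasks, 4);
        System.out.println("sequential bytes: " + sequential.length);
        System.out.println("parallel bytes:   " + parallel.length);
        System.out.println("identical:        " + Arrays.equals(sequential, parallel));
    }
}
```

If the sequential run of the real tasks also produces the extra 2000 bytes, the executor is not the cause and the place to look is the input data or the tasks themselves. If only the parallel run differs, a likely suspect is state shared between tasks (for example, the recommender) that is not safe for concurrent use.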
