I have this code:
    public void GenerateWtW() {
        ExecutorService exec = Executors.newFixedThreadPool(30);
        ConcurrentHashMap<String, Double> tf_idfCache = new ConcurrentHashMap<String, Double>();
        ArrayList<String> allwords = getAllWords();
        int no_docs = getNumberOfDocs();
        int cnt = 0;
        for (int i = 0; i < allwords.size(); i++) {
            String word1 = allwords.get(i);
            if (i < allwords.size() - 1) {
                for (int j = i + 1; j < allwords.size(); j++) {
                    String word2 = allwords.get(j);
                    cnt++;
                    if (word1.equals(word2)) {
                        continue;
                    }
                    //System.out.println("[" + cnt + "] WtW Started: " + word1 + "," + word2 + " No of Docs: " + no_docs + " Total No of words: " + allwords.size());
                    WTWThread t = new WTWThread(tf_idfCache, word1, word2, this, no_docs, db);
                    exec.execute(t);
                }
            }
        }
        exec.shutdown();
    }
And here is the code for the thread:
    private static class WTWThread implements Runnable {
        private ConcurrentHashMap<String, Double> cacheRef;
        private String word1, word2;
        private WordRank workRankInstance;
        private int no_docs;
        private Database db;

        public WTWThread(ConcurrentHashMap<String, Double> cacheRef, String word1, String word2, WordRank workRankInstance, int no_docs, Database db) {
            this.cacheRef = cacheRef;
            this.word1 = word1;
            this.word2 = word2;
            this.workRankInstance = workRankInstance;
            this.no_docs = no_docs;
            this.db = db;
        }

        @Override
        public void run() {
            double sum = 0;
            for (int i = 1; i <= 10; i++) {
                Double tf_idf1 = cacheRef.get(word1 + i);
                if (tf_idf1 == null) {
                    tf_idf1 = workRankInstance.getTF_IDF(word1, i);
                    cacheRef.put(word1 + i, tf_idf1);
                }
                Double tf_idf2 = cacheRef.get(word2 + i);
                if (tf_idf2 == null) {
                    tf_idf2 = workRankInstance.getTF_IDF(word2, i);
                    cacheRef.put(word2 + i, tf_idf2);
                }
                sum = sum + (tf_idf1 * tf_idf2);
            }
            double wtw = sum / no_docs;
            String query = "INSERT INTO wtw(word1,word2,wtw) VALUES(?,?,?);";
            try {
                PreparedStatement ps = db.getConnection().prepareStatement(query);
                ps.setString(1, word1);
                ps.setString(2, word2);
                ps.setDouble(3, wtw);
                ps.executeUpdate();
                ps.close();
            } catch (SQLException ex) {
                Logger.getLogger(WordRank.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
Everything looks fine to me, but here is what happens: when I run the program, it processes the first few hundred word pairs and then suddenly stops! I checked in the System Monitor: the java process starts growing in memory usage, goes up to about 1 GB, and then nothing happens. I thought maybe this was happening because I have too many threads, so I tried with 4 threads, but the same thing happens. Then I thought maybe I should use sleep() before creating the threads, and that did solve the problem. It worked like a charm, but even sleep(1) makes the program very slow! I have checked every possible thing that I could think of. Is there anything I’m missing here?
How many words do you have, how much RAM do you have and what is this program doing?
Your tf_idfCache will get very large, growing at least quadratically with the number of words and with quite a large constant factor (you are putting 10 things into the cache for every word?), and it might cause performance problems.
Finally, you do have a concurrency issue, but I don’t think it is causing a lock. In the code that checks the cache with cacheRef.get(...) and then calls cacheRef.put(...), you have no guarantee that you won’t calculate the rank twice.
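One way to close that gap, assuming Java 8 or later, is to let the map do the check and the insert as a single atomic step with `ConcurrentHashMap.computeIfAbsent`. The sketch below is only an illustration: the class `TfIdfCacheSketch`, the method `expensiveTfIdf` (a stand-in for `getTF_IDF`) and the `computations` counter are hypothetical names, not part of the original code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class TfIdfCacheSketch {
    private final ConcurrentHashMap<String, Double> cache = new ConcurrentHashMap<>();

    // Counts how often the expensive calculation actually runs (for illustration only).
    final AtomicInteger computations = new AtomicInteger();

    // Hypothetical stand-in for workRankInstance.getTF_IDF(word, i).
    double expensiveTfIdf(String word, int i) {
        computations.incrementAndGet();
        return word.length() * i;
    }

    // computeIfAbsent applies the mapping function at most once per key,
    // so two threads asking for the same key do not both compute the value.
    double cachedTfIdf(String word, int i) {
        return cache.computeIfAbsent(word + i, key -> expensiveTfIdf(word, i));
    }

    public static void main(String[] args) {
        TfIdfCacheSketch sketch = new TfIdfCacheSketch();
        sketch.cachedTfIdf("abc", 2);
        sketch.cachedTfIdf("abc", 2); // second call is served from the cache
        System.out.println("computations: " + sketch.computations.get());
    }
}
```

In `run()`, each get/null-check/put group would then collapse into a single call of this kind. Other threads updating the same key may block briefly while the value is computed, so the calculation should not itself touch the same map.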
I don’t think that the number of threads is causing any problem, but you might have some other concurrency issue that is causing a lock (if locking, and not memory overhead, is the problem at all).
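If memory overhead does turn out to be the problem, one thing worth checking is the executor's work queue: `Executors.newFixedThreadPool` uses an unbounded queue, so the nested loop can enqueue tasks much faster than the workers finish them (which would also fit the observation that sleep() helps). A minimal sketch of a bounded alternative follows; the class name `BoundedPoolSketch`, the helper names and the sizes 4 and 100 are assumptions chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolSketch {
    // Fixed number of workers plus a bounded queue. When the queue is full,
    // CallerRunsPolicy makes the submitting thread run the task itself,
    // which slows the producer down instead of piling up pending tasks.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits trivial tasks and returns how many of them completed.
    static int runTasks(int tasks) {
        ThreadPoolExecutor exec = newBoundedPool(4, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            exec.execute(done::incrementAndGet);
        }
        exec.shutdown();
        try {
            // Wait for the queued tasks to finish before reading the counter.
            exec.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runTasks(10_000));
    }
}
```

With this setup at most 100 tasks wait in the queue at any time, so memory used by pending tasks stays roughly constant no matter how many word pairs the loop produces.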