I am currently experiencing a strange behavior in this application I am building

Editorial Team

Asked: June 16, 2026

I am currently experiencing a strange behavior in this application I am building.

Preface

This application has a simple goal: to take a collection of strings and search for each of them across multiple text files. The application also tracks unique matches for each string, i.e. the string "abcd" is counted only once for file A even if it appears n times in that file.

Since this application will mainly be dealing with large numbers of files and large numbers of strings, I decided to do the string search in the background by creating a class that implements Runnable and using an ExecutorService to run the Runnable tasks. I also decided to investigate the speed of the string search, so I started comparing the times of different string-matching methods (i.e. String.contains(), String.indexOf(), and the Boyer-Moore algorithm). I grabbed the source code of the Boyer-Moore algorithm from http://algs4.cs.princeton.edu/53substring/BoyerMoore.java.html and included it in my project. Here is where the problem started…

The Problem

I noticed that the string search came back with varying results when using the BoyerMoore class: each time I ran the search, the number of found strings was different. I therefore replaced the BoyerMoore search with String.contains(), so that the code looks like the following…

private boolean findStringInFile(String pattern, File file) {
    boolean result = false;
    BoyerMoore bm = new BoyerMoore(pattern); // This line still causes varying results.
    try {
        Scanner in = new Scanner(new FileReader(file));
        while(in.hasNextLine() && !result) {
            String line = in.nextLine();
            result = line.contains(pattern);
        }
        in.close();
    } catch (FileNotFoundException e) {
        System.out.println("ERROR: " + e.getMessage());
        System.exit(0);
    }
    return result;
}

Even with the above code, the results were still inconsistent. It seems like the instantiation of the BoyerMoore object is causing the results to vary. I dug a little deeper and found that the following code in the BoyerMoore constructor was causing this inconsistency…

// position of rightmost occurrence of c in the pattern
right = new int[R];
for (int c = 0; c < R; c++)
    right[c] = -1;
for (int j = 0; j < pat.length(); j++)
    right[pat.charAt(j)] = j;

Now I know what was causing the inconsistency, but I still do not understand why it was happening. I’m no veteran when it comes to multi-threading, so any possible explanation or insight is greatly appreciated!

Below is the full code for the search task…

private class Search implements Runnable {
    private File mSearchableFile;
    private ConcurrentHashMap<String,Integer> mTable;

    public Search(File file,ConcurrentHashMap<String,Integer> table) {
        mSearchableFile = file;
        mTable = table;
    }

    @Override
    public void run() {
        Iterator<String> nodeItr = mTable.keySet().iterator();
        while(nodeItr.hasNext()) {
            String currentString = nodeItr.next();
            if(findStringInFile(currentString , mSearchableFile)) {
                Integer count = mTable.get(currentString) + 1;
                mTable.put(currentString,count);
            }
        }
    }

    private boolean findStringInFile(String pattern, File file) {
        boolean result = false;
        // BoyerMoore bm = new BoyerMoore(pattern);
        try {
            Scanner in = new Scanner(new FileReader(file));
            while(in.hasNextLine() && !result) {
                String line = in.nextLine();
                result = line.contains(pattern);
            }
            in.close();
        } catch (FileNotFoundException e) {
            System.out.println("ERROR: " + e.getMessage());
            System.exit(0);
        }
        return result;
    }
}

1 Answer

Editorial Team · Answer 1 · 2026-06-16T20:30:31+00:00

The version below should perform better because:

each file is opened and closed only once;
no data is shared between threads, so there is no inter-thread overhead (apart from returning the final result).

Each task collects the matches for one file, and the counts are accumulated in a single thread.

static class Search implements Callable<Set<String>> {
    private final File file;
    private final Set<String> toFind;
    private final long lastModified;

    public Search(File file, Set<String> toSearchFor) {
        this.file = file;
        lastModified = file.lastModified();
        toFind = new CopyOnWriteArraySet<>(toSearchFor);
    }

    @Override
    public Set<String> call() throws Exception {
        Set<String> found = new HashSet<>();
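        // try-with-resources closes the Scanner even if reading throws.
        try (Scanner in = new Scanner(new FileReader(file))) {
            while (in.hasNextLine() && !toFind.isEmpty()) {
                String line = in.nextLine();
                // CopyOnWriteArraySet iterates over a snapshot, so removing
                // from toFind inside this loop is safe.
                for (String s : toFind) {
                    if (line.contains(s)) {
                        toFind.remove(s);
                        found.add(s);
                    }
                }
            }
        }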

        if (file.lastModified() != lastModified) 
            throw new AssertionError(file + " was modified");
        return found;
    }
}

public static Map<String, AtomicInteger> performSearches(
        ExecutorService service, File[] files, Set<String> toFind)
        throws ExecutionException, InterruptedException {
    List<Future<Set<String>>> futures = new ArrayList<>();
    for (File file : files) {
        futures.add(service.submit(new Search(file, toFind)));
    }
    Map<String, AtomicInteger> counts = new LinkedHashMap<>();
    for (String s : toFind)
        counts.put(s, new AtomicInteger());
    for (Future<Set<String>> future : futures) {
        for (String s : future.get())
            counts.get(s).incrementAndGet();
    }
    return counts;
}
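To make the pattern above easier to try out, here is a minimal sketch that swaps the files for in-memory strings. The class name PerDocumentSearchDemo, the method countMatches and the sample data are all hypothetical and not part of the original answer. Each task returns its own set of matches, and only the submitting thread touches the counts map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerDocumentSearchDemo {
    // Hypothetical stand-in for the per-file search: each "document" is a String.
    static Map<String, Integer> countMatches(List<String> documents, Set<String> patterns)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Set<String>>> futures = new ArrayList<>();
            for (String doc : documents) {
                // Each task only reads shared data and builds its own result set.
                Callable<Set<String>> task = () -> {
                    Set<String> found = new HashSet<>();
                    for (String p : patterns) {
                        if (doc.contains(p)) {
                            found.add(p);
                        }
                    }
                    return found;
                };
                futures.add(pool.submit(task));
            }
            // Counts are accumulated by the calling thread only.
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String p : patterns) {
                counts.put(p, 0);
            }
            for (Future<Set<String>> future : futures) {
                for (String p : future.get()) {
                    counts.merge(p, 1, Integer::sum);
                }
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("abcd abcd xyz", "xyz", "none");
        Set<String> patterns = new LinkedHashSet<>(Arrays.asList("abcd", "xyz"));
        System.out.println(countMatches(docs, patterns)); // prints {abcd=1, xyz=2}
    }
}
```

Because "abcd" appears twice in the first document but is returned once in that document's set, it is counted once, which matches the unique-match requirement in the question.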

These lines from the original code are not thread safe. The get and the put are separate operations, so two threads can read the same count for a key and one of the increments is lost.

Integer count = mTable.get(currentString) + 1;
// another thread could be running here.
mTable.put(currentString,count);

A simple workaround is to use AtomicInteger, which will also simplify your code.

private final ConcurrentHashMap<String, AtomicInteger> mTable;

for(Map.Entry<String, AtomicInteger> entry: mTable.entrySet()) 
    if(findStringInFile(entry.getKey(), mSearchableFile)) 
        entry.getValue().incrementAndGet();
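The effect of an atomic increment can be checked with a small standalone sketch. The class AtomicCountDemo, its method names and the thread counts are illustrative assumptions, not part of the original code. It also shows ConcurrentHashMap.merge, which should work as an alternative if you would rather keep Integer values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCountDemo {
    // Runs `threads` tasks that each increment the same key `perThread` times.
    static int countWithAtomic(int threads, int perThread) throws InterruptedException {
        ConcurrentHashMap<String, AtomicInteger> table = new ConcurrentHashMap<>();
        table.put("abcd", new AtomicInteger());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    table.get("abcd").incrementAndGet(); // atomic read-modify-write
                }
            });
        }
        awaitAll(pool);
        return table.get("abcd").get();
    }

    // Alternative with Integer values: merge() updates the key atomically.
    static int countWithMerge(int threads, int perThread) throws InterruptedException {
        ConcurrentHashMap<String, Integer> table = new ConcurrentHashMap<>();
        table.put("abcd", 0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    table.merge("abcd", 1, Integer::sum);
                }
            });
        }
        awaitAll(pool);
        return table.get("abcd");
    }

    // Waits for all submitted tasks to finish before the result is read.
    private static void awaitAll(ExecutorService pool) throws InterruptedException {
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithAtomic(4, 10_000)); // prints 40000
        System.out.println(countWithMerge(4, 10_000));  // prints 40000
    }
}
```

With the separate get and put from the question, the same experiment can come up short of the expected total, and by a different amount on each run, which is consistent with the varying results described above.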
