I am trying to write a regular expression that will count the number of

Asked: May 26, 2026 (2026-05-26T15:08:24+00:00)

I am trying to write a regular expression that will count the number of times two words co-occur within a certain proximity (within 5 words of each other) in a string, without double counting words.

For example, if I had a string:

“The man liked his big hat. The hat was very big.”

In this case, the regex should see the “big hat” in the first sentence and the “hat was very big” in the second sentence, returning a total of 2. Note that in the second sentence there are several words between “hat” and “big”, and they appear in a different order than in the first sentence, but they still occur within a 5-word window.

If regular expressions are not the correct way to approach this problem, please let me know what I should try instead.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:08:25+00:00

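This is similar to Stephen C's answer, but uses library classes to handle the mechanics. Note that, as written, it counts any word that reappears within the window, rather than a specific pair of words such as “big” and “hat”.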

    import java.util.Arrays;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.LinkedList;

    public class ProximityMatches {
        public static void main(String[] args) {
            String input = "The man liked his big hat. The hat was very big";
            int proximity = 5;

            // split input into words on runs of non-word characters
            String[] words = input.split("\\W+");

            // create a Deque of the first <proximity> words
            // (clamped so that short inputs are not padded with nulls)
            int initial = Math.min(proximity, words.length);
            Deque<String> haystack = new LinkedList<>(Arrays.asList(words).subList(0, initial));

            // count duplicates in the first <proximity> words
            int count = haystack.size() - new HashSet<>(haystack).size();
            System.out.println("initial matches: " + count);

            // process the rest of the words
            for (int i = initial; i < words.length; i++) {
                String word = words[i];
                System.out.println("matching '" + word + "' in " + haystack);

                // a match is the same word reappearing within the window
                if (haystack.contains(word)) {
                    System.out.println("matched word " + word + " at index " + i);
                    count++;
                }

                // slide the window: remove the oldest word, add the current one
                haystack.removeFirst();
                haystack.addLast(word);
            }

            System.out.println("total matches: " + count);
        }
    }
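
The snippet above counts a word reappearing within the window. For the question as asked (how often two specific words fall within a given distance of each other, with each occurrence used at most once), a minimal sketch might look like the following. The class name `ProximityPairCounter`, the method `countPairs`, case-insensitive matching, and the reading of "within 5 words" as "word positions differ by at most 5" are all assumptions made for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not from the original answer.
class ProximityPairCounter {

    // Counts pairings of `first` and `second` whose word positions differ by
    // at most `proximity`. Each occurrence takes part in at most one pairing.
    static int countPairs(String text, String first, String second, int proximity) {
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);

        // Record the word positions of each target word.
        List<Integer> posA = new ArrayList<>();
        List<Integer> posB = new ArrayList<>();
        int index = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // leading punctuation produces one empty token
            }
            if (token.equals(a)) {
                posA.add(index);
            } else if (token.equals(b)) {
                posB.add(index);
            }
            index++;
        }

        // Greedy two-pointer matching over the two sorted position lists:
        // pair the earliest unused occurrences when they are close enough,
        // otherwise discard whichever occurrence comes first in the text.
        int i = 0;
        int j = 0;
        int count = 0;
        while (i < posA.size() && j < posB.size()) {
            int pa = posA.get(i);
            int pb = posB.get(j);
            if (Math.abs(pa - pb) <= proximity) {
                count++;
                i++;
                j++;
            } else if (pa < pb) {
                i++;
            } else {
                j++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "The man liked his big hat. The hat was very big.";
        System.out.println(countPairs(input, "big", "hat", 5)); // prints 2
    }
}
```

Tokenizing first and then comparing positions avoids the overlapping-match problems a single regex would run into, and consuming both occurrences on every match is what prevents double counting.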
