I have some serial code like this that computes word concordances i.e. counting collocated


Asked: May 26, 2026 (2026-05-26T15:50:48+00:00)


I have some serial code like this that computes word concordances, i.e. it counts collocated word pairs. The following program works, except that the list of sentences is canned for illustrative purposes.

import sys
from collections import defaultdict

GLOBAL_CONCORDANCE = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: [])))

def BuildConcordance(sentences):
    global GLOBAL_CONCORDANCE
    for sentenceIndex, sentence in enumerate(sentences):
        words = [word for word in sentence.split()]

        for index, word in enumerate(words):
            for i, collocate in enumerate(words[index:len(words)]):
                GLOBAL_CONCORDANCE[word][collocate][i].append(sentenceIndex)

def main():
    sentences = ["Sentence 1", "Sentence 2", "Sentence 3", "Sentence 4"]
    BuildConcordance(sentences)
    print(GLOBAL_CONCORDANCE)

if __name__ == "__main__":
    main()

To me, the first for loop can be parallelized because the results for each sentence are independent. However, the data structure being modified is a global one.

I tried using Python’s multiprocessing Pool, but I am running into pickling problems, which makes me wonder whether I am using the right design pattern. Can someone suggest a good way to parallelize this code?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:50:49+00:00

In general, multiprocessing is easiest when you use a functional style. In this case, my suggestion would be to return a list of result tuples from each instance of the worker function. The extra complexity of the nested defaultdicts doesn’t really gain you anything. Something like this:

from collections import defaultdict
from multiprocessing import Pool

# Filled in by the parent process only; the workers never touch it.
GLOBAL_CONCORDANCE = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def concordance_worker(index_sentence):
    # Each task is a (sentence index, sentence) pair produced by enumerate().
    sent_index, sentence = index_sentence
    words = sentence.split()

    # Return plain tuples, which pickle cleanly on the way back to the parent.
    return [(word, colo_word, colo_index, sent_index)
            for i, word in enumerate(words)
            for colo_index, colo_word in enumerate(words[i:])]

def build_concordance(sentences):
    pool = Pool(8)
    try:
        results = pool.map(concordance_worker, enumerate(sentences))
    finally:
        # Shut the worker processes down once the results are in.
        pool.close()
        pool.join()

    # Merge the per-sentence results serially in the parent process.
    for result in results:
        for word, colo_word, colo_index, sent_index in result:
            GLOBAL_CONCORDANCE[word][colo_word][colo_index].append(sent_index)

    print(len(GLOBAL_CONCORDANCE))


def main():
    sentences = ["Sentence 1", "Sentence 2", "Sentence 3", "Sentence 4"]
    build_concordance(sentences)

if __name__ == "__main__":
    main()
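
For a large corpus, a variation on the code above might stream results back with imap_unordered and a chunksize instead of waiting for one big list from map. This is only a sketch under assumptions that are mine, not the question's: it targets Python 3, the helper name merge_results and the processes and chunksize defaults are made up for illustration, and it stores the concordance in a flat dict keyed by (word, collocate, offset) rather than in nested defaultdicts.

```python
from collections import defaultdict
from multiprocessing import Pool


def concordance_worker(index_sentence):
    """Return (word, collocate, offset, sentence_index) tuples for one sentence."""
    sent_index, sentence = index_sentence
    words = sentence.split()
    return [(word, colo_word, colo_index, sent_index)
            for i, word in enumerate(words)
            for colo_index, colo_word in enumerate(words[i:])]


def merge_results(results):
    """Fold an iterable of per-sentence tuple lists into one flat dict.

    Hypothetical helper: keys are (word, collocate, offset) tuples and
    values are lists of sentence indices.
    """
    concordance = defaultdict(list)
    for result in results:
        for word, colo_word, colo_index, sent_index in result:
            concordance[(word, colo_word, colo_index)].append(sent_index)
    # Results may arrive in any order, so sort for a deterministic outcome.
    return {key: sorted(indices) for key, indices in concordance.items()}


def build_concordance(sentences, processes=2, chunksize=64):
    """Build the concordance in parallel; the defaults are arbitrary."""
    with Pool(processes) as pool:
        # chunksize batches several sentences per task to cut IPC overhead.
        results = pool.imap_unordered(
            concordance_worker, enumerate(sentences), chunksize)
        # Consume the iterator before the pool is torn down on exit.
        return merge_results(results)


if __name__ == "__main__":
    sentences = ["Sentence 1", "Sentence 2", "Sentence 3", "Sentence 4"]
    concordance = build_concordance(sentences)
    print(len(concordance))
```

Because merge_results accepts any iterable, the same function can be fed from a plain serial map(concordance_worker, enumerate(sentences)), which makes it easy to check the parallel result against the serial one.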

Let me know if that doesn’t generate what you’re looking for.
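
On the pickling problems mentioned in the question: the failing code isn't shown, so this is only a guess, but one common cause is trying to send a defaultdict whose default factory is a lambda between processes. The pickle module stores functions by name, and a lambda has no importable name. The sketch below (Python 3; is_picklable and to_plain_dict are hypothetical helper names) shows the failure and one possible workaround, which is converting the nested structure to ordinary dicts before it has to cross a process boundary or be saved.

```python
import pickle
from collections import defaultdict


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def to_plain_dict(obj):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj


if __name__ == "__main__":
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    nested["Sentence"]["1"][1].append(0)

    # The lambda default factories cannot be pickled by reference.
    print(is_picklable(nested))

    # Ordinary dicts of lists contain nothing but picklable builtins.
    print(is_picklable(to_plain_dict(nested)))
```

Returning plain tuples from the worker, as in the answer's code, sidesteps the issue entirely because the defaultdict only ever lives in the parent process.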
