Given a text file in the format below, each line is a list of

Asked: June 7, 2026 (2026-06-07T06:22:00+00:00)

Given a text file in the format below, each line is a list of up to 50
names. Write a program that produces a list of pairs of names which appear
together in at least fifty different lists.
Tyra,Miranda,Naomi,Adriana,Kate,Elle,Heidi
Daniela,Miranda,Irina,Alessandra,Gisele,Adriana
In the above sample, Miranda and Adriana appear together twice, but
every other pair appears only once. It should return
“Miranda,Adriana\n”. An approximate solution may be returned with
lists which appear at least 50 times with high probability.

I was thinking of the following solution:

Build a Map<Pair,Integer> pairToCountMap while reading through the file.
Iterate through the map and print the pairs with counts >= 50.

Is there a better way to do this? The file could be very large, and I’m not sure what is meant by the approximate solution. Any links or resources would be much appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T06:22:02+00:00

First let’s assume that names are limited in length, so operations on them are constant time.

Your answer should be acceptable if it fits in memory. If you have N lines with m names each, your solution should take O(N*m*m) to complete.
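
The in-memory approach can be sketched as follows. This is a minimal illustration, not the asker's actual code: the function name `frequent_pairs` and the `threshold` parameter are made up for the example, and it assumes names contain no commas and that a name repeated within one line counts once.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(lines, threshold=50):
    """Return pairs of names that appear together in at least `threshold` lines."""
    pair_counts = Counter()
    for line in lines:
        # Deduplicate within a line and sort so (a, b) and (b, a) share one key.
        names = sorted({name.strip() for name in line.split(",") if name.strip()})
        # Each line of m names contributes m*(m-1)/2 pairs, hence O(N*m*m) overall.
        pair_counts.update(combinations(names, 2))
    return sorted(pair for pair, count in pair_counts.items() if count >= threshold)


sample = [
    "Tyra,Miranda,Naomi,Adriana,Kate,Elle,Heidi",
    "Daniela,Miranda,Irina,Alessandra,Gisele,Adriana",
]
# A threshold of 2 stands in for 50 so the two-line sample produces a result.
for first, second in frequent_pairs(sample, threshold=2):
    print(f"{first},{second}")
```

Because the names in each pair are sorted, the sample pair is printed as Adriana,Miranda rather than Miranda,Adriana; the problem statement does not say which order is required.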

If the data set doesn’t fit in memory, you can write the pairs to a file, sort that file using a merge sort, then scan through it to count runs of identical pairs. There are O(N*m*m) pairs to sort, so the running time is O(N*m*m*log(N*m)). Because a merge sort reads and writes the disk sequentially, this will typically run much faster in practice than an in-memory map that has outgrown RAM.
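
A rough sketch of the write, sort, and scan idea, assuming the pairs are sorted in memory-sized chunks, each chunk is written to a temporary file, and the sorted chunks are merged lazily. The names `frequent_pairs_external` and `chunk_size` are hypothetical, and a production version would more likely lean on an existing external sort utility.

```python
import heapq
import tempfile
from itertools import combinations, groupby


def _pairs(lines):
    """Yield every pair on every line as one 'a,b' text record."""
    for line in lines:
        names = sorted({n.strip() for n in line.split(",") if n.strip()})
        for a, b in combinations(names, 2):
            yield f"{a},{b}\n"


def _write_run(sorted_pairs):
    """Write one sorted chunk to a temporary file and rewind it for reading."""
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(sorted_pairs)
    run.seek(0)
    return run


def frequent_pairs_external(lines, threshold=50, chunk_size=1_000_000):
    runs, chunk = [], []
    for pair in _pairs(lines):
        chunk.append(pair)
        if len(chunk) >= chunk_size:  # chunk is as big as we allow in memory
            runs.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk)))

    result = []
    try:
        # heapq.merge streams the sorted runs, so identical pairs arrive adjacent.
        for pair, group in groupby(heapq.merge(*runs)):
            if sum(1 for _ in group) >= threshold:
                result.append(pair.rstrip("\n"))
    finally:
        for run in runs:
            run.close()
    return result


sample = [
    "Tyra,Miranda,Naomi,Adriana,Kate,Elle,Heidi",
    "Daniela,Miranda,Irina,Alessandra,Gisele,Adriana",
]
print(frequent_pairs_external(sample, threshold=2, chunk_size=5))
```

The tiny `chunk_size` in the usage line is only there to force several runs on the sample; in real use it would be set to whatever comfortably fits in memory.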

If you have a distributed cluster, then you could use MapReduce. It would run very similarly to the last solution: the map step emits the pairs, the framework's shuffle does the sorting, and the reduce step counts.
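
To make the shape of such a job concrete, here is a single-process simulation. It does not use any real MapReduce framework: `map_line`, `reduce_pair`, and `run_local` are hypothetical names, and the dictionary stands in for the shuffle phase that a cluster would perform.

```python
from collections import defaultdict
from itertools import combinations


def map_line(line):
    """Map step: emit (pair, 1) for every pair of names on one line."""
    names = sorted({n.strip() for n in line.split(",") if n.strip()})
    for pair in combinations(names, 2):
        yield pair, 1


def reduce_pair(pair, counts, threshold=50):
    """Reduce step: sum the counts for one pair and keep it if frequent enough."""
    total = sum(counts)
    if total >= threshold:
        yield pair, total


def run_local(lines, threshold=50):
    # Stand-in for the shuffle: group all emitted values by key.
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            shuffled[key].append(value)
    output = []
    for key in sorted(shuffled):
        output.extend(reduce_pair(key, shuffled[key], threshold))
    return output


sample = [
    "Tyra,Miranda,Naomi,Adriana,Kate,Elle,Heidi",
    "Daniela,Miranda,Irina,Alessandra,Gisele,Adriana",
]
print(run_local(sample, threshold=2))
```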

As for the approximate solution, my guess is that they mean making one pass through the file to find the frequency of each name and how many lines contain each number of names. If we assume that each line is a random assortment of names, we can use statistics to estimate how many lines any pair of common names appears in together. This will be roughly linear in the length of the file.
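
One possible reading of that guess, sketched under a strong simplification: names are treated as occurring independently, so a pair (a, b) is expected to share about count(a) * count(b) / N of the N lines. This ignores the distribution of line lengths mentioned above, and the function name `estimate_frequent_pairs` is invented for the example. The pruning step is exact: a pair cannot share `threshold` lines unless each of its names appears in at least `threshold` lines.

```python
from collections import Counter
from itertools import combinations


def estimate_frequent_pairs(lines, threshold=50):
    """Estimate which pairs co-occur in >= threshold lines, assuming independence."""
    name_counts = Counter()
    total_lines = 0
    for line in lines:
        names = {n.strip() for n in line.split(",") if n.strip()}
        if names:
            total_lines += 1
            name_counts.update(names)  # count each name once per line

    # Exact pruning: a name seen in fewer than `threshold` lines cannot qualify.
    candidates = sorted(n for n, c in name_counts.items() if c >= threshold)

    estimates = {}
    for a, b in combinations(candidates, 2):
        # Expected number of shared lines if a and b were independent (an assumption).
        expected = name_counts[a] * name_counts[b] / total_lines
        if expected >= threshold:
            estimates[(a, b)] = expected
    return estimates


lines = ["A,B"] * 60 + ["A,C"] * 40
print(estimate_frequent_pairs(lines, threshold=50))
```

Real name lists are unlikely to be independent, so an estimate like this is best treated as a filter for candidate pairs that are then counted exactly.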
