I have two text files (TXT) which contain over 2 million distinct file

Question


Asked: May 25, 2026 (2026-05-25T16:43:57+00:00)

I have two text files (TXT) which contain over 2 million distinct file names. I want to loop through all the names in the first file and find those that are also present in the second text file.

I have tried looping through the StreamReader but it takes a lot of time. I also tried the code below, but it still takes too much time.

StreamReader first = new StreamReader(path);
string strFirst = first.ReadToEnd();
string[] strarrFirst = strFirst.Split('\n');

bool found = false;

StreamReader second = new StreamReader(path2);
string str = second.ReadToEnd();
string[] strarrSecond = str.Split('\n');

for (int j = 0; j < (strarrFirst.Length); j++)
{
    found = false;

    for (int i = 0; i < strarrSecond.Length; i++)
    {
        if (strarrFirst[j] == strarrSecond[i])
        {
            found = true;
            break;
        }
    }

    if (!found)
    {
        Console.WriteLine(strarrFirst[j]);
    }
}

What is a good way to compare the files?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T16:43:58+00:00

How about this:

var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2));

That runs in O(N + M) time, whereas your current solution compares every line in the first file with every line in the second file, which is O(N * M).
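To make the complexity claim concrete, here is a minimal sketch of a hash-based intersection, roughly the kind of work a set intersection has to do. It is not the framework's actual implementation. The class name LineIntersection and the method name Common are hypothetical, and the ordinal (case-sensitive) comparison is an assumption; if the file names should match regardless of case, a different StringComparer would be needed.

```csharp
using System;
using System.Collections.Generic;

public static class LineIntersection
{
    // Hypothetical helper: returns the lines of `second` that also occur
    // in `first`, each reported at most once.
    public static List<string> Common(IEnumerable<string> first, IEnumerable<string> second)
    {
        // Build a lookup set from the first sequence: expected O(N).
        // Assumption: names are compared case-sensitively (ordinal).
        var seen = new HashSet<string>(first, StringComparer.Ordinal);

        var common = new List<string>();
        foreach (string line in second)
        {
            // Each lookup is expected O(1), so this loop is expected O(M).
            // Remove returns true only the first time a matching line is
            // met, so repeated lines in `second` are not added twice.
            if (seen.Remove(line))
            {
                common.Add(line);
            }
        }
        return common;
    }
}
```

With .NET 4 or later this could be called as LineIntersection.Common(File.ReadLines(path), File.ReadLines(path2)), which streams the second file but still holds every line of the first file in memory.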

That’s assuming you’re using .NET 4. Otherwise, you could use File.ReadAllLines, but that will read the whole file into memory. Or you could write the equivalent of File.ReadLines yourself – it’s not terribly hard.

Ultimately you’re likely to be limited by file IO by the time you’ve got rid of the O(N * M) problem in your current code – there’s not much way to get round that.

EDIT: For .NET 2, first let’s implement something like ReadLines:

public static IEnumerable<string> ReadLines(string file)
{
    using (TextReader reader = File.OpenText(file))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

Now we really want to use a HashSet<T>, but that wasn’t in .NET 2 – so let’s use Dictionary<TKey, TValue> instead:

Dictionary<string, string> map = new Dictionary<string, string>();
foreach (string line in ReadLines(path))
{
    map[line] = line;
}

List<string> intersection = new List<string>();
foreach (string line in ReadLines(path2))
{
    if (map.ContainsKey(line))
    {
        intersection.Add(line);
    }
}
