I have a collection of about 3000 files in a FileInfo array. I want to process all of them by applying some logic that is independent per file, so it can be executed in parallel.
FileInfo[] fileInfoCollection = directory.GetFiles();
Parallel.ForEach(fileInfoCollection, ProcessWorkerItem);
But after processing about 700 files I get an out-of-memory error. I used the thread pool before, and it gave the same error.
If I run it without threading (no parallel processing), it works fine.
In "ProcessWorkerItem" I run an algorithm based on the string data of the file. Additionally, I use log4net for logging, and there is a lot of communication with SQL Server in this method.
Here is some more info. File size: 1-2 KB XML files. I read those files, and the processing depends on the content of each file. It identifies some keywords in the string and generates XML in another format. The keywords are stored in the SQL Server database (nearly 2000 words).
I found the bug that caused the memory leak. I was using the Unit of Work pattern with Entity Framework. In the unit of work I keep the context in a hash table with the thread name as the hash key. When I use threading, the hash table keeps growing, and that caused the memory leak.
So I added an additional method to the unit of work that removes the element from the hash table after a thread completes its task.
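For anyone hitting the same problem, here is a minimal sketch of the idea, not my actual code. The names FakeContext, UnitOfWorkStore, GetCurrent and ReleaseCurrent are hypothetical, FakeContext only stands in for the Entity Framework context, and I key by managed thread id here instead of thread name to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the Entity Framework context held by the unit of work.
public sealed class FakeContext : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

public static class UnitOfWorkStore
{
    // One context per thread, mirroring the hash table keyed by thread.
    // Assumption: keyed by ManagedThreadId rather than thread name.
    private static readonly ConcurrentDictionary<int, FakeContext> Contexts =
        new ConcurrentDictionary<int, FakeContext>();

    public static int Count { get { return Contexts.Count; } }

    // Returns the context for the calling thread, creating it on first use.
    public static FakeContext GetCurrent()
    {
        return Contexts.GetOrAdd(
            Thread.CurrentThread.ManagedThreadId, _ => new FakeContext());
    }

    // The added method: removes and disposes the calling thread's context.
    public static void ReleaseCurrent()
    {
        FakeContext context;
        if (Contexts.TryRemove(Thread.CurrentThread.ManagedThreadId, out context))
        {
            context.Dispose();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // 3000 dummy work items stand in for the files.
        Parallel.ForEach(Enumerable.Range(0, 3000), item =>
        {
            try
            {
                FakeContext context = UnitOfWorkStore.GetCurrent();
                // ... process one file using the context ...
            }
            finally
            {
                // Always release, even if processing throws.
                UnitOfWorkStore.ReleaseCurrent();
            }
        });

        // Number of contexts still held after all items are done.
        Console.WriteLine(UnitOfWorkStore.Count);
    }
}
```

The try/finally matters: if the release call is skipped when processing throws, the entry for that thread stays in the table and the leak comes back.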