The following code is a simplified version of the code that I am trying to optimize.
void Main()
{
    var words = new List<string> {"abcd", "wxyz", "1234"};
    foreach (var character in SplitItOut(words))
    {
        Console.WriteLine (character);
    }
}

public IEnumerable<char> SplitItOut(IEnumerable<string> words)
{
    foreach (string word in words)
    {
        var characters = GetCharacters(word);
        foreach (char c in characters)
        {
            yield return c;
        }
    }
}

char[] GetCharacters(string word)
{
    Thread.Sleep(5000);
    return word.ToCharArray();
}
I cannot change the signature of the SplitItOut method. The GetCharacters method is expensive to call but is thread-safe. The input to SplitItOut can contain 100,000+ entries, and a single call to GetCharacters() can take around 200 ms. It can also throw exceptions, which I can ignore. The order of the results does not matter.
In my first attempt I came up with the following implementation using the TPL, which speeds things up quite a bit but blocks until I am done processing all the words.
public IEnumerable<char> SplitItOut(IEnumerable<string> words)
{
    Task<char[][]> tasks = Task<char[][]>.Factory.StartNew(() =>
    {
        ConcurrentBag<char[]> taskResults = new ConcurrentBag<char[]>();
        Parallel.ForEach(words,
            word =>
            {
                taskResults.Add(GetCharacters(word));
            });
        return taskResults.ToArray();
    });
    foreach (var wordResult in tasks.Result)
    {
        foreach (var c in wordResult)
        {
            yield return c;
        }
    }
}
I am looking for a better implementation of SplitItOut() than this. Lower processing time is my priority here.
If I’m reading your question correctly, you’re not looking just to speed up the parallel processing that creates the chars from the words; you would like your enumerable to produce each one as soon as it’s ready. With the implementation you currently have (and the other answers I currently see), SplitItOut will wait until all of the words have been sent to GetCharacters, and all results returned, before producing the first one.

In cases like this, I like to think of things as splitting my process into producers and a consumer. Your producer thread(s) will take the available words and call GetCharacters, then dump the results somewhere. The consumer will yield up characters to the caller of SplitItOut as soon as they are ready. Really, the consumer is the caller of SplitItOut.

We can make use of the BlockingCollection as both a way to yield up the characters and as the “somewhere” to put the results. We can use the ConcurrentBag as a place to put the words that have yet to be split.

No changes to your Main or GetCharacters, since these represent your constraints (can’t change the caller, can’t change the expensive operation).

Here, we change the SplitItOut method to do four things:

1. Fill a ConcurrentBag with the words that have yet to be split.
2. Start the producer tasks.
3. Signal the BlockingCollection that we are done when all tasks have completed.
4. Consume the produced characters (I return an IEnumerable<char> rather than foreach and yield, but you could do it the long way if you wish).

All that’s missing is our producer implementation. I’ve expanded out all the LINQ shortcuts to make it clear, but it’s super simple.
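A minimal sketch of how the pieces described above could fit together. The class name Splitter, the helper name ProduceCharacters, and the choice of four producer tasks are all assumptions made for illustration, and GetCharacters here is only a stand-in for your expensive method:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class Splitter
{
    // Assumption: four producers is an arbitrary number; tune it for your workload.
    private const int ProducerCount = 4;

    // Same signature as in the question.
    public IEnumerable<char> SplitItOut(IEnumerable<string> words)
    {
        // 1. The words that have yet to be split.
        var pending = new ConcurrentBag<string>(words);

        // The "somewhere" the producers put their results.
        var results = new BlockingCollection<char>();

        // 2. Start the producer tasks.
        Task[] producers = Enumerable.Range(0, ProducerCount)
            .Select(_ => Task.Run(() => ProduceCharacters(pending, results)))
            .ToArray();

        // 3. Once every producer has finished, signal that no more items will arrive.
        Task.WhenAll(producers).ContinueWith(_ => results.CompleteAdding());

        // 4. The caller consumes characters as they become available; enumeration
        //    ends after CompleteAdding has been called and the collection is empty.
        return results.GetConsumingEnumerable();
    }

    // Hypothetical producer: takes words until none remain and publishes their characters.
    private void ProduceCharacters(ConcurrentBag<string> pending, BlockingCollection<char> results)
    {
        while (pending.TryTake(out string word))
        {
            char[] characters;
            try
            {
                characters = GetCharacters(word);
            }
            catch (Exception)
            {
                // The question states that exceptions from GetCharacters can be ignored.
                continue;
            }

            foreach (char c in characters)
            {
                results.Add(c);
            }
        }
    }

    // Stand-in for the expensive, thread-safe method from the question.
    private char[] GetCharacters(string word)
    {
        Thread.Sleep(200);
        return word.ToCharArray();
    }
}
```

Calling `new Splitter().SplitItOut(words)` returns immediately, and the foreach in Main then receives each character as soon as a producer has added it. One thing to keep in mind with this sketch: if the caller stops enumerating early, the producers keep running until the bag is empty.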
This simply