I am working on an application that processes large amounts of text data, gathering statistics on word occurrences (see: Source Code Word Cloud).
Here is what the simplified core of my code does:
- Enumerate through all files with the *.txt extension.
- Enumerate through the words in each text file.
- Group by word and count occurrences.
- Sort by occurrences.
- Output top 20.
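For comparison, the steps above map onto a plain (sequential) LINQ query roughly as follows. This is a minimal sketch, not the application's actual code: it works on an in-memory sequence of lines instead of reading *.txt files, and the names WordStats and TopWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper illustrating the listed steps with sequential LINQ.
public static class WordStats
{
    // Counts words in the given lines and returns the `count` most frequent ones.
    public static List<Tuple<int, string>> TopWords(IEnumerable<string> lines, int count)
    {
        return lines
            .SelectMany(ReadWords)                                   // words in each line
            .GroupBy(word => word, (word, words) => Tuple.Create(words.Count(), word)) // group and count
            .OrderByDescending(pair => pair.Item1)                   // sort by occurrences
            .Take(count)                                             // keep the top entries
            .ToList();
    }

    // Splits a line into runs of letters; every other character is a separator.
    public static IEnumerable<string> ReadWords(string line)
    {
        var word = new StringBuilder();
        foreach (char ch in line)
        {
            if (char.IsLetter(ch))
            {
                word.Append(ch);
            }
            else if (word.Length != 0)
            {
                yield return word.ToString();
                word.Clear();
            }
        }
        // Return the last word when the line ends with a letter.
        if (word.Length != 0) yield return word.ToString();
    }
}
```

In the real application the lines would come from Directory.EnumerateFiles(...) combined with File.ReadLines, and inserting .AsParallel() into the chain is what turns this into a PLINQ query.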
Everything worked fine with LINQ. Moving to PLINQ gave me a significant performance boost.
But… cancelability during long-running queries is lost.
It seems that the OrderBy query synchronizes data back onto the main thread, so Windows messages are not processed.
In the example below I demonstrate my implementation of cancellation according to the MSDN article "How to: Cancel a PLINQ Query", which does not work 🙁
Any other ideas?
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using System.Windows.Forms;

namespace PlinqCancelability
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            m_CancellationTokenSource = new CancellationTokenSource();
        }

        private readonly CancellationTokenSource m_CancellationTokenSource;

        private void buttonStart_Click(object sender, EventArgs e)
        {
            // Count word occurrences in all *.txt files and keep the 20 most frequent words.
            var result = Directory
                .EnumerateFiles(@"c:\temp", "*.txt", SearchOption.AllDirectories)
                .AsParallel()
                .WithCancellation(m_CancellationTokenSource.Token)
                .SelectMany(File.ReadLines)
                .SelectMany(ReadWords)
                .GroupBy(word => word, (word, words) => new Tuple<int, string>(words.Count(), word))
                .OrderByDescending(occurrencesWordPair => occurrencesWordPair.Item1)
                .Take(20);

            try
            {
                // The query executes here, inside this click handler on the UI thread.
                foreach (Tuple<int, string> tuple in result)
                {
                    Console.WriteLine(tuple);
                }
            }
            catch (OperationCanceledException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        private void buttonCancel_Click(object sender, EventArgs e)
        {
            m_CancellationTokenSource.Cancel();
        }

        // Splits a line into words (runs of letters).
        private static IEnumerable<string> ReadWords(string line)
        {
            StringBuilder word = new StringBuilder();
            foreach (char ch in line)
            {
                if (char.IsLetter(ch))
                {
                    word.Append(ch);
                }
                else
                {
                    // Skip consecutive separators; only non-empty words are returned.
                    if (word.Length == 0) continue;
                    yield return word.ToString();
                    word.Clear();
                }
            }
            // Return the last word when the line ends with a letter.
            if (word.Length != 0) yield return word.ToString();
        }
    }
}
```
As Jon said, you’ll need to start the PLINQ operation on a background thread. This way, the user interface doesn’t hang while waiting for the operation to complete, so the event handler for the Cancel button can be invoked and the Cancel method of the cancellation token source gets called. The PLINQ query cancels itself automatically when the token is cancelled, so you don’t need to worry about that. Here is one way to do this:
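A minimal sketch of that approach, assuming .NET 4.5 or later (for Task.Run and async/await). The names BackgroundWordCount and TopWordsAsync are hypothetical, the lines are passed in rather than read from c:\temp, and this is not necessarily the code from the original answer. The whole query, including ToList(), runs on a thread-pool thread; if the token is cancelled, awaiting the task throws an OperationCanceledException (or the derived TaskCanceledException).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs the PLINQ word count off the calling (UI) thread.
public static class BackgroundWordCount
{
    public static Task<List<Tuple<int, string>>> TopWordsAsync(
        IEnumerable<string> lines, int count, CancellationToken token)
    {
        // Task.Run executes the query on a thread-pool thread, so the UI thread
        // keeps processing messages such as the Cancel button click.
        return Task.Run(() => lines
            .AsParallel()
            .WithCancellation(token)   // PLINQ throws OperationCanceledException once the token is cancelled
            .SelectMany(ReadWords)
            .GroupBy(word => word, (word, words) => Tuple.Create(words.Count(), word))
            .OrderByDescending(pair => pair.Item1)
            .Take(count)
            .ToList(), token);
    }

    // Letter-run tokenizer, equivalent to ReadWords in the question (with the empty-word check fixed).
    private static IEnumerable<string> ReadWords(string line)
    {
        var word = new StringBuilder();
        foreach (char ch in line)
        {
            if (char.IsLetter(ch))
            {
                word.Append(ch);
            }
            else if (word.Length != 0)
            {
                yield return word.ToString();
                word.Clear();
            }
        }
        if (word.Length != 0) yield return word.ToString();
    }
}
```

In Form1, buttonStart_Click could then be declared async void and await TopWordsAsync(...) inside the existing try/catch. The await resumes on the UI thread, so the foreach that prints the results and the OperationCanceledException handler can stay as they are, while buttonCancel_Click is now able to run and call m_CancellationTokenSource.Cancel(). The original query, starting from Directory.EnumerateFiles, can be moved into the Task.Run lambda unchanged.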