I have the following variable:
StreamReader DebugInfo = GetDebugInfo();
var text = DebugInfo.ReadToEnd(); // takes 10 seconds!!! because there are a lot of students
text contains:
<student>
<firstName>Antonio</firstName>
<lastName>Namnum</lastName>
</student>
<student>
<firstName>Alicia</firstName>
<lastName>Garcia</lastName>
</student>
<student>
<firstName>Christina</firstName>
<lastName>SomeLattName</lastName>
</student>
... etc
.... many more students
What I am doing now is:
StreamReader DebugInfo = GetDebugInfo();
var text = DebugInfo.ReadToEnd(); // takes 10 seconds!!!
var mtch = Regex.Match(text , @"(?s)<student>.+?</student>");
// keep parsing the file while there are more students
while (mtch.Success)
{
AddStudent(mtch.Value); // parse text node into object and add it to corresponding node
mtch = mtch.NextMatch();
}
The whole process takes about 25 seconds. Converting the StreamReader to text (var text = DebugInfo.ReadToEnd();) takes 10 seconds, and the regex parsing takes about 15 seconds. I was hoping I could do the two parts at the same time…
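One way to overlap the two parts is a producer/consumer pipeline: a background task reads the stream in chunks while the calling thread processes each chunk as it arrives. The following is only a minimal sketch under some assumptions: ChunkPipeline, Run, handleChunk, bufferSize and maxQueuedChunks are hypothetical names that are not part of the original code, and the callback is responsible for stitching together records that straddle two chunks.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code.
static class ChunkPipeline
{
    // Reads `reader` on a background task and passes each chunk to `handleChunk`
    // on the calling thread, so reading and processing can overlap.
    public static void Run(TextReader reader, Action<string> handleChunk,
                           int bufferSize = 64 * 1024, int maxQueuedChunks = 16)
    {
        // The bounded queue keeps the reader from running far ahead of the parser.
        using (var chunks = new BlockingCollection<string>(maxQueuedChunks))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    var buffer = new char[bufferSize];
                    int count;
                    while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        chunks.Add(new string(buffer, 0, count));
                    }
                }
                finally
                {
                    // Always signal completion so the consumer loop can end.
                    chunks.CompleteAdding();
                }
            });

            // Blocks until a chunk is available; ends after CompleteAdding().
            foreach (string chunk in chunks.GetConsumingEnumerable())
            {
                handleChunk(chunk);
            }

            // Surfaces any read error as an AggregateException.
            // Sketch limitation: if handleChunk throws, the producer is not cancelled.
            producer.Wait();
        }
    }
}
```

It could be called as ChunkPipeline.Run(GetDebugInfo(), chunk => { /* append to a pending buffer and match complete students */ });. BlockingCollection handles the locking and signalling, so no Thread.Sleep is needed. How much time this saves depends on whether the 10 seconds of reading is mostly waiting on I/O.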
EDIT
I would like to have something like:
const int bufferSize = 1024;
var sb = new StringBuilder();
Task.Factory.StartNew(() =>
{
Char[] buffer = new Char[bufferSize];
int count = bufferSize;
using (StreamReader sr = GetUnparsedDebugInfo())
{
while (count > 0)
{
count = sr.Read(buffer, 0, bufferSize);
sb.Append(buffer, 0, count);
}
}
var m = sb.ToString();
});
Thread.Sleep(100);
// while the string is being built, start adding items
var mtch = Regex.Match(sb.ToString(), @"(?s)<student>.+?</student>");
// keep parsing the file while there are more nodes
while (mtch.Success)
{
AddStudent(mtch.Value);
mtch = mtch.NextMatch();
}
Edit 2
Summary
Sorry, I forgot to mention that the text is very similar to XML, but it is not XML. That’s why I have to use regular expressions. In short, I think I could save time, because what I am doing is converting the stream to a string and then parsing the string. Why not just parse the stream with a regex? Or, if that is not possible, why not get a chunk of the stream and parse that chunk in a separate thread?
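For what it's worth, .NET's Regex matches against strings, not streams, so "parsing the stream with a regex" in practice means feeding it one chunk at a time and carrying over whatever is left after the last complete match. Below is a rough sketch of that idea, assuming every record is a complete <student>…</student> block; StudentScanner, ReadStudents and bufferSize are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
static class StudentScanner
{
    static readonly Regex StudentBlock =
        new Regex(@"(?s)<student>.+?</student>", RegexOptions.Compiled);

    // Yields each complete <student>...</student> block as soon as enough
    // of the stream has been read to contain it.
    public static IEnumerable<string> ReadStudents(TextReader reader, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        var pending = new StringBuilder(); // text read so far but not yet matched
        int count;
        while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, count);
            string text = pending.ToString();
            int consumed = 0;
            for (Match m = StudentBlock.Match(text); m.Success; m = m.NextMatch())
            {
                yield return m.Value;
                consumed = m.Index + m.Length;
            }
            // Keep only the unmatched tail (possibly a partial record)
            // so it can be completed by the next chunk.
            pending.Remove(0, consumed);
        }
    }
}
```

Usage would be along the lines of foreach (var s in StudentScanner.ReadStudents(GetDebugInfo())) AddStudent(s);. This alone does not run the two steps in parallel, but it avoids building the entire text first, and each record is available for processing as soon as it has been read.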
@kakridge was right. I could be dealing with a race condition where one task is writing listToProces[30], for example, while another thread is parsing listToProces[30]. To fix that problem, and also to remove the inefficient Thread.Sleep calls, I ended up using semaphores. Here is my new code: