This is a vexing one. I had a block of text coming from a Beyond Compare script report.
Picture Compare
Produced: 10/17/2012 9:42:25 AM
Ignoring Unimportant
Left file: K:\HDA_FIN\user\JMan\All\A-0001.jpg Right file: K:\HDA_FIN\user\JMan\All\B-0001.jpg
3454945 same pixel(s)
2154 ignored unimportant difference pixel(s)
2741 important difference pixel(s)
This repeats over and over as the script compares mated jpegs in the folder. But some jpegs are 100% the same, so they have no ignored unimportant or important differences. Others will have same pixels and important differences, but no unimportant ones, and so on. So I am trying to capture matches that start with “Picture Compare” and end with the LAST “pixel(s)” before the next “Picture Compare” starts again.
What I have tried:
What I am doing now is an ugly method: I use a stream reader, and while !EndOfStream, I call sr.ReadLine() and add each line to a List. I then use a for loop to iterate through the list and apply a series of if statements to determine whether the current string and the next few after it match what I am looking for, and if so, I bind them to an object. But surely Regex is much simpler.
    var lineByLine = new List<string>();
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        sb.AppendLine(line);
        if (line.Trim().Length > 0) // && !line.Contains("picture-report layout"))
        {
            lineByLine.Add(line);
        }
    }
    Contents = sb.ToString();
    //get the report blocks
    for (int i = 0; i < lineByLine.Count; i++)
    {
        Block block;
        string[] lines = { "", "", "", "", "", "", "" };
        //does line contain pic compare? if so, this is the start of an object
        if (lineByLine[i].Contains("Picture Compare"))
        {
            lines[0] = lineByLine[i]; //start line
            block = new Block();
            lines[1] = lineByLine[i + 1]; //produces
            lines[2] = lineByLine[i + 2]; //subheading
            if (lineByLine[i + 3].Contains("Left"))
            {
                lines[3] = lineByLine[i + 3]; //file
                if (lineByLine[i + 4].Contains("same pixel(s)"))
                {
                    lines[4] = lineByLine[i + 4]; //same
                    if (lineByLine[i + 5].Contains("ignored unimportant"))
                    {
                        lines[5] = lineByLine[i + 5];
                        if (lineByLine[i + 6].Contains(" important difference"))
                        {
                            lines[6] = lineByLine[i + 6];
                        }
                    }
                }
                else if (lineByLine[i + 4].Contains("ignored unimportant"))
                {
                    lines[5] = lineByLine[i + 4];
                    if (lineByLine[i + 5].Contains(" important difference"))
                    {
                        lines[6] = lineByLine[i + 5];
                    }
                }
                else if (lineByLine[i + 4].Contains(" important difference"))
                {
                    lines[6] = lineByLine[i + 4];
                }
            }
            Blocks.Add(new Block(lines[0], lines[1], lines[2], lines[3], lines[4], lines[5], lines[6]));
        }
    }
}
finally
{
    sr.Close();
}
This works, but I am trying to refactor and make it cleaner. I tried this:
var matches = Regex.Matches(cr.Contents, "(Picture Compare)(.*?)(pixel)", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.ExplicitCapture);
But it stops at the “same pixel(s)” line in all cases. I need something greedier. Any ideas?
Instead of finding the end, you could try to find the next start: match Picture Compare and then as many characters as possible, as long as they don’t start a new Picture Compare (this is what a negative lookahead is for). This should simply give you all those blocks.
Then on each of those blocks, you can do much simpler scanning to get the values you are interested in (unfortunately, I don’t know which ones those are, otherwise I might have another regex for those as well :P).
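As a rough sketch of that two-step idea (not necessarily the exact pattern intended above), something like the following might work. The class and method names (ReportParser, SplitBlocks, ReadCounts) are made up for illustration, and the second pattern assumes the three pixel lines always look like the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are assumptions, not from the original post.
public static class ReportParser
{
    // "Picture Compare", then any character (newlines included, via Singleline)
    // as long as that position does not begin another "Picture Compare".
    private static readonly Regex BlockPattern = new Regex(
        @"Picture Compare(?:(?!Picture Compare).)*",
        RegexOptions.Singleline);

    // Assumed line shape: "<number> <label> pixel(s)", one per line.
    private static readonly Regex CountPattern = new Regex(
        @"^\s*(?<count>\d+) (?<kind>same|ignored unimportant difference|important difference) pixel\(s\)",
        RegexOptions.Multiline);

    // Step 1: cut the whole report into one string per comparison.
    public static List<string> SplitBlocks(string contents)
    {
        var blocks = new List<string>();
        foreach (Match m in BlockPattern.Matches(contents))
        {
            // Trim drops the blank lines left between two reports.
            blocks.Add(m.Value.Trim());
        }
        return blocks;
    }

    // Step 2: pull out whichever pixel counts are present in one block.
    public static Dictionary<string, long> ReadCounts(string block)
    {
        var counts = new Dictionary<string, long>();
        foreach (Match m in CountPattern.Matches(block))
        {
            counts[m.Groups["kind"].Value] = long.Parse(m.Groups["count"].Value);
        }
        return counts;
    }
}
```

Calling ReportParser.SplitBlocks(cr.Contents) and then ReadCounts on each block would replace the nested if statements; a count that is missing from a block is simply absent from the dictionary. Note that each block runs up to the next “Picture Compare” (or the end of the text), so if anything other than whitespace follows the last “pixel(s)” line it will be included too.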