I’m trying to extract a set of data from some (large) text files. Basically, each line looks something like this:
2011-12-09 18:20:55, ABC.EXE[3b78], The rest of the line...
I’d like to get the date and the bit between the square brackets (the process id), and then compile a table. The second stage of the task is to group this table so that I get the earliest date for each process id. In effect that gives me the date and time of the first log entry per process id, which will hopefully approximate the start time of that instance of the process.
What I’ve got so far (split onto different lines for readability):
gci -filter *.log -r |
select-string '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), ABC.EXE\[(.{4})' |
% { $_.matches } | % { $_.groups } | % { $_.value }
spits out the captures. I’d like to ignore the first capture, and combine the second and third onto the same line.
Help?
Please?
Edit: DOH! Can’t answer my own question. So…
Ok, I think I’m on the right track. A SO question here helped me to get the individual parts I wanted, namely:
$_.matches[0].groups[1].value, $_.matches[0].groups[2].value
Then, an MSDN article here shows how to ‘clump’ the bits into an object, which allows it to be grouped / sorted / manipulated. Final result
gci -filter *.log | select-string '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), ABC.EXE\[(.{4})' |
% { new-object object |
add-member NoteProperty Name $_.matches[0].groups[1].value -passthru |
add-member NoteProperty PId $_.matches[0].groups[2].value -passthru }
Quite messy, so if anyone knows of a cleaner way to do it, please let me know.
You can create new objects more simply in PowerShell v2, where the New-Object cmdlet supports a -Property parameter that receives a hashtable of properties.
Generally, I’d do the processing a little differently, though.
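A rough sketch of how that could look, assuming the log format from the question. The function name Get-LogEntry and the variables $date, $procId and $entries are just illustrative, and the objects are built with New-Object -Property as mentioned above:

```powershell
# Parse one log file into objects with Date and PId properties.
# Assumes lines like: 2011-12-09 18:20:55, ABC.EXE[3b78], The rest of the line...
function Get-LogEntry([string]$Path) {
    $date = $null
    $procId = $null   # not $pid: that is a read-only automatic variable
    switch -Regex -File $Path {
        # Case 1: timestamp at the start of the line, cast to DateTime
        '^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),' { $date = [DateTime]$Matches[1] }
        # Case 2: the four characters between the square brackets
        'ABC\.EXE\[(.{4})\]' { $procId = $Matches[1] }
        # Case 3: '$' matches every line, so this runs last for each line
        '$' {
            if ($date -and $procId) {
                New-Object PSObject -Property @{ Date = $date; PId = $procId }
            }
            # Reset so a non-matching line never reuses stale values
            $date = $null; $procId = $null
        }
    }
}

# Collect entries from all log files below the current directory
$entries = @(Get-ChildItem -Filter *.log -Recurse |
    ForEach-Object { Get-LogEntry $_.FullName })
```

The reset at the end of the third case is my own addition so that lines without a timestamp or process ID are skipped rather than emitted with leftover values.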
Using switch -regex has become a nice way (to me at least) to do quick-and-dirty parsers for text data. With -Regex, every matching case is run, which in this case means all of them (so it’s just a convenience to separate the different parts of the matching). The first one grabs the date and time and stores it in a variable (even as a DateTime value); the second gets the process ID; and the third, matching on the end of a line, puts it all together.
Just a personal preference, though; I have actually never used Select-String.
The second step then takes the just-compiled data, groups it by process ID and outputs the ID with the minimum date for each one.
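The grouping step could look something like the following. This is only a sketch: the sample $entries array is made-up data in the shape described above (a Date and a PId per log line), and the names $firstSeen and FirstSeen are my own. It sorts each group and takes the first item rather than relying on Measure-Object, since I’m not sure the latter handles DateTime values in every PowerShell version.

```powershell
# Hypothetical sample standing in for the parsed entries (Date + PId per log line)
$entries = @(
    New-Object PSObject -Property @{ Date = [DateTime]'2011-12-09 18:20:55'; PId = '3b78' }
    New-Object PSObject -Property @{ Date = [DateTime]'2011-12-09 18:21:10'; PId = '3b78' }
    New-Object PSObject -Property @{ Date = [DateTime]'2011-12-09 18:19:02'; PId = '1a2c' }
)

# One group per process ID; keep the earliest Date in each group
$firstSeen = $entries | Group-Object PId | ForEach-Object {
    New-Object PSObject -Property @{
        PId       = $_.Name
        FirstSeen = ($_.Group | Sort-Object Date | Select-Object -First 1).Date
    }
}

$firstSeen | Sort-Object FirstSeen | Format-Table PId, FirstSeen -AutoSize
```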
Note, this is more a “looks nice in code” approach. If the files you’re dealing with are really large, you probably want something way more efficient.