I wrote a file routing utility (.NET) some time ago that examines a file’s location and name pattern and, based on the match, moves it to some other preconfigured place. Fairly simple, straightforward kinda stuff. I also included the option of minor transformations through a series of regular expression search-and-replace actions that can be assigned to the file “route”, intended for adding header rows, replacing commas with pipes, that sort of thing.
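For readers unfamiliar with this kind of setup, a route's transformation chain might look roughly like the Python sketch below. `ROUTE_TRANSFORMS` and `apply_transforms` are hypothetical names, not the utility's actual configuration. The point is that each search-and-replace runs on the previous step's output. Nothing in a chain of fixed patterns and replacement strings can carry a computed value, such as a record count, from one step to the next.

```python
import re

# Hypothetical route configuration: ordered (pattern, replacement) pairs,
# each applied to the whole file text in turn.
ROUTE_TRANSFORMS = [
    (r",", "|"),                   # replace commas with pipes
    (r"\A", "ID|NAME|AMOUNT\n"),   # prepend a header row (\A matches once, at the start)
]

def apply_transforms(text, transforms):
    """Run each search-and-replace in order; each step sees the previous output."""
    for pattern, replacement in transforms:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, `apply_transforms("1,alice,10\n", ROUTE_TRANSFORMS)` returns `"ID|NAME|AMOUNT\n1|alice|10\n"`.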
So now I have a new text feed that consists of a file header followed by batches, each with a batch header and a multitude of detail records. The file header contains a count of all detail records in the file, and I have been asked to “split” the file as part of the assigned transformations, essentially producing one file per batch. This is fairly straightforward as well, but the kicker is that each output file’s header is expected to be updated to reflect that file’s detail count.
I do not even know if this is possible with pure regular expressions. Can I count the number of matches of a group in a given text document and replace the count value in the original text, or am I going to have to write a custom transformer for this one file?
If I have to write another transformer, are there suggestions on how to make it generic enough to be reusable? I’m considering adding an XSLT transformer option, but my understanding of XSLT is not so great.
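One way to keep a custom transformer reusable is to give every transformer the same shape: take one document and return a list of documents. A plain regex replacement returns one document, and a splitter returns several. This Python sketch uses hypothetical names (`Document`, `regex_transformer`, `run_pipeline`). It shows one possible design, not an established API.

```python
import re
from typing import Callable, List, Tuple

# A "document" is (file_name, text). A transformer maps one document to zero or
# more documents, so 1-to-1 edits and 1-to-many splits share one signature.
Document = Tuple[str, str]
Transformer = Callable[[Document], List[Document]]

def regex_transformer(pattern: str, replacement: str) -> Transformer:
    """Wrap a plain search-and-replace as a 1-to-1 transformer."""
    def run(doc: Document) -> List[Document]:
        name, text = doc
        return [(name, re.sub(pattern, replacement, text))]
    return run

def run_pipeline(doc: Document, steps: List[Transformer]) -> List[Document]:
    """Feed every output of one step into the next step."""
    docs = [doc]
    for step in steps:
        docs = [out for d in docs for out in step(d)]
    return docs
```

A splitter is just another function with the same signature. A per-batch split step can therefore be registered for this one route, while every other route keeps using plain regex steps.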
I’ve been asked for an example. Say I have a file like so:
FILE001DETAILCOUNT002
BATCH01
DETAIL001FOO
BATCH02
DETAIL001BAR
This file will be split and stored in two locations. The files will look like this:
FILE001DETAILCOUNT001
BATCH01
DETAIL001FOO
and
FILE001DETAILCOUNT001
BATCH02
DETAIL001BAR
So the sticking point for me is the file header’s DETAILCOUNT value.
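The following sketch makes two assumptions, both based only on the example. First, record types can be told apart by their prefixes (`FILE`, `BATCH`, `DETAIL`). Second, the count is the zero-padded digit run after `DETAILCOUNT`. Under those assumptions, the split-and-recount step is a few lines of ordinary code around one regex. `split_by_batch` is a hypothetical name. It copies the original header into every output and rewrites only the count field, keeping that field's width.

```python
import re

# Assumption: the header's count is the digit run right after "DETAILCOUNT".
COUNT_FIELD = re.compile(r"(?<=DETAILCOUNT)\d+")

def split_by_batch(text):
    """Split a feed into one text per batch, recomputing the header's detail count."""
    lines = text.splitlines()
    header, body = lines[0], lines[1:]   # assumption: first line is the file header

    batches = []                          # one list of lines per output file
    for line in body:
        if line.startswith("BATCH"):
            batches.append([line])        # a batch header starts a new output file
        elif batches:
            batches[-1].append(line)      # other lines belong to the current batch

    outputs = []
    for batch in batches:
        details = sum(1 for line in batch if line.startswith("DETAIL"))
        # Replace only the first count field, keeping its width ("002" -> "001").
        new_header = COUNT_FIELD.sub(
            lambda m: str(details).zfill(len(m.group())), header, count=1
        )
        outputs.append("\n".join([new_header] + batch))
    return outputs
```

Called on the five-line example input, it returns two strings. Each starts with `FILE001DETAILCOUNT001` and is followed by its own batch header and detail record.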
Regular expressions by themselves can’t count the number of matches they’ve made (or, better put, they don’t expose that to the regex user), so you do need additional program code to keep track of this.
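In this Python sketch, the regex only locates the detail lines and ordinary code tallies them. A replacement function then writes the tally back into the header; Python's `re.sub` accepts a callable in place of a replacement string. In .NET, the same two pieces are `Regex.Matches(...).Count` for the tally and a `Regex.Replace` overload that takes a `MatchEvaluator` for the write-back. `recount_header` is a hypothetical name, and it assumes it is run on a single, already-split file whose detail lines start with `DETAIL`.

```python
import re

# Assumptions: detail lines start with "DETAIL"; the count follows "DETAILCOUNT".
DETAIL_LINE = re.compile(r"^DETAIL", re.MULTILINE)
COUNT_FIELD = re.compile(r"(?<=DETAILCOUNT)\d+")

def recount_header(text):
    """Count detail lines, then write that number into the header's count field."""
    # The regex finds the matches; the program does the counting.
    count = len(DETAIL_LINE.findall(text))
    # A replacement *function* can return computed text; a replacement string cannot.
    return COUNT_FIELD.sub(lambda m: str(count).zfill(len(m.group())), text, count=1)
```

Because it works on one file at a time, it could run as an extra step after a per-batch split.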
A regex replacement can only reuse text captured from the source, plus whatever literal text you put in the replacement pattern; it can’t compute new text such as a count. So unless the number you need appears explicitly somewhere in the source, you’re out of luck with regex alone. Sorry.