I have a file containing records delimited by the pattern /#matchee/. These records are of varying lengths …say 45 – 75 lines. They need to ALL be 45 lines and still maintain the record delimiter. Records can be from different departments, department name is on line 2 following a blank line. So record delimiter could be thought of as simply /^#matchee/ or /^matchee/ followed by \n. There is a Deluxe edition of this problem and a Walmart edition …
DELUXE EDITION
Pull each record by pattern range so I can sort records by department. Eg., with sed
sed -n '/^DEPARTMENT NAME/,/^#matchee/{p;}' mess-o-records.txt
Then, Print only the first 45 lines of each record in the file to conform to
the 45 line constraint.
Finally, make sure the result still has the record delimiter on line 45.
WALMART EDITION
Same as above, but instead of using a range, just use the record delimiter.
STATUS
My attempt at this might clarify what I’m trying to do.
sed -n -e '/^DEPARTMENT-A/,/^#matchee/{p;}' -e '45q' -e '$s/.*/#matchee/' mess-o-records.txt
This doesn’t work, of course, because sed is operating on the entire file at each command.
I need it to operate on each range match not the whole file.
SAMPLE INPUT – 80 Lines ( truncated for space )
<blank line>
DEPARTMENT-A
Office space 206
Anonymous, MI 99999
Harold O Nonymous
Buckminster Abbey
Anonymous, MI 99999
item A Socket B 45454545
item B Gizmo Z 76767676
<too many lines here>
<way too many lines here>
#matchee
SAMPLE OUTPUT – now only 45 lines
<blank line>
DEPARTMENT-A
Office space 206
Anonymous, MI 99999
Harold O Nonymous
Buckminster Abbey
Anonymous, MI 99999
item A Socket B 45454545
item B Gizmo Z 76767676
<Record now equals exactly 45 lines>
<yet record delimiter is maintained>
#matchee
CLARIFICATION UPDATE
I will never need more than the first 40 lines if this makes things easier. Maybe the process would be:
- Match pattern(s)
- Print first 40 lines.
- Pad to appropriate length. Eg., 45 lines.
- Tack delimiter back on. Eg., #matchee
I think this would be more flexible — Ie., can handle record shorter than 45 lines.
Here’s a riff based on @Borodin’s Perl example below:
my $count = 0;
$/ = "#matchee";
while (<>) {
if (/^REDUNDANCY.*DEPT/) {
print;
$count = 0;
}
else {
print if $count++ < 40;
print "\r\n" x 5;
print "#matchee\r\n";
}
}
This add 5 newlines to each record + the delimiting pattern /#matchee/. So it’s wrong — but it illustrates what I want.
Print 40 lines based on department — pad — tack delimiter back on.
I think I understand what you want. Not sure about the bit about pull each record by pattern range. Is
#matcheealways followed by a blank line and then the department line? So in fact record number 2?This Perl fragment does what I understand you need.
If you prefer you can put the input file on the command line and drop the
opencall. Then the loop would have to bewhile (<>) { ... }.Let us know if this is right so far, and what more you need from it.