I hope the AWK gurus can provide a solution to my problem.
I have a file that goes like this:
cat cat cat cat cat cat dog rat ate dog tit
dog cat dog dog dog rat dog pat ate cat dog
I have to use AWK to extract the text between the first occurring c and a d. Starting from the first c, a count should be kept of the number of c's and d's, such that when the counts match, the part between the first c and the matched d should be output to a file, including the number of the line in which the match for d occurred.
In this particular example the match occurs on the seventh dog, therefore the output will have to be:
cat cat cat cat cat cat dog rat ate dog tit
dog cat dog dog dog rat d
The match can go beyond just two lines! The output may or may not include the c and the d. The text contains all kinds of characters, including special ones!
In order for the print to occur, the counts have to match.
Thanks in advance for the replies. Suggestions are always welcome.
EDIT: The capture of the text between c and d can be sacrificed, as long as the condition is met and the line number of the exit d is obtained 🙂
A few tips, without giving the full solution:
By default, awk considers each line as a record. The default record separator is RS="\n".
Depending on your version of awk, you may be able to set RS, the record separator, to a regex which matches either c or d. Then, for each record, you can check the RT variable, which will contain either c or d, depending on what has actually been matched. Starting from there, using a variable incremented on c and decremented on d, you will be able to find the end of the match when it reaches 0.
You can then use a variable that contains your match so far, and keep concatenating RT and the new record to it, until you're done.
If you need to know the line number of the end of the match, you can set RS to a regex that matches c or d, as previously, but also \n. By maintaining another counter variable, incremented every time RT tells you that \n has been matched, you'll have your line number.
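For awks that lack RT (it is a gawk extension), the same counting idea can be sketched with a plain character-by-character scan instead of a regex RS. This is only a minimal sketch, not the RS/RT approach itself. It assumes the delimiters are the single characters c and d, as in the question, and that a POSIX awk is on the PATH. The function name extract_balanced and the "matched on line" message are made up for illustration.

```sh
# extract_balanced is a hypothetical helper name, not something from the thread.
# It reads text on stdin and prints the span from the first c to the d that
# balances it, followed by the line number of that d.
extract_balanced() {
    awk '
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            if (ch == "c") { depth++; started = 1 }   # every c opens one level
            if (!started) continue                    # skip text before the first c
            buf = buf ch                              # collect the match so far
            if (ch == "d" && --depth == 0) {          # this d balances the first c
                print buf
                print "matched on line " NR
                exit
            }
        }
        if (started) buf = buf "\n"                   # keep line breaks in the match
    }'
}

# Sample input from the question.
printf '%s\n' \
    'cat cat cat cat cat cat dog rat ate dog tit' \
    'dog cat dog dog dog rat dog pat ate cat dog' | extract_balanced
```

On the sample input this prints the first line in full, then "dog cat dog dog dog rat d", then "matched on line 2". Redirect the output (for example with > output.txt) to write it to a file. If the counts never balance, nothing is printed.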