I have a big log file, over 1 million lines.
I need to use a regexp to find a pattern and then keep every line from there until I hit another regular expression, so I would end up with something like 1500 lines.
I know sed allows the use of regexps, but can it split files? I have no experience with awk, but I think it should let me do what I need. I am confused by the manpage, though, so I would appreciate some examples or even simpler solutions.
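In both AWK and SED you can define a range with two RegEx patterns separated by a comma, /start/,/end/, which matches every line from a line matching the first pattern through the next line matching the second.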
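A minimal sketch of that range syntax in both tools. The patterns BEGIN job and END job and the file sample.log are hypothetical stand-ins for your own patterns and INPUT_FILE:

```shell
# Build a tiny stand-in for the real log (sample.log is hypothetical).
printf '%s\n' 'noise 1' 'BEGIN job 42' 'line a' 'line b' 'END job 42' 'noise 2' > sample.log

# AWK: a range pattern with no action; the default action prints every line
# from the first match of /BEGIN job/ through the next match of /END job/.
awk '/BEGIN job/,/END job/' sample.log

# SED: -n turns off automatic printing; p prints only the lines in the range.
sed -n '/BEGIN job/,/END job/p' sample.log
```

Both commands print the same four lines. If the start pattern matches again later in the file, both tools begin a new range, and if the end pattern never matches, the range runs to the end of the file.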
AWK: You will notice that we haven't written print anywhere. AWK is based on pattern/action statements, and print is the default action whenever the pattern is true. Hence, in the range case, AWK prints every line for which the RegEx range is true.
SED: In SED we use the -n option to suppress the default behaviour of printing every line, and follow the RegEx range with p to tell SED to print only those lines.
Alternatively, you can also use the following one-liner:
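One common one-liner of this kind (not necessarily the one originally meant here) is the awk flag idiom, sketched below with hypothetical patterns and file names. Redirecting with > writes the selected lines to a new file, and swapping the order of the rules makes it easy to leave out the boundary lines, which is awkward with a plain range:

```shell
# Hypothetical sample input standing in for INPUT_FILE.
printf '%s\n' 'noise 1' 'BEGIN job 42' 'line a' 'line b' 'END job 42' 'noise 2' > sample.log

# Flag idiom: set f on the start line, print while f is set, and clear f
# after the end line has been printed. Output goes to subset.log.
awk '/BEGIN job/ { f = 1 } f { print } /END job/ { f = 0 }' sample.log > subset.log

# Variant: clear f before printing and set it after printing,
# so the BEGIN and END lines themselves are left out.
awk '/END job/ { f = 0 } f { print } /BEGIN job/ { f = 1 }' sample.log > inner.log
```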
Using the redirection operator > you can write that output to a new file and so create a subset of your file.
For splitting files in AWK, if you know the number of records in your file (wc -l < INPUT_FILE), you can select lines by number. NR is AWK's built-in variable that holds the current record's line number, so if you have a file with 1500 lines and need just the top 750, you can use NR <= 750 as the pattern.
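A minimal sketch of selecting lines by number, using a hypothetical numbers.log generated with seq in place of INPUT_FILE. The second command is a variant that stops reading at line 751, which avoids scanning the rest of a very large file:

```shell
# Hypothetical 1500-line input standing in for INPUT_FILE.
seq 1 1500 > numbers.log

# NR <= 750 is a pattern with no action, so awk prints lines 1 to 750.
awk 'NR <= 750' numbers.log > top.log

# Same result, but awk exits as soon as it reaches line 751
# instead of reading (and ignoring) the remainder of the file.
awk 'NR > 750 { exit } { print }' numbers.log > top_fast.log
```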
As mentioned earlier, you can, but don't have to, write print with AWK: it prints for you as long as your pattern is true.
With a million lines in your file, though, writing one NR range per chunk by hand would be a major pain, so it is easier to let a single AWK one-liner build the output file name from NR.
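A sketch of a one-liner along those lines, assuming the file number is computed as int((NR + 2) / 3); input.log is a hypothetical stand-in for INPUT_FILE:

```shell
# Hypothetical 7-line input standing in for INPUT_FILE.
seq 1 7 > input.log

# int((NR + 2) / 3) maps lines 1-3 to 1, lines 4-6 to 2 and line 7 to 3,
# so each output file gets 3 lines. The parentheses around the file name
# keep the string concatenation unambiguous across awk implementations.
awk '{ print > ("SMALL_BATCH_OF_FILES_" int((NR + 2) / 3)) }' input.log
```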
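This one-liner creates files named SMALL_BATCH_OF_FILES_ followed by a batch number, containing 3 lines each. The batch number is the integer part of (NR+2)/3, which is 1 for lines 1 to 3, 2 for lines 4 to 6, and so on. You can set the batch size to your comfort level: for n lines per file, use the integer part of (NR+n-1)/n.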
Execution:
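A sketch of running the split on a small hypothetical 10-line input.log. This version takes the batch size from an awk variable n (a hypothetical name) and closes each finished file before opening the next one: with a million lines and 3 lines per file there would be over 300,000 output files, and some awk implementations limit how many files can be open at once.

```shell
# Hypothetical 10-line input standing in for INPUT_FILE.
seq 1 10 > input.log

# n is the batch size. On the first line of each batch, close the previous
# output file (if any) and compute the next file name from NR.
awk -v n=3 '
  (NR - 1) % n == 0 {
    if (out != "") close(out)
    out = "SMALL_BATCH_OF_FILES_" int((NR - 1) / n + 1)
  }
  { print > out }
' input.log

# Show how many lines landed in each batch file.
wc -l SMALL_BATCH_OF_FILES_*
```

With 10 input lines and n=3, the four files SMALL_BATCH_OF_FILES_1 to SMALL_BATCH_OF_FILES_4 hold 3, 3, 3 and 1 lines.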