I have a big log file, over 1 million lines. I need to use

Question

Asked: May 27, 2026 (2026-05-27T02:20:14+00:00)

I have a big log file, over 1 million lines.
I need to use regexp to find a pattern and then start chomping down until I hit another regular expression. So I would end up with something like 1500 lines.

I know sed allows use of regexp, but can it split files? I have no experience with awk, but I think that it should allow me to do what I need. I am confused from reading the manpage though… I would appreciate some examples, or even simpler solutions.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T02:20:14+00:00

In both AWK and SED you can define a range between two RegExes like this:

AWK: Notice that we haven’t written print anywhere. AWK is built on pattern/action statements, and print is the default action whenever the pattern is true. Hence, in the following case, AWK prints every line from the first one matching regex1 through the next one matching regex2.

awk '/regex1/,/regex2/' INPUT_FILE > NEW_FILE

SED: In SED we use the -n option to suppress the default behaviour of printing every line, and add p after the range to tell SED to print only the lines inside it.

sed -n '/regex1/,/regex2/p' INPUT_FILE > NEW_FILE

Alternatively, you can delete every line outside the range with the following one-liner:

sed '/regex1/,/regex2/!d' INPUT_FILE > NEW_FILE

The redirection operator > writes that subset of your file to NEW_FILE.
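
As a small self-contained sketch of the range form, the commands below build a throwaway log and pull out the block between two markers. The file names (sample.log, block.log, block_sed.log) and the START/END patterns are made up for illustration; substitute your own regexes and files.

```shell
# Build a small throwaway log (hypothetical content, for illustration only).
printf '%s\n' 'boot ok' 'START job 42' 'step 1' 'step 2' 'END job 42' 'shutdown' > sample.log

# AWK: print from the first line matching /START/ through the next line matching /END/.
awk '/START/,/END/' sample.log > block.log

# SED: same range; -n suppresses the default output and p prints the range.
sed -n '/START/,/END/p' sample.log > block_sed.log

# Shows the four lines from "START job 42" to "END job 42".
cat block.log
```

Two things to keep in mind: if the first regex matches again later in the file, a new range starts and is printed as well, and if the second regex never matches, the range runs to the end of the file.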

To split a file by line number in AWK, if you know the number of records in it (wc -l < INPUT_FILE), you can write something like this:

awk 'NR==2,NR==5' INPUT_FILE

NR is AWK’s built-in variable that holds the number of the current record, which by default is the line number. So if you have a file with 1500 lines and need just the top 750, then you can do something like this:

awk 'NR==1,NR==750' INPUT_FILE

As mentioned earlier, you can write print in AWK but you don’t have to. It prints the line for you whenever your pattern is true.
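
One thing worth noting for a file this size: NR==1,NR==750 keeps reading the remaining lines after the range ends, it just prints nothing for them. The following is a sketch that stops reading once the last wanted line has been handled, using a generated 1500-line file. The names numbers.txt, top_half.txt and top_half_sed.txt are hypothetical.

```shell
# Generate a 1500-line sample file, one number per line (hypothetical data).
awk 'BEGIN { for (i = 1; i <= 1500; i++) print i }' > numbers.txt

# AWK: print lines 1-750, then exit at line 751 so the rest is never scanned.
awk 'NR > 750 { exit } { print }' numbers.txt > top_half.txt

# SED: print lines 1-750 and quit right after line 750.
sed -n '1,750p;750q' numbers.txt > top_half_sed.txt

# Count the lines that were kept.
wc -l < top_half.txt
```

Both commands should produce the same 750 lines as the NR==1,NR==750 form; the only difference is that they stop early.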

Doing this range by range on a file with a million lines would be a major pain, though. The following AWK one-liner splits the whole file in a single pass.

awk '{print >("SMALL_BATCH_OF_FILES_" int((NR+2)/3))}' BIG_INPUT_FILE

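This one-liner will create files named SMALL_BATCH_OF_FILES_1, SMALL_BATCH_OF_FILES_2 and so on, containing 3 lines each. You can set this to your comfort level: for n lines per file use int((NR+n-1)/n), which for n=3 is the int((NR+2)/3) shown above.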

Execution:

[jaypal~/Temp]$ cat BIG_INPUT_FILE 
1
2
3
4
5
6
7
8
9
10

[jaypal~/Temp]$ awk '{print >("SMALL_BATCH_OF_FILES_" int((NR+2)/3))}' BIG_INPUT_FILE

[jaypal~/Temp]$ ls -lrt SMALL*
-rw-r--r--  1 jaypalsingh  staff  3 25 Nov 10:41 SMALL_BATCH_OF_FILES_4
-rw-r--r--  1 jaypalsingh  staff  6 25 Nov 10:41 SMALL_BATCH_OF_FILES_3
-rw-r--r--  1 jaypalsingh  staff  6 25 Nov 10:41 SMALL_BATCH_OF_FILES_2
-rw-r--r--  1 jaypalsingh  staff  6 25 Nov 10:41 SMALL_BATCH_OF_FILES_1

[jaypal~/Temp]$ cat SMALL_BATCH_OF_FILES_1 
1
2
3
[jaypal~/Temp]$ cat SMALL_BATCH_OF_FILES_2 
4
5
6
[jaypal~/Temp]$ cat SMALL_BATCH_OF_FILES_3
7
8
9
[jaypal~/Temp]$ cat SMALL_BATCH_OF_FILES_4
10
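
With over a million lines, chunks of 3 would mean hundreds of thousands of output files, and some AWK implementations fail once too many output files are open at the same time. A hedged variant of the one-liner, assuming a chunk size passed in as a variable n and a made-up chunk_ prefix, closes each file before moving on to the next:

```shell
# Sample input: the numbers 1..10, one per line, as in the run above.
awk 'BEGIN { for (i = 1; i <= 10; i++) print i }' > BIG_INPUT_FILE

# n is the number of lines per output file (an assumed, adjustable value).
# int((NR + n - 1) / n) is 1 for lines 1..n, 2 for lines n+1..2n, and so on.
awk -v n=4 '
{
    out = "chunk_" int((NR + n - 1) / n)
    if (out != prev) {
        # Moved on to a new chunk: release the finished file first.
        if (prev != "") close(prev)
        prev = out
    }
    print > out
}' BIG_INPUT_FILE

# Lists chunk_1, chunk_2 and chunk_3 (4, 4 and 2 lines respectively).
ls chunk_*
```

If fixed-size pieces are all you need, the standard split utility is simpler still: split -l 1500 BIG_INPUT_FILE chunk_ writes 1500 lines per file using the given prefix.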
