Summary : is there a way to get the unique lines from a file

Question


Asked: June 6, 2026


Summary: Is there a way to get the unique lines from a file, along with the number of occurrences of each, more efficiently than with sort | uniq -c | sort -n?

Details: I often pipe to sort | uniq -c | sort -n when doing log analysis to get a general picture of which log entries show up the most, the least, and so on. This works most of the time, except when I’m dealing with a very large log file that contains a very large number of duplicates (in which case sort | uniq -c takes a long time).

Example: The specific case I’m facing right now is getting a trend from an ‘un-parametrized’ MySQL bin log to find out which queries are run the most. For a file of a million entries, which I pass through a grep/sed combination to remove parameters (resulting in about 150 unique lines), I spend about 3 seconds grepping & sedding, and about 15 seconds sorting/uniq’ing.

Currently, I’ve settled on a simple C++ program that maintains a map of < line, count >, which does the job in less than a second, but I was wondering whether a utility for this already exists.


1 Answer

Editorial Team · Answer 1 · 2026-06-06T08:44:52+00:00


I’m not sure how large the performance difference will be, but you can replace the sort | uniq -c with a simple awk script. Because awk counts the lines in a hash table instead of sorting them, and your input has many duplicates, I’d expect it to be faster:

 awk '{c[$0]++}END{for(l in c){print c[l], l}}' input.txt | sort -n
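To make the one-liner easier to try out, here is a minimal sketch that wraps it in a shell function and runs it on a few made-up lines. The function name count_lines and the sample queries are assumptions for illustration only; they do not come from the question. The trailing sort -n is still needed because awk's for (l in c) loop visits keys in an unspecified order, but it now sorts only one line per distinct entry (about 150 in the question's case) rather than the full million.

```shell
# Hypothetical helper (name chosen for this example): count identical lines
# with an awk associative array, then order the result by count.
count_lines() {
    # c[$0]++ uses the whole input line as the array key.
    # The END block prints "count line" once per distinct line.
    # "$@" passes through any file names; with none, awk reads stdin.
    awk '{ c[$0]++ } END { for (l in c) print c[l], l }' "$@" | sort -n
}

# Made-up sample input: three distinct lines with 3, 2 and 1 occurrences.
printf '%s\n' 'SELECT a' 'UPDATE b' 'SELECT a' 'DELETE c' 'SELECT a' 'UPDATE b' |
    count_lines
# Prints:
# 1 DELETE c
# 2 UPDATE b
# 3 SELECT a
```

Memory use grows with the number of distinct lines rather than the total number of lines, so this approach suits inputs like the one described in the question, where a large file collapses to a small set of unique entries.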
