The Archive Base


Editorial Team
Asked: June 6, 2026 (2026-06-06T08:44:50+00:00)

Summary: Is there a way to get the unique lines from a file, along with the number of occurrences of each, more efficiently than with sort | uniq -c | sort -n?

Details: I often pipe to sort | uniq -c | sort -n when doing log analysis to get a general sense of which log entries show up the most or least often. This works most of the time, except when I’m dealing with a very large log file that contains a very large number of duplicates, in which case sort | uniq -c takes a long time.

Example: The specific case I’m facing right now is getting a trend from an ‘un-parametrized’ MySQL bin log to find out which queries are run the most. For a file of a million entries, which I pass through a grep/sed combination to remove parameters (resulting in about 150 unique lines), I spend about 3 seconds grepping and sedding, and about 15 seconds sorting and uniq’ing.

Currently, I’ve settled on a simple C++ program that maintains a map of <line, count>, which does the job in less than a second, but I was wondering whether a utility for this already exists.
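For reference, the baseline pipeline from the details above can be sketched on a few made-up lines. The sample queries and the function name `count_sorted` are assumptions for illustration; they are not taken from the actual bin log.

```shell
#!/bin/sh
# Baseline from the question: sort groups identical lines together,
# uniq -c counts each group, and the final sort -n orders by count.
# The first sort has to process every input line, duplicates included.
count_sorted() {
    sort | uniq -c | sort -n
}

# Hypothetical sample input: three distinct "queries" with different frequencies.
printf '%s\n' \
    'SELECT * FROM users' \
    'UPDATE orders SET status' \
    'SELECT * FROM users' \
    'DELETE FROM sessions' \
    'SELECT * FROM users' \
    'UPDATE orders SET status' |
    count_sorted
```

Each output line is a count followed by the line it belongs to, least frequent first. The amount of padding before the count depends on the uniq implementation.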


1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 8:44 am (2026-06-06T08:44:52+00:00)

    I’m not sure what the performance difference will be, but you can replace sort | uniq -c with a simple awk script. Since you have many duplicates and awk hashes the lines instead of sorting them, I’d imagine it’s faster. The final sort -n then only has to order the unique lines:

     awk '{c[$0]++}END{for(l in c){print c[l], l}}' input.txt | sort -n
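To see the hashing approach on concrete data, here is a small self-contained sketch that wraps the same awk program in a function. The function name `count_hashed` and the sample lines are assumptions for illustration; only the awk program itself comes from the answer.

```shell
#!/bin/sh
# Count occurrences with an awk associative array (a hash table), so only the
# short list of unique lines has to be sorted, not the whole input.
count_hashed() {
    # c[$0]++ increments the counter keyed by the entire line;
    # the END block prints "count line" for every key, in unspecified order.
    awk '{ c[$0]++ } END { for (l in c) print c[l], l }' "$@" | sort -n
}

# Hypothetical sample input standing in for the de-parametrized log.
printf '%s\n' \
    'SELECT * FROM users' \
    'UPDATE orders SET status' \
    'SELECT * FROM users' \
    'DELETE FROM sessions' \
    'SELECT * FROM users' \
    'UPDATE orders SET status' |
    count_hashed
# prints:
# 1 DELETE FROM sessions
# 2 UPDATE orders SET status
# 3 SELECT * FROM users
```

A file name can be passed instead of piping, e.g. `count_hashed input.txt`, and `sort -rn` in place of `sort -n` would list the most frequent lines first. Memory use grows with the number of unique lines, which is small in the case described (about 150).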
    

© 2021 The Archive Base. All Rights Reserved