The Archive Base


Editorial Team
Asked: May 27, 2026


Let's say the input file is:

Hi my name NONE
Hi my name is ABC
Hi my name is ABC
Hi my name is DEF
Hi my name is DEF
Hi my name is XYZ

I have to create the following output:

Hi my name NONE 1
Hi my name is ABC 2
Hi my name is DEF 2
Hi my name is XYZ 1

The number of words in a single line can vary from 2 to 10. The file size will be more than 1 GB.

How can I get the required output in the minimum possible time? My current implementation is a C++ program that reads a line from the file and compares it with the next line. The running time of this implementation will always be O(n), where n is the number of characters in the file.

To improve the running time, the next option is to use mmap. Before implementing it, I wanted to confirm: is there a faster way to do this, perhaps using another language or a scripting tool?


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 6:30 am
    uniq -c filename | perl -lane 'print "@F[1..$#F] $F[0]"'
    

    The perl step is only to take the output of uniq (which looks like “2 Hi my name is ABC”) and re-order it into “Hi my name is ABC 2”. You can use a different language for it, or else leave it off entirely.
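If you would rather do the re-ordering in another language, here is a minimal Python sketch of the same step. The function name `reorder_uniq_line` is made up for illustration. Like the perl one-liner, it splits on whitespace and re-joins with single spaces, so any unusual spacing inside a line is normalized.

```python
def reorder_uniq_line(line):
    """Move the leading count from a `uniq -c` output line to the end.

    Hypothetical helper: splits on whitespace (as perl -a does) and
    re-joins the fields with single spaces.
    """
    fields = line.split()
    # fields[0] is the count; everything after it is the original text.
    return " ".join(fields[1:] + fields[:1])


# Example call on one line shaped like uniq -c output.
example = reorder_uniq_line("      2 Hi my name is ABC")
```

To use it in place of the perl step, you could loop over `sys.stdin` and print `reorder_uniq_line(line)` for each line.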

    As for your question about runtime, big-O seems misplaced here; surely there isn’t any chance of scanning the whole file in less than O(n). mmap and strchr seem like possibilities for constant-factor speedups, but a stdio-based approach is probably good enough unless your stdio sucks.
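To show the shape of the mmap idea, here is a rough Python sketch rather than C++. It maps the file read-only and uses `mmap.find` to locate each newline, which plays the role strchr would play in C. The function name `count_runs_mmap` is an assumption for illustration, and nothing here claims it is faster than plain buffered reads; you would have to measure that on your own data.

```python
import mmap


def count_runs_mmap(path):
    """Return [(line_bytes, count), ...] for runs of identical adjacent lines.

    Sketch only: assumes a non-empty file (mmap cannot map an empty file)
    and that duplicate lines are adjacent, as in the question's sample.
    """
    runs = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        pos = 0
        prev = None
        count = 0
        while pos < size:
            end = mm.find(b"\n", pos)  # newline search, like strchr in C
            if end == -1:
                end = size  # last line has no trailing newline
            line = mm[pos:end]
            if line == prev:
                count += 1
            else:
                if prev is not None:
                    runs.append((prev, count))
                prev, count = line, 1
            pos = end + 1
        if prev is not None:
            runs.append((prev, count))
    return runs
```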

    The code for BSD uniq could be illustrative here. It does a very simple job with fgets, strcmp, and very few variables.
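The loop that uniq runs is roughly the one sketched below, written in Python instead of C for brevity. This is not BSD uniq's actual code, just the same idea: remember the previous line, compare it with the current one, and emit a count whenever the line changes. The name `count_adjacent` is hypothetical, and the sketch puts the count at the end of the line, as the question asks.

```python
def count_adjacent(lines):
    """Yield 'text count' for each run of identical adjacent lines.

    Single pass, one remembered line and one counter, assuming
    duplicate lines are adjacent (as in the question's sample).
    """
    prev = None
    count = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line == prev:
            count += 1
        else:
            if prev is not None:
                yield f"{prev} {count}"
            prev, count = line, 1
    if prev is not None:
        yield f"{prev} {count}"
```

Passing an open file object (or `sys.stdin`) as `lines` streams the input one line at a time, so the whole file never has to fit in memory.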

