Lets say the input file is: Hi my name NONE Hi my name is


Asked: May 27, 2026


Let's say the input file is:

Hi my name NONE
Hi my name is ABC
Hi my name is ABC
Hi my name is DEF
Hi my name is DEF
Hi my name is XYZ

I have to create the following output:

Hi my name NONE 1
Hi my name is ABC 2
Hi my name is DEF 2
Hi my name is XYZ 1

The number of words in a single line can vary from 2 to 10. File size will be more than 1GB.

How can I get the required output in the minimum possible time? My current implementation is a C++ program that reads a line from the file and compares it with the next line. Its running time is always O(n), where n is the number of characters in the file.

To improve the running time, my next option is to use mmap. Before implementing it, I wanted to confirm: is there a faster way to do this, perhaps using another language or a script?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T06:30:26+00:00

uniq -c filename | perl -lane 'print "@F[1..$#F] $F[0]"'

The perl step is only to take the output of uniq (which looks like “2 Hi my name is ABC”) and re-order it into “Hi my name is ABC 2”. You can use a different language for it, or else leave it off entirely.

As for your question about runtime, big-O seems misplaced here; surely there isn’t any chance of scanning the whole file in less than O(n). mmap and strchr seem like possibilities for constant-factor speedups, but a stdio-based approach is probably good enough unless your stdio sucks.

The code for BSD uniq could be illustrative here. It does a very simple job with fgets, strcmp, and a very few variables.
