The Archive Base

Editorial Team
Asked: June 10, 2026

I have a file containing the data shown below. The first comma-delimited field may be repeated any number of times, and I want to print only the lines that come after the sixth occurrence of each value of this field.

For example, there are eight records with 1111111 as the first field, and I want to print only the seventh and eighth of these records.

Input file:

1111111,aaaaaaaa,14
1111111,bbbbbbbb,14
1111111,cccccccc,14
1111111,dddddddd,14
1111111,eeeeeeee,14
1111111,ffffffff,14
1111111,gggggggg,14
1111111,hhhhhhhh,14
2222222,aaaaaaaa,14
2222222,bbbbbbbb,14
2222222,cccccccc,14
2222222,dddddddd,14
2222222,eeeeeeee,14
2222222,ffffffff,14
2222222,gggggggg,14
3333333,aaaaaaaa,14
3333333,bbbbbbbb,14
3333333,cccccccc,14
3333333,dddddddd,14
3333333,eeeeeeee,14
3333333,ffffffff,14
3333333,gggggggg,14
3333333,hhhhhhhh,14

Output:

1111111,gggggggg,14
1111111,hhhhhhhh,14
2222222,gggggggg,14
3333333,gggggggg,14
3333333,hhhhhhhh,14

What I have tried is to transpose the 2nd and 3rd fields by the 1st field, so that I can then use nawk on field $7 or $8:

#!/usr/bin/ksh
awk -F"," '{ a[$1]; b[$1] = b[$1] "," $2; c[$1] = c[$1] "," $3 }   # collect 2nd/3rd fields per key (each value already starts with ",")
    END { for (i in a) print i b[i] c[i] }' file > output.txt
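The idea behind the attempt above, gathering every record under its first field and then picking positions 7 onwards, can be carried through in a single awk pass. This is a minimal sketch, not the asker's script: it stores each key's records in an array instead of one long comma-joined string (so nothing has to be re-split), and it remembers keys in first-seen order because the order of `for (k in arr)` is unspecified. The function name `after_sixth_grouped` and the generated input are hypothetical. Note that it holds the whole file in memory and prints the selected lines grouped by key.

```shell
#!/bin/sh
# Sketch of the "collect per key, then take record 7 onwards" idea.
# after_sixth_grouped is a hypothetical name, not from the original post.
after_sixth_grouped() {
  awk -F, '
    # First sighting of a key: remember it so output follows input order.
    !($1 in n) { order[++keys] = $1 }
    # Store every record under its key, numbered 1, 2, 3, ...
    { n[$1]++; rec[$1, n[$1]] = $0 }
    END {
      # Walk keys in first-seen order, skipping each key'"'"'s first six records.
      for (k = 1; k <= keys; k++)
        for (i = 7; i <= n[order[k]]; i++)
          print rec[order[k], i]
    }
  ' "$@"
}

# Made-up input: eight records for 1111111, seven for 2222222.
awk 'BEGIN {
  for (i = 1; i <= 8; i++) print "1111111," i ",14"
  for (i = 1; i <= 7; i++) print "2222222," i ",14"
}' | after_sixth_grouped
# 1111111,7,14
# 1111111,8,14
# 2222222,7,14
```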

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 8:07 pm

    If your records are unordered

    i.e. you may have "1111111" items distributed randomly throughout your input:

    $ awk -F, '++a[$1] > 6' input.txt
    1111111,gggggggg,14
    1111111,hhhhhhhh,14
    2222222,gggggggg,14
    3333333,gggggggg,14
    3333333,hhhhhhhh,14
    

    How does this work?

    As you know, awk’s -F option sets the field separator. If the separator is not a character the shell treats specially (a comma is not), there’s no need to quote it.
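A quick check of the quoting point, using tiny made-up input: a comma can be passed bare, while a character the shell treats specially, such as `|`, has to be quoted so the shell hands it to awk instead of starting a pipeline. Both commands print `b`.

```shell
#!/bin/sh
# A comma is not special to the shell, so -F, needs no quotes.
printf 'a,b,c\n' | awk -F, '{ print $2 }'

# A bare | would start a pipeline, so it is quoted for the shell.
printf 'a|b|c\n' | awk -F'|' '{ print $2 }'
```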

    Awk scripts consist of a series of condition { action } blocks (awk’s own documentation calls the condition a pattern). If the condition is missing, the action is applied to every line. If the action is missing, it defaults to { print }. So an awk script that consists of nothing but a condition prints every input line for which that condition evaluates to true (non-zero or non-empty).
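A minimal illustration of the default action, using made-up numeric input: the two commands below are equivalent, and each prints the lines whose first field is greater than 4 (here `5` and `9`).

```shell
#!/bin/sh
# Condition only: the implied action prints each matching line.
printf '1\n5\n9\n' | awk '$1 > 4'

# The same rule with the default action written out.
printf '1\n5\n9\n' | awk '$1 > 4 { print }'
```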

    In this case, our condition also has elements of an action: it increments elements of an associative array whose keys are your first field. The increment happens whether or not the condition turns out to be true. An array element that has never been set counts as 0, so the first time a key is seen its count becomes 1. Putting ++ before the variable rather than after it makes the increment happen before the value is compared rather than after (this is the difference between ++var and var++). If the incremented count is greater than 6, the condition is true and the line is printed.
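The pre/post-increment difference is easy to check. A small sketch with made-up input: the first command prints the value each form yields for a key seen three times; the last two show that `++a[$1] > 6` and `a[$1]++ >= 6` keep the same lines (the 7th and 8th records of key `k`). The generator `gen` is a hypothetical helper.

```shell
#!/bin/sh
# Pre-increment yields the new count, post-increment the old one.
# Never-set array elements count as 0 in numeric context.
printf 'x\nx\nx\n' | awk '{ print ++pre[$1], post[$1]++ }'
# 1 0
# 2 1
# 3 2

# Hypothetical generator: eight records k,1 .. k,8.
gen() { awk 'BEGIN { for (i = 1; i <= 8; i++) print "k," i }'; }

# Both filters print k,7 and k,8.
gen | awk -F, '++a[$1] > 6'
gen | awk -F, 'a[$1]++ >= 6'
```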

    This is functionally equivalent to the perl solutions in other answers, but because of the nature of awk scripts it is even tighter and (arguably) simpler. It is also likely to be faster. (In my informal test just now, the awk script above ran more than twice as fast as an equivalent perl script from another answer, processing 250,000 lines of input in 0.23 s of user time versus 0.61 s in perl.)

    If your records are ordered

    i.e. all your "1111111" lines are together:

    $ awk -F, '$1!=f{c=0;f=$1} ++c>6' input.txt
    1111111,gggggggg,14
    1111111,hhhhhhhh,14
    2222222,gggggggg,14
    3333333,gggggggg,14
    3333333,hhhhhhhh,14
    

    How does this work?

    • If we’re on a different $1 than last time (which is also true on the first line, since f starts out empty), we reset our counter and save $1 to a variable for future comparisons.
    • Then we increment the counter and print the line (implicitly) if the counter goes above 6.
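Written out over several lines with comments, the ordered one-liner might look like the sketch below. The function names `after_sixth_run` and `sample_input` are hypothetical; `sample_input` simply regenerates the sample input from the question (eight records each for 1111111 and 3333333, seven for 2222222) so the block runs on its own. It should print the same five lines as the expected output in the question.

```shell
#!/bin/sh
# Expanded form of: awk -F, '$1!=f{c=0;f=$1} ++c>6'
# Assumes all lines sharing a first field are adjacent.
after_sixth_run() {
  awk -F, '
    # A new first field starts a new run: reset the counter, remember the key.
    $1 != f { c = 0; f = $1 }
    # Count this line; the default action prints it once the run passes six.
    ++c > 6
  ' "$@"
}

# Hypothetical helper that recreates the question's sample input.
sample_input() {
  for key in 1111111 2222222 3333333; do
    for v in aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh; do
      # In the sample, 2222222 has no hhhhhhhh record.
      if [ "$key" = 2222222 ] && [ "$v" = hhhhhhhh ]; then continue; fi
      echo "$key,$v,14"
    done
  done
}

sample_input | after_sixth_run
```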

    This has the advantage of using constant memory instead of an array that grows with the number of distinct first fields, but it is only appropriate when all lines sharing a $1 are adjacent. It cannot handle matching lines that are scattered throughout your input.
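To see the caveat in practice, here is made-up input in which two keys alternate line by line, so each key occurs seven times but never twice in a row. The array-based one-liner prints each key's seventh line; the run-based one prints nothing, because its counter is reset on every line. The helper `alternating` is hypothetical.

```shell
#!/bin/sh
# Hypothetical input generator: keys x and y alternate, seven lines each.
alternating() {
  i=1
  while [ "$i" -le 7 ]; do
    echo "x,$i"
    echo "y,$i"
    i=$((i + 1))
  done
}

echo "array-based (order-independent):"
alternating | awk -F, '++a[$1] > 6'
# x,7
# y,7

echo "run-based (needs adjacent keys):"
alternating | awk -F, '$1!=f{c=0;f=$1} ++c>6'
# (no output)
```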

Related Questions

I have a text file containing 5 columns of data. The first column contains
I have a file containing vectors of data, where each row contains a comma-separated
I have a file containing data in a single column .. I have to
i have a .txt file containing data like: He: 22.1 Ar: 21.1 K: 1.22
I have a file containing some data (for example, 00927E2B112DB958......). This data is a
I have a file containing lots of data put in a form similar to
I have a text file containing the data in following format 12345 Abdt3 hy45d
I have a file (10-20MB) containing data, where each line is a single piece
I have a .csv file containing 3 columns of data. I need to create
i have a file, filedata.mat, containing a 1x1 struct with sub-levels that contain data

© 2021 The Archive Base. All Rights Reserved