The Archive Base

Editorial Team
Asked: June 10, 2026 (2026-06-10T18:22:46+00:00)

I have a CSV-file similar to this test.csv file:

Header 1; Header 2; Header 3
A;B;US
C;D;US
E;F;US
G;H;FR
I;J;FR
K;L;FR
M;"String with ; semicolon";UK
N;"String without semicolon";UK
O;"String OK";
P;"String OK";

Now I want to split this file based on Header 3, so that I end up with four separate CSV files: one each for “US”, “FR”, “UK”, and “” (the empty value).

With my very limited Linux command-line skills (sadly 🙁), I have so far used this line:

awk -F\; 'NR>1{ fname="country_yearly_"$3".csv"; print >>(fname); close(fname);}' test.csv

Of course, experienced command-line users will spot my problem: in some rows of test.csv, the semicolon I use as the separator also appears inside a field. Such fields are enclosed in quotation marks (with millions of rows I can’t guarantee this, but I’m happy with an answer that assumes it). As a result, I get an additional file named country_yearly_ semicolon".csv, which contains the affected row from my example.

While trying to solve this issue, I came across this question on SO. In particular, Thor’s answer seems to contain the solution to my problem: replace all semicolons inside quoted strings. I adjusted his code as follows:

awk -F'"' -v OFS='' '
  NF > 1 { 
    for(i=2; i<=NF; i+=2) { 
      gsub(";", "|", $i);
      $i = FS $i FS;       # reinsert the quotes
    }
    print
  }' test.csv > test1.csv

Now, I get the following test1.csv file:

M;"String with | semicolon";UK
N;"String without semicolon";UK
O;"String OK";
P;"String OK";

As you can see, all rows that contain quotation marks are printed and my problem line is fixed, but a) I actually want all rows, not only those with quotation marks, and I can’t figure out which part of his code limits the output to those rows, and b) I think it would be more efficient to change test.csv in place instead of sending the output to a new file, but I don’t know how to do that either.

EDIT in response to Birei’s answer:

Unfortunately, my minimal example was too simple. Here is an updated version:

Header 1; Header 2; Header 3; Header 4
A;B;US; 
C;D;US;
E;F;US;
G;H;FR;
I;J;FR;
K;L;FR;
M;"String with ; semicolon";UK;"Yet another ; string"
N;"String without semicolon";UK; "No problem here"
O;"String OK";;"Fine"
P;"String OK";;"Not ; fine"

Note that my real data has roughly 100 columns and millions of rows, and the country column is column 13 when semicolons inside strings are ignored. However, as far as I can see, I can’t rely on it being column 13 unless I first get rid of the semicolons inside strings.

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on June 10, 2026 at 6:22 pm (2026-06-10T18:22:48+00:00)

    To split the file, you might just do:

    awk -v FS=";" '{ CSV_FILE = "country_yearly_" $NF ".csv" ; print > CSV_FILE }' test.csv

    This always takes the last field to construct the file name, so it only works while the country is the last column, as in your first example.
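
A quick way to see why the last field survives quoted semicolons is to print the field count and `$NF` for a few rows. This is only a sketch on three rows taken from the first example; the variable name `result` is an illustrative choice.

```shell
# Split three sample rows on ";" and show NF plus the last field.
# The quoted semicolon inflates NF, but $NF is still the country.
result=$(printf '%s\n' \
    'A;B;US' \
    'M;"String with ; semicolon";UK' \
    'O;"String OK";' |
  awk -F';' '{ print NF ": [" $NF "]" }')

printf '%s\n' "$result"
# 3: [US]
# 4: [UK]
# 3: []
```

The second row reports four fields instead of three, which is exactly why `$3` picks up the wrong value there while `$NF` does not.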

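    In your example, only lines with quotation marks are printed because of the NF > 1 pattern: with " as the field separator, a line without quotes is a single field, so NF is 1 and the block, including its print, is skipped. The following script prints all lines: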

    awk -F'"' -v OFS='' '
      NF > 1 { 
        for(i=2; i<=NF; i+=2) { 
          gsub(";", "|", $i);
          $i = FS $i FS;       # reinsert the quotes
        }
      }
      {
        # print all lines
        print
      }' test.csv > test1.csv
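
On part b) of the question: POSIX awk has no in-place mode, so a common pattern is to write to a temporary file and move it over the original once awk has succeeded. A minimal sketch, assuming a throwaway file named `demo.csv` (a made-up name) and that `mktemp` is available:

```shell
# Build a small sample file (demo.csv is a made-up name for illustration).
printf '%s\n' 'A;B;US' 'M;"String with ; semicolon";UK' > demo.csv

# Write the converted lines to a temporary file, then replace the
# original only if every previous step succeeded.
tmp=$(mktemp) &&
awk -F'"' -v OFS='' '
  NF > 1 {
    for (i = 2; i <= NF; i += 2) {
      gsub(";", "|", $i)
      $i = FS $i FS        # reinsert the quotes
    }
  }
  { print }' demo.csv > "$tmp" &&
mv "$tmp" demo.csv

cat demo.csv
# A;B;US
# M;"String with | semicolon";UK
```

This still reads and writes the whole file once, so it is not faster than redirecting to a new file; it only saves the manual rename. Note that the replaced file takes over the temporary file's permissions.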
    

    To do what you want, you could mask the semicolons in a copy of the line, use that copy to find the country field, and write the original line to the matching file:

    awk -F'"' -v OFS='' '
      # Save the original line
      { ORIGINAL_LINE = LINE = $0 }
      # Replace the semicolons inside quotes by a dummy character
      # and put the resulting line in the LINE variable
      NF > 1 {
        LINE = ""
        for(i=2; i<=NF; i+=2) { 
          gsub(";", "|", $i)
          LINE = LINE $(i-1) FS $i FS     # reinsert the quotes
        }
        # Add the end of the line after the last quote: when NF is odd,
        # the loop stops at i = NF+1 and the last field is still pending
        if ( i-1 <= NF ) { LINE = LINE $(i-1) }
      }
      {
        # Put the semicolon-separated fields in a table
        # (the semicolons inside quotes have been replaced in LINE)
        split( LINE, TABLE, /;/ )
        # Build the file name -- TABLE[ 3 ] is the 3rd field
        CSV_FILE = "country_yearly_" TABLE[ 3 ] ".csv"
        # Save the line
        print ORIGINAL_LINE > CSV_FILE
      }' test.csv
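
For the real data, where the country is column 13, the column number could be passed in as a variable instead of being hard-coded. The following is a sketch of that variant rather than a drop-in solution: the names `sample.csv`, `col`, `piece`, `line`, `cell` and `out` are illustrative, the header is skipped with `NR > 1` as in the question's original command, and every quoted field is assumed to have balanced quotes.

```shell
# Sample data modelled on the updated example in the question.
cat > sample.csv <<'EOF'
Header 1; Header 2; Header 3; Header 4
A;B;US;
G;H;FR;
M;"String with ; semicolon";UK;"Yet another ; string"
P;"String OK";;"Not ; fine"
EOF

awk -F'"' -v col=3 '
  NR > 1 {
    line = ""
    # Splitting on the quote character puts quoted text in the
    # even-numbered pieces: mask the semicolons only there.
    for (i = 1; i <= NF; i++) {
      piece = $i
      if (i % 2 == 0) {
        gsub(/;/, "|", piece)
        piece = "\"" piece "\""
      }
      line = line piece
    }
    # The masked copy can now be split safely on ";".
    split(line, cell, /;/)
    out = "country_yearly_" cell[col] ".csv"
    # $0 was never modified, so the original line is written.
    # ">" truncates each file once and keeps it open afterwards,
    # which is fine for a handful of distinct countries.
    print > out
  }' sample.csv

cat country_yearly_UK.csv
# M;"String with ; semicolon";UK;"Yet another ; string"
```

With `-v col=13` the same script would pick the 13th column. If the column holds many distinct values, the number of simultaneously open files may become a limit; appending with `>>` and calling `close(out)` after each line, as in the question's command, avoids that at the cost of speed.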
    