I’m having a hard time breaking a large (50GB) CSV file into smaller parts.

Question

Asked: June 9, 2026 (2026-06-09T18:46:45+00:00)

I’m having a hard time breaking a large (50GB) CSV file into smaller parts. Each line has a few thousand fields. Some of the fields are strings in double quotes; others are integers, decimals and booleans.

I want to parse the file line by line and split it by the number of fields in each row. The quoted strings may contain several commas, and each row also has a number of empty fields, as in this sample row:

,,1,30,50,"Sold by father,son and daughter for $4,000" , ,,,, 12,,,20.9,0,

I tried using

perl -pe'  s{("[^"]+")}{($x=$1)=~tr/,/|/;$x}ge  '  file >> file2

to change the commas inside the quotes to | but that didn’t work. I plan to use

awk -F"|" conditional statement appending to new k_fld_files file2

Is there an easier way to do this, please? I’m looking at Python, but I probably need a utility that will stream-process the file, line by line.


1 Answer

Editorial Team · Answer 1 · 2026-06-09T18:46:47+00:00

Using Python: if you just want to parse CSV that includes embedded delimiters and stream it back out with a new delimiter, then something like this works:

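import csv
import sys

# newline='' lets the csv module handle line endings and quoted newlines itself
with open('filename.csv', newline='') as fin:
    csvout = csv.writer(sys.stdout, delimiter='|')
    # csv.reader yields one parsed row at a time, so the file is streamed
    for row in csv.reader(fin):
        csvout.writerow(row)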
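As a quick sanity check that the csv module keeps commas inside double quotes within a single field, here is a small sketch. The `sample` string is a shortened, hypothetical variant of the row in the question (without the stray spaces), not the real data.

```python
import csv

# Shortened, hypothetical variant of the sample row from the question
sample = ',,1,30,50,"Sold by father,son and daughter for $4,000",,12,,20.9,0,'

# csv.reader accepts any iterable of lines, so a one-element list is enough
fields = next(csv.reader([sample]))

print(len(fields))   # 12
print(fields[5])     # Sold by father,son and daughter for $4,000
```

The embedded commas do not inflate the field count, so there is no need to rewrite them with perl first.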

From there, it is not much harder to adapt the same loop to split, filter, or route rows in other ways.

Example of outputting to files per column (untested):

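import csv

cols_to_output = {}  # column index -> open file handle
with open('filename.csv', newline='') as fin:
    for row in csv.reader(fin):
        for colno, col in enumerate(row):
            # Open each output file only once, on first use
            if colno not in cols_to_output:
                cols_to_output[colno] = open('column_output.{}'.format(colno), 'w', newline='')
            csv.writer(cols_to_output[colno]).writerow(row)

for fh in cols_to_output.values():
    fh.close()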
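Since the question asks about splitting by the number of fields in each row, the same pattern could be keyed on `len(row)` instead of the column index. The following is a minimal sketch under that assumption; the function name `split_by_field_count`, the `fields_` file prefix, and the demo data are all hypothetical, and it assumes the number of distinct field counts is small enough to keep one output file open per count.

```python
import csv
import os
import tempfile
from contextlib import ExitStack

def split_by_field_count(src_path, out_dir, prefix='fields_'):
    """Write each row to a file named after its number of fields.

    Returns a dict mapping field count -> number of rows written.
    """
    os.makedirs(out_dir, exist_ok=True)
    writers = {}  # field count -> csv writer for that output file
    counts = {}   # field count -> rows written so far
    # ExitStack closes the input and every output file on exit
    with ExitStack() as stack:
        fin = stack.enter_context(open(src_path, newline=''))
        for row in csv.reader(fin):  # streams one row at a time
            n = len(row)
            if n not in writers:
                path = os.path.join(out_dir, '{}{}.csv'.format(prefix, n))
                fout = stack.enter_context(open(path, 'w', newline=''))
                writers[n] = csv.writer(fout)
                counts[n] = 0
            writers[n].writerow(row)
            counts[n] += 1
    return counts

# Tiny demo on made-up data: two rows with 3 fields, one row with 2
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')
    with open(src, 'w', newline='') as f:
        f.write('1,"a,b",3\n4,5\n,"c,d",\n')
    demo_counts = split_by_field_count(src, os.path.join(tmp, 'out'))
    demo_files = sorted(os.listdir(os.path.join(tmp, 'out')))

print(demo_counts)
print(demo_files)
```

Only one row is held in memory at a time, so the size of the input file should not matter much; the main limit is the number of simultaneously open output files.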
