Hi I have a ton of data in multiple csv files and filter out

Question

Asked: May 17, 2026 (2026-05-17T16:58:32+00:00)

Hi, I have a ton of data in multiple CSV files and filter out a data set using grep:

user@machine:~/$ cat data.csv | grep -a "63[789]\...;"
637.05;1450.2
637.32;1448.7
637.60;1447.7
637.87;1451.5
638.14;1454.2
638.41;1448.6
638.69;1445.8
638.96;1440.0
639.23;1431.9
639.50;1428.8
639.77;1427.3

I want to find the row with the highest count (the column to the right of the ;) and then read off the corresponding value (to the left of the ;). In this case the row I’m looking for would be 638.14;1454.2

I tried different things and ended up using a combination of bash and python, which works, but isn’t very pretty:

import csv
import operator
import os

os.system('ls | grep csv > filelist')
files = open("filelist")
files = files.read()
files = files.split("\n")

for filename in files[0:-1]:
  os.system('cat ' + filename + ' | grep -a "63[6789]\...;" > filtered.csv')
  filtered = csv.reader(open('filtered.csv'), delimiter=';')
  sortedlist = sorted(filtered, key=operator.itemgetter(1), reverse=True)
  dataset = sortedlist[0][0] + ';' + sortedlist[0][1] + '\n'

I would love to have a bash-only solution (cut, awk, arrays?!?) but couldn’t figure it out. I also don’t like the workaround of writing the output of the bash commands to files and then reading those files into Python variables. Can I read the output into variables directly, or are there better solutions to this problem? (Probably Perl etc., but I am really interested in a bash solution.)

Thank you very much!!

1 Answer

Editorial Team · Answer 1 · 2026-05-17T16:58:33+00:00

If you are going to use Python, then use Python. Why are you mixing bash commands into it? That makes your code non-portable, because it depends on a bash environment.

import glob
import os

os.chdir("/mypath")
for filename in glob.glob("*.csv"):
    # errors="replace" tolerates stray non-text bytes (the reason for grep -a).
    with open(filename, errors="replace") as handle:
        # Keep only rows whose first field starts with 637, 638 or 639.
        rows = [line.strip().split(";") for line in handle
                if line[:3] in ("637", "638", "639")]
    if not rows:
        continue  # no matching rows in this file
    # Compare the second column as a number; a string sort would rank "999.9" above "1454.2".
    best = max(rows, key=lambda row: float(row[1]))
    print("Highest for file %s: %s" % (filename, best))
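
The three-character prefix check in the loop above is looser than the original grep pattern. As a rough sketch, the same filter can be expressed with a regular expression that mirrors `63[789]\...;`, with the count compared as a number. The names `PATTERN` and `best_row` are hypothetical, and anchoring the match at the start of the line is an assumption (grep itself matches anywhere in the line).

```python
import re

# Hypothetical names; the pattern mirrors the grep expression 63[789]\...;
# Assumption: the value sits at the start of the line.
PATTERN = re.compile(r"^(63[789]\..{2});([^;\s]+)")

def best_row(lines):
    """Return (value, count) for the matching line with the highest count, or None."""
    best = None
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue
        value, count = match.group(1), match.group(2)
        try:
            number = float(count)
        except ValueError:
            continue  # skip rows whose count is not numeric
        if best is None or number > best[0]:
            best = (number, value, count)
    return None if best is None else (best[1], best[2])

sample = ["636.78;1460.0", "637.05;1450.2", "638.14;1454.2", "639.77;1427.3"]
print(best_row(sample))  # ('638.14', '1454.2')
```

Inside the per-file loop this could be called as `best_row(handle)`, since an open file iterates line by line.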

Or, if you are more interested in a bash + tools solution:

find . -type f -name '*.csv' | while IFS= read -r FILE
do
  # Filter the rows, sort numerically (descending) on field 2 only, keep the top row.
  grep -a "63[789]\...;" "$FILE" | sort -t ';' -k2,2nr | head -n 1 >> output.txt
done
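
On the side question of reading command output into Python variables without a temporary file: one possible approach is the standard library's `subprocess.run` with `capture_output=True`. This is only a sketch; `command_lines` is a hypothetical helper name, and the demo runs the Python interpreter itself so that it works without grep or a data file being present.

```python
import subprocess
import sys

def command_lines(args):
    """Run a command (argument list, no shell) and return its stdout as a list of lines."""
    # check=False because grep exits with status 1 when nothing matches.
    result = subprocess.run(args, capture_output=True, text=True,
                            errors="replace", check=False)
    return result.stdout.splitlines()

# Demo command; for the question's case the arguments might instead be
# ["grep", "-a", r"63[789]\...;", "data.csv"] (file name assumed).
lines = command_lines([sys.executable, "-c",
                       "print('637.05;1450.2'); print('638.14;1454.2')"])
print(lines)
```

Passing the command as a list avoids shell quoting problems with file names, and the captured lines can then be split on `;` directly.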
