How can I filter which lines of a CSV are loaded into memory using pandas? This seems like an option that one should find in read_csv. Am I missing something?
Example: we have a CSV with a timestamp column and we'd like to load just the lines with a timestamp greater than a given constant.
There isn’t an option to filter the rows before the CSV file is loaded into a pandas object.
You can either load the file and then filter using df[df['field'] > constant], or, if you have a very large file and are worried about running out of memory, read it with an iterator and apply the filter to each chunk as you concatenate the chunks of your file.
You can vary the chunksize to suit your available memory. See here for more details.
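A minimal sketch of the chunked approach described above, in case it helps. The helper name load_filtered_csv, the column names, and the in-memory sample data are assumptions made for illustration; only read_csv with chunksize and concat are actual pandas calls.

```python
import io

import pandas as pd


def load_filtered_csv(source, column, threshold, chunksize=1000):
    """Read `source` chunk by chunk, keeping only rows where column > threshold.

    Hypothetical helper: only one chunk of raw rows is parsed at a time,
    plus whatever rows have passed the filter so far.
    """
    # With chunksize set, read_csv returns an iterator of DataFrames
    chunks = pd.read_csv(source, chunksize=chunksize)
    # Filter each chunk before it is added to the final result
    filtered = [chunk[chunk[column] > threshold] for chunk in chunks]
    return pd.concat(filtered, ignore_index=True)


# Small in-memory stand-in for a real CSV file (assumed data)
sample = io.StringIO("timestamp,value\n1,a\n5,b\n10,c\n15,d\n")
df = load_filtered_csv(sample, "timestamp", 5, chunksize=2)
```

For a real file you would pass a path instead of the StringIO object and pick a chunksize that fits your memory. Note that the filter is still applied after each chunk is parsed, so this bounds peak memory rather than skipping rows at read time.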