I have the following code, which is part of a tutorial:
import csv
import numpy as np
csv_file_object = csv.reader(open("train.csv", 'rb'))
header = csv_file_object.next()
data = []
for row in csv_file_object:
    data.append(row)
data = np.array(data)
The code works as it is supposed to, but it is not clear to me why calling .next() on the reader and assigning the result to header works. Isn't csv_file_object still the entire file? How does the program know to skip the header row when for row in csv_file_object runs, since the variable header never seems to be referenced after it is defined?
The header row is “skipped” as a result of calling next(). That’s how iterators work. Assigning the result to header only keeps a copy of that row around; the skipping happens because the call itself consumed the row.

When you loop over an iterator, its next() method is called each time, and each call advances the iterator by one item. When the for loop starts, the iterator is already at the second row, and it goes from there on.

Here’s the documentation on the next() method (here’s another piece).

What’s important is that csv.reader objects are iterators, just like the file object returned by open(). You can iterate over them, but they don’t contain all of the lines (or any of the lines) at any given moment; they read rows from the underlying file on demand, and each row is handed out only once.
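A minimal sketch of this behaviour, written for Python 3. In Python 3 the iterator method is spelled __next__() and is normally called through the built-in next(), which also exists in Python 2.6+. The SAMPLE string is a hypothetical stand-in for train.csv, and for_loop_equivalent is a hypothetical helper that only approximates what a for loop does internally.

```python
import csv
import io

# Hypothetical in-memory stand-in for train.csv so the sketch runs anywhere.
SAMPLE = "name,age\nalice,30\nbob,25\n"

reader = csv.reader(io.StringIO(SAMPLE))

# The first next() call consumes the header row and advances the reader.
header = next(reader)
print(header)  # ['name', 'age']

# The for loop resumes from the reader's current position (the second row),
# so the header is not produced again.
data = []
for row in reader:
    data.append(row)
print(data)  # [['alice', '30'], ['bob', '25']]

# The reader is now exhausted: it does not store rows, it hands each out once.
print(list(reader))  # []


def for_loop_equivalent(iterable):
    """Roughly what `for row in iterable:` does behind the scenes."""
    it = iter(iterable)  # for a csv.reader this returns the reader itself
    rows = []
    while True:
        try:
            row = next(it)  # each call advances the iterator by one row
        except StopIteration:  # raised once the underlying data is used up
            break
        rows.append(row)
    return rows


reader2 = csv.reader(io.StringIO(SAMPLE))
next(reader2)  # skip the header, as the tutorial does
print(for_loop_equivalent(reader2))  # [['alice', '30'], ['bob', '25']]
```

Using io.StringIO instead of open() keeps the example self-contained; a real file object behaves the same way here, since it is also an iterator that yields one line per step.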