I'm having an issue with a custom iterator: it will only iterate over the file once. I am calling seek(0) on the relevant file object between iterations, but StopIteration is raised on the first call to next() during the second pass. I feel I am overlooking something obvious, but would appreciate some fresh eyes on this:
from csv import DictReader


class MappedIterator(object):
    """
    Given an iterator of dicts or objects and an attribute mapping dict,
    will make the objects accessible via the desired interface.

    Currently it will only produce dictionaries with string values. Can be
    made to support actual objects later on. Somehow... :D
    """

    def __init__(self, obj=None, mapping={}, *args, **kwargs):
        self._obj = obj
        self._mapping = mapping
        self.cnt = 0

    def __iter__(self):
        return self

    def reset(self):
        self.cnt = 0

    def next(self):
        try:
            try:
                item = self._obj.next()
            except AttributeError:
                item = self._obj[self.cnt]

            # If no mapping is provided, an empty object will be returned.
            mapped_obj = {}

            for mapped_attr in self._mapping:
                attr = mapped_attr.attribute
                new_attr = mapped_attr.mapped_name

                val = item.get(attr, '')
                val = str(val).strip()  # get rid of whitespace

                # TODO: apply transformers...

                # This allows multi attribute mapping or grouping of multiple
                # attributes in to one.
                try:
                    mapped_obj[new_attr] += val
                except KeyError:
                    mapped_obj[new_attr] = val

            self.cnt += 1
            return mapped_obj

        except (IndexError, StopIteration):
            self.reset()
            raise StopIteration


class CSVMapper(MappedIterator):
    def __init__(self, reader, mapping={}, *args, **kwargs):
        self._reader = reader
        self._mapping = mapping
        self._file = kwargs.pop('file')

        super(CSVMapper, self).__init__(self._reader, self._mapping, *args, **kwargs)

    @classmethod
    def from_csv(cls, file, mapping, *args, **kwargs):
        # TODO: Parse kwargs for various DictReader kwargs.
        return cls(reader=DictReader(file), mapping=mapping, file=file)

    def __len__(self):
        return int(self._reader.line_num)

    def reset(self):
        if self._file:
            self._file.seek(0)

        super(CSVMapper, self).reset()
Sample usage:
file = open('somefile.csv', 'rb')  # say this file has 2 rows + a header row
mapping = MyMappingClass()  # this isn't really relevant
reader = CSVMapper.from_csv(file, mapping)

# > 'John'
# > 'Bob'
for r in reader:
    print r['name']

# This won't print anything
for r in reader:
    print r['name']
I think that you are better off not trying to do the .seek(0) but rather opening the file from the filename each time.

And I don’t recommend you just return self in the __iter__() method. That means you only ever have one instance of your object. I don’t know how likely it is for someone to try to use your object from two different threads, but if that happened the results would be surprising.

So, save the filename, and then in the __iter__() method, create a fresh object with a freshly initialized reader object and a freshly opened file handle object; return this new object from __iter__(). This will work every time, no matter what the file-like object really is. It could be a handle to a networking function that is pulling data from a server, or who knows what, and it might not support a .seek() method; but you know that if you just open it again you will get a fresh file handle object. And if someone uses the threading module to run 10 instances of your class in parallel, each one will always get all of the lines from the file, instead of each randomly getting about a tenth of the lines.

Also, I don’t recommend your exception handler inside the .next() method in MappedIterator. The .__iter__() method should return an object that can be reliably iterated. If a silly user passes in an integer object (for example: 3), this won’t be iterable. Inside .__iter__() you can always explicitly call iter() on an argument, and if it is already an iterator (for example, an open file handle object) you will just get the same object back; but if it is a sequence object, you will get an iterator that works on the sequence. Now if the user passes in 3, the call to iter() will raise an exception that makes sense right at the line where the user passed the 3, rather than the exception coming from the first call to .next(). And as a bonus, you don’t need the cnt member variable anymore, and your code will be a little bit faster.

So, if you put together all my suggestions, you might get something like this:
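A minimal sketch of how those suggestions could fit together, written for Python 3 (so it uses a generator instead of the .next() method from the question). The class layout, the _mapped_rows helper, and the 'name'/'number' keys are assumptions made for illustration; only the open_with option and the forwarding of extra keyword arguments are taken from the description in this answer.

```python
import csv


class MappedIterator:
    """Re-iterable wrapper: each call to __iter__() re-opens the source."""

    def __init__(self, source, **kwargs):
        self._source = source  # a filename, or whatever open_with accepts
        # Hypothetical option: callable used to open the source (default: open).
        self._open_with = kwargs.pop('open_with', open)
        self._kwargs = kwargs  # remaining arguments are forwarded to open_with

    def __iter__(self):
        # iter() hands back an iterator unchanged and wraps a sequence in one;
        # a non-iterable such as the integer 3 raises TypeError right here.
        lines = iter(self._open_with(self._source, **self._kwargs))
        return self._mapped_rows(csv.reader(lines))

    @staticmethod
    def _mapped_rows(reader):
        # Trivial stand-in mapping for a two-column file; replace as needed.
        for name, number in reader:
            yield {'name': name.strip(), 'number': number.strip()}


# A list of strings stands in for a file, so iter() acts as the "opener".
data = ['John, 1', 'Bob, 2']
rows = MappedIterator(data, open_with=iter)

first_pass = [r['name'] for r in rows]
second_pass = [r['name'] for r in rows]
print(first_pass, second_pass)
```

Because __iter__() builds a new generator on every call, both loops see every row, and no reset() or cnt bookkeeping is needed. With a real file, something like MappedIterator('somefile.csv', newline='') would pass newline='' through to open().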
Now the .__iter__() method gives you a fresh object every time you call it.

Note how the example code uses a list of strings instead of opening a file. In this example, you need to specify an open_with() function to be used instead of the default open() to open the file. Since our list of strings can be iterated to return one string at a time, we can simply use iter as our open_with function here.

I didn’t understand your mapping code. csv.reader returns a list of string values, not some kind of a dictionary, so I wrote some trivial mapping code that works for CSV files with two columns, the first one a string. Clearly you should chop out my trivial mapping code and put in the desired mapping code.

Also, I took out your .__len__() method. This returns the length of a sequence when you do something like len(obj); you had it returning line_num, which means that the value of len(obj) would change every time you call the .next() method. If users want to know the length, they should store the results in a list and take the length of the list, or something like that.

EDIT: I added **self._kwargs to the call to open_with() in the .__iter__() method. That way, if your open_with() function needs any extra arguments they will be passed through. Before I made this change, there wasn’t really a good reason to save the kwargs argument in the object; it would have been just as good to add an open_with argument to the class .__init__() method, with a default argument of None. I think this change is a good one.
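The point about .__len__() can be checked with a short sketch, again in Python 3 and using made-up sample rows: line_num on a csv reader counts the lines consumed so far, so a length based on it keeps changing during iteration, while a list of the rows has a stable length.

```python
import csv

lines = ['name,number', 'John,1', 'Bob,2']  # hypothetical sample data

# line_num counts the lines consumed so far, so it is a moving target.
reader = csv.reader(lines)
seen = [reader.line_num]          # nothing has been read yet
for _row in reader:
    seen.append(reader.line_num)  # one more line consumed each time

# A stable length comes from materialising the rows first.
rows = list(csv.reader(lines))

print(seen)       # [0, 1, 2, 3]
print(len(rows))  # 3
```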