I’ve got the following snippet of code
    def send(self, queue, fd):
        for line in fd:
            data = line.strip()
            if data:
                queue.write(json.loads(data))
Which of course works just fine, but I wonder sometimes if there is a “better” way to write that construct where you only act on non-blank lines.
The challenge is that it should keep the lazy, line-by-line iteration over ‘fd’ and be able to handle files in the 100+ MB range.
UPDATE –
In your haste to get points for this question you’re ignoring an important part, which is memory usage. For instance the expression:
non_blank_lines = (line.strip() for line in fd if line.strip())
Is going to buffer the whole file into memory, not to mention performing a strip() action twice. Which will work for small files, but fails when you’ve got 100+MB of data (or once in a while a 100GB).
Part of the challenge is the following works, but is soup to read:
    for line in ifilter(lambda l: l, imap(lambda l: l.strip(), fd)):
        queue.write(json.loads(line))
Look for magic folks!
FINAL UPDATE: PEP-289 is very useful for my own better understanding of the difference between [] and () with iterators involved.
There’s nothing wrong with the code as written; it’s readable and efficient.
An alternative approach would be to write it as a generator expression.
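As a rough sketch of that idea (the helper name `iter_non_blank` and the in-memory sample file are assumptions made for illustration, not part of the original code), the filtering can be expressed with round brackets so that lines are pulled from `fd` one at a time:

```python
import io
import json


def iter_non_blank(fd):
    """Return a lazy iterator over the stripped, non-blank lines of fd."""
    # Round brackets make this a generator expression: no line is read
    # from fd until the caller starts iterating.
    return (line.strip() for line in fd if line.strip())


# Hypothetical stand-in for a large file of JSON lines.
fd = io.StringIO('{"a": 1}\n\n   \n{"b": 2}\n')

records = [json.loads(line) for line in iter_non_blank(fd)]
print(records)
```

Note that this version still calls strip() twice per line, once in the filter and once for the value.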
This approach can be beneficial (terser) if you are applying a function that can take an iterator, for example Python 3’s print().
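A small illustration of handing the generator straight to a consumer, assuming Python 3 and a made-up in-memory sample (the helper `non_blank` is hypothetical). One caveat worth hedging: unpacking with `*` for print() builds the full argument list first, so for very large inputs a consumer that iterates item by item, such as `sys.stdout.writelines` or `sum`, is probably the safer choice.

```python
import io
import sys


def non_blank(fd):
    # Lazy: yields stripped lines, skipping blank ones.
    return (line.strip() for line in fd if line.strip())


sample = "first\n\nsecond\n   \nthird\n"

# Terse, but * collects every item into the argument list before printing.
print(*non_blank(io.StringIO(sample)), sep="\n")

# writelines() accepts any iterable of strings and consumes it item by item.
sys.stdout.writelines(line + "\n" for line in non_blank(io.StringIO(sample)))

# Any function that takes an iterable works the same way.
total = sum(1 for _ in non_blank(io.StringIO(sample)))
```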
To do away with the multiple calls to strip(), chain together generator expressions.
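One way the chaining might look, as a sketch: the first generator strips each line once, and the second filters out the empty results. The `ListQueue` class is a hypothetical stand-in for whatever object provides `write()` in the question, and `send` is written as a plain function here for brevity.

```python
import io
import json


class ListQueue:
    """Hypothetical queue used only for this example."""

    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)


def send(queue, fd):
    # Stage 1: strip each line exactly once, lazily.
    stripped = (line.strip() for line in fd)
    # Stage 2: keep only the non-empty results, still lazily.
    non_blank = (line for line in stripped if line)
    for line in non_blank:
        queue.write(json.loads(line))


queue = ListQueue()
send(queue, io.StringIO('{"a": 1}\n\n   \n{"b": 2}\n'))
print(queue.items)
```

Each stage pulls a single line from the stage before it, so only one line is held at a time regardless of file size.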
Note that generator expressions will not adversely affect memory use, as detailed in PEP 289.
For a more in-depth look at this approach, and some performance benchmarks, take a look at this set of notes.
Finally note that rstrip() will outperform strip() if you don’t need the full behaviour of strip().
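To make the behavioural difference concrete (this sketch shows what each call returns, not timings, and the sample strings are made up): rstrip() removes only trailing whitespace, which is enough to detect blank lines, and json.loads() tolerates the leading whitespace that remains.

```python
import json

raw = '  {"a": 1}  \n'
blank = "   \n"

right_only = raw.rstrip()  # trailing spaces and newline removed, leading spaces kept
both_sides = raw.strip()   # whitespace removed from both ends

# A whitespace-only line still collapses to the empty string.
blank_detected = blank.rstrip() == ""

# json.loads() skips leading whitespace, so the rstrip() result parses fine.
parsed = json.loads(right_only)
```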