Ajax is, in short, the process of communicating with a…

Question

0

Editorial Team

Asked: May 16, 20262026-05-16T09:51:38+00:00 2026-05-16T09:51:38+00:00

I’d like to download, extract and iterate over a text file in Python without

0

I’d like to download, extract and iterate over a text file in Python without having to create temporary files.

basically, this pipe, but in python

curl ftp://ftp.theseed.org/genomes/SEED/SEED.fasta.gz | gunzip | processing step

Here’s my code:

def main():
    import urllib
    import gzip

    # Download SEED database
    print 'Downloading SEED Database'
    handle = urllib.urlopen('ftp://ftp.theseed.org/genomes/SEED/SEED.fasta.gz')


    with open('SEED.fasta.gz', 'wb') as out:
        while True:
            data = handle.read(1024)
            if len(data) == 0: break
            out.write(data)

    # Extract SEED database
    handle = gzip.open('SEED.fasta.gz')
    with open('SEED.fasta', 'w') as out:
        for line in handle:
            out.write(line)

    # Filter SEED database
    pass

I don’t want to use process.Popen() or anything because I want this script to be platform-independent.

The problem is that the Gzip library only accepts filenames as arguments and not handles. The reason for “piping” is that the download step only uses up ~5% CPU and it would be faster to run the extraction and processing at the same time.

EDIT:
This won’t work because

“Because of the way gzip compression
works, GzipFile needs to save its
position and move forwards and
backwards through the compressed file.
This doesn’t work when the “file” is a
stream of bytes coming from a remote
server; all you can do with it is
retrieve bytes one at a time, not move
back and forth through the data
stream.” – dive into python

Which is why I get the error

AttributeError: addinfourl instance has no attribute 'tell'

So how does curl url | gunzip | whatever work?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-16T09:51:38+00:00

Editorial Team

2026-05-16T09:51:38+00:00Added an answer on May 16, 2026 at 9:51 am

Just gzip.GzipFile(fileobj=handle) and you’ll be on your way — in other words, it’s not really true that “the Gzip library only accepts filenames as arguments and not handles”, you just have to use the fileobj= named argument.

0

Reply
Share
Share

- Report

How to approach applying for a job at a company ...

How to handle personal stress caused by utterly incompetent and ...

What is a programmer’s life like?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions