Let’s consider a big file (~100MB). Let’s consider that the file is line-based (a

Asked: May 25, 2026 (2026-05-25T00:41:49+00:00)

Let’s consider a big file (~100MB). Let’s assume that the file is line-based (a text file with relatively short lines, ~80 chars).
If I use the built-in open()/file(), the file is read lazily, i.e. if I call aFile.readline(), only a chunk of the file resides in memory. Does urllib.urlopen() do something similar (using a cache on disk)?

How big is the difference in performance between urllib.urlopen().readline() and file().readline()? Let’s assume the file is located on localhost, and I open it once with urllib.urlopen() and once with file(). How big will the difference in performance and memory consumption be when I loop over the file with readline()?

What is the best way to process a file opened via urllib.urlopen()? Is it faster to process it line by line? Or should I load a batch of lines (~50) into a list and then process the list?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T00:41:50+00:00

Does urllib.urlopen() do something similar (using a cache on disk)?

The operating system does something similar, though its buffers live in memory rather than on disk. When you use a networking API such as urllib, the operating system and the network card do the low-level work of splitting outgoing data into small packets and of receiving incoming packets. Incoming packets are stored in a buffer, so the application can ignore the packet concept and treat the connection as a continuous stream of data.

How big is the difference in performance between urllib.urlopen().readline() and file().readline()?

It is hard to compare these two. For urllib, performance depends on the speed of the network as well as the speed of the server. Even for a local server there is some abstraction overhead, so reading through the networking API is usually slower than reading from a file directly.

For actual performance comparisons, you will have to write a test script and take the measurements yourself. However, why do you even bother? You cannot replace one with the other, since they serve different purposes.
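
Such a test script might look roughly like the sketch below. It assumes Python 3, where urllib.request.urlopen() stands in for the urllib.urlopen() from the question and open() for file(). The helper names (count_lines, time_local_file, time_urlopen, make_sample_file) and the generated sample file are hypothetical and exist only for illustration.

```python
import os
import tempfile
import time
import urllib.request
from pathlib import Path


def count_lines(stream):
    """Consume a binary stream line by line and return the number of lines."""
    count = 0
    for _ in stream:
        count += 1
    return count


def time_local_file(path):
    """Time line-by-line reading through the built-in open()."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        n = count_lines(f)
    return n, time.perf_counter() - start


def time_urlopen(url):
    """Time line-by-line reading through urllib.request.urlopen()."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        n = count_lines(response)
    return n, time.perf_counter() - start


def make_sample_file(num_lines=100_000):
    """Write a hypothetical test file of 80-byte lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            # 8 bytes of counter + 71 filler bytes + newline = 80 bytes
            f.write(b"%07d " % i + b"x" * 71 + b"\n")
    return path
```

To try it, create a file with make_sample_file(), then call time_local_file(path) and time_urlopen(url) and compare the elapsed times. Passing Path(path).as_uri() as the URL only measures urllib's own overhead, because a file:// URL never touches the network. To include the HTTP and socket cost discussed above, serve the same file from a local web server and pass its http://localhost/... URL instead. Expect the numbers to vary between runs, so repeat the measurement a few times.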

What is the best way to process a file opened via urllib.urlopen()? Is it faster to process it line by line? Or should I load a batch of lines (~50) into a list and then process the list?

Since the bottleneck is the networking speed, it is a good idea to process the data as soon as you get it. This way, the operating system can buffer more incoming data “in the background” while your program works.

It makes no sense to cache lines in a list before processing them. Your program will just sit there waiting for enough data to arrive while it could be doing something useful already.
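
A minimal sketch of that line-by-line approach, again assuming Python 3 with urllib.request.urlopen() in place of urllib.urlopen(). The names process_stream, process_url and handle_line are hypothetical; handle_line is whatever callback does your per-line work.

```python
import io
import urllib.request


def process_stream(stream, handle_line):
    """Call handle_line on each line of a binary stream as soon as it is read.

    Only the current line is held here; nothing is collected into a list.
    Returns the number of lines processed.
    """
    processed = 0
    for raw_line in stream:
        # Strip the trailing line terminator before handing the line over.
        handle_line(raw_line.rstrip(b"\r\n"))
        processed += 1
    return processed


def process_url(url, handle_line):
    """Open a URL and process the response line by line as data arrives."""
    with urllib.request.urlopen(url) as response:
        return process_stream(response, handle_line)
```

Because process_stream() accepts any iterable binary stream, the same function also works on a file opened with open(path, "rb") or on an io.BytesIO object, which makes it easy to try out without a server.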
