Is there any way to load a URL via Python and then retrieve a list of all the images that were loaded via that URL? I'm essentially looking to do something similar to TamperData or Fiddler and retrieve a list of all images that a given website loaded.
Interesting task. Here's one way of solving it, along the lines suggested by Jochen Ritzel.
It uses pylibpcap instead of pycap. Personally, I find pycap hard to work with because there is so little documentation available. With pylibpcap, you can translate most code directly from the libpcap examples (see for example this tutorial for a nice introduction). The man pages for tcpdump and pcap are also great resources.
You may want to look at the standards for Ethernet, IPv4, TCP, and HTTP.
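To show why those standards matter, here is a rough sketch of how a raw Ethernet frame can be unpacked by hand to get at an HTTP GET request. It is not tied to pylibpcap; it only assumes you already have the frame as a `bytes` object under Python 3. The function name `extract_http_get` is made up for this illustration, and the sketch ignores VLAN tags, IPv6, and requests split across several TCP segments.

```python
import struct

def extract_http_get(frame):
    """Return (host, path) if the frame carries an HTTP GET request, else None."""
    # Ethernet header: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes EtherType.
    if len(frame) < 14:
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:  # only IPv4 is handled here
        return None

    ip = frame[14:]
    if len(ip) < 20:
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    total_length = struct.unpack("!H", ip[2:4])[0]
    if ip[9] != 6:  # protocol 6 is TCP
        return None

    # Slice by total_length so Ethernet padding is not mistaken for payload.
    tcp = ip[ihl:total_length]
    if len(tcp) < 20:
        return None
    data_offset = (tcp[12] >> 4) * 4  # TCP header length in bytes
    payload = tcp[data_offset:]

    if not payload.startswith(b"GET "):
        return None
    lines = payload.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) < 2:
        return None
    path = parts[1].decode("latin-1")

    # The Host header is needed to rebuild the full URL.
    host = None
    for line in lines[1:]:
        if line.lower().startswith(b"host:"):
            host = line[5:].strip().decode("latin-1")
            break
    return host, path
```

Each captured packet would be passed through a function like this, and anything that is not an IPv4/TCP packet starting with `GET ` simply yields `None`.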
Note 1: The code below only prints out the HTTP GET requests. Picking out the requests for images and downloading them using the urllib module should pose no problem.
Note 2: This code works on Linux. I'm not sure what device names you need to use on Windows/macOS. You'll also need root privileges.
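For the filtering step mentioned in Note 1, something along these lines might do. It is only a sketch: it assumes the captured GET requests are available as `(host, path)` pairs, uses Python 3's `urllib`, and the helper names and the extension list are my own picks. Matching on file extensions will miss images served from URLs without one, so checking the `Content-Type` of the response would be more reliable.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve

# Assumed list of extensions; extend it as needed.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg", ".ico")

def is_image_url(url):
    """Guess from the path's extension whether the URL points to an image."""
    path = urlsplit(url).path.lower()  # the query string is not part of the path
    return path.endswith(IMAGE_EXTENSIONS)

def image_urls(requests):
    """Turn (host, path) pairs into a de-duplicated list of image URLs."""
    seen = set()
    result = []
    for host, path in requests:
        url = "http://%s%s" % (host, path)
        if is_image_url(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result

def download_all(urls, target_dir="."):
    """Fetch every URL into target_dir, named after the last path component."""
    for url in urls:
        filename = os.path.basename(urlsplit(url).path)
        urlretrieve(url, os.path.join(target_dir, filename))
```

Calling `image_urls(...)` on the captured requests gives the list the question asks for; `download_all` is only needed if you also want the files themselves.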