Is there any way to load a URL via Python and then retrieve a list of all the images it loaded?

Question


Asked: May 24, 2026


Is there any way to load a URL via Python and then retrieve a list of all the images that were loaded via that URL? I'm essentially looking to do something similar to TamperData or Fiddler: retrieve a list of all the images that a given website loaded.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T01:48:59+00:00

Interesting task. Here's one way of solving it, along the lines suggested by Jochen Ritzel: sniff the HTTP traffic and print every GET request the browser sends.

It uses pylibpcap instead of pycap. Personally, I find pycap hard to work with because so little documentation is available for it. With pylibpcap, you can translate most code directly from the libpcap examples (see, for example, a libpcap programming tutorial for a nice introduction). The man pages for tcpdump and pcap are also great resources.

You may want to look at the standards for Ethernet, IPv4, TCP, and HTTP.
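The header arithmetic in `parse_packet` follows from those standards: the IPv4 header length is the low nibble of the first IP byte, counted in 32-bit words, and the TCP header length (the "data offset") is the high nibble of byte 12 of the TCP header, also in 32-bit words. The sketch below is a Python 3 illustration using hypothetical helpers (`build_frame`, `ip_header_len`, `tcp_dest_port`, `payload_offset`). It builds a synthetic frame with no IP or TCP options so the offsets can be checked without capturing anything; the addresses (from the 192.0.2.0/24 documentation range) and ports are made up, and the checksums are left at zero.

```python
import struct

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def build_frame(payload, dst_port=80):
    """Hypothetical helper: wrap a payload in minimal Ethernet, IPv4 and TCP headers."""
    eth = b'\x00' * 12 + struct.pack('!H', 0x0800)  # zeroed MACs, EtherType 0x0800 = IPv4
    ip = struct.pack('!BBHHHBBH4s4s',
                     (4 << 4) | 5,             # version 4, IHL 5 words = 20 bytes
                     0,                        # type of service
                     20 + 20 + len(payload),   # total length of the IP packet
                     0, 0,                     # identification, flags/fragment offset
                     64, 6, 0,                 # TTL, protocol 6 = TCP, checksum (not computed)
                     bytes([192, 0, 2, 1]),    # made-up source address
                     bytes([192, 0, 2, 2]))    # made-up destination address
    tcp = struct.pack('!HHIIBBHHH',
                      50000, dst_port,         # source port, destination port
                      0, 0,                    # sequence and acknowledgement numbers
                      5 << 4,                  # data offset 5 words = 20 bytes
                      0x18,                    # flags: PSH + ACK
                      65535, 0, 0)             # window, checksum (not computed), urgent pointer
    return eth + ip + tcp + payload


def ip_header_len(frame):
    """IPv4 header length in bytes: low nibble of the first IP byte times 4."""
    return 4 * (frame[ETH_HEADER_LEN] & 0x0F)


def tcp_dest_port(frame):
    """Destination port: bytes 2-3 of the TCP header, big-endian."""
    start = ETH_HEADER_LEN + ip_header_len(frame)
    return struct.unpack('!H', frame[start + 2:start + 4])[0]


def payload_offset(frame):
    """Offset of the TCP payload: Ethernet + IPv4 header + TCP header lengths."""
    tcp_start = ETH_HEADER_LEN + ip_header_len(frame)
    tcp_header_len = 4 * (frame[tcp_start + 12] >> 4)  # high nibble of TCP byte 12
    return tcp_start + tcp_header_len


request = b'GET /logo.png HTTP/1.1\r\nHost: example.com\r\n\r\n'
frame = build_frame(request)
offset = payload_offset(frame)
print(offset, tcp_dest_port(frame), frame[offset:] == request)  # 54 80 True
```

When IP or TCP options are present, these header lengths grow in 4-byte steps, which is why `parse_packet` reads them from each packet instead of hard-coding 20 bytes.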

Note 1: The sniffer script only prints the HTTP GET requests. Filtering out the image requests and downloading them with the urllib module should be straightforward.
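A minimal sketch of that filtering step, assuming Python 3's `urllib.request` and assuming an image can be recognised by the file extension in its URL path (images served by scripts, or without an extension, will be missed). `IMAGE_EXTENSIONS`, `to_absolute`, `is_image_url`, `image_urls`, and `download_images` are hypothetical names. The input is the list of host + path strings the sniffer prints, and plain HTTP is assumed when prepending the scheme.

```python
import os
import posixpath
import urllib.parse
import urllib.request

# Assumption: decide "is an image" purely from the file extension in the URL path.
IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp', '.svg', '.ico'}


def to_absolute(uri):
    """The sniffer prints host + path; prepend a scheme, assuming plain HTTP."""
    return uri if '://' in uri else 'http://' + uri


def is_image_url(url):
    path = urllib.parse.urlsplit(url).path  # drops any ?query or #fragment
    return posixpath.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def image_urls(uris):
    """Keep only the URIs that look like images, as absolute URLs, in capture order."""
    return [url for url in map(to_absolute, uris) if is_image_url(url)]


def download_images(uris, dest_dir='images'):
    """Download every image URL into dest_dir; files with the same name overwrite each other."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in image_urls(uris):
        name = posixpath.basename(urllib.parse.urlsplit(url).path)
        target = os.path.join(dest_dir, name)
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved


captured = ['example.com/', 'example.com/img/logo.PNG?v=2', 'example.com/style.css']
print(image_urls(captured))  # ['http://example.com/img/logo.PNG?v=2']
```

`download_images` makes real HTTP requests, so it is kept separate from the pure filtering in `image_urls`, which can be tested offline.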

Note 2: The sniffer works on Linux; I'm not sure which device names you need on Windows or macOS. You'll also need root privileges.
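On the device-name question, Python 3's `socket.if_nameindex()` lists the network interfaces the operating system knows about. On Linux these are the names libpcap's `open_live` accepts (for example `eth0` or `wlan0`); on Windows, libpcap-based capture drivers use their own device names, so this may not match there. `list_interfaces` is a hypothetical helper, separate from the sniffer itself.

```python
import socket


def list_interfaces():
    """Return the names of the network interfaces known to the OS."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == '__main__':
    # Pick one of these names for INTERFACE in the sniffer script.
    for name in list_interfaces():
        print(name)
```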

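```python
#!/usr/bin/env python
# Sniff plain-HTTP traffic and print the URL of every GET request.
# Requires pylibpcap (imported as "pcap") and root privileges.

import struct

import pcap


def parse_packet(data):
    """
    Parse an Ethernet/IPv4/TCP packet and return the TCP payload,
    or None if the packet is not addressed to port 80.
    """
    # See the Ethernet, IPv4, and TCP standards for the header layouts.

    data = data[14:]  # Strip the 14-byte Ethernet header

    # Low nibble of the first IPv4 byte = header length in 32-bit words.
    # struct is used instead of ord() so this works on both str and bytes.
    header_length = 4 * (struct.unpack('!B', data[0:1])[0] & 0x0f)  # in bytes
    data = data[header_length:]  # Strip the IP header

    dest_port = struct.unpack('!H', data[2:4])[0]
    if dest_port != 80:  # Given the 'tcp port 80' filter, this is a reply from the server
        return None

    # High nibble of TCP byte 12 = header length (data offset) in 32-bit words.
    header_length = 4 * (struct.unpack('!B', data[12:13])[0] >> 4)  # in bytes
    data = data[header_length:]  # Strip the TCP header

    return data


def parse_get(data):
    """
    Parse an HTTP GET request, returning host + request URI
    (e.g. 'example.com/logo.png'), or None.
    """
    if not data or not data.startswith(b'GET'):
        return None

    fields = data.split(b'\n')
    request_line = fields[0].split()  # e.g. [b'GET', b'/logo.png', b'HTTP/1.1']
    if len(request_line) < 2:  # Malformed or truncated request line
        return None
    uri = request_line[1]

    for field in fields[1:]:
        if field.lower().startswith(b'host:'):
            # strip() also removes the trailing '\r' of the CRLF line ending
            return (field[5:].strip() + uri).decode('ascii', 'replace')
    return None


def packet_handler(length, data, timestamp):
    """Callback invoked by pylibpcap for each captured packet."""
    uri = parse_get(parse_packet(data))
    if uri is not None:
        print(uri)


# Set up the pcap sniffer
INTERFACE = 'wlan0'     # Capture device; change to match your system
FILTER = 'tcp port 80'  # Plain HTTP only; HTTPS (port 443) is not matched
p = pcap.pcapObject()
p.open_live(INTERFACE, 1600, 0, 100)  # device, snaplen, promiscuous off, 100 ms timeout
p.setfilter(FILTER, 0, 0)             # filter, no optimization, netmask 0

try:
    while True:
        p.dispatch(1, packet_handler)
except KeyboardInterrupt:
    pass
```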
