I’m aware that there are modules that fully simplify this function, but saying that

Asked: June 15, 2026 (2026-06-15T10:20:42+00:00)

I’m aware that there are modules that would greatly simplify this task, but given that I am running a base install of Python (standard modules only), how would I extract the following?

I have a list that holds the contents of a webpage, line by line. Here is a mock-up of the list (unformatted) for illustration:

<script>
    link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";
</script>

"<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>"

From the above, I need the following extracted: the feedId (11065), which is incidentally a.id in the code above, “/scripts/playlists/1/” and “/0-5417069212.asx”. Bearing in mind that each of these lines is just one item in a list, how would I go about extracting that data?

Here is the full list:

contents = urllib2.urlopen("http://www.radioreference.com/apps/audio/?ctid=5586")

Pseudocode:

from urllib2 import urlopen as getpage
page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586")

feedID        = % in (page_contents.search() for "/apps/audio/?feedId=%")
titleID       = % in (page_contents.search() for "<span class="px13">%</span>")
playlistID    = % in (page_contents.search() for "link = "%" + a.id + "*.asx";")
asxID         = * in (page_contents.search() for "link = "*" + a.id + "%.asx";")

streamURL     = "http://www.radioreference.com/" + playlistID + feedID + asxID + ".asx"

I plan to format it so that streamURL ends up as:

http://www.radioreference.com/scripts/playlists/1/11065/0-5417067072.asx

1 Answer

Editorial Team · Answer 1 · 2026-06-15T10:20:43+00:00

I’d do this with regular expressions. Python’s re module is part of the standard library, so it works on a base install.

However, it’s easier (and faster) to search a single string holding all the page’s text (rather than doing repeated searches line by line). If you can, do a read() on the file-like object you get when you open the URL, rather than readlines() (or directly iterating over the file object). If you can’t do that, you can use "\n".join(list_of_strings) to get the lines back into a single string.
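As a minimal sketch of the join approach, assuming the page has already been read into a list of lines: the `lines` list below is a hypothetical mock-up based on the sample in the question, not the real page, and the short pattern is only there to show a match crossing line boundaries.

```python
import re

# Hypothetical mock-up of a page read line by line (based on the
# question's sample; the real page will contain much more).
lines = [
    '<script>',
    '    link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";',
    '</script>',
    '<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>',
]

# Join the lines back into one string so a single search can span line breaks.
page_text = "\n".join(lines)

# re.DOTALL lets "." match newlines, so ".*" can cross from the
# script line to the anchor line.
match = re.search(r'link = "([^"]+)".*feedId=(\d+)', page_text, re.DOTALL)
if match:
    playlist_path, feed_id = match.groups()
    print(playlist_path)
    print(feed_id)
```

Without the join, no single element of `lines` contains both the `link = ...` text and the `feedId`, so a per-line search could not match this pattern.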

Here’s some code that works for me on your example URL:

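from urllib2 import urlopen
import re

# Read the whole page into one string instead of a list of lines.
contents = urlopen("http://www.radioreference.com/apps/audio/?ctid=5586").read()

# Captures the playlist path before a.id and the .asx suffix after it.
playlist_pattern = r'link = "([^"]+)" \+ a\.id \+ "([^"]+\.asx)'
# Captures the numeric feedId and the feed title inside the span.
feed_pattern = r'href="/apps/audio/\?feedId=(\d+)"><span class="px13">([^<]+)'
pattern = playlist_pattern + ".*" + feed_pattern

# re.DOTALL lets ".*" span the lines between the script and the link.
playlist, asx, feed, title = re.search(pattern, contents, re.DOTALL).groups()

streamURL = "http://www.radioreference.com" + playlist + feed + asx

print title
print streamURL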

Output:

Eastern Metro Area Fire
http://www.radioreference.com/scripts/playlists/1/11065/0-5417090148.asx
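One caveat: re.search returns None when nothing matches, so calling .groups() directly raises AttributeError if the page layout changes. Here is a hedged sketch of a guarded variant. The helper name `build_stream_url` and the `sample` string are hypothetical, and the sample is a mock-up from the question rather than the live page.

```python
import re

BASE_URL = "http://www.radioreference.com"

# Same two-part pattern as the answer's, compiled once.
PATTERN = re.compile(
    r'link = "([^"]+)" \+ a\.id \+ "([^"]+\.asx)'  # playlist path, .asx suffix
    r'.*'
    r'href="/apps/audio/\?feedId=(\d+)"><span class="px13">([^<]+)',  # feed id, title
    re.DOTALL,
)


def build_stream_url(page_text):
    """Return (title, stream_url), or None if the expected markup is missing."""
    match = PATTERN.search(page_text)
    if match is None:
        return None
    playlist, asx, feed, title = match.groups()
    return title, BASE_URL + playlist + feed + asx


# Mock-up page text (assumption: mirrors the snippet in the question).
sample = (
    '<script>\n'
    '    link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";\n'
    '</script>\n'
    '<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>\n'
)

print(build_stream_url(sample))
print(build_stream_url("no feeds here"))
```

Returning None leaves it to the caller to decide whether a missing match is an error or just an empty page.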

It’s not strictly necessary to do all the matching in one pass. You can use playlist_pattern and feed_pattern to get two parts each, if you want. It is a little more difficult to split either of the halves up though, since you’ll start running into extra matches for some of the pieces (there are several identical link = "stuff" sections, for instance).
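A rough sketch of the two-pass variant, run against a mock-up string instead of the live page. The `sample` text is an assumption: its `link = ...` line is duplicated on purpose to imitate the repeated sections mentioned above.

```python
import re

# Mock-up page text with a duplicated "link = ..." line (hypothetical data).
sample = (
    'link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";\n'
    'link = "/scripts/playlists/1/" + a.id + "/0-5417069212.asx";\n'
    '<a href="/apps/audio/?feedId=11065"><span class="px13">Eastern Metro Area Fire</span>\n'
)

playlist_pattern = r'link = "([^"]+)" \+ a\.id \+ "([^"]+\.asx)'
feed_pattern = r'href="/apps/audio/\?feedId=(\d+)"><span class="px13">([^<]+)'

# findall returns one tuple of groups per match, so the repeated
# "link = ..." line shows up as two identical tuples.
playlist_matches = re.findall(playlist_pattern, sample)
feed_matches = re.findall(feed_pattern, sample)

# Take the first match from each pass and assemble the URL.
playlist, asx = playlist_matches[0]
feed, title = feed_matches[0]
stream_url = "http://www.radioreference.com" + playlist + feed + asx

print(len(playlist_matches))
print(title)
print(stream_url)
```

Taking the first match is fine while the duplicates are identical. If a page lists several feeds, each feed match would need to be paired with the right link by some other rule.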
