I have a simple Python script that uses BeautifulSoup to find a section of

Asked: June 14, 2026 (2026-06-14T21:10:22+00:00)

I have a simple Python script that uses BeautifulSoup to find a section of the HTML tree. For example, to find everything inside the <div id="doctext"> tags, the script does this:

html_section = str(soup.find("div", id="doctext"))

I would like to be able to make the arguments to find() vary, however, according to strings given in an input file. For example, a user could feed the script a URL followed by a string like "div", id="doctext", and the script would adjust the find accordingly. Imagine that the input file looks like this:

http://www.example.com | "div", id="doctext"

The script splits the line to get the URL, which works fine, but I want it to also grab the arguments. For example:

vars = line.split(' | ')
html = urllib2.urlopen(vars[0]).read()
soup = BeautifulSoup(html)
args = vars[1].split()
html_section = str(soup.find(*args))

This doesn’t work—and probably doesn’t make sense as I’ve been trying multiple ways to do this. How do I get the string provided by the input file and prepare it into the right syntax for the soup.find() function?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T21:10:24+00:00

You could parse line like this:

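line = 'http://www.example.com | div, id=doctext'
url, args = line.split(' | ', 1)
args = args.split(',')
# The first item is the tag name; the rest are key=value pairs.
name = args[0].strip()
# Split on the first '=' only, so a value may itself contain '='.
params = dict(param.strip().split('=', 1) for param in args[1:])
print(name)
print(params)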

yields

div
{'id': 'doctext'}

Then you could call soup.find like this:

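# urllib2 exists only on Python 2; on Python 3 use urllib.request.urlopen instead.
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
# name is the positional tag name; params are passed as keyword arguments.
html_section = str(soup.find(name, **params))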

WARNING: If doctext (or the value of any other keyword argument) contains a comma, then

args = args.split(',')

will split the parameters in the wrong place. This problem might arise if you are searching for some text content that contains a comma.
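To make the failure concrete, here is a minimal sketch using only string operations. The input line and its title attribute are hypothetical, chosen only because the value contains a comma:

```python
# Hypothetical input line whose attribute value contains a comma.
line = 'http://www.example.com | div, title=Hello, world'
url, args = line.split(' | ', 1)

# The comma inside the value produces an extra, bogus piece.
parts = args.split(',')
print(parts)

# The last piece has no '=', so it cannot become a (key, value) pair
# and dict() raises ValueError.
failed = False
try:
    params = dict(p.strip().split('=') for p in parts[1:])
except ValueError as exc:
    failed = True
    print('parse failed:', exc)
```

The value is cut at its comma, and the leftover fragment breaks the dictionary construction.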

So let’s look for a better solution:

To avoid the problem described above, you might consider using the JSON format for the arguments. Suppose line looks like this:

'http://www.example.com | ["div", {"id": "doctext"}]'

Then you could parse it with

import json
line = 'http://www.example.com | ["div", {"id": "doctext"}]'
url, arguments = line.split('|', 1)
url = url.strip()
arguments = json.loads(arguments)
args = []
params = {}
for item in arguments:
    if isinstance(item, dict):
        params = item
    else:
        args.append(item)

print(args)
print(params)

which yields (on Python 2; Python 3 prints the same values without the u prefixes)

[u'div']
{u'id': u'doctext'}

Then you could call soup.find with

html_section = str(soup.find(*args, **params))

An added advantage is that you can supply any number of soup.find’s positional arguments (for name, attrs, recursive, and text), not just the name.
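If you process many input lines, the JSON parsing could be wrapped in a small helper. The following is only a sketch written for Python 3: the name parse_find_line is hypothetical, and unlike the loop above it merges several dictionaries instead of keeping only the last one. It also shows that a comma inside a value is no longer a problem:

```python
import json


def parse_find_line(line):
    """Split 'URL | JSON list' into (url, args, params) for soup.find."""
    url, _, spec = line.partition('|')
    arguments = json.loads(spec)
    if not isinstance(arguments, list):
        raise ValueError('expected a JSON list after the "|"')
    args, params = [], {}
    for item in arguments:
        if isinstance(item, dict):
            params.update(item)  # dictionaries become keyword arguments
        else:
            args.append(item)    # everything else stays positional
    return url.strip(), args, params


# Hypothetical input: the attribute value contains a comma.
url, args, params = parse_find_line(
    'http://www.example.com | ["div", {"title": "Hello, world"}]')
print(url)
print(args)
print(params)
```

The result can then be passed on as before with soup.find(*args, **params).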
