The Archive Base

Editorial Team
Asked: June 14, 2026

I have a simple Python script that uses BeautifulSoup to find a section of

I have a simple Python script that uses BeautifulSoup to find a section of the HTML tree. For example, to find everything inside the <div id="doctext"> tags, the script does this:

html_section = str(soup.find("div", id="doctext"))

I would like to be able to make the arguments to find() vary, however, according to strings given in an input file. For example, a user could feed the script a URL followed by a string like "div", id="doctext", and the script would adjust the find accordingly. Imagine that the input file looks like this:

http://www.example.com | "div", id="doctext"

The script splits the line to get the URL, which works fine, but I want it to also grab the arguments. For example:

vars = line.split(' | ')
html = urllib2.urlopen(vars[0]).read()
soup = BeautifulSoup(html)
args = vars[1].split()
html_section = str(soup.find(*args))

This doesn’t work—and probably doesn’t make sense as I’ve been trying multiple ways to do this. How do I get the string provided by the input file and prepare it into the right syntax for the soup.find() function?

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 9:10 pm

    You could parse line like this:

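    line = 'http://www.example.com | div, id=doctext'
    # Separate the URL from the argument text at the first ' | '
    url, args = line.split(' | ', 1)
    args = args.split(',')
    name = args[0].strip()
    # Split each "key=value" pair on the first '=' only, so a value may contain '='
    params = dict(param.strip().split('=', 1) for param in args[1:])
    print(name)
    print(params)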
    

    yields

    div
    {'id': 'doctext'}
    

    Then you could call soup.find like this:

    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    html_section = str(soup.find(name, **params))
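To see why unpacking a dict with `**` works where splitting the raw text and passing the pieces with `*` does not, here is a minimal sketch. `fake_find` is a hypothetical stand-in that only echoes what it receives; it is not part of BeautifulSoup and its signature is simplified for illustration.

```python
def fake_find(name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for soup.find: report what was received."""
    return {"name": name, "attrs": attrs, "kwargs": kwargs}

# Splitting the raw text only produces strings, so every piece is positional.
pieces = '"div", id="doctext"'.split()
as_positional = fake_find(*pieces)

# Unpacking a dict with ** turns its keys into keyword arguments instead.
name = "div"
params = {"id": "doctext"}
as_keywords = fake_find(name, **params)

print(as_positional)
print(as_keywords)
```

In the first call the text `id="doctext"` arrives as an ordinary string in the second positional slot, which is why the approach in the question cannot produce a keyword argument.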
    

    WARNING: If doctext (or the value of any other keyword argument) contains a comma, then

    args = args.split(',')
    

    will split the parameters in the wrong place. This problem might arise if you are searching for some text content that contains a comma.
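The comma problem can be reproduced with a small sketch. The helper name `parse_simple` is hypothetical and simply wraps the comma-splitting approach above, with an explicit check so that the failure is easy to see.

```python
def parse_simple(arg_text):
    """Parse 'name, key=value, ...' by splitting on commas."""
    parts = arg_text.split(',')
    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        pieces = part.strip().split('=', 1)
        if len(pieces) != 2:
            # A stray fragment such as ' world' has no '=' in it.
            raise ValueError("not a key=value pair: %r" % part)
        params[pieces[0]] = pieces[1]
    return name, params

# A value without commas parses as expected.
ok = parse_simple('div, id=doctext')

# A value containing a comma is cut in two, leaving a fragment without '='.
try:
    parse_simple('p, text=Hello, world')
    failed = False
except ValueError:
    failed = True

print(ok)
print(failed)
```

The one-line `dict(...)` version fails on the same input for the same reason: one of the split fragments is not a key/value pair.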


    To avoid this problem, you might use JSON for the arguments instead. If line looks like this:

    'http://www.example.com | ["div", {"id": "doctext"}]'
    

    Then you could parse it with

    import json
    line = 'http://www.example.com | ["div", {"id": "doctext"}]'
    url, arguments = line.split('|', 1)
    url = url.strip()
    arguments = json.loads(arguments)
    args = []
    params = {}
    for item in arguments:
        if isinstance(item, dict):
            params = item
        else:
            args.append(item)
    
    print(args)
    print(params)
    

    which yields

    [u'div']
    {u'id': u'doctext'}
    

    Then you could call soup.find with

    html_section = str(soup.find(*args, **params))
    

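    An added advantage is that the JSON list can carry more than just the name: non-dict items are passed to soup.find as positional arguments, in order. Note that the loop above treats every dict as keyword arguments, so an attrs dict is applied as keyword filters rather than as the positional attrs parameter.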
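The JSON approach can be wrapped in one function, sketched below. The name `parse_find_line` is hypothetical, and the sketch assumes the URL itself contains no `|` character and that the argument text is a JSON list. Unlike the loop above, it merges several dicts rather than keeping only the last one.

```python
import json

def parse_find_line(line):
    """Split 'URL | JSON-list' into (url, positional args, keyword args)."""
    url, raw = line.split('|', 1)
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("arguments must be a JSON list")
    args = []
    kwargs = {}
    for item in items:
        if isinstance(item, dict):
            kwargs.update(item)  # dicts become keyword arguments
        else:
            args.append(item)    # everything else stays positional
    return url.strip(), args, kwargs

# A comma inside a value is safe here because JSON quotes the string.
url, args, kwargs = parse_find_line(
    'http://www.example.com | ["p", {"text": "Hello, world"}]')
print(url)
print(args)
print(kwargs)
```

The result could then be passed on as soup.find(*args, **kwargs).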

© 2021 The Archive Base. All Rights Reserved