The Archive Base


Editorial Team
Asked: June 14, 2026


I am trying to write a script to scrape a website, and am using this one (http://www.theericwang.com/scripts/eBayRead.py).

However, I want to use it to crawl sites other than eBay and to customize it to my needs.

I am fairly new to Python and have limited experience with the re module.

I am unsure of what this line achieves.

for url, title in re.findall(r'href="([^"]+).*class="vip" title=\'([^\']+)', lines):

Could someone please give me some pointers?

Is there anything else I need to consider if I port this for other sites?


1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 9:07 am

    In general, parsing HTML is best done with a library such as BeautifulSoup, which does nearly all of the heavy lifting for you and leaves you with more readable code. Also, read @Tadeck’s link below: regex and HTML shouldn’t be mixed if it can be avoided (to put it lightly).
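To make the "use a parser" advice concrete, here is a minimal sketch. BeautifulSoup is a third-party package, so this example swaps in the standard library's html.parser module to stay dependency-free; it is not the library the answer recommends, only the same idea. The class name VipLinkParser and the sample markup are hypothetical, invented for illustration.

```python
from html.parser import HTMLParser


class VipLinkParser(HTMLParser):
    """Collect (href, title) pairs from <a> tags whose class includes "vip"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if "vip" in classes:
            self.links.append((attr_map.get("href"), attr_map.get("title")))


# Hypothetical markup, invented for this example.
sample_html = """
<a href="http://example.com/item/1" class="vip" title='Red bicycle'>Red bicycle</a>
<a href="http://example.com/help">Help</a>
<a title='Blue kettle' class="vip big" href="http://example.com/item/2">Blue kettle</a>
"""

parser = VipLinkParser()
parser.feed(sample_html)
parser.close()
links = parser.links
print(links)
```

Note that the third link lists its attributes in a different order and carries an extra class. A parser handles both, whereas a regex that expects href, then class, then title in that exact order would miss it.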

    As for your question, that line uses a regular expression to find matching patterns in text (in this case, HTML). re.findall() is a function that returns a list of all matches, so if we focus on just that call:

    re.findall(r'href="([^"]+).*class="vip" title=\'([^\']+)', lines)
    

    The r prefix marks a raw string literal: Python leaves backslashes exactly as written instead of treating them as the start of an escape sequence, so they reach the regex engine intact.
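A small sketch of what the raw-string prefix changes, using only the standard library. The sample tag is hypothetical; the title pattern is the tail end of the expression from the question.

```python
import re

# Without the r prefix, Python turns \n into a single newline character.
plain = '\n'
raw = r'\n'
print(len(plain))  # 1
print(len(raw))    # 2: a backslash followed by the letter n

# In the pattern from the question, \' lets an apostrophe sit inside a
# single-quoted raw string. The backslash stays in the string, and the
# regex engine reads \' as a literal apostrophe.
title_pattern = r'title=\'([^\']+)'
match = re.search(title_pattern, "<a title='Red bicycle'>")
title = match.group(1)
print(title)  # Red bicycle
```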

    href="([^"]+)
    

    The parentheses indicate a capturing group (the part of the match we care about), and [^"]+ means ‘match one or more characters that aren’t a double quote’. As you can probably guess, this group captures the URL of the link.
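To see the difference between the whole match and the captured group, here is a short sketch run against a hypothetical tag invented for this example:

```python
import re

# Hypothetical tag, invented for this example.
tag = '<a href="http://example.com/item/1" class="vip">'

match = re.search(r'href="([^"]+)', tag)
whole = match.group(0)  # everything the pattern matched
url = match.group(1)    # only the part inside the parentheses

print(whole)  # href="http://example.com/item/1
print(url)    # http://example.com/item/1
```

The group stops just before the closing double quote because [^"] refuses to match it.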

    .*class="vip"
    

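    The .* matches any character except a newline, zero or more times, which here can include the closing quote of the link, whitespace, and other attributes or tags. Because * is greedy, it consumes as much of the line as it can while still letting the rest of the pattern match. Nothing special about class="vip": it just needs to appear, literally, later on the same line.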
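Two properties of .* matter when porting the pattern to other sites: it is greedy, and by default it does not cross newlines. The sketch below demonstrates both on hypothetical markup. The lazy variant (.*?) and the re.DOTALL flag are possible adjustments, not something the original script uses.

```python
import re

pattern = r'href="([^"]+).*class="vip" title=\'([^\']+)'

# Hypothetical markup: two links on a single line.
one_line = ('<a href="/item/1" class="vip" title=\'First\'>1</a>'
            '<a href="/item/2" class="vip" title=\'Second\'>2</a>')

# Greedy .* runs to the LAST class="vip" on the line, so the first URL
# ends up paired with the second title and the second link is lost.
greedy_hits = re.findall(pattern, one_line)
print(greedy_hits)  # [('/item/1', 'Second')]

# A lazy .*? stops at the first class="vip" it can reach instead.
lazy_pattern = r'href="([^"]+).*?class="vip" title=\'([^\']+)'
lazy_hits = re.findall(lazy_pattern, one_line)
print(lazy_hits)  # [('/item/1', 'First'), ('/item/2', 'Second')]

# Hypothetical markup: one tag split across two lines.
split_tag = '<a href="/item/3"\n   class="vip" title=\'Third\'>3</a>'

# By default . does not match a newline, so nothing is found ...
no_hits = re.findall(pattern, split_tag)
print(no_hits)  # []

# ... unless re.DOTALL lets . match newlines as well.
dotall_hits = re.findall(pattern, split_tag, re.DOTALL)
print(dotall_hits)  # [('/item/3', 'Third')]
```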

    title=\'([^\']+)
    

    Here you see an escaped apostrophe and then another group like the one above. This time, we are capturing everything between the two apostrophes that follow the title attribute.

    The end result is that you are iterating through a list of all matches. Each match is a tuple that looks something like (my_matched_link, my_matched_title), which the for loop unpacks into url and title before further processing is done.
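Putting the pieces together, here is a runnable sketch of the loop from the question. The contents of lines are a hypothetical stand-in for the page text the script downloads, and the example.com URLs are invented for illustration.

```python
import re

pattern = r'href="([^"]+).*class="vip" title=\'([^\']+)'

# Hypothetical stand-in for the page text that the script calls `lines`.
lines = (
    '<a href="http://example.com/item/1" class="vip" title=\'Red bicycle\'>x</a>\n'
    '<a href="http://example.com/help">Help</a>\n'
    '<a href="http://example.com/item/2" class="vip" title=\'Blue kettle\'>y</a>\n'
)

# With two groups in the pattern, findall returns a list of 2-tuples.
matches = re.findall(pattern, lines)

# Each tuple is unpacked into url and title on every pass through the loop.
for url, title in matches:
    print(url, title)
```

The middle link is skipped because its line contains no class="vip" and .* cannot reach across the newline to borrow one from the next line.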


© 2021 The Archive Base. All Rights Reserved