The Archive Base


Editorial Team
Asked: May 13, 2026

In answer to a previous question, several people suggested that I use BeautifulSoup


In answer to a previous question, several people suggested that I use BeautifulSoup for my project. I’ve been struggling with their documentation and I just cannot parse it. Can somebody point me to the section where I should be able to translate this expression to a BeautifulSoup expression?

hxs.select('//td[@class="altRow"][2]/a/@href').re('/.a\w+')

The above expression is from Scrapy. I’m trying to apply the regex re('/.a\w+') to the href values of the links inside td cells with class altRow, to get the links from there.

I would also appreciate pointers to any other tutorials or documentation. I couldn’t find any.

Thanks for your help.
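
For reference, the Scrapy expression above could be translated to BeautifulSoup roughly as follows. This is only a sketch: it assumes the bs4 package (BeautifulSoup 4) with the built-in "html.parser" backend, and the sample markup and the function name altrow_links are hypothetical stand-ins, not taken from the real page. The XPath predicate [2] is read here as "the second td with class altRow inside its parent row".

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="altRow">1</td>
    <td class="altRow" valign="middle" width="34%">
      <a href='/cabel'>Abel, Christian</a>
    </td>
  </tr>
  <tr>
    <td class="altRow">2</td>
    <td class="altRow"><a href='/jacevedo'>Acevedo, Linda Jeannine</a></td>
  </tr>
  <tr>
    <td class="altRow">3</td>
    <td class="altRow"><a href='/jzhu'>Zhu, Jie</a></td>
  </tr>
</table>
"""


def altrow_links(html, pattern=r"/.a\w+"):
    """Approximate //td[@class="altRow"][2]/a/@href followed by .re(pattern)."""
    soup = BeautifulSoup(html, "html.parser")
    regex = re.compile(pattern)
    matches = []
    for row in soup.find_all("tr"):
        # td[@class="altRow"]: direct td children of the row with that class
        cells = row.find_all("td", class_="altRow", recursive=False)
        if len(cells) < 2:
            continue
        # [2]/a/@href: direct <a> children of the second matching cell
        for link in cells[1].find_all("a", href=True, recursive=False):
            # .re(...): keep only the parts of the href that match the regex
            matches.extend(regex.findall(link["href"]))
    return matches


print(altrow_links(SAMPLE_HTML))  # ['/cabel', '/jacevedo']
```

"/jzhu" is dropped because its second character after the slash is not "a", which mirrors what the regex does in the Scrapy version.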

Edit:
I am looking at this page:

>>> soup.head.title
<title>White & Case LLP - Lawyers</title>
>>> soup.find(href=re.compile("/cabel"))
>>> soup.find(href=re.compile("/diversity"))
<a href="/diversity/committee">Committee</a> 

Yet, if you look at the page source "/cabel" is there:

 <td class="altRow" valign="middle" width="34%"> 
 <a href='/cabel'>Abel, Christian</a> 

For some reason, the search results are not visible to BeautifulSoup, but they are visible to XPath, because hxs.select('//td[@class="altRow"][2]/a/@href').re('/.a\w+') catches “/cabel”.

Edit:
cobbal: It is still not working. But when I search this:

>>> soup.findAll(href=re.compile(r'/.a\w+'))
[<link href="/FCWSite/Include/styles/main.css" rel="stylesheet" type="text/css" />, <link rel="shortcut icon" type="image/ico" href="/FCWSite/Include/main_favicon.ico" />, <a href="/careers/northamerica">North America</a>, <a href="/careers/middleeastafrica">Middle East Africa</a>, <a href="/careers/europe">Europe</a>, <a href="/careers/latinamerica">Latin America</a>, <a href="/careers/asia">Asia</a>, <a href="/diversity/manager">Diversity Director</a>]
>>>

it returns all the links with second character “a” but not the lawyer names. So for some reason those links (such as “/cabel”) are not visible to BeautifulSoup. I don’t understand why.
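
One way to narrow this down is to compare the href values that appear in the raw HTML text with the ones that end up in the parsed tree. The sketch below assumes bs4 (BeautifulSoup 4); the helper name missing_hrefs and the sample markup are hypothetical, and the raw-text regex is deliberately crude (it only handles quoted attribute values and does not decode entities such as &amp;). It does not explain why the real page behaves this way; it only shows which links the parser did not turn into tags.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample: one link sits inside an HTML comment, so it is
# present in the raw text but never becomes a tag in the parse tree.
SAMPLE_HTML = """
<td class="altRow"><a href='/cabel'>Abel, Christian</a></td>
<a href="/diversity/committee">Committee</a>
<!-- <a href="/hidden">not a real tag</a> -->
"""


def missing_hrefs(html, parser="html.parser"):
    """Return hrefs found in the raw text but absent from the parsed tree."""
    # Crude raw-text scan: quoted href values only, no entity decoding.
    raw = set(re.findall(r'href\s*=\s*["\']([^"\']+)["\']', html, flags=re.I))
    soup = BeautifulSoup(html, parser)
    parsed = {tag["href"] for tag in soup.find_all(href=True)}
    return sorted(raw - parsed)


print(missing_hrefs(SAMPLE_HTML))  # ['/hidden']
```

If a link such as "/cabel" shows up in this list for the real page, trying a different parser argument (for example "lxml" or "html5lib", if those packages are installed) may be a reasonable next step.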


1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 6:17 am

    I know BeautifulSoup is the canonical HTML parsing module, but sometimes you just want to scrape out some substrings from some HTML, and pyparsing has some useful methods to do this. Using this code:

    from urllib.request import urlopen
    
    from pyparsing import makeHTMLTags, withAttribute, SkipTo
    
    # get the HTML from your URL (on Python 3, urlopen returns bytes, so decode)
    url = "http://www.whitecase.com/Attorneys/List.aspx?LastName=&FirstName="
    with urlopen(url) as page:
        # fall back to UTF-8 if the server sends no charset (an assumption)
        charset = page.headers.get_content_charset() or "utf-8"
        html = page.read().decode(charset, errors="replace")
    
    # define opening and closing tag expressions for <td> and <a> tags
    # (makeHTMLTags also comprehends tag variations, including attributes, 
    # upper/lower case, etc.)
    tdStart, tdEnd = makeHTMLTags("td")
    aStart, aEnd = makeHTMLTags("a")
    
    # only interested in tdStarts if they have "class=altRow" attribute
    tdStart.setParseAction(withAttribute(("class", "altRow")))
    
    # compose total matching pattern (add trailing tdStart to filter out 
    # extraneous <td> matches)
    patt = tdStart + aStart("a") + SkipTo(aEnd)("text") + aEnd + tdEnd + tdStart
    
    # scan input HTML source for matching refs, and print out the text and 
    # href values
    for ref, start, end in patt.scanString(html):
        print(ref.text, ref.a.href)
    

    I extracted 914 references from your page, from Abel to Zupikova.

    Abel, Christian /cabel
    Acevedo, Linda Jeannine /jacevedo
    Acuña, Jennifer /jacuna
    Adeyemi, Ike /igbadegesin
    Adler, Avraham /aadler
    ...
    Zhu, Jie /jzhu
    Zídek, Aleš /azidek
    Ziółek, Agnieszka /aziolek
    Zitter, Adam /azitter
    Zupikova, Jana /jzupikova
    
© 2021 The Archive Base. All Rights Reserved