The Archive Base


Editorial Team
Asked: June 5, 2026


I am trying to do some HTML parsing.
I am dealing with very dynamic data, and my sources vary widely.
To be more specific, I am trying to parse product information, including
name, price and description, from pages that I do not know in advance.

Across these pages, the only basic information that stays the same is the title of the page,
the name of the item I am querying (the two match each other) and the price.
The only real logic that holds across different websites is the
proximity between the different pieces of information.
So, a price label will be close to the product's name and close to its description.

I am looking for an HTML parser that lets me narrow down my parsing based on the distance in pixels between different HTML tags.

Do you know of such a library?
Is there any other way I could try to tackle this issue?

EDIT:

The language, the OS and the resolution don't matter.
What tools do you know of that might help with this problem?
I might decide to change my underlying OS and language if I
find a good enough library.


1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 4:31 pm

    The price of an item is normally preceded by a special character denoting the currency, placed inside the same tag as the digits of the value, e.g.:

    <div class="product_value">£ 10.99</div>
    <div class="product_value">¥ 10.99</div>
    <div class="product_value">$ 10.99</div>
    

    Assuming you are using a search API such as Google or Bing to get a list of pages that contain a specific product's name, you can open each page and use a simple regular expression to retrieve everything between the currency marker (£, $, ¥, etc.) and the end of the div or span.
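As a rough illustration of the regex idea above, here is a minimal Python sketch. The set of currency symbols, the number format and the helper name `find_prices` are assumptions made for the example, not part of any particular library, and real pages will often need a looser pattern.

```python
import re

# Currency symbols to look for (an assumed set; extend as needed).
CURRENCY = "£$¥€"

# A currency symbol, optional whitespace, then a number such as 10.99 or
# 1,299.00, with only whitespace before the closing </div> or </span>.
PRICE_RE = re.compile(
    r"([" + re.escape(CURRENCY) + r"])\s*(\d[\d,]*(?:\.\d+)?)\s*</(?:div|span)>",
    re.IGNORECASE,
)


def find_prices(html: str) -> list[tuple[str, str]]:
    """Return (currency symbol, amount) pairs found in the HTML text."""
    return PRICE_RE.findall(html)


sample = """
<div class="product_value">£ 10.99</div>
<div class="product_value">¥ 10.99</div>
<div class="product_value">$ 10.99</div>
"""

for symbol, amount in find_prices(sample):
    print(symbol, amount)
```

This only works when the symbol and the digits sit in the same tag, as in the snippet above. Prices split across nested tags would need a real HTML parser instead.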

    However, if the search results throw up pages that contain more than one product or multiple price markers, this approach may not work as well as hoped. The only way to be sure is to code individual scraper routines for each site, or to scrape somebody else's comparison service.
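One possible way to cope with several price markers on a page is to keep the price that sits closest to the product name. The sketch below measures distance in characters of the HTML source, which is only a crude stand-in for the on-screen proximity the question asks about; it is not pixel distance, and the function name `nearest_price` is hypothetical.

```python
import re

# A currency symbol followed by a number such as 10.99 or 1,299.00.
PRICE_RE = re.compile(r"[£$¥€]\s*\d[\d,]*(?:\.\d+)?")


def nearest_price(html: str, product_name: str):
    """Return the price text closest to the first occurrence of
    product_name in the HTML source, or None if either is missing."""
    name_match = re.search(re.escape(product_name), html, re.IGNORECASE)
    if name_match is None:
        return None
    name_pos = name_match.start()

    best = None
    best_distance = None
    for match in PRICE_RE.finditer(html):
        # Distance in source characters between the price and the name.
        distance = abs(match.start() - name_pos)
        if best_distance is None or distance < best_distance:
            best, best_distance = match.group(0), distance
    return best


page = (
    "<div><h2>Blue Kettle</h2><span>£ 24.50</span></div>"
    "<div><h2>Red Toaster</h2><span>£ 31.00</span></div>"
)
print(nearest_price(page, "Red Toaster"))  # £ 31.00
print(nearest_price(page, "Blue Kettle"))  # £ 24.50
```

Source order does not always match visual layout, so this heuristic can pick the wrong price on pages that position elements with CSS.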


© 2021 The Archive Base. All Rights Reserved