The Archive Base

Editorial Team
Asked: June 5, 2026


I am dealing with very primitive HTML construction that goes like this:

<a NAME="header1"></a><b><font face="Verdana, Serif"><font color="#000000"><font size=+1>Hygiene</font></font></font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Shampoo</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000"></font>Soap</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Deodorant</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Toothpaste</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000"></font>Brush</font></b> 

<a NAME="header2"></a><b><font face="Verdana, Serif"><font color="#000000"><font size=+1>Food</font></font></font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Meat</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Vegetables</font></b> 
    <p><b><font face="Verdana, Serif"><font color="#000000">Fruit</font></b> 

Now I want to get all the items under the Hygiene header (the top one), which are Shampoo, Soap, Deodorant, Toothpaste and Brush, and put them in, let's say, a HashMap<String, List<String>> for now.

I use this XPath to get the headers (Hygiene and Food):

//html/body//b/font/font/font

And it works fine, I get what I need.

Then I use this XPath to collect the items:

//html/body//p/b/font/font

for ALL items. This last XPath returns one list with every item: [Shampoo, Soap, Deodorant, Toothpaste, Brush, Meat, Vegetables, Fruit]. The problem is that I don't know when to stop adding items to the first list, that is, when another header starts (Food in this case) and I should create a new list for its items. All I can get with these XPaths is the header values (Hygiene, Food) and ALL items from both lists together, not separated.

I need to get something like:

  • Map{“Hygiene”, [Shampoo, Soap, Deodorant, Toothpaste, Brush]}
  • Map{“Food”, [Meat, Vegetables, Fruit]}

All the items are laid out like this, one after another; they are not wrapped in separate divs or spans that would let me recognize where a new header begins.

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 9:22 pm

    This HTML is not easy to parse because it isn't well-formed: several <font> and <p> elements are never closed (judging from the <font> tags you could probably use some colorful language about it as well).

    AFAIK there's no simple way to express a "following siblings until X" condition in XPath 1.0, so here's an alternative: use one XPath expression that matches both headers and items. For example, with this specific markup you could use

    //body//font/child::text()
    

    which will select all text nodes that are direct children of a <font> element ("Hygiene", "Shampoo", "Soap", …).

    The nodes will be returned in document order (this is extremely important), so afterwards you can iterate over the results and perform a test on each to determine if it’s a header or an item (in this case you could check if the parent is a <font> element that has a size attribute).

    This way you can keep a reference to the last “header” found and add all following “items” to an appropriate data structure under it until you come across the next header, etc.
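The approach above could be sketched in Java roughly as follows. This is only an illustration under some assumptions: the markup has already been tidied into well-formed XML (for example by an HTML cleaner of your choice), so the JDK's built-in DOM parser and XPath engine can read it. The class name `GroupByHeader`, the `group` helper and the shortened `SAMPLE` string are made up for this example; a `LinkedHashMap` is used so the headers keep their document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupByHeader {

    // Assumption: a tidied, well-formed excerpt of the markup from the question.
    static final String SAMPLE =
        "<html><body>"
        + "<a name='header1'></a><b><font face='Verdana, Serif'><font color='#000000'>"
        + "<font size='+1'>Hygiene</font></font></font></b>"
        + "<p><b><font face='Verdana, Serif'><font color='#000000'>Shampoo</font></font></b></p>"
        + "<p><b><font face='Verdana, Serif'><font color='#000000'></font>Soap</font></b></p>"
        + "<a name='header2'></a><b><font face='Verdana, Serif'><font color='#000000'>"
        + "<font size='+1'>Food</font></font></font></b>"
        + "<p><b><font face='Verdana, Serif'><font color='#000000'>Meat</font></font></b></p>"
        + "</body></html>";

    // Hypothetical helper: files every item under the most recent header seen.
    static Map<String, List<String>> group(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // One expression for headers and items; results come back in document order.
            NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//body//font/child::text()", doc, XPathConstants.NODESET);

            Map<String, List<String>> groups = new LinkedHashMap<>();
            List<String> current = null;
            for (int i = 0; i < texts.getLength(); i++) {
                Node text = texts.item(i);
                String value = text.getNodeValue().trim();
                if (value.isEmpty()) {
                    continue; // skip whitespace-only text nodes
                }
                Element parent = (Element) text.getParentNode();
                if (parent.hasAttribute("size")) {
                    // Header: its <font> parent carries a size attribute.
                    current = new ArrayList<>();
                    groups.put(value, current);
                } else if (current != null) {
                    // Item: belongs to the last header found.
                    current.add(value);
                }
            }
            return groups;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(group(SAMPLE)); // prints {Hygiene=[Shampoo, Soap], Food=[Meat]}
    }
}
```

Items that appear before any header are simply dropped here; whether that is the right behaviour depends on your data.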
