The Archive Base


Editorial Team
Asked: June 10, 2026 (2026-06-10T10:45:33+00:00)


I am trying to solve the following problem.

Assume I have an HTML file that reads:


<div class = nameCouldBeAnything1><br>
    <p>some text here</p><br>
</div>

<div class = nameCouldBeAnything2><br>
    <p>some more text here</p><br>
</div>

<div class = nameCouldBeAnything3><br>
    <p>even more text here</p><br>
    <p>and here</p><br>
    <p>and here</p><br>
    <p>and here</p><br>
    <p>and here</p><br>
</div>

What I am trying to achieve is to store the contents in between the div tags into separate string or string array variables.

A Jsoup solution would be ideal. If there isn't one, a regex that matches everything from an opening p tag to its closing /p tag would also work.

The challenges to take into consideration are:

1) You cannot rely on specific div class names to locate the p tags when extracting the plain text with Jsoup.

2) Using doc.select("body p") or doc.select("div p") from Jsoup partly works, but the p tags come back individually, so they end up stored one per variable instead of grouped by div.

This is what I have so far:

// Assumption: input is a java.io.File and htmlFile is a Document declared elsewhere.
htmlFile = Jsoup.parse(input, "UTF-8");
Elements body = htmlFile.select("body p");

// Print the text of every <p> under <body>, in document order.
for (Element p : body)
{
    System.out.println(p.text());
}

This retrieves each individual p tag. However, I want the p tags to stay within their respective divs, with each div stored in its own string or string array variable.
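The regex fallback mentioned in the question could be sketched with the standard library alone. This is only a rough sketch: it assumes the divs are not nested and the tags are well formed, and regexes tend to be fragile on real-world HTML, so a parser such as Jsoup is usually the safer route. The class name DivParagraphs and the method groupByDiv are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivParagraphs {
    // Matches one <div ...>...</div> block; assumes divs are not nested.
    private static final Pattern DIV = Pattern.compile(
            "<div\\b[^>]*>(.*?)</div>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one <p ...>...</p> block and captures what lies between the tags.
    private static final Pattern P = Pattern.compile(
            "<p\\b[^>]*>(.*?)</p>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one list per div, holding the text of each p found inside it. */
    public static List<List<String>> groupByDiv(String html) {
        List<List<String>> groups = new ArrayList<>();
        Matcher div = DIV.matcher(html);
        while (div.find()) {
            List<String> paragraphs = new ArrayList<>();
            // Search for p tags only within the body of the current div.
            Matcher p = P.matcher(div.group(1));
            while (p.find()) {
                paragraphs.add(p.group(1).trim());
            }
            groups.add(paragraphs);
        }
        return groups;
    }

    public static void main(String[] args) {
        String html = "<div class = a><br>\n<p>some text here</p><br>\n</div>\n"
                + "<div class = b><br>\n<p>even more text here</p><br>\n"
                + "<p>and here</p><br>\n</div>";
        for (List<String> group : groupByDiv(html)) {
            System.out.println(group);
        }
    }
}
```

Because the div pattern never looks at the class attribute, the class names can be anything. Each inner list can be turned into a String[] with toArray(new String[0]) if arrays are preferred.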


1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 10:45 am

    I was able to solve my dilemma.

    This is the code I used; hopefully it helps someone in need.

    Thanks to everyone that posted.

    // Requires java.util.ArrayList, org.jsoup.nodes.Document,
    // org.jsoup.nodes.Element and org.jsoup.select.Elements.
    public static ArrayList<String> proc(Document htmlFile)
    {
        Elements body = htmlFile.select("body");
        ArrayList<String> HTMLPlainText = new ArrayList<>();

        // The first entry is the document title.
        HTMLPlainText.add(htmlFile.title());

        for (Element pBody : body)
        {
            Element lastParent = null;

            for (Element p : pBody.getElementsByTag("p"))
            {
                // Handle each enclosing element once, on its first <p>.
                Element parent = p.parent();
                if (parent == lastParent)
                {
                    continue;
                }
                lastParent = parent;

                // text() returns the combined text of everything inside the parent.
                String pt = parent.text();
                if (pt.length() != 0)
                {
                    HTMLPlainText.add(pt);
                }
            }
        }

        return HTMLPlainText;
    }
    

    Note, there may be some syntax errors, I manually typed this in.
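    If each div should map to its own list of paragraph strings (the "string array" option mentioned in the question), something along these lines may work. It is a minimal sketch, assuming jsoup is on the classpath and that the p tags are direct children of their div. The class name DivGrouper and the method paragraphsByDiv are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DivGrouper {
    /** Returns one list per div that has at least one non-empty p child. */
    public static List<List<String>> paragraphsByDiv(Document doc) {
        List<List<String>> groups = new ArrayList<>();
        // Select every div regardless of its class name.
        for (Element div : doc.select("div")) {
            List<String> paragraphs = new ArrayList<>();
            // Only direct children are inspected, so nested divs keep their own p tags.
            for (Element child : div.children()) {
                if (child.tagName().equals("p") && !child.text().isEmpty()) {
                    paragraphs.add(child.text());
                }
            }
            if (!paragraphs.isEmpty()) {
                groups.add(paragraphs);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String html = "<div class = nameCouldBeAnything1><p>some text here</p></div>"
                + "<div class = nameCouldBeAnything2><p>even more text here</p>"
                + "<p>and here</p></div>";
        Document doc = Jsoup.parse(html);
        for (List<String> group : paragraphsByDiv(doc)) {
            System.out.println(group);
        }
    }
}
```

    Keeping one inner list per div preserves the grouping that a flat select("div p") loses, and it does not depend on any particular class name.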


© 2021 The Archive Base. All Rights Reserved