The Archive Base Latest Questions

Editorial Team
Asked: May 25, 2026 (2026-05-25T10:46:05+00:00)

I am fairly new to Java, at least when it comes to interacting with the web. I am making an app that has to grab the HTML of a web page and parse it.

By parsing I mean finding out what an element has in its class="" attribute, or in any other attribute on the element, and also finding out what is inside the element. This is where I have looked so far: http://www.java2s.com/Code/Java/Development-Class/HTMLDocumentElementIteratorExample.htm

I found very little regarding this.

I know there are lots of Java parsers out there. I have tried JTidy and the default Swing parser. I would prefer to use the parser built into Java.

Here is what I have so far. This is just a method for testing how it works; proper code will come once I know what to do and how. To clarify, connection is a URLConnection variable, and the connection has been established before this method gets called:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public void parse() {
    // try-with-resources closes the reader once parsing is done
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getInputStream()))) {

        // Do not drain br with readLine() before parsing:
        // the parser would then see an already-consumed stream.

        // copied from http://www.java2s.com/Code/Java/Development-Class/HTMLDocumentElementIteratorExample.htm
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        HTMLEditorKit.Parser parser = new ParserDelegator();
        HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
        parser.parse(br, callback, true);
        // Push any buffered elements into the document before walking it
        callback.flush();

        // Walk every element of the parsed document
        ElementIterator iterator = new ElementIterator(htmlDoc);
        Element element;

        while ((element = iterator.next()) != null) {
            AttributeSet attributes = element.getAttributes();
            Object name = attributes.getAttribute(StyleConstants.NameAttribute);

            // Print each attribute name together with its value
            System.out.println("All attrs of " + name + ":");
            Enumeration<?> names = attributes.getAttributeNames();
            while (names.hasMoreElements()) {
                Object key = names.nextElement();
                System.out.println("  " + key + " = " + attributes.getAttribute(key));
            }

            // Attribute keys are HTML.Attribute constants, not plain strings
            System.out.println("attribute of class = "
                    + attributes.containsAttribute(HTML.Attribute.CLASS, "login"));

            if ((name == HTML.Tag.H1) || (name == HTML.Tag.H2) || (name == HTML.Tag.H3)) {
                // Build up content text as it may be within multiple elements
                StringBuilder text = new StringBuilder();
                int count = element.getElementCount();
                for (int i = 0; i < count; i++) {
                    Element child = element.getElement(i);
                    AttributeSet childAttributes = child.getAttributes();
                    if (childAttributes.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.CONTENT) {
                        int startOffset = child.getStartOffset();
                        int length = child.getEndOffset() - startOffset;
                        text.append(htmlDoc.getText(startOffset, length));
                    }
                }
                System.out.println(name + ": " + text);
            }
        }
    } catch (IOException e) {
        System.out.println("I/O error while reading the page: " + e.getMessage());
    } catch (BadLocationException e) {
        System.out.println("Invalid document offset: " + e.getMessage());
    }
}

The question is: How do I get any element’s attributes and print them out?

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 10:46 am (2026-05-25T10:46:06+00:00)

    This code is needlessly verbose. I would suggest using a better library like Jsoup. Here’s some code to find out all the attributes of all divs on this page.

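    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Attribute;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Fetch the page (get() throws IOException) and select every <div>
    String url = "http://stackoverflow.com/questions/7311269"
                 + "/java-print-any-detail-of-html-element";
    Document doc = Jsoup.connect(url).get();
    Elements divs = doc.select("div");
    int i = 0;
    for (Element div : divs) {
        System.out.format("Div #%d:%n", ++i);
        // Each Attribute is a key/value pair taken from the tag
        for (Attribute attr : div.attributes()) {
            System.out.format("%s = %s%n", attr.getKey(), attr.getValue());
        }
    }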

    Follow the Jsoup Cookbook for a gentle introduction to this powerful library.
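If adding a dependency is not an option, the parser that ships with the JDK can also report attributes directly through a callback, without building an HTMLDocument first. The following is only a minimal sketch under assumptions: the class name AttributeDump, the helper methods dump and record, and the sample markup are hypothetical and not part of the question or of any library. It relies on HTMLEditorKit.ParserCallback and ParserDelegator from javax.swing.text.html.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, for illustration only
public class AttributeDump {

    /** Collects one "tag: name=value" entry per attribute and one "text: ..." entry per text run. */
    static List<String> dump(Reader html) throws IOException {
        List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                record(tag, attrs);
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.add("text: " + new String(data));
            }

            private void record(HTML.Tag tag, MutableAttributeSet attrs) {
                Enumeration<?> names = attrs.getAttributeNames();
                while (names.hasMoreElements()) {
                    Object name = names.nextElement();
                    // Skip the marker the parser adds to tags it inserted itself
                    if (name == IMPLIED) {
                        continue;
                    }
                    out.add(tag + ": " + name + "=" + attrs.getAttribute(name));
                }
            }
        };
        // The last argument tells the parser to ignore charset directives in the markup
        new ParserDelegator().parse(html, callback, true);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup made up for this sketch
        String html = "<html><body><div class=\"login\" id=\"box\">"
                + "<h1>Welcome</h1></div></body></html>";
        dump(new StringReader(html)).forEach(System.out::println);
    }
}
```

For a live page, the same dump method could be given the reader built from connection.getInputStream() in the question. Jsoup remains the more convenient choice when CSS-style selection is needed; the callback approach mainly suits cases where only the JDK is available.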
