I am new in parsing HTML using Java. What I want to do is

Asked: May 27, 2026 (2026-05-27T03:52:49+00:00)

I am new to parsing HTML with Java. I want to get the text between tags, but those tags contain some optional attributes.
For example, I have the following string:

<tr><td align="center" width="408"><font color="#000000">Hello </font></td><td align="center" width="275"><font color="#0000FF">World! </font></td></tr>

I want to extract the text of the second cell only, which is "World!" (its attributes differ from those of the "Hello" cell).

What I have found here so far is:

import java.io.*;
import java.net.URL;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {

    public static void main(String[] args) throws Exception {

        Reader reader = new StringReader("<tr><td align=\"center\" width=\"408\"><font color=\"#000000\">"
                + "Hello </font></td><td align=\"center\" width=\"275\"><font color=\"#0000FF\">World! "
               + "</font></td></tr>");
        HTMLEditorKit.Parser parser = new ParserDelegator();
        parser.parse(reader, new HTMLTableParser(), true);
        reader.close();
    }
}

class HTMLTableParser extends HTMLEditorKit.ParserCallback {

    private boolean encounteredATableRow = false;

    public void handleText(char[] data, int pos) {
        if (encounteredATableRow) {
            System.out.println(new String(data));
        }
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TD) {
            encounteredATableRow = true;
        }
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD) {
            encounteredATableRow = false;
        }
    }
}

Output:

Hello
World!

It outputs the text of every cell, regardless of the attributes.

Any ideas please?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:52:50+00:00

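I got it working by checking the width attribute of each td in handleStartTag, so that only the matching cell turns the flag on: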

import java.io.*;
import java.net.URL;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {

    public static void main(String[] args) throws Exception {

        Reader reader = new StringReader("<tr><td align=\"center\" width=\"408\"><font color=\"#000000\">"
                + "Hello </font></td><td align=\"center\" width=\"275\"><font color=\"#0000FF\">World! "
               + "</font></td></tr>");
        HTMLEditorKit.Parser parser = new ParserDelegator();
        parser.parse(reader, new HTMLTableParser(), true);
        reader.close();
    }
}
class HTMLTableParser extends HTMLEditorKit.ParserCallback {

    private boolean encounteredATableRow = false;

    public void handleText(char[] data, int pos) {
        if (encounteredATableRow) {
            System.out.println(new String(data));
        }
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // Only start capturing text inside the <td> whose width attribute is "275".
        if (t == HTML.Tag.TD) {
            Object width = a.getAttribute(HTML.Attribute.WIDTH);
            if (width != null && "275".equals(width.toString())) {
                encounteredATableRow = true;
            }
        }
    }

    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD) {
            encounteredATableRow = false;
        }
    }
}
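
If the width value is not something you can rely on, another option is to pick the cell by its position instead of by an attribute. The following is only a rough sketch using the same Swing parser as above; the class name CellByIndexDemo and the helper cellText are made up for illustration, and it assumes the cells you care about are not nested inside other tables.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CellByIndexDemo {

    /**
     * Hypothetical helper: returns the trimmed text of the <td> at the given
     * zero-based position, or an empty string if there is no such cell.
     */
    static String cellText(String html, int targetIndex) throws IOException {
        StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int cellIndex = -1;      // position of the most recently opened <td>
            private boolean inCell = false;  // true while we are between <td> and </td>

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TD) {
                    cellIndex++;
                    inCell = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.TD) {
                    inCell = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Collect text only while inside the cell we were asked for.
                if (inCell && cellIndex == targetIndex) {
                    text.append(data);
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<tr><td align=\"center\" width=\"408\"><font color=\"#000000\">Hello </font></td>"
                + "<td align=\"center\" width=\"275\"><font color=\"#0000FF\">World! </font></td></tr>";
        // Index 1 is the second cell of the row from the question.
        System.out.println(cellText(html, 1));
    }
}
```

Counting cells keeps working if the width or color changes, but it breaks if the column order changes, so which approach fits depends on what is stable in your HTML.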
