I need to extract the text between two HTML tags and store it in

Question

Asked: May 28, 2026 (2026-05-28T00:23:06+00:00)

I need to extract the text between two HTML tags and store it in a string. An example of the HTML I want to parse is as follows:

<div id=\"swiki.2.1\"> THE TEXT I NEED </div>

I have done this in Java using the pattern (swiki\.2\.1\\\")(.*)(\/div) and reading the string I want from group 2 ($2). However, this does not work on Android: when I print the contents of group 2, nothing appears, because the match fails.

Has anyone had a similar problem using regex on Android, or is there a better (non-regex) way to parse the HTML page in the first place? Again, this works fine in a standard Java test program. Any help would be greatly appreciated!

1 Answer

Editorial Team · Answer 1 · 2026-05-28T00:23:07+00:00

For HTML parsing I always use HtmlCleaner: http://htmlcleaner.sourceforge.net/

Awesome lib that works great with XPath and, of course, on Android. 🙂

This shows how you can download an HTML page from a URL and use an XPath expression to read the value of an attribute on the first matching element (also shown in the docs):

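// imports needed by the method below (place the method inside your own class)
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import android.content.Context;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

public static String snapFromHtmlWithCookies(Context context, String xPath, String attrToSnap, String urlString,
        String cookies) throws IOException, XPatherException {
    String snap = "";

    // create an instance of HtmlCleaner
    HtmlCleaner cleaner = new HtmlCleaner();

    // take the default cleaner properties and adjust them
    CleanerProperties props = cleaner.getProperties();
    props.setAllowHtmlInsideAttributes(true);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);

    URL url = new URL(urlString);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // setDoOutput(true) removed: nothing is written to the request body,
    // and on Android it can turn the GET into a POST

    try {
        // optional cookies; R.string.cookie_prefix is assumed to hold the header name "Cookie"
        connection.setRequestProperty(context.getString(R.string.cookie_prefix), cookies);
        connection.connect();

        // use the cleaner to "clean" the HTML and return it as a TagNode object
        TagNode root = cleaner.clean(new InputStreamReader(connection.getInputStream()));

        // evaluate the XPath expression against the cleaned document
        Object[] foundNodes = root.evaluateXPath(xPath);

        if (foundNodes.length > 0) {
            // read the requested attribute from the first matching node
            TagNode foundNode = (TagNode) foundNodes[0];
            snap = foundNode.getAttributeByName(attrToSnap);
        }
    } finally {
        // release the connection even if parsing fails
        connection.disconnect();
    }

    return snap;
}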

Just edit it for your needs. 🙂
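If you would rather stay with a regex, here is a minimal sketch in plain Java (java.util.regex only). It rests on an assumption the question does not confirm: that the match fails because the page fetched on the device contains line breaks between the tags, which `.` does not match unless Pattern.DOTALL is set. It also swaps the greedy `(.*)` for a reluctant `(.*?)` so the match stops at the first closing tag. The class name DivTextExtractor and the method extractDivText are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivTextExtractor {

    // Returns the trimmed text inside the first <div> whose id equals divId,
    // or null when no such element is found.
    public static String extractDivText(String html, String divId) {
        Pattern pattern = Pattern.compile(
                // id="..." with an optional backslash before each quote,
                // because the sample in the question shows escaped quotes
                "id=\\\\?\"" + Pattern.quote(divId) + "\\\\?\"[^>]*>"
                // reluctant group: stop at the first closing tag
                + "(.*?)</div>",
                // DOTALL lets '.' match line terminators as well
                Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // sample input with escaped quotes and line breaks inside the element
        String html = "<div id=\\\"swiki.2.1\\\">\n THE TEXT I NEED \n</div><div>other</div>";
        System.out.println(extractDivText(html, "swiki.2.1")); // prints "THE TEXT I NEED"
    }
}
```

This only copes with a div that has no nested div inside it; for anything more involved, a real parser such as HtmlCleaner is the safer choice.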
