I am trying to remove all HTML elements from a String. Unfortunately, I cannot

Asked: May 14, 2026 (2026-05-14T01:06:52+00:00)

I am trying to remove all HTML elements from a String. Unfortunately, I cannot use regular expressions because I am developing on the Blackberry platform and regular expressions are not yet supported.

Is there any other way that I can remove HTML from a string? I read somewhere that you can use a DOM Parser, but I couldn’t find much on it.

Text with HTML:

<![CDATA[As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (<a href="http://www.netflix.com/RoleDisplay/Billy_Bob_Thornton/20000303">Billy Bob Thornton</a>) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (<a href="http://www.netflix.com/RoleDisplay/Bruce_Willis/99786">Bruce Willis</a>) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task. <a href="http://www.netflix.com/RoleDisplay/Ben_Affleck/20000016">Ben Affleck</a> and <a href="http://www.netflix.com/RoleDisplay/Liv_Tyler/162745">Liv Tyler</a> co-star.]]>

Text without HTML:

As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (Billy Bob Thornton) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (Bruce Willis) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task. Ben Affleck and Liv Tyler co-star.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-14T01:06:52+00:00

There are a lot of nuances to parsing HTML in the wild, one of the funnier ones being that many pages out there do not follow any standard. That said, if all your HTML is going to be as simple as your example, a small character-by-character scanner that drops everything between < and > is more than enough:

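    char[] cs = s.toCharArray();
    // On class libraries without StringBuilder, StringBuffer can be swapped in.
    StringBuilder sb = new StringBuilder();
    boolean tag = false;   // true while we are between '<' and '>'
    for (int i = 0; i < cs.length; i++) {
        switch (cs[i]) {
            case '<':      // start of a tag: stop copying characters
                tag = true;
                break;
            case '>':      // end of a tag; a stray '>' in text is kept
                if (tag) tag = false; else sb.append(cs[i]);
                break;
            case '&':      // decode escapes only outside of tags
                if (!tag) i += interpretEscape(cs, i, sb);
                break;
            default:
                if (!tag) sb.append(cs[i]);
        }
    }
    System.err.println(sb);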

Here interpretEscape() is expected to convert HTML escapes such as &gt; into the characters they stand for, append the result to sb, and return the number of characters to skip so that i ends up on the closing ;.
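interpretEscape() is left undefined above, so here is one possible sketch of it, wrapped in a small class together with the scanning loop. Everything in it is an assumption made for illustration: the class name HtmlStripper, the strip() method, the short list of named escapes, and the choice to copy an unrecognized & through unchanged. It also unwraps a single surrounding CDATA section first, because otherwise the scanner would treat everything from <![CDATA[ up to the first > as one tag and lose the leading text. It avoids regular expressions, but it has not been tried on a BlackBerry; if StringBuilder is not available there, StringBuffer should work as a replacement.

```java
final class HtmlStripper {

    // Hypothetical helper: cs[i] is '&'. Decodes the escape that starts there,
    // appends the result to sb, and returns how many characters after the '&'
    // were consumed, so the caller's "i += ..." leaves i on the closing ';'.
    static int interpretEscape(char[] cs, int i, StringBuilder sb) {
        // Look for the terminating ';' within a short window.
        int end = -1;
        for (int j = i + 1; j < cs.length && j <= i + 10; j++) {
            if (cs[j] == ';') { end = j; break; }
        }
        int decoded = -1;
        if (end > 0) {
            String name = new String(cs, i + 1, end - i - 1);
            if (name.equals("lt")) decoded = '<';
            else if (name.equals("gt")) decoded = '>';
            else if (name.equals("amp")) decoded = '&';
            else if (name.equals("quot")) decoded = '"';
            else if (name.equals("apos")) decoded = '\'';
            else if (name.equals("nbsp")) decoded = 0x00A0;
            else if (name.length() > 1 && name.charAt(0) == '#') {
                // Numeric escape: &#65; (decimal) or &#x41; (hexadecimal).
                try {
                    boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                    decoded = hex ? Integer.parseInt(name.substring(2), 16)
                                  : Integer.parseInt(name.substring(1), 10);
                } catch (NumberFormatException e) {
                    decoded = -1;
                }
            }
        }
        if (decoded < 0 || decoded > 0xFFFF) {
            // Not an escape we understand: keep the '&' as ordinary text.
            sb.append('&');
            return 0;
        }
        sb.append((char) decoded);
        return end - i;
    }

    // Removes tags and decodes escapes; assumes simple, well-formed markup.
    static String strip(String s) {
        // Assumption: the input may be wrapped in exactly one CDATA section.
        String open = "<![CDATA[";
        String close = "]]>";
        if (s.length() >= open.length() + close.length()
                && s.startsWith(open) && s.endsWith(close)) {
            s = s.substring(open.length(), s.length() - close.length());
        }
        char[] cs = s.toCharArray();
        StringBuilder sb = new StringBuilder();
        boolean tag = false;   // true while we are between '<' and '>'
        for (int i = 0; i < cs.length; i++) {
            switch (cs[i]) {
                case '<':
                    tag = true;
                    break;
                case '>':
                    if (tag) tag = false; else sb.append(cs[i]);
                    break;
                case '&':
                    if (!tag) i += interpretEscape(cs, i, sb);
                    break;
                default:
                    if (!tag) sb.append(cs[i]);
            }
        }
        return sb.toString();
    }
}
```

Calling HtmlStripper.strip(text) on a string shaped like the one in the question should return the text with the <a> tags and the CDATA wrapper removed. The scanner will still be confused by a > inside an attribute value, by comments, and by script or style blocks, so it is only suitable for markup as plain as the example.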
