In Java I have an arbitrary HTML document as a string. For simplicity, say:

Asked: June 1, 2026

In Java I have an arbitrary HTML document as a string. For simplicity, say:

String original = "Hello, <strong>this</strong> is a string";

And I have a record of various locations in the string, always within the text, not within a tag. For example, the start and end indices of the word “is” are 29 and 31.

I then perform a transformation on the string – in this case stripping out the HTML tags. This leaves:

original = "Hello, this is a string";

Is there an elegant way of getting the new start and end index of the word “is” now (12 and 14)?

The one possible solution I can think of is inserting a “flag” at each original index, stripping the HTML, then removing the flags while recording their locations. This shouldn’t cause any issues with the HTML stripping as the indices always occur outside the tags.

If this is actually the best way, does anyone have any recommendations for a good choice of “flag” that definitely won’t coincidentally occur in any HTML documents?
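For illustration, the flag idea could be sketched roughly as below. This is only a sketch under stated assumptions: the flag is the private-use code point U+E000 (a guess at something unlikely to appear, not a guarantee, so the code checks for it first), the tag stripping is a naive regex standing in for whatever the real transformation is, and the class name FlagRemap and method remap are made up for the example.

```java
class FlagRemap {
    // Assumption: U+E000 (a Unicode private-use code point) is not in the input.
    private static final char FLAG = '\uE000';

    // sortedIndices must be in ascending order and must all lie outside tags.
    static int[] remap(String html, int[] sortedIndices) {
        if (html.indexOf(FLAG) >= 0) {
            throw new IllegalArgumentException("input already contains the flag character");
        }
        StringBuilder sb = new StringBuilder(html);
        // Insert from the highest index down so earlier insertions
        // do not shift the positions still to be marked.
        for (int k = sortedIndices.length - 1; k >= 0; k--) {
            sb.insert(sortedIndices[k], FLAG);
        }
        // Stand-in for the real transformation: naive tag stripping.
        String stripped = sb.toString().replaceAll("<[^>]*>", "");

        int[] result = new int[sortedIndices.length];
        int found = 0;   // flags seen so far
        int outPos = 0;  // position in the output once flags are removed
        for (int i = 0; i < stripped.length(); i++) {
            if (stripped.charAt(i) == FLAG) {
                result[found++] = outPos;
            } else {
                outPos++;
            }
        }
        return result;
    }
}
```

With the example string above, FlagRemap.remap(original, new int[] {29, 31}) should give 12 and 14.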

1 Answer

Editorial Team · Answer 1 · 2026-06-01T00:22:25+00:00

The best approach depends on how you’re stripping the HTML tags. If you’re simply removing everything enclosed in <> brackets, you can loop through the old string and count the characters outside <> brackets that precede the old index. Something along these lines would probably work:

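// Maps an index in the original string to the corresponding index
// after everything between '<' and '>' has been removed.
public int newIndex(String str, int oldIndex) {
  int newIndex = 0;           // characters kept so far
  boolean inBracket = false;  // true while scanning inside a tag
  for (int i = 0; i < str.length(); i++) {
    if (i == oldIndex) return newIndex;
    char c = str.charAt(i);
    if (c == '<') inBracket = true;
    else if (c == '>') inBracket = false;
    else if (!inBracket) newIndex++;
  }
  return newIndex;  // oldIndex is at or past the end of the string
}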
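If you have many recorded locations, calling the method once per index rescans the string each time. A possible variation, sketched here with the same bracket-counting assumption, builds a lookup table in a single pass. The names IndexRemapper and buildTable are made up for this example.

```java
class IndexRemapper {
    // Returns a table where table[i] is the position that original index i
    // would have after everything between '<' and '>' is removed.
    // Assumes tags are exactly the spans enclosed in '<' ... '>'.
    static int[] buildTable(String str) {
        int[] table = new int[str.length() + 1];
        int newIndex = 0;           // characters kept so far
        boolean inBracket = false;  // true while scanning inside a tag
        for (int i = 0; i < str.length(); i++) {
            table[i] = newIndex;
            char c = str.charAt(i);
            if (c == '<') inBracket = true;
            else if (c == '>') inBracket = false;
            else if (!inBracket) newIndex++;
        }
        // An index equal to the length marks the end of the text.
        table[str.length()] = newIndex;
        return table;
    }
}
```

For the string in the question, buildTable(original)[29] is 12 and buildTable(original)[31] is 14. The table costs one int per character of the original string, which seems a fair trade when there are many indices to translate.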
