The Archive Base

Editorial Team
Asked: June 1, 2026

In Java I have an arbitrary HTML document as a string. For simplicity, say:

String original = "Hello, <strong>this</strong> is a string";

I also have a record of various locations in the string, always within the text and never within a tag. For example, the start and end indices of the word “is” are 29 and 31.

I then perform a transformation on the string – in this case stripping out the HTML tags. This leaves:

original = "Hello, this is a string";

Is there an elegant way of getting the new start and end index of the word “is” now (12 and 14)?

One possible solution I can think of is inserting a “flag” at each original index, stripping the HTML, then removing the flags while recording their locations. This shouldn’t cause any issues with the HTML stripping, as the indices always fall outside the tags.

If this is actually the best way, does anyone have any recommendations for a good choice of “flag” that definitely won’t coincidentally occur in any HTML documents?
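The flag idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `FlagRemap`, the `remap` helper, and the regex-based `stripTags` stand-in are all hypothetical, and the flag is assumed to be U+E000, a code point from the Unicode private-use area. Private-use characters can still appear in real documents (icon fonts use them, for instance), so the sketch checks for the flag up front rather than assuming it is absent.

```java
final class FlagRemap {
    // Assumption: U+E000 (private-use area) serves as the flag character.
    private static final char FLAG = '\uE000';

    // Hypothetical stand-in for the real transformation. It must leave
    // flag characters that sit outside tags untouched.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // oldIndices must be sorted ascending, lie in 0..html.length(),
    // and never point inside a tag.
    static int[] remap(String html, int[] oldIndices) {
        if (html.indexOf(FLAG) >= 0) {
            throw new IllegalArgumentException("input already contains the flag character");
        }
        StringBuilder flagged = new StringBuilder(html);
        // Insert from the last index backwards so earlier offsets stay valid.
        for (int k = oldIndices.length - 1; k >= 0; k--) {
            flagged.insert(oldIndices[k], FLAG);
        }
        String stripped = stripTags(flagged.toString());

        int[] newIndices = new int[oldIndices.length];
        int found = 0;
        for (int i = 0; i < stripped.length() && found < newIndices.length; i++) {
            if (stripped.charAt(i) == FLAG) {
                // Subtract the flags already seen; they are not part of the final text.
                newIndices[found] = i - found;
                found++;
            }
        }
        return newIndices;
    }

    public static void main(String[] args) {
        String original = "Hello, <strong>this</strong> is a string";
        int[] result = remap(original, new int[] {29, 31});
        System.out.println(result[0] + " " + result[1]); // prints "12 14"
    }
}
```

The final text is obtained by removing the flag characters from the stripped string. The advantage of this route is that it works with any stripping routine that passes the flag through unchanged; the cost is an extra copy of the document and the need to guarantee the flag is unique.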

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 12:22 am

    The best approach depends on how you’re stripping the HTML tags. If you’re simply removing everything enclosed in <> brackets, you can loop through the original string and count the characters outside <> brackets that precede the old index; that count is the new index. Something along these lines would probably work:

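    // Maps an index in the original string to the matching index after
    // everything enclosed in <> brackets has been removed.
    public static int newIndex(String str, int oldIndex) {
      int newIndex = 0;            // characters outside tags seen so far
      boolean inBracket = false;   // true while scanning inside <...>
      for (int i = 0; i < str.length(); i++) {
        if (i == oldIndex) return newIndex;
        char c = str.charAt(i);
        if (c == '<') inBracket = true;
        else if (c == '>') inBracket = false;
        else if (!inBracket) newIndex++;
      }
      // Reached when oldIndex is at or beyond the end of the string.
      return newIndex;
    }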
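When many indices need converting, the same counting loop can fill a lookup table in a single pass instead of rescanning the string for each index. The following is a minimal sketch of that variation, not part of the original answer; the class name `IndexMap` and the method `build` are made up for illustration, and it makes the same assumption that stripping removes exactly what lies between < and >.

```java
final class IndexMap {
    // Returns a table where table[i] is the position in the tag-stripped
    // text that corresponds to position i in the original string.
    // The extra last slot covers an end index equal to str.length().
    static int[] build(String str) {
        int[] table = new int[str.length() + 1];
        int newIndex = 0;
        boolean inBracket = false;
        for (int i = 0; i < str.length(); i++) {
            table[i] = newIndex; // record before consuming the character at i
            char c = str.charAt(i);
            if (c == '<') inBracket = true;
            else if (c == '>') inBracket = false;
            else if (!inBracket) newIndex++;
        }
        table[str.length()] = newIndex;
        return table;
    }

    public static void main(String[] args) {
        String original = "Hello, <strong>this</strong> is a string";
        int[] table = build(original);
        System.out.println(table[29] + " " + table[31]); // prints "12 14"
    }
}
```

Each lookup is then a plain array access, at the cost of one int per character of the original document.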
    
© 2021 The Archive Base. All Rights Reserved