Editorial Team
Asked: May 27, 2026

I am using the jsoup parser to find particular parts of an HTML document (defined by a regex) and highlight them by wrapping each matched string in a <span> tag. Here is my code that does the highlighting:

    public String highlightRegex() {
        Document doc = Jsoup.parse(htmlContent);

        NodeTraversor nd  = new NodeTraversor(new NodeVisitor() {

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof Element) {

                    Element elem = (Element) node;

                    StringBuffer obtainedText;
                    for(Element tn : elem.getElementsMatchingOwnText(pat)) {

                        Log.e("HELLO", tn.baseUri());
                        Log.e("HELLO", tn.text());
                        obtainedText = new StringBuffer(tn.ownText());
                        mat = pat.matcher(obtainedText.toString());
                        int nextStart = 0;
                        while(mat.find(nextStart)) {
                            obtainedText = obtainedText.replace(mat.start(), mat.end(), "<span>" + mat.group() + "</span>");
                            nextStart = mat.end() + 1;
                        }
                        tn.text(obtainedText.toString());
                        Log.e("HELLO" , "AFTER:" + tn.text());

                    }
                }
            }

            @Override
            public void head(Node node, int depth) {        
            }
        });

        nd.traverse(doc.body());
        return doc.toString();
    }

It does work, but the <span> tag itself is visible as text inside the WebView. What am I doing wrong?

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 5:39 pm

    Looks like no one knows. Here’s some code that I’ve come up with. It is slow and inefficient, but it works. Suggestions are welcome 🙂

    This class can be used to highlight any HTML using a regex.

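    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.DataNode;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.select.NodeTraversor;
    import org.jsoup.select.NodeVisitor;

    public class Highlighter {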
    
        private String regex;
        private String htmlContent;
        Pattern pat;
        Matcher mat;
    
    
        public Highlighter(String searchString, String htmlString) {
            regex = buildRegexFromQuery(searchString);
            htmlContent = htmlString;
            pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        }
    
        public String getHighlightedHtml() {
    
            Document doc = Jsoup.parse(htmlContent);
    
            final List<TextNode> nodesToChange = new ArrayList<TextNode>();
    
            NodeTraversor nd  = new NodeTraversor(new NodeVisitor() {
    
                @Override
                public void tail(Node node, int depth) {
                    if (node instanceof TextNode) {
                        TextNode textNode = (TextNode) node;
                        String text = textNode.getWholeText();
    
                        mat = pat.matcher(text);
    
                        if(mat.find()) {
                            nodesToChange.add(textNode);
                        }
                    }
                }
    
                @Override
                public void head(Node node, int depth) {        
                }
            });
    
            nd.traverse(doc.body());
    
            for (TextNode textNode : nodesToChange) {
                Node newNode = buildElementForText(textNode);
                textNode.replaceWith(newNode);
            }
            return doc.toString();
        }
    
        private static String buildRegexFromQuery(String queryString) {
            /* Clean up query: punctuation becomes a space, whitespace runs collapse to one space. */
            String queryToConvert = queryString.replaceAll("\\p{Punct}+", " ");
            queryToConvert = queryToConvert.replaceAll("\\s+", " ").trim();
    
            // Split the cleaned query into words.
            String[] regexArray = queryToConvert.split(" ");
    
            String regex = "(";
            for (int i = 0; i < regexArray.length - 1; i++) {
                String item = regexArray[i];
                regex += "(\\b)" + item + "(\\b)|";
            }
    
            // The last word also matches as a prefix, e.g. "def" matches "define".
            regex += "(\\b)" + regexArray[regexArray.length - 1] + "[a-zA-Z0-9]*?(\\b))";
            return regex;
        }
    
        private Node buildElementForText(TextNode textNode) {
            // Keep surrounding whitespace so words do not run into neighbouring nodes.
            String text = textNode.getWholeText();
    
            ArrayList<MatchedWord> matchedWordSet = new ArrayList<MatchedWord>();
    
            mat = pat.matcher(text);
    
            while(mat.find()) {
                matchedWordSet.add(new MatchedWord(mat.start(), mat.end()));
            }
    
            StringBuffer newText = new StringBuffer(text);
    
            for(int i = matchedWordSet.size() - 1; i >= 0; i-- ) {
                String wordToReplace = newText.substring(matchedWordSet.get(i).start, matchedWordSet.get(i).end);
                wordToReplace = "<b>" + wordToReplace+ "</b>";
                newText = newText.replace(matchedWordSet.get(i).start, matchedWordSet.get(i).end, wordToReplace);       
            }
            return new DataNode(newText.toString(), textNode.baseUri());
        }
    
        class MatchedWord {
            public int start;
            public int end;
    
            public MatchedWord(int start, int end) {
                this.start = start;
                this.end = end;
            }
        }
    }
    
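The DataNode is what makes this work: jsoup writes a DataNode out verbatim, whereas Element.text(String), which the question's code calls, stores its argument as text and escapes it on output, so the <span> shows up literally in the WebView. The flip side is that any < or & in the original text now also goes out raw, so the wrapping step should escape the text itself. Below is a minimal, jsoup-free sketch of a single-pass wrap that escapes the text but not the tags. HighlightSketch, escapeHtml and wrapMatches are hypothetical names for illustration, not part of jsoup or of the class above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {

    // Hypothetical helper: escapes the characters that would otherwise be read as markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps every non-empty match in <b>...</b> in one pass.
    // All original text is escaped; only the generated tags stay raw.
    static String wrapMatches(Pattern pat, String text) {
        Matcher m = pat.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // skip empty matches
            }
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append("<b>").append(escapeHtml(m.group())).append("</b>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern pat = Pattern.compile("\\babc\\b", Pattern.CASE_INSENSITIVE);
        String text = "ABC < abc & x";
        // Treating the generated markup as text escapes the tags too, so they show up literally.
        System.out.println(escapeHtml(wrapMatches(pat, text)));
        // Emitting it raw keeps the tags, while the original text stays escaped.
        System.out.println(wrapMatches(pat, text));
    }
}
```

In the Highlighter class, the string returned by something like wrapMatches() would be what goes into the DataNode.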

    To get the highlighted HTML, create a Highlighter and call getHighlightedHtml():

    Highlighter hl = new Highlighter("abc def", htmlString);
    String newhtmlString = hl.getHighlightedHtml();
    

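    This will highlight every whole-word occurrence of abc and every word that starts with def, ignoring case. The regex generated for this query is ((\b)abc(\b)|(\b)def[a-zA-Z0-9]*?(\b)).
    You can change how the regex is built by modifying the buildRegexFromQuery() function.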
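As one example of such a change: if the query may contain characters that are special in a regex, each word can be quoted with Pattern.quote() instead of having its punctuation stripped. The sketch below assumes whole-word matching only (no prefix match on the last word), and QueryRegexSketch and fromQuery are hypothetical names. The lookarounds stand in for \b so that words ending in punctuation, such as "c++", still match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class QueryRegexSketch {

    // Builds a case-insensitive pattern that matches any whole word of the query.
    // Returns null when the query contains no words.
    static Pattern fromQuery(String query) {
        List<String> words = new ArrayList<>();
        for (String w : query.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(Pattern.quote(w)); // treat the word literally, e.g. "c++"
            }
        }
        if (words.isEmpty()) {
            return null;
        }
        // (?<!\w) and (?!\w): no word character directly before or after the match.
        return Pattern.compile("(?<!\\w)(?:" + String.join("|", words) + ")(?!\\w)",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = fromQuery("abc c++");
        // Brackets mark what would be highlighted.
        System.out.println(p.matcher("ABC and c++ but not abcd").replaceAll("[$0]"));
    }
}
```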
