The Archive Base

Editorial Team
Asked: May 16, 2026

I need to strip out a few invalid characters from a string and wrote the following code as part of a StringUtil library:

public static String removeBlockedCharacters(String data) {
    if (data==null) {
      return data;
    }
    return data.replaceAll("(?i)[<|>|\u003C|\u003E]", "");
}

I have a test file, illegalCharacters.txt, with one line in it:

hello \u003c here < and > there

I run the following unit test:

@Test
public void testBlockedCharactersRemoval() throws IOException{
    checkEquals(StringUtil.removeBlockedCharacters("a < b > c\u003e\u003E\u003c\u003C"), "a  b  c");
    log.info("Procesing from string directly: " + StringUtil.removeBlockedCharacters("hello \u003c here < and > there"));
    log.info("Procesing from file to string:  " + StringUtil.removeBlockedCharacters(FileUtils.readFileToString(new File("src/test/resources/illegalCharacters.txt"))));
}

I get:

INFO - 2010-09-14 13:37:36,111 - TestStringUtil.testBlockedCharactersRemoval(36) | Procesing from string directly: hello  here  and  there
INFO - 2010-09-14 13:37:36,126 - TestStringUtil.testBlockedCharactersRemoval(37) | Procesing from file to string:  hello \u003c here  and  there

I am very confused: as you can see, the code properly strips out '<', '>', and '\u003c' if I pass a string literal containing these values, but it fails to strip out '\u003c' if I read the same text from a file.

My questions, so that I stop losing hair over it, are:

  1. Why do I get this behavior?
  2. How can I change my code to properly strip \u003c in all cases?

Thanks

1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 10:50 pm

    When you compile your source file, the very first thing that happens, before any tokenizing or parsing, is that the Unicode escapes \u003C and \u003E are converted to the actual characters < and >. So your code is really:

    return data.replaceAll("(?i)[<|>|<|>]", "");
    

    When you compile the code for the test against the string literal, the same thing happens; the test string that you wrote as:

    "a < b > c\u003e\u003E\u003c\u003C"
    

    …is really:

    "a < b > c>><<"
    

    But when you read the test string from a file, no such conversion occurs: the file contains the six-character sequence \u003c (a backslash, the letter u, and four hex digits), and your pattern only matches the single character <. If you really want to match the literal text \u003C and \u003E as well, your code should look like this:

    return data.replaceAll("(?i)(?:<|>|\\\\u003C|\\\\u003E)", "");
    
    • If you use one backslash, the Java compiler interprets it as a Unicode escape and converts it to < or >.

    • If you use two backslashes, the regex compiler interprets it as a Unicode escape and thinks you want to match a < or >.

    • If you use three backslashes, the Java compiler turns it into \< or \>, the regex compiler ignores the backslash, and it tries to match < or >.

    • So, to match a raw Unicode escape sequence, you have to use four backslashes to match the one backslash in the escape sequence.
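The four cases above can be checked with a small sketch. The class name `BackslashLevels` and its members are made up for illustration; the raw escape text is built by concatenation so that the compiler never sees an escape in it, which roughly simulates text read from a file.

```java
import java.util.regex.Pattern;

public class BackslashLevels {
    // The same escape written with one to four backslashes in the source.
    public static final String ONE = "\u003C";     // compiler substitutes the character <
    public static final String TWO = "\\u003C";    // regex engine sees an escape for <
    public static final String THREE = "\\\u003C"; // regex engine sees a backslash and <
    public static final String FOUR = "\\\\u003C"; // regex engine sees an escaped backslash, then u003C

    // True if the regex, compiled case-insensitively, matches anywhere in the input.
    public static boolean finds(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        // Six characters: a backslash, the letter u, and the digits 003c.
        String rawEscape = "\\" + "u003c";
        String angle = "<";

        for (String regex : new String[] {ONE, TWO, THREE, FOUR}) {
            System.out.println(regex.length() + " chars reach the regex engine"
                    + ", matches <: " + finds(regex, angle)
                    + ", matches raw escape text: " + finds(regex, rawEscape));
        }
    }
}
```

Only the four-backslash form should report a match against the raw escape text; the other three match the plain < character instead.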

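    Notice that I changed your brackets, too. [<|>] is a character class that matches <, | or >, so your original pattern would also have stripped pipe characters. To match multi-character sequences such as the literal text \u003C, what you want is an alternation.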
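Putting it together, here is a minimal sketch of how the corrected method might look in the StringUtil class. The precompiled `BLOCKED` constant and the `main` method are additions for illustration, not part of the original code, and the input string is built by concatenation to stand in for a line read from illegalCharacters.txt.

```java
import java.util.regex.Pattern;

public final class StringUtil {
    // Matches < or >, or the literal text of their escapes
    // (a backslash, the letter u, then 003C or 003E), ignoring case.
    private static final Pattern BLOCKED =
            Pattern.compile("(?i)(?:<|>|\\\\u003C|\\\\u003E)");

    private StringUtil() {
        // Utility class, not meant to be instantiated.
    }

    public static String removeBlockedCharacters(String data) {
        if (data == null) {
            return null;
        }
        return BLOCKED.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: the escape appears as raw text,
        // because the compiler never sees it as a single escape sequence.
        String fromFile = "hello " + "\\" + "u003c here < and > there";
        System.out.println(removeBlockedCharacters(fromFile));
    }
}
```

With this pattern, the string literal and the text read from the file should both come out as "hello  here  and  there", matching the first log line in the question.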
