Editorial Team
Asked: May 16, 2026

I need to strip out a few invalid characters from a string and wrote the following code as part of a StringUtil library:

public static String removeBlockedCharacters(String data) {
    if (data==null) {
      return data;
    }
    return data.replaceAll("(?i)[<|>|\u003C|\u003E]", "");
}

I have a test file, illegalCharacters.txt, with one line in it:

hello \u003c here < and > there

I run the following unit test:

@Test
public void testBlockedCharactersRemoval() throws IOException{
    checkEquals(StringUtil.removeBlockedCharacters("a < b > c\u003e\u003E\u003c\u003C"), "a  b  c");
    log.info("Procesing from string directly: " + StringUtil.removeBlockedCharacters("hello \u003c here < and > there"));
    log.info("Procesing from file to string:  " + StringUtil.removeBlockedCharacters(FileUtils.readFileToString(new File("src/test/resources/illegalCharacters.txt"))));
}

I get:

INFO - 2010-09-14 13:37:36,111 - TestStringUtil.testBlockedCharactersRemoval(36) | Procesing from string directly: hello  here  and  there
INFO - 2010-09-14 13:37:36,126 - TestStringUtil.testBlockedCharactersRemoval(37) | Procesing from file to string:  hello \u003c here  and  there

I am VERY confused: as you can see, the code properly strips out '<', '>', and '\u003c' if I pass a string containing these values, but it fails to strip out '\u003c' if I read from a file containing the same string.

My questions, so that I stop losing hair over it, are:

  1. Why do I get this behavior?
  2. How can I change my code to properly strip \u003c in all cases?

Thanks

1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 10:50 pm

    When you compile your source file, the very first thing that happens, before any lexing or parsing, is that the Unicode escapes \u003C and \u003E get converted to the actual characters < and >. So your code is really:

    return data.replaceAll("(?i)[<|>|<|>]", "");
    

    When you compile the code for the test against the string literal, the same thing happens; the test string that you wrote as:

    "a < b > c\u003e\u003E\u003c\u003C"
    

    …is really:

    "a < b > c>><<"
    
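
As a rough illustration of that compile-time translation, the two literals below should be the same string by the time the program runs. The class and method names are made up for this example and are not part of the original code.

```java
final class EscapeTranslationDemo {
    // Written with Unicode escapes; the compiler replaces each escape
    // with the character it denotes before the literal is tokenized.
    static String withEscapes() {
        return "a < b > c\u003e\u003E\u003c\u003C";
    }

    // The same text written with the actual characters.
    static String withLiterals() {
        return "a < b > c>><<";
    }

    public static void main(String[] args) {
        System.out.println(withEscapes().equals(withLiterals())); // prints true
        System.out.println(withEscapes().length());               // prints 13
    }
}
```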

    But when you read the test string from a file, no such conversion occurs; you end up trying to match the six-character sequence \u003c with the single character, <. If you really want to match \u003C and \u003E, your code should look like this:

    return data.replaceAll("(?i)(?:<|>|\\\\u003C|\\\\u003E)", "");
    
    • If you use one backslash, the Java compiler interprets it as a Unicode escape and converts it to < or >.

    • If you use two backslashes, the regex compiler interprets it as a Unicode escape and thinks you want to match a < or >.

    • If you use three backslashes, the Java compiler turns it into \< or \>, the regex compiler ignores the backslash, and it tries to match < or >.

    • So, to match a raw Unicode escape sequence, you have to use four backslashes to match the one backslash in the escape sequence.
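
The four cases above can be checked with a small sketch like the following. The class name, the RAW constant, and the matches helper are hypothetical, introduced only for this demonstration; RAW stands in for the six-character text as it would arrive from a file.

```java
import java.util.regex.Pattern;

final class BackslashLevelsDemo {
    // Simulates text read from a file: a backslash followed by u003c (six characters).
    static final String RAW = "\\u003c";

    // True if the whole input matches the regex, ignoring case as in the answer's pattern.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        // One backslash: the Java compiler has already turned the escape into "<".
        System.out.println(matches("\u003C", "<") + " " + matches("\u003C", RAW));       // true false
        // Two backslashes: the regex engine reads the escape itself and matches "<".
        System.out.println(matches("\\u003C", "<") + " " + matches("\\u003C", RAW));     // true false
        // Three backslashes: the regex is a backslash plus "<", which still matches "<".
        System.out.println(matches("\\\u003C", "<") + " " + matches("\\\u003C", RAW));   // true false
        // Four backslashes: the regex matches a literal backslash followed by u003C.
        System.out.println(matches("\\\\u003C", "<") + " " + matches("\\\\u003C", RAW)); // false true
    }
}
```

Only the four-backslash form matches the raw text, and it no longer matches a plain <, which is why the corrected pattern lists both alternatives.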

    Notice that I changed your brackets, too. [<|>] is a character class that matches <, | or >; what you want is an alternation.
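
Putting this together, a minimal sketch of the corrected method might look like the following. The class name StringUtilSketch and the precompiled BLOCKED constant are assumptions for the example, and the file contents are simulated with a string literal rather than read from disk.

```java
import java.util.regex.Pattern;

final class StringUtilSketch {
    // Matches a literal '<' or '>', or the six-character text made of a
    // backslash followed by u003C or u003E (case-insensitive).
    private static final Pattern BLOCKED =
            Pattern.compile("(?i)(?:<|>|\\\\u003C|\\\\u003E)");

    static String removeBlockedCharacters(String data) {
        if (data == null) {
            return null;
        }
        return BLOCKED.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        // The compiler turns the escape in this literal into '<'.
        String fromLiteral = "hello \u003c here < and > there";
        // The doubled backslash keeps the escape as raw text, as a file would.
        String fromFile = "hello \\u003c here < and > there";

        System.out.println(removeBlockedCharacters(fromLiteral)); // hello  here  and  there
        System.out.println(removeBlockedCharacters(fromFile));    // hello  here  and  there
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.replaceAll would otherwise do.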
