Consider 3 regex expressions designed to remove non-Latin characters from the string

Asked: May 29, 2026

Consider three regular expressions designed to remove non-Latin characters from a string:

    public class RegexTiming {
        public static void main(String[] args) {
            String x = "some†¥¥¶¶ˆ˚˚word";

            long now = System.nanoTime();
            System.out.println(x.replaceAll("[^a-zA-Z]", ""));     // 5ms
            System.out.println(System.nanoTime() - now);           // elapsed nanoseconds

            now = System.nanoTime();
            System.out.println(x.replaceAll("[^a-zA-Z]+", ""));    // 2ms
            System.out.println(System.nanoTime() - now);

            now = System.nanoTime();
            System.out.println(x.replaceAll("[^a-zA-Z]*", ""));    // <1ms
            System.out.println(System.nanoTime() - now);
        }
    }

All three produce the same result, but with vastly different performance.

Why is that?

1 Answer

Editorial Team · Answer 1 · 2026-05-29T06:09:09+00:00

The first one is slower because the regex matches each non-Latin character individually, so replaceAll performs a separate replacement for every character.
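
A quick way to see the difference described here is to list the matches each pattern produces, since String.replaceAll performs one substitution per match reported by Matcher.find(). The sketch below is illustrative only: the class name MatchCount and the helper matchSpans are hypothetical, and the input string is the one from the question written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCount {

    // The string from the question, written with Unicode escapes so the
    // source compiles regardless of the platform's default encoding.
    static final String INPUT = "some\u2020\u00A5\u00A5\u00B6\u00B6\u02C6\u02DA\u02DAword";

    // Collects every match Matcher.find() reports, including zero-length
    // ones, as "[start,end)" strings. String.replaceAll substitutes the
    // replacement at exactly these matches.
    static List<String> matchSpans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] regexes = {"[^a-zA-Z]", "[^a-zA-Z]+", "[^a-zA-Z]*"};
        for (String regex : regexes) {
            List<String> spans = matchSpans(regex, INPUT);
            System.out.println(regex + " -> " + spans.size() + " matches " + spans
                    + ", result: " + INPUT.replaceAll(regex, ""));
        }
    }
}
```

For this input, [^a-zA-Z] yields eight one-character matches and [^a-zA-Z]+ yields a single match covering indices 4 to 12; the list for [^a-zA-Z]* also contains zero-length entries such as [0,0).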

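The other two patterns match the whole run of non-Latin characters at once, so replaceAll can replace the entire sequence in one step. ([^a-zA-Z]* can also match the empty string, so it additionally produces zero-length matches around the letters; replacing those with "" changes nothing, which is why its output is the same.) I can't explain the performance difference between these two, though. It probably has something to do with how the regex engine handles * versus +.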
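
One factor worth ruling out (a possibility, not something established above): the timings in the question come from a single System.nanoTime() measurement per pattern, always in the same order, so the first measurement can include one-time costs such as class loading, running in the interpreter before JIT compilation, and the first System.out.println. A fairer comparison precompiles each Pattern, warms it up, and averages over many calls. The sketch below does this by hand; the class name, the iteration counts and the blackhole field are illustrative assumptions, and for serious measurements a harness such as JMH is the usual tool.

```java
import java.util.regex.Pattern;

public class RegexWarmupBenchmark {

    // The string from the question, written with Unicode escapes.
    static final String INPUT = "some\u2020\u00A5\u00A5\u00B6\u00B6\u02C6\u02DA\u02DAword";

    // Written after each timed loop so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // Calls replaceAll on a precompiled pattern `iterations` times (must be > 0)
    // and returns the average time per call in nanoseconds.
    static long averageNanos(Pattern pattern, String input, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += pattern.matcher(input).replaceAll("").length();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        String[] regexes = {"[^a-zA-Z]", "[^a-zA-Z]+", "[^a-zA-Z]*"};
        // Illustrative iteration counts; not tuned for any particular machine.
        int warmupIterations = 50_000;
        int measuredIterations = 200_000;

        // Warm-up pass: results are discarded; this only gives the JVM a
        // chance to load classes and JIT-compile the hot paths first.
        for (String regex : regexes) {
            averageNanos(Pattern.compile(regex), INPUT, warmupIterations);
        }

        // Measured pass: each pattern is compiled once and timed over many calls.
        for (String regex : regexes) {
            long nanos = averageNanos(Pattern.compile(regex), INPUT, measuredIterations);
            System.out.printf("%-12s ~%d ns per replaceAll%n", regex, nanos);
        }
    }
}
```

The absolute numbers will vary by JVM and hardware; the point of the sketch is only that all three patterns are compared after the same warm-up rather than in a fixed first/second/third order.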
