I need to split Java Strings at any " character.
The catch is that the character before it must not be a backslash (\).
So these Strings would split like so:
asdnaoe"asduwd"adfdgb => asdnaoe, asduwd, adfdgb
addfgmmnp"fd asd\"das"fsfk => addfgmmnp, fd asd\"das, fsfk
Is there any easy way to achieve this using regular expressions?
(I use RegEx because it is easiest for me, the coder. Also performance is not an issue…)
Thank you in advance.
I solved it like this:
    private static String[] split(String s) {
        char[] cs = s.toCharArray();

        // First pass: count the unescaped quotes to size the result array.
        int n = 1;
        for (int i = 0; i < cs.length; i++) {
            if (isUnescapedQuote(cs, i))
                n += 1;
        }

        // Second pass: copy out the text between unescaped quotes.
        String[] result = new String[n];
        int lastBreakPos = 0;
        int index = 0;
        for (int i = 0; i < cs.length; i++) {
            if (isUnescapedQuote(cs, i)) {
                result[index] = new String(cs, lastBreakPos, i - lastBreakPos);
                index += 1;
                lastBreakPos = i + 1;
            }
        }
        // Tail segment: everything from the last split point to the end.
        // (Using cs.length - (lastBreakPos + 1) here would drop the final
        // character and fail when the input ends with a quote.)
        result[index] = new String(cs, lastBreakPos, cs.length - lastBreakPos);
        return result;
    }

    // A quote is unescaped when it is preceded by an even number of backslashes.
    private static boolean isUnescapedQuote(char[] cs, int i) {
        if (cs[i] != '"')
            return false;
        int sn = 0;
        for (int j = i - 1; j >= 0 && cs[j] == '\\'; j--)
            sn += 1;
        return sn % 2 == 0;
    }
Anyways, thanks for all your great responses!
(Oh, and despite this, I will be using either @biziclop's or @Alan Moore's version, as they're shorter and probably more efficient! =)
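For comparison, the backslash-counting rule in the method above can also be sketched as a single left-to-right pass: each backslash is consumed together with the character it escapes, so a quote only counts as a split point when it is reached outside such a pair. This is a rough alternative under that assumption, not one of the versions mentioned above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteSplitter {
    // Splits s at every double quote that is not escaped by a backslash.
    // Escaped characters (including \\ and \") are kept verbatim in the output.
    static List<String> splitOnUnescapedQuotes(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                // Copy the backslash and the character it escapes as one unit.
                current.append(c).append(s.charAt(++i));
            } else if (c == '"') {
                // Unescaped quote: close the current segment.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The tail after the last split point (may be empty).
        parts.add(current.toString());
        return parts;
    }
}
```

Because an escaped backslash is swallowed as a pair, a quote that follows it is treated as a real split point, which matches the even/odd counting in the array-based version.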
Sure, just use
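A negative lookbehind fits this description: split on a quote that is not immediately preceded by a backslash. A minimal Java sketch, assuming the pattern (?<!\\)" is what is meant here (the class name and constant are hypothetical):

```java
final class LookbehindSplit {
    // Regex (?<!\\)" : a double quote NOT immediately preceded by a backslash.
    // The backslash is doubled once for the regex and once more for the Java literal.
    private static final String UNESCAPED_QUOTE = "(?<!\\\\)\"";

    static String[] split(String s) {
        // A limit of -1 keeps trailing empty segments, e.g. when s ends with a quote.
        return s.split(UNESCAPED_QUOTE, -1);
    }
}
```

With this pattern, a quote preceded by a single backslash is left alone, but so is a quote preceded by an escaped backslash, since the lookbehind inspects only one character.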
Quick PowerShell test:
However, this won't split on \\" (an escaped backslash followed by a normal quote, at least under most C-like languages' escaping rules). You cannot really solve that in Java with a lookbehind alone, though, as arbitrary-length lookbehind isn't supported. Usually you would expect a proper solution to split on the remaining " because it isn't really escaped.