I have a piece of code which basically translates English into text speak.
At the moment I am using the String.split() method with \\W as the delimiter, which removes all non-word characters.
As it stands this is what I get:
input: I hate text speak!:)
output: I h8 txt spk
Is there any way to avoid losing the delimiters?
EDIT: Here's the method that does the parsing. As it stands, it replaces each delimiter with a space, so at least it's still readable…
public static String engToText(String text) {
    text = text.toLowerCase();
    String translated = " ";
    // breaks string into tokens
    String[] tokens = text.split("\\W");
    for (int x = 0; x < tokens.length; x++) {
        if (wordMapEng.containsKey(tokens[x])) {
            translated += " " + wordMapEng.get(tokens[x]);
        } else {
            translated += " " + tokens[x];
        }
    }
    return translated.trim();
}
You can use the StringTokenizer class, which has a constructor taking a returnDelims flag. When that flag is true, iterating over the tokens gives you back the delimiters too.
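As a rough sketch of how that could look for the method in the question: the three-argument constructor StringTokenizer(String, String, boolean) returns each delimiter character as its own token when the last argument is true, so the delimiters can simply be appended back unchanged. Note that StringTokenizer takes a set of literal delimiter characters rather than a regex, so the DELIMS string below is only an assumed list and would need extending for other punctuation. The class name TextSpeak and the three map entries are hypothetical stand-ins for the asker's real wordMapEng.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class TextSpeak {
    // Hypothetical stand-in for the question's wordMapEng.
    static final Map<String, String> wordMapEng = new HashMap<>();
    static {
        wordMapEng.put("hate", "h8");
        wordMapEng.put("text", "txt");
        wordMapEng.put("speak", "spk");
    }

    // Assumed delimiter set: literal characters, not a regex like \\W.
    static final String DELIMS = " \t\n.,!?;:()'\"-";

    public static String engToText(String text) {
        StringBuilder translated = new StringBuilder();
        // The third argument (returnDelims = true) makes each delimiter
        // character come back as its own one-character token.
        StringTokenizer tokens = new StringTokenizer(text.toLowerCase(), DELIMS, true);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            // Words found in the map are translated; everything else,
            // including delimiters, is appended as-is.
            translated.append(wordMapEng.getOrDefault(token, token));
        }
        return translated.toString();
    }

    public static void main(String[] args) {
        System.out.println(engToText("I hate text speak!:)")); // prints "i h8 txt spk!:)"
    }
}
```

Because the delimiters are now preserved, there is no need to insert a space between tokens or to trim the result. The output is lowercase only because the original method lowercases its input first.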