How to Remove the Emoticons from a String
My simple code is:
public static void main(String[] args) throws SQLException {
    String str = "My nam is ur -D ";
    getRefineCode(str);
}

private static void getRefineCode(String str) throws SQLException {
    List smstypeWord = getshortWord();
    for (int i = 0; i < smstypeWord.size(); i++) {
        // Each entry has the form "<full message>_<short text>"
        String string = smstypeWord.get(i).toString();
        String stringcon[] = string.split("_");
        String emessage = stringcon[0];
        String emoticon = stringcon[1].trim();
        if (str.contains(emoticon)) {
            str = str.replace(emoticon, emessage);
            System.out.println("=================>" + str);
        }
    }
    System.out.println("=======++==========>" + str);
}

private static List getshortWord() throws SQLException {
    // conn and shortMessage are assumed to be static fields declared elsewhere (not shown)
    String query1 = "SELECT * FROM englishSmsText";
    PreparedStatement ps = conn.prepareStatement(query1);
    ResultSet rs = ps.executeQuery();
    String f_message = "";
    String s_message = "";
    while (rs.next()) {
        s_message = rs.getString("message");
        f_message = rs.getString("short_text");
        shortMessage.add(s_message + "_" + f_message);
        //fullMessage.add(f_message);
    }
    return shortMessage;
}
My database is based on the http://smsdictionary.co.uk/abbreviations site.
I am not able to understand how to remove multiple abbreviations or short messages.
The output is like: My nam is You are SquintLaughtGrinisappGaspoooh!!shockedintedr, Big SmilGrinisappGaspoooh!!shockedinted, Grin
First of all, replace should be replaceAll. Note that replace already replaces every occurrence, but it only matches literal text, so it cannot restrict matches to whole words.
Second, you can reduce the number of false positives by matching only whole words. replaceAll accepts regular expressions, so you can use replaceAll("\\b" + emoticon + "\\b", emessage) to only replace abbreviations which are surrounded by word boundaries (whitespace, punctuation etc.).
However, with the dictionary you are using you will still replace KISS with Keep It Simple, Stupid. You will replace 86 with "out Of" Or "over" Or "to Get Rid Of"… Maybe you should be looking for a different approach.
Edit: I forgot you were looking for special characters. You should try a regex that suppresses special characters in the search string (and is more generous than the previous, too-strict \b pattern). It should cover most cases; I doubt there is any way to perfectly identify what is intended as an acronym and what is not.
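A minimal sketch of one way to do this, not necessarily the regex originally intended here. It assumes the short forms and their expansions have already been loaded into a map. The class name EmoticonReplacer, the method expand and the two sample entries are hypothetical and are not taken from the real dictionary. Pattern.quote makes the special characters of an emoticon literal, and the lookarounds require whitespace (or the start or end of the text) on both sides, which works for emoticons where \b does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmoticonReplacer {

    // Replaces each short form only where it stands alone, i.e. where it is
    // delimited by whitespace or by the start/end of the text.
    static String expand(String text, Map<String, String> dictionary) {
        String result = text;
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            // Pattern.quote treats characters such as ':' '-' ')' '*' literally.
            String regex = "(?<!\\S)" + Pattern.quote(entry.getKey()) + "(?!\\S)";
            // Matcher.quoteReplacement keeps '$' and '\' in the expansion literal.
            result = result.replaceAll(regex, Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample entries; the real ones would come from the database.
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("ur", "You are");
        dictionary.put(":-D", "Big Smile");
        System.out.println(expand("My nam is ur :-D ", dictionary));
    }
}
```

Because the entries are applied one after another, a later entry can still match inside an earlier expansion, and a short form directly followed by punctuation (for example "ur,") is not replaced by this pattern. Both are trade-offs of the sketch and may need adjusting for the real dictionary.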