ElasticSearch is a search Server which accepts data only in UTF8.
When i tries to give ElasticSearch following text
Small businesses potentially in line for a lighter reporting load include those with an annual turnover of less than £440,000, net assets of less than £220,000 and fewer than ten employees”
Through my java application – Basically my java application takes this info from a webpage , and gives it to elasticSearch. ES complaints it cant understand £ and it fails. After filtering through below code –
byte bytes[] = s.getBytes("ISO-8859-1");
s = new String(bytes, "UTF-8");
Here £ is converted to �
But then when I copy it to a file in my home directory using bash and it goes in fine. Any pointers will help.
You have ISO-8895-1 octets in
bytes, which you then tellStringto decode as if it were UTF-8. When it does that, it doesn’t recognize the illegal 0xA3 sequence and replaces it with the substitution character.To do this, you have to construct the string with the encoding it uses, then convert it to the encoding that you want. See How do I convert between ISO-8859-1 and UTF-8 in Java?.