I am curling a website and writing the response to a .json file. That file is the input to my Java code, which parses it with a JSON library and writes the data I need to a CSV file, which I later load into a database.
As you know, data coming from a website can be in different encodings, so I make sure that I read and write in UTF-8, but I still get the wrong output.
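For the last step of that pipeline, the CSV output side, here is a minimal sketch of writing the file with an explicitly named charset instead of the platform default. `writeCsv` is a hypothetical helper, not the asker's code, and the quoting is deliberately simple (every field quoted, embedded quotes doubled); the rows would come from whatever the JSON library produced.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CsvUtf8Writer {

    // Hypothetical helper: write rows to a CSV file, encoding the text
    // explicitly as UTF-8 rather than with the platform default charset.
    // Every field is wrapped in double quotes and embedded quotes are doubled.
    static void writeCsv(Path path, List<List<String>> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.size(); i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append('"').append(row.get(i).replace("\"", "\"\"")).append('"');
                }
                // Use "\n" explicitly so the output does not depend on the OS line separator.
                out.write(line.append('\n').toString());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("export", ".csv");
        // \u00D8 is the O-with-stroke letter; the escape keeps this source file ASCII-only.
        List<List<String>> rows = Collections.singletonList(Arrays.asList("1", "\u00D8sterriksk"));
        writeCsv(csv, rows);
        System.out.println("Wrote " + Files.size(csv) + " bytes to " + csv);
    }
}
```

Using a Unicode escape in the string literal avoids a second, easy-to-miss encoding dependency: how `javac` itself decodes non-ASCII characters in the source file.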
For example, Østerriksk becomes �sterriksk.
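The � symptom can be reproduced in isolation by encoding a string with one charset and decoding the bytes with another, which is what a reader set up for the wrong charset does. This is a small illustrative sketch; `misdecode` is a hypothetical helper name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text with the charset it was "written" in, then decode the bytes
    // with a different charset, as a reader configured for the wrong charset would.
    static String misdecode(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "\u00D8sterriksk"; // the word from the question

        // ISO-8859-1 bytes decoded as UTF-8: the single byte 0xD8 is not a valid
        // UTF-8 sequence here, so it is replaced with U+FFFD.
        String latin1ReadAsUtf8 = misdecode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // UTF-8 bytes decoded as ISO-8859-1: the two bytes of the first letter
        // (0xC3 0x98) turn into two separate characters.
        String utf8ReadAsLatin1 = misdecode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(latin1ReadAsUtf8.equals("\uFFFDsterriksk"));       // true
        System.out.println(utf8ReadAsLatin1.equals("\u00C3\u0098sterriksk")); // true
    }
}
```

Only the first direction yields the � seen above; a UTF-8 file read as ISO-8859-1 typically shows up as Ã followed by a stray control character instead. That makes the exact garbling a useful hint about which way round the mismatch is.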
I am doing all this on Linux. I think there is an encoding problem, because the same code runs fine on Windows but not on Unix/Linux.
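Code that behaves differently on Windows and Linux usually depends on the JVM's default charset somewhere. Before JDK 18 (JEP 400) that default came from the OS: often windows-1252 on Windows, and on Linux whatever the locale (`LANG`/`LC_ALL`) says, which is UTF-8 on most modern setups but plain ASCII under the C/POSIX locale. Since JDK 18 it is UTF-8 everywhere by default. A quick diagnostic sketch to run on both machines:

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Describe the JVM's default charset: the one used by FileReader,
    // FileWriter, String.getBytes() and new String(byte[]) when no charset
    // is passed explicitly.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // Run this on both machines and compare the output.
        System.out.println(describe());
    }
}
```

If the two machines print different charsets, any place in the code that omits an explicit charset is a candidate for the bug.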
I am quite sure my Java code is correct, but I am not able to find out what I’m doing wrong.
You’re reading the data as ISO 8859-1, but the file is actually UTF-8. Pass the charset explicitly when you create the file reader instead of relying on the platform default, which can differ between Windows and Linux; that should solve it.
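A minimal sketch of that fix, assuming the JSON is read into a String before being handed to the parser: instead of `new FileReader(path)`, which decodes with the platform default, wrap a `FileInputStream` in an `InputStreamReader` and name the charset. `readAll` and `data.json` are placeholders. On Java 11 and later, `new FileReader(path, StandardCharsets.UTF_8)` does the same thing.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadJsonUtf8 {

    // Hypothetical helper: read the downloaded JSON file into a String.
    // new FileReader(path) would decode with the platform default charset;
    // an InputStreamReader over a FileInputStream lets us name the charset.
    static String readAll(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: point this at the file curl produced.
        String json = readAll(args.length > 0 ? args[0] : "data.json");
        System.out.println(json.length() + " characters read");
    }
}
```

If the file turns out to be ISO-8859-1 rather than UTF-8, the same code works with `StandardCharsets.ISO_8859_1` in place of `UTF_8`; the point is that the charset is stated rather than inherited from the OS.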
Also, curl doesn’t care about encodings; it writes the response bytes to the file as received. The problem is really in your Java code.
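Because curl stores the raw bytes, one way to settle which encoding the .json file is really in is to run it through a strict UTF-8 decoder. This is a sketch; `isValidUtf8` is a hypothetical helper, and `data.json` is a placeholder path.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if the bytes form valid UTF-8. The decoder is set to REPORT,
    // so malformed input throws instead of being silently replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: point this at the file curl produced.
        byte[] bytes = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "data.json"));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8 (maybe ISO-8859-1?)");
    }
}
```

A failure proves the file is not UTF-8. A pass is strong evidence that it is, though a pure-ASCII file also passes, since ASCII is a subset of UTF-8.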