I have some corrupted Gzip log files that I’m trying to restore. The files were transfered to our servers through a Java backed web page. The files have always been sent as plain text, but we recently started to receive log files that were Gzipped. These Gzipped files appear to be corrupted, and are not unzip-able, and the originals have been deleted. I believe this is from the character encoding in the method below.
Is there any way to revert the process to restore the files to their original zipped format? I have the resulting Strings binary array data in a database blob.
Thanks for any help you can give!
private String convertStreamToString(InputStream is) throws IOException {
/*
* To convert the InputStream to String we use the
* Reader.read(char[] buffer) method. We iterate until the
* Reader return -1 which means there's no more data to
* read. We use the StringWriter class to produce the string.
*/
if (is != null) {
Writer writer = new StringWriter();
char[] buffer = new char[1024];
try {
Reader reader = new BufferedReader(
new InputStreamReader(is, "UTF-8"));
int n;
while ((n = reader.read(buffer)) != -1) {
writer.write(buffer, 0, n);
}
} finally {
is.close();
}
return writer.toString();
} else {
return "";
}
}
If this is the method that was used to convert the
InputStreamto aString, then your data is almost certainly lost.The problem is that UTF-8 has quite a few byte sequences that are simply not legal (i.e. they don’t represent any value). These sequences will be replaced with the Unicode replacement character.
That character is the same no matter which invalid byte sequence was decoded. Therefore the specific information in those bytes is lost.