I’m building a tar.gz archive using Ant:
<tar destfile="${linux86.zip.file}" compression="gzip" longfile="gnu">
<tarfileset dir="${work.dir}/data" dirmode="755" filemode="755"
prefix="${app.folder}/data"/>
</tar>
The archive is built on Windows. After it is extracted on Ubuntu 12, files whose names contain non-Latin (for example, Cyrillic) characters have broken names.
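To make the symptom concrete, here is a small Java sketch (not taken from Ant) of what happens when an entry name is encoded with one charset and decoded with another. It assumes two things the question does not confirm. First, the writer used windows-1251, a common Cyrillic code page on Windows. Second, the extracting side interprets names as UTF-8, the usual locale on Ubuntu. The class name `NameMismatch` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: an entry name written with one charset and read
 * with another. ASSUMPTION: windows-1251 stands in for the Windows side and
 * UTF-8 for the Ubuntu side; Ant's Tar task itself is not used here.
 */
public class NameMismatch {

    /** Encode the name as a writer would, then decode it as a reader would. */
    static String roundTrip(String name, Charset writer, Charset reader) {
        byte[] raw = name.getBytes(writer);
        return new String(raw, reader);
    }

    public static void main(String[] args) {
        // The Cyrillic word for "data" followed by ".txt", written with
        // Unicode escapes so the source file stays plain ASCII.
        String name = "\u0434\u0430\u043d\u043d\u044b\u0435.txt";
        Charset cp1251 = Charset.forName("windows-1251");

        // Mismatched charsets: the cp1251 bytes are not valid UTF-8,
        // so the decoder substitutes replacement characters.
        String broken = roundTrip(name, cp1251, StandardCharsets.UTF_8);
        System.out.println(broken.equals(name)); // false

        // Same charset on both sides: the name survives unchanged.
        String ok = roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(name)); // true
    }
}
```

The sketch only models a charset mismatch. Whatever Ant actually does internally when no encoding is specified, the fix described below works by making both sides agree on one encoding.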
Is there any way to fix or work around that?
I found some relevant information on Ant’s developer mailing list (30 Jun 2009, 01 Jul 2009) and in ASF Bugzilla (issues 36851 and 53811). The problem is old and well known. It has not been fixed, mainly for ideological reasons: not all untar implementations support an encoding for entry names.
The patch mentioned in the Bugzilla issue was applied in revision 1350857. It adds a constructor to Ant's tar code that takes the name of the encoding to use for entry names.
However, the Tar task never uses that constructor. So I added an encoding attribute to the Tar task, rebuilt Ant from the modified sources, and set UTF-8 as the encoding of entry names.
I tested extraction on Ubuntu 11, Ubuntu 12 and Mandriva.
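Here is a rough sketch of what "UTF-8 as the encoding of entry names" means at the byte level. The hypothetical helper `TarNameField` below is not Ant's code and not the patch from revision 1350857. It writes a name into the 100-byte name field of a POSIX ustar header using an explicit charset, then reads it back with the same charset. Each Cyrillic character takes two bytes in UTF-8, so non-Latin paths reach the 100-byte limit sooner. In the task above, names too long for that field are handled according to the `longfile` attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Ant's implementation): encode a tar entry name
 * with an explicit charset instead of relying on a platform default.
 */
public class TarNameField {

    /** Size of the name field in a POSIX ustar header. */
    static final int NAME_FIELD_LENGTH = 100;

    /**
     * Encodes the name with the given charset into a NUL-padded 100-byte field.
     * Throws if the encoded name does not fit; a real archiver would switch to
     * a long-name extension instead (compare Ant's longfile attribute).
     */
    static byte[] encodeNameField(String name, Charset charset) {
        byte[] encoded = name.getBytes(charset);
        if (encoded.length > NAME_FIELD_LENGTH) {
            throw new IllegalArgumentException(
                "name needs " + encoded.length + " bytes, field holds " + NAME_FIELD_LENGTH);
        }
        // Arrays.copyOf pads the remaining bytes with zeros (NUL).
        return Arrays.copyOf(encoded, NAME_FIELD_LENGTH);
    }

    /** Reads the field back up to the first NUL, using the given charset. */
    static String decodeNameField(byte[] field, Charset charset) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, charset);
    }

    public static void main(String[] args) {
        // "app/data/" + the Cyrillic word for "data" + ".txt", as ASCII escapes.
        String name = "app/data/\u0434\u0430\u043d\u043d\u044b\u0435.txt";
        byte[] field = encodeNameField(name, StandardCharsets.UTF_8);
        System.out.println(decodeNameField(field, StandardCharsets.UTF_8).equals(name)); // true
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 25
    }
}
```

The 19-character path needs 25 bytes in UTF-8. The round trip works only because reader and writer agree on UTF-8, which is what the patched `encoding` attribute arranges on the writing side.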