I’m using the Apache Commons 1.4.1 library to extract “.tar” files.
Problem: I don’t need to extract all the files, only specific files from specific locations inside the tar archive. I need just a few .xml files, but the TAR file is around 300 MB, so extracting its entire content is a waste of resources.
I am stuck and unsure whether I have to do a nested directory comparison or whether there is some way around it.
Note: the location of the required .xml files is always the same.
The structure of the TAR is:
directory:E:\Root\data
file:E:\Root\datasheet.txt
directory:E:\Root\map
file:E:\Root\mapers.txt
directory:E:\Root\ui
file:E:\Root\ui\capital.txt
file:E:\Root\ui\info.txt
directory:E:\Root\ui\sales
file:E:\Root\ui\sales\Reqest_01.xml
file:E:\Root\ui\sales\Reqest_02.xml
file:E:\Root\ui\sales\Reqest_03.xml
file:E:\Root\ui\sales\Reqest_04.xml
directory:E:\Root\ui\sales\stores
directory:E:\Root\ui\stores
directory:E:\Root\urls
directory:E:\Root\urls\fullfilment
file:E:\Root\urls\fullfilment\Cams_01.xml
file:E:\Root\urls\fullfilment\Cams_02.xml
file:E:\Root\urls\fullfilment\Cams_03.xml
file:E:\Root\urls\fullfilment\Cams_04.xml
directory:E:\Root\urls\fullfilment\profile
directory:E:\Root\urls\fullfilment\registration
file:E:\Root\urls\options.txt
directory:E:\Root\urls\profile
Constraint: I can’t use JDK 7 and have to stick with the Apache Commons library.
My current solution:
public static void untar(File[] files) throws Exception {
    String path = files[0].toString();
    File tarPath = new File(path);
    // Directory that holds the archive; entries are extracted next to it.
    String pathWithoutName = path.substring(0, path.lastIndexOf(tarPath.getName()));
    TarEntry entry;
    TarInputStream inputStream = null;
    try {
        inputStream = new TarInputStream(new FileInputStream(tarPath));
        byte[] buffer = new byte[1024];
        while (null != (entry = inputStream.getNextEntry())) {
            System.out.println("tarpath:" + tarPath.getName());
            System.out.println("Entry:" + entry.getName());
            System.out.println("pathname:" + pathWithoutName);
            if (entry.isDirectory()) {
                // mkdirs() also creates any missing parent directories.
                new File(pathWithoutName + entry.getName()).mkdirs();
                continue;
            }
            FileOutputStream outputStream = new FileOutputStream(pathWithoutName + entry.getName());
            try {
                int bytesRead;
                while ((bytesRead = inputStream.read(buffer, 0, buffer.length)) > -1) {
                    outputStream.write(buffer, 0, bytesRead);
                }
            } finally {
                // Close each extracted file before moving to the next entry.
                outputStream.close();
            }
            System.out.println("Extracted " + entry.getName());
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
    }
}
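The TAR file format is designed to be written or read as a stream (i.e., to/from a tape drive), and it has no central directory: each entry is just a header followed by its data. So no, there’s no way around reading through the archive sequentially to find individual entries, although you only have to write out the entries you actually want and can skip over the rest.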
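To illustrate why the scan has to be sequential, here is a minimal sketch in plain Java (written so it should compile on JDK 6) that walks the 512-byte headers of an uncompressed tar stream and keeps only the entries whose names start with a wanted prefix. The class name TarScan, its methods, and the prefix "Root/ui/sales/" are hypothetical. It ignores checksums, the ustar prefix field, and GNU/PAX long names, so treat it as a picture of the format rather than a replacement for a real tar library.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of any library.
class TarScan {
    static final int BLOCK = 512; // tar headers and data are padded to 512 bytes

    /** Returns the contents of regular-file entries whose name starts with any prefix. */
    static Map<String, byte[]> extractMatching(InputStream in, String... prefixes)
            throws IOException {
        DataInputStream data = new DataInputStream(in);
        Map<String, byte[]> found = new LinkedHashMap<String, byte[]>();
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without the usual zero-block terminator
            }
            if (header[0] == 0) {
                break; // an all-zero block marks the end of the archive
            }
            String name = field(header, 0, 100);
            String sizeText = field(header, 124, 12).trim(); // size is stored in octal
            long size = sizeText.isEmpty() ? 0 : Long.parseLong(sizeText, 8);
            long padded = (size + BLOCK - 1) / BLOCK * BLOCK;
            boolean regular = header[156] == '0' || header[156] == 0;
            if (regular && matches(name, prefixes)) {
                byte[] body = new byte[(int) size]; // assumes small files, e.g. XML
                data.readFully(body);
                found.put(name, body);
                skipFully(data, padded - size); // skip the padding after the data
            } else {
                skipFully(data, padded); // unwanted entry: skip its data entirely
            }
        }
        return found;
    }

    static boolean matches(String name, String[] prefixes) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Reads a NUL-terminated ASCII field from a header block. */
    static String field(byte[] block, int offset, int length) throws IOException {
        int end = offset;
        while (end < offset + length && block[end] != 0) {
            end++;
        }
        return new String(block, offset, end - offset, "US-ASCII");
    }

    /** InputStream.skip may skip fewer bytes than asked, so loop until done. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("truncated archive");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

A call might look like TarScan.extractMatching(new FileInputStream("archive.tar"), "Root/ui/sales/", "Root/urls/fullfilment/"), where "archive.tar" is a placeholder. With a tar library, the same prefix check would go inside the loop over entries, so that only matching entries are written to disk.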
If you want random access, you should use the ZIP format and open the archive using the JDK’s ZipFile. Assuming that you have enough virtual memory, the file will be memory-mapped, making random access very fast (I haven’t checked whether it falls back to a random-access file if it is unable to memory-map).
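As a rough sketch of the ZIP alternative, the following uses only java.util.zip and avoids JDK 7 syntax. ZipFile.getEntry looks the entry up by name through the archive's central directory, so the other entries are never read. The class name ZipLookup and the entry name in the usage note are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: not part of any library.
class ZipLookup {
    /** Reads a single named entry from a ZIP file, or returns null if it is absent. */
    static byte[] readEntry(File zipPath, String entryName) throws IOException {
        ZipFile zip = new ZipFile(zipPath);
        try {
            ZipEntry entry = zip.getEntry(entryName); // direct lookup, no scan
            if (entry == null) {
                return null;
            }
            InputStream in = zip.getInputStream(entry);
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            } finally {
                in.close();
            }
        } finally {
            zip.close();
        }
    }
}
```

For example, ZipLookup.readEntry(new File("archive.zip"), "Root/ui/sales/Reqest_01.xml") would return just that one file's bytes, with "archive.zip" standing in for a re-packed version of the archive.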