I need to automate R to read a CSV data file that is inside a zip file.
For example, I would type:
read.zip(file = "myfile.zip")
And internally, what would be done is:
- Unzip myfile.zip to a temporary folder
- Read the only file contained in it using read.csv
If there is more than one file inside the zip file, an error is thrown.
My problem is getting the name of the file contained in the zip file, in order to pass it to the read.csv command. Does anyone know how to do it?
UPDATE
Here’s the function I wrote based on @Paul's answer:
read.zip <- function(zipfile, row.names=NULL, dec=".") {
  # Create a name for the dir where we'll unzip
  zipdir <- tempfile()
  # Create the dir using that name
  dir.create(zipdir)
  # Unzip the file into the dir
  unzip(zipfile, exdir=zipdir)
  # Get the files in the dir
  files <- list.files(zipdir)
  # Throw an error if there's more than one
  if(length(files)>1) stop("More than one data file inside zip")
  # Get the full name of the file
  file <- file.path(zipdir, files[1])
  # Read the file; the arguments must be named, otherwise they are matched
  # positionally to read.csv's header and sep parameters
  read.csv(file, row.names=row.names, dec=dec)
}
Since I’ll be working with more files inside the tempdir(), I created a new dir inside it, so I don’t get confused with the files. I hope it may be useful!
You can use unzip to unzip the file. I just mention this as it is not clear from your question whether you knew that. As for reading the file: once you have extracted it to a temporary directory (?tempdir), just use list.files to find the files that were dumped there. In your case this is just one file, the file you need. Reading it using read.csv is then quite straightforward, assuming your tempdir location is stored in temp_path.
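The steps described in this answer might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name read_single_csv_from_zip and the variable extracted are made up for this example, temp_path is created here as a fresh subdirectory of tempdir(), and the cleanup via on.exit is an addition that the posts above do not include.

```r
# Hypothetical helper (name chosen for illustration): extract a zip archive
# into a fresh temporary directory and read the single CSV file it contains.
read_single_csv_from_zip <- function(zipfile, ...) {
  # Fresh subdirectory of tempdir(), so unrelated temp files are not listed
  temp_path <- tempfile(pattern = "unzipped_")
  dir.create(temp_path)
  # Assumption: the extracted copy is not needed after reading, so remove it
  on.exit(unlink(temp_path, recursive = TRUE), add = TRUE)

  # Extract everything into temp_path
  unzip(zipfile, exdir = temp_path)

  # Full paths of all extracted files, including any inside subfolders
  extracted <- list.files(temp_path, full.names = TRUE, recursive = TRUE)
  if (length(extracted) != 1) {
    stop("Expected exactly one file in the zip, found ", length(extracted))
  }

  # Extra arguments (e.g. dec = ",") are passed on to read.csv by name
  read.csv(extracted[1], ...)
}
```

It would be called as read_single_csv_from_zip("myfile.zip"). If only the file name is needed, unzip("myfile.zip", list = TRUE) returns a data frame whose Name column lists the archive's contents without extracting anything.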