I am trying to write a program that reads every file in a folder and writes all of their contents into one single file. The files are gzip-compressed, with the .gz extension. So far I have managed to read one file, but not all of its contents, and none of the other files. Here is my code:
import glob, gzip, re
import pickle
filed = open('Logs.txt', 'w')
logfilenames = glob.glob('*.gz')
logformat = re.compile(r'^\S+ \S+ \S+ \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) .*" (\d+) (\d+) "([^"]*)" "[^"]*"')
with gzip.GzipFile(logfilenames[0],'r') as f:
    for i in glob.glob('*.gz'):
        txtline = f.readline()
        parsedline = logformat.match(txtline)
        print "time={t} size={s} url={u}".format(t=parsedline.group(1), s=parsedline.group(5), u=parsedline.group(3))
        pickle.dump(["time={t} size={s} url={u}".format(t=parsedline.group(1), s=parsedline.group(5), u=parsedline.group(3))],filed)
filed.close()
Try this (didn’t touch your regular expression):
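A minimal sketch of one way to do it, written for Python 3 rather than the Python 2 of the question, and not necessarily the code originally posted here. The code in the question only ever opens `logfilenames[0]` and calls `readline()` once per filename, so it reads as many lines from the first file as there are files in the folder. The sketch opens each file in turn and iterates over all of its lines. The function name `combine_logs`, its parameters, the UTF-8 decoding, skipping lines that do not match, and writing plain text instead of using `pickle` are all assumptions made for illustration; the regular expression is unchanged.

```python
import glob
import gzip
import re

# Regular expression copied unchanged from the question.
LOGFORMAT = re.compile(r'^\S+ \S+ \S+ \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) .*" (\d+) (\d+) "([^"]*)" "[^"]*"')


def combine_logs(pattern='*.gz', out_path='Logs.txt'):
    """Parse every line of every gzip file matching `pattern` and write
    one summary line per matching log entry to `out_path`.

    Hypothetical helper: the name and parameters are not from the question.
    Returns the number of lines written.
    """
    written = 0
    with open(out_path, 'w') as out:
        # Outer loop: one iteration per compressed file.
        for name in sorted(glob.glob(pattern)):
            # 'rt' decompresses and decodes to text (UTF-8 is an assumption).
            with gzip.open(name, 'rt', encoding='utf-8', errors='replace') as f:
                # Inner loop: every line of the current file, not just one.
                for line in f:
                    parsed = LOGFORMAT.match(line)
                    if parsed is None:
                        # Assumption: lines that do not match are skipped.
                        continue
                    out.write("time={t} size={s} url={u}\n".format(
                        t=parsed.group(1),
                        s=parsed.group(5),
                        u=parsed.group(3)))
                    written += 1
    return written
```

Calling `combine_logs()` from the folder that holds the logs would write the parsed lines to `Logs.txt`. If the pickled format matters, the output file would need to be opened in binary mode (`'wb'`) and read back with one `pickle.load` call per dumped record.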