I have written a Python script for merging many data files in a few different ways. This is my first Python script, really my first attempt at OOP, and I suspect that I’ve been thinking about objects and classes in a way that’s functional, but not optimal.
I created a class for the source files and a subclass for lines in source files that are records. Now, with my new understanding that everything in Python is an object, I suspect that I’ve created unnecessary complexity by writing a class for files when a built-in type already exists and I’m already using it every time I open a file.
Unfortunately it is not clear to me from the documentation how I would assign new attributes, methods, and subclasses to the built-in type for files. I also do not understand how the file datatype may differ from a class; I simply understand both as “factories” for creating objects with particular properties.
class SrcFile:
    def __init__(self, which):
        self.name = which
        self.terminals = set()

    def <a few methods>(self):
        with open(self.name) as file:
            <do some stuff and return something>

class Record(SrcFile):
    <methods>

for file in files:
    file = SrcFile(file)
    if <conditions on values from SrcFile methods>:
        with open(file.name) as file:
            for line in file:
                if <regexp match>:
                    record = Record(line)
                    <apply Record() methods>
                    <write to tempfiles>
<merge tempfiles to stdout>
pro tip: you don’t.
(there might be situations where you could consider tinkering with the built-in file type, but that would be overkill for your current problem)
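As for how the file datatype differs from a class: it doesn't, really. A minimal sketch, assuming Python 3, where open() returns an object from the io module (the helper name describe_file_type and the temporary file are made up purely for illustration):

```python
import io
import os
import tempfile


def describe_file_type():
    """Open a throwaway text file and report what kind of object open() gave us."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")  # hypothetical scratch file
        with open(path, "w") as handle:
            # type(handle) is an ordinary class; isinstance works as usual.
            return type(handle), isinstance(handle, io.TextIOBase)


file_type, is_text_stream = describe_file_type()
print(file_type.__name__, is_text_stream)
```

So the object you get back from open() is an instance of a class like any other; the class just ships with the interpreter instead of living in your script.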
Looking at the last part of your example, it seems we could throw away your Record and SrcFile classes and rewrite it like this:
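A minimal sketch of that idea, under some loud assumptions: the bodies of check_conditions and convert_record, the RECORD_PATTERN regexp, and the merge wrapper are placeholders I made up, since your question only describes them in angle brackets. It also writes straight to the output stream and leaves out the tempfile step.

```python
import re
import sys

# Hypothetical pattern; the question only says "<regexp match>".
RECORD_PATTERN = re.compile(r"^\w+")


def check_conditions(filename):
    """Stand-in for the checks that lived in SrcFile (assumed logic)."""
    return filename.endswith(".txt")


def convert_record(line):
    """Stand-in for the Record methods (assumed logic)."""
    return line.strip() + "\n"


def merge(filenames, out=sys.stdout):
    """Write the converted record lines of every accepted file to out."""
    for filename in filenames:
        if not check_conditions(filename):
            continue
        with open(filename) as src:
            for line in src:
                if RECORD_PATTERN.match(line):
                    out.write(convert_record(line))
```

Calling merge(files) would then replace the whole loop; plain functions and the file object you already get from open() do all the work.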
Where check_conditions checks the conditions that were contained in your SrcFile class and convert_record generates the output for a Record line.