Simple question for all you pragmatic object-oriented fellas.
I have read many times that one should avoid classes with names like “Processor” and “xxxxHandler” in order to conform to OO standards, and I believe it’s a good measure for the understandability of the system code.
Let’s assume we have software that scans some file structure, say a bunch of specific CSV files. Let’s also say we have an independent module called CsvParser.
class CsvParser {
    public CsvParser(string path) { .. }
    public string GetToken(int position) { .. }
    public bool ReadLine() { .. }
}
class MyCsvFile {
    public string FullPath { get; }
    public void Scan() {
        // C# creates objects with `new`; the parser is assumed to take the file path
        var csvp = new CsvParser(FullPath);
        while (csvp.ReadLine())
        {
            /* Parse the file that this class represents */
        }
    }
}
This avoids the need for a “FileScanner” class, which is a “Processor”-type class: something that would collect, say, a bunch of files from a directory and scan each one.
class MyFileScan {
    public string[] Files { get; set; }
    public void GetFiles() { this.Files = Directory.GetFiles(..); }
    public void ScanFiles() {
        foreach (string thisFilePath in Files)
        {
            var csvp = new CsvParser(thisFilePath);
            /* ... */
        }
    }
}
The OO approach dictates having the MyCsvFile class, and then a method representing the operation on the object.
Any thoughts? What do you programmers think?
I think what you’re describing is that objects should take care of operations that only require themselves, which is in general a good rule to follow. There’s nothing wrong with a “processor” class, as long as it “processes” a few different (but related) things. But if you have a class that only processes one thing (like a CSV parser only parses CSVs) then really there’s no reason for the thing that the processor processes not to do the processing on itself.
However, there is a common reason for breaking this rule: usually you don’t want to do things you don’t have to do. For example, with your CSV class, if all you want is to find the row in the CSV where the first cell is “Bob” and get the third column in that row (which is, say, Bob’s birth date) then you don’t want to read in the entire file, parse it, and then search through the nice data structure you just created: it’s inefficient, especially if your CSV has 100K lines and Bob’s entry was on line 5.
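To make the "Bob" case concrete, here is a minimal sketch of a streaming lookup that stops at the first matching row instead of building a data structure for the whole file. The class name `CsvLookup` and the method `findCell` are hypothetical, and the comma split is deliberately naive (it assumes no quoted fields or embedded commas).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Optional;

/** Hypothetical helper illustrating a row lookup that never loads the whole CSV. */
final class CsvLookup {
    private CsvLookup() {}

    /**
     * Scans the input line by line and returns the cell at {@code column}
     * from the first row whose first cell equals {@code key}.
     * No further lines are parsed once a match is found.
     */
    static Optional<String> findCell(Reader source, String key, int column) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: assumes no quoted fields or embedded commas.
                String[] cells = line.split(",", -1);
                if (cells[0].equals(key)) {
                    return column < cells.length
                            ? Optional.of(cells[column])
                            : Optional.empty();
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If Bob is on line 5 of a 100K-line file, only the first five lines are ever split into cells; the rest are never parsed.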
You could redesign your CSV class to do small-scale operations on CSVs, like skipping to the next line and getting the first cell. But now you’re implementing methods that you wouldn’t really speak of a CSV having. CSVs don’t read lines, they store them. They don’t find cells, they just have them. Furthermore, if you want to do a large-scale operation such as reading in the entire CSV and sorting the lines by the first cell, you’ll wish you had your old way of reading in the entire file, parsing it, and going over the whole data structure you created. You could do both in the same class, but now your class is really two classes for two different purposes. Your class has lost cohesion, and any instance of the class you create is going to carry twice as much baggage while you’re only likely to use half of it.
In this case, it makes sense to have a high-level abstraction of the CSV (for the large-scale operations) and a “processor” class for low-level operations. (The following is written in Java since I know that better than I know C#):
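As a rough sketch of that split, and only a sketch: the names `CsvLineReader` and `CsvDocument` are hypothetical, the reader loosely mirrors the `ReadLine`/`GetToken` shape of the `CsvParser` in the question, and the comma split is naive (no quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Low-level "processor": walks the input one line at a time and keeps only the current line. */
final class CsvLineReader implements AutoCloseable {
    private final BufferedReader in;
    private String[] current = new String[0];

    CsvLineReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Advances to the next line; returns false at end of input. */
    boolean readLine() {
        try {
            String line = in.readLine();
            if (line == null) {
                return false;
            }
            // Naive split: assumes no quoted fields or embedded commas.
            current = line.split(",", -1);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cell at the given position in the current line. */
    String getToken(int position) {
        return current[position];
    }

    /** Returns a copy of all cells in the current line. */
    String[] tokens() {
        return current.clone();
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** High-level abstraction: the whole CSV held in memory, for large-scale operations. */
final class CsvDocument {
    private final List<String[]> rows = new ArrayList<>();

    /** Builds the in-memory document by draining the low-level reader. */
    static CsvDocument load(Reader source) {
        CsvDocument doc = new CsvDocument();
        try (CsvLineReader reader = new CsvLineReader(source)) {
            while (reader.readLine()) {
                doc.rows.add(reader.tokens());
            }
        }
        return doc;
    }

    /** Sorts all rows by their first cell. */
    void sortByFirstCell() {
        rows.sort(Comparator.comparing((String[] row) -> row[0]));
    }

    int rowCount() {
        return rows.size();
    }

    String cell(int row, int column) {
        return rows.get(row)[column];
    }
}
```

The document class stores and exposes rows, the reader class does the reading, and the document is simply built on top of the reader, so neither carries the other's baggage.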
A similar well-known example of this is SAX and DOM. SAX is the low-level, fine-grained access, while DOM is the high-level abstraction.
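For comparison, here is a small sketch of the same task (counting elements with a given name) done both ways with the JDK's built-in parsers. The class `XmlAccessDemo` and its method names are made up for illustration; checked exceptions are wrapped to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo contrasting SAX (streaming events) with DOM (in-memory tree). */
final class XmlAccessDemo {
    private XmlAccessDemo() {}

    /** SAX: callbacks fire as the parser streams through the input; no tree is kept. */
    static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    /** DOM: the whole document is parsed into a tree first, then queried. */
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }
}
```

Both methods give the same count; the difference is that the DOM version holds the entire document in memory before it answers, while the SAX version only ever sees one event at a time.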