This code really confuses me. It uses some Stanford libraries for the Vector (array) class. Can anyone tell me the purpose of int index = line[j] - 'a';? Why subtract 'a'?
void countLetters(string filename)
{
    Vector<int> result;
    ifstream in; // was declared as in2, but every later use refers to in
    in.open(filename.c_str());
    if (in.fail()) Error("Couldn't read '" + filename + "'");
    for (int i = 0; i < ALPHABETH_SIZE; i++)
    {
        result.add(0); // Must initialize contents of array
    }
    string line;
    while (true)
    {
        getLine(in, line);
        // Check that we got a line
        if (in.fail()) break;
        line = ConvertToLowerCase(line);
        for (int j = 0; j < line.length(); j++)
        {
            int index = line[j] - 'a';
            if (index >= 0 && index < ALPHABETH_SIZE)
            {
                int prevTotal = result[index];
                result[index] = prevTotal + 1;
            }
        }
    }
}
The purpose of the code:
Takes a filename and prints the number of times each letter of the alphabet appears in that file. Because there are 26 numbers to be printed, CountLetters needs to create a Vector. For example, if the file is:
Characters in a string are encoded using a character set, typically ASCII on hardware common in English-language systems. You can see the ASCII table at http://en.wikipedia.org/wiki/ASCII
In ASCII (and most other character sets), the numbers representing letters are contiguous. So, this is the natural way to test whether the character at index j in character array line is a letter:

line[j] >= 'a' && line[j] <= 'z'

Your program is equivalent to that. In an algebraic sense, it subtracts 'a' from both sides (knowing that 'a' is the first of the lowercase letters in the character set):

line[j] - 'a' >= 0 && line[j] - 'a' <= 'z' - 'a'

Replacing "<= 'z' - 'a'" with an equivalent:

line[j] - 'a' >= 0 && line[j] - 'a' < ALPHABET_SIZE

where ALPHABET_SIZE is 26 (your code spells it ALPHABETH_SIZE). This trades a dependency on knowing 'z' is the last letter of your character set for knowing how many letters are in your character set. Both are a little fragile, but fine if you know you're dealing with a well-known, stable character set encoding.

A better way to check for a letter is to use the isalpha() predicate: http://www.cplusplus.com/reference/clibrary/cctype/isalpha/