I’m working on an automatic summarization system in my C++ class and have a question regarding one of the ASCII comparisons I’m doing. Here’s the code:
char ch;
string sentence;
pair<char, char> sentenceCheck;
int counter = 0;
while (!ifs2.eof())
{
    ch = ifs2.get();
    ch = tolower(ch);
    if (ch == 13)
        ch = ifs2.get();
    if (ch != 10 && ch != '?' && ch != '!' && ch != '.')
        sentence += ch;
    sentenceCheck.first = sentenceCheck.second;
    sentenceCheck.second = ch;
    cout << sentenceCheck.first << "-" << (int)sentenceCheck.first << " ---- " << sentenceCheck.second << "-" << (int)sentenceCheck.second << endl;
    if (sentenceCheck.second == ' ' || sentenceCheck.second == 10 || sentenceCheck.second == -1)
    {
        if (sentenceCheck.first == '?' || sentenceCheck.first == '!' || sentenceCheck.first == '.')
        {
            istringstream s(sentence);
            while (s >> wordInSentence)
            {
                sentenceWordMap.insert(pair<string, int>(wordInSentence, 0));
            }
            //sentenceList.push_back(pair<string, int>(sentence, 0));
            sentence.clear();
        }
    }
}
The two if statements check whether a new sentence has begun in the text, which is analyzed and dealt with later. The conditionals work, but only because we discovered that we also have to check for that -1. Any ideas what it represents?
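As an ASCII character, -1 doesn’t represent anything (which is to say, -1 is not a valid ASCII value). As the return value from get(), it means that the read operation failed, most likely because the end of the file was reached. get() returns an int rather than a char so that this end-of-file value (EOF, typically -1) can be told apart from every valid character; storing the result in a char, as the code in the question does, loses that distinction.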
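To illustrate the return value of get(), here is a minimal sketch that keeps the result in an int and compares it against the stream's end-of-file value before converting it to a char. The helper name readAll is made up for this example and is not part of the original code.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): copies every
// character of `in` into a string. The result of get() is kept in an
// integer type, so the end-of-file value cannot be confused with a
// real character such as the byte 0xFF.
std::string readAll(std::istream& in)
{
    using traits = std::char_traits<char>;

    std::string out;
    traits::int_type c;
    while ((c = in.get()) != traits::eof())   // stop on the first failed read
        out += traits::to_char_type(c);       // safe: c is a real character here
    return out;
}
```

With a char variable, a 0xFF byte in the file and the end-of-file value can both show up as -1 on platforms where char is signed, which is why the comparison is done on the integer value first.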
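Note that the eof() function doesn’t return true when the next call to get() is going to fail because the end of the file has been reached; it returns true only after a previous call to get() has already failed for that reason. That is why the loop in the question runs one extra time after the last character, and ch ends up holding -1.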
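The difference is easy to see by counting loop iterations. The following is only a sketch; both function names are invented for the example. The first loop tests eof() before reading, as the code in the question does, and the second uses the read itself as the loop condition.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: tests eof() before each read. The body runs
// once more than there are characters, because eof() only becomes
// true after a get() has already failed.
int countWithEofCheck(std::istream& in)
{
    int iterations = 0;
    while (!in.eof())
    {
        in.get();        // the final call fails and returns the EOF value
        ++iterations;
    }
    return iterations;
}

// Hypothetical helper: uses the read itself as the loop condition.
// get(ch) returns the stream, which converts to false once a read fails,
// so the body runs exactly once per character.
int countWithReadCheck(std::istream& in)
{
    int iterations = 0;
    char ch;
    while (in.get(ch))
        ++iterations;
    return iterations;
}
```

For a stream holding the two characters "ab", countWithEofCheck returns 3 and countWithReadCheck returns 2. Writing the loop in the question in the second style would make the extra check for -1 unnecessary.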