Here is my source code, which loads a text file and splits each line into individual items (words).
How can I optimize it further? Testing for empty lines (and some other constructs) seems a little inefficient to me.
#include <cctype>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

//BUFF (the line buffer size) is assumed to be defined elsewhere
typedef std::vector < std::string > TLines;
typedef std::vector < std::vector < std::string > > TItems;

TItems TFloadFile ( const char * file_name )
{
    //Load projection from file
    unsigned int lines = 0;
    char buffer[BUFF];
    FILE * file;
    TItems file_words;
    TLines file_lines;
    file = fopen ( file_name, "r" );
    if ( file != NULL )
    {
        for ( ; fgets ( buffer, BUFF, file ); )
        {
            //Remove empty lines
            bool empty_line = true;
            for ( unsigned i = 0; i < strlen ( buffer ); i++ )
            {
                if ( !isspace ( ( unsigned char ) buffer[i] ) )
                {
                    empty_line = false;
                    break;
                }
            }
            if ( !empty_line )
            {
                file_lines.push_back ( buffer );
                lines++;
            }
        }
        file_words.resize ( lines + 1 );
        for ( unsigned int i = 0; i < lines; i++ )
        {
            //Tokenize in place: strtok writes into the buffer, so pass the string's
            //own storage rather than casting away const on c_str()
            char * word = strtok ( &file_lines[i][0], " \t,;\r\n" );
            //Use the same delimiter set (including the comma) for every token
            for ( int j = 0; word; j++, word = strtok ( 0, " \t,;\r\n" ) )
            {
                file_words[i].push_back ( word );
            }
        }
        fclose ( file );
    }
    return file_words;
}
Thanks for your help…
Before optimizing, can you say how big the file is, how long the code currently takes to run, and why you think it isn’t already I/O bound (i.e. limited by hard disk speed)? How long do you think it should take? Some idea of the type of data in the file would help too (such as average line length, average proportion of empty lines, etc.).
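One rough way to answer those questions is to time a bare read of the file and compare it with the time the full loader takes. The sketch below is only an illustration: `raw_read` and `time_seconds` are hypothetical helper names that do not appear in the original code, and the 4096-byte buffer is an arbitrary choice.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: read the file with fgets and throw the data away.
// Returns the number of bytes read (0 if the file cannot be opened).
std::size_t raw_read(const char* file_name) {
    char buffer[4096];  // arbitrary size, chosen for illustration only
    std::size_t bytes = 0;
    std::FILE* file = std::fopen(file_name, "r");
    if (file == NULL) {
        return 0;
    }
    while (std::fgets(buffer, sizeof buffer, file)) {
        bytes += std::strlen(buffer);
    }
    std::fclose(file);
    return bytes;
}

// Hypothetical helper: run any callable once and return the elapsed
// wall-clock time in seconds.
template <typename F>
double time_seconds(F work) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    work();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, compare `time_seconds([&] { raw_read(name); })` with `time_seconds([&] { TFloadFile(name); })` on the same file. If the two numbers are close, most of the time goes into reading and there is little left to gain from tuning the parsing. Bear in mind that the operating system usually caches the file after the first read, so run each measurement several times before comparing.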
That said, combine the remove-empty-line loop with the word-tokenising loop. Then you can remove TLines altogether and avoid the std::string constructions and the vector push_back for each line. I haven’t checked that this code works, but it should be close enough to give you the idea. It also includes a more efficient empty-line spotter: