I would like some help with this email parser, if possible.
This code has all the proper declarations and initialization, but I’m not sure where I am messing up in this loop:
while ( getline( fin, lines ) )
{
    for ( int i = 0; i < lines.length( ); i++ )
    {
        if ( lines[ i ] == '@' )
        {
            for ( s = i; s < lines.length( ); s-- )
            {
                if ( s < 0 )
                {
                    break;
                }
                if ( validChar( lines[ s ] ) == false )
                {
                    break;
                }
            } // for
            for ( e = i; e > lines.length( ); e++ )
            {
                if ( e == lines.length( ) )
                {
                    break;
                }
                if ( validChar( lines[ e ] ) == false )
                {
                    break;
                }
                if ( lines[ e ] == '.' )
                {
                    hasDot = true;
                }
            } // for
            anEmail = lines.substr( s, e );
            cout << anEmail << endl;
        } // if
    } // for
} // while
And this is the validChar function:
bool validChar( char a )
{
    bool result = false;
    if ( ( a >= 'A' && a <= 'Z' ) || ( a >= 'a' && a <= 'z' ) || ( a >= '0' && a <= '9' ) || a == '.' || a == '-' || a == '+' )
    {
        result = true;
    }
    return result;
}
EDIT: For example, given a text file containing “this is an email file with DummyTest@my.test and some other text for the test”, I want to extract “DummyTest@my.test”, but I only get “@my.test and some other text for the test”.
Where am I going wrong?
Your issue is here:
for ( s = i; s < lines.length( ); s-- )
You’re checking whether lines[ s ] is valid to decide when to stop, but lines[ s ] == '@' on the very first iteration, since you just found the @! If you initialize s to i - 1, you’ll be closer… but then you’ll discover that your substr call has a bunch of off-by-ones in it. Your second loop has the same starting problem (e should start at i + 1), and its condition is inverted: e > lines.length( ) is already false on the first check, so the body never runs; it should be e < lines.length( ). Also, the second argument of substr is a length, not an end position, so you’ll end up needing anEmail = lines.substr( s + 1, e - s - 1 );

But that will only get the code to pass your test cases. This is not a valid approach to parsing e-mail addresses. It will not work on all valid e-mail addresses, including "an@sign"@foo and "spaces are legal only in quotes"@foo. You’ll also want to extend validChar to cover the actual set of valid characters, which differs between the name and the domain: !#$%&'*+-/=?^_{}|~@[IPv6:2001:db8:1ff::a0b:dbd0] is perfectly legal. Finally, if it’s important to you to actually rule out illegal addresses, this approach will again hold you back: double..dot@foo is not legal, nor is double@at@foo.

The source for this is RFC 822 (or its much newer siblings RFC 5322 and RFC 6531), where you’ll discover that a regular expression cannot parse e-mail addresses, because comments can nest: name(comment(comment))@foo is legal, while name(comment))@foo is not.