I have a problem with a C++ string that contains several Spanish words, so it has many letters with accents and tildes. I want to replace them with their unaccented counterparts. For example, I want to replace ‘había’ with ‘habia’. I tried to do this directly with the replace method of the string class, but I could not get it to work.
I’m using this code:
for (it = dictionary.begin(); it != dictionary.end(); it++) {
    strMine = (it->first);
    found = toReplace.find_first_of(strMine);
    while (found != std::string::npos) {
        strAux = (it->second);
        toReplace.erase(found, strMine.length());
        toReplace.insert(found, strAux);
        found = toReplace.find_first_of(strMine, found + 1);
    }
}
Where dictionary is a map like this (with more entries):
dictionary.insert(std::pair<std::string, std::string>("á", "a"));
dictionary.insert(std::pair<std::string, std::string>("é", "e"));
dictionary.insert(std::pair<std::string, std::string>("í", "i"));
dictionary.insert(std::pair<std::string, std::string>("ó", "o"));
dictionary.insert(std::pair<std::string, std::string>("ú", "u"));
dictionary.insert(std::pair<std::string, std::string>("ñ", "n"));
and the toReplace string is:
std::string toReplace = "á-é-í-ó-ú-ñ-á-é-í-ó-ú-ñ";
I must obviously be missing something, but I can’t figure out what. Is there a library I can use?
Thanks,
First, this is a really bad idea: you’re mangling somebody’s language by removing letters. Although the extra dots in words like “naïve” seem superfluous to people who only speak English, there are literally thousands of writing systems in the world in which such distinctions are very important. Writing software to mutilate someone’s speech puts you squarely on the wrong side of the tension between using computers as means to broaden the realm of human expression vs. tools of oppression.
What is the reason you’re trying to do this? Is something further down the line choking on the accents? Many people would love to help you solve that.
That said, libicu can do this for you. Open the transform demo; copy and paste your Spanish text into the “Input” box; enter
as “Compound 1” and click transform.
(With help from slide 9 of Unicode Transforms in ICU. Slides 29-30 show how to use the API.)
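If you only need the handful of letters listed in the question and do not want to pull in ICU, the loop from the question can probably be made to work with two changes. This is a sketch under one assumption the question does not confirm: that the source file and the input string are both UTF-8. In UTF-8 each of these accented letters is a two-byte sequence, and find_first_of looks for any single byte of its argument rather than the whole sequence, so it matches the shared lead byte 0xC3 of every accented letter. Using find, which searches for the complete substring, avoids that. The names replaceAll and spanishTable are made up for this example, and the table spells out the UTF-8 bytes so the result does not depend on how the compiler reads the source file.

```cpp
#include <map>
#include <string>

// Replace every occurrence of each key in `table` with its mapped value.
// Assumption: `text` and the keys share one encoding (UTF-8 in this sketch).
std::string replaceAll(std::string text,
                       const std::map<std::string, std::string>& table) {
    for (const auto& entry : table) {
        const std::string& from = entry.first;
        const std::string& to = entry.second;
        if (from.empty()) continue;  // an empty key would match at every position

        // find() matches the whole substring, unlike find_first_of(),
        // which matches any one byte contained in its argument.
        std::string::size_type pos = text.find(from);
        while (pos != std::string::npos) {
            text.replace(pos, from.length(), to);
            // Resume after the inserted text so it is never rescanned.
            pos = text.find(from, pos + to.length());
        }
    }
    return text;
}

// Hypothetical table for the lowercase letters from the question,
// written as explicit UTF-8 byte sequences.
std::map<std::string, std::string> spanishTable() {
    return {
        {"\xC3\xA1", "a"},  // á
        {"\xC3\xA9", "e"},  // é
        {"\xC3\xAD", "i"},  // í
        {"\xC3\xB3", "o"},  // ó
        {"\xC3\xBA", "u"},  // ú
        {"\xC3\xB1", "n"},  // ñ
    };
}
```

Calling replaceAll(toReplace, spanishTable()) on the UTF-8 string from the question should then give "a-e-i-o-u-n-a-e-i-o-u-n". The table covers only those six lowercase letters; uppercase letters, ü, and text in any other encoding would need more entries or a real Unicode library such as ICU.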