Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and these need to be searchable either as “Peña Báináõ” or with English-equivalent characters, like “Pena Bainao”. What we did was convert the text in the query, something like:
SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;
But the issue here is that not all of the characters have an English equivalent (not even some pretty obvious ones like ñ or õ) so we end up with the text converted to:
Pe?a Baina?
So if the user tries to find that address by typing “Pena Bainao”, he can’t find it, because “Pena Bainao” is different from “Pe?a Baina?”.
We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.
Here is a list of some characters that are not converted to US7ASCII:
Character - UTF8 Code - Possible Equivalent
æ - u00E6 - ae
å - u00E5 - a
ã - u00E3 - a
ñ - u00F1 - n
õ - u00F5 - o
1) Use the nlssort function with BINARY_AI (both case- and accent-insensitive).
2) You could also alter the NLS_SORT session variable to binary_ai, and then you would not have to specify NLS_SORT every time.
3) To drop the use of the nlssort function and change the semantics of everything, also set the nls_comp session variable.
Option 1 changes only local behavior, that is, the query where you want different results. Options 2 and 3 will change the behavior of other queries, which may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide. Also look at the section “Using Linguistic Indexes” to see how to be able to use indexes.
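As a rough sketch of what the three options could look like in practice, assuming a hypothetical table addresses with a street_name column (both names are made up for illustration and are not from the question), something along these lines should work. It has not been run against your schema, and whether ligatures such as æ match “ae” depends on the collation, so check that separately.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE addresses (
  id          NUMBER PRIMARY KEY,
  street_name VARCHAR2(100 CHAR)
);
INSERT INTO addresses (id, street_name) VALUES (1, 'Peña Báináõ');
COMMIT;

-- Option 1: compare accent- and case-insensitive sort keys in this query only.
SELECT id, street_name
  FROM addresses
 WHERE NLSSORT(street_name, 'NLS_SORT=BINARY_AI')
     = NLSSORT('Pena Bainao', 'NLS_SORT=BINARY_AI');

-- Option 2: set NLS_SORT for the session, so NLSSORT picks it up by default.
ALTER SESSION SET NLS_SORT = BINARY_AI;

SELECT id, street_name
  FROM addresses
 WHERE NLSSORT(street_name) = NLSSORT('Pena Bainao');

-- Option 3: also make ordinary comparisons linguistic, so NLSSORT is not needed.
ALTER SESSION SET NLS_COMP = LINGUISTIC;

SELECT id, street_name
  FROM addresses
 WHERE street_name = 'Pena Bainao';

-- Linguistic (function-based) index so these searches can avoid a full scan.
-- The index name is hypothetical.
CREATE INDEX addresses_street_ai_idx
  ON addresses (NLSSORT(street_name, 'NLS_SORT=BINARY_AI'));
```

Note that the ALTER SESSION statements stay in effect for the rest of the session, which is why options 2 and 3 can affect other queries.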