ok, this is bugging me.
I got a phonebook DB from a client where some of the results contain accented names,
and by some I mean mainly the city field, or category,
which makes my query results look ridiculous.
DB Charset: UTF-8
for example:
CompanyName | City | etc…
DemoCompany | Hauptstraße 18 | Whatever
DemoCompany | Hauptstrabe 18 | Whatever
The DB has around 360k records, so manual checking is not an option.
Does anyone have an idea how I can find the accented/unaccented values?
Something like a duplicate column check…
EDIT:
When I query the table, I get results for both, so that is not the problem.
The problem is that when I display the results, some are displayed with accents and some without.
EDIT:
CREATE TABLE `enc` (
`company` varchar(255) DEFAULT NULL,
`address` varchar(255) DEFAULT NULL,
`postcode` varchar(255) DEFAULT NULL,
`city` varchar(255) DEFAULT NULL,
`Telefon1` varchar(255) DEFAULT NULL,
`Telefon2` varchar(255) DEFAULT NULL,
`Telefon3` varchar(255) DEFAULT NULL,
`Telefon4` varchar(255) DEFAULT NULL,
`Telefon5` varchar(255) DEFAULT NULL,
`Branche1` varchar(255) DEFAULT NULL,
`Branche2` varchar(255) DEFAULT NULL,
`Branche3` varchar(255) DEFAULT NULL,
`Branche4` varchar(255) DEFAULT NULL,
`Branche5` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
You can start with something like this, that will show if there are rows that are exact duplicates of each other (and their count):
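A minimal sketch of such a query, assuming the `enc` table from the question and MySQL. The alias `cnt` is just an illustrative name. Note that `GROUP BY` compares strings using each column's collation, so under an accent-insensitive collation rows that differ only by accents may land in the same group.

```sql
-- Group on every column; any group with more than one row is a set of duplicates.
SELECT company, address, postcode, city,
       Telefon1, Telefon2, Telefon3, Telefon4, Telefon5,
       Branche1, Branche2, Branche3, Branche4, Branche5,
       COUNT(*) AS cnt
FROM enc
GROUP BY company, address, postcode, city,
         Telefon1, Telefon2, Telefon3, Telefon4, Telefon5,
         Branche1, Branche2, Branche3, Branche4, Branche5
HAVING COUNT(*) > 1;
```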
If you want to find only duplicate addresses, you do something like this:
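Again only a sketch against the `enc` table from the question: grouping on the single `address` column lists every address that occurs more than once, most frequent first.

```sql
-- Addresses that appear in more than one row, with how often they appear.
SELECT address, COUNT(*) AS cnt
FROM enc
GROUP BY address
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
```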
Reading your question again, I think I misunderstood what you are asking. If you don’t want to find duplicates (as there are none) but you want to find accented words (and perhaps replace them with unaccented ones):
The table you have now is probably using a case-insensitive (and accent-insensitive) collation (like utf8_general_ci or utf8_unicode_ci), so you could copy the table into a new one that has the same charset but a binary, case- and accent-sensitive collation, like utf8_bin.
You could then create a list of accented characters and write a query to check for this list in the fields of your new table (this will be really slow):
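One way this could look in MySQL, as a sketch only. The table name `enc_bin` and the short list of characters are assumptions for illustration, so extend the list to whatever accented characters you expect in your data. It also assumes the client connection uses UTF-8, so that the literals are sent correctly.

```sql
-- Copy the table structure, switch the copy to a binary collation, then copy the rows.
CREATE TABLE enc_bin LIKE enc;
ALTER TABLE enc_bin CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO enc_bin SELECT * FROM enc;

-- Under utf8_bin, 'ß' no longer matches 's' and 'ä' no longer matches 'a',
-- so this returns only rows that really contain one of the listed characters.
-- The leading wildcards prevent index use, hence the slowness.
SELECT company, address, city
FROM enc_bin
WHERE city LIKE '%ß%'
   OR city LIKE '%ä%'
   OR city LIKE '%ö%'
   OR city LIKE '%ü%';
```

The same `WHERE` pattern can be repeated for `address` or the `Branche` columns if those contain accented values too.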
or run a query to REPLACE() those characters, like 'ß' with 'ss', for example.
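A hedged sketch of such a replacement for the `city` column. It assumes a UTF-8 connection, and it is worth trying on a copy of the table first, since the update cannot easily be undone.

```sql
-- Preview which rows would change; the explicit binary collation keeps
-- 'ß' from matching a plain 's'.
SELECT city, REPLACE(city, 'ß', 'ss') AS city_fixed
FROM enc
WHERE city COLLATE utf8_bin LIKE '%ß%';

-- Apply the replacement to the same rows.
UPDATE enc
SET city = REPLACE(city, 'ß', 'ss')
WHERE city COLLATE utf8_bin LIKE '%ß%';
```

Each additional character (for example 'ä' to 'ae') would need its own `REPLACE()` call, either nested in one statement or run as a separate update.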