The Archive Base


Editorial Team
Asked: May 17, 2026

I am writing a C program to search a large number of UTF-8 strings


I am writing a C program to search a large number of UTF-8 strings in a database. Some of these strings contain English characters with diacritics, such as accents. The search string is entered by the user, so it will most likely not contain such characters. Is there a way (a function, a library, etc.) to remove the diacritics from a string, or to perform a diacritic-insensitive search? For example, if the user enters the search string “motor”, it should match the string “motörhead”.

My first attempt was to manually strip out the combining diacritical marks described here:

http://en.wikipedia.org/wiki/Combining_character

This worked in some cases, but it turns out that many of these characters also have their own precomposed Unicode code points. For example, the character “ö” above can be represented by an “o” followed by the combining diaeresis U+0308, but it can also be represented by the single Unicode character U+00F6, and my method only handles the former.
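
To make the two representations concrete, here is a minimal sketch in C that spells out both UTF-8 byte sequences for “motörhead”. The names `PRECOMPOSED`, `DECOMPOSED`, and `same_bytes` are made up for this illustration and are not part of any library.

```c
#include <string.h>

/* "ö" as the single precomposed code point U+00F6 (UTF-8 bytes C3 B6). */
static const char PRECOMPOSED[] = "mot\xC3\xB6rhead";

/* "ö" as "o" followed by the combining diaeresis U+0308 (UTF-8 bytes CC 88). */
static const char DECOMPOSED[] = "moto\xCC\x88rhead";

/* Illustrative helper: returns 1 if the two strings are byte-for-byte
   identical, 0 otherwise. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The two strings render identically but differ at the byte level, and a plain `strstr` for "motor" finds nothing in either of them, which is why the text has to be brought into a single form before any marks are stripped.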

I have also looked into iconv, which can convert from UTF-8 to ASCII. However, I may want to localize my program at a future date, and this would no doubt cause problems for languages with non-English characters. Is there a way I can simply strip or convert these accented characters?

Edit: removed typo in question title.

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 8:41 pm

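    Convert the strings to one of the decomposed normalization forms (probably NFD, though you might even want NFKD). Decomposition turns each precomposed accented character that has a decomposition into a base character followed by combining characters, and those combining characters can then be stripped.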

    You will want a library for this. I hear good things about ICU.
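
The stripping step can be sketched in plain C once the text is already in NFD. This is only an illustration under stated assumptions: the input is valid, NUL-terminated UTF-8 that some library has already normalized (with ICU that would presumably be the `unorm2` C API, for example `unorm2_getNFDInstance` followed by `unorm2_normalize`, which works on UTF-16 buffers), and only the Combining Diacritical Marks block U+0300..U+036F is removed. The function name `strip_combining_marks` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst, dropping every code point in
   the Combining Diacritical Marks block (U+0300..U+036F).
   Assumptions: src is valid, NUL-terminated UTF-8 that is already in NFD;
   dst has room for at least strlen(src) + 1 bytes.
   Returns the number of bytes written, not counting the terminator. */
size_t strip_combining_marks(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    while (*s != 0) {
        /* U+0300..U+033F encode as CC 80..CC BF,
           U+0340..U+036F encode as CD 80..CD AF. */
        if ((s[0] == 0xCC && s[1] >= 0x80 && s[1] <= 0xBF) ||
            (s[0] == 0xCD && s[1] >= 0x80 && s[1] <= 0xAF)) {
            s += 2;               /* skip the two-byte combining mark */
            continue;
        }
        dst[out++] = (char)*s++;  /* copy every other byte unchanged */
    }
    dst[out] = '\0';
    return out;
}
```

With the decomposed input "moto\xCC\x88rhead" this yields "motorhead", so a plain `strstr` for "motor" then matches. A precomposed "ö" (bytes C3 B6) passes through untouched, which is why the normalization step has to come first. Other combining blocks (for example U+1AB0..U+1AFF or U+20D0..U+20FF) are not handled in this sketch.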

