Editorial Team
Asked: June 13, 2026

Sometimes manipulating character strings at the character level is unavoidable.

Here I have a function written for ANSI/ASCII based character strings that replaces CR/LF sequences with LF only, and also replaces CR with LF. We use this because incoming text files often have goofy line endings due to various text or email programs that have made a mess of them, and I need them to be in a consistent format to make parsing / processing / output work properly down the road.

Here’s a fairly efficient implementation of this compression from various line-endings to LF only, for single byte per character implementations:

// returns the in-place conversion of a Mac or PC style string to a Unix style string (i.e. no CR/LF or CR only, but rather LF only)
char * AnsiToUnix(char * pszAnsi, size_t cchBuffer)
{
    size_t i, j;
    for (i = 0, j = 0; pszAnsi[i]; ++i, ++j)
    {
        // bounds checking
        ASSERT(i < cchBuffer);
        ASSERT(j <= i);

        switch (pszAnsi[i])
        {
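            case '\n':
                // swallow the CR of an LF/CR pair
                if (pszAnsi[i + 1] == '\r')
                    ++i;
                // always store the LF: once j lags behind i, this slot still holds stale data
                pszAnsi[j] = '\n';
                break;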

            case '\r':
                if (pszAnsi[i + 1] == '\n')
                    ++i;
                pszAnsi[j] = '\n';
                break;

            default:
                if (j != i)
                    pszAnsi[j] = pszAnsi[i];
        }

    }

    // append null terminator if we changed the length of the string buffer
    if (j != i)
        pszAnsi[j] = '\0';

    // bounds checking
    ASSERT(pszAnsi[j] == 0);

    return pszAnsi;
}

I’m trying to transform this into something that will work correctly with multibyte/Unicode strings, where the next character can be multiple bytes wide.

So:

  1. I need to look at a character only at a valid character-point (not in the middle of a character)
  2. I need to copy over the portion of the character that is part of the rejected piece properly (i.e. copy whole characters, not just bytes)

I understand that _mbsinc() will give me the address of the next start of a real character. But what is the equivalent for Unicode (UTF16), and are there already primitives to be able to copy a full character (e.g. length_character(wsz))?

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 12:12 am

    One of the beautiful things about UTF-8 is that if you only care about the ASCII subset, your code doesn’t need to change at all. The non-ASCII characters get encoded to multi-byte sequences where all of the bytes have the upper bit set, keeping them out of the ASCII range themselves. Your CR/LF replacement should work without modification.
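
As a rough illustration of that claim, the sketch below applies the same byte-wise replacement to a UTF-8 buffer. The function name `normalize_newlines_bytes` is made up for this example and is not from the question; it assumes a NUL-terminated buffer and, like the original function, treats CR LF, LF CR, and a lone CR as one line ending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): converts CR LF, LF CR
   and lone CR to LF in place, one byte at a time.
   Returns the new length in bytes, excluding the terminator. */
size_t normalize_newlines_bytes(char *s)
{
    size_t i = 0; /* read position  */
    size_t j = 0; /* write position */

    assert(s != NULL);

    while (s[i] != '\0') {
        char c = s[i++];
        if (c == '\r') {
            if (s[i] == '\n')   /* CR LF: drop the LF, keep one newline */
                ++i;
            c = '\n';
        } else if (c == '\n') {
            if (s[i] == '\r')   /* LF CR: drop the CR */
                ++i;
        }
        /* Lead and continuation bytes of multi-byte UTF-8 sequences are all
           >= 0x80, so they never match '\r' or '\n' and are copied as-is. */
        s[j++] = c;
    }
    s[j] = '\0';
    return j;
}
```

Calling it on a buffer that holds "café", CR LF, "naïve", CR (with the accented letters stored as two-byte UTF-8 sequences) should leave both sequences untouched and only shorten the line endings.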

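    UTF-16 has the same property. Characters that can be encoded as a single 16-bit entity will never conflict with the characters that require multiple entities: those are stored as surrogate pairs, and both halves of a pair fall in the range 0xD800 to 0xDFFF, so neither can be mistaken for CR (0x000D) or LF (0x000A). The only change needed is to compare whole 16-bit code units rather than individual bytes.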
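
A minimal sketch of the UTF-16 case, under a few assumptions: the text is held as NUL-terminated `uint16_t` code units (used here instead of `wchar_t`, whose width varies by platform), and the name `normalize_newlines_utf16` is hypothetical. The loop is the byte version with the element type widened, and it never needs to know where a character starts or how long it is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): converts CR LF, LF CR
   and lone CR to LF in place, one 16-bit code unit at a time.
   Returns the new length in code units, excluding the terminator. */
size_t normalize_newlines_utf16(uint16_t *s)
{
    size_t i = 0; /* read position  */
    size_t j = 0; /* write position */

    assert(s != NULL);

    while (s[i] != 0) {
        uint16_t u = s[i++];
        if (u == 0x000D) {          /* CR */
            if (s[i] == 0x000A)     /* CR LF: drop the LF */
                ++i;
            u = 0x000A;
        } else if (u == 0x000A) {   /* LF */
            if (s[i] == 0x000D)     /* LF CR: drop the CR */
                ++i;
        }
        /* Surrogates (0xD800..0xDFFF) never equal 0x000D or 0x000A, so both
           halves of a pair are copied through in order. */
        s[j++] = u;
    }
    s[j] = 0;
    return j;
}
```

Comparing whole code units matters: a unit such as 0x0D0A contains the bytes 0x0D and 0x0A, but as a 16-bit value it equals neither CR nor LF, so it passes through unchanged.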
