The Archive Base


Editorial Team
Asked: June 7, 2026 (2026-06-07T10:59:09+00:00)


I am trying to read characters from a file, remove the punctuation, store the words in an array, and finally write them to another file. The contents of the file are:

“यौ ता बाबू उदयभाहू उपेक्षा औंर अपमान्नकीपीड््ा ढोये जैसेतैस्ये वहबाबाके आश्रम म्पें पहैच गया ।
बाबा मान्नो उसी की प्रतीक्षा म्पें वैठे थे । वह ज्योही दण्डवत की मुदा म्पें हुभ्रा त्योंही
बाबा का गभ्रीर स्वर उसके कानों म्पे टकराया ‘ आभ्रो, ञैं तुम्हारे लिए ही बैठा हूें । ‘
अमित न्ने मस्तक ऊैंचा उठाया औंर एकाम्र भाव न्से बाबा को देखता रहा । बाबा
के पास वह अनेकों बार आ चुका था परन्तु. आज जैसी व्यथा, थकान्न औंर प्तानता
इससे दूर्व नहीं थी आदमी कभ्रीकभी इतना टूट ञाता ड़ँ कि ठसे अपने अस्तिल्द
के प्रति भ्री शंका होन्ने लगती न्है वह अनेक विचारों म्पें खो गया उसके नेत्र बाबा
कौ देख रहे थे परन्तु उस्यका मन कहीं औंर भ्रटक रद्दा था ।”
……..

I first tried to read these characters (Hindi, UTF-8) with the old Turbo C++, using the plain char data type.

The program compiled, but the contents were not written to the file properly.
Then I built the same code in Visual C++ and got this error:

"Debug assertion failed ... unsigned(c+1) <=256"

Next I tried the wide character data type, using the <wchar.h> and <cwchar> headers, wchar_t, and the other wide character functions, but the output is still wrong: “���त �ྤ���௤ྤ�”

Is there an alternative, or another way to solve this problem?

Please answer with a complete code segment, and also tell me what the alternative to the getline function is for wchar_t. This is what I have tried so far:

#include <sstream>
#include <iostream>
#include <fstream>
#include <ctype.h>
#include <string>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <istream>
#include <vector>
#include <wchar.h>
#include <cwchar>
#include <locale.h>
using namespace std;
unsigned char line[1000],storech[2000],storech1[20000];
wchar_t word[50];
std::vector< wchar_t* > storewrd;

void main()
{
    FILE * file3 = fopen("H:\\myfile.txt" , "w");
    cout << "check" << endl;
    FILE *stream;
    stream = fopen( "H:\\ocr.txt", "r" );
    setlocale(LC_ALL,"");
    int ch;
    int  test;
    wchar_t temp1;
    wchar_t buffer[500];
    wchar_t temp[500];

    int x=0,j=0;
    do
    {
        int loop = 0;
        ch = fgetwc(stream);

        //read word
        while( (ch != '\n') && (ch != WEOF) )
        {
            buffer[loop] = ch;
            loop++;

            test = fgetwc(stream);
            temp1 = (wchar_t) test;
            if(!iswpunct(test))
                fputwc( test , file3);
            wcout << temp1 << "  ";
        }

        int t;
        if (ch!= WEOF)
        {
            for(t=0;t<loop;t++)
            {
                temp[t] = buffer[t];
            }
            temp[loop++] = '\0';

            j++;
            //cout << buffer[loop] << "  ";
        }
    }while(ch != WEOF);

    cout << "check";

    _getch();

}

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 10:59 am (2026-06-07T10:59:10+00:00)

    It’s not really clear to me what you’re trying to do: where did the
    assertion failure occur? How are you trying to determine whether the
    characters are punctuation or not?

    UTF-8 is a multibyte encoding, which means that single-byte
    functions like ispunct don’t work on it in general. (Passing a byte
    of a multibyte sequence to ispunct as a negative char value is what
    typically trips the `unsigned(c+1) <= 256` debug assertion in
    Visual C++.) It is a variable-length encoding, however, and every
    character in the original ASCII character set has a single-byte
    encoding. If the only punctuation you are concerned with is in the
    original ASCII set, you can “cheat” a bit, and use something like:

    if ( (ch & 0x80) == 0 && ispunct( ch ) ) {
        //  is ASCII punctuation
    } else {
        //  is something else
    }
    

    I put “cheat” in quotes, because one of the goals of Unicode
    and UTF-8 is that code that looks for things like ASCII punctuation
    should work unchanged.
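
    As a rough illustration of the ASCII-only approach above, the sketch
    below reads a UTF-8 file as plain bytes with std::getline, drops ASCII
    punctuation, and stores the remaining words in a
    std::vector<std::string>. The names is_ascii_punct, split_words and
    copy_words are hypothetical helpers, not part of any library, and the
    sketch assumes that words are separated by ASCII whitespace. Non-ASCII
    punctuation such as the danda (।) or curly quotes is left in place.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// True only for ASCII punctuation. Bytes >= 0x80 belong to multi-byte
// UTF-8 sequences and are never handed to std::ispunct.
inline bool is_ascii_punct(unsigned char byte)
{
    return (byte & 0x80) == 0 && std::ispunct(byte) != 0;
}

// Hypothetical helper: drops ASCII punctuation from one UTF-8 line and
// splits what is left into words on ASCII whitespace.
inline std::vector<std::string> split_words(const std::string& utf8_line)
{
    std::string cleaned;
    for (unsigned char byte : utf8_line) {
        if (!is_ascii_punct(byte)) {
            cleaned += static_cast<char>(byte);
        }
    }

    std::vector<std::string> words;
    std::istringstream in(cleaned);
    std::string word;
    while (in >> word) {
        assert(!word.empty());  // a successful extraction is never empty
        words.push_back(word);
    }
    return words;
}

// Hypothetical driver: reads in_path line by line and writes one word per
// line to out_path. Returns the number of words written.
inline std::size_t copy_words(const std::string& in_path,
                              const std::string& out_path)
{
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        for (const std::string& word : split_words(line)) {
            out << word << '\n';
            ++count;
        }
    }
    return count;
}
```

    A call such as copy_words("H:\\ocr.txt", "H:\\myfile.txt") would then
    process the files from the question. Because the bytes of each
    multi-byte sequence are copied untouched, valid UTF-8 input stays valid
    UTF-8 in the output.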

    If you need to recognize more than just ASCII punctuation (e.g. things
    like «, ¿ or —), and you want to use wchar_t (which is usually, but not
    always, UTF-16 or UTF-32), and the file is UTF-8, you’ll need to use an
    appropriate locale which does the code translation. In this case, you
    should definitely use iostream, and not C style IO; iostream will allow
    you to imbue the stream with the appropriate locale, and C++ locales
    will allow you to create locales on the fly, by changing a single facet
    (codecvt, in this case) from another locale (probably the global one).
    (Under Linux, the locale taken from the user's environment,
    std::locale(""), is often a UTF-8 locale, particularly in non-English
    speaking areas, and can be used directly. Under Windows, the default
    locale normally uses a legacy ANSI code page rather than UTF-8, so it
    will not translate UTF-8 correctly.)

    If you don’t want to get involved with locales, read your UTF-8 directly
    into a char buffer, and use the iconv library or something similar to
    translate it within your program.

    Be aware, however, that there might be some rare punctuation outside of
    the Basic Multilingual Plane, which will be encoded using two surrogate
    code units in UTF-16; iswpunct will not work for these if your wchar_t
    uses UTF-16 (Windows and AIX). (Most of the characters outside the
    basic plane are CJK or from historic scripts not used today, so this
    might not be an issue for you.)
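
    As a rough sketch of the locale approach described above, the code
    below imbues a std::wifstream with a locale whose codecvt facet has
    been replaced by std::codecvt_utf8<wchar_t>, and reads whole lines with
    std::getline, which works on wide streams and std::wstring as well.
    The helper names read_utf8_lines and strip_punct are hypothetical.
    std::codecvt_utf8 comes from <codecvt>, which is deprecated since
    C++17, so treat this as an illustration under that assumption rather
    than a recommendation. On platforms with a 16-bit wchar_t,
    std::codecvt_utf8_utf16 would be the variant that produces surrogate
    pairs.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <cwctype>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: reads a UTF-8 encoded file into wide-character lines.
inline std::vector<std::wstring> read_utf8_lines(const std::string& path)
{
    std::wifstream in;
    // Build a locale from the stream's current one, replacing only the
    // codecvt facet. The locale takes ownership of the facet pointer.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8<wchar_t>));
    in.open(path, std::ios::binary);

    std::vector<std::wstring> lines;
    std::wstring line;
    // std::getline is a template, so it accepts a wide stream and a
    // std::wstring just as it accepts the narrow versions.
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: removes every character that std::iswpunct reports
// as punctuation. Which non-ASCII characters count as punctuation depends
// on the C locale selected with setlocale.
inline std::wstring strip_punct(const std::wstring& text)
{
    std::wstring result;
    for (wchar_t wc : text) {
        if (!std::iswpunct(static_cast<std::wint_t>(wc))) {
            result += wc;
        }
    }
    assert(result.size() <= text.size());  // characters are only removed
    return result;
}
```

    Writing the result back out would use a std::wofstream imbued with the
    same kind of locale, so that the wide characters are converted to
    UTF-8 again on output.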


© 2021 The Archive Base. All Rights Reserved