I would like to ask for help. I am just starting with C++ and got this homework at school: we have to write a function bool UTF8toUTF16 (const char * src, const char * dst ); which reads the file src, encoded in UTF-8, and writes its contents to the file dst encoded in UTF-16. We are not allowed to use any libraries other than the ones included in my code below.
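A minimal sketch of such a function, under several assumptions the assignment does not state: the output is UTF-16 little-endian with a byte-order mark (the form Windows Notepad calls “Unicode”), a UTF-8 BOM at the start of the input is dropped, and malformed UTF-8 makes the function return false. It only uses <fstream>, which is among the allowed headers; the helper writeUnitLE is a hypothetical name. Characters above U+FFFF are written as surrogate pairs.

```cpp
#include <fstream>

// Hypothetical helper: write one 16-bit UTF-16 code unit, low byte first (little-endian).
static void writeUnitLE(std::ofstream& out, unsigned long unit) {
    out.put(static_cast<char>(unit & 0xFF));
    out.put(static_cast<char>((unit >> 8) & 0xFF));
}

// Sketch: convert the UTF-8 file `src` to a UTF-16LE file `dst` with a BOM.
// Returns false if a file cannot be opened, the input is malformed UTF-8,
// or writing fails. On failure `dst` may be left partially written.
bool UTF8toUTF16(const char* src, const char* dst) {
    std::ifstream in(src, std::ios::in | std::ios::binary);
    if (!in) return false;
    std::ofstream out(dst, std::ios::out | std::ios::binary);
    if (!out) return false;

    writeUnitLE(out, 0xFEFF);  // byte-order mark, stored as FF FE

    // Smallest code point allowed for 1-, 2-, 3- and 4-byte sequences;
    // anything below is an "overlong" encoding and is rejected.
    static const unsigned long minForLength[4] = {0x0, 0x80, 0x800, 0x10000};
    bool atStart = true;
    char c;
    while (in.get(c)) {
        unsigned char lead = static_cast<unsigned char>(c);
        unsigned long cp;
        int extra;  // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false;  // continuation byte without a lead, or invalid byte

        for (int k = 0; k < extra; ++k) {
            if (!in.get(c)) return false;  // file ends mid-character
            unsigned char cont = static_cast<unsigned char>(c);
            if ((cont & 0xC0) != 0x80) return false;  // must be 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < minForLength[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;  // overlong, surrogate, or beyond U+10FFFF

        bool skip = atStart && cp == 0xFEFF;  // drop a UTF-8 BOM (EF BB BF)
        atStart = false;
        if (skip) continue;

        if (cp < 0x10000) {
            writeUnitLE(out, cp);  // BMP character: a single code unit
        } else {
            cp -= 0x10000;         // supplementary character: surrogate pair
            writeUnitLE(out, 0xD800 + (cp >> 10));
            writeUnitLE(out, 0xDC00 + (cp & 0x3FF));
        }
    }
    out.close();
    return !out.fail();
}
```

For a file containing only ‘š’ (bytes c5 a1, with or without a leading UTF-8 BOM), this sketch writes the four bytes ff fe 61 01.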
So the first thing I tried: I made a file “xx.txt” and, in the classic Windows Notepad, typed a single character into it, for example ‘š’. Then I tried to write a program that reads this file in binary mode, byte by byte (or several bytes at a time), and prints each value. But my program doesn’t work like that.
So the file ‘xx.txt’ contains only ‘š’, whose UTF-8 encoding is ‘c5 a1’, whose UTF-16 code unit is ‘0161’, and whose Unicode code point is U+0161. I expected the program to print i = 161 (hex), or at least something close to that.
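As a quick check of those values, a two-byte UTF-8 sequence has the form 110xxxxx 10yyyyyy and the code point is the 11 bits xxxxxyyyyyy. The small sketch below (decodeTwoByteUtf8 is a hypothetical name) does exactly that; since U+0161 is below U+10000, its UTF-16 code unit is the same number, 0x0161.

```cpp
// Decode a two-byte UTF-8 sequence 110xxxxx 10yyyyyy into its code point.
// Assumes the caller already checked that b0 is a two-byte lead byte
// and b1 is a continuation byte.
unsigned int decodeTwoByteUtf8(unsigned char b0, unsigned char b1) {
    return ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
}
```

With the bytes from the file, decodeTwoByteUtf8(0xC5, 0xA1) returns 0x161: (0x05 << 6) | 0x21 = 0x140 | 0x21.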
Here is my code so far:
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <iostream>
#include <fstream>
using namespace std;

int main ( void ) {
    char name[] = "xx.txt";
    fstream F ( name, ios::in | ios::binary );
    unsigned int i;
    /* I don't know what size to write here. I would guess it's 2, because I need
       2 bytes for the char with UTF-16 code 0161 (hex), but 2 doesn't work. */
    while( F.read ((char *) & i, 2))
        cout << "i = " << hex << i << " (hex) ";
    cout << endl;
    F.close();
    system("PAUSE");
    return 0;
}
Thanks in advance
Nikolas Jíša
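In UTF-8 you don’t know how many bytes a character occupies until you have read its first (lead) byte: the lead byte’s high bits encode the length of the sequence, from 1 to 4 bytes. So you need to read “chars” one at a time until you have a complete UTF-8 character, and then decode it. Note that your file contains the UTF-8 bytes c5 a1, not the UTF-16 value 0161, so no read size will give you 161 directly.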
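Edit: you don’t say what output you are getting, but I suspect it’s a byte-ordering issue. On a little-endian machine such as x86, reading the bytes c5 a1 into the start of an int gives a value whose low 16 bits are a1c5. And since i is an unsigned int (typically 4 bytes) that was never initialized, its upper two bytes keep whatever garbage they held.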
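To illustrate the byte-order point, here is a sketch (all function names hypothetical) contrasting what F.read((char *) &i, 2) effectively does, a raw copy whose result depends on the host, with assembling the two bytes in an explicitly chosen order. The raw copy zeroes the int first, which the question’s code does not do.

```cpp
#include <cstring>

// Assemble two bytes into a 16-bit value with an explicit byte order,
// independent of the machine's native endianness.
unsigned int fromBigEndian(unsigned char b0, unsigned char b1) {
    return (static_cast<unsigned int>(b0) << 8) | b1;
}
unsigned int fromLittleEndian(unsigned char b0, unsigned char b1) {
    return (static_cast<unsigned int>(b1) << 8) | b0;
}

// Roughly what reading 2 bytes into &i does: copy them into the first two
// bytes of the int. The resulting value depends on host byte order.
unsigned int rawCopy(const unsigned char bytes[2]) {
    unsigned int i = 0;  // zeroed here; the question's `i` was uninitialized
    std::memcpy(&i, bytes, 2);
    return i;
}
```

On x86, rawCopy on the bytes c5 a1 yields 0xa1c5, while fromBigEndian(0xC5, 0xA1) always yields 0xc5a1. Writing UTF-16 output has the same concern in reverse: pick LE or BE explicitly and emit the bytes in that order.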
You might be better off reading the input (if you know it is always a 16-bit value) into a char array and then looking at the individual bytes.
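A minimal sketch of that suggestion (hexDump is a hypothetical name): read up to a fixed number of bytes into a char array and print each one as two hex digits. The cast through unsigned char matters, because on platforms where char is signed a byte like 0xc5 would otherwise print as a sign-extended negative value.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Read up to `max` bytes (at most 256) from `in` into a char array and
// return them as space-separated two-digit hex values.
std::string hexDump(std::istream& in, std::size_t max) {
    char buf[256];
    if (max > sizeof buf) max = sizeof buf;
    in.read(buf, static_cast<std::streamsize>(max));
    std::streamsize got = in.gcount();  // bytes actually read
    std::ostringstream out;
    for (std::streamsize k = 0; k < got; ++k) {
        if (k) out << ' ';
        out << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<unsigned int>(static_cast<unsigned char>(buf[k]));
    }
    return out.str();
}
```

Passing the question’s ifstream F would print c5 a1 for ‘š’; if the editor saved a UTF-8 BOM, the dump would start with ef bb bf.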
See http://www.joelonsoftware.com/articles/Unicode.html