It is very simple, I hope. These are 20 hex values, each preceded by a backslash \, and the C compiler indeed turns them into a string of 33 bytes (counting the terminating NUL), because \NUMBER is a single value, while \NUMBER+ALPHA is 2 bytes, as is \ALPHA+NUMBER.
char str[] =
"\b3\bc\77\7\de\ed\44\93\75\ce\c0\9\19\59\c8\f\be\c6\30\6";
//when saved is 33 bytes
My question is: after the string has been saved as 33 bytes on disk, can we (after reading those 33 bytes back) reproduce the same representation that we have in the C source? That is, the program should print "\b3\bc\77\7\de\ed\44\93\75\ce\c0\9\19\59\c8\f\be\c6\30\6". Any problem solvers here?
"\b3\bc\77\7\de\ed\44\93\75\ce\c0\9\19\59\c8\f\be\c6\30\6";
//when read back program should output this ^
The string literal you have:
"\b3\bc\77\7\de\ed\44\93\75\ce\c0\9\19\59\c8\f\be\c6\30\6"
will produce undefined behavior according to C89 (not sure if the source for C89 can be trusted, but my point below still holds) and implementation-defined behavior according to the C11 standard. In particular, \d, \e, \9 and \c are escape sequences not defined in the standard. gcc will not complain about \e, since it is a GNU extension that represents ESC.
Since there is implementation-defined behavior, we need to know which compiler you are using, as the result may vary.
Another thing: you did not show clearly that you are aware of the content of the string after compilation. (A clearer way would be to include a hex dump of what the string looks like in memory, and to show that you are aware of the escape sequences.)
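As a rough illustration of the kind of hex dump meant here, the following sketch formats a byte buffer as two-digit hex values. The function name hex_dump and its signature are made up for this example; they are not from any library.

```c
#include <stdio.h>
#include <stddef.h>

/* Format n bytes of buf as space-separated two-digit hex values into out.
   out must have room for at least 3 * n + 1 bytes. */
void hex_dump(const unsigned char *buf, size_t n, char *out)
{
    size_t i;

    out[0] = '\0';  /* an empty buffer gives an empty string */
    for (i = 0; i < n; i++) {
        /* Two hex digits per byte, with a space between bytes
           but not after the last one. */
        out += sprintf(out, i + 1 < n ? "%02x " : "%02x", (unsigned)buf[i]);
    }
}
```

Calling hex_dump((const unsigned char *)str, sizeof str, out) on the array from the question would dump all of its bytes, including the terminating NUL.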
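This is how the looks-like-hex string is recognized by the compiler (assuming gcc; bytes shown in hex, and a terminating 00 follows the last one):
\b3 -> 08 33
\bc -> 08 63
\77 -> 3f
\7 -> 07
\de -> 64 65
\ed -> 1b 64
\44 -> 24
\93 -> 39 33
\75 -> 3d
\ce -> 63 65
\c0 -> 63 30
\9 -> 39
\19 -> 01 39
\59 -> 05 39
\c8 -> 63 38
\f -> 0c
\be -> 08 65
\c6 -> 63 36
\30 -> 18
\6 -> 06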
Enough beating around the bush. I assume that you are using gcc to compile the code (warnings ignored) and that, when the code is run, the whole char[] is written to a file using fwrite. I also assume only lower-case characters are used in the source code.
You should map all possible escape sequences \xy that look like a 2-digit hex number to sequences of 1 or 2 bytes. There are not that many of them, and you can write a program to simulate the behavior of the compiler:
- If x is any of a, b, f (other escape sequences like \n are not hex digits) or e (due to the GNU extension), it is mapped to the corresponding special character. (In gcc, \E also maps to ESC.)
- If xy forms a valid octal sequence, it is mapped to the character with the corresponding value.
- If only x forms a valid octal sequence, it is mapped to the character with the corresponding value.
- Otherwise, x stays the same.
- If y is not consumed, y stays the same.
Note that the same char can be produced in 2 different ways. For example, \f and \14 map to the same char. In such a case, it might not be possible to get back the string in the source. The most you can do is guess what the string in the source could have been.
Using your string as an example: at the beginning, 08 and 33 can come from \b3, but they can also come from \10\63.
Using the map produced, there are cases where the mapping is clear: a byte larger than hex 3f cannot come from a 2-digit octal escape sequence, and must come from direct interpretation of the character in the original string. From this, you know that if e is encountered, it must be the 2nd character of a looks-like-hex sequence.
You can use the map as a guide, and the simulation as a way to check whether a guessed string produces the same bytes again. Without knowing anything about the string declared in the source code, the most you can derive is a list of candidates for the original (broken) string in the source code. You can reduce the size of the list of candidates if you at least know the length of the string in the source code.
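A minimal sketch of such a simulation, assuming gcc's treatment of unknown escapes (the character is kept as-is) and covering only the characters that can appear in a looks-like-hex sequence. The names decode_literal and same_bytes are made up for this illustration, and real escapes such as \n or \x are deliberately not modelled.

```c
#include <stddef.h>
#include <string.h>

/* Is c an octal digit? */
static int is_octal(char c) { return c >= '0' && c <= '7'; }

/* Decode the text of a string literal (without the quotes) the way gcc is
   assumed to. Writes at most strlen(src) bytes to out and returns how many
   were written; no terminating NUL is appended. */
size_t decode_literal(const char *src, unsigned char *out)
{
    size_t n = 0;

    while (*src) {
        if (*src != '\\') {              /* ordinary character */
            out[n++] = (unsigned char)*src++;
            continue;
        }
        src++;                           /* skip the backslash */
        if (*src == '\0')
            break;                       /* stray trailing backslash: ignored */
        if (is_octal(*src)) {
            /* Up to three octal digits form one value; values above
               0377 are truncated to a byte here. */
            unsigned v = 0;
            int k = 0;
            while (k < 3 && is_octal(*src)) {
                v = v * 8 + (unsigned)(*src++ - '0');
                k++;
            }
            out[n++] = (unsigned char)v;
            continue;
        }
        switch (*src) {
        case 'a': out[n++] = 0x07; break;   /* BEL */
        case 'b': out[n++] = 0x08; break;   /* BS  */
        case 'f': out[n++] = 0x0c; break;   /* FF  */
        case 'e': out[n++] = 0x1b; break;   /* ESC, GNU extension */
        default:                            /* c, d, 8, 9: kept as-is */
            out[n++] = (unsigned char)*src;
            break;
        }
        src++;
    }
    return n;
}

/* Returns 1 if both literal texts decode to the same bytes, else 0.
   Inputs of 256 characters or more are rejected (returns 0). */
int same_bytes(const char *a, const char *b)
{
    unsigned char x[256], y[256];
    size_t nx, ny;

    if (strlen(a) >= sizeof x || strlen(b) >= sizeof y)
        return 0;
    nx = decode_literal(a, x);
    ny = decode_literal(b, y);
    return nx == ny && memcmp(x, y, nx) == 0;
}
```

When calling these from C, the backslashes have to be doubled, because the argument is itself a string literal: same_bytes("\\b3", "\\10\\63") compares the two spellings from the example above. A candidate for the original string can be checked the same way, by decoding it and comparing the result with the bytes read from the file.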