When I load data in Lua with loadstring, certain Chinese characters break.
RawData = '{a="a朶b"}'
Data = loadstring("return " .. RawData)()
That's because:
- In GBK, "朶" is encoded as the two bytes 0x96 0x5C.
- 0x5C is the ASCII backslash '\', which Lua treats as the start of an escape sequence.
- So '{a="a朶b"}' is read as '{a="a\150\b"}', and \b is the backspace escape rather than the letter b.
As a result I never get the expected output "a朶b": the "b" is swallowed by "朶".
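The effect can be reproduced independently of the editor's encoding. This is a minimal sketch that assumes the GBK bytes given above (0x96 0x5C) and builds them numerically with string.char; the names load_chunk, gbk_char and raw_data are only illustrative.

```lua
-- loadstring exists in Lua 5.1; newer versions use load instead.
local load_chunk = loadstring or load

-- GBK bytes of the character, written as numbers so that the
-- encoding of this file does not matter.
local gbk_char = string.char(0x96, 0x5C)
local raw_data = '{a="a' .. gbk_char .. 'b"}'

local data = load_chunk("return " .. raw_data)()

-- The lexer read 0x5C followed by "b" as the escape \b (byte 8).
print(#data.a, string.byte(data.a, 1, -1))  --> 3  97  150  8
```

The loaded string has three bytes instead of four: the trailing "b" has been folded into a backspace character.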
The same quoting problem happens in Python:
exec("""print '''a朶b''' """)
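There are a few ways to handle this in Python:
- declare the encoding explicitly at the top of the file: # -*- coding: gbk -*-
- use UTF-8 for the string/file encoding
But Lua is built only on standard C and treats source text as plain bytes, with no notion of a source encoding. Any ideas for quoting or escaping?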
By the way, this works:
RawData = [=[ {a=[[a朶b]]} ]=]
return loadstring("return " .. RawData)() .a
But it requires changing the original RawData, which is unacceptable.
Question 2:
How can I keep a string in Lua from being escaped? (Python handles this well.)
s = "a朶b"
s1 = string.format("%q", s) -- s escaped
return s -- s escaped
print(s) -- s escaped
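To separate the lexer problem from ordinary string handling, here is a small sketch that again builds the GBK bytes numerically (an assumption made so the example does not depend on the file encoding). It suggests that a string value holds its raw bytes unchanged, and that only the output of %q is an escaped literal, which loads back to the same bytes.

```lua
local load_chunk = loadstring or load

-- "a朶b" in GBK, built from byte values: a, 0x96, 0x5C, b
local s = "a" .. string.char(0x96, 0x5C) .. "b"

-- %q produces a quoted literal in which the 0x5C byte is written as \\
local quoted = string.format("%q", s)

-- Loading that literal gives back the original four bytes.
local reloaded = load_chunk("return " .. quoted)()
print(#s, reloaded == s)  --> 4  true
```

So the damage in the snippet above happens when the literal is parsed from GBK source, not when the string is returned or printed.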
As was discussed on the Lua mailing list, Lua handles UTF-8 in string literals just fine. If you can save the file in UTF-8, you will have no problems with Lua. If you later need the GBK encoding (for example, when saving to a file or serving a web page in that encoding), you can use the lua-iconv library to convert the UTF-8 strings to GBK.
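A conversion along those lines might look like the sketch below. It assumes that this file is saved as UTF-8, that lua-iconv is installed, and that the underlying iconv knows the encoding name "GBK"; utf8_to_gbk is a hypothetical helper name, not part of the library.

```lua
-- lua-iconv is an external module, so load it defensively.
local have_iconv, iconv = pcall(require, "iconv")

-- Hypothetical helper: convert a UTF-8 string to GBK.
-- Returns nil plus a message if the conversion is not possible.
local function utf8_to_gbk(s)
  if not have_iconv then return nil, "lua-iconv is not available" end
  local cd = iconv.new("GBK", "UTF-8")  -- arguments are (to, from)
  if not cd then return nil, "conversion GBK <- UTF-8 not supported" end
  local out, err = cd:iconv(s)
  if err then return nil, "conversion failed" end
  return out
end

-- The literal below is UTF-8 because this file is assumed to be UTF-8.
local gbk, msg = utf8_to_gbk("a朶b")
print(gbk and #gbk or msg)
```

UTF-8 never uses bytes below 0x80 inside a multi-byte character, so the literal itself cannot contain a stray backslash.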
The other thing you can do is convert from GBK to UTF-8 before using loadstring. Then do not forget to convert back from UTF-8 when presenting the results to the user.
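As a rough sketch of that round trip, under the same assumptions (lua-iconv installed, "GBK" accepted as an encoding name), the GBK input is built from byte values here so the example does not depend on the file encoding. load_gbk_source is a hypothetical helper name.

```lua
local have_iconv, iconv = pcall(require, "iconv")
local load_chunk = loadstring or load

-- The original data as GBK bytes: {a="a朶b"}
local raw_gbk = '{a="a' .. string.char(0x96, 0x5C) .. 'b"}'

-- Hypothetical helper: convert GBK source to UTF-8, then load it.
-- The strings in the returned table are UTF-8.
local function load_gbk_source(src)
  if not have_iconv then return nil, "lua-iconv is not available" end
  local to_utf8 = iconv.new("UTF-8", "GBK")  -- arguments are (to, from)
  if not to_utf8 then return nil, "conversion not supported" end
  local utf8_src, err = to_utf8:iconv(src)
  if err then return nil, "conversion to UTF-8 failed" end
  local chunk, load_err = load_chunk("return " .. utf8_src)
  if not chunk then return nil, load_err end
  return chunk()
end

local data, msg = load_gbk_source(raw_gbk)
local shown  -- the value converted back to GBK for presentation
if data then
  local to_gbk = iconv.new("GBK", "UTF-8")
  shown = to_gbk and to_gbk:iconv(data.a)
end
print(shown and #shown or msg)
```

The stored RawData stays untouched; only the copy passed to the loader is converted.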