When I parse R code with non-native characters under Windows, these characters seem to

Asked: June 17, 2026

When I parse R code with non-native characters under Windows, these characters seem to be turned into their Unicode representations, e.g.

Encoding('ğ')
# [1] "UTF-8"
parse(text="'ğ'")
# expression('<U+011F>')
parse(text="'ğ'", encoding='UTF-8')
# expression('<U+011F>')
deparse(parse(text="'ğ'")[1])
# [1] "expression(\"<U+011F>\")"
eval(parse(text="'ğ'"))
# [1] "<U+011F>"

Since my locale is Simplified Chinese, I can parse code with Chinese characters without such a problem, e.g.

parse(text="'你好'")
# expression('你好')

My question is, how can I preserve characters like the letter ğ in this example? Or at least how can I “reconstruct” the original characters after I deparse() the expression?

My session info:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

1 Answer

Editorial Team · Answer 1 · 2026-06-17T16:59:45+00:00

The root of the problem is that, quoting the R Installation and Administration manual, “R supports all the character sets that the underlying OS can handle. These are interpreted according to the current locale”. Unfortunately, Windows has no locale that supports UTF-8.

Now, the good thing is that Rgui apparently supports UTF-8 (scroll down to 2.7.0 > Internationalization). The R parser, though, works only with the characters supported by the current locale. So a solution that worked for me is to temporarily change the R locale with Sys.setlocale() just to do the parsing and, later, when deparsing, use iconv() to convert the result from that locale's code page to UTF-8:

> Sys.getlocale()
[1] "LC_COLLATE=Greek_Greece.1253;LC_CTYPE=Greek_Greece.1253;LC_MONETARY=Greek_Greece.1253;LC_NUMERIC=C;LC_TIME=Greek_Greece.1253"
> orig.locale <- Sys.getlocale("LC_CTYPE")
> parse(text="'你好'")
expression('<U+4F60><U+597D>')
> Sys.setlocale(locale="Chinese")
[1] "LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936"
> a <- parse(text="'你好'")
> a
expression('你好')
> Sys.setlocale(locale="Turkish")
[1] "LC_COLLATE=Turkish_Turkey.1254;LC_CTYPE=Turkish_Turkey.1254;LC_MONETARY=Turkish_Turkey.1254;LC_NUMERIC=C;LC_TIME=Turkish_Turkey.1254"
> b <- parse(text="'ğ'")
> b
expression('ğ')
> Sys.setlocale(locale=orig.locale)
[1] "LC_COLLATE=Greek_Greece.1253;LC_CTYPE=Greek_Greece.1253;LC_MONETARY=Greek_Greece.1253;LC_NUMERIC=C;LC_TIME=Greek_Greece.1253"
> a
[1] expression('ΔγΊΓ')
> b
[1] expression('π')
> ai <- iconv(a, from="CP936", to="UTF-8")
> ai
[1] "你好"
> bi <- iconv(b, from="CP1254", to="UTF-8")
> bi
[1] "ğ"
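
The steps in the transcript above could be wrapped in a small helper so that the locale is always restored, even if parsing fails. This is only a sketch under assumptions: parse_in_locale() and its arguments are hypothetical names, not part of base R, and it assumes the parser stores the string in the code page of the temporary locale, as it did in the session shown above.

```r
# Hypothetical helper (not from base R): parse `text` while LC_CTYPE is
# temporarily set to `locale`, then convert the result from that locale's
# code page (`codepage`) to UTF-8.
parse_in_locale <- function(text, locale, codepage) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the original LC_CTYPE when the function exits, even on error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) if the request is refused.
  if (!nzchar(suppressWarnings(Sys.setlocale("LC_CTYPE", locale)))) {
    stop("locale not available: ", locale)
  }

  expr <- parse(text = text)
  # Assumption: the parsed string is now encoded in `codepage`.
  iconv(as.character(expr), from = codepage, to = "UTF-8")
}
```

With the locale names and code pages from the transcript, the calls would be parse_in_locale("'ğ'", "Turkish", "CP1254") and parse_in_locale("'你好'", "Chinese", "CP936"). Only LC_CTYPE is switched here, on the assumption that it is the category the parser depends on; the transcript changes all categories instead.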

Hope this helps!
