I’m having trouble parsing utf8 characters into Text when deriving a Read instance. For example, when I run the following in ghci…
> import Data.Text
> data Message = Message Text deriving (Read, Show)
> read ("Message \"→\"") :: Message
Message "\8594"
Can I do anything to keep the text inside Message UTF-8 encoded? That is, the result should be…
Message "→"
(P.S. I already receive my serialized messages as Text, but currently need to unpack to a String in order to call read. I’d love to avoid this…)
EDIT: Ah sorry, answers rightly point out that it’s show not read which converts to "\8594" – is there a way to show and convert back to Text again without the backslash encoding?
To the best of my knowledge, the internal encoding used by Text (UTF-16 in versions before text-2.0, UTF-8 since) is consistent and not exposed directly. If you want UTF-8, you can decode/encode a Text value as appropriate. Similarly, it doesn’t make sense to talk about an encoding for String, because that’s just a list of Char, where each Char is a Unicode code point.
Most likely, it’s only the Show instance for Text displaying things differently here.
Also, keep in mind that (by consistent convention in the standard libraries) read and show are expected to behave as (de-)serialization functions, with a “serialized” format that, interpreted as a Haskell expression, describes a value equivalent to the one being (de-)serialized. As such, backslash escapes over ASCII text are often preferred for being widely supported and unambiguous. If you want to display a Text value with the actual code points, show isn’t what you want.
I’m not entirely clear on what you want to do with the Text, but using show directly is exactly what you’re trying to avoid. If you want to display text in a terminal window, the terminal is going to dictate the encoding, and you want the functions defined in Data.Text.IO. If you need to convert to a specific encoding for whatever other reason, Data.Text.Encoding will give you an encoded ByteString (emphasis on “byte”, not “string”: a ByteString is a sequence of raw bytes, not a string of characters).
If you just want to convert from Text to String and back to Text… what’s wrong with the backslash encoding? show is not really intended for pretty-printing output for users to read, despite many people’s initial expectations otherwise.
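To make the distinction concrete, here is a minimal sketch that keeps show/read for serialization and uses a separate function for display. The helper names serialize, deserialize, and render are made up for this example and are not part of any library, and the sketch assumes the terminal accepts UTF-8 (it forces that with hSetEncoding so the output does not depend on the locale).

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (hSetEncoding, stdout, utf8)

-- Same shape as the Message type in the question; Eq is added
-- only so the round trip can be checked below.
data Message = Message Text
  deriving (Eq, Read, Show)

-- Hypothetical helper: serialize via show. The result is ASCII-only,
-- with non-ASCII code points written as backslash escapes.
serialize :: Message -> Text
serialize = T.pack . show

-- Hypothetical helper: parse the serialized form back.
-- read works on String, so the Text is unpacked first.
deserialize :: Text -> Message
deserialize = read . T.unpack

-- Hypothetical helper: a human-readable rendering that keeps the
-- actual code points. It is for display only and is not meant to be
-- parsed back with read (quotes inside the text are not escaped).
render :: Message -> Text
render (Message t) = "Message \"" <> t <> "\""

main :: IO ()
main = do
  -- Assumption: the terminal understands UTF-8.
  hSetEncoding stdout utf8
  let msg = Message "→"
  TIO.putStrLn (serialize msg)               -- escaped form: Message "\8594"
  TIO.putStrLn (render msg)                  -- shows the arrow itself
  print (deserialize (serialize msg) == msg) -- True: the escapes round-trip
  print (BS.unpack (TE.encodeUtf8 "→"))      -- the UTF-8 bytes: [226,134,146]
```

The escaped form produced by show is what read parses back without loss, so nothing is lost in the round trip; the arrow only appears escaped because GHCi prints results with show. When the goal is output for people to read, write the Text itself with Data.Text.IO rather than going through show.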