Can anyone explain the pros and cons of using the `Data.Text` and `Data.ByteString.Char8` data types? Does working with ASCII-only text change these pros and cons? Do their lazy variants change the story as well?
`Data.ByteString.Char8` provides functions to treat `ByteString` values as sequences of 8-bit characters (in practice, ASCII), while `Data.Text` is an independent type supporting the entirety of Unicode.

`ByteString` and `Text` are essentially the same as far as representation goes: strict, unboxed arrays, with lazy variants based on lists of strict chunks. The main difference is that `ByteString` stores octets (i.e. `Word8`s), while `Text` stores `Char`s, encoded in UTF-16 (UTF-8 as of text-2.0).

If you’re working with ASCII-only text, then using `Data.ByteString.Char8` will probably be faster than `Text`, and use less memory; however, you should ask yourself whether you’re really sure that you’re only ever going to work with ASCII. Basically, in 99% of cases, using `Data.ByteString.Char8` over `Text` is a speed hack: octets aren’t characters, and any Haskeller would agree that using the correct type should be prioritised over raw, bare-metal speed. You should usually only consider it if you’ve profiled the program and it’s a bottleneck. `Text` is well-optimised, and the difference will probably be negligible in most cases.

Of course, there are non-speed-related situations in which `Data.ByteString.Char8` is warranted. Consider a file containing data that is essentially binary, not text, but separated into lines; using `lines` is completely reasonable. Additionally, it’s entirely conceivable that an integer might be encoded in ASCII decimal in the context of a binary format; using `readInt` would make perfect sense in that case.

So, basically:

- `Data.ByteString.Char8`: for pure ASCII situations where performance is paramount, and for handling “almost-binary” data that has some ASCII components.
- `Data.Text`: text, including any situation where there’s the slightest possibility of something other than ASCII being used.
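
To make the "octets aren't characters" point concrete, here is a minimal sketch, assuming the `bytestring` and `text` packages are available. The helper names (`char8Bytes`, `utf8Bytes`, `parseInts`) are made up for illustration and are not part of either library. It compares the bytes you get from `Data.ByteString.Char8.pack` with those from an explicit UTF-8 encoding of a `Text`, and shows the "almost-binary" use of `lines` and `readInt`.

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text             as T
import qualified Data.Text.Encoding    as TE
import           Data.Word             (Word8)

-- | Bytes produced by Char8.pack (hypothetical helper).
-- Each Char is truncated to its low 8 bits, so anything above
-- U+00FF is silently mangled.
char8Bytes :: String -> [Word8]
char8Bytes = B.unpack . BC.pack

-- | Bytes produced by going through Text and encoding explicitly
-- as UTF-8 (hypothetical helper). No information is lost.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack

-- | "Almost-binary" data: ASCII decimal integers, one per line
-- (hypothetical helper). Lines that are not a complete integer
-- are skipped.
parseInts :: B.ByteString -> [Int]
parseInts bytes =
  [ n
  | Just (n, rest) <- map BC.readInt (BC.lines bytes)
  , B.null rest          -- reject trailing garbage such as "12abc"
  ]
```

In GHCi, `char8Bytes "\955"` (a lambda, U+03BB) gives `[187]`, i.e. only the low byte survives, while `utf8Bytes "\955"` gives `[206,187]`. For pure ASCII input the two agree, which is why `Char8` is safe only when you really do control the input.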