I have a 300MB file ( link ) with utf-8 characters in it. I

Question


Asked: May 26, 2026 (2026-05-26T22:21:33+00:00)


I have a 300MB file (link) with UTF-8 characters in it. I want to write a Haskell program equivalent to:

cat bigfile.txt | grep "^en " | wc -l

This runs in 2.6s on my system.

Right now, I’m reading the file as a normal String (readFile), and have this:

main = do
    contents <- readFile "bigfile.txt"
    putStrLn $ show $ length $ lines contents

After a couple seconds I get this error:

Dictionary.hs: bigfile.txt: hGetContents: invalid argument (Illegal byte sequence)

I assume I need to use something more UTF-8 friendly? How can I make it both fast and UTF-8 compatible? I read about Data.ByteString.Lazy for speed, but Real World Haskell says it doesn’t support UTF-8.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T22:21:33+00:00


The utf8-string package provides support for reading and writing UTF-8 strings. It reuses the ByteString infrastructure, so its interface is likely to be very similar.

Another Unicode strings project, likely related to the above and also inspired by ByteString, is discussed in this master's thesis.
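For the specific task in the question, decoding may not be needed at all. Here is a minimal sketch that uses only the bytestring package rather than utf8-string. It assumes the file is valid UTF-8 and relies on the prefix "en " being pure ASCII, so it can be matched byte for byte. The helper name countEnLines is made up for illustration; the file name is the one from the question.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL

-- Count the lines that start with the ASCII prefix "en ".
-- UTF-8 encodes ASCII characters as single bytes that never occur
-- inside a multi-byte sequence, so a byte-level prefix test is safe.
-- (countEnLines is a hypothetical helper, not part of any library.)
countEnLines :: BL.ByteString -> Int
countEnLines = length . filter (prefix `BL.isPrefixOf`) . BL.lines
  where
    prefix = BL.pack "en "

main :: IO ()
main = do
  -- Lazy read: the file is streamed in chunks, not loaded all at once.
  contents <- BL.readFile "bigfile.txt"
  print (countEnLines contents)
```

Because the bytes are never decoded, the "Illegal byte sequence" error from readFile should not come up on this path. If the matching lines later need to be handled as text, utf8-string's Data.ByteString.Lazy.UTF8.toString should be able to decode them one at a time.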
