I am programming some ‘openvpn-like’ thing and thought it would be a good candidate to improve my Haskell knowledge. However, I ran into quite severe performance problems.
What it does: it opens a TUN device, binds to a UDP port, and starts 2 threads (forkIO, but compiled with -threaded because of the fdRead). I have not used the tuntap package; I did it myself completely in Haskell.
thread 1: read a packet (fdRead) from the TUN device; send it over the UDP socket.
thread 2: read a packet (recv) from the UDP socket; send it to the TUN device (fdWrite).
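A minimal sketch of that two-thread layout, with the device and socket operations passed in as plain IO actions. The names pump and startTunnel are made up for illustration; the real program would plug in fdRead/fdWrite and the socket calls, and would need some error handling around the loops.

```haskell
import Control.Concurrent
import Control.Monad (forever)

-- Copy packets from a source action to a sink action, forever.
-- The loop only stops when one of the actions throws an exception.
pump :: IO pkt -> (pkt -> IO ()) -> IO ThreadId
pump readPkt writePkt = forkIO (forever (readPkt >>= writePkt))

-- One thread per direction: TUN -> UDP and UDP -> TUN.
-- All four arguments are placeholders for the real I/O calls.
startTunnel
  :: IO pkt            -- read a packet from the TUN device
  -> (pkt -> IO ())    -- send a packet over UDP
  -> IO pkt            -- receive a packet from UDP
  -> (pkt -> IO ())    -- write a packet to the TUN device
  -> IO (ThreadId, ThreadId)
startTunnel tunRead udpSend udpRecv tunWrite = do
  t1 <- pump tunRead udpSend
  t2 <- pump udpRecv tunWrite
  pure (t1, t2)
```

Keeping the packet type abstract means the same skeleton works whether the packets are String or ByteString, so the I/O layer can be swapped without touching the threading.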
Problem 1: In this configuration fdRead returns String, and I have used the Network.Socket functions that accept String. I set up a configuration on my local system (some iptables magic) and can push 15MB/s through it on localhost, with the program running at basically 100% CPU. That’s slow. Is there anything I could do to improve the performance?
Problem 2: I will have to prepend something to the packets I am sending; however, the sendMany network function takes only ByteString, while reading from the Fd returns String. The conversion is pretty slow. Converting the Fd to a Handle doesn’t seem to work well enough with the TUN device…
Problem 3: I wanted to store some information in Data.Heap (functional heap) (I need to use the ‘takeMin’ and although for 3 items it is overkill, it is easy to do 🙂 ). So I created an MVar, and on each received packet I pulled the Heap from the MVar, updated the Heap with the new info and put it back into the MVar. Now the thing simply starts to eat A LOT of memory. Probably because the old heaps don’t get garbage collected soon/frequently enough..?
Is there a way to solve these problems or do I have to go back to C…? What I am doing should be mostly a zero-copy operation. Am I using the wrong libraries to achieve it?
==================
What I did:
- when putting to the MVar, I did:
a `seq` putMVar mvar a
That perfectly helped with the memory leak.
- changed to ByteString; now I get 42MB/s when using just ‘read/write’ with no further processing. The C version does about 56MB/s so this is acceptable.
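For reference, the seq fix above can also be sketched with modifyMVar_ and evaluate, which force the new value before it is stored, so the MVar never holds a growing chain of unevaluated updates. This is only an illustration under assumptions: Data.Set stands in for the heap, and record and updateStrict are hypothetical names. Note that seq and evaluate only force the outermost constructor, which is enough here because Data.Set is strict in its structure.

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import qualified Data.Set as Set

-- Stand-in for the heap from the post: a Set also gives cheap access
-- to its minimum element.
type Heap = Set.Set Int

-- Hypothetical update step: insert a value, then drop the minimum
-- whenever more than 3 items are stored.
record :: Int -> Heap -> Heap
record x h =
  let h' = Set.insert x h
  in if Set.size h' > 3 then Set.deleteMin h' else h'

-- Force the updated heap to weak head normal form before putting it back,
-- instead of storing an unevaluated thunk that references the old heap.
updateStrict :: MVar Heap -> Int -> IO ()
updateStrict mv x = modifyMVar_ mv (\h -> evaluate (record x h))
```

Writing putMVar mvar $! a is another common spelling of the same idea as a `seq` putMVar mvar a.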
String is slow. Really, really, really slow. It’s a singly-linked list of cons cells containing one Unicode character each. Writing one to a socket requires converting each character to bytes, copying those bytes into an array, and handing that array to the system call. What part of this sounds like what you want to be doing? 🙂
You want to be using ByteString exclusively. The ByteString IO functions actually use zero-copy IO where possible. Especially look at the network-bytestring package on Hackage (its modules were later merged into the network package as Network.Socket.ByteString). It contains versions of all the network libraries that are optimized to work efficiently with ByteString.
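To make the header-prepending case from the question concrete, here is a small sketch using only the bytestring package. The 2-byte header layout and the names mkHeader, frame and unframe are assumptions for illustration, not anything from the original program. The point is that the header and payload can stay as separate chunks for a vectored send such as sendMany, so the payload never has to be copied just to put a header in front of it.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical 2-byte header: a tag byte followed by a sequence number.
mkHeader :: Word8 -> Word8 -> B.ByteString
mkHeader tag seqNo = B.pack [tag, seqNo]

-- Keep header and payload as separate chunks; a list like this is the
-- shape of argument that a vectored send (e.g. sendMany) takes.
frame :: B.ByteString -> B.ByteString -> [B.ByteString]
frame hdr payload = [hdr, payload]

-- Receiving side: split the header off again. B.splitAt does not copy;
-- both halves share the original buffer.
unframe :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
unframe pkt
  | B.length pkt < 2 = Nothing
  | otherwise        = Just (B.splitAt 2 pkt)
```

If a single contiguous buffer is needed instead, B.concat (frame hdr payload) builds one, at the cost of one copy of the payload.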