I’m writing a program where a large number of agents listen for events and

Question

0

Asked: May 23, 20262026-05-23T08:15:50+00:00 2026-05-23T08:15:50+00:00

I’m writing a program where a large number of agents listen for events and

0

I’m writing a program where a large number of agents listen for events and react on them. Since Control.Concurrent.Chan.dupChan is deprecated I decided to use TChan’s as advertised.

The performance of TChan is much worse than I expected. I have the following program that illustrates the issue:

{-# LANGUAGE BangPatterns #-}

module Main where

import Control.Concurrent.STM
import Control.Concurrent
import System.Random(randomRIO)
import Control.Monad(forever, when)

allCoords :: [(Int,Int)]
allCoords = [(x,y) | x <- [0..99], y <- [0..99]]

randomCoords :: IO (Int,Int)
randomCoords = do
  x <- randomRIO (0,99)
  y <- randomRIO (0,99)
  return (x,y)

main = do
  chan <- newTChanIO :: IO (TChan ((Int,Int),Int))

  let watcher p = do
         chan' <- atomically $ dupTChan chan
         forkIO $ forever $ do
                    r@(p',_counter) <- atomically $ readTChan chan'
                    when (p == p') (print r)
         return ()

  mapM_ watcher allCoords

  let go !cnt = do
       xy <- randomCoords
       atomically $ writeTChan chan (xy,cnt)
       go (cnt+1)

  go 1

When compiled (-O) and run the program first will output something like this:

./tchantest
((0,25),341)
((0,33),523)
((0,33),654)
((0,35),196)
((0,48),181)
((0,48),446)
((1,15),676)
((1,50),260)
((1,78),561)
((2,30),622)
((2,38),383)
((2,41),365)
((2,50),596)
((2,57),194)
((3,19),259)
((3,27),344)
((3,33),65)
((3,37),124)
((3,49),109)
((3,72),91)
((3,87),637)
((3,96),14)
((4,0),34)
((4,17),390)
((4,73),381)
((4,74),217)
((4,78),150)
((5,7),476)
((5,27),207)
((5,47),197)
((5,49),543)
((5,53),641)
((5,58),175)
((5,70),497)
((5,88),421)
((5,89),617)
((6,0),15)
((6,4),322)
((6,16),661)
((6,18),405)
((6,30),526)
((6,50),183)
((6,61),528)
((7,0),74)
((7,28),479)
((7,66),418)
((7,72),318)
((7,79),101)
((7,84),462)
((7,98),669)
((8,5),126)
((8,64),113)
((8,77),154)
((8,83),265)
((9,4),253)
((9,26),220)
((9,41),255)
((9,63),51)
((9,64),229)
((9,73),621)
((9,76),384)
((9,92),569)
...

And then, at some point, will stop writing anything, while still consuming 100% cpu.

((20,56),186)
((20,58),558)
((20,68),277)
((20,76),102)
((21,5),396)
((21,7),84)

With -threaded the lockup is even faster and occurs after only a handful of lines. It will also consume whatever number of cores are made available through RTS’ -N flag.

Additionally the performance seems rather poor – only about 100 events per second are processed.

Is this a bug in STM or am I misunderstanding something about semantics of STM?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T08:15:51+00:00

The program is going to perform quite badly. You’re spawning off 10,000 threads all of which will queue up waiting for a single TVar to be written to. So once they’re all going, you may well get this happening:

Each of the 10,000 threads tries to read from the channel, finds it empty, and adds itself to the wait queue for the underlying TVar. So you’ll have 10,000 queue-up events, and 10,000 processes in the wait queue for the TVar.
Something is written to the channel. This will unqueue each of the 10,000 threads and put it back on the run-queue (this may be O(N) or O(1), depending on how the RTS is written).
Each of the 10,000 threads must then process the item to see if it’s interested in it, which most won’t be.

So each item will cause processing O(10,000). If you see 100 events per second, that means that each thread requires about 1 microsecond to wake up, read a couple of TVars, write to one and queue up again. That doesn’t seem so unreasonable. I don’t understand why the program would grind to a complete halt, though.

In general, I would scrap this design and replace it as follows:

Have a single thread reading the event channel, which maintains a map from coordinate to interested-receiver-channel. The single thread can then pick out the receiver(s) from the map in O(log N) time (much better than O(N), and with a much smaller constant factor involved), and send the event to just the interested receiver. So you perform just one or two communications to the interested party, rather than 10,000 communications to everyone. A list-based form of the idea is written in CHP in section 5.4 of this paper: http://chplib.files.wordpress.com/2011/05/chp.pdf

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m writing a program where a large number of agents listen for events and

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply