What is the current state of libraries for scraping websites with Haskell?
I’m trying to make myself do more of my quick one-off tasks in Haskell, to help increase my comfort level with the language.
In Python, I tend to use the excellent PyQuery library for this. Is there something similarly simple and easy in Haskell? I’ve looked into TagSoup, and while the parser itself seems nice, traversing the parsed page isn’t as pleasant as it is in other languages.
Is there a better option out there?
From my searching on the Haskell mailing lists, it appears that TagSoup is the dominant choice for parsing pages. For example:
http://www.haskell.org/pipermail/haskell-cafe/2008-August/045721.html
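To give a feel for what traversal with TagSoup looks like, here is a minimal sketch that pulls the link targets out of a page. It assumes the tagsoup package is installed; `sampleHtml` and `linkTargets` are names I made up for illustration, and the HTML is inlined so that nothing needs to be fetched over the network.

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical sample page, inlined so the sketch needs no network access.
sampleHtml :: String
sampleHtml =
  "<html><body><a href=\"/one\">One</a><p>text</p>"
    ++ "<a href=\"/two\">Two</a></body></html>"

-- Collect the href attribute of every opening <a> tag.
-- parseTags yields a flat list of tags rather than a tree, so
-- "traversal" here is ordinary list filtering.
linkTargets :: String -> [String]
linkTargets html =
  [ fromAttrib "href" tag
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]
```

Loading this in GHCi and evaluating `linkTargets sampleHtml` should give `["/one","/two"]`. Because the parser produces a flat list, anything that depends on nesting (for example, "links inside a particular div") has to be handled by slicing the list yourself, which is probably the awkwardness the question refers to.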
As for the other aspects of web scraping (such as crawling, spidering, and caching), I searched http://hackage.haskell.org/package/ for those keywords but didn’t find anything promising. I even skimmed through the packages mentioning “http”, but nothing jumped out at me.
Note: I’m not a regular Haskeller, so I hope others can chime in if I missed something.