Asked by Editorial Team on May 18, 2026


So I am writing a packet-sniffing app. Basically, I want it to sniff for TCP sessions, parse them to see whether they are HTTP, and, if they are and have the right content type, etc., save them as files on my hard drive.

To that end, I wanted it to be efficient. Since the current HTTP library is String-based, I will be dealing with large files, and I only really need to parse HTTP responses, I decided to roll my own parser in attoparsec.

When I finished the program and profiled it on a 9 meg HTTP response with a WAV file in it, I found it was allocating a gig of memory while trying to parse out the body of the response. When I look at HTTP.prof I see these lines:

httpBody              Main                                                 362           1   0.0    0.0    93.8   99.3

 take                 Data.Attoparsec.Internal                             366        1201   0.0    0.0    93.8   99.3
     takeWith            Data.Attoparsec.Internal                             367        3603   0.0    0.0    93.8   99.3
      demandInput        Data.Attoparsec.Internal                             375         293   0.0    0.0    93.8   99.2
       prompt            Data.Attoparsec.Internal                             378         293   0.0    0.0    93.8   99.2
        +++              Data.Attoparsec.Internal                             380         586  93.8   99.2    93.8   99.2

So as you can see, somewhere within httpBody, take is called 1201 times, causing 500+ (+++) concatenations of bytestrings, which accounts for an absurd amount of memory allocation.

Here’s the code. n is just the content length of the HTTP response, if there is one. If there isn’t one, it just tries to take everything.

I wanted it to return a lazy bytestring made of strict chunks of 1000 or so bytes each, but even if I change it to just take n and return a strict bytestring, it still has those allocations in it (and it uses 14 gig of memory).


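-- Parse the response body. n is the Content-Length, or 0 if there was none.
httpBody n = do
  x <- if n > 0
    then AC.take n                   -- read exactly n bytes
    else AC.takeWhile (const True)   -- no length given: consume all remaining input
  if B.null x
    then return Nothing
    else return (Just x)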
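
To see why the (+++) line can dominate a profile like the one above, here is a rough sketch of the cost. It assumes each append copies everything accumulated so far, which is how a strict `B.append` behaves. The names `appendEach`, `concatOnce`, `asLazy` and `bytesCopiedByAppendEach` are made up for illustration and are not part of attoparsec.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.List (foldl')

-- Append chunks one at a time; every step copies the accumulator again.
appendEach :: [B.ByteString] -> B.ByteString
appendEach = foldl' B.append B.empty

-- Collect the chunks first and copy them once into a single strict buffer.
concatOnce :: [B.ByteString] -> B.ByteString
concatOnce = B.concat

-- Wrap the chunks as a lazy ByteString; the chunk payloads are not copied.
asLazy :: [B.ByteString] -> BL.ByteString
asLazy = BL.fromChunks

-- Approximate bytes copied by appendEach for the given chunk sizes:
-- the sum of all running totals, so it grows quadratically in the chunk count.
bytesCopiedByAppendEach :: [Int] -> Int
bytesCopiedByAppendEach = sum . scanl1 (+)
```

All three produce the same bytes; only the amount of copying differs. Under the further assumption that the body arrived in 293 roughly equal chunks (one per prompt entry in the profile), `bytesCopiedByAppendEach (replicate 293 30717)` comes to about 1.3 GB for a 9 MB body, which is the same order as the allocation reported above.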

I was reading a blog by the guy who wrote combinatorrent, and he was having the same issue, but I never heard of a resolution. Has anyone run across this problem before or found a solution?

Edit: Okay, well I left this up the entire day and got nothing. After researching the problem, I don’t think there is a way to do it without adding a lazy bytestring accessor to attoparsec. I also looked at all the other parsing libraries, and they either lacked bytestring support or were missing other things.

So I found a workaround. If you think about an HTTP response, it goes status line and headers, a blank line, then the body. Since the body is last, and parsing returns both what you parsed and what remains of the bytestring, I can skip parsing the body inside attoparsec and instead pluck the body straight off the bytestring that is left.


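-- Assumed aliases (the imports are not shown): P = Prelude, ACL = Data.Attoparsec.Lazy,
-- BL = Data.ByteString.Lazy, BU = Data.ByteString.UTF8.

-- Parse every response in the stream; Nothing if none could be parsed.
parseHTTPs bs = if P.null results
  then Nothing
  else Just results
  where results = foldParse (bs, [])

-- Parse one response head, then hand the unconsumed input to addBody.
foldParse (bs, rs) = case ACL.parse httpResponse bs of
  ACL.Done rest r -> addBody (rest, rs) r
  _               -> rs   -- parse failure or end of input: return what we have

-- Slice the body off the remaining input instead of parsing it inside attoparsec.
addBody (rest, rs) http = foldParse (rest', rs')
  where
    contentlength = (read . BU.toString) (maybe "0" id (hdrContentLength (rspHeaders http)))
    rest' = BL.drop contentlength rest
    rs' = rs ++ [http { rspBody = body' }]
    body'
      | contentlength == 0  = Just rest   -- no Content-Length: take everything left
      | BL.length rest == 0 = Nothing
      | otherwise           = Just (BL.take contentlength rest)

-- Parse only the status line and headers; addBody fills in rspBody afterwards.
httpResponse = do
  (code, desc) <- statusLine
  hdrs <- many header
  endOfLine
--  body <- httpBody ((read . BU.toString) (maybe "0" id (hdrContentLength parsedHeaders)))

  return Response { rspCode = code, rspReason = desc, rspHeaders = parseHeaders hdrs, rspBody = undefined }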

It is a little messy, but it works fast and allocates no more than I wanted. Basically, you fold over the bytestring collecting HTTP data structures; between collections, I check the content length of the structure I just got, pull that many bytes from the remaining bytestring, and continue if there is any bytestring left.
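
For readers who want to try the idea without the rest of the project, here is a minimal self-contained sketch of the same fold. It is not the code above: it swaps the attoparsec header parser for a naive search for the blank line, and `Raw`, `splitAtBlankLine`, `contentLength` and `splitResponses` are hypothetical names introduced only for this example. The point it illustrates is the same, namely that bodies are taken as slices of the lazy input rather than parsed.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import Data.Char (toLower)
import Data.Int (Int64)
import Data.Maybe (listToMaybe)

-- One response: the raw header block and the body, both slices of the input.
data Raw = Raw { rawHead :: BL.ByteString, rawBody :: BL.ByteString }
  deriving (Eq, Show)

-- Split at the first CRLF CRLF; the separator itself is dropped.
splitAtBlankLine :: BL.ByteString -> Maybe (BL.ByteString, BL.ByteString)
splitAtBlankLine bs = go 0 bs
  where
    sep = BLC.pack "\r\n\r\n"
    go i rest
      | BL.null rest            = Nothing
      | sep `BL.isPrefixOf` rest = Just (BL.take i bs, BL.drop 4 rest)
      | otherwise               = go (i + 1) (BL.tail rest)

-- Look up Content-Length in the header block (case-insensitive name match).
contentLength :: BL.ByteString -> Maybe Int64
contentLength hdrs = listToMaybe
  [ n
  | line <- map BLC.unpack (BLC.lines hdrs)
  , let (name, rest) = break (== ':') line
  , map toLower name == "content-length"
  , (n, _) <- reads (drop 1 rest)
  ]

-- Fold over the stream: find a head, slice off its body, continue with the rest.
splitResponses :: BL.ByteString -> [Raw]
splitResponses bs = case splitAtBlankLine bs of
  Nothing -> []
  Just (hdrs, rest) -> case contentLength hdrs of
    Just n | n > 0 ->
      let (body, rest') = BL.splitAt n rest
      in Raw hdrs body : splitResponses rest'
    _ -> [Raw hdrs rest]   -- no usable length: everything left is the body
```

Building the result with `:` instead of `rs ++ [...]` also keeps the list construction linear, which may matter for streams with many responses.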

Edit: I actually finished up this project. Works like a charm. It isn’t cabalized properly, but if someone wants to view the entire source, you can find it at https://github.com/onmach/Audio-Sniffer.

1 Answer

  1. Editorial Team added an answer on May 18, 2026 at 11:35 am

    combinatorrent guy here 🙂

    If memory serves, the problem with attoparsec is that it demands input a little bit at a time, building up a lazy bytestring which is finally concatenated. My “solution” was to roll the input function myself. That is, I get the input stream for attoparsec from a network socket, and I know how many bytes to expect in a message. Basically, I split into two cases:

    • The message is small: Read up to 4k from the socket and eat that bytestring a little bit at a time (slices of bytestrings are fast, and we throw away the 4k chunk once it has been exhausted).

    • The message is “large” (large here means around 16 kilobytes in BitTorrent speak): We calculate how much of the message the 4k chunk we already have can fulfill, and then simply ask the underlying network socket to fill in the rest. We now have two bytestrings, the remaining part of the 4k chunk and the large chunk. Together they hold all the data, so we concatenate them and parse the result.

      You may be able to optimize the concatenation step away.

    The TL;DR version: I handle it outside attoparsec and handroll the loop to avoid the problem.

    The relevant combinatorrent commit is fc131fe24; see https://github.com/jlouis/combinatorrent/commit/fc131fe24207909dd980c674aae6aaba27b966d4 for the details.
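
The two cases described above can be sketched roughly as follows. This is an illustrative outline only, not the code from the combinatorrent commit: `Recv` is a hypothetical stand-in for a socket read, and `fill`, `readMessage` and `fakeRecv` are names invented for this example. It assumes the message length is known before the read, as in the answer.

```haskell
import qualified Data.ByteString as B
import Data.IORef

-- Hypothetical socket read: returns at most the requested number of bytes,
-- and an empty ByteString at end of stream.
type Recv = Int -> IO B.ByteString

chunkSize :: Int
chunkSize = 4096

-- Keep reading until at least `need` more bytes have arrived. Small needs
-- are rounded up to a 4k read; large needs are requested in one go.
fill :: Recv -> Int -> IO (Maybe [B.ByteString])
fill recv need
  | need <= 0 = return (Just [])
  | otherwise = do
      chunk <- recv (max need chunkSize)
      if B.null chunk
        then return Nothing   -- stream ended before the message was complete
        else fmap (chunk :) <$> fill recv (need - B.length chunk)

-- Read one message of n bytes, given the leftover from the previous read.
-- Returns the message and the new leftover, or Nothing on a short stream.
readMessage :: Recv -> Int -> B.ByteString -> IO (Maybe (B.ByteString, B.ByteString))
readMessage recv n leftover
  | B.length leftover >= n = return (Just (B.splitAt n leftover))   -- a slice, no copy
  | otherwise = do
      got <- fill recv (n - B.length leftover)
      return $ do
        chunks <- got
        Just (B.splitAt n (B.concat (leftover : chunks)))           -- one concatenation

-- In-memory Recv for experiments: serves at most `cap` bytes per call.
fakeRecv :: IORef B.ByteString -> Int -> Recv
fakeRecv ref cap k =
  atomicModifyIORef' ref (\s -> let (a, b) = B.splitAt (min k cap) s in (b, a))
```

Each message costs at most one `B.concat`, and the parser then sees a complete strict bytestring, so it never has to ask for more input.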
