Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6062617
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T09:04:21+00:00 2026-05-23T09:04:21+00:00

I have a little script to read in, parse and derive some kind of

  • 0

I have a little script to read in, parse and derive some kind of interesting (not really) statistics from an apache log file. So far I’ve made two simple options, the total number of bytes sent in all requests in the log file, and a top 10 of the most common IP adresses.

The first “mode” is just a simple sum of all the parsed bytes. The second one is a fold over a map (Data.Map), using insertWith (+) 1' to count the occurrences.

The first one runs as I expected, most of the time spent parsing, in constant space.

42,359,709,344 bytes allocated in
the heap
72,405,840 bytes copied during GC
113,712 bytes maximum residency (1553 sample(s))
145,872 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 76311 collections,
0 parallel, 0.89s, 0.99s elapsed
Generation 1: 1553 collections, 0
parallel, 0.21s, 0.22s elapsed

INIT time 0.00s ( 0.00s
elapsed) MUT time 21.76s (
24.82s elapsed) GC time 1.10s ( 1.20s elapsed) EXIT time
0.00s ( 0.00s elapsed) Total time 22.87s ( 26.02s elapsed)

%GC time 4.8% (4.6% elapsed)

Alloc rate 1,946,258,962 bytes
per MUT second

Productivity 95.2% of total user,
83.6% of total elapsed

However, the second one does not!

49,398,834,152 bytes allocated in
the heap
580,579,208 bytes copied during GC
718,385,088 bytes maximum residency (15 sample(s))
134,532,128 bytes maximum slop
1393 MB total memory in use (172 MB lost due to fragmentation)

Generation 0: 91275 collections,
0 parallel, 252.65s, 254.46s elapsed
Generation 1: 15 collections, 0
parallel, 0.12s, 0.12s elapsed

INIT time 0.00s ( 0.00s
elapsed) MUT time 41.11s (
48.87s elapsed) GC time 252.77s (254.58s elapsed) EXIT time
0.00s ( 0.01s elapsed) Total time 293.88s (303.45s elapsed)

%GC time 86.0% (83.9% elapsed)

Alloc rate 1,201,635,385 bytes
per MUT second

Productivity 14.0% of total user,
13.5% of total elapsed

And here is the code.

{-# LANGUAGE OverloadedStrings #-}

module Main where

import qualified Data.Attoparsec.Lazy as AL
import Data.Attoparsec.Char8 hiding (space, take)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Control.Monad (liftM)
import System.Environment (getArgs)
import Prelude hiding (takeWhile)
import qualified Data.Map as M
import Data.List (foldl', sortBy)
import Text.Printf (printf)
import Data.Maybe (fromMaybe)

type Command = String

data LogLine = LogLine {
    getIP     :: S.ByteString,
    getIdent  :: S.ByteString,
    getUser   :: S.ByteString,
    getDate   :: S.ByteString,
    getReq    :: S.ByteString,
    getStatus :: S.ByteString,
    getBytes  :: S.ByteString,
    getPath   :: S.ByteString,
    getUA     :: S.ByteString
} deriving (Ord, Show, Eq)

quote, lbrack, rbrack, space :: Parser Char
quote  = satisfy (== '\"')
lbrack = satisfy (== '[')
rbrack = satisfy (== ']')
space  = satisfy (== ' ')

quotedVal :: Parser S.ByteString
quotedVal = do
    quote
    res <- takeTill (== '\"')
    quote
    return res

bracketedVal :: Parser S.ByteString
bracketedVal = do
    lbrack
    res <- takeTill (== ']')
    rbrack
    return res

val :: Parser S.ByteString
val = takeTill (== ' ')

line :: Parser LogLine
l    ine = do
    ip <- val
    space
    identity <- val
    space
    user <- val
    space
    date <- bracketedVal
    space
    req <- quotedVal
    space
    status <- val
    space
    bytes <- val
    (path,ua) <- option ("","") combined
    return $ LogLine ip identity user date req status bytes path ua

combined :: Parser (S.ByteString,S.ByteString)
combined = do
    space
    path <- quotedVal
    space
    ua <- quotedVal
    return (path,ua)

countBytes :: [L.ByteString] -> Int
countBytes = foldl' count 0
    where
        count acc l = case AL.maybeResult $ AL.parse line l of
            Just x  -> (acc +) . maybe 0 fst . S.readInt . getBytes $ x
            Nothing -> acc

countIPs :: [L.ByteString] -> M.Map S.ByteString Int
countIPs = foldl' count M.empty
    where
        count acc l = case AL.maybeResult $ AL.parse line l of
            Just x -> M.insertWith' (+) (getIP x) 1 acc
            Nothing -> acc

---------------------------------------------------------------------------------

main :: IO ()
main = do
  [cmd,path] <- getArgs
  dispatch cmd path

pretty :: Show a => Int -> (a, Int) -> String
pretty i (bs, n) = printf "%d: %s, %d" i (show bs) n

dispatch :: Command -> FilePath -> IO ()
dispatch cmd path = action path
    where
        action = fromMaybe err (lookup cmd actions)
        err    = printf "Error: %s is not a valid command." cmd

actions :: [(Command, FilePath -> IO ())]
actions = [("bytes", countTotalBytes)
          ,("ips",  topListIP)]

countTotalBytes :: FilePath -> IO ()
countTotalBytes path = print . countBytes . L.lines =<< L.readFile path

topListIP :: FilePath -> IO ()
topListIP path = do
    f <- liftM L.lines $ L.readFile path
    let mostPopular (_,a) (_,b) = compare b a
        m = countIPs f
    mapM_ putStrLn . zipWith pretty [1..] . take 10 . sortBy mostPopular . M.toList $ m

Edit:

Adding +RTS -A16M reduced GC to 20%. Memory use of course unchanged.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T09:04:22+00:00Added an answer on May 23, 2026 at 9:04 am

    I suggest making the following changes to the code:

    @@ -1,4 +1,4 @@
    -{-# LANGUAGE OverloadedStrings #-}
    +{-# LANGUAGE BangPatterns, OverloadedStrings #-}
    
     module Main where
    
    @@ -9,7 +9,7 @@
     import Control.Monad (liftM)
     import System.Environment (getArgs)
     import Prelude hiding (takeWhile)
    -import qualified Data.Map as M
    +import qualified Data.HashMap.Strict as M
     import Data.List (foldl', sortBy)
     import Text.Printf (printf)
     import Data.Maybe (fromMaybe)
    @@ -17,15 +17,15 @@
     type Command = String
    
     data LogLine = LogLine {
    -    getIP     :: S.ByteString,
    -    getIdent  :: S.ByteString,
    -    getUser   :: S.ByteString,
    -    getDate   :: S.ByteString,
    -    getReq    :: S.ByteString,
    -    getStatus :: S.ByteString,
    -    getBytes  :: S.ByteString,
    -    getPath   :: S.ByteString,
    -    getUA     :: S.ByteString
    +    getIP     :: !S.ByteString,
    +    getIdent  :: !S.ByteString,
    +    getUser   :: !S.ByteString,
    +    getDate   :: !S.ByteString,
    +    getReq    :: !S.ByteString,
    +    getStatus :: !S.ByteString,
    +    getBytes  :: !S.ByteString,
    +    getPath   :: !S.ByteString,
    +    getUA     :: !S.ByteString
     } deriving (Ord, Show, Eq)
    
     quote, lbrack, rbrack, space :: Parser Char
    @@ -39,14 +39,14 @@
         quote
         res <- takeTill (== '\"')
         quote
    -    return res
    +    return $! res
    
     bracketedVal :: Parser S.ByteString
     bracketedVal = do
         lbrack
         res <- takeTill (== ']')
         rbrack
    -    return res
    +    return $! res
    
     val :: Parser S.ByteString
     val = takeTill (== ' ')
    @@ -67,14 +67,14 @@
         space
         bytes <- val
         (path,ua) <- option ("","") combined
    -    return $ LogLine ip identity user date req status bytes path ua
    +    return $! LogLine ip identity user date req status bytes path ua
    
     combined :: Parser (S.ByteString,S.ByteString)
     combined = do
         space
    -    path <- quotedVal
    +    !path <- quotedVal
         space
    -    ua <- quotedVal
    +    !ua <- quotedVal
         return (path,ua)
    
     countBytes :: [L.ByteString] -> Int
    @@ -84,11 +84,11 @@
                 Just x  -> (acc +) . maybe 0 fst . S.readInt . getBytes $ x
                 Nothing -> acc
    
    -countIPs :: [L.ByteString] -> M.Map S.ByteString Int
    +countIPs :: [L.ByteString] -> M.HashMap S.ByteString Int
     countIPs = foldl' count M.empty
         where
             count acc l = case AL.maybeResult $ AL.parse line l of
    -            Just x -> M.insertWith' (+) (getIP x) 1 acc
    +            Just x -> M.insertWith (+) (getIP x) 1 acc
                 Nothing -> acc
    
     ---------------------------------------------------------------------------------
    

    I made the fields of LogLine strict to avoid them containing thunks referring to expressions related to parsing. It’s good practice to make fields strict, unless you really need them to be lazy.

    I made sure that the parse result is created as soon as possible (that’s the $! part of the change), also to avoid delaying the parsing until you actually inspect the individual fields of LogLine.

    Finally I switched to a better data structure, HashMap from the unordered-containers package. Note that all functions in Data.HashMap.Strict are value strict, which means we can use the plain insertWith variant.

    Note that taking a sub-string of a ByteString forces the original string to be retained in memory, due to sharing the underlying storage (this is the same as for Java’s String). If you want to make sure that no extra memory is retained, use the copy function from the bytestring package. You can try to call copy on the result of (getIP x) and see if that makes any difference. The trade-off here is using some extra computation to copy the string in return for lower space usage.

    Note that using -A<high number> tend to improve performance of short running programs (i.e. benchmarks) but not necessarily on real programs. Same goes for -H. At least a higher -H value (e.g. 1G) doesn’t hurt the performance of your program.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a little Perl script (On Windows) that checks some files for me
I have a little python script that pulls emails from a POP mail address
I have this nifty little script that does a nice job of manipulating some
I'm developing a little script using ash shell (not bash). Now i have a
I have a little script here which uses DOMDocument to get my data from
I have written a little script which retrieves pictures and movies from my camera
I have this little script to toggle a contact form when a button is
I'm a jQuery newbie, and I have trouble with a little script I created.
I have a little problem with a simple vbScript. The script has to run
This is a little tricky so bear with me: I have a PHP script

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.