The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7547299
In Process

The Archive Base Latest Questions

Editorial Team
Asked: May 30, 2026

I need to scan through a document and accumulate the output of different functions

I need to scan through a document and accumulate the output of different functions for each string in the file. The function run on any given line of the file depends on what is in that line.

I could do this very inefficiently by making a complete pass through the file for every list I wanted to collect. Example pseudo-code:

at :: B.ByteString -> Maybe Atom
at line
    | line == ATOM record = do stuff to return Just Atom
    | otherwise = Nothing

ot :: B.ByteString -> Maybe Sheet
ot line
    | line == SHEET record = do other stuff to return Just Sheet
    | otherwise = Nothing

Then, I would map each of these functions over the entire list of lines in the file to get a complete list of Atoms and Sheets:

mapper :: [B.ByteString] -> IO ()
mapper lines = do
    let atoms = mapMaybe at lines
    let sheets = mapMaybe ot lines
    -- Do stuff with my atoms and sheets

However, this is inefficient because I am mapping over the entire list of strings once for every list I am trying to create. Instead, I want to traverse the list of lines only once, identify each line as I go, apply the appropriate function, and store the results in different lists.

My C mentality wants to do this (pseudo code):

mapper' :: [B.ByteString] -> IO ()
mapper' lines = do
    let atoms = []
    let sheets = []
    for line in lines:
        | line == ATOM record = (atoms = atoms ++ at line)
        | line == SHEET record = (sheets = sheets ++ ot line)
    -- Now 'atoms' is a complete list of all the ATOM records
    --  and 'sheets' is a complete list of all the SHEET records

What is the Haskell way of doing this? I simply can’t get my functional-programming mindset to come up with a solution.

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 9:25 am

    First of all, I think that the answers others have supplied will work at least 95% of the time. It’s always good practice to code for the problem at hand by using appropriate data types (or tuples in some cases). However, sometimes you really don’t know in advance what you’re looking for in the list, and in these cases trying to enumerate all possibilities is difficult/time-consuming/error-prone. Or, you’re writing multiple variants of the same sort of thing (manually inlining multiple folds into one) and you’d like to capture the abstraction.
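For reference, a hand-rolled single pass of the kind described above can be sketched as one fold that threads a tuple of lists. The `Atom` and `Sheet` wrappers and the prefix tests below are assumptions standing in for the question's pseudo-code, not the asker's real definitions.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-ins for the asker's record types.
newtype Atom  = Atom  B.ByteString deriving (Eq, Show)
newtype Sheet = Sheet B.ByteString deriving (Eq, Show)

-- Assumed classification rule: a record is identified by its prefix.
at :: B.ByteString -> Maybe Atom
at line
    | B.pack "ATOM" `B.isPrefixOf` line = Just (Atom line)
    | otherwise                         = Nothing

ot :: B.ByteString -> Maybe Sheet
ot line
    | B.pack "SHEET" `B.isPrefixOf` line = Just (Sheet line)
    | otherwise                          = Nothing

-- One right fold over the lines; each line is consed onto at most one list,
-- so input order is preserved. The lazy pattern (~) keeps the fold lazy in
-- its accumulator.
classify :: [B.ByteString] -> ([Atom], [Sheet])
classify = foldr step ([], [])
  where
    step line ~(atoms, sheets) =
        case (at line, ot line) of
            (Just a, _) -> (a : atoms, sheets)
            (_, Just s) -> (atoms, s : sheets)
            _           -> (atoms, sheets)
```

Every additional record type means another tuple slot and another case branch, which is where this approach starts to become tedious.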

    Fortunately, there are a few techniques that can help.

    The framework solution

    (somewhat self-evangelizing)

    First, the various “iteratee/enumerator” packages often provide functions to deal with this sort of problem. I’m most familiar with iteratee, which would let you do the following:

    import Data.Iteratee as I
    import Data.Iteratee.Char
    import Data.Maybe
    import qualified Data.ByteString as B
    
    -- first, you'll need some way to process the Atoms/Sheets/etc. you're getting
    -- if you want to just return them as a list, you can use the built-in
    -- stream2list function
    
    -- next, create stream transformers
    -- given at :: B.ByteString -> Maybe Atom
    -- create a stream transformer from ByteString lines to Atoms
    atIter :: Monad m => Enumeratee [B.ByteString] [Atom] m a
    atIter = I.mapChunks (catMaybes . map at)
    
    otIter :: Monad m => Enumeratee [B.ByteString] [Sheet] m a
    otIter = I.mapChunks (catMaybes . map ot)
    
    -- finally, combine multiple processors into one
    -- if you have more than one processor, you can use zip3, zip4, etc.
    procFile :: Monad m => Iteratee [B.ByteString] m ([Atom],[Sheet])
    procFile = I.zip (atIter =$ stream2list) (otIter =$ stream2list)
    
    -- and run it on some data
    runner :: FilePath -> IO ([Atom],[Sheet])
    runner filename = do
      resultIter <- enumFile defaultBufSize filename $= enumLinesBS $ procFile
      run resultIter
    

    One benefit this gives you is extra composability. You can create transformers as you like, and just combine them with zip. You can even run the consumers in parallel if you like (although only if you’re working in the IO monad, and probably not worth it unless the consumers do a lot of work) by changing to this:

    import Data.Iteratee.Parallel
    
    parProcFile = I.zip (parI $ atIter =$ stream2list) (parI $ otIter =$ stream2list)
    

    The result of doing so isn’t the same as a single for-loop – this will still perform multiple traversals of the data. However, the traversal pattern has changed. This will load a certain amount of data at once (defaultBufSize bytes) and traverse that chunk multiple times, storing partial results as necessary. After a chunk has been entirely consumed, the next chunk is loaded and the old one can be garbage collected.

    Hopefully this will demonstrate the difference:

    Data.List.zip:
      x1 x2 x3 .. x_n
                       x1 x2 x3 .. x_n
    
    Data.Iteratee.zip:
      x1 x2      x3 x4      x_n-1 x_n
           x1 x2      x3 x4           x_n-1 x_n
    

    If you’re doing enough work that parallelism makes sense, this isn’t a problem at all. Thanks to memory locality, the performance is also much better than with multiple traversals over the entire input, as Data.List.zip would make.

    The beautiful solution

    If a single-traversal solution really does make the most sense, you might be interested in Max Rabkin’s Beautiful Folding post, and Conal Elliott’s followup work (this too). The essential idea is that you can create data structures to represent folds and zips, and combining these lets you create a new, combined fold/zip function that only needs one traversal. It’s maybe a little advanced for a Haskell beginner, but since you’re thinking about the problem you may find it interesting or useful. Max’s post is probably the best starting point.
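A minimal sketch of that idea, written from the general description above rather than copied from either post: a fold is stored as data (step function, initial state, final projection), and an `Applicative` instance pairs the states of two folds so that both advance in the same traversal. The names `Fold`, `Pair`, `runFold`, `collect` and `evensAndSum` are illustrative choices, not an existing library's API.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A fold packaged as data: step function, initial state, final projection.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Strict pair, so the combined state does not accumulate thunks.
data Pair x y = Pair !x !y

instance Functor (Fold a) where
    fmap f (Fold step begin done) = Fold step begin (f . done)

-- Combining two folds pairs their states; both see each element once.
instance Applicative (Fold a) where
    pure x = Fold (\() _ -> ()) () (const x)
    Fold stepF beginF doneF <*> Fold stepX beginX doneX =
        Fold step (Pair beginF beginX) done
      where
        step (Pair f x) a = Pair (stepF f a) (stepX x a)
        done (Pair f x)   = doneF f (doneX x)

-- Run a (possibly combined) fold with a single strict left fold.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) = done . foldl' step begin

-- Collect the results of a Maybe-returning classifier, in input order.
-- The state is a difference list, so appending on the right stays cheap.
collect :: (a -> Maybe b) -> Fold a [b]
collect f = Fold step id ($ [])
  where
    step acc x = maybe acc (\y -> acc . (y :)) (f x)

-- Example: gather the even numbers and the total of all numbers in one pass.
evensAndSum :: [Int] -> ([Int], Int)
evensAndSum = runFold ((,) <$> collect keepEven <*> Fold (+) 0 id)
  where
    keepEven x = if even x then Just x else Nothing
```

With the question's functions in scope, `runFold ((,) <$> collect at <*> collect ot)` would produce both lists from one pass over the lines, and a third collector is just one more `<*>`.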
