Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8109999
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 6, 20262026-06-06T01:35:48+00:00 2026-06-06T01:35:48+00:00

I run a pretty typical producer/consumer model on different tasks. Task1: Reads batches of

  • 0

I run a pretty typical producer/consumer model on different tasks.

Task1: Reads batches of byte[] from binary files and kicks off a new task for each collection of byte arrays. (the operation is batched for memory management purposes).

Task 2-n: Those are worker tasks and each operates on the passed-in collection (from Tasks1) of byte arrays and de-serializes byte arrays, sorts them by certain criteria, and then stores a collection of resulting objects (each byte array deserializes into such object) in a Concurrent Dictionary.

Task (n+1) I chose a concurrent dictionary because the job of this task is to merge those collections that are stored in the concurrent dictionary in the same order than how they originated from Task1. I achieve that by passing a collectionID (it is of type int and incremented for each new collection within Task1) all the way down from Task1 to this task. This task basically checks whether the next expected collectionID is already stored in the concurrent dictionary and if yes, takes it out, adds it to a Final Queue and checks for the next collection in the concurrent dictionary.

Now, from what I have read and the videos I watched it seems to me that TPL Dataflow may be the perfect candidate for such producer/consumer model. I just do not seem to be able to devise a design and thus get started because I have never worked with TPL Dataflow. In terms of throughput and latency is this library even up to the task? I currently process 2.5 million byte arrays and thus objects per second in the resulting collections. Can TPL Dataflow help to simplify? I am especially interested in the answer to the following question: Can TPL Dataflow preserve the order of collection batches from Task1 when spawning off worker tasks and re-merging them once the worker tasks have done their work? Does it optimize things? Having profiled the whole structure I feel there is quite some time wasted due to spinning and too many concurrent collections involved.

Any ideas, thoughts?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-06T01:35:50+00:00Added an answer on June 6, 2026 at 1:35 am

    EDIT: Turns out I was very wrong. TransformBlock does return items in the same order they came in, even if it is configured for parallelism. Because of that, the code in my original answer is completely useless and normal TransformBlock can be used instead.


    Original answer:

    As far as I know only one parallelism construct in .Net supports returning processed items in the order they came in: PLINQ with AsOrdered(). But it seems to me that PLINQ doesn’t fit what you want well.

    TPL Dataflow, on the other hand, fits well, I think, but it doesn’t have a block that would support parallelism and returning items in order at the same time (TransformBlock supports both of them, but not at the same time). Fortunately, Dataflow blocks were designed with composability in mind, so we can build our own block that does that.

    But first, we have to figure out how to order the results. Using a concurrent dictionary, like you suggested, along with some synchronization mechanism, would certainly work. But I think there is a simpler solution: use a queue of Tasks. In the output task, you dequeue a Task, wait for it to complete (asynchronously) and when it does, you send its result along. We still need some synchronization for the case when the queue is empty, but we can get that for free if we choose which queue to use cleverly.

    So, the general idea is like this: what we’re writing will be an IPropagatorBlock, with some input and some output. The easiest way to create a custom IPropagatorBlock is to create one block that processes the input, another block that produces the results and treat them as one using DataflowBlock.Encapsulate().

    The input block will have to process the incoming items in the correct order, so no parallelization there. It will create a new Task (actually, a TaskCompletionSource, so that we can set the result of the Task later), add it to the queue and then send the item for processing, along with some way to set the result of the correct Task. Because we don’t need to link this block to anything, we can use an ActionBlock.

    The output block will have to take Tasks from the queue, asynchronously wait for them, and then send them along. But since all blocks have a queue embedded in them, and blocks that take delegates have asynchronous waiting built-in, this will be very simple: new TransformBlock<Task<TOutput>, TOutput>(t => t). This block will work both as the queue and as the output block. Because of this, we don’t have to deal with any synchronization.

    The last piece of the puzzle is actually processing the items in parallel. For this, we can use another ActionBlock, this time with MaxDegreeOfParallelism set. It will take the input, process it, and set the result of the correct Task in the queue.

    Put together, it could look like this:

    public static IPropagatorBlock<TInput, TOutput>
        CreateConcurrentOrderedTransformBlock<TInput, TOutput>(
        Func<TInput, TOutput> transform)
    {
        var queue = new TransformBlock<Task<TOutput>, TOutput>(t => t);
    
        var processor = new ActionBlock<Tuple<TInput, Action<TOutput>>>(
            tuple => tuple.Item2(transform(tuple.Item1)),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });
    
        var enqueuer = new ActionBlock<TInput>(
            async item =>
            {
                var tcs = new TaskCompletionSource<TOutput>();
                await processor.SendAsync(
                    new Tuple<TInput, Action<TOutput>>(item, tcs.SetResult));
                await queue.SendAsync(tcs.Task);
            });
    
        enqueuer.Completion.ContinueWith(
            _ =>
            {
                queue.Complete();
                processor.Complete();
            });
    
        return DataflowBlock.Encapsulate(enqueuer, queue);
    }
    

    After so much talk, that’s quite a small amount of code, I think.

    It seems you care about performance a lot, so you might need to fine tune this code. For example, it might make sense to set MaxDegreeOfParallelism of the processor block to something like Environment.ProcessorCount, to avoid oversubscription. Also, if latency is more important than throughput to you, it might make sense to set MaxMessagesPerTask of the same block to 1 (or another small number) so that when processing of an item is finished, it’s sent to the output immediately.

    Also, if you want to throttle incoming items, you could set BoundedCapacity of enqueuer.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I run the following command to link the different files in my project. /opt/gcc-4.7-cilkplus/bin/gcc
So the problem I've run into is pretty basic, but I can't seem to
The title is pretty self-explanatory. When I run my app on Netbeans (Tomcat 7),
I have a pretty complicated index.php now, and I would like to only run
To run the spec files & assets as coffee script, I decided to go
Basically, I've created a 5-star rating system. Pretty typical. It shows how many stars
I have a PHP script that needs to run some pretty hefty imagemagick commands.
I'm very new to jQuery and trying to run a pretty simple jQueryUI dialog
Pretty simple: When I run startup.sh or shutdown.sh information is logged into catalina.out When
I have a pretty run-of-the-mill website: header, body and footer. The header and body

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.