The Archive Base

Editorial Team
Asked: May 23, 2026

I’m studying multicore parallelism in F#. I have to admit that immutability really helps


I’m studying multicore parallelism in F#. I have to admit that immutability really helps in writing correct parallel implementations. However, it’s hard to achieve good speedup and good scalability as the number of cores grows. For example, in my experience with the Quick Sort algorithm, many attempts to implement parallel Quick Sort in a purely functional way, using List or Array as the representation, have failed. Profiling those implementations shows that the number of cache misses increases significantly compared to the sequential versions. However, if one implements parallel Quick Sort using in-place mutation of arrays, a good speedup can be obtained. Therefore, I think mutation might be a good practice for optimizing multicore parallelism.

I believe that cache locality is a big obstacle for multicore parallelism in a functional language. Functional programming involves creating many short-lived objects; the destruction of those objects may destroy the coherence property of CPU caches. I have seen many suggestions on how to improve cache locality in imperative languages, for example, here and here. But it’s not clear to me how they would be applied in functional programming, especially with recursive data structures such as trees, which appear quite often.

Are there any techniques to improve cache locality in an impure functional language (specifically F#)? Any advice or code examples are more than welcome.


1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 2:49 am

    As far as I can make out, the key to cache locality (multithreaded or otherwise) is:

    • Keep work units in a contiguous block of RAM that will fit into the cache

    To this end:

    • Avoid objects where possible
      • Objects are allocated on the heap, and might be sprayed all over the place, depending on heap fragmentation, etc.
      • You have essentially zero control over the memory placement of objects, to the extent that the GC might move them at any time.
    • Use arrays. An array is laid out as a single contiguous block of memory.
      • Other collection datatypes might distribute things all over the place – linked lists, for example, are composed of pointers.
    • Use arrays of primitive types. Object types are allocated on the heap, so an array of objects is just an array of pointers to objects that may be distributed all over the heap.
    • Use arrays of structs, if you can’t use primitives. Structs are value types in .NET: their fields are laid out together, and an array of structs stores the values inline rather than as pointers to separate heap objects.
    • Work out the size of the cache on the machine your code will be running on
      • Different CPUs have L2 caches of different sizes
      • It might be prudent to design your code to scale with different cache sizes
      • Or more simply, write code whose working set fits inside the smallest cache size your code will be running on
    • Work out what needs to sit close to each datum
      • In practice, you’re not going to fit your whole working set into the L2 cache
      • Examine (or redesign) your algorithms so that the data structures you are using hold data that’s needed “next” close to data that was previously needed.
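
    The points above about arrays of structs and cache-sized work units can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the type names `PointS` and `PointC`, the functions `sumStructs` and `sumBlocked`, and the `assumedCacheBytes` constant are all hypothetical, and the right block size depends on the actual machine.

```fsharp
// Value type: an array of PointS stores the X and Y fields inline, back to back.
[<Struct>]
type PointS = { X: float; Y: float }

// Reference type, shown for contrast only: an array of PointC stores
// pointers to objects that may live anywhere on the heap.
type PointC = { Cx: float; Cy: float }

let n = 1000000
let structs : PointS[] = Array.init n (fun i -> { X = float i; Y = 1.0 })

// Sequential pass over contiguous memory.
let sumStructs (ps: PointS[]) =
    let mutable acc = 0.0
    for i in 0 .. ps.Length - 1 do
        acc <- acc + ps.[i].X + ps.[i].Y
    acc

// ASSUMPTION: a per-work-unit cache budget in bytes; measure or look this up
// for the target CPU rather than trusting this number.
let assumedCacheBytes = 256 * 1024
let blockLength = max 1 (assumedCacheBytes / sizeof<PointS>)

// Each parallel work unit touches one contiguous block sized to the budget.
let sumBlocked (ps: PointS[]) =
    let blocks = (ps.Length + blockLength - 1) / blockLength
    Array.Parallel.init blocks (fun b ->
        let lo = b * blockLength
        let hi = min ps.Length (lo + blockLength)
        let mutable acc = 0.0
        for i in lo .. hi - 1 do
            acc <- acc + ps.[i].X + ps.[i].Y
        acc)
    |> Array.sum
```

    Calling `sumBlocked structs` gives each task its own contiguous slice of the array, so tasks do not interleave accesses to the same cache lines; whether this beats the plain loop for a given workload has to be measured.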

    In practice this means that you may end up using data structures that are not theoretically perfect examples of computer science – but that’s all right, computers aren’t theoretically perfect examples of computer science either.
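
    For the recursive structures mentioned in the question, one such "imperfect" representation is a tree flattened into arrays, with child links stored as indices instead of pointers. The sketch below assumes a binary search tree of `int` keys; `FlatTree`, `flatten` and `contains` are names made up for this example.

```fsharp
// Textbook immutable tree: every Node is a separate heap object.
type Tree =
    | Leaf
    | Node of Tree * int * Tree

// Flattened form: node i keeps its value and the indices of its children
// (-1 means "no child"). Each of the three arrays is contiguous.
type FlatTree =
    { Values: int[]
      Left: int[]
      Right: int[] }

let rec size t =
    match t with
    | Leaf -> 0
    | Node (l, _, r) -> size l + 1 + size r

// Copies the tree into arrays in pre-order, so a node's left subtree
// sits immediately after the node itself.
let flatten (t: Tree) : FlatTree =
    let n = size t
    let values = Array.zeroCreate<int> n
    let left = Array.create n (-1)
    let right = Array.create n (-1)
    let next = ref 0   // ref cell, because closures cannot capture 'let mutable'
    let rec go t =
        match t with
        | Leaf -> -1
        | Node (l, v, r) ->
            let i = next.Value
            next.Value <- i + 1
            values.[i] <- v
            left.[i] <- go l
            right.[i] <- go r
            i
    go t |> ignore
    { Values = values; Left = left; Right = right }

// Binary-search-tree lookup that follows indices instead of pointers.
let contains (ft: FlatTree) (key: int) =
    let mutable i = if ft.Values.Length = 0 then -1 else 0
    let mutable found = false
    while i >= 0 && not found do
        let v = ft.Values.[i]
        if key = v then found <- true
        elif key < v then i <- ft.Left.[i]
        else i <- ft.Right.[i]
    found
```

    The immutable `Tree` can still be used to build and reason about the data; the flat copy is only for the hot, read-heavy part of the program. Whole-tree operations such as `Array.sum ft.Values` then become linear scans over contiguous memory.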

    A good academic paper on the subject is “Cache-Efficient String Sorting Using Copying”.
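
    Returning to the Quick Sort example from the question, here is a rough sketch of the mutation-based approach: partition one array in place, recurse on the two halves in parallel, and hand small subranges to a sequential sort so each stays within one core's cache. It is not the algorithm from the paper above. `sequentialCutoff` is an assumed tuning value and `partition`, `sortRange` and `parallelQuickSort` are illustrative names.

```fsharp
open System
open System.Threading.Tasks

// ASSUMPTION: subranges of this length or shorter are sorted sequentially.
// Tune it against the cache size of the target machine.
let sequentialCutoff = 4096

let swap (a: int[]) i j =
    let t = a.[i]
    a.[i] <- a.[j]
    a.[j] <- t

// Lomuto partition of a.[lo..hi] (inclusive) around the middle element.
// Returns the pivot's final index.
let partition (a: int[]) lo hi =
    let mid = lo + (hi - lo) / 2
    swap a mid hi
    let pivot = a.[hi]
    let mutable store = lo
    for i in lo .. hi - 1 do
        if a.[i] < pivot then
            swap a i store
            store <- store + 1
    swap a store hi
    store

// Sorts a.[lo..hi] in place. The two halves are disjoint index ranges of the
// same array, so the parallel tasks never write to the same element.
// Caveat: inputs with many equal keys partition badly with this scheme.
let rec sortRange (a: int[]) lo hi =
    let len = hi - lo + 1
    if len > sequentialCutoff then
        let p = partition a lo hi
        Parallel.Invoke(
            Action(fun () -> sortRange a lo (p - 1)),
            Action(fun () -> sortRange a (p + 1) hi))
    elif len > 1 then
        System.Array.Sort(a, lo, len)

let parallelQuickSort (a: int[]) = sortRange a 0 (a.Length - 1)
```

    The mutation is confined to one function, so callers can still treat `parallelQuickSort` as a black box applied to a private copy of the data (for example `let sorted = Array.copy input` followed by `parallelQuickSort sorted`).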
