Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8774177
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 20262026-06-13T18:29:40+00:00 2026-06-13T18:29:40+00:00

Debugging my program on big counts of kernels, I faced with very strange error

  • 0

Debugging my program on big counts of kernels, I faced with very strange error of insufficient virtual memory. My investigations lead to peace of code, where master sends small messages to each slave. Then I wrote small program, where 1 master simply send 10 integers with MPI_SEND and all slaves receives it with MPI_RECV. Comparison of files /proc/self/status before and after MPI_SEND showed, that difference between memory sizes is huge! The most interesting thing (which crashes my program), is that this memory won’t deallocate after MPI_Send and still take huge space.

Any ideas?

 System memory usage before MPI_Send, rank: 0
Name:   test_send_size                                                                                
State:  R (running)                                                                                  
Pid:    7825                                                                                           
Groups: 2840                                                                                        
VmPeak:   251400 kB                                                                                 
VmSize:   186628 kB                                                                                 
VmLck:        72 kB                                                                                  
VmHWM:      4068 kB                                                                                  
VmRSS:      4068 kB                                                                                  
VmData:    71076 kB                                                                                 
VmStk:        92 kB                                                                                  
VmExe:       604 kB                                                                                  
VmLib:      6588 kB                                                                                  
VmPTE:       148 kB                                                                                  
VmSwap:        0 kB                                                                                 
Threads:    3                                                                                          

 System memory usage after MPI_Send, rank 0
Name:   test_send_size                                                                                
State:  R (running)                                                                                  
Pid:    7825                                                                                           
Groups: 2840                                                                                        
VmPeak:   456880 kB                                                                                 
VmSize:   456872 kB                                                                                 
VmLck:    257884 kB                                                                                  
VmHWM:    274612 kB                                                                                  
VmRSS:    274612 kB                                                                                  
VmData:   341320 kB                                                                                 
VmStk:        92 kB                                                                                  
VmExe:       604 kB                                                                                  
VmLib:      6588 kB                                                                                  
VmPTE:       676 kB                                                                                  
VmSwap:        0 kB                                                                                 
Threads:    3        
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-13T18:29:41+00:00Added an answer on June 13, 2026 at 6:29 pm

    This is an expected behaviour from almost any MPI implementation that runs over InfiniBand. The IB RDMA mechanisms require that data buffers should be registered, i.e. they are first locked into a fixed position in the physical memory and then the driver tells the InfiniBand HCA how to map virtual addresses to physical memory. It is very complex and hence very slow process to register memory for usage by the IB HCA and that’s why most MPI implementations never unregister memory that was once registered in hope that the same memory would later be used as a source or data target again. If the registered memory was heap memory, it is never returned back to the operating system and that’s why your data segment only grows in size.

    Reuse send and receive buffers as much as possible. Keep in mind that communication over InfiniBand incurrs high memory overhead. Most people don’t really think about this and it is usually poorly documented, but InfiniBand uses a lot of special data structures (queues) which are allocated in the memory of the process and those queues grow significantly with the number of processes. In some fully connected cases the amount of queue memory can be so large that no memory is actually left for the application.

    There are certain parameters that control IB queues used by Intel MPI. The most important in your case is I_MPI_DAPL_BUFFER_NUM which controls the amount of preallocated and preregistered memory. It’s default value is 16, so you might want to decrease it. Be aware of possible performance implications though. You can also try to use dynamic preallocated buffer sizes by setting I_MPI_DAPL_BUFFER_ENLARGEMENT to 1. With this option enabled, Intel MPI would initially register small buffers and will later grow them if needed. Note also that IMPI opens connections lazily and that’s why you see the huge increase in used memory only after the call to MPI_Send.

    If not using the DAPL transport, e.g. using the ofa transport instead, there is not much that you can do. You can enable XRC queues by setting I_MPI_OFA_USE_XRC to 1. This should somehow decrease the memory used. Also enabling dynamic queue pairs creation by setting I_MPI_OFA_DYNAMIC_QPS to 1 might decrease memory usage if the communication graph of your program is not fully connected (a fully connected program is one in which each rank talks to all other ranks).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a problem with debugging sessions. My program executes very well in a
I'm debugging a program I wrote and noticed something strange. I set up an
I am debugging a program that fails during a low memory situation and would
I've been debugging my program, because it throws Out Of Memory exceptions. After stripping
In debugging my program with Valgrind, I have discovered a memory leak despite what
I have been debugging this program to find the error but couldn't succeed in
I've created a very basic 'debugging' program that checks if a c source file
I've created a program with some very big .cs files. So i tried to
I have a debugging program which I've written to attach to a process and
Is there a way how to stop debugging when program hits the breakpoint (i.e.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.