Asked by Editorial Team on May 16, 2026


I did some reading on cache-miss optimization and came across this stdlib function, posix_memalign(). It does some kind of memory alignment for optimization, but can anyone help me understand what this function really does? It takes three arguments: void **memptr, size_t alignment, size_t size.

The part that I don’t get is what the documentation means by:

“allocate size bytes aligned on a boundary specified by alignment…”

What I understood from the reading is that the function allocates a block of memory of size bytes, but after that I don’t get what they mean by “boundary”… Is the memory block being divided into smaller chunks of size alignment?

Here is the documentation: http://www.opengroup.org/onlinepubs/9699919799/functions/posix_memalign.html


1 Answer

  1. Answered by Editorial Team on May 16, 2026 at 12:52 am

    I did some reading on cache-miss optimization and came across this stdlib function. It does some kind of memory alignment for optimization, but can anyone help me understand what this function really does?

    The main purpose of the function is to allocate a buffer aligned to the page size. That is rarely done for performance; usually it is because a device driver or direct hardware access requires a suitably aligned buffer.

    The lion’s share of performance vs. memory alignment problems is already solved by the compiler itself. E.g. all basic types (char, short, int, long) are already placed in memory (or inside a struct) at their natural alignment: on common platforms, the address of a variable (or struct field) is divisible by the size of the variable. Padding is used to achieve that. (E.g. in char a; int b; sizeof(int) - sizeof(char) bytes would be added after a to make sure that b’s address is aligned on sizeof(b).)

    I don’t get what they mean by “boundary”… Is the memory block being divided into smaller chunks of size alignment?

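    Hardware devices (especially non-PCI ones) often see memory as blocks of N bytes and can only access N bytes at a time. “Boundary” in this context means the start of such a block, as in “block boundary”: an address lies on an N-byte boundary when it is a multiple of N. So the memory block is not divided into chunks; posix_memalign() only guarantees that the start address of the size-byte block is a multiple of alignment.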

    Now, reluctantly, I will mention the impact of alignment on performance. Remember, premature optimization is the root of all evil. The tricks are highly platform- and CPU-specific and thus generally should not be used:

    • Page size alignment is desirable in some cases when you want to improve the locality of your data. To translate virtual addresses to physical RAM locations, CPUs maintain caches (the TLB). The fewer pages the code accesses, the less stress it puts on the CPU. (Most OSs already try to optimize the page layout of applications to minimize the overhead of virtual-to-physical address translation.) If you know that your very frequently accessed structure fits in a single page, then it might be advisable to put it into page-aligned storage to guarantee that it is contained within a single page. malloc() doesn’t provide that guarantee and might place the structure so that it starts on one page and ends on another (crossing the page boundary), thus occupying two TLB entries instead of the desired single entry. (The page size can be queried at run time, e.g. with sysconf(_SC_PAGESIZE) on POSIX systems.)

    • Cache line alignment. Though an application can address memory byte by byte, the CPU actually accesses physical RAM only in blocks, generally referred to as “cache lines”. A cache line is the smallest unit transferred between RAM and the CPU cache. By aligning a structure on the cache line size, one aims to minimize the cache footprint and cache misses of the code. The cache line size depends on the CPU and its memory controller: 16 bytes on older designs, 32 or 64 bytes when the memory controller has a wider data bus and accesses several memory modules in parallel; 64 bytes is typical of current x86 CPUs. The same logic as for page alignment applies here: if you put structure fields that are often accessed as a group together, aligned on the cache line size, you can reduce the cache footprint of the data drastically. The simplest example would be a std::map< struct aaa *, void * > whose comparator compares fields of struct aaa. If struct aaa contains lots of fields, then to minimize the cache footprint one would put all fields used for comparison (the key fields) at the beginning of the struct. If the key fields are spread over the structure, a comparison would in the worst case touch one cache line per key field. If the key fields are grouped together at the beginning of the struct, a comparison would likely touch far fewer cache lines. The fewer cache lines the data needs, the more cache is left for the rest of the application. The cache line size is generally not exposed to applications, though it can be found out using various tricks.

    I glossed over a lot of little details to keep this relatively short. If you want to know more, reading a CPU manual is advised; e.g. Intel’s developer’s manuals are rather good.

Related Questions

I was reading the wikipedia on the CPU cache here: http://en.wikipedia.org/wiki/CPU_cache#Replacement_Policies Marking some memory
I did some reading on this, and from questions similar to mine, it looks
While I was doing some reading on system calls, I did a search for
Did some googling and couldn't find a clear answer on this. My assumption is
So I've never done any assembly programming (although I did some reading/reasoning out the
I did some fairly thorough reading and searching through SO and didn't find anything
I did some reading on IDE`s (I am currently using Code::Blocks) and everyone appears
I am relatively new to java and I did some reading about private and
I know this has been asked a million times and I did do my
I did some reading on the API for File IO and read the following

© 2021 The Archive Base. All Rights Reserved