In almost all books and articles I have read about about HIGHMEM in Linux kernel, they say while using 3:1 split, not all of the 1GB is available to the kernel for mapping. And normally its 896MB or so, with the rest being used for kernel data structures, memory maps, page tables and such.
My question is, what exactly are these data structures? Page tables are normally accessed via a page table address register, right? And the base address of page table is normally stored as a physical address. Now why do one need to reserve a virtual address space for the entire table?
Similarly, I read about kernel code itself occupying space. What does that have to do with virtual address space? Is it not the physical memory that would be consumed for storing the code?
And finally, these data structures why do they have to reserve the 128MB space? Why can’t they be used out of the entire 1GB address space, as required, like any other normal data structure in kernel would do?
I have gone through LDD3, Professional Linux Kernel Architecture and several posts here at stack-overflow (like: Why Linux Kernel ZONE_NORMAL is limited to 896 MB?) and an older LWN article, but found no specific information on the same.
With regards to page tables, it’s true that the MMU wouldn’t care if the page tables themselves weren’t mapped in the virtual address space – for the purposes of address translations, that would be OK. But when the kernel needs to modify the page tables, they do need to be mapped in the virtual address space – and the kernel can’t just map them in “just in time”, because it needs to modify the page tables themselves to do that. It’s a chicken-and-egg problem, which means that the page tables need to remain mapped at all times.
A similar problem exists with the kernel code. For the code to execute, it must be mapped in the virtual address space – and if the code that does the page table modification were itself not present, we’d have a similar chicken-and-egg problem. Given this, it’s easier to leave the entireity of the kernel code mapped all the time, along with the kernel-mode stacks and any kernel data structures access by code where you wouldn’t want to potentially take a page fault. One large example of such data structures is the array of
struct pagestructures, representing each physical memory page.