Back to the OS class: Memory Allocation

A lot of us in India learn OS concepts from textbooks. For most of us, including me, the concepts go directly from the textbooks into examination papers. We would grab on to key phrases like "semaphores", "paging", "segmentation", "stack" and so on, and never really stop to wonder how all of this is *really* implemented. Some of us aren't interested, since all we care about is getting a well-paying job, while others find goofing around a more attractive alternative. Either way, very few really get it.

Four years since I finished college, six since I last took an OS class, and I can say now that I finally got it. Somewhat.

Recently I hit upon a very interesting case, which got me wondering how memory is managed by the workhorse of dynamic allocation, the malloc() function. On Linux, malloc is implemented in the glibc library (implementations exist for other Unix systems too). Its major responsibility (along with its friends free, mallopt, etc.) is to do the accounting of memory allocated to a process. The actual *giving* and *taking* of memory is done by the kernel.

Now the fun part is that on a typical *nix system, there are two ways to request memory from the OS: the brk() system call and the mmap() system call. So which one should glibc use to allocate memory? The answer is: both. malloc uses brk() for small requests and mmap() for large requests.

So what is the difference between mmap and brk, you ask? Well, every process has a block of contiguous memory called the data area. The brk() system call simply moves one end of the data area (the program break) upward, increasing its size and thereby allocating memory to the process. To free, all it does is move that same end back down. This operation is quite fast most of the time. The mmap system call, on the other hand, picks a completely different portion of memory and maps it into the address space of the process, so that the process can see it. Additionally, mmap has to zero the entire memory area it is about to hand out, so that it does not end up leaking information from some old process to this one. This makes mmap quite slow.

So why have mmap at all if it is so slow? The reason is the inherent limitation of brk(): it grows only one way, and the data area is always contiguous. Take a case where I allocate 10 objects using brk and then free the one I allocated first. Even though that location is now free, it cannot be given back to the OS, since it is locked in by the other 9 objects. One way malloc works around this is by trying to reuse these freed spaces. But what if the size of the object I am about to allocate next is larger than any of these freed "holes"? The holes remain, and the process ends up using more memory than it really needs. This is "external fragmentation".
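The 10-objects scenario can be sketched with a toy bump allocator over a fixed buffer (all names here are made up for illustration, this is not how glibc is structured): memory can only be returned from the top, so freeing the bottom block leaves a hole.

```c
/* A toy "brk-style" allocator: a single contiguous pool whose top
   (the pretend program break) can only move up on alloc and down
   when the very top block is freed. */
#include <stddef.h>

#define POOL_SIZE 1024
static unsigned char pool[POOL_SIZE];
static size_t brk_top = 0;          /* our pretend program break */

/* Allocation: just push the break up. */
void *toy_alloc(size_t size) {
    if (brk_top + size > POOL_SIZE) return NULL;
    void *p = &pool[brk_top];
    brk_top += size;
    return p;
}

/* Freeing only shrinks the pool if the block sits exactly at the top.
   Returns 1 if memory was returned, 0 if the block became a hole. */
int toy_free(void *p, size_t size) {
    if ((unsigned char *)p + size == &pool[brk_top]) {
        brk_top -= size;
        return 1;   /* returned to the "OS" */
    }
    return 0;       /* a hole: still owned by the process */
}
```

Freeing the first block reports a hole, while freeing the topmost block actually moves the break back, which is exactly the asymmetry described above.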

So to minimize the effect of this fragmentation, glibc limits brk() to the allocation of small objects; larger objects are allocated with mmap(). The threshold was set at 128 KB: objects smaller than that are allocated using brk, and anything larger is allocated using mmap. The assumption is that smaller requests come more often, so the little fragmentation is worth the improvement in speed. Oh, and as for reusing the memory holes, it does a "best fit algorithm" -- remember that phrase? ;)
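In case the phrase has faded since the exam, here is a sketch of the textbook best-fit policy (simplified: real glibc keeps chunks in size-ordered bins rather than scanning an array): among the free holes, pick the smallest one that is still large enough.

```c
/* Best fit over a list of free hole sizes. Returns the index of the
   smallest hole that can satisfy `request`, or -1 if none fits. */
#include <stddef.h>

int best_fit(const size_t *holes, size_t nholes, size_t request) {
    int best = -1;
    for (size_t i = 0; i < nholes; i++) {
        if (holes[i] >= request &&
            (best < 0 || holes[i] < holes[(size_t)best]))
            best = (int)i;
    }
    return best;
}
```

The idea is to waste as little of the chosen hole as possible, keeping the larger holes intact for larger future requests.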

But with more recent versions of glibc (since around 2006 actually, so not *very* recent), this threshold is dynamic. glibc now tries to adapt to the memory allocation pattern of your program. If it finds that you are allocating larger objects and freeing them soon after, it increases the threshold, expecting you to keep allocating and freeing larger objects often. There is of course an upper limit of 32 MB, so anything larger than 32 MB will *always* be allocated using mmap. This is quite awesome since it speeds up malloc quite a bit, but it obviously comes at the price of potentially larger memory holes.
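If you would rather not have the threshold move under you, glibc lets you pin it yourself with mallopt(); setting M_MMAP_THRESHOLD explicitly disables the dynamic adjustment. A glibc-specific sketch (the wrapper name is mine):

```c
/* Pin glibc's brk/mmap crossover point. After this call, requests of
   `bytes` and above go to mmap(), and glibc stops auto-tuning the
   threshold. mallopt() returns nonzero on success. */
#include <malloc.h>
#include <stddef.h>

int pin_mmap_threshold(size_t bytes) {
    return mallopt(M_MMAP_THRESHOLD, (int)bytes);
}
```

The same knob is reachable without recompiling via the MALLOC_MMAP_THRESHOLD_ environment variable.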

There is so much more to this, like the actual details of the way accounting of the brk'ed memory is done, obstacks, arenas. The fun seems to be only beginning.


Difference between sbrk() and mmap()

Hi Siddhesh,

I am a little confused.
When you say
"Well, every process has a block of contiguous memory called the data area"

Do you mean physically or virtually contiguous area? If it's only virtually contiguous then what is the difference between mmap and sbrk?
Because in that case sbrk() will also allocate memory in any physical location and maps it to process virtual address space keeping the process data area virtually contiguous.

Re: Difference between sbrk() and mmap()

Don't think about physical location of the memory at all because we're not even thinking about that level of abstraction when we allocate/free memory using the malloc functions. The contiguous/non-contiguous that we're talking about is for virtual memory, i.e. the address space. We want to optimize our use of the address space and then we trust the kernel to optimize actual memory usage (RSS, as opposed to VSZ) based on that.

The difference between mmap and brk is the address block it returns in the address space and things that you can do with it.

brk does not actually give you a new address block; there is one block available at program start (called the program heap) and you can grow or shrink it using the sbrk call. The kernel provides physical memory when you access it, i.e. read or write its contents. You may imagine that it is 'allocating' a new block in the address space by extending the current block, or freeing address space by shrinking it.

You can do that only at the end of the heap however and not in the middle, which is the primary difference from mmap.

mmap gives you a new address block anywhere in the address space. You may ask for a specific address location, but mmap may not always give that to you. Subsequent mmap calls may therefore give you locations all over the address space (in practice it does not quite do that, for performance reasons, but ignore that for the moment), and hence when you're freeing blocks, you can free, say, the 2nd block you allocated without having to free the 3rd and 4th blocks.


Re: Difference between sbrk() and mmap()

Thanx for the answer.
I thought the mmap region was fixed and lay between the stack and the heap.
Seems like it is not like that. It can be anywhere in the process virtual address space?

Re: Difference between sbrk() and mmap()

Yes, it can be anywhere. Traditionally the heap used to be at the beginning and the stack at the end of the address space, but with address space layout randomization that is no longer necessarily the case, and your mmapped address block can be anywhere in the address space.

However, even if it were always going to be between the heap and the stack, it could be anywhere in between. If the heap ended at, say, 0x0000000002000000 and the stack (which grows downward) ended at 0x00007fffff000000, you would still have a massive gap in between where your mmap blocks could be allocated. That is, you could have millions of allocations and frees anywhere in that region, independent of each other.

With brk however, the heap grows and shrinks at only one end, so if you free a block in the middle, it is not going to be returned to the OS because the blocks nearer the end are still in use.

I hope this makes it clearer.

Re: Difference between sbrk() and mmap()

Thanx Siddhesh.

It's much clearer now. It's like mmap creates a new vm_area_struct for every new allocation.
