Saturday, December 21, 2013

Secrets of the ANTs Data Server 07 --Memory Management


There are several disadvantages to the standard C/C++ memory management libraries represented by malloc() and free() or new and delete. These libraries have to handle the most general case: managing memory chunks ranging in size from as small as a byte to as large as megabytes, and doing so in the face of completely arbitrary allocations and deallocations. Solving this problem while avoiding fragmentation requires a lot of runtime work to coalesce adjacent deallocated objects. It also imposes an overhead of a word or two for each allocation. This overhead is significant if a lot of your chunks are small.

Also, these libraries have no inherent protection against memory leaks or other memory-management problems such as freeing the same object twice. A server like ADS is intended to run for months at a time without stopping, so memory leaks and other memory-management errors are not tolerable.

For these reasons, and because ADS had to do a lot of memory management of small to medium-sized objects, ADS implemented its own memory management libraries --about four of them. These four memory management facilities were not destructor-friendly. That is, they did not guarantee to run C++ destructors, so they were only used for objects that did not have destructors.


Arenas

The arena allocator was used primarily in the SQL compiler. To use the Arena library, a task would begin by allocating an arena. There was only one operation on an arena other than deallocating the arena itself:
void* arenaAlloc(Arena* arena, size_t n)
This function acted like malloc(): it would return a new chunk of memory n bytes large. However, there was no way to free these individual objects. Instead, all chunks from the arena would remain live until the entire arena was deallocated, and then they would all vanish.

Arenas are convenient for allocating nodes in the parse tree and the other small tree and graph nodes used in a compiler. Because we were using an arena, we never had to worry about having multiple pointers to a parse tree node because it was never freed individually.

There were a few places where a node was used for a time and then no longer needed, so it could have been freed, but the total memory used by such objects was never very large compared to the memory that could not be freed until the compilation was finished. It may be that even with these objects the compiler used less memory overall, because most compiler objects are small and arena-allocated chunks did not need the 4 to 16 bytes per chunk of overhead that the standard allocation facility uses.

A new arena would be allocated with a single 64KB page and all allocations would take memory out of that page if there was enough space left. If there was not enough space, a new 64KB page would be allocated for the object. If the object was larger than 64KB, then a page was allocated that was large enough for the object, but a multiple of 64KB to help out the higher-level memory management routines (I think we used malloc() for this).

This would create a chain of allocation pages, each of which might contain some unallocated space at the end. Each allocation would first check whether it could fit into the leftover space of one of these earlier pages before going to the last page.

Memory Pools

A memory pool is an allocator that only lets you allocate fixed-size blocks. That is, every block allocated from a particular memory pool has the same size. Since all blocks are the same size, there is no danger of fragmentation and no need to coalesce freed blocks to avoid it. The memory for memory pools came out of 64KB pages and, like arenas, memory pools could chain together multiple 64KB pages if the first page ran out of memory. However, memory pools only used 64KB pages allocated on 64KB boundaries.

In ADS most memory pools were used for only a particular type of object. For example, there were memory pools that were only used to allocate update blocks. Typically there was one memory pool per worker thread for each type, which in some cases reduced contention for allocation.

Unlike arenas, memory pools supported freeing of objects. When an object was freed, it would be added to a linked list of freed objects for that memory pool. I believe that an allocation request would first be served from empty space at the end of the last 64KB page. This was the case where there was no contention over allocations. If there was no space left at the end of the page, but the free list was non-empty, then an item would be returned from the free list. In this case, there was potential contention since any thread might be freeing an object, so access to the free list had to be thread-safe. Finally, if there was no space at the end of the last page and nothing in the free list, then a new page would be allocated.

Memory pools did not need to tag allocated blocks with information. The size was fixed and there was no need to record what page the block came from because all pages were allocated on 64K boundaries, so you could find the address of the page that a block was contained in by masking off the lower bits of the block address.

It always seemed to me to be overkill having a different allocator for each type, but this turned out to be useful in tracking down memory leaks. It was easy to instrument the allocator to tell you at least what kind of blocks were leaking, and this is the first step in tracking down a leak.

Variable Blocks

The variable block allocator was actually a collection of memory pools ranging in size by powers of two. I believe there were memory pools for 8, 16, 32, 64, 128, 256, 512, and 1024 bytes. When you requested a chunk of size n, it would give you a chunk from the smallest block allocator of at least size n.

Again, there is no need for the overhead of keeping the block size around because the deallocator can always find the page header by truncating the block address to a 64K boundary and from the page header it can find the size of the block.


Scratch Pad

ADS reserved a big chunk of 64MB of memory for special limited-use applications. Blocks could be allocated from this memory much like an arena, but with the additional restriction that the memory would be automatically freed at the end of the current phase, so a task had to give up all pointers into this memory before relinquishing control.

As far as I know, the only use of the scratch pad was to sort large sets of data for index rebalancing and for sorted merge joins.
