KS2007: Memory management
The kernel summit session on memory management was led by Mel Gorman and Peter Zijlstra. While the VM hackers have a lot going on, this session was dominated by three topics: large page support, test cases, and memory pressure notification.
There continues to be pressure for improved large-page support on Linux systems. For almost any architecture, proper use of large pages can help to relieve pressure on the translation lookaside buffer (TLB), with a corresponding increase in performance. Some architectures (SuperH, for example) have very small TLBs and, thus, a strong motivation to use large pages whenever possible. This would be easier to do if Linux could support more than one size of large page; some processors offer several different sizes, in some cases up to 1GB.
Large pages are currently made available via hugetlbfs, an interface which application developers have, in general, not yet learned to love. Hugetlbfs currently only provides a single size of large pages, so providing multiple page sizes will require an extension to this virtual filesystem. Initially, an extension might take a relatively rudimentary form, such as a mount-time page size option. Multiple sizes could then be accommodated by mounting hugetlbfs multiple times.
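For readers unfamiliar with the current interface, the sketch below shows roughly how an application gets huge-page-backed memory from hugetlbfs today: it creates a file on a hugetlbfs mount and maps it. The mount point and file name are assumptions made for the example, and the mapping length must be a multiple of whatever huge page size the mount provides.

    /* Minimal sketch: map memory backed by huge pages via hugetlbfs,
     * assuming the filesystem is already mounted at /mnt/huge
     * (mount point and file name are illustrative only). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LENGTH (4UL * 1024 * 1024)  /* must be a multiple of the huge page size */

    int main(void)
    {
        int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* The mapping is huge-page backed because the file lives on hugetlbfs. */
        void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }

        memset(addr, 0, LENGTH);        /* touch the pages */
        printf("mapped %lu bytes of huge-page memory at %p\n", LENGTH, addr);

        munmap(addr, LENGTH);
        close(fd);
        unlink("/mnt/huge/example");
        return 0;
    }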
There are challenges involved in supporting some of these page sizes, though. 1GB pages are currently larger than MAX_ORDER, the largest chunk of contiguous (small) pages that the kernel tracks, and increasing MAX_ORDER is a bit more work than just changing a definition somewhere. Different sizes of pages also have to be established at different levels in the page table hierarchy, something which is not currently well supported by the kernel's page table API. Linus cut short the discussion on API issues, though, warning against attempts to generalize the page table API to cover all of the large-page cases. Much of this problem is so architecture-specific that trying to solve it in generic code is likely to create bigger messes than it cleans up; as a result, much of the work for large-page support will probably have to be done in architecture-specific code.
Mel spent much of the session trying to get the larger group to agree on what a proper test case for memory management patches is - or, failing agreement, at least to offer some suggestions for what a good test case might look like. It would appear that he has grown just a little bit weary of being told that his patches need to be benchmarked on a real test case before they can be considered for inclusion. He seems willing to do that benchmarking, but, so far, nobody has stepped forward and told him what kind of "real workload" they expect him to use.
He got little satisfaction at the summit. The problem is that some kinds of workloads are relatively easy to benchmark, but other kinds of parameters ("interactivity") are hard to measure. So, even if somebody could put together an implementation of (say) swap prefetch, there is no real way to prove that it is actually useful. And, in the absence of such proof, memory management patches are notoriously hard to merge. There were not a whole lot of ideas for improving the situation. Your editor can say, though, that he will go out of his way not to be the next reviewer to ask Mel which real workloads he has tested a patch on.
The final topic was working out a way to let applications help when the system is under memory pressure. Web browsers, for example, often maintain large in-memory caches which can be dropped if the system finds itself running out of memory - but that will only happen if the browser knows about the problem. There are other applications in a similar situation; GNOME and KDE applications, for example, tend to carry a certain amount of cached data which can be done without if the need arises.
The problem is figuring out how to tell the application that the time has come to free up some memory. Sending a signal might be an obvious way to send a notification, but nobody really wants to extend the signal interface. Responses to memory pressure notifications must often be done in libraries, and working with signals in library code is especially problematic. In the absence of signals, there will have to be a way for applications to somehow ask about memory pressure.
After a brief digression into the rarefied, philosophical question of just what memory pressure is in the first place, the discussion wandered into a different approach to the problem. Perhaps an application could make a system call to indicate that it does not currently need a specific range of memory, but that, if the system doesn't mind, keeping it around might just be useful. If, at some future point, the application wants something it had cached, it makes another call to query whether the given range of memory is still there. This would give the kernel a list of pages it could dump if it finds itself in a tight spot, while still keeping the data around if there is no pressing need for that memory.
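No such interface exists; purely to illustrate the idea being discussed, the sketch below invents a pair of calls, mark_volatile() and probe_volatile() (hypothetical names with stub implementations, not anything proposed at the summit), and shows how an application might wrap them around a rebuildable cache.

    /* Purely illustrative sketch of the discussed interface; the calls
     * below do not exist, their names and semantics are invented here. */
    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical: mark a range as reclaimable cache.  A real version
     * would be a system call; this stub only stands in for the idea. */
    static int mark_volatile(void *addr, size_t len)
    {
        (void)addr; (void)len;
        errno = ENOSYS;              /* the call does not exist */
        return -1;
    }

    /* Hypothetical: ask whether the range survived memory pressure.
     * 1 = data intact, 0 = the kernel reclaimed it; rebuild the cache. */
    static int probe_volatile(void *addr, size_t len)
    {
        (void)addr; (void)len;
        return 0;                    /* stand-in: assume the worst */
    }

    int main(void)
    {
        static char cache[1 << 20];  /* an in-memory cache the app can rebuild */

        if (mark_volatile(cache, sizeof(cache)) != 0)
            fprintf(stderr, "mark_volatile is only a sketch (errno=%d)\n", errno);

        if (!probe_volatile(cache, sizeof(cache)))
            fprintf(stderr, "cache would have to be regenerated\n");

        return 0;
    }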
Linus cautioned that these system calls might seem like a nice idea, but that nobody would ever use them. In general, he says, Linux-specific extensions tend not to be used; developers do not want to maintain any more system-specific code than they really have to. Some people thought that there might be motivation for a few library developers to use these calls, though. But until a patch implementing them actually exists, this discussion will probably not go a whole lot further.
Index entries for this article:
Kernel: Memory management/Conference sessions
KS2007: Memory management & application notification of low memory
Posted Sep 10, 2007 22:53 UTC (Mon)
by vomlehn (guest, #45588)
[Link] (5 responses)
The claim is that application developers will tend to avoid application notification of low memory conditions as it would be a Linux-specific extension. This is only an issue in situations where an application is targeted to multiple operating systems. In the embedded world, user-mode software tends to be more closely tied to the operating system. In addition, many embedded applications need the benefit of keeping around cached data as their network connections tend to be slower. As a result, I've already been toying with the notion of designing a framework for user mode software to return memory to the kernel on demand. I doubt I'm alone in this...and I certainly don't want to support it all myself...
KS2007: Memory management & application notification of low memory
Posted Sep 13, 2007 22:07 UTC (Thu)
by oak (guest, #2786)
[Link] (3 responses)
I think on Nokia internet tablets there's a notification about low-memory situations to applications through system D-BUS broadcasts [1]. Delivering the information with D-BUS messages obviously has a latency issue if an application is gobbling memory very fast, but I guess it works acceptably.
[1]
https://stage.maemo.org/svn/maemo/projects/haf/trunk/libo...
Btw, I think for applications which keep large freeable memory caches, i.e. have very active and complex memory allocation schemes, (Glibc) heap fragmentation is at least as large a problem as getting rid of their caches. The caches should at least be large enough that their allocations have been memory mapped (>= 128KB is the Glibc default) instead of coming from the heap, so that they are returned to the system immediately when they are freed; with heap fragmentation you can only hope that allocation re-ordering helps (after you've analyzed which allocations are the problem).
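For reference, receiving such a broadcast with the low-level libdbus API might look roughly like the sketch below; the org.example.MemoryPressure interface and LowMemory signal names are invented for illustration and are not the actual maemo names.

    /* Rough sketch: listen for a (hypothetical) low-memory signal on the
     * system bus.  Build with: gcc lowmem.c $(pkg-config --cflags --libs dbus-1) */
    #include <dbus/dbus.h>
    #include <stdio.h>

    int main(void)
    {
        DBusError err;
        dbus_error_init(&err);

        DBusConnection *conn = dbus_bus_get(DBUS_BUS_SYSTEM, &err);
        if (conn == NULL) {
            fprintf(stderr, "bus connection failed: %s\n", err.message);
            dbus_error_free(&err);
            return 1;
        }

        /* Subscribe to the illustrative low-memory broadcasts. */
        dbus_bus_add_match(conn,
            "type='signal',interface='org.example.MemoryPressure'", &err);
        dbus_connection_flush(conn);
        if (dbus_error_is_set(&err)) {
            fprintf(stderr, "add_match failed: %s\n", err.message);
            dbus_error_free(&err);
            return 1;
        }

        while (dbus_connection_read_write(conn, -1)) {
            DBusMessage *msg;
            while ((msg = dbus_connection_pop_message(conn)) != NULL) {
                if (dbus_message_is_signal(msg,
                        "org.example.MemoryPressure", "LowMemory")) {
                    /* Drop application caches here. */
                    puts("low-memory signal received: freeing caches");
                }
                dbus_message_unref(msg);
            }
        }
        return 0;
    }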
KS2007: Memory management & application notification of low memory
Posted Sep 13, 2007 22:30 UTC (Thu)
by nix (subscriber, #2304)
[Link] (2 responses)
The glibc mmap threshold *minimum* is 128Kb. In recent versions of glibc the threshold is dynamically adjusted between 128Kb and 512Kb (on 32-bit boxes) and 64Mb (on 64-bit). (The mmap threshold starts at 128Kb and rises whenever the application frees mmap()ed memory, so that transient allocations tend to use brk() instead.)
(This is new behaviour in glibc 2.5.)
KS2007: Memory management & application notification of low memory
Posted Sep 16, 2007 19:43 UTC (Sun)
by oak (guest, #2786)
[Link] (1 responses)
> The glibc mmap threshold *minimum* is 128Kb. In recent versions of glibc
> the threshold is dynamically adjusted [...] (This is new behaviour in
> glibc 2.5.)
Can you still limit that to be smaller with the MALLOC_MMAP_THRESHOLD_ environment variable? This would indicate that the limit just increases and never decreases, regardless of the environment variable:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/malloc/...
However, in some environments and with some applications, memory fragmentation is actually a worse problem than some performance decrease. If a device (e.g. an embedded one where you cannot just add more RAM) runs out of memory because of heap fragmentation, that will have a much more drastic effect on performance than e.g. the threading scalability mentioned here:
http://sourceware.org/bugzilla/show_bug.cgi?id=1541
KS2007: Memory management & application notification of low memory
Posted Sep 16, 2007 23:11 UTC (Sun)
by nix (subscriber, #2304)
[Link]
MALLOC_MMAP_THRESHOLD_ still works: it's precisely equivalent to calling mallopt (M_MMAP_THRESHOLD, ...); setting any of these mmap parameters disables dynamic adjustment completely.
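As a small illustration of the two equivalent knobs mentioned in this exchange (assuming glibc; the 64KB value is only an example):

    /* Pin the glibc mmap threshold, the in-process equivalent of running
     * with MALLOC_MMAP_THRESHOLD_=65536 in the environment. */
    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Setting the threshold explicitly also disables glibc's dynamic
         * adjustment, as described in the comment above. */
        if (mallopt(M_MMAP_THRESHOLD, 64 * 1024) != 1)
            fprintf(stderr, "mallopt failed\n");

        /* Allocations at or above the threshold are now served by mmap(),
         * so free() returns them to the kernel immediately. */
        void *big = malloc(256 * 1024);
        free(big);

        return 0;
    }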
KS2007: Memory management & application notification of low memory
Posted Sep 20, 2007 11:20 UTC (Thu)
by renox (guest, #23785)
[Link]
>>The claim is that application developers will tend to avoid application notification of low memory conditions as it would be a Linux-specific extension.<<
Note that a low memory condition doesn't have to be handled by the application developer to be useful:
http://lambda-the-ultimate.org/node/2391 points to an article where a researcher patched a JVM garbage collector and the Linux (2.4) kernel VM so that they communicate, with the result that the GC works well even under memory pressure, all without the application developer lifting a finger :-)
KS2007: Memory management
Posted Sep 11, 2007 19:20 UTC (Tue)
by bronson (subscriber, #4806)
[Link] (3 responses)
> Linus cautioned that these system calls might seem like a nice idea, but that nobody would ever use them. In general, he says, Linux-specific extensions tend not to be used.
Does any system currently offer a weak_free call? Sometimes Linux has to be the first to do something. Like FUSE, a pretty notable success. (I know Fuse wasn't the first userspace FS... it's the first general purpose API, though, developed on Linux at first and now being adopted by other OSes).
I can understand why weak_free isn't so attractive on desktop systems where stale data is just paged to disk but it seems really handy on smaller systems: MP3 players, video players, phones, etc. Like vomlehn says.
Shame I don't have time to write the patch. All I can do is say that, yes, I would indeed use it if it existed.
KS2007: Memory management
Posted Sep 14, 2007 2:17 UTC (Fri)
by jzbiciak (guest, #5246)
[Link] (1 responses)
FUSE is a bit of a different story. I don't need to patch Firefox and Emacs to use files in a FUSE filesystem. I do need to patch every large-memory-footprint application to use the proposed memory-pressure message interface.
Interestingly, HURD does its paging in user-space as I recall. Ah yes:
http://kilobug.free.fr/hurd/pres-en/abstract/html/node9.html
KS2007: Memory management
Posted Sep 21, 2007 20:00 UTC (Fri)
by ch (guest, #4097)
[Link]
The user space pager came from Mach. And may have come from someone else before that.
I may be one of the few people on the planet who, without being a Mach kernel hacker, has written his own Mach pager. This was an attempt to convince Mach to play better with CMU Common Lisp.
-- Christopher.
KS2007: Memory management
Posted Sep 24, 2007 14:38 UTC (Mon)
by stereodee (guest, #47703)
[Link]
> Does any system currently offer a weak_free call?
How about madvise(2)?
"MADV_FREE - Gives the VM system the freedom to free pages, and tells the system that information in the specified page range is no longer important. This is an efficient way of allowing malloc(3) to free pages anywhere in the address space, while keeping the address space valid. The next time that the page is referenced, the page might be demand zeroed, or might contain the data that was there before the MADV_FREE call. References made to that address space range will not make the VM system page the information back in from backing store until the page is modified again."
http://www.gsp.com/cgi-bin/man.cgi?section=2&topic=ma...
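The excerpt above is from the BSD madvise(2) man page; a minimal sketch of how an application might use it, assuming a kernel that actually implements MADV_FREE (the BSDs did at the time; Linux gained it much later), could look like this:

    /* Minimal sketch of marking a cache as expendable with MADV_FREE,
     * guarded so it still builds on systems without that flag. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
    #ifdef MADV_FREE
        size_t len = 1 << 20;

        /* Anonymous mapping standing in for an application cache. */
        char *cache = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (cache == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(cache, 0x42, len);

        /* The contents are now expendable: the kernel may reclaim the pages
         * lazily, but until it does, reads still see the old data. */
        if (madvise(cache, len, MADV_FREE) != 0)
            perror("madvise");

        munmap(cache, len);
    #else
        fputs("MADV_FREE is not available on this system\n", stderr);
    #endif
        return 0;
    }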