SMP alternatives
The kernel has a number of ways of dealing with this challenge. In some cases it can make decisions at run time, using processor features only if they are found to be present. Other features are only available by way of build-time configuration options; selecting these will result in a kernel which will not run on older systems. Yet another mechanism is the "alternatives" feature, which allows the kernel to optimize itself at boot time. Consider this example of alternatives use (from include/asm-i386/system.h):
    #define mb() alternative("lock; addl $0,0(%%esp)", \
                             "mfence", \
                             X86_FEATURE_XMM2)
This macro places a memory barrier in the code, ensuring that all memory reads and writes initiated before the barrier complete before execution continues. The default implementation is essentially a bus-locked no-op; it will work anywhere. On newer systems, however, the more efficient mfence instruction is available, and it would be nice to use it.
The alternative() macro compiles in the default code, but also makes a note of its location (and alternative implementation) in a special ELF section. Early in the boot process, the kernel calls apply_alternatives(), which makes a pass through that special section. Every alternative instruction which is supported by the running processor is patched directly into the loaded kernel image; if the replacement is shorter than the default code, the remaining space is filled with no-op instructions. Once apply_alternatives() has finished its work, the kernel behaves as if it had been compiled for the processor it is actually running on. This mechanism allows distributors to ship generic kernels which can optimize themselves at boot time.
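To make the mechanism concrete, here is a minimal user-space sketch of such a patching pass. It is not the kernel's code: the struct alt_entry layout, the apply_alternatives_sketch() name, and the feature bitmask are all invented for illustration, and the "image" is an ordinary byte array rather than live kernel text. Each table entry stands in for one record in the special ELF section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP 0x90  /* one-byte x86 no-op */

/* Hypothetical record describing one patch site (illustrative layout only). */
struct alt_entry {
    size_t         offset;    /* where the default code starts in the image */
    uint8_t        orig_len;  /* bytes occupied by the default code */
    const uint8_t *repl;      /* replacement instruction bytes */
    uint8_t        repl_len;  /* must not exceed orig_len */
    unsigned       feature;   /* CPU feature bit the replacement requires */
};

/*
 * Walk the table and patch every site whose required feature is present
 * in cpu_features.  Returns the number of sites patched.
 */
int apply_alternatives_sketch(uint8_t *image, size_t image_len,
                              const struct alt_entry *tbl, size_t n,
                              unsigned cpu_features)
{
    int patched = 0;

    for (size_t i = 0; i < n; i++) {
        const struct alt_entry *a = &tbl[i];

        if (!(cpu_features & a->feature))
            continue;  /* CPU lacks the feature: keep the default code */

        assert(a->repl_len <= a->orig_len);
        assert(a->offset + a->orig_len <= image_len);

        /* Copy in the replacement, then pad the leftover bytes with NOPs. */
        memcpy(image + a->offset, a->repl, a->repl_len);
        memset(image + a->offset + a->repl_len, NOP,
               (size_t)(a->orig_len - a->repl_len));
        patched++;
    }
    return patched;
}
```

Given a five-byte site holding the bus-locked addl (f0 83 04 24 00) and the three-byte mfence (0f ae f0) as its replacement, a pass with the feature bit set would leave 0f ae f0 90 90 behind; with the bit clear, the site is left alone.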
The 2.6 mainline uses alternatives sparingly: for barriers, prefetch hints, and saving the floating point unit state. Gerd Knorr, however, believes that the use of alternatives could be expanded to further reduce the range of kernels which distributors need to ship - and to improve runtime flexibility as well. In particular, he thinks that kernels can be optimized for single- or multiprocessor systems on the fly.
Gerd's SMP alternatives patch is an implementation of this concept. It creates a new macro (alternative_smp()) which can be used to specify optimal implementations of an operation on both uniprocessor and SMP systems; the proper version will then be selected at runtime. The main use of SMP alternatives in his patch is with spinlock operations; spinlocks can be patched in or edited out, as dictated by the configuration of the system at boot time.
There are a couple of interesting features in Gerd's patch. One is in the handling of the i386 architecture's lock prefix. This prefix, when applied to specific instructions, causes the instruction to run in a bus-locked, atomic manner. It is used for operations which must be seen coherently across a multiprocessor system; these include semaphore operations and the atomic_t implementation. Use of the lock prefix on uniprocessor systems imposes a runtime cost with no benefit; it would be nice to edit those out. The SMP alternatives patch takes a shortcut here; it simply remembers each location where a lock prefix appears. If the kernel boots on a uniprocessor system, all of those prefixes can be quickly overwritten with no-ops.
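The lock-prefix shortcut can be pictured with a small sketch along the following lines. The function name and the array of recorded offsets are assumptions made for this example, not names from Gerd's patch; the only hard facts used are the x86 encodings of the lock prefix (0xf0) and the one-byte no-op (0x90).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PREFIX 0xF0  /* x86 lock prefix byte */
#define NOP         0x90  /* one-byte x86 no-op */

/*
 * Overwrite each recorded lock prefix with a NOP.  lock_sites holds the
 * offsets (hypothetically collected at build time) at which a prefix was
 * emitted.  Returns how many prefixes were actually replaced.
 */
size_t strip_lock_prefixes(uint8_t *text, size_t text_len,
                           const size_t *lock_sites, size_t n)
{
    size_t patched = 0;

    for (size_t i = 0; i < n; i++) {
        size_t off = lock_sites[i];

        assert(off < text_len);
        if (text[off] == LOCK_PREFIX) {  /* skip sites already patched */
            text[off] = NOP;
            patched++;
        }
    }
    return patched;
}
```

Because the prefix and the no-op are both a single byte, no padding or length bookkeeping is needed, which is presumably what makes this case cheap enough to handle by simply remembering addresses.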
A more interesting - and more controversial - feature of this patch is that, when the kernel is converted between the SMP and uniprocessor mode, the overwritten instructions are remembered. At some point in the future, then, the alternatives code can reverse the change, switching the kernel back to the full SMP implementation. The code is then run whenever a CPU hotplug event happens, optimizing the kernel for the system's new configuration. A system can be initially booted with a single processor, and the alternatives code will edit out all of the SMP-related instructions. If another processor is added later on, the kernel will be automatically converted back into a fully SMP-capable mode. If processors are removed, the SMP code can be taken out too. All within a running system, with no need to reboot.
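The "remember what was overwritten" idea might look roughly like the sketch below. Everything here is hypothetical (the struct smp_site layout, the function names, the fixed MAX_PATCH size), and it deliberately ignores what makes this hard in a real kernel, such as keeping other processors out of the code while it is being rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATCH 8  /* longest site this sketch can handle */

/* Hypothetical per-site record; not the structure used by the patch. */
struct smp_site {
    size_t  offset;            /* start of the site in the text */
    uint8_t len;               /* number of bytes to swap */
    uint8_t up[MAX_PATCH];     /* uniprocessor replacement bytes */
    uint8_t saved[MAX_PATCH];  /* SMP bytes remembered while patched */
    int     patched;           /* nonzero while the UP version is in place */
};

/* Install the UP variants, remembering the SMP bytes they overwrite. */
void switch_to_up(uint8_t *text, size_t text_len,
                  struct smp_site *sites, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct smp_site *s = &sites[i];

        if (s->patched)
            continue;  /* already UP: do not clobber the saved bytes */
        assert(s->len <= MAX_PATCH && s->offset + s->len <= text_len);
        memcpy(s->saved, text + s->offset, s->len);
        memcpy(text + s->offset, s->up, s->len);
        s->patched = 1;
    }
}

/* Put the remembered SMP bytes back, e.g. after a CPU is hot-added. */
void switch_to_smp(uint8_t *text, size_t text_len,
                   struct smp_site *sites, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct smp_site *s = &sites[i];

        if (!s->patched)
            continue;
        assert(s->offset + s->len <= text_len);
        memcpy(text + s->offset, s->saved, s->len);
        s->patched = 0;
    }
}
```

The patched flag keeps both directions idempotent, so a hotplug handler could call whichever function matches the new CPU count without tracking the previous state itself.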
This feature may seem useful to a rather small minority of users - and it is. But that minority may be bigger than one thinks. Virtualization systems (and Xen in particular) are implementing the ability to configure the number of (virtual) CPUs in each running instance on the fly, in response to the load on each. So it may really be that a busy, virtualized server will have CPUs hot-plugged into it, and that those processors will go away when the load drops. Enabling the kernel to reconfigure itself on the fly when this happens will allow each Xen instance to run a kernel which is optimized for its current situation.
The CPU hotplug may be a hard sell - self-modifying code in a running
kernel tends to make people nervous. The rest of the SMP alternatives
patch seems likely to find a place in the mainline, eventually.
That's why I love Linux...
Posted Dec 15, 2005 3:19 UTC (Thu)
by sbishop (guest, #33061)
[Link] (5 responses)
The difference between this idea and the way Windows works is incredible. Self-modifying/optimizing code versus "Another processor? Do you have a license for that?" Wow.
I've realized recently that Linux often benefits technically from its license. This idea is an example of that.
I've seen another example at work, where I maintain a Linux USB driver used internally. It's based off the example USB driver that's included in the Linux kernel code, and it's about 600 lines long. The corresponding example Windows driver that comes with Microsoft's DDK (Driver Development Kit) consists of multiple files and around 8,000 lines total. Why the difference? Because the Linux developers aren't tied to an ABI, or API, for that matter. They can adjust the interface between the kernel and drivers (because they have the source for those too) until they get it right. The Windows driver contains much that really ought to be built-in. A +10x difference in code size--that's huge.
And guess which is more than twice as fast as the other? :)
That's why I love Linux...
Posted Dec 15, 2005 10:35 UTC (Thu)
by Duncan (guest, #6647)
[Link] (4 responses)
I'm certainly no MSWormOS fan, and I like your point, but in actuality,
from what I've read, MS is one of the better proprietaryware vendors in
regard to multicore, at least. While the likes of Oracle continue to bill
per core, MS has been looking pretty flexible by comparison, saying it
will continue to license per CPU, even as the number of cores per CPU
begins to climb.
As to the USB driver, if it's useful for you, unless you are developing it
for as yet unreleased hardware, it's almost certainly going to be useful
to others as well. The GPL doesn't require publishing code for internal
work, but consider the benefits of either /not/ having to do that internal
maintenance, as it's done for you by the kernel crew, or submitting it for
kernel inclusion and becoming the maintainer yourself, thus gaining
visibility and favorable press throughout the Linux world, /plus/ having a
bit of the work still done for you, when someone changes out kernel code
you depend on from under you.
Of course, you likely have your reasons for not doing so, but come on, you
gotta admit such a posting to LWN is just /begging/ someone to ask why you
haven't yet made source available! <g>
Duncan
That's why I love Linux...
Posted Dec 15, 2005 16:18 UTC (Thu)
by sbishop (guest, #33061)
[Link] (1 responses)
Yea, I suppose that I should have seen that coming. :)
I work for a manufacturing company. The hardware is a tester that only we'd be interested in. And the driver really isn't far distanced from usb-skeleton.c, so it isn't very interesting either.
That's why I love Linux...
Posted Jun 1, 2006 16:41 UTC (Thu)
by cventers (guest, #31465)
[Link]
Is it a tester that no one else on Earth owns? :)
I read that GregKH re-emphasized at the recent FreedomHEC that the kernel
has drivers with only a handful of users. Perhaps mainline inclusion
isn't an impossible thing after all.
Internal-use kernel code
Posted Dec 16, 2005 16:51 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (1 responses)
You silently merge the idea of making source code available and of getting it in the kernel.org tree. You also seem to merge, as many people do, the idea of submitting code for inclusion and of that code being included.
There's a significant cost to getting code into kernel.org. I myself write lots of kernel code, and while the world is welcome to all of it and much of it is published, I have never attempted to get any of it into kernel.org.
First, I'd have to translate it to a coding style I don't like and package it according to some pretty specific rules. Then I'd have to run the gauntlet of some mailing list, probably having to rework the code a few times. Some of that rework would be stuff I don't agree with. Some would be stuff I have no use for. At no point would I have any guarantee my work would result in any code going in.
So I suffer the costs of being out of tree (mainly, I can't use the most current kernel.org code), but it beats the cost of getting in tree.
Internal-use kernel code
Posted Dec 16, 2005 19:42 UTC (Fri)
by Duncan (guest, #6647)
[Link]
Well, I'm aware of the difference, but didn't go to my usual lengths to
specify it. How come other folks can take shortcuts, but every time I
abridge a detail, I get called on it? <g> I suppose it's likely because
folks are used to me being so detailed, tho some might be just
coincidence.
Anyway, yeah, thanks for making the code available, even if it's not
targeted at the kernel tree ATM. That's an important right of Free
source, too, being able to NOT have to go for merge, if desired, tho
because it's Free source, others can take it and go for that merge if they
want to, again, another important right.
Thanks also for filling in those details. It's quite possible the
additional information will be of use to future readers, and I /did/ fail
to mention it, so it's good someone came along to fill the gap. =8^)
Duncan
SMP alternatives
Posted Dec 15, 2005 9:44 UTC (Thu)
by dw (guest, #12017)
[Link] (5 responses)
Please excuse my ignorance of kernel development, but I don't understand how this can be made to work safely. I am not sure under which conditions CPU hotplug events are processed, but I would imagine that at that time numerous locks in the kernel would be in the locked state.
If the CPU hotplug code causes the unlock functions to become a NOOP, then after the event has been processed, any scheduled tasks that perform an unlock of an existing lock will not cause any change in the state of the kernel.
If the kernel is then patched for SMP again, and numerous locks which were 'unlocked' by tasks running under the non-SMP kernel are actually still marked in memory as locked, would that not cause severe system instability or a deadlock condition?
Again, I only know the kernel by concept, and have never written a line of code for it. Excuse my ignorance. :)
SMP alternatives
Posted Dec 15, 2005 14:05 UTC (Thu)
by kpfleming (subscriber, #23250)
[Link] (2 responses)
You are confusing mutual exclusion/semaphore locks with 'bus' locks. There is no 'unlock' operation involved here; the 'lock' prefix instruction being referred to only affects the instruction it precedes, and there is an automatic 'unlock' of the bus when that instruction completes.
Switching from uni- to multi-processor mode won't even require holding all the kernel threads/processes in an idle state while this happens, it would just have to complete all the instruction patching before any threads could be allowed to run on the new CPU.
This is a very, very cool idea :-)
SMP alternatives
Posted Dec 15, 2005 18:39 UTC (Thu)
by Ross (guest, #4065)
[Link] (1 responses)
That's not how I read it.
"The main use of SMP alternatives in his patch is with spinlock operations; spinlocks can be patched in or edited out, as dictated by the configuration of the system at boot time."
It sounds like spinlocks will be turned into noops. This may be ok when going SMP->UP, but maybe not the other direction, and I wonder what kind of lock state would be retained when going SMP->UP->SMP...
SMP alternatives
Posted Dec 15, 2005 18:59 UTC (Thu)
by jzbiciak (guest, #5246)
[Link]
To enable hotplug SMP -> UP -> SMP, you'd definitely need to retain spinlocks, or at least put a lightweight "take lock" instruction there as opposed to the full spin. Fully NOPping spinlocks out would be a disaster, as you note.
Elsewhere, though, you could nuke/replace LOCK prefixes as needed.
Safety
Posted Dec 15, 2005 14:55 UTC (Thu)
by corbet (editor, #1)
[Link]
Changing the functioning of spinlocks could certainly create trouble if parts of the kernel are currently holding locks! By my reading of the patch, there are a couple of defenses against that problem, though:
- A kernel built with the SMP alternatives maintains the counters for spinlocks, so lock state should be preserved in all configurations, and
- The hotplug CPU code has to quiesce the system anyway, so no atomic code should be running while alternatives are being applied.
SMP alternatives
Posted Dec 25, 2005 19:46 UTC (Sun)
by efexis (guest, #26355)
[Link]
Um, my guess is that you'd stop processes from running on the second processor -before- switching down to a single processor kernel. As only one processor could then even be holding locks, the locking mechanism becomes irrelevant, so can be NOOPed.
SMP alternatives
Posted Dec 15, 2005 12:35 UTC (Thu)
by NAR (subscriber, #1313)
[Link] (3 responses)
Virtualization systems (and Xen in particular) are implementing the ability to configure the number of (virtual) CPUs in each running instance on the fly, in response to the load on each. So it may really be that a busy, virtualized server will have CPUs hot-plugged into it, and that those processors will go away when the load drops.
Why bother with hotplugging virtual processors? Isn't it simpler to add more resources (e.g. more CPU slices per second) on the host to the particular virtualized server?
SMP alternatives
Posted Dec 15, 2005 15:26 UTC (Thu)
by bronson (subscriber, #4806)
[Link] (2 responses)
Yes, as long as you have more slices per second to give.
Once you've maxed out a single processor, your only recourse is to run the VM on more processors. AFAIK there's no good way of doing this unless the VM is partitioned (parallelized?) as well. Multiple virtual CPUs is the best way of doing this.
But the VM doesn't care! Why not always run it in a quad-cpu configuration? Because that wastes cycles if it's just being run on a single CPU.
That's just a guess. It seems an overcomplex solution to me but I haven't looked at the code yet.
SMP alternatives
Posted Dec 18, 2005 19:26 UTC (Sun)
by bk (guest, #25617)
[Link] (1 responses)
Can these processors be on different physical machines? This sounds like an implementation of SSI clustering, which is pretty cool.
SMP alternatives
Posted Dec 21, 2005 7:08 UTC (Wed)
by csamuel (✭ supporter ✭, #2624)
[Link]
No.
Xen is still running on a single system and whilst you can migrate a
system between Xen servers (if you've got a shared filesystem that can
cope) it only runs on one or the other with a tiny pause for the actual
transition.
SMP alternatives
Posted Dec 15, 2005 20:14 UTC (Thu)
by captrb (guest, #2291)
[Link] (1 responses)
Can anybody speculate whether this functionality will make various
power management easier on SMP machines? It seems (without any
knowledge to back it up) that each CPU could be removed until there
was a single CPU left, then the same procedure that works on
uniprocessor machines could be performed.
It's a real shame, especially with dual core CPUs and
hyperthreading, that there is a choice between fancy power
management and multiple processors.
SMP alternatives
Posted Dec 25, 2005 19:51 UTC (Sun)
by efexis (guest, #26355)
[Link]
No. This is talking about removing code once you've already switched down to a single processor, and adding code when you want to switch up.
Adding/removing processors at runtime, I believe, is already possible (otherwise this code would be pretty useless)
SMP alternatives
Posted Dec 16, 2005 17:38 UTC (Fri)
by norsk (guest, #30746)
[Link] (2 responses)
Back in 1998, while working at Novell on the Netware SMP project, I did the same thing on self-modifying kernel mods. The kernel was built in SMP mode and when installed on a UP system, all the "lock" instructions, SMP and atomics called their respective init routines to determine whether UP or SMP and applied the correct op-codes. Gave us 2-5% improvement and a major cost in shipping of different kernels.
I like the idea of reversing the mods when a CPU hotplug event occurs. Hardware at my time did not have the feature set.
I was wondering why this had not yet happened before in Linux. Still a far better world to work in than the Netware kernel.
doug "norsk" thompson
SMP alternatives
Posted Dec 27, 2005 17:13 UTC (Tue)
by cajal (guest, #4167)
[Link] (1 responses)
If I'm interpreting your post right, you're saying this is a bad thing. A 2-5% improvement is pretty negligible, and it came at a major cost. Sounds like this is something Linux should avoid.
SMP alternatives
Posted Dec 27, 2005 23:17 UTC (Tue)
by turpie (guest, #5219)
[Link]
He meant that it saved the major cost of shipping different kernels, by allowing them to ship a single kernel for both UP and SMP.
User base
Posted Dec 22, 2005 11:22 UTC (Thu)
by ringerc (subscriber, #3071)
[Link] (3 responses)
The potential user base is larger than hot-plugging SMP users and Xen users. Many of the next generation of laptops will have dual core CPUs, and it's likely to be desirable to shut down a core when on battery (for example).
A dual core system is essentially SMP... and laptops are very power / performance conscious. It would not hurt in the slightest to eke out every bit of speed (or efficiency, ie potentially less power use) while on a half-clocked single-core CPU for battery saving, then safely switch to full-bore dual-core mode when on mains.
This patch seems like a really good way to handle that need.
User base
Posted Dec 22, 2005 11:59 UTC (Thu)
by rrw (guest, #9757)
[Link] (2 responses)
A dual core system is essentially SMP... and laptops are very power / performance conscious. It would not hurt in the slightest to eke out every bit of speed (or efficiency, ie potentially less power use) while on a half-clocked single-core CPU for battery saving, then safely switch to full-bore dual-core mode when on mains.
OK, I have a problem with this. Why would I need to clock down a notebook on batteries and clock it up on mains?
When a Centrino notebook on mains works with full speed it soon gets so hot that the fan works non stop, and the keyboard itself gets annoyingly warm.
On the other hand, when notebook runs on battery and I want to do something cpu intensive it is painstakingly slow. But where's the power consumption advantage? I have to do this anyway, slower or faster, and if I do something with downclocked CPU, power consumption per unit of time in display, gpu, harddisk doesn't decrease, so I actually eat more juice.
Isn't it better to just use sth like powernowd, which dynamically clocks the processor depending on system load, whether you use mains or battery?
Robert
CPU usage optimisation
Posted Mar 30, 2006 5:20 UTC (Thu)
by xoddam (guest, #2322)
[Link] (1 responses)
Exactly. The CPU configuration and speed should be adjusted according to
demand for processing power, not supply of electricity.
There is never a case for consuming more power just because you *can*.
Otherwise we'd all leave all our computers and all our lights switched on
all of the time. Bring on global warming.
CPU usage optimisation
Posted Jul 22, 2021 1:41 UTC (Thu)
by muzg666 (guest, #139506)
[Link]
Single kernel image for UP & SMP
Posted Dec 22, 2005 15:41 UTC (Thu)
by zdzichu (subscriber, #17118)
[Link] (4 responses)
Could this patch cause removing CONFIG_SMP in future? With kernel image identical for SMP and UP machines and runtime patching?
Single kernel image for UP & SMP
Posted Dec 22, 2005 23:30 UTC (Thu)
by renox (guest, #23785)
[Link] (1 responses)
Well, as some others have remarked, for this to work you have to remember which (spin)locks are taken, even on a UP processor, thus losing some performance in UP.
I think that even a minimal loss would be a hard sell to kernel developers considering how little this UP<->SMP feature is going to be used.
Single kernel image for UP & SMP
Posted Dec 25, 2005 19:59 UTC (Sun)
by efexis (guest, #26355)
[Link]
There's no reason why you would have to do that (I won't repeat my last post). The speed differences would be down to the processor having to deal with the few noops, eating up slightly more L1/L2 cache, and keeping in memory (and the kernel image file itself) - for those who worry about that - code for both scenarios.
Single kernel image for UP & SMP
Posted Jun 9, 2006 23:42 UTC (Fri)
by niner (subscriber, #26151)
[Link] (1 responses)
I can't imagine that. Think of embedded applications. When you only have 2MB of flash and a couple of megabytes of RAM, you really start to care about the few kilobytes such a feature costs of both memory footprints.
Single kernel image for UP & SMP
Posted Jun 11, 2006 16:14 UTC (Sun)
by tyhik (guest, #14747)
[Link]
RAM and flash chips get larger and larger all the time. For smaller design companies it may already be cheaper to get a 4MB than a 2MB flash chip.
But of course the code size matters. Smaller code implies better cache usage.