Considering kernel pass-through interfaces

By Jonathan Corbet
September 20, 2024

Maintainers Summit
The kernel normally sits firmly between user space and the system's peripheral devices, and provides a standard interface to those devices. At times, though, a more direct interface to a device is desired — but such interfaces can be controversial. At the 2024 Maintainers Summit, the assembled developers considered a specific case — the proposed fwctl subsystem — as well as the role of such drivers in general.

Fwctl comes out of a longstanding disagreement over a driver initially called mlx5ctl; its purpose is to allow a user-space utility to adjust any of hundreds of tunable parameters within mlx5 devices (which implement InfiniBand, RDMA, and other protocols). Proponents say that this interface is necessary to configure the device properly; opponents say that it is a way to bypass the kernel and that device-independent interfaces should be designed for this task instead. The discussion went on over the course of a year or so, with no resolution in sight.

The session was led by Dan Williams and fwctl developer Jason Gunthorpe. Williams started by saying that he had two objectives in mind. One was to try to heal the community; he has been watching two developers he respects in conflict and the conversation deteriorating over time. Additionally, there were junior developers finding themselves caught in the middle who might just end up leaving the community. But Williams also had a more personal goal: he, too, is working on an API for device provisioning and feels the need to allow direct access to vendor commands to get the job done.

The argument against fwctl, he said, is that this kind of configuration should be done with the network subsystem's devlink interface instead; fwctl is seen as a shortcut to avoid creating a proper interface. Allowing fwctl, it is feared, would reduce the motivation to improve devlink. Proponents of fwctl, instead, say that devlink does not provide the needed functionality, and that insisting on it is forcing these interfaces to be maintained outside of the mainline instead.

Inspired by lockdown

Gunthorpe explained that the kernel lockdown feature disables access to /dev/mem, which is the interface by which these devices had traditionally been controlled. This interface provides direct access to the system's I/O memory, which presents no end of potential security problems. Fwctl is an attempt to keep the configuration software working in a locked-down world. He is well familiar with the devlink interface, but there is no standard behind the configuration of these complex devices; every manufacturer creates its own set of knobs that is hard to bring into a common interface.

Even so, the original plan for mlx5ctl had been to use devlink, but that quickly led to having to argue about 300 different parameters, each of which had to be standardized and approved independently. Additionally, devlink does not provide access to debugging information, while mlx5ctl and fwctl provide access to that sort of data. There is no alternative proposal out there, he said, that provides the needed access.

Devlink, he said, has its origins in the "WiFi debacle", where every driver provided a different configuration interface. It cleaned up that situation and led to the conclusion that providing direct access to device-configuration interfaces was a mistake. That conclusion made sense in the networking context, but there are no standards for these complex, server-oriented devices, so devlink is not a good fit.

Williams said that he would like the group to agree that non-generic device commands exist, and that users need access to those commands. For the CXL devices he works with, the policy has been that no such vendor commands would be enabled, that any needed functionality must be incorporated into the standards instead. "That has happened zero times", he said; making it harder to solve problems does not force vendors to come and talk to us. A different policy is called for here, he said. Gunthorpe added that generic interfaces are the right solution for WiFi configuration, but they are a poor match to key-value stores buried in device firmware.

There are, Williams continued, classes of device commands that the kernel just does not care about. These include device-specific configuration and access to debugging information. He asked the group to agree that, in such cases, the kernel should not stand in the way; the alternative is that vendors just push distributors to ship out-of-tree code instead.

Security boundaries

Dave Airlie worried that such interfaces could facilitate the compromise of the whole system; almost all firmware has that ability (by writing arbitrary system memory, for example) somewhere. Since locked-down systems are involved, security is clearly a concern; given the low quality of most firmware implementations, he is worried about providing this kind of access. Will vendors provide assurances that their systems cannot be used to compromise the kernel?

Gunthorpe replied that devlink can be used now to flash new firmware, so all bets are off when using that interface too. Linus Torvalds, more bluntly, said that "lockdown is a joke", a public-relations feature that lets kernel developers make a show of being careful. With regard to fwctl, he said that he does not see what the argument is about; forcing the use of devlink has no benefits for either the kernel or the hardware vendors. He added that, if somebody is running as root, they can do what they want with the hardware; "we are not doing DRM".

Ted Ts'o said "if you don't trust the hardware, you might as well just go home". Arnd Bergmann asked why users wanting to run these configuration utilities don't just turn off lockdown. Gunthorpe answered that customers (especially governments) often require it to be enabled. Fwctl was created partially to support this kind of deployment; it includes a long list of rules on what commands are allowed to do, so that they do not compromise system security.

Damien Le Moal agreed that existing commands provide plenty of ways for somebody to damage their system; if they do so, it is their fault. But, he said, a device driver's job is to configure hardware properly; he wondered why this additional interface was needed. Gunthorpe answered that modern devices are hugely complex and must be configured to work within the environment in which they are used. Vendors can ship special configurations suited to the needs of the largest customers — the Googles and Metas of the world. Smaller customers, though, must customize their devices in the field; that is where this kind of interface is needed.

Kees Cook agreed that, in the end, there is no alternative to trusting the hardware. Existing ways of configuring hardware using /dev/mem are opaque; the fwctl interface is better, he said. Gunthorpe agreed that fwctl is far better for the reverse-engineering of hardware; it also makes it easier for the kernel to block or modify specific commands if they turn out to be problematic. Williams added that vendors can normally be trusted to abide by the system's security boundaries.

Ts'o pointed out that the SG_IO operation (which allows arbitrary commands to be sent to SCSI devices) has been supported by the kernel since the early days; it provides the same capabilities as fwctl. Perhaps, he said, SG_IO is only "grandfathered because CD burners", but it would be good to know what the policy is. Having distributors applying out-of-tree patches seems like a worse outcome than just including fwctl, he said.
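For readers who have not used it, SG_IO illustrates what a pass-through interface looks like from user space: the caller fills in a raw command and the kernel hands it to the device. The following is a minimal Python sketch, not taken from the session, that prepares a standard SCSI INQUIRY request. The structure layout and constants follow <scsi/sg.h>; the helper names (build_inquiry(), inquiry()), the buffer sizes, and the five-second timeout are choices made for this illustration.

```python
import ctypes
import fcntl

SG_IO = 0x2285            # ioctl request number from <scsi/sg.h>
SG_DXFER_FROM_DEV = -3    # data flows from the device to the host


class SgIoHdr(ctypes.Structure):
    """Mirror of struct sg_io_hdr from <scsi/sg.h>."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def build_inquiry(alloc_len=96):
    """Prepare (but do not send) a six-byte SCSI INQUIRY command."""
    cdb = (ctypes.c_ubyte * 6)(0x12, 0, 0, 0, alloc_len, 0)
    data = ctypes.create_string_buffer(alloc_len)   # receives the reply
    sense = ctypes.create_string_buffer(32)         # receives error details
    hdr = SgIoHdr()
    hdr.interface_id = ord("S")                     # required marker value
    hdr.dxfer_direction = SG_DXFER_FROM_DEV
    hdr.cmd_len = len(cdb)
    hdr.mx_sb_len = len(sense)
    hdr.dxfer_len = alloc_len
    hdr.dxferp = ctypes.addressof(data)
    hdr.cmdp = ctypes.addressof(cdb)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = 5000                              # milliseconds
    # The buffers are returned so that they stay alive while hdr points at them.
    return hdr, cdb, data, sense


def inquiry(path):
    """Send the INQUIRY to a SCSI device node and return the raw reply."""
    hdr, cdb, data, sense = build_inquiry()
    with open(path, "rb") as dev:
        fcntl.ioctl(dev, SG_IO, hdr)                # the kernel passes the CDB through
    return data.raw[: hdr.dxfer_len - hdr.resid]
```

Calling inquiry() needs a real SCSI device node and suitable permissions; the device path is left to the reader. The point is that the kernel forwards the command bytes without interpreting most of them, which is the property being compared to fwctl.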

Gunthorpe said that he is trying to create a general policy for this kind of interface. Torvalds said that users have to be able to run device-specific commands; SG_IO is a good example of the sort of capability that the kernel has always supported. It is fine for the kernel to apply a root-only policy to commands it does not recognize, but he has no interest in saying that the owner of a machine cannot manage their hardware.

The rule, he said, is that developers should try to prevent each device from implementing its own pass-through command; instead, there should be some sort of baseline (such as fwctl) for this access. Permissible commands should only change the device in question, but touch no other part of the system. There should be "no random DMA" operations. In the end, though, the kernel has to trust the hardware.
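As a rough illustration of that rule, the policy could be modeled as a filter sitting in front of a single shared pass-through entry point. This is a sketch only: the Effect categories, the Command type, and the allow() function are hypothetical names invented here and are not fwctl's actual interface. The sketch also assumes that the caller has already been allowed to open the device; who may do that is a separate question.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Effect(Enum):
    """Hypothetical classification of what a device command touches."""
    DEVICE_CONFIG = auto()   # changes settings on the device itself
    DEBUG_READ = auto()      # reads debugging state from the device
    HOST_DMA = auto()        # would read or write arbitrary host memory
    UNKNOWN = auto()         # the kernel does not recognize the command


@dataclass(frozen=True)
class Command:
    opcode: int
    effect: Effect


def allow(cmd: Command, is_root: bool) -> bool:
    """Decide whether a pass-through command may reach the device."""
    if cmd.effect is Effect.HOST_DMA:
        return False          # "no random DMA", regardless of privilege
    if cmd.effect is Effect.UNKNOWN:
        return is_root        # unrecognized commands are root-only
    return True               # device-local configuration and debugging
```

In this model, a command that only touches the device passes, an unrecognized command passes only for root, and anything that reaches outside the device is refused for everybody.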

Will Deacon asked how fwctl would interact with common interfaces; will it be possible to spot commands that are common between devices and standardize them? Gunthorpe answered that common interfaces provide a better interface for users. Some of that needs to be provided by user-space utilities, though; it is easier to create shared interfaces there.

Cook said that having arbitrary applications accessing device memory via /dev/mem is bad; there is no way for the kernel to impose a policy in that setting. Torvalds answered: "we call that X11". Cook said that he wants to see documentation of the commands supported by a device and what they do. Airlie answered that there will only be a single command, analogous to SG_IO, that passes operations through to the device. Torvalds said that this interface is still better than using /dev/mem.

Cross-subsystem disagreement

Williams raised another aspect of this debate — that of cross-subsystem nacks. The fwctl patches have been blocked by the networking subsystem maintainer, even though that work is not a part of his domain. How far, Williams asked, can a maintainer's veto extend beyond their own subsystem? Airlie answered that this was Torvalds's problem. Gunthorpe said that there is a precedent for how to respond to this sort of block: the RDMA subsystem was started as a response to the blocking of support for TCP offload engines in the networking subsystem. That block was the right decision for networking, but there is still a place for TCP offloading in the kernel. RDMA is widely used and supported by open-source software now; it is, he said, a great success story. Fwctl could be a similar story.

There seemed to be a clear consensus in the room that work on fwctl should proceed and find its way into the mainline; Gunthorpe was asked about how he planned to do that. He answered that he likes to see three independent users before merging a new subsystem; that helps to show that the interfaces are correct. A proposed third fwctl user had shown up in his inbox that morning, adding to the existing RDMA and CXL users. For now, he plans to focus on getting the drivers into good shape, and expects to send out a pull request in roughly six months. An implementation will ship to Mellanox users sooner, though.

Six months may seem like a long time; Gunthorpe said that he has been taking it slowly and carefully because of the pushback he has been receiving. He respects the people who have opposed this work and wants to show that respect by doing the job properly. Even so, he expects the nack from the networking subsystem to persist; it is "the right position" for that subsystem to have, he said, but he would like to have some peace and get this work done. The session closed with him saying that he would have further discussions with the developers involved.

Index entries for this article
Kernel: Development model/Driver merging
Kernel: Device drivers
Conference: Kernel Maintainers Summit/2024


One configuration per boot

Posted Sep 21, 2024 3:02 UTC (Sat) by proski (subscriber, #104) [Link] (2 responses)

That's an interesting story, thank you!

I wonder if anyone considered limiting the configuration commands to a single time window per system boot. That is, the system boots, the configuration utility opens the link to the device, configures the device and closes the link. Nobody can reopen the link as long as the system stays up. An attempt to exploit the interface would require the system to be rebooted, which would be visible to admins and could even thwart certain attacks. Networking should not be enabled until the hardware is fully configured.

That would be similar to loading the initrd and the device tree once per boot.
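The one-window-per-boot idea can be modeled as a small state machine. The sketch below is purely illustrative: ConfigWindow and its methods are hypothetical names, and send() merely echoes the command back in place of talking to a device.

```python
class ConfigWindow:
    """Hypothetical once-per-boot gate for a device-configuration link."""

    def __init__(self) -> None:
        self._state = "unused"   # unused -> open -> sealed

    def open(self) -> None:
        if self._state != "unused":
            raise PermissionError("configuration window already used this boot")
        self._state = "open"

    def send(self, command: bytes) -> bytes:
        if self._state != "open":
            raise PermissionError("configuration window is not open")
        return command           # stand-in for handing the command to the device

    def close(self) -> None:
        if self._state == "open":
            self._state = "sealed"   # only a reboot (a new object here) resets it
```

Once close() has been called, further open() or send() calls fail until a new ConfigWindow is created, which stands in for a reboot.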

One configuration per boot

Posted Sep 21, 2024 20:06 UTC (Sat) by champtar (subscriber, #128673) [Link]

If the interface is also used for debugging, any locking mechanism would need to be optional.

One configuration per boot

Posted Oct 12, 2024 11:21 UTC (Sat) by sammythesnake (guest, #17693) [Link]

This sounds to me like it fits firmly into the "policy belongs in user space" bucket - expose a knob to turn the interface off until next boot* and let whoever administers the system decide whether/when to poke it...

* There may be use cases for a reversible one, too, protected by suitable access control

Why not use the existing firmware loader?

Posted Sep 21, 2024 17:06 UTC (Sat) by mfuzzey (subscriber, #57966) [Link] (1 responses)

>Vendors can ship special configurations suited to the needs of the largest customers — the Googles and Metas of the world. Smaller customers, though, must customize their devices in the field; that is where this kind of interface is needed.

Wouldn't the current kernel firmware loading mechanisms be enough for this? AFAIK nothing says that the blobs that a kernel driver can obtain from request_firmware() have to be actual firmware code; they could be site-specific configuration data.

This probably wouldn't work for more dynamic situations but maybe it would be a way to provide configurability without allowing "too much" bypassing of the kernel using vendor specific opaque operations for core functionality.

Why not use the existing firmware loader?

Posted Sep 23, 2024 15:37 UTC (Mon) by jgg (subscriber, #55211) [Link]

I've been against abusing the existing devlink flash in this way. It is not intended to configure the device, and I think trying to tunnel dynamic configuration in that interface would destroy community trust. Though several people have suggested the same idea quietly..

Device complexity

Posted Sep 22, 2024 7:01 UTC (Sun) by marcH (subscriber, #57642) [Link] (2 responses)

> But, he said, a device driver's job is to configure hardware properly; he wondered why this additional interface was needed. Gunthorpe answered that modern devices are hugely complex and must be configured to work within the environment in which they are used.

The term "device" has become somewhat misleading. A "device" sounds smaller and simpler than the CPU that "drives" it - because devices all used to be. But today a GPU has more TFLOPS drawing more power than the CPU. It may even run more code.

The Central "Processing" Unit is increasingly looking like a central.... "network hub"? You wouldn't want a hub requiring very fine-grained knowledge of the traffic going through it.

Device complexity

Posted Sep 22, 2024 8:39 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

> The Central "Processing" Unit is increasingly looking like a central.... "network hub"? You wouldn't want a hub requiring very fine-grained knowledge of the traffic going through it.

This sounds like we're moving to the mainframe model, where the CPU is one of the most under-powered parts of the system. Making it gate-keep everything else sounds like a great recipe for making linux obsolete.

Cheers,
Wol

CPU as gatekeeper

Posted Sep 23, 2024 0:04 UTC (Mon) by ringerc (subscriber, #3071) [Link]

Have been for decades.

SCSI, MMU, IOMMU, DMA, NVMe, packetized PCIe, it's all been a trend toward the CPU handling access control, configuration and coordination.

With a few unfortunate missteps like some FireWire implementations' "plug in a device and do whatever you want with host memory" it has worked well.

Food for thought

Posted Sep 26, 2024 3:59 UTC (Thu) by milesrout (subscriber, #126894) [Link]

Thanks Jonathan, food for thought here definitely.

Conversely, it's the "hardware" that does not trust Linux

Posted Oct 27, 2024 7:26 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Kees Cook agreed that, in the end, there is no alternative to trusting the hardware.

Not only that, but conversely: the "hardware" / firmware does not trust Linux. This great talk describes how the rest of the system increasingly isolates Linux in its Application Process "sandbox".

https://www.usenix.org/conference/osdi21/presentation/fri...

"Powerful" can mean two totally different things: GFLOPS or security permissions. Neither implies the other :-)


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds