S2E1 – 2023/01/21

Prologue

This is the pilot episode for what will become season 2 of the Linux Kernel Podcast. Back in 2008-2009 I recorded a daily “kernel podcast” that summarized the happenings of the Linux Kernel Mailing List (LKML). Eventually, daily became a little too much, and the podcast went weekly, followed by…not. This time around, I’m not committing to any specific cadence – let’s call it “periodic” (every few weeks). In each episode, I will aim to broadly summarize the latest happenings in the “plumbing” of the Linux kernel, and occasionally related bits of userspace “plumbing” (glibc, systemd, etc.), as well as impactful toolchain changes that enable new features or rebaseline requirements. I welcome your feedback. Please let me know what you think about the format, as well as what you would like to see covered in future episodes. I’m going to play with some ideas over time. These may include “deep diving” into topics of interest to a broader audience. Keep in mind that this podcast is not intended to editorialize, but only to report on what is happening. Both this author and others have their own personal opinions, but this podcast aims to focus only on the facts, regardless of who is involved, or their motives.

On with the show.

For the week ending January 21st 2023, I’m Jon Masters and this is the Linux Kernel Podcast. 

Summary

The latest stable kernel is Linux 6.1.7, released by Greg K-H on January 18th 2023.

The latest mainline (development) kernel is 6.2-rc4, released on January 15th 2023.

Long Term Stable 6.1?

The “stable” kernel series is maintained by Greg K-H (Kroah-Hartman), who posts hundreds of patches with fixes to each Linus kernel. This is where the “.7” comes in on top of Linux 6.1. Such stable patches are maintained between kernel releases, so when 6.2 is released, it will become the next “stable” kernel. Once every year or so, Greg will choose a kernel to be the next “Long Term Stable” (LTS) kernel that will receive even more patches, potentially for many years at a time. Back in October, Kaiwan N Billimoria (author of a book titled “Linux Kernel Programming”), seeking a baseline for the next edition, asked if 6.1 would become the next LTS kernel. A great amount of discussion has followed, with Greg responding to a recent ping by saying, “You tell me please. How has your testing gone for 6.1 so far? Does it work properly for you? Are you and/or your company willing to test out the -rc releases and provide feedback if it works or not for your systems?” and so on. This motivated various others to pile on with comments about their level of testing, though I haven’t seen an official 6.1 LTS as of yet.

Linux 6.2 progress

Linus noted in his 6.2-rc4 announcement mail that this came “with pretty much everybody back from winter holidays, and so things should be back to normal. And you can see that in the size, this is pretty much bang in the middle of a regular rc size for this time in the merge window.” The “merge window” is the period of time during which disruptive changes are allowed to be merged (typically the first two weeks of a kernel cycle prior to the first “RC”) so Linus means to refer to a “cycle” and not “merge window” in his announcement.

Speaking of Linux 6.2, it counts among its new features additional support for Rust. Linux 6.1 had added initial Rust patches capable of supporting a “hello world” kernel module (but not much more). 6.2 adds support for accessing certain kernel data structures (such as “task_struct”, the per-task/process structure) and handles converting C-style structure “objects” with collections of (possibly null) pointers into the “memory safe” structures understood by Rust. As usual, Linux Weekly News (LWN) has a great article going into much more detail.

Ongoing Development

Richard Guy Briggs posted the 6th version of a patch series titled “fanotify: Allow user space to pass back additional audit info”, which “defines a new flag (FAN_INFO) and new extensions that define additional information which are appended after the response structure returned from user space on a permission event”. This allows audit logging to much more usefully capture why a policy allowed (or disallowed) certain access. The idea is to “enable the creation of tools that can suggest changes to the policy similar to how audit2allow can help refine labeled security”.
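To make the “appended after the response structure” idea concrete, here is a minimal sketch of how a userspace tool might pack such a reply. The constants and field layout are my assumption, based on the definitions that later appeared in the mainline <linux/fanotify.h> header; version 6 of the series may have differed. The function name build_response and the example values are purely illustrative.

```python
import struct

# Assumed values, taken from later mainline <linux/fanotify.h>;
# the v6 series discussed here may have used different ones.
FAN_ALLOW = 0x01
FAN_DENY = 0x02
FAN_INFO = 0x20
FAN_RESPONSE_INFO_AUDIT_RULE = 1


def build_response(fd, decision, rule_number, subj_trust, obj_trust):
    """Pack a permission response followed by one audit-rule info record.

    build_response is a hypothetical helper, not part of any library.
    """
    # struct fanotify_response { __s32 fd; __u32 response; }
    response = struct.pack("=iI", fd, decision | FAN_INFO)
    # Info record: header { __u8 type; __u8 pad; __u16 len; }
    # followed by three __u32 fields (rule number, subject/object trust).
    info_len = struct.calcsize("=BBHIII")
    info = struct.pack("=BBHIII", FAN_RESPONSE_INFO_AUDIT_RULE, 0, info_len,
                       rule_number, subj_trust, obj_trust)
    return response + info


if __name__ == "__main__":
    payload = build_response(fd=7, decision=FAN_DENY, rule_number=42,
                             subj_trust=1, obj_trust=0)
    print(len(payload), payload.hex())
```

In a real tool the resulting bytes would be written to the fanotify file descriptor in reply to a permission event. The sketch only builds the buffer, so it runs without any special privileges.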

Maximilian Luz posted a patch series titled “firmware: Add support for Qualcomm UEFI Secure Application” that allows regular UEFI applications to access EFI variables via proxy calls to the “UEFI Secure Application” (uefisecapp) running in Qualcomm’s “secure world” implementation of Arm TrustZone. He has tested this on a variety of tablets, including a Surface Pro X. The application interface was reverse engineered from the Windows QcTrEE8180.sys driver.

Kees Cook requested a stable kernel backport of support for “oops_limit”, a new kernel feature that seeks to limit the number of “oopses” allowed before a kernel will “panic”. An “oops” is what happens when the kernel attempts to dereference a null pointer. Normal application software will crash (with a “segmentation fault”) when this happens. Inside the kernel, the access is caught (provided it happened while in process context), and the associated (but perhaps unrelated) userspace task (process) is killed in the process of generating an “oops” with a backtrace. The kernel may at that moment leak critical resources associated with the process, such as file handles, memory areas, or locks. These aren’t cleaned up. Consequently, it is possible that repeated oopses can be generated by an attacker and used for privilege escalation. The “oops_limit” patches mitigate this by limiting the number of such oopses allowed before the kernel will give up and “panic” (properly crash, and reboot, depending on config).
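The counting logic is simple enough to model in a few lines. What follows is a toy userspace simulation, not kernel code: the class and function names are made up for illustration, and treating a limit of 0 as “no limit” is my assumption about how the setting behaves.

```python
class SimulatedPanic(Exception):
    """Stands in for the kernel giving up and panicking."""


class OopsCounter:
    """Toy model of a global oops counter checked against a limit."""

    def __init__(self, limit):
        self.limit = limit  # assumption: 0 means "never panic on count"
        self.count = 0

    def record_oops(self):
        # Every oops bumps the counter; reaching the limit is fatal.
        self.count += 1
        if self.limit and self.count >= self.limit:
            raise SimulatedPanic(f"oopsed too often (limit is {self.limit})")


def oopses_before_panic(limit, attempts):
    """Return the oops count at which the model panics, or None if it never does."""
    counter = OopsCounter(limit)
    for _ in range(attempts):
        try:
            counter.record_oops()
        except SimulatedPanic:
            return counter.count
    return None


if __name__ == "__main__":
    print(oopses_before_panic(limit=3, attempts=10))
    print(oopses_before_panic(limit=0, attempts=10))
```

The point of the model is that an attacker who needs many oopses to, say, wrap a reference count gets cut off once the counter reaches the configured limit.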

Vegard Nossum posted version 3 of a patch series titled “kmod: harden user namespaces with new kernel.ns_modules_allowed sysctl”, which seeks to “reduce the attack surface and block exploits by ensuring that user namespaces cannot trigger module (auto-)loading”.

Arseniy Lesin reposted an RFC (Request For Comments) of a “SIGOOM Proposal” that would seek to enable the kernel to send a signal whenever a task (process) was in danger of being killed by the “OOM” (Out Of Memory) killer due to consuming too much anonymous (regular) memory. Willy Tarreau and Ted Ts’o noted that we were actually essentially out of space for new signals, and so rather than declaring a new “SIGOOM”, it would be better to allow a process to select which of the existing signals should be used for this purpose when it registered to receive such notifications. Arseniy said they would follow up with patches that followed this approach.
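From the application side, the “pick an existing signal” approach might look roughly like the sketch below. No such kernel interface exists at the time of writing, so register_oom_warning is a hypothetical stand-in that only installs a handler, and simulate_kernel_notification fakes the kernel’s side by having the process signal itself.

```python
import os
import signal

received = []  # signal numbers seen by the handler, in arrival order


def on_memory_pressure(signum, frame):
    # A real application might drop caches or checkpoint its state here.
    received.append(signum)


def register_oom_warning(signum):
    """Hypothetical registration step: the process chooses which existing
    signal it wants to receive as its early out-of-memory warning."""
    signal.signal(signum, on_memory_pressure)
    return signum


def simulate_kernel_notification(signum):
    """Stand-in for the kernel delivering the chosen signal to this process."""
    os.kill(os.getpid(), signum)


if __name__ == "__main__":
    chosen = register_oom_warning(signal.SIGUSR1)
    simulate_kernel_notification(chosen)
    print("warnings received:", [signal.Signals(s).name for s in received])
```

Letting each process choose the signal avoids consuming a new signal number, at the cost of a registration step that a dedicated “SIGOOM” would not have needed.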

Architectures

On the architecture front, Mark Brown posted the 4th version of a patch series enabling support for Arm’s SME (Scalable Matrix Extension) version 2 and 2.1. Huang Ying posted patches enabling “migrate_pages()” (which moves memory between NUMA nodes – memory chips specific to e.g. a certain socket in a server) to support batching of the new(er) memory “folios”, rather than doing them one at a time. Batching allows the associated TLB invalidation (tearing down the MMU’s understanding of active virtual to physical address mappings) to be batched as well. This is important on Intel systems, where invalidation relies on IPIs (Inter-Processor Interrupts): in the associated testing, IPIs were reduced by 99.1%, and the number of pages migrated per second on a 2P server increased by 291.7%.
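A back-of-the-envelope model shows why batching matters and how to read those percentages. The functions below are illustrative only. They assume one flush per folio in the unbatched case, one flush per batch otherwise, and one IPI per remote CPU per flush. The folio and CPU counts in the example are made up and are not the patch author’s test configuration.

```python
def ipis_unbatched(num_folios, remote_cpus):
    """One TLB flush per folio, each flush interrupting every remote CPU."""
    return num_folios * remote_cpus


def ipis_batched(num_folios, remote_cpus, batch_size):
    """One TLB flush per batch of folios instead of one per folio."""
    flushes = -(-num_folios // batch_size)  # ceiling division
    return flushes * remote_cpus


def increase_to_factor(percent):
    """An increase of X% means the new value is (1 + X/100) times the old."""
    return 1 + percent / 100


def reduction_to_remaining(percent):
    """A reduction of X% leaves (1 - X/100) of the original."""
    return 1 - percent / 100


if __name__ == "__main__":
    # Illustrative numbers only: 512 folios, 3 remote CPUs.
    print(ipis_unbatched(512, 3), ipis_batched(512, 3, batch_size=512))
    print(f"throughput factor: {increase_to_factor(291.7):.3f}x")
    print(f"IPIs remaining: {reduction_to_remaining(99.1):.1%}")
```

Read this way, a 291.7% increase is a bit under four times the original throughput, and a 99.1% reduction leaves under one percent of the original IPIs.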

Xin Li posted version 6 of a patch series titled “x86: Enable LKGS instruction”. The “LKGS instruction is introduced with Intel FRED (flexible return and event delivery) specification. As LKGS is independent of FRED, we enable it as a standalone feature”. LKGS (which is an abbreviation of “load into IA32_KERNEL_GS_BASE”) “behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache.” This means that an Operating System can perform the necessary work to context switch a user-level thread by updating IA32_KERNEL_GS_BASE and avoiding an explicit set of balanced calls to SWAPGS. This is part of the broader “FRED” architecture defined by Intel in the Flexible Return and Event Delivery (FRED) Specification.
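The equivalence between the two approaches can be illustrated with a toy register model. This is a deliberate simplification: the real instructions take a segment selector and the base comes from the descriptor it names, whereas the sketch passes the base address directly. The Cpu class and both helper functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    gs_base: int         # active GS base (the kernel's own value while in the kernel)
    kernel_gs_base: int  # IA32_KERNEL_GS_BASE (holds the user value while in the kernel)

    def swapgs(self):
        # SWAPGS exchanges the active GS base with IA32_KERNEL_GS_BASE.
        self.gs_base, self.kernel_gs_base = self.kernel_gs_base, self.gs_base

    def mov_to_gs(self, base):
        # Simplified MOV to GS: the new base lands in the active GS base.
        self.gs_base = base

    def lkgs(self, base):
        # Simplified LKGS: the new base lands in IA32_KERNEL_GS_BASE instead.
        self.kernel_gs_base = base


def load_user_gs_with_swapgs(cpu, user_base):
    """Without LKGS: a balanced SWAPGS pair around the load."""
    cpu.swapgs()
    cpu.mov_to_gs(user_base)
    cpu.swapgs()


def load_user_gs_with_lkgs(cpu, user_base):
    """With LKGS: a single instruction, the active GS base is never disturbed."""
    cpu.lkgs(user_base)


if __name__ == "__main__":
    old = Cpu(gs_base=0xFFFF0000, kernel_gs_base=0x1000)
    new = Cpu(gs_base=0xFFFF0000, kernel_gs_base=0x1000)
    load_user_gs_with_swapgs(old, 0x2000)
    load_user_gs_with_lkgs(new, 0x2000)
    print(old == new, hex(new.kernel_gs_base))
```

Both paths end in the same state. The difference is that the LKGS path never leaves the kernel running with a user-controlled GS base, even briefly.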

David E. Box posted version 2 of a patch series titled “Extend Intel On Demand (SDSi) support”, noting that “Intel Software Defined Silicon (SDSi) is now known as Intel On Demand”. These patches enable support for the Intel feature intended to allow users to load signed payloads into their CPUs to turn on certain features after purchasing a system. This might include (for example) certain accelerators present in future chips that could be enabled as needed, similar to how certain automobiles now include subscription-locked heated seats and other features.

Meanwhile, Anup Patel posted patches titled “RISC-V KVM virtualize AIA CSRs” that enable support for the new AIA (Advanced Interrupt Architecture), which replaces the legacy “PLIC”, and Sia Jee Heng posted patches that enable “RISC-V Hibernation Support”.

Final words

A number of conferences are returning in 2023, including the Linux Storage, Filesystem, Memory Management, and BPF (LSF/MM/BPF) Summit, which will be held from May 8 to May 10 at the Vancouver Convention Center. Josef Bacik noted that the CFP was now open.

Don’t forget to give me your feedback on this pilot episode! jcm@jonmasters.org.
