2009/06/16 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090616.mp3
Correction: Due to an editing error, the June 16th edition of the LKML Podcast incorrectly stated that Pekka Enberg was the driving force behind a push for GFP_BOOT. In fact, Nick Piggin is the primary push behind that, while Pekka has stated several times that he is in fact comfortable with either approach.
For Tuesday, June 16th 2009, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: the continuing 2.6.31 merge window, bulk CPU hotplug, and interrupts during pagefault.
The Continuing 2.6.31 merge window
Kernel development times. Greg Kroah-Hartman and Luis R. Rodriguez had an exchange of emails concerning Luis’ new rc-series and merge window docs. Greg questioned Luis’ figures for previous kernel release dates and development times, and Luis ultimately accepted Greg’s version of events. In Greg’s figures, over the last 10 kernel cycles, the minimum development time between kernels was 68 days (2.6.20) and the maximum was 108 (2.6.24). This places the next kernel release sometime in the early days of September.
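As a rough check of that estimate, the sketch below applies Greg's minimum and maximum cycle lengths to the 2.6.30 release date. It assumes 2.6.30 was released on Tuesday, June 9th 2009 (the "last Tuesday" mentioned later in this summary); the function name is purely illustrative.

```python
from datetime import date, timedelta

# Assumption: 2.6.30 was released on Tuesday, June 9th 2009.
RELEASE_2_6_30 = date(2009, 6, 9)

# Greg's figures for the last 10 cycles (days between releases).
MIN_CYCLE_DAYS = 68   # 2.6.20
MAX_CYCLE_DAYS = 108  # 2.6.24


def projected_release_window(last_release, min_days, max_days):
    """Return (earliest, midpoint, latest) projected release dates."""
    earliest = last_release + timedelta(days=min_days)
    latest = last_release + timedelta(days=max_days)
    midpoint = last_release + timedelta(days=(min_days + max_days) // 2)
    return earliest, midpoint, latest


earliest, midpoint, latest = projected_release_window(
    RELEASE_2_6_30, MIN_CYCLE_DAYS, MAX_CYCLE_DAYS
)
print(earliest, midpoint, latest)
```

Under these assumptions the window runs from mid-August to late September, with the midpoint falling in the first week of September.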
Early SLAB allocation. Nick Piggin and Ben Herrenschmidt continued their debate (with others occasionally chipping in) concerning whether it was appropriate to introduce special “boot time” versions of the kmalloc and vmalloc function calls (or more specifically, adding special boot time GFP_ flags that should be passed if an allocation might take place early in boot). Ben Herrenschmidt pointed out that there are many points at which allocations might happen and we wouldn’t think to use special flags – for example, even during suspend/resume one might be trying to perform a memory allocation that blocks pending I/O to a disk that has already gone offline. Ben pointed out that, in such cases, it’s far more likely to work out for the best if infrastructure components automatically degrade such that (for example) kmalloc automatically uses GFP_NOIO once suspend has started.
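Ben's "automatic degradation" idea can be pictured with a small model in which the allocator masks whatever flags the caller passes according to the current system state. This is only a sketch: the Gfp and Allocator classes and their methods are hypothetical, and the flag bits are illustrative values, not the kernel's.

```python
from enum import IntFlag


class Gfp(IntFlag):
    """Illustrative flag bits; the values are not the kernel's."""
    WAIT = 0x1  # allocation may sleep
    IO = 0x2    # allocation may start disk I/O
    FS = 0x4    # allocation may call into the filesystem


GFP_KERNEL = Gfp.WAIT | Gfp.IO | Gfp.FS
GFP_NOIO = Gfp.WAIT


class Allocator:
    """Toy allocator that masks caller flags by system state."""

    def __init__(self):
        # Early in boot nothing may sleep or do I/O.
        self.allowed = Gfp(0)

    def boot_finished(self):
        self.allowed = GFP_KERNEL

    def suspend_started(self):
        # Disks may already be offline: drop IO and FS.
        self.allowed = GFP_NOIO

    def effective_flags(self, requested):
        """Flags the allocator would actually honour."""
        return requested & self.allowed
```

In this model a caller always asks for GFP_KERNEL and never needs to know whether the system is booting or suspending, which is the property Ben was arguing for.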
USB. Greg Kroah-Hartman posted a large number of updates via his git tree and requested Linus merge. Amongst the updates were USB 3.0 support (see Sarah Sharp’s blog posting for the details), various new drivers, Unicode bugfixes, power management, and core code cleanups. There were a few non-USB related patches that the tree depends upon but these had all received the blessings of those subsystems affected. Greg also posted a series of driver core patches – most of which were minor in nature – and these included API cleanups and documentation.
Btrfs. Chris Mason followed up again concerning changes to the physical on-disk format for Btrfs, noting that newer kernels (those post 2.6.30) will roll forward existing filesystems to a format not supported by older kernels. In order to help developers who might be using Btrfs, Chris posted some rescue disk images based upon the Arch Linux 2.6.30 distro to his kernel.org pages. These contain enough filesystem checking tools to repair damage, as well as git, gcc, make, and enough to compile a kernel. Separately the Fedora folks posted to fedora-devel announcing that rawhide will be picking up the format change in due course, and reminding everyone that breakage is entirely possible, and that Btrfs won’t be ready for prime time for a year yet. Also on the filesystem front came some minor updates for OCFS2 from Joel Becker (although he noted that these were almost entirely fixes), and David Howells posted some updates to the AFS filesystem support code.
Kdump crashkernel breakage. Chris Wright pointed out that a recent change to CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN will impact those who follow the documentation to use their (relocatable) kdump crashkernel loaded with a 64MB or 128MB window at 16MB. Doing so will now interfere with the stock kernel because it has moved from the old default physical start of 2MB. Chris suggested that the right fix is to update the documentation to reflect this change since 2.6.30, but sought input.
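The clash Chris describes is a plain address range overlap, which the sketch below checks. The 64MB window at 16MB and the old 2MB start come from the discussion above; the 8MB kernel image size and the new default start of 16MB are assumptions made purely for illustration.

```python
MB = 1024 * 1024


def overlaps(start_a, size_a, start_b, size_b):
    """True if the half-open ranges [start, start + size) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


# Crashkernel window as in the documentation: 64MB reserved at 16MB.
CRASH_START, CRASH_SIZE = 16 * MB, 64 * MB

# Assumptions for illustration only: an 8MB kernel image, and a new
# default physical start of 16MB (the old default of 2MB is from the text).
KERNEL_SIZE = 8 * MB
OLD_START, NEW_START = 2 * MB, 16 * MB

old_clash = overlaps(OLD_START, KERNEL_SIZE, CRASH_START, CRASH_SIZE)
new_clash = overlaps(NEW_START, KERNEL_SIZE, CRASH_START, CRASH_SIZE)
print(old_clash, new_clash)
```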
Adding formatting to WARN(). Linus Torvalds and Ingo Molnar debated Arjan van de Ven’s idea of adding “\n” formatting to the WARN() macro, for the ease of formatting in kernel log files (and less corruption to logs posted on kerneloops.org). Linus liked the idea as much as Ingo, but he felt that blanketly applying formatting to all users would adversely affect existing “naked” printk’s at this point, and he didn’t much like the idea of forcing those users to migrate to using KERN_CONT explicitly. So, in true Linus style, Linus wrote a cunning macro that tries to do the right thing, only adding a “\n” if a KERN_xyz level is included at the start of the string, and changing the implementation of KERN_CONT so that it can still be used for continuation.
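One possible reading of that behaviour is sketched below as a toy log buffer. It is not Linus' macro: the append_message helper is hypothetical, and the sketch assumes that log levels are written as "<0>" through "<7>" and that continuation is marked with "<c>".

```python
import re

# Assumption: levels are written as "<0>".."<7>" and continuation as "<c>".
LEVEL_RE = re.compile(r"<([0-7c])>")


def append_message(log, msg):
    """Append msg to log, starting a new line only for levelled messages."""
    match = LEVEL_RE.match(msg)
    if match is None:
        # "Naked" message: leave formatting untouched.
        return log + msg
    body = msg[match.end():]
    if match.group(1) == "c":
        # Explicit continuation: never force a newline.
        return log + body
    # Levelled message: make sure it starts on a fresh line.
    if log and not log.endswith("\n"):
        log += "\n"
    return log + body
```

The point of the design is that existing naked printk callers keep working unchanged, while anything that announces a level is guaranteed to begin on its own line.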
On a related note, Mike Frysinger posted an RFC patch series implementing a set of useful new functions for printk()ing during initcalls. Rather than simply using printk() directly, these wrappers – which include, for example, pr_info_init() and pr_cont_init() (for printk continuation) – cause the accompanying string to be stored in a separate ELF section of the kernel linked binary image(s), so that the strings can be discarded along with the initdata.
Also on the printk() front, Dave Young posted a generic version of the previous printk delay implementation for use during normal system operation. So, Linux now has an ability to insert delays between printk() messages on boot, on halt, and during normal operation with the use of a sysctl. This is specifically intended for certain kinds of embedded (and also similar) systems where it might not be easy to capture kernel output without a delay insertion.
Architecture updates include: Power management updates for s390, Blackfin, and SPARC. The latter gained dynamic per-cpu allocator support, and a new syscall. Jeremy Fitzhardinge posted some minor io_apic cleanups for x86 which he had noticed while pursuing his Xen work; these included further 32/64-bit merge fallout, loop restructuring, and comment fixing.
Non-merge specific concerns
Bulk CPU Hotplug support. Gautham R Shenoy posted an RFD patch series aimed at opening discussion surrounding the best way to move forward from the current CPU hotplug implementation. The current code allows one to online and offline a single “CPU” at a time, but this “CPU” might in fact be part of a multi-core processor or even larger package, where performing a whole series of CPU Hotplug events to take down the package is much slower than need be. Gautham posted some benchmarks (for PPC64 systems) and a fairly detailed proposal in which one could echo comma separated lists of CPUs to online or offline as a unit via the /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline sysfs entries.
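To make the proposed interface concrete, the sketch below parses and builds the kind of comma separated CPU list that would be echoed into those sysfs entries. The helper names are hypothetical, and the sketch only handles plain comma separated IDs, since that is all the proposal as summarized here describes.

```python
def parse_cpu_list(text):
    """Parse a comma-separated CPU list such as "4,5,6,7" into sorted ints."""
    cpus = set()
    for field in text.strip().split(","):
        field = field.strip()
        if not field.isdecimal():
            raise ValueError(f"invalid CPU id: {field!r}")
        cpus.add(int(field))
    return sorted(cpus)


def format_cpu_list(cpus):
    """Build the string that would be written to the proposed sysfs file."""
    return ",".join(str(cpu) for cpu in sorted(set(cpus)))
```

Taking down a four-core package would then be a single write of format_cpu_list([4, 5, 6, 7]) to the proposed offline entry, rather than four separate hotplug operations.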
Interrupts during page fault (to trap or not to trap?). As part of a thread entitled “perf_count: x86: Fix call-chain support to use NMI-safe methods”, Ingo Molnar, Mathieu Desnoyers, and others engaged in a lively discussion surrounding the overhead of disabling interrupts during page faults and re-enabling them afterward (a cli/sti cycle doesn’t come free). Currently, Linux uses x86 architecture “interrupt gates” rather than “trap gates” in order to ensure interrupts are disabled starting from the moment that a page fault condition is generated. This is in order to prevent the Intel architectural “CR2” control register, which holds the faulting address, from being “messed up” by other subsequent interrupts. But if this register state is instead saved by the kernel within the IRQ handler, then the overhead (in this case of a special purpose register – SPR – write) is moved from the page fault handler, which no longer has to disable/enable interrupts, into the interrupt handler, which will now have to write to CR2 under certain circumstances. Ingo performed various benchmarks and agreed with Mathieu that this was an overall win, since a typical x86 system is likely to see an order of magnitude more page faults than interrupts.
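The CR2 problem, and the save-and-restore alternative, can be illustrated with a toy model. Nothing here is kernel code: the Cpu class, the handler functions, and the addresses are all hypothetical, and the model only shows why an interrupt that itself faults must put CR2 back if the page fault path runs with interrupts enabled.

```python
class Cpu:
    """Toy CPU with a single CR2-like register holding the last fault address."""

    def __init__(self):
        self.cr2 = None

    def raise_page_fault(self, address):
        # Hardware latches the faulting address before the handler runs.
        self.cr2 = address


def irq_handler(cpu, nested_fault_address, save_cr2):
    """Interrupt handler that itself takes a page fault.

    With save_cr2 set, the handler preserves CR2 around its own work.
    """
    saved = cpu.cr2 if save_cr2 else None
    cpu.raise_page_fault(nested_fault_address)  # clobbers CR2
    if save_cr2 and cpu.cr2 != saved:
        cpu.cr2 = saved  # the extra register write, only when CR2 changed


def fault_with_interrupt(save_cr2):
    """Page fault at 0x1000, interrupted before the handler reads CR2."""
    cpu = Cpu()
    cpu.raise_page_fault(0x1000)
    irq_handler(cpu, nested_fault_address=0x2000, save_cr2=save_cr2)
    return cpu.cr2  # what the page fault handler would read
```

Without the save, the page fault handler in this model reads the interrupt's fault address instead of its own; with it, the cost is a register write in the (much rarer) interrupt path.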
In today’s announcements: lio-utils v3.0 configfs HOWTO for v2.6.30. Nicholas A. Bellinger announced a new HOWTO for Linux-iSCSI.org Target v3.0 users.
The latest kernel release is 2.6.30, which was released by Linus last Tuesday.
Stephen Rothwell posted a linux-next tree for June 16th. Since Monday, the kmemleak tree was removed (since it had served its purpose of testing the newer kmemleak patches). The tree continues to fail to build in an allyesconfig powerpc build time configuration, and a large number of other trees lost conflicts as the merge process continues. The total tree count is now down to 128 sub-trees, with the removal of kmemleak contributing to that.
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

