2009/06/21 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090621.mp3
This podcast is brought to you in part by way too many California strawberries.
For the weekend of June 21st 2009, I’m Jon Masters with a summary of the weekend’s LKML traffic.
In today’s issue: The continuing 2.6.31 merge window, the “Ceph” distributed filesystem, IO scheduler based IO controllers, poisonous hardware, transcendent memory, and ksplice tainting.
The continuing 2.6.31 merge window
Core kernel. Ingo posted a few updates to the core kernel. Amongst these was a bugfix developed in collaboration with Thomas that included a new function named get_user_writeable for use by the futex code (which can’t rely upon the existing access_ok for private futexes). A dialog ensued between Linus, Ingo Molnar and Thomas Gleixner concerning use of get_user_pages_fast() in this code, which Linus pointed out could be replaced with a single instruction on Intel-esque systems at any rate.
DRM. Dave Airlie posted a final drm tree for 2.6.31. Amongst the major changes was a switch in the AGP code to use arrays of pages instead of arrays of unsigned long. Quoting Dave, “since pageattr grew patch array interfaces this is possible and should solve GEM on PAE issues”.
KVM Support for 1GB pages. Joerg Roedel posted version 3 of a patch series that gives KVM the ability to support 1GB pages. This relies upon nested paging support, a feature of modern CPUs which behaves very similarly to an additional level in the global page table hierarchy. The patch series relies upon exporting vma_kernel_pagesize to modules.
Per-cpu. Ingo Molnar responded to yesterday’s “percpu for 2.6.31” pull request posted by Tejun Heo (that had gotten slightly warped in the posting and caused Linus to be slightly unhappy), pleading with Linus and company to reconsider taking the per-cpu changes due to the fact that the patches had been posted in a timely fashion, and the sheer amount of work Tejun will be committed to if he must maintain them for yet another cycle (170 files worth of changes).
Performance counters. Paul Mackerras noted that architectures like PowerPC64 define __u64 to be unsigned long rather than unsigned long long, which causes compiler warnings every time one prints such a value with the print format string of %Lx. To correct this, Paul posted a patch to these userspace tools providing their own implementation of the definition of types such as u64.
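To illustrate the kind of fix described here, the following is a minimal userspace sketch, not Paul’s actual patch. It assumes a tool-local u64 typedef that is always unsigned long long, so that a single format string is correct on every architecture. The helper names format_u64_hex and format_fixed64_hex are hypothetical, and the second one shows the standard C99 alternative of using the PRIx64 macro with uint64_t.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical tool-local type: always unsigned long long, regardless of
 * how the architecture's kernel headers define __u64.
 */
typedef unsigned long long u64;

/* Format v as hex into buf; returns what snprintf() returns. */
int format_u64_hex(char *buf, size_t len, u64 v)
{
    /* %llx matches unsigned long long on both 32-bit and 64-bit targets. */
    return snprintf(buf, len, "%llx", v);
}

/* Alternative: keep uint64_t and let PRIx64 pick the right length modifier. */
int format_fixed64_hex(char *buf, size_t len, uint64_t v)
{
    return snprintf(buf, len, "%" PRIx64, v);
}
```

Either approach avoids the warning because the format string and the argument type always agree. The typedef approach has the advantage that existing format strings in a tool do not need to be rewritten.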
RCU. Paul E. McKenney posted version 8 of his “big hammer” expedited RCU grace periods patchset. This patchset uses the existing per-CPU migration kthreads, which are awakened in a loop and waited for in a second loop, in order to expedite the passage of an RCU grace period. Apparently, this patchset can reduce RCU grace periods to 40us on an 8-CPU POWER machine.
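The “wake in one loop, wait in a second loop” pattern can be sketched in userspace as follows. This is only an analogy under stated assumptions, not the kernel code: POSIX threads and semaphores stand in for the per-CPU migration kthreads, and the names NWORKERS, struct worker, worker_fn and expedite_all are all hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

#define NWORKERS 4 /* stand-in for the number of CPUs */

/* Hypothetical per-worker state; the kernel uses per-CPU kthreads instead. */
struct worker {
    pthread_t thread;
    sem_t wake; /* posted by the requester to wake this worker */
    sem_t done; /* posted by the worker once it has run */
    int ran;    /* set by the worker when it runs */
};

static struct worker workers[NWORKERS];

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    sem_wait(&w->wake); /* sleep until asked to run */
    w->ran = 1;         /* running at all is the event being waited for */
    sem_post(&w->done); /* tell the requester this worker has run */
    return NULL;
}

/* Returns the number of workers that ran, or -1 if setup failed. */
int expedite_all(void)
{
    int i, count = 0;

    for (i = 0; i < NWORKERS; i++) {
        workers[i].ran = 0;
        if (sem_init(&workers[i].wake, 0, 0) ||
            sem_init(&workers[i].done, 0, 0))
            return -1;
        if (pthread_create(&workers[i].thread, NULL, worker_fn, &workers[i]))
            return -1;
    }

    /* First loop: wake every worker without waiting for any of them. */
    for (i = 0; i < NWORKERS; i++)
        sem_post(&workers[i].wake);

    /* Second loop: only now wait, so the wakeups proceed in parallel. */
    for (i = 0; i < NWORKERS; i++)
        sem_wait(&workers[i].done);

    for (i = 0; i < NWORKERS; i++) {
        pthread_join(workers[i].thread, NULL);
        count += workers[i].ran;
        sem_destroy(&workers[i].wake);
        sem_destroy(&workers[i].done);
    }
    return count;
}
```

Splitting the work into two loops means the total wait is roughly that of the slowest worker, rather than the sum of all of them, which is the point of doing it this way.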
Syscall tracepoints. While it is yet to be decided exactly when Jason Baron’s proposed syscall tracepoints will make it in, Li Zefan did use the opportunity to discover a bug in seqfile handling in the kernel trace infrastructure for which he posted a series of patches.
David Miller noted that stack backtrace support had broken sometime in the past day or so, which Stephen Rothwell was already aware of. Stephen forwarded a patch from Mike Frysinger that fixed it, which was also good news for Ingo.
Miscellaneous updates include: MMC updates (Pierre Ossman), Cryptography (Herbert Xu), ALSA (Takashi Iwai), NFS (Trond Myklebust, including support for version 4.1 of the NFS standard), Watchdog (part 2, apologies for not having space to mention part 1 yesterday), the usual level of tree posting insanity from Ingo (IRQs, scheduler – including another attempt to hide runqueues from those that would poke at them, timers, tracing, and x86), IDE (Bartlomiej Zolnierkiewicz), input updates (Dmitry Torokhov) and some kbuild fixes from Sam Ravnborg.
Architecture updates include: PowerPC (Benjamin Herrenschmidt), Blackfin (Mike Frysinger), and Microblaze (fixing a build problem caused by the previous round of Microblaze architectural updates).
Non-merge specific concerns
Ceph distributed filesystem client. Sage Weil posted a 21 part patch series implementing a “Ceph” distributed filesystem client, in the staging tree. “Ceph” is apparently a distributed filesystem designed for reliability, scalability, and performance, which relies on btrfs underneath. It features the usual kinds of things – data replication, no single points of failure, and fast recovery from node failures, although the fact that it’s only just going into the “staging” tree obviously means you shouldn’t rely on this client for critical stuff at this point. Separately, Greg posted a large number of changes to Linus for the “staging” tree (and by large, we mean 658 files changed, 165585 insertions, and 240493 deletions). Quoting Greg, “We are removing more crap than we are adding, looks like progress to me!”.
IO Scheduler based IO Controller. Vivek Goyal posted version 5 of his IO scheduler IO controller patchset. This patchset aims to introduce an ability to assign and control IO bandwidth consumed by tasks through IO throttling. A number of additional changes have been made since version 4, but these are mostly fixes and it looks like the patchset is stabilizing now.
Poisonous Hardware. Fengguang Wu posted version 6 of his HWPOISON patchset. This version has many of the changes discussed previously in this podcast. Included amongst those are the switched default to “late” kill except for those processes that have specifically requested an “early” kill via a per-process tunable option, as proposed by Nick Piggin and Hugh Dickins. Other changes include killing off the “uevent” emission idea, tainting the kernel on poisoned page detection, and not “mess”ing with dirty/writeback pages for now.
Transcendent memory (“tmem”). Dan Magenheimer posted a 4 part patch series (first as an email attachment, then as a normal series), implementing what he described as “tmem” for Linux. Essentially, this is support for transient memory of a “dynamically variable size”, addressable only indirectly by the kernel, and which might disappear without warning. It may seem (on the face of it) to have little utility, but the application is in virtual machines (or in non-virtualized environments, including hotplug memory, SSDs, page cache compression, and even highmem on non-highmem kernels and using spare VRAM) being provided with memory for caching (and similar purposes) that might be taken away at any moment without any warning. Since it requires kernel assistance, its application is mostly for in-kernel caches. The patch series is fairly comprehensive, and there will be a talk on the design on the first day of the 2009 Linux Symposium in Montreal, Canada.
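The “might disappear without warning” contract is easier to see in code. What follows is a toy userspace model only, and does not reflect the interfaces in Dan’s patches: the names toy_tmem_put, toy_tmem_get and toy_tmem_reclaim, the slot count and the page size are all assumptions made for illustration. The key property it tries to show is that a put may be refused and a get may miss, so a caller can only use such memory as a cache for data it can obtain again elsewhere.

```c
#include <string.h>

#define TMEM_SLOTS 4  /* toy capacity */
#define TMEM_PAGE 64  /* toy page size in bytes */

/* Hypothetical backing store; the caller never sees these addresses. */
struct tmem_slot {
    int used;
    unsigned long key;
    char data[TMEM_PAGE];
};

static struct tmem_slot slots[TMEM_SLOTS];

/* Copy a page in. Returns 0 on success, -1 if the store has no room. */
int toy_tmem_put(unsigned long key, const char *page)
{
    struct tmem_slot *free_slot = NULL;
    int i;

    for (i = 0; i < TMEM_SLOTS; i++) {
        if (slots[i].used && slots[i].key == key) {
            memcpy(slots[i].data, page, TMEM_PAGE); /* overwrite */
            return 0;
        }
        if (!slots[i].used && !free_slot)
            free_slot = &slots[i];
    }
    if (!free_slot)
        return -1; /* refusal is allowed; the caller must cope */
    free_slot->used = 1;
    free_slot->key = key;
    memcpy(free_slot->data, page, TMEM_PAGE);
    return 0;
}

/* Copy a page out. Returns 0 on a hit, -1 if the page is gone. */
int toy_tmem_get(unsigned long key, char *page)
{
    int i;

    for (i = 0; i < TMEM_SLOTS; i++) {
        if (slots[i].used && slots[i].key == key) {
            memcpy(page, slots[i].data, TMEM_PAGE);
            return 0;
        }
    }
    return -1;
}

/* Simulate the owner of the memory taking all of it back at once. */
void toy_tmem_reclaim(void)
{
    memset(slots, 0, sizeof(slots));
}
```

Because data is only ever copied in and out by key, the store is free to grow, shrink, or be emptied behind the caller’s back, which is what makes the memory usable for caching but for little else.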
Finally today, the ksplice guys requested a new TAINT flag so those loading ksplice updates into their kernels would be able to detect this easily (especially vendors of those concerned). Peter Zijlstra objected on the grounds that ksplice isn’t upstream, although it does still seem (to this author) that it would be a worthwhile thing to have in mainline anyway.
The latest kernel release is 2.6.30, which was released by Linus on June 9th.
Stephen Rothwell posted a linux-next tree for June 19th. Stephen added one fix (for symbol checking, affecting ARM), and noted that Linus’ tree gained a build failure due to a compiler bug (for which he reverted the offending commit). A few other trees lost conflicts, and the tree continues to fail to build for those seeking an allyesconfig build configuration on PowerPC. The total number of sub-trees remains steady at 128 again today (apologies for missing the total in yesterday’s summary podcast).
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

