2009/07/26 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090726.mp3
For the weekend of July 26th 2009, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: Fanotify, IPI, MM, Mutexes, and Scheduling.
Fanotify. Eric Paris posted to let everyone know (in a thread entitled “fanotify – overall design before I start sending patches”) that he plans to begin sending patches for fanotify in the next week or two. Meanwhile, he would like to receive any helpful comments about the existing design. As he notes in his lengthy email on the subject, fanotify is a notification system originally inspired by the anti-malware vendors’ perceived need to monitor and “approve” certain file operations before they are allowed to proceed. Eric has done a great job of moving this forward from the abstract concept it once was.
IPI (Inter-Processor Interrupts). Xiao Guangrong posted a patch implementing a lockless version of call_function_data. This is used by the generic IPI signalling in the core Linux SMP support, and presumably represented a bottleneck for Fujitsu. Andrew Morton says it looks good to him, but I have no further information about what this buys in real-world performance.
MM. David Rientjes posted a patch that was a variation on the previous theme of inherited oom_adj (“badness”) values for tasks. Rather than have tasks simply inherit the oom_adj value, as required by the existing implementation (in which the value is literally shared between tasks during a clone), David introduces a new oom_adj_child (/proc/pid/oom_adj_child) that can (only) be increased from userspace to make cloned threads more likely to get killed.
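For those following along in text, here is a minimal sketch of the “increase only” rule just described. It is not David’s patch: next_child_adj and read_proc_int are hypothetical helper names, and I am assuming the new value is compared against the current oom_adj_child value and kept within the historical oom_adj range of -17 to 15. The oom_adj_child file only exists on a kernel carrying the proposed patch, so the reader simply returns None elsewhere.

```python
import os

# Historical oom_adj range: -17 disabled OOM killing, 15 was "kill me first".
OOM_ADJ_MIN, OOM_ADJ_MAX = -17, 15


def read_proc_int(path):
    """Return the integer stored in a /proc file, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def next_child_adj(current, requested):
    """Model the 'may only be raised' rule (hypothetical helper, an assumption)."""
    if not OOM_ADJ_MIN <= requested <= OOM_ADJ_MAX:
        raise ValueError("oom_adj values range from -17 to 15")
    if requested < current:
        raise PermissionError("oom_adj_child may only be raised, not lowered")
    return requested


if __name__ == "__main__":
    pid = os.getpid()
    # Either file may be absent depending on the running kernel.
    print("oom_adj:", read_proc_int(f"/proc/{pid}/oom_adj"))
    print("oom_adj_child:", read_proc_int(f"/proc/{pid}/oom_adj_child"))
```

Raising the value from 0 to 5 is accepted, while trying to go back from 5 to 0 is refused, which is the asymmetry the patch description calls for.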
[Correction: I mean one doesn't have multiple simultaneous holders of a mutex, since it's not a "counting lock", but I seem to have phrased this badly]
Mutexes. The Real Time patch performs various changes to the core kernel as it converts spinlocks into sleeping mutexes and generally converts semaphore users into mutexes as well (since mutexes are essentially a special case of semaphore in which the lock is either held or not, but without the additional potential for multiple waiters). Thomas Gleixner and others have done a good job (as part of their work on the Real Time patchset) of locating the obvious mutex candidates and converting them over in the patchset, but the unpatched upstream kernel still contains various locks that are obvious candidates for conversion. Thomas posted a 37-part patch series aimed at implementing the switch for many of these locks. He requested that one of the patches be merged immediately, since it is merely preparatory pre-.32.
Scheduling. Sen Wang posted a complaint about the Real Time scheduler algorithm. Apparently, when enabling rt_bandwidth (throttling), Sen is surprised (and deems it ridiculous) to see the idle task getting picked. But Peter Zijlstra points out that it is sometimes necessary to give other tasks a little time to run, and that this is why the throttle (I believe it’s around 5%) kicks in once Real Time tasks exceed a certain threshold. And, as Peter points out, if one doesn’t like the threshold, it can be disabled. Linux Weekly News previously covered this topic on their Kernel Page in greater detail, back when it was first introduced (unless I’m crazy).
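To put a number on that “around 5%”, here is a small sketch that works out the share of CPU time held back from Real Time tasks. It assumes the throttle is exposed through the sched_rt_runtime_us and sched_rt_period_us sysctls, whose usual defaults are 950000 and 1000000 microseconds; the function names are mine, not anything from the thread.

```python
def rt_reserved_fraction(runtime_us, period_us):
    """Fraction of each period left over for non-real-time tasks."""
    if runtime_us < 0:
        # A runtime of -1 disables throttling, so nothing is held back.
        return 0.0
    return (period_us - runtime_us) / period_us


def read_sched_sysctl(name):
    """Read an integer from /proc/sys/kernel/<name>, or None if unavailable."""
    try:
        with open(f"/proc/sys/kernel/{name}") as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    runtime = read_sched_sysctl("sched_rt_runtime_us")
    period = read_sched_sysctl("sched_rt_period_us")
    if runtime is None or period is None:
        # Fall back to the usual defaults when the sysctls cannot be read.
        runtime, period = 950000, 1000000
    share = rt_reserved_fraction(runtime, period)
    print(f"{share:.1%} of each period is kept for non-RT tasks")
```

With the default values, 50000 of every 1000000 microseconds are reserved, which is the 5% mentioned above. Writing -1 to sched_rt_runtime_us is the “disable it” option Peter refers to.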
In today’s miscellaneous items: some ALSA updates (Takashi Iwai), some ISDN updates (Karsten Keil), a potential SLQB allocator locking bug (Sebastian Andrzej Siewior), a potential serial USB regression (“Hartmut”, who goes by the email address of “e9hack” and doesn’t include a full name), version 2 of the previously mentioned patch series implementing userspace MMU mapping change event notification, version 7 of the IO scheduler IO controller patches (Vivek Goyal) – which includes a group_idling feature similar to CFQ’s slice_idle that is intended to aid with fairness, and a lot of other changes – a fix to stop tracing in oops_enter() (Steven Rostedt), another suggestion that built-in modules be included in /sys (Tomas Carnecky), some v4l/dvb fixes (Mauro Carvalho Chehab), a number of consolidated kmemleak updates (Catalin Marinas) that are getting ready for the next merge window, a question concerning ext4 online debugging (Clemens Eisserer), some S390 patches (Martin Schwidefsky), wall time support for the ring_buffer (Zhao Lei), version 2 of the previously covered patch series implementing uid mount options for ext2/3, take two of the FAT root timestamp patches, version 3 of the kcore cleanup patches (Kamezawa Hiroyuki), and the addition of an EXPORT_SYMBOL for kmap_atomic_prot as required by TTM (Thomas Hellstrom). Tejun Heo was unsurprised to learn that the patches he had previously posted – and explicitly said he was unable to test on real hardware – converting IA64 to dynamic per-cpu allocation did indeed not boot on real IA64 systems.
Finally today, Laurent Pinchart would like some advice concerning the preference for using kmap vs. kmap_atomic, and in particular the pressure each option places upon the VM. He could use kmap outside of interrupt context, which is expensive (but needed infrequently), or repeatedly use atomic mappings from within the interrupt itself, each brief in duration. He is concerned that keeping many pages kmap()ed for a long time is unpleasant, but perhaps less so than calling kmap_atomic 4500 times per second for a 640×480fps video stream. Perhaps some folks will offer him advice.
In today’s announcements: Linux 2.4.37.4. Although many have long since moved to 2.6, the venerable 2.4 series kernel remains widely used (especially in older embedded systems), and as such Willy Tarreau does a great job maintaining it for its users. In the latest release, a build error is fixed, NULL pointer security issues with mmap_min_addr are discussed (in the announcement), and various other minor fixes are also provided. Willy notes that the security fixes only really guard against faulty setuid root tasks, since only suitably privileged tasks can map the zero page in any case.
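As a rough illustration of the mmap_min_addr point, the sketch below reads the floor from /proc/sys/vm/mmap_min_addr (where the kernel provides it) and checks whether an unprivileged task would be allowed to map a given address. The helper names are hypothetical, and the check deliberately ignores privileged tasks, which may map below the floor.

```python
def read_mmap_min_addr(path="/proc/sys/vm/mmap_min_addr"):
    """Return the lowest address unprivileged tasks may map, or None."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def unprivileged_map_allowed(addr, min_addr):
    """True if an unprivileged task could map at addr under this floor."""
    return addr >= min_addr


if __name__ == "__main__":
    floor = read_mmap_min_addr()
    if floor is None:
        print("mmap_min_addr is not available on this system")
    else:
        verdict = "allowed" if unprivileged_map_allowed(0, floor) else "refused"
        print(f"floor is {floor}; mapping the zero page would be {verdict}")
```

Any non-zero floor means an unprivileged task cannot map the zero page, which is why the fixes mostly matter for faulty setuid root programs.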
Containers version 0.6.3. Daniel Lezcano posted to let everyone know about the latest release of the Linux container “lxc” tools.
Git version 1.6.4.rc3. Junio C Hamano posted RC3 of the Git 1.6.4 release. The impending release was already covered in the last podcast.
Man Pages. Michael Kerrisk announced the release of version 3.22 of the kernel man pages. Since you might read his blog, I would also like to draw attention to his forthcoming book (No Starch Press) on the Linux kernel-userspace and glibc APIs. Watch for that in 2010.
The latest kernel release is 2.6.31-rc4, which was released by Linus last week. A number of regressions have been reported (including in tools), so it seems unlikely that we’ll be ready for a .31 final yet.
Rafael J. Wysocki took a break from being merely awesome to be more awesome in compiling a list of existing regressions between 2.6.30 and 2.6.31-rc4. The total number of reported regressions is generally increasing (a bad sign), having doubled over the past month, and more than half of them are unresolved. Most of these are driver problems (as perhaps expected); however, there are various core kernel concerns in there also. These include a boot failure (Gene Heskett), various suspend/resume problems, more tty layer instabilities, another lockdep limit hitting bug (didn’t we just raise the limit?), and a VM problem. And those are just the regressions Rafael posted without patches; there are a number of other issues for which patches are known to exist.
Greg Kroah-Hartman pushed out another round of stable kernel updates (2.6.30.3 and 2.6.27.28), which aim to resolve a boot problem some have experienced. On a related note, John Hawley noted that requests to update the front page of the kernel.org web pages were worthwhile, especially in pointing out “long term” releases (such as .27). However, the scripts currently used are aging, and nobody has had enough time recently to overhaul them while also dealing with the other recent activities. One would generally assume geodns is more useful right now, but obviously these scripts will get fixed up in due course.
Stephen Rothwell posted a linux-next tree for July 24th. Since Thursday, the tree still fails to build in an allyesconfig build configuration on powerpc, the sound tree lost its conflict, and the ttydev tree lost its build failure but gained another for which a patch was applied. The current sub-tree count in the latest compose remains consistent at 134 trees.
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

