2009/08/10 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090810.mp3
For Monday, August 10th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: CGroups, Ftrace, Modules, RCU, Spinlocks, Swap, System Calls, TTY, and VM.
CGroups. Ben Blum posted version 3 of a 7-part patch series implementing support for a “cgroup.procs” file that allows the user to quickly display all the unique thread group IDs in a particular cgroup, as well as move all of the existing threads sharing the same thread group ID into a particular cgroup.
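As a rough illustration of how a userspace tool might consume such a file, here is a minimal Python sketch. It assumes the file holds one numeric ID per line and that writing a single ID moves the matching process, as the posting describes. The helper names are hypothetical, and the interface that eventually gets merged may differ.

```python
import os


def read_cgroup_procs(procs_path):
    """Return the sorted, de-duplicated IDs listed in a cgroup.procs file."""
    with open(procs_path) as f:
        # One numeric ID per line; blank lines are ignored.
        return sorted({int(line) for line in f if line.strip()})


def move_to_cgroup(cgroup_dir, tgid):
    """Ask the kernel to move a whole process into the cgroup at cgroup_dir."""
    # Assumption: writing one ID to cgroup.procs moves every thread in it.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(tgid))
```

The cgroup directory passed in depends on where the cgroup filesystem happens to be mounted on a given system, so it is left as a parameter here.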
Ftrace. John Reiser complained that recordmcount, which is run during the kernel build against every .o object to extract mcount data for use with the dynamic function patching code in ftrace, can add many minutes to a full kernel compile. He suggests that the problem is in repeated calls to “ld -r”, which can be batched into one call based on the output from recordmcount or the other way around. Either way, he says, the data output is the same. He was concerned that his 900-line “recordmcount.c” replacement might be too long for the mailing list (perhaps he has not seen the size of some patches) but will likely be persuaded to send it if the developers are interested.
Modules. Eric Paris posted requesting thoughts on how permission checks are currently implemented on request_module(), and whether they make sense. As he says, request_module() is used to ask the kernel helper thread to spawn a userspace modprobe process to do a module load. It is called in a number of places within the kernel (apparently, approximately 128 unique callsites) and only three check to see if the requesting process has some sort of module loading permissions (CAP_SYS_RAWIO). Amongst the suggestions, Eric would like to see the request_module() code perform this security check for itself. Also on the subject of modules, Ozan Caglayan posted version 2 of a recent patch implementing a fix in the markup_oops script that will use modinfo to look up module information when the EIP within an oops is within a module that has a “-” instead of a “_” in its name. This is a semi-frequent occurrence with module naming, so the fix should avoid confusion.
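The “-” versus “_” mismatch is easy to picture in code. markup_oops itself is a Perl script and the patch relies on modinfo, so the following Python is only a sketch of the name comparison involved, with hypothetical helper names.

```python
def canonical_module_name(name):
    """Map '-' to '_' so both spellings of a module name compare equal."""
    return name.replace("-", "_")


def same_module(name_in_oops, name_on_disk):
    """True if two module names differ at most in '-' versus '_'."""
    return canonical_module_name(name_in_oops) == canonical_module_name(name_on_disk)
```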
RCU. Martin Schwidefsky had posted on Friday evening concerning a 2.6.30 system that was hanging due to a bad interaction between RCU and NOHZ. Paul McKenney followed up today with a congratulatory reply saying, “Congratulations, Martin! You have exercised what to date has been a theoretical bug identified last year by Manfred Spraul. This fix is to switch from CONFIG_RCU_CLASSIC to CONFIG_RCU_TREE, which was added in 2.6.29”. Martin replied that SLES11 uses 2.6.27 and classic RCU, and he believes the bug is present there also, so it does need to be fixed. On an only marginally related note, Martin also mentioned that he is working on NOHZ some more to improve delay performance by not having a CPU go fully tickless if it did some work in the last timer tick (which causes an unnecessary timer tick if the CPU goes truly idle, but generally he thinks will improve performance – and Martin requests comments on this approach from the wider LKML Congress).
Spinlocks. Heiko Carstens posted an RFC patch series allowing inlined spinlocks once again, since this apparently can lead to a 1%-5% speedup on some (s390 in this particular case) systems under certain workloads. The patch introduces CONFIG_SPINLOCK_INLINE as a conditional selector for this feature.
Swap. Nitin Gupta posted an RFC patch implementing a callback function whenever a swap slot is freed, for use on (in this example) systems with compressed RAM devices backing the swap device, allowing the memory to be instantly freed rather than when the “swap discard” bio is eventually processed by the block layer. Apparently, this is “essential” for the “compcache” project to which he posted a link.
System Calls. Jason Baron posted an interesting 12-part patch series implementing a runtime system call name-to-number mapping function: one passes in a string representation of a system call and gets back the ID of the call. Initially, it is for the syscall event tracer within ftrace, although one can imagine other in-kernel projects would be interested in picking this up.
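The idea of such a mapping can be sketched in a few lines of Python. This is not the interface from the patch series: the table and the function name below are hypothetical, and the four numbers shown are simply the x86_64 values for those calls.

```python
# Hypothetical excerpt of a name table; these four numbers are the x86_64 values.
SYSCALL_TABLE = {"read": 0, "write": 1, "open": 2, "close": 3}


def syscall_name_to_nr(name):
    """Return the syscall number for 'read' or 'sys_read' style names, or -1."""
    if name.startswith("sys_"):
        name = name[len("sys_"):]
    return SYSCALL_TABLE.get(name, -1)
```

Returning -1 for an unknown name is a choice made for this sketch only.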
TTY. The ongoing saga with the TTY layer came up again today (but only marginally). Artur Skawina noticed a ^S/^Q sequence resulted in data loss within his xterm. That seemed to be caused by a recent commit that had removed a check for tty->stopped in pty_write_buffer() for “no clear reason”, according to Linus Torvalds, who posted a patch that fixed the problem for Artur.
VM. Bill Speirs noticed a problem with VMA merging. The Linux VM uses VMAs (Virtual Memory Areas) to represent ranges of pages allocated to a task, complete with their protections and flags. A typical task has a number of different VMAs representing load code, library functions, program text, data, and so forth. Typically, the kernel will coalesce adjacent VMA regions if they cover contiguous (virtual) memory and share the same protection. However, in the case Bill cited, where he maps three pages with PROT_NONE and then sets the middle one to PROT_WRITE protection before setting it back, the kernel fails to merge these three pages back into a single VMA. The same experiment done using PROT_READ does merge them. Bill sees this issue because he is in reality mapping 200,000+ pages, and rapidly changing permissions is causing him to exceed the max_map_count limit (a sysctl rather than a ulimit). This is worthy of investigation.
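Bill's experiment can be approximated from userspace. The Python sketch below is not his test program: it assumes a Linux system with /proc mounted, calls mmap() and mprotect() in the C library through ctypes, and uses hypothetical helper names. The mapping is deliberately never unmapped, to keep the sketch short.

```python
import ctypes
import ctypes.util
import mmap


def count_vmas(maps_text, start, end):
    """Count /proc/<pid>/maps entries that overlap the range [start, end)."""
    count = 0
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        # The first field of each line is "<start>-<end>" in hexadecimal.
        lo, hi = (int(x, 16) for x in line.split()[0].split("-"))
        if lo < end and hi > start:
            count += 1
    return count


def run_experiment(middle_prot):
    """Map 3 PROT_NONE pages, flip the middle page to middle_prot and back.

    Returns the number of VMAs covering the 3 pages afterwards (Linux only).
    """
    PROT_NONE = 0
    page = mmap.PAGESIZE
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]

    base = libc.mmap(None, 3 * page, PROT_NONE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if base is None or base == ctypes.c_void_p(-1).value:
        raise OSError(ctypes.get_errno(), "mmap failed")
    # Change the middle page, then restore its original protection.
    for prot in (middle_prot, PROT_NONE):
        if libc.mprotect(base + page, page, prot) != 0:
            raise OSError(ctypes.get_errno(), "mprotect failed")
    with open("/proc/self/maps") as f:
        return count_vmas(f.read(), base, base + 3 * page)
```

Comparing run_experiment(mmap.PROT_WRITE) with run_experiment(mmap.PROT_READ) should show whether a given kernel exhibits the difference Bill describes: a result of 1 means the pages ended up in a single VMA again, while 3 means they did not.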
In today’s miscellaneous items: a power management fix (removing a run-time warning) from Rafael J. Wysocki, some performance counters fixes from Ingo Molnar (who states that he hopes it is still fine to make a few changes, but is willing to trim the patchset down to minimal changes if Linus prefers), the usual round of other updates from Ingo (x86, irq), some PCI fixes from Jesse Barnes, version 6 of a patch series adding trace events to the page allocator from Mel Gorman (who requests a “yey or nay” on whether these should be merged), a memory leak in security/selinux/hooks.c, identified by “iceberg” (which is about as useful as calling yourself only “debiandeveloper” or one of the many other nickname-only posters on LKML) and later patched in a posting from James Morris, version 2 of a VFS patch converting superblock s_maxbytes to an loff_t, a patch giving waitqueue spinlocks their own lockdep classes when they are initialized from init_waitqueue_head() from Peter Zijlstra by way of David Howells, who needed it to address a lockdep false positive situation in CacheFiles, a powerpc fix that allows “direct” DMA (non-iommu) to work for devices that have a < 32-bit DMA mask when the machine simply does not have enough memory to go over the chip addressing limit from Ben Herrenschmidt, a patch implementing vhost, a kernel-level virtio server, from Michael S. Tsirkin, and a rethink of command line precedence on MicroBlaze.
Finally today, Ted Ts’o posted an update to the Kconfig description for EXT3_DEFAULTS_TO_ORDERED better explaining the tradeoffs in terms of journal options on ext3, which he says has been vetted by the developers as being more informative for users. Hopefully, some users will agree with that assertion.
The latest kernel release is 2.6.31-rc5, which was released over a week ago.
Matthias Dahl reported an oops in 2.6.31-rc5-git5 in kmem_cache_alloc and Eric Paris noticed a NULL pointer dereference in kmemcheck in linux-next. There was also some whining that ARM doesn’t test with “randconfig” builds that often.
Stephen Rothwell posted a linux-next tree for August 10th. Since Friday, there are two new trees added – ide and hwpoison (the old ide became ide-current). The nfsd and drm trees gained conflicts, while the trivial tree lost its conflict. Given the two new tree additions, there are now 140 sub-trees. Stephen reminded Andi Kleen (author of HWPOISON) that linux-next is intended only for patches “destined for the next merge window”, which Andi affirmed.
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.










