2010/04/04 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100404.mp3
For the weekend of April 4th 2010, I’m Jon Masters with a summary of today’s LKML traffic.
In today’s issue: BKL, KVM, Networking, and recvmmsg.
BKL. In the latest round of Big Kernel Lock (BKL) removal discussion, Arnd Bergmann posted some patches to the TTY layer, noting that it was “one of the trick[ie]r bits in the BKL removal series, so let’s discuss it here”. Arnd’s code is similar to the earlier Big Kernel Semaphore (BKS) concept but it uses a Big TTY Mutex instead. This is based upon a mutex, not a semaphore, that does not autorelease on sleep, and is intentionally confined to TTY use. Alan Cox replied that he wasn’t too bothered if these patches went in, because he was working to remove the need for giant locks, whatever they happen to be called. So the Big TTY Mutex may be a short-lived step toward killing the BKL sooner rather than later. Having said that, Alan wanted to hold off a little while he took care of “low hanging fruit” first. Others agreed.
KVM. Jiri Kosina inquired about a kernel warning generated on 32-bit KVM guests when using an AMD guest CPU on an AMD host. The emulated guest CPU is an AMD model 2, stepping 3, which is one of the models AMD apparently explicitly did not support using in SMP configurations. Jiri wondered whether it was worth adding a specific hack for KVM (since SMP emulation does work). Andi Kleen suggested perhaps just killing the code that generates a warning on those systems, as it is by now very old, while Andre Przywara really didn’t like removing the warning and favored simply emulating a better model instead. Pavel Machek agreed that emulating an explicitly SMP-capable CPU model was likely the solution.
Networking. Christoph Lameter inquired as to future network stack support for the PGM protocol (RFC 3208). Currently, there exists the openpgm implementation, which runs as a userspace application using raw sockets, but there are a number of limitations in so doing, not the least of which is a performance hit. Christoph feels that PGM belongs at the same level as both UDP and TCP support, though the conversation didn’t go much beyond discussing possible prototypes.
recvmmsg(). Linux 2.6.33 added a new system call called recvmmsg() that is intended to complement recvmsg() by allowing multiple packets to be received and processed at once, rather than performing one system call (or even more) per individual packet. Unfortunately for Brandon Black, who was trying to use this new feature in his DNS server implementation, calls to recvmmsg() on a blocking socket will result in the call blocking until the maximum requested number of packets is available, not just one single packet. Although Brandon says he is willing to work around this, he would prefer more configurable blocking behavior when using recvmmsg(). Ulrich Drepper agreed; Brandon posted a patch.
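To make the batching idea concrete, here is a minimal sketch of one possible workaround for the blocking behavior described above: pass MSG_DONTWAIT so that recvmmsg() returns whatever is already queued instead of waiting for the full batch. This is not Brandon's code or his patch. The helper names recv_batch() and recv_batch_demo(), the BATCH and BUFSZ sizes, and the use of a local AF_UNIX socket pair in place of a real DNS socket are all assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes recvmmsg() and struct mmsghdr under this macro */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH 8       /* hypothetical: maximum datagrams fetched per call */
#define BUFSZ 512     /* hypothetical: buffer size for each datagram */

/* Receive up to BATCH datagrams from fd with a single system call.
 * Returns the number of datagrams received, or -1 on error. */
int recv_batch(int fd, char bufs[BATCH][BUFSZ], unsigned int lens[BATCH], int flags)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    int n = recvmmsg(fd, msgs, BATCH, flags, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;   /* bytes received for datagram i */
    return n;
}

/* Demo: queue two datagrams on a local socket pair, then drain them
 * with one non-blocking batch call. Returns the batch count or -1. */
int recv_batch_demo(void)
{
    int sv[2];
    char bufs[BATCH][BUFSZ];
    unsigned int lens[BATCH];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    int n = -1;
    if (send(sv[0], "query-1", 7, 0) == 7 && send(sv[0], "query-2", 7, 0) == 7)
        n = recv_batch(sv[1], bufs, lens, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A real server would presumably wait in poll() or epoll for readability and then call something like recv_batch() with MSG_DONTWAIT, so that it blocks for the first packet only. Later kernels (2.6.34 onward) also provide a MSG_WAITFORONE flag for recvmmsg() that gives similar "block for one, then take what is there" behavior in a single call.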
In today’s miscellaneous items:
*). A couple of IDE reverts to deal with missing devices.
*). Some new cpu-hotplug wrapper functions (cpu_notify, __cpu_notify, and cpu_notify_nofail).
*). Some followup discussion on a new CPU flag bit on recent Intel CPUs that enables the CPU to declare that it explicitly has a synchronized TSC.
*). Some percpu module handling fixes for module static percpu from Tejun Heo.
*). An async firmware loading patch from Johannes Berg, intended to allow for non-blocking immediate rejection of unavailable firmware that is requested via request_firmware_nowait early during boot, prior to boot completion.
*). Tilman Schmidt noted that CONFIG_PROVE_RCU is incompatible with proprietary kernel modules because it will result in the creation of a reference to a GPL only exported symbol even in modules that do not use RCU. He suggests that those building proprietary modules disable PROVE_RCU. Paul McKenney thanked him for sharing this solution with others who might be affected.
*). A fix for __module_ref_addr() use on stable kernels prior to 2.6.34 (where percpu use has been refactored) by Mathieu Desnoyers.
*). A scheduler bug present since November 12 2009 was identified in an email thread posted by Torok Edwin (and bisected by Mike Galbraith) in which use of latencytop results in the runtime of random tasks being set to really high values afterward due to the broken commit.
*). Version 10 of the “use lmb with x86” patches was posted by Yinghai Lu. There was some further discussion about the plan to essentially replace e820 handling on x86 with a version of the Logical Memory Block code that will be modified to support parsing e820 tables.
*). A small tweak to the ordering of TLB flushing on S4 resume for i386 via a patch from Shaohua Li.
*). A discussion started by Torok Edwin concerning 32-bit perf tracing with a 64-bit kernel. Torok had been slightly confused by needing to re-install perf for a 32-bit build and this led Ingo Molnar to ponder whether it was time to have a variant of perf built for each architecture variant.
*). A nice summary of the various printk macros (pr_, dev_, netdev_, netif_, etc.) from Joe Perches after Neshama Parhoti asked about them.
*). A patch from Robert Schone modifying power_frequency events such that changing the frequency on another CPU results in it being traced rather than the CPU that initiated the frequency change operation.
*). A patch making it easier to disable fragmentation when doing PPP multilink from Richard Hartman. Apparently this reduces “packet loss and massive ping spikes” that are seen by Richard and others.
*). Lin Ming asked Corey Ashford whether he was still working on performance event support for “uncore” or “nest” CPU units (these are additional functional units on the same die as the CPU cores but not in-core). Corey said that he was not actively working on it but is working on nest events for IBM’s “Wire-Speed” processor using the existing infrastructure due to some time constraints. It looks like more will happen here in due course.
*). Some shadow page cache discussion for KVM MMU from Xiao Guangrong.
*). Some discussion between Peter Zijlstra, Rusty Russell and Tejun Heo concerning the latter’s “cpuhog” patches and the fact that Peter doesn’t like the name. Rusty on the other hand quite likes it, because “ugly things should have ugly names”. Tejun did propose an alternative set of names, including functions such as stop_cpu() and stop_cpus() but these don’t really stop CPUs, they hog them. So the CPU hog name is more apt.
*). Lee Schermerhorn posted some comparative benchmarks between a Red Hat 2.6.18 kernel and upstream 2.6.32 and 2.6.33 kernels showing recent upstream performance regressions. Plots: http://free.linux.hp.com/~lts/Pft/
In today’s announcements:
OSPERT 2010. Peter Zijlstra announced the official Call For Papers for the 2010 Operating System Platform for Embedded Real-Time applications conference. It is to be held on July 6th in Brussels, Belgium in conjunction with the 22nd Euromicro International Conference on Real-Time Systems, which runs from the 7th to the 9th of July. Those working on embedded Real Time systems may find this particularly interesting. The paper deadline was April 4th.
Git 1.7.0.4. A maintenance Git release was announced by Junio C Hamano.
LTP. Rishikesh K Rajak announced that the Linux Test Project (LTP) for March 2010 has now been released. It includes some last minute fixes and is available at the usual sourceforge.net/projects/ltp location.
LTTng 0.208. Mathieu Desnoyers announced the latest LTTng release 0.208 for Linux kernel 2.6.33.2 is now available. It uses waits with msleep() in place of cpu_relax() in order to handle !PREEMPT uniprocessor (UP) configurations.
The latest kernel release was 2.6.34-rc3 during the time period covered by this podcast episode.
Greg Kroah-Hartman announced the release of stable series kernels 2.6.27.46, 2.6.31.13, and 2.6.33.2. Existing users of these stable kernels should upgrade.
Finally today, Jeff Merkey surfaced from wherever he’s been recently and let everyone know that he has been issued US patent number 7,684,347, which, it was noted, seems to be simply an abstract “really fast” packet sniffer. Jan III Sobiesk suggested that someone should patent a “really fast operating system”. Jeff should have waited a few days for April 1st, the same day that the kernel.org website featured 180 degree (or pi if you prefer) rotated text on the main page. That wasn’t a hack; it was John and Peter showing some humor.
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

