Archive for April 13th, 2010

2010/04/04 Linux Kernel Podcast

April 13th, 2010 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100404.mp3

For the weekend of April 4th 2010, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: BKL, KVM, Networking, and recvmmsg.

BKL. In the latest round of Big Kernel Lock (BKL) removal discussion, Arnd Bergmann posted some patches to the TTY layer, noting that it was “one of the trick[ie]r bits in the BKL removal series, so let’s discuss it here”. Arnd’s code is similar to the earlier Big Kernel Semaphore (BKS) concept but it uses a Big TTY Mutex instead. This is based upon a mutex rather than a semaphore, does not autorelease on sleep, and is intentionally confined to TTY use. Alan Cox replied that he wasn’t too bothered if these patches went in because he was working to remove the need for giant locks, whatever they happen to be called. So the Big TTY Mutex may be a short-lived step toward killing the BKL sooner rather than later. Having said that, Alan wanted to hold off a little while he took care of “low hanging fruit” first. Others agreed.

KVM. Jiri Kosina inquired about a kernel warning generated on 32-bit KVM guests when using an AMD guest CPU on an AMD host. The emulated guest CPU is an AMD model 2, stepping 3, which is one of the models AMD apparently explicitly did not support using in SMP configurations. Jiri wondered whether it was worth adding a specific hack for KVM (since SMP emulation does work). Andi Kleen suggested perhaps just killing the code that generates the warning on those systems, as it is by now very old, while Andre Przywara really didn’t like removing the warning and favored simply emulating a better model instead. Pavel Machek agreed that emulating an explicitly SMP-capable CPU model was likely the solution.

Networking. Christoph Lameter inquired as to future network stack support for the PGM protocol (RFC 3208). The existing openpgm implementation runs as a userspace application using raw sockets, but that approach has a number of limitations, not the least of which is a performance hit. Christoph feels that PGM belongs at the same level as both UDP and TCP support, though the conversation didn’t go much beyond discussing possible prototypes.

recvmmsg(). Linux 2.6.33 added a new system call called recvmmsg() that is intended to complement recvmsg() by allowing multiple packets to be received and processed at once, rather than performing one system call (or even more) per individual packet. Unfortunately for Brandon Black, who was trying to use this new feature in his DNS server implementation, a call to recvmmsg() on a blocking socket will block until the maximum requested number of packets is available, not just one single packet. Although Brandon says he is willing to work around this, he would prefer a more configurable blocking behavior for recvmmsg(). Ulrich Drepper agreed; Brandon posted a patch.
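
The blocking behavior Brandon was asking for can be sketched in userspace. The following is a minimal illustration only, not his patch and not a call to recvmmsg() itself: it still makes one system call per packet, so it shows the semantics (block for the first packet, then return whatever else is already queued) rather than the performance benefit. The function name recv_batch and its parameters are hypothetical, and the sketch assumes a blocking UDP socket with no timeout set on a Linux host.

```python
import socket


def recv_batch(sock, max_msgs, bufsize=512):
    """Block for the first datagram, then drain whatever else is already queued.

    Returns a list of (payload, sender_address) tuples holding between 1 and
    max_msgs entries. Assumes `sock` is a blocking UDP socket with no timeout.
    """
    # Blocking receive: wait until at least one packet is available.
    batch = [sock.recvfrom(bufsize)]
    while len(batch) < max_msgs:
        try:
            # MSG_DONTWAIT makes this single call non-blocking.
            batch.append(sock.recvfrom(bufsize, socket.MSG_DONTWAIT))
        except BlockingIOError:
            break  # Queue is empty: return the partial batch.
    return batch
```

A DNS server loop built on this would answer each query in the returned batch and then call recv_batch() again, never waiting for a full batch to accumulate.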

In today’s miscellaneous items:

*). A couple of IDE reverts to deal with missing devices.

*). Some new cpu-hotplug wrapper functions (cpu_notify, __cpu_notify, and cpu_notify_nofail).

*). Some followup discussion on a new CPU flag bit on recent Intel CPUs that enables the CPU to declare that it explicitly has a synchronized TSC.

*). Some percpu module handling fixes for module static percpu from Tejun Heo.

*). An async firmware loading patch from Johannes Berg, intended to allow firmware that is requested via request_firmware_nowait before boot has completed to be rejected immediately, without blocking, when it is unavailable.

*). Tilman Schmidt noted that CONFIG_PROVE_RCU is incompatible with proprietary kernel modules because it will result in the creation of a reference to a GPL only exported symbol even in modules that do not use RCU. He suggests that those building proprietary modules disable PROVE_RCU. Paul McKenney thanked him for sharing this solution with others who might be affected.

*). A fix for __module_ref_addr() use on stable kernels prior to 2.6.34 (where percpu use has been refactored) by Mathieu Desnoyers.

*). A scheduler bug present since November 12 2009 was identified in an email thread posted by Torok Edwin (and bisected by Mike Galbraith) in which use of latencytop results in the runtime of random tasks being set to really high values afterward due to the broken commit.

*). Version 10 of the “use lmb with x86” patches was posted by Yinghai Lu. There was some further discussion about the plan to essentially replace e820 handling on x86 with a modified version of the Logical Memory Block code that will now be modified to support parsing e820 tables.

*). A small tweak to the ordering of TLB flushing on S4 resume for i386 via a patch from Shaohua Li.

*). A discussion started by Torok Edwin concerning 32-bit perf tracing with a 64-bit kernel. Torok had been slightly confused by needing to re-install perf for a 32-bit build and this led Ingo Molnar to ponder whether it was time to have a variant of perf built for each architecture variant.

*). A nice summary of the various printk macros (pr_, dev_, netdev_, netif_, etc.) from Joe Perches after Neshama Parhoti asked about them.

*). A patch from Robert Schone modifying power_frequency events such that changing the frequency on another CPU results in it being traced rather than the CPU that initiated the frequency change operation.

*). A patch making it easier to disable fragmentation when doing PPP multilink from Richard Hartman. Apparently this reduces “packet loss and massive ping spikes” that are seen by Richard and others.

*). Lin Ming asked Corey Ashford whether he was still working on performance event support for “uncore” or “nest” CPU units (these are additional functional units on the same die as the CPU cores but not in-core). Corey said that he was not actively working on it but is working on nest events for IBM’s “Wire-Speed” processor using the existing infrastructure due to some time constraints. It looks like more will happen here in due course.

*). Some shadow page cache discussion for KVM MMU from Xiao Guangrong.

*). Some discussion between Peter Zijlstra, Rusty Russell and Tejun Heo concerning the latter’s “cpuhog” patches and the fact that Peter doesn’t like the name. Rusty on the other hand quite likes it, because “ugly things should have ugly names”. Tejun did propose an alternative set of names, including functions such as stop_cpu() and stop_cpus() but these don’t really stop CPUs, they hog them. So the CPU hog name is more apt.

*). Lee Schermerhorn posted some comparative benchmarks between a Red Hat 2.6.18 kernel and the upstream 2.6.32 and 2.6.33 kernels showing recent upstream performance regressions. Plots: http://free.linux.hp.com/~lts/Pft/

In today’s announcements:

OSPERT 2010. Peter Zijlstra announced the official Call For Papers for the 2010 Operating System Platform for Embedded Real-Time applications conference. It is to be held on July 6th in Brussels, Belgium in conjunction with the 22nd Euromicro International Conference on Real-Time Systems, which happens between the 7th and the 9th of July also. Those working on embedded Real Time systems may find this particularly interesting. The paper deadline was April 4th.

Git 1.7.0.4. A maintenance GIT release was announced by Junio C Hamano.

LTP. Rishikesh K Rajak announced that the Linux Test Project (LTP) for March 2010 has now been released. It includes some last minute fixes and is available at the usual sourceforge.net/projects/ltp location.

LTTng 0.208. Mathieu Desnoyers announced the latest LTTng release 0.208 for Linux kernel 2.6.33.2 is now available. It uses waits with msleep() in place of cpu_relax() in order to handle !PREEMPT uniprocessor (UP) configurations.

The latest kernel release was 2.6.34-rc3 during the time period covered by this podcast episode.

Greg Kroah-Hartman announced the release of stable series kernels 2.6.27.46, 2.6.31.13, and 2.6.33.2. Existing users of these stable kernels should upgrade.

Finally today, Jeff Merkey surfaced from wherever he’s been recently and let everyone know that he has been issued US patent number 7,684,347, which, it was noted, seems to be simply an abstract “really fast” packet sniffer. Jan III Sobiesk suggested that someone should patent a “really fast operating system”. Jeff should have waited a few days for April 1st, the same day that the kernel.org website featured 180 degree (or pi if you prefer) rotated text on the main page – that wasn’t a hack, it was John and Peter showing some humor.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes

2010/03/28 Linux Kernel Podcast

April 13th, 2010 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100328.mp3

For the weekend of March 28th 2010, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Filesystems, Interrupts, LMB vs. e820, Multitouch, PHY and phylib, the VM, and VMware.

Filesystems. Josef Bacik posted a patch entitled “Introduce freeze_super and thaw_super for the fsfreeze ioctl”. In the patch, Josef notes that the existing fsfreeze code works too much at the block level, assuming every superblock is backed by a (typically single) block device. For some modern filesystems – such as btrfs (Josef is a btrfs developer) – there can be a number of backing block devices, some of which may be added and removed while a filesystem is mounted. Consequently, Josef wishes to split out the freeze process to include dedicated superblock manipulating functions that don’t require the superblock s_bdev to be populated with one backing device. Al Viro had some typically useful comments about the patch, including some further followup to a reply by Nigel Cunningham containing some information about how TuxOnIce does filesystem freezing that Al was not too happy about.

Interrupts. Andi Kleen posted a patch entitled “Prevent nested interrupts when the IRQ stack is near overflowing”, in which he attempted to address the issue of too many IRQ vectors assigned to a given CPU all firing in rapid succession and causing the interrupt stack to overflow. Thomas Gleixner, in rejecting the patch, first noted that Andi’s changelog was “utter nonsense” because it referred to interrupt nesting from the same interrupt source rather than many vectors, and then noted that simply disabling further interrupts in such cases was not the correct solution. Thomas favored doing away with IRQF_DISABLED and instead finishing the task of converting to threaded IRQ handlers with the small hard handler always running with IRQs disabled, and he wouldn’t take the patch “unless you come up with a real convincing story”. Alan Cox wondered if there was “anyone [Thomas had] forgotten to offend”, to which Thomas responded matter-of-factly that he wasn’t sure since he hadn’t measured IRQ handler run times “for quite a while”. Linus first told Thomas he was “wrong” in always disabling interrupts, and then seemed to change direction, giving some comments on removing IRQF_DISABLED entirely.

LMB vs. e820. Two different mechanisms for accounting and tracking physical memory layout are in common use within the kernel. Intel (x86) systems use the Intel e820 BIOS provided tables (and support code with the same name) to track which memory ranges are assigned to particular uses, while other architectures – including SPARC and POWER/PowerPC – use LMB (Logical Memory Blocks). The latter was made an architecture independent library in 2008 and lives in lib/lmb.c. The fact that there are two different systems came to a head when Yinghai Lu posted an early_res patch aiming to move the more architecture independent pieces of the existing e820 code into fw_memmap.c. David Miller (the SPARC maintainer) did not like this, since he believed that Yinghai wasn’t listening to earlier advice that LMB provided all of the support in an independent fashion and should be adapted to replace the e820 bits instead. Thomas Gleixner added that, “All we get are some meager bones thrown our way”, and suggested that this wasn’t the best way to interact with the community. The thread started a mini-architecture flamewar with Ingo Molnar noting that he really wished “non-x86 architectures apprec[ia]ted (and helped) the core kernel work x86 is doing”, and Benjamin Herrenschmidt more than taking offense at this statement. But that aside, Ingo did point out that Yinghai had been doing a lot of very difficult work that was certainly of use, even if in the end another approach to unifying various bits of LMB and e820 is taken. Yinghai later posted a new patch series entitled “use lmb with x86”.
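
To make the idea of feeding e820 data into an LMB-style structure more concrete, here is a toy model. It is a sketch under stated assumptions, not a port of lib/lmb.c or of Yinghai’s patches: the RegionList class and memory_from_e820 function are hypothetical names, and the only detail taken from e820 itself is that type code 1 marks usable RAM. It keeps a sorted list of (base, size) ranges and merges ranges that overlap or touch as they are added.

```python
E820_RAM = 1  # e820 type code for usable RAM


class RegionList:
    """Sorted, non-overlapping list of (base, size) ranges (toy model)."""

    def __init__(self):
        self.regions = []

    def add(self, base, size):
        """Insert [base, base + size), merging overlapping or adjacent ranges."""
        end = base + size
        kept = []
        for rbase, rsize in self.regions:
            rend = rbase + rsize
            if rend < base or rbase > end:
                kept.append((rbase, rsize))  # Disjoint and not touching: keep.
            else:
                # Overlapping or touching: absorb it into the new range.
                base, end = min(base, rbase), max(end, rend)
        kept.append((base, end - base))
        self.regions = sorted(kept)


def memory_from_e820(entries):
    """Build a RegionList of usable RAM from (addr, size, type) tuples."""
    memory = RegionList()
    for addr, size, kind in entries:
        if kind == E820_RAM:
            memory.add(addr, size)
    return memory
```

With a made-up table containing RAM at 0x0 to 0x9f000, a reserved hole, and two RAM entries that meet at 0x40000000, the two upper entries collapse into a single region while the reserved entry is skipped.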

Multitouch. Just in time for this author to buy a shiny new MacBook Pro that suffers from the same problem (and also uses the nouveau driver, that has had its own interesting ride recently), the discussion of multitouch finger tracking was raised again. Modern (laptop) hardware touchpads feature an ability to accurately track the position of multiple fingers at a time, and this allows for the kinds of gestures that are becoming popular today. At the same time, the X Window system that powers most graphical Linux desktops today has only minimal support and cannot handle such things as click and drag with two fingers. This means that your author has to use a custom hacked up mouse driver to support click and drag. I’m not the only one, and this prompted Henrik Rydberg to wonder recently whether it was time to add software finger tracking into the kernel. He pointed to an X.org discussion that had originally raised the idea back in summer 2009. Having discounted the idea then, he was now much more amenable to reconsidering. It seems likely that something will happen; it’s just a question of whether it will be directly in the input layer, in a new mtdev handler, or in an external library that is provided for userspace code to link against. In any case, your author is glad to see this in kernel, where it belongs.
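
For readers wondering what “software finger tracking” involves, the core problem is deciding which contact in the current frame is the same finger as a contact in the previous frame. The sketch below uses simple greedy nearest-neighbor matching; it is an illustration of the general idea only and makes no claim about the algorithm Henrik proposed or that any kernel or mtdev code uses. The function name track_contacts, its parameters, and the max_jump threshold (in arbitrary device units) are all assumptions.

```python
import math
from itertools import count


def track_contacts(previous, points, next_id, max_jump=50.0):
    """Assign a tracking ID to each point in the current frame.

    previous: dict mapping tracking ID -> (x, y) from the last frame.
    points:   list of (x, y) contacts reported in the current frame.
    next_id:  iterator yielding fresh IDs for newly landed fingers.
    Returns a dict mapping tracking ID -> (x, y) for the current frame.
    """
    # Every (distance, old ID, new index) pairing, closest first.
    pairs = sorted(
        (math.dist(old, new), tid, idx)
        for tid, old in previous.items()
        for idx, new in enumerate(points)
    )
    current, used = {}, set()
    for distance, tid, idx in pairs:
        if distance > max_jump:
            break  # Remaining pairs are too far apart to be the same finger.
        if tid in current or idx in used:
            continue  # This finger or this point is already matched.
        current[tid] = points[idx]
        used.add(idx)
    # Unmatched points are new touches; unmatched old IDs have lifted off.
    for idx, point in enumerate(points):
        if idx not in used:
            current[next(next_id)] = point
    return current
```

Calling it once per frame with a shared itertools.count() as next_id keeps IDs stable while fingers move, even if the hardware reports the contacts in a different order from one frame to the next.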

PHY and phylib. Stefani Seibold posted in a thread entitled “fix PHY polling system blocking”, inquiring about the existing implementation for PHY link detection with MII (Media Independent Interface – the means through which network MAC chips communicate portably with various possible PHYs). The existing mechanism does not always use interrupts and can block for a few milliseconds (up to 4ms in one example with e100), while the chip that Stefani is using sees approximately 450us delay. Stefani made various proposals for adjusting the existing phylib, one of which was explicitly disliked by David Miller because it would break link-type changes.

VM. Mel Gorman followed up on a previous patch he had posted (which addressed a concern that, for an IO intensive workload running with little available RAM, the VM may be calling congestion_wait in cases where something other than strict congestion is at fault) with some test results showing that the number of times kswapd and the page allocator call congestion_wait, and the time they spend in that function, have been increasing since 2.6.29. Quoting Mel, “120+ kernels and a lot of hurt later;”. He posted very detailed test reproducer information, noting that the increase in calls to congestion_wait wasn’t due to any one change, and itemizing a few of the recent changes that have played a part. These include the TTY layer using higher order allocations more frequently, some CFQ fairness changes, and so on. He, Rik van Riel, Corrado Zoccolo, and Johannes Weiner bounced ideas around about the real reasons for performance regressions on the IO workload that was being tested. Simply adding more RAM was not the point.

VMware. Dmitry Torokhov posted an RFC patch implementing a virtio extension for the VMware balloon driver. Balloon drivers allow for virtualized guests to expand and contract their memory requirements at runtime, through a co-operative interaction with the hypervisor. In the case of VMware, Dmitry says VMware are interested in using the existing Linux virtio framework to communicate between Linux guests and the VMware hypervisor, but with a few tweaks – for example, their hypervisor may refuse to lock certain pages, or may (under certain circumstances) reset the balloon via a notification to the guest, without requiring the guest to explicitly notify on every page released back to the hypervisor as a consequence. Dmitry is interested in various other capabilities that could be exposed over virtio but is first interested to hear from the Linux community. So far that community is only represented in replies by Avi Kivity (KVM), who favors VMware having their own balloon driver, or splitting out a shared “balloon core”.
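
The two tweaks described above (a hypervisor that may refuse individual pages, and a reset that skips per-page notifications) are easier to see in a toy model. Everything below is hypothetical and for illustration only: the ToyBalloon class, its methods, and the lock_page callback standing in for the hypervisor are not taken from Dmitry’s patch or from virtio.

```python
class ToyBalloon:
    """Toy model of a guest balloon; all names here are hypothetical."""

    def __init__(self, free_pages, lock_page):
        self.free = set(free_pages)  # Page numbers the guest can still use.
        self.ballooned = set()       # Page numbers handed to the hypervisor.
        self.lock_page = lock_page   # Callback standing in for the hypervisor.
        self.unlock_calls = 0        # Per-page release notifications sent.

    def inflate(self, count):
        """Offer free pages until `count` are accepted; return how many were."""
        taken = 0
        for page in sorted(self.free):
            if taken == count:
                break
            if self.lock_page(page):  # The hypervisor may refuse a page.
                self.free.remove(page)
                self.ballooned.add(page)
                taken += 1
        return taken

    def deflate(self, count):
        """Give up to `count` pages back to the guest, notifying per page."""
        released = sorted(self.ballooned)[:count]
        for page in released:
            self.unlock_calls += 1  # One notification for each released page.
            self.ballooned.remove(page)
            self.free.add(page)
        return len(released)

    def reset(self):
        """Hypervisor-initiated reset: reclaim all pages, no per-page calls."""
        self.free |= self.ballooned
        self.ballooned.clear()
```

In this model, a reset after a large inflate returns every page to the guest while unlock_calls stays unchanged, which is the saving the proposed extension is after.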

In today’s miscellaneous items:

* Brian Gerst posted version 2 of a patch implementing merged fpu and simd exception handlers in one function.

* The final round of task_struct->signal stability cleanups from Oleg Nesterov.

* Support for nested pid namespaces from Serge E. Hallyn.

* A patch from Jason Baron implementing support for enabling the kmemleak checker and memory hotplug support simultaneously in the kernel config.

* Some changes to TAINT_ flag handling from Ben Hutchings (intended to distinguish non-harmful errors such as missing firmware from more serious issues that would traditionally have set the taint flag).

* Some work in progress discussion about reading remapped performance counters on x86 systems from Stephane Eranian (but the current patch breaks the already working implementation on POWER/PowerPC).

* The latest version (5) of the Memory Compaction patches from Mel Gorman.

* A patch allowing different tracers to be compiled independently from Jan Kara.

* The latest version (5) of the Jump Label patches from Jason Baron.

* An ARM port of the Linux Checkpoint-Restart patches from Christoffer Dall.

In today’s announcements:

The latest kernel release on the original date of this podcast was 2.6.34-rc2, which was released on March 19th. The current release is a higher revision.

Rafael J. Wysocki posted a list of reported regressions from 2.6.32 and 2.6.33 that were still possibly affecting 2.6.34-rc2.

Git 1.7.0.3. Junio C Hamano announced that version 1.7.0.3 of GIT is available. The latest release includes fixes for ACL support on the underlying filesystem, and various other fixes also.

IIO mailing list. Jonathan Cameron announced the creation of a new “Industrial input / output” mailing list since a lot of such discussions had been happening off list already. The new (majordomo) list is linux-iio@vger.kernel.org, and can be subscribed to via sending email to majordomo@vger.kernel.org as usual.

SystemTAP version 1.2. Frank Ch. Eigler announced the release of SystemTAP version 1.2 by posting some release notes. This includes various fixes for use with kernel versions 2.6.9 through 2.6.34-rc.

util-linux-ng v2.17.2. Karel Zak announced version 2.17.2 of the util-linux-ng package. This is a bugfix release.

Sachin Sant reported a hotplug test failure on -rc2, and Rafael J. Wysocki posted a link to an existing patch that corrected the problem.

Frederic Weisbecker inquired as to whether anyone would mentor the Linux Wireless Google Summer of Code (GSoC) project, to which there were no replies. Therefore it seems that some folks at Portland State University will be asking around amongst the student population for interested parties.

Finally today, Michael Gilbert noted that CVE-2009-4537 had been publicly disclosed for a while but an official (non-vendor) fix was not upstream. Neil Horman said he would take care of making a posting about it, and he did post an official fix for the r8169 frame length error a few days later.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes