2009/06/18 Linux Kernel Podcast

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090618.mp3

Support for this Podcast comes from an unhealthy amount of coffee. Mine’s a double Americano, what’s yours?

For Thursday, June 18th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: The continuing 2.6.31 merge window, direct mmap for FUSE/CUSE, racing in TCP receive, problems with sys_mount(), and kernel.org front page kernels.

We’re playing catch-up here, largely because this is the first merge window this podcast has had to cover and it takes, well, a certain mindset.

The Continuing 2.6.31 merge window

Dynamic per-cpu. Tejun Heo posted an updated per-cpu git tree for 2.6.31 that takes into account many of the recent per-cpu fixes (including dynamic allocation of per-cpu data). Linus objected to the tree on the grounds that it hadn’t been in linux-next, and had been created only moments before posting, with (potentially) little time for testing. Andrew Morton re-affirmed the lack of linux-next usage, adding “If this doesn’t mean ‘you missed 2.6.31’ then what does?” (he did also observe that there are some special cases such as this, where some critical core kernel feature is modified and it’s not just “an ordinary old git merge like all the others”). The situation was clarified by Tejun: the git tree was being created from quilt patches that had been posted a number of times already, but there had been a glitch in the quilt import. He agreed that the lack of exposure in linux-next warranted delaying until 2.6.32 and stated that he would prep a tree for Stephen to pick up in linux-next soon.

Making executable pages the first class citizen. This podcast has covered this patch series several times before, but it is worth noting some feedback now that it has hit mainline, as Jesse Barnes pointed out. He found that one of his sample workloads went from creating an unusable machine to simply a slightly sluggish machine. Fengguang Wu was happy to hear this, but keen to point out that Rik van Riel had also helped, with his work protecting active file LRU pages from being flushed by streaming IO. On a VM tangent, Fengguang Wu also posted in response to the ongoing HWPOISON patchset with a modified version of the “only early kill processes who installed SIGBUS handler” patch, which only does so for processes that register an interest via a prctl. This allows applications to be modified easily, without breaking the existing expectations of applications currently deployed in the field.

Fixing returning from kernel to tasks with a 16-bit stack. Alexander van Heukelum posted a detailed explanation and patch series, describing a bug in the kernel support (on x86 systems) for returning from the kernel into userspace tasks that use a 16-bit stack. Obviously, this doesn’t happen too often, but it does in emulation software such as WINE and dosemu. Due to a quirk in the manner in which an Intel processor restores state in such situations, only the lower 16 bits of the userspace stack pointer are restored, while the upper 16 bits are kept from the kernel stack. The kernel has an existing special “espfix” segment that is abused to ensure that the upper 16 bits of the returning stack pointer will be correct, but this wasn’t always being set up correctly, especially not in a return from NMI.
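For those following along in text, the quirk described above comes down to a little bit arithmetic. What follows is only a userspace model, not kernel code and not anything from Alexander’s series: the function name esp_after_iret16 and the sample addresses are made up for illustration, and the model assumes nothing more than “low half restored, high half left over from the kernel stack”.

```c
#include <stdint.h>

/*
 * Model of the quirk (hypothetical helper, not a kernel function):
 * on a return to a 16-bit stack segment only the low 16 bits of the
 * stack pointer are restored; the high 16 bits are whatever the
 * kernel stack pointer held at the time.
 */
static uint32_t esp_after_iret16(uint32_t kernel_esp, uint32_t user_esp)
{
    uint32_t high_from_kernel = kernel_esp & 0xffff0000u;
    uint32_t low_from_user    = user_esp   & 0x0000ffffu;

    return high_from_kernel | low_from_user;
}
```

With a made-up kernel stack pointer of 0xc1234f00 and a made-up user stack pointer of 0x0000fff0, the model returns 0xc123fff0: the low half is what the task expects, but the high half belongs to the kernel stack. That stray high half is what the espfix trick is meant to correct.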

Architecture updates include: microblaze (generic headers switch), and Super H fixes from Paul Mundt. On a tangent, it looks like John Williams (the author of the microblaze port) has got a new .com email, possibly indicating a move.

Miscellaneous updates include: md updates from Neil Brown (including support for non-power of two chunk sizes in RAID0), ftrace updates from Steven Rostedt (including support for bypassing read locks inside the NMI handler; as you may know, Steven’s unique page swapping on read means we only need a lock on read, not on write to an active ring_buffer), a trivial documentation update to kthread_stop from Oleg Nesterov (reminding everyone that kernel threads can now call do_exit and be kthread_stop()ed, where the two were previously mutually exclusive), cleanups to MAINTAINERS from Joe Perches, ext4 updates from Ted Ts’o, some relatively straightforward network stuff from David Miller (including wireless bits from John Linville, and bug fixes for NetXen and E100), and minimal HTC Dream support (Google Android) via a reposted patch series from Brian Swetland (including some patches signed off by the somewhat quieter these days Robert Love).

Apologies to Gregory Haskins for not covering the latest iteration of his irqfd and eventfd work in detail, since it hasn’t changed hugely. But if you’d like to read about precisely how network packets are received and routed to KVM via vbus, take a look at the latest eventfd thread.

Non-merge specific concerns

Implementing direct mmap for FUSE/CUSE. Tejun Heo was busy today. In addition to posting per-cpu updates, he also posted the third version of a patchset implementing direct mmap support for FUSE/CUSE. This allows users of a FUSE filesystem to request an mmapped region, which will be satisfied on the backend by a kernel anonymous mapping, and still populated by the FUSE userspace server. The server gets to decide how mappings are shared, so this has additional performance benefits for those implementing on FUSE/CUSE.

A rare race in TCP receive. Jiri Olsa posted to say that he had found a rare race in the TCP layer using an older RHEL4 kernel (that happens to be based upon 2.6.9, which is fairly long in the tooth). It turned out that, because of a missing smp_mb() and a combination of known errata in certain Intel CPUs, it was possible for tp->rcv_nxt updates made by one CPU to not propagate correctly to the others, resulting in a system sleeping forever. Jiri posted a patch citing the various errata and documentation, and including a fairly comprehensive analysis of the situation, although he said that he could not reproduce this upstream due to the rarity of its occurrence.
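The general shape of a lost wakeup like this one can be sketched in userspace with C11 atomics. This is a rough analogue under stated assumptions, not Jiri’s patch: the variable and function names below are hypothetical stand-ins, and atomic_thread_fence(memory_order_seq_cst) plays the role that smp_mb() plays in the kernel. Each side stores its own flag and then loads the other side’s; without a full barrier between the store and the load, both sides can read stale values, and the sleeper is never woken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the shared state; not the kernel's structures. */
static atomic_uint rcv_nxt;        /* sequence number advanced by the receive path */
static atomic_bool waiter_queued;  /* set by a task that is about to sleep */

/* Receive side: publish new data, then check whether anyone needs a wakeup. */
static bool receiver_should_wake(unsigned int new_rcv_nxt)
{
    atomic_store_explicit(&rcv_nxt, new_rcv_nxt, memory_order_relaxed);
    /* Full barrier (userspace analogue of smp_mb()): keeps the store
     * above from being reordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&waiter_queued, memory_order_relaxed);
}

/* Sleeper side: announce the intent to sleep, then re-check for new data. */
static bool sleeper_may_sleep(unsigned int last_seen)
{
    atomic_store_explicit(&waiter_queued, true, memory_order_relaxed);
    /* Matching full barrier on the sleeping side. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&rcv_nxt, memory_order_relaxed) == last_seen;
}
```

With both fences in place, it cannot happen that the receiver sees no waiter and, at the same time, the sleeper sees no new data. Remove either fence and that outcome becomes possible on hardware that lets a store be overtaken by a later load, which is the kind of window the missing smp_mb() left open.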

Fixing an overflow in sys_mount(). Today’s tip of the hat goes to Vegard Nossum, who diligently tracked down a bug reported by Ingo Molnar. It turns out that kernel code calling sys_mount() can be bitten by the fact that the aforementioned function will copy an entire page passed for the “type” parameter, even though far less data is typically required for this string. If the content of the page happens to contain stray “wild” pointers, we might follow those and wreak some random havoc. Vegard (obviously) suggests stopping after we find the first NUL byte.
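The idea of stopping at the first NUL, rather than copying a whole page regardless, looks roughly like this in plain C. This is a sketch only and not Vegard’s patch: copy_string_bounded and SKETCH_PAGE_SIZE are names invented here, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/*
 * Copy a string argument into a page-sized buffer, stopping at the
 * first NUL byte instead of reading a full page from the source.
 * Returns the string length (excluding the terminator), or -1 if no
 * NUL was found within one page, in which case dst is truncated.
 */
static long copy_string_bounded(char *dst, const char *src)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return (long)i;
    }
    dst[SKETCH_PAGE_SIZE - 1] = '\0';
    return -1;
}
```

Called with a destination buffer of SKETCH_PAGE_SIZE bytes and the string "ext3", this reads five bytes from the source (four characters plus the terminator) and returns 4, never touching whatever happens to sit after the string in memory.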

Finally today, Randy Dunlap resurrected an email thread from several weeks ago in which it was proposed that references to the old “mm” tree be removed from the front page of kernel.org. He added that 2.2 kernels might go the same way.

The latest kernel release is 2.6.30, which was released by Linus on June 9th.

Stephen Rothwell posted a linux-next tree for June 18th. Since Wednesday, the tree contains a few fixes, some conflicts due to deltas between Linus’ ongoing changes to his tree and developer trees, and the tree still fails to build in an allyesconfig build configuration for powerpc.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.
