
2009/09/17 Linux Kernel Podcast

October 3rd, 2009 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090917.mp3

For Thursday, September 17th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: devtmpfs, performance counters, ummunotify, VFS, and VMI.

Devtmpfs. Eric W. Biederman posted a patch entitled “Remove broken by design and by implementation devtmpfs maintenance disaster”, which was bound to attract some attention. Devtmpfs is a recent effort from Kay Sievers (and several others) to implement a minimal subset of udev-style functionality in the kernel: a device tmpfs for pre-udev or non-udev environments. It isn’t intended to replace userspace device management (udev came about following previous attempts at in-kernel device filesystems such as Richard Gooch’s devfs), but it can help with initial device node population. Eric wasn’t buying it, though, and criticized devtmpfs for breaking tmpfs, not handling errors, living in the wrong location in the kernel tree, having an incorrect Kconfig, and fundamentally having a “bogus” justification for existing. Eric believes that it doesn’t solve any hotplug problems (which, in his opinion, can be addressed with a static /dev), and that doing the work in userspace is not slower but is more flexible. He also took issue with the way devtmpfs was developed and the fashion in which it was merged, going so far as to suggest that review issues were “dismissed, ignored, or met with lies”.

Kay answered each of Eric’s points calmly and with careful reasoning. He pointed out that devtmpfs isn’t strictly a filesystem (it just populates a tmpfs superblock, which is why it doesn’t live in fs/) and isn’t intended to exist for speed purposes (he asked Eric to re-read the archive). He also explained again why static /dev setups are unreliable and unpredictable and why devtmpfs is useful, noting that it is being proposed by precisely the same people who develop the userspace tools and who generally prefer to keep as much hotplug functionality in userspace as they can. Overall, Kay’s response is well worth reading for anyone else who is confused about the raison d’être for devtmpfs. Greg’s reply was much shorter (and very much not liked by Eric, who said he wasn’t “relevant to this discussion”). Greg wondered aloud why Eric didn’t see fit to CC both him and Kay on the original message, saying “if I was a paranoid person, I would think that you were somehow trying to skirt around us for some unknown reason”. He deferred to Kay’s response (which Eric thought showed that Kay “wasn’t paying attention”) for the remaining technical justifications.

Alan Cox agreed with Eric that someone from the filesystem camp should sign off on devtmpfs. He would prefer that someone to be Al Viro (as Eric had suggested), and is less concerned about “devfs2” arguments than about whether any implementation of an in-kernel device filesystem is technically correct and “doesn’t screw stuff up”.
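
Kay’s point about static /dev being unreliable can be made concrete with a small comparison. The sketch below assumes a static /dev is simply a fixed table of node names and (major, minor) numbers, and checks it against the devices the kernel actually has at boot. It reports three kinds of discrepancy: missing nodes, stale nodes, and nodes whose numbers have changed (for example, because a driver uses a dynamically allocated major). A kernel-populated /dev avoids these in principle, because nodes are created from the kernel’s own device list. The function name and the data are hypothetical.

```python
def audit_static_dev(static_nodes, kernel_devices):
    """Compare a static /dev table against the devices the kernel actually has.

    Both arguments map a node name (e.g. "sda") to a (major, minor) pair.
    Returns three sorted lists of names:
      missing    - devices the kernel has but the static table lacks
      stale      - static nodes for devices that are not present
      mismatched - nodes whose (major, minor) no longer matches the kernel
    """
    static_names = set(static_nodes)
    kernel_names = set(kernel_devices)
    missing = sorted(kernel_names - static_names)
    stale = sorted(static_names - kernel_names)
    mismatched = sorted(
        name for name in static_names & kernel_names
        if static_nodes[name] != kernel_devices[name]
    )
    return missing, stale, mismatched


# Hypothetical example data, not taken from any real system.
static_dev = {"null": (1, 3), "sda": (8, 0), "hda": (3, 0), "hidraw0": (251, 0)}
kernel_now = {"null": (1, 3), "sda": (8, 0), "sdb": (8, 16), "hidraw0": (250, 0)}
print(audit_static_dev(static_dev, kernel_now))
# (['sdb'], ['hda'], ['hidraw0'])
```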

Performance Counters. Ingo Molnar announced a new utility in the ever-growing family of “perf” tools. “perf sched” is a utility to “capture, measure and analyze scheduler latencies and behavior”. It is intended to provide hard data toward the “ambitious goal” of using performance events to objectively characterize arbitrary workloads from a scheduling and latency point of view. Using “perf sched”, one can record and visualize various aspects of a workload’s scheduling behavior, such as latencies and context switches. Ingo includes several full examples, documentation, and a new branch in his “tip” tree that he asks Linus to consider merging immediately. Ingo clearly hopes the utility will enable more “apples to apples” comparisons between future competing scheduler implementations.
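
To show the kind of number such a tool reports, here is a sketch of per-task wakeup-to-run latency, in the spirit of a “perf sched latency” summary. It assumes a hypothetical, time-ordered list of (timestamp, event, task) tuples standing in for the scheduler wakeup and context-switch events that perf actually records. The tuple layout, event names, and function names are illustrative, not perf’s trace format.

```python
from collections import defaultdict


def wakeup_latencies(events):
    """Return {task: [latency, ...]} measured from wakeup to the next switch-in.

    `events` is a time-ordered iterable of (timestamp_us, kind, task) tuples,
    where kind is "wakeup" or "switch_in". This layout is hypothetical.
    """
    pending = {}                       # task -> timestamp of unserviced wakeup
    latencies = defaultdict(list)
    for ts, kind, task in events:
        if kind == "wakeup":
            pending.setdefault(task, ts)   # keep the earliest outstanding wakeup
        elif kind == "switch_in" and task in pending:
            latencies[task].append(ts - pending.pop(task))
    return dict(latencies)


def summarize(latencies):
    """Return {task: (count, average, maximum)} for each task's latencies."""
    return {
        task: (len(vals), sum(vals) / len(vals), max(vals))
        for task, vals in latencies.items()
    }


trace = [
    (100, "wakeup", "make"),
    (130, "switch_in", "make"),   # waited 30us
    (200, "wakeup", "make"),
    (210, "wakeup", "bash"),
    (215, "switch_in", "bash"),   # waited 5us
    (290, "switch_in", "make"),   # waited 90us
]
print(summarize(wakeup_latencies(trace)))
# {'make': (2, 60.0, 90), 'bash': (1, 5.0, 5)}
```

Reporting both the average and the maximum matters for scheduler comparisons: a scheduler can have a good average delay while still producing occasional long stalls.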

ummunotify. Peter Zijlstra followed up on the recent request that ummunotify be pulled, reviving an earlier comment (apparently from Anton Blanchard) that this functionality “might be integrated with perf-counters”, since performance counters already feature mmap() tracking and deliver events through an mmap() buffer. It’s a stretch, but one can see where Peter is coming from without going too far down the “let’s put everything in perf counters” road. Still, Roland Dreier didn’t think the two were a good fit. As he puts it, he is trying to let an application request notification when the mappings backing any of a (small) subset of its address ranges are invalidated, whereas performance counters provide no mechanism for tracking individual ranges (only all mmap() traffic). Peter thinks such a mechanism could still be added to performance counters.
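
The distinction Roland draws is between observing all mmap() traffic and watching only a few registered ranges. The toy model below illustrates the latter. An application registers a handful of [start, end) ranges, and an invalidation of some address span reports only the registrations it overlaps. The class and method names are hypothetical and are not the ummunotify interface.

```python
class RangeWatcher:
    """Toy model of per-range invalidation notification (not the ummunotify API)."""

    def __init__(self):
        self._ranges = {}          # cookie -> (start, end), end exclusive
        self._next_cookie = 1

    def register(self, start, end):
        """Watch [start, end) and return a cookie identifying the registration."""
        if end <= start:
            raise ValueError("empty range")
        cookie = self._next_cookie
        self._next_cookie += 1
        self._ranges[cookie] = (start, end)
        return cookie

    def unregister(self, cookie):
        self._ranges.pop(cookie, None)

    def invalidate(self, start, end):
        """Return sorted cookies of watched ranges overlapping [start, end)."""
        return sorted(
            cookie for cookie, (r_start, r_end) in self._ranges.items()
            if r_start < end and start < r_end
        )


w = RangeWatcher()
a = w.register(0x1000, 0x3000)
b = w.register(0x8000, 0x9000)
print(w.invalidate(0x2000, 0x2001))   # [1]
print(w.invalidate(0x3000, 0x8000))   # [] (ranges are half-open)
```

A linear scan keeps the sketch short. A real implementation would presumably use an interval tree so that invalidations stay cheap as the number of registered ranges grows.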

VFS. Jan Kara posted version 3 of a 7-part RFC patch series entitled “Improve VFS to handle better mmaps when blocksize < pagesize”. It is intended to solve problems that arise with writes through mmap() when the filesystem block size is smaller than the page size. Jan explains that we would like to use page_mkwrite() to allocate blocks for such writes (so that the filesystem can fail the page fault in error situations, such as running out of space), but cites a case where only one block is allocated for a page that, on a later write, suddenly needs additional blocks that ideally should have been allocated ahead of time. So far, ext2 and ext4 have apparently “survived some beating”, so Jan is seeking further comments.
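
Some arithmetic makes the problem concrete. Assuming 4096-byte pages and 1024-byte blocks (example values only), each page is backed by four blocks. Suppose the file ends 1000 bytes in when a page is first write-faulted. At that point only block 0 lies within the file and needs allocating. If the file is later extended while the page stays mapped and writable, blocks 1 to 3 become part of the file, possibly without any further page_mkwrite() call to allocate them. The helper names below are hypothetical and model only the arithmetic, not Jan’s patches.

```python
# Example sizes only: page and block sizes vary by architecture and mkfs options.
PAGE_SIZE = 4096
BLOCK_SIZE = 1024


def blocks_in_page(page_index, page_size=PAGE_SIZE, block_size=BLOCK_SIZE):
    """File block numbers that back page `page_index` (block_size < page_size)."""
    per_page = page_size // block_size
    first = page_index * per_page
    return list(range(first, first + per_page))


def blocks_inside_file(page_index, i_size, page_size=PAGE_SIZE, block_size=BLOCK_SIZE):
    """Blocks of the page that start below i_size, i.e. that are part of the file.

    A write fault can only sensibly allocate these; blocks wholly beyond EOF
    do not belong to the file yet.
    """
    return [b for b in blocks_in_page(page_index, page_size, block_size)
            if b * block_size < i_size]


# File is 1000 bytes when page 0 is first write-faulted: only block 0 is needed.
at_fault = blocks_inside_file(0, i_size=1000)
# Later the file grows to 4096 bytes while page 0 stays mapped and writable.
after_extend = blocks_inside_file(0, i_size=4096)
print(at_fault, after_extend, sorted(set(after_extend) - set(at_fault)))
# [0] [0, 1, 2, 3] [1, 2, 3]
```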

VMI. Alok Kataria posted to let everyone know that the folks at VMware have been running experiments comparing the performance of VMware’s paravirtualization technique (VMI) with the hardware MMU virtualization in recent Intel and AMD processors, on VMware’s hypervisor. They found that in most benchmarks, the hardware EPT/NPT technologies perform on par with or better than the older VMI approach. For this and other reasons explained in the email, VMware has decided to discontinue support for VMI, and requested comments on how best to go about “retiring” the VMI code from mainline Linux in due course. Others appreciated the heads-up and supported removing code that won’t be supported in future.

In today’s pull requests: a series of tracing fixes for 2.6.32 from Steven Rostedt, some MFD updates for 2.6.32 from Samuel Ortiz (requested twice, the second time to correct a problem with one of the drivers), some FUSE updates from Miklos Szeredi, some Blackfin patches from Mike Frysinger, a request to pull the async_tx tree in order to receive dmaengine, async_tx, and RAID6 updates from Dan Williams (the Intel one), some networking and SPARC updates from David Miller (including John Linville’s latest wireless updates), some sound fixes from Takashi Iwai, some further tracing updates from Steven Rostedt, some timechart patches from Arjan van de Ven, some libata updates from Jeff Garzik, some further tracing updates from Ingo Molnar, some scheduler updates from Ingo Molnar (which include the remainder of the load balancer rework, but which are very new and potentially carry some risk), some additional x86 fixes from H. Peter Anvin, and some x86/mce fixes from H. Peter Anvin.

In today’s miscellaneous items: a Makefile patch from Caveh Jalali fixing build problems for external modules on certain architectures, ongoing discussion of the mm-of-the-moment (mmotm) merge plans for 2.6.32, a series of USB console fixes for 2.6.32 from Jason Wessel, versions 2, 3, and then 4 of an RFC patch from Kamezawa Hiroyuki checking for negative f_pos handling (the 4th version introduces a new flag, S_VERYBIG, for which negative offsets will be allowed, covering certain special system files), version 3 of a patch series from Michael S. Tsirkin moving use_mm/unuse_mm from “aio” into the core “mm” directory, a defense of DRBD from Lars Marowsky-Bree (in response to typically blunt commentary from Christoph Hellwig), a suggestion from Mel Gorman that an in-kernel user of the altered hugetlbfs interface might need to be a prerequisite for merging the recent hugetlb patches, a fix for softirq_to_name from Li Zefan, a question from Jan Kara as to whether the nobh_ versions of various functions in fs/buffer.c are still useful (were they added because buffer heads consume memory? If so, why are only ext2/3/4 using them, and only in a limited capacity?), a patch from Christoph Hellwig killing off kernel markers (“now that the last users of markers have migrated to the event tracer code”), a patch supporting the recent addition to qemu of a VIRTIO_BLK_F_FLUSH flag marking a virtual disk as having a volatile write cache, a large number of mmotm merge plan comments from Oleg Nesterov, a report of a 2.6.31-rt10 crash from John Kacur (a stack corruption bug during a “make modules_install install”), a patch from David Rientjes adding support for dumping the stack and VM state on OOM kill, and version 3 of the “compcache” compressed swap patches from Nitin Gupta.
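
The f_pos item is easier to follow once you recall that the kernel’s file offset type, loff_t, is a signed 64-bit integer. A special file whose offsets are, for example, x86-64 kernel virtual addresses will see them become negative when they are stored in loff_t. The sketch below models the rule described for the 4th version: negative positions are rejected unless the file carries the S_VERYBIG flag. The flag’s bit value, the choice of EINVAL, and the function names are assumptions made for illustration; only the flag’s name comes from the patch.

```python
import errno
import struct

# Hypothetical bit value: only the flag's name, S_VERYBIG, comes from the patch.
S_VERYBIG = 1 << 0


def as_loff_t(offset_u64):
    """Reinterpret an unsigned 64-bit offset as a signed 64-bit loff_t."""
    return struct.unpack("<q", struct.pack("<Q", offset_u64))[0]


def check_f_pos(pos, inode_flags):
    """Return 0 if `pos` is acceptable, else a negative errno.

    Negative positions are rejected unless the file is flagged S_VERYBIG.
    Returning -EINVAL here is an assumption made for the sketch.
    """
    if pos < 0 and not (inode_flags & S_VERYBIG):
        return -errno.EINVAL
    return 0


# An x86-64 kernel virtual address used as a file offset turns negative as loff_t.
kaddr_pos = as_loff_t(0xffff880000000000)
print(kaddr_pos < 0)                                # True
print(check_f_pos(kaddr_pos, 0) == -errno.EINVAL)   # True
print(check_f_pos(kaddr_pos, S_VERYBIG))            # 0
```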

Finally today, Linus rants about how everything could be implemented as a system call (including, in some ideal world, even representing page faults as pseudo-system calls for tracing purposes), and criticizes “idiotic packet interface[s]” (he was replying to a thread discussing fanotify) that are “just a fancy way to do ioctl’s, and everybody knows that ioctl’s are bad and evil. Why are fancy packet interfaces suddenly much better?”. Why indeed :) Arjan van de Ven decided that adding page faults to his timechart utility was probably a good idea, based on Linus’ tracing comment.

The latest kernel release was 2.6.31.

Ingo Molnar notes that a previous round of PCI updates from Jesse Barnes has been causing “nasty bootup crashes” in the PCI code for -tip. He cites an example of such a failure.

Andrew Morton posted an mm-of-the-moment for 2009-09-17-18-00.

Stephen Rothwell posted a linux-next tree for September 17th. Since Wednesday, the blackfin tree has changed location and owner, the ia64 tree has returned after its maintainer came back from vacation, and the reiserfs-bkl tree (part of the ongoing effort to kill off the Big Kernel Lock, or BKL) was temporarily removed. Stephen reports that conflicts are still bouncing from one tree to another as Linus merges trees: the tty.current, input-current, blackfin, ia64, ext4, rr, and nfsd trees had issues, while the thumb-2, microblaze, sh, pci, and driver-core trees lost their previous issues. The total sub-tree count remained steady at 140 trees in the latest compose.

Eric Paris reports that linux-next trees after September 14th are unbootable on his KVM guests. He posts a bisect that comes down to a series of scheduler fixes, along with the panic message from his Fedora system.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.
