Archive for October 3rd, 2009

7 catchup podcasts, more to come shortly

October 3rd, 2009 jcm No comments

I just recorded 7 podcasts and have a few more going through production at the moment. I am doing my best to get up to date today…here’s hoping. Of course, feel free to volunteer to help :)

Jon.

Categories: Uncategorized Tags:

2009/09/17 Linux Kernel Podcast

October 3rd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090917.mp3

For Thursday, September 17th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: devtmpfs, performance counters, ummunotify, VFS, and VMI.

Devtmpfs. Eric W. Biederman posted a patch entitled “Remove broken by design and by implementation devtmpfs maintenance disaster”, which was bound to get some attention in the process. Devtmpfs is a recent effort from Kay Sievers (and several others) to implement some of the better minimal features of an in-kernel udev-style device tmpfs for pre-udev or non-udev environments. It isn’t intended to be a replacement for a userspace device filesystem (udev came about following previous attempts at in-kernel filesystems such as Richard Gooch’s devfs), but can help with initial device node population. Eric wasn’t buying it though, and criticized devtmpfs for breaking tmpfs, not handling errors, being in the wrong kernel tree location, having an incorrect Kconfig, and fundamentally having a “bogus” justification for existing. He (Eric) believes that it doesn’t solve any hotplug problems (which can be addressed – in his opinion – with a static dev), and that doing it in userspace is not slower but is more flexible. He also takes issue with the way devtmpfs was developed and the fashion in which it was merged, especially suggesting that review issues were “dismissed, ignored, or met with lies”.

Kay answered each of Eric’s points calmly and with a carefully reasoned response, which included pointing out devtmpfs isn’t strictly a filesystem (it just populates a tmpfs superblock, which is why it doesn’t live in fs/), isn’t intended to exist for speed purposes (he asks Eric to re-read the archive), and includes another explanation of how static /dev filesystems are unreliable and unpredictable and why this is useful, given that it is being proposed by precisely the same people who develop the userspace tools and generally prefer to have as much hotplug functionality in userspace as they can. Overall, Kay’s response is very much worth reading for anyone else who is confused about the raison d’etre for devtmpfs. Greg’s reply was much shorter (and very much not liked by Eric, who said he wasn’t “relevant to this discussion”). Greg wondered aloud why Eric didn’t see fit to CC both him and Kay on the original message, saying “if I was a paranoid person, I would think that you were somehow trying to skirt around us for some unknown reason”. He deferred to Kay’s response (which Eric thought showed how Kay “wasn’t paying attention”) for the remaining technical justifications.

Alan Cox agreed with Eric’s assertion that someone in the filesystem camp should sign off on devtmpfs. He prefers that someone to be Al Viro (as Eric had suggested), and is less concerned about arguments of “devfs2” than whether any implementation of an in-kernel device filesystem is technically correct and “doesn’t screw stuff up”.

Performance Counters. Ingo Molnar announced a new utility forming part of the ever-growing family of “perf” tools. “perf sched” is a utility to “capture, measure and analyze scheduler latencies and behavior”. It is intended to provide hard data to meet the “ambitious goal” of using performance events to objectively characterize arbitrary workloads from a scheduling and latency point of view. Using “perf sched”, one can record and visualize various aspects of scheduling workloads such as latencies and context-switches. Ingo includes several full examples, documentation, and a new branch on his “tip” tree that he requests Linus consider merging into the tree immediately. Ingo would clearly prefer this utility for more “apples to apples” type of comparisons between future competing scheduler implementations.

ummunotify. Peter Zijlstra followed up to the recent request that ummunotify be pulled in, reviving a previous comment (apparently from Anton Blanchard) that this stuff “might be integrated with perf-counters” (since performance counters already features mmap() tracking and provides events through an mmap buffer). It’s a stretch, but one can see where Peter is coming from without going too far down the “let’s put everything in perf. counters” road. Still, Roland Dreier didn’t think it was a good fit to be integrating these. As he puts it, he’s trying to solve the problem of allowing an app to request notification when a (small) subset of its address ranges are backed by mappings that then become invalidated, whereas performance counters doesn’t provide a mechanism to track individual ranges (only all mmap() traffic). Peter thinks this could still be added to performance counters though.
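To make the distinction Roland draws a little more concrete, here is a minimal sketch of per-range tracking. It is a toy model only: the class name RangeWatcher, its methods, and the "cookie" labels are hypothetical and are not ummunotify's actual interface. An application registers a few address ranges it cares about, and an invalidation reports only the registered ranges it overlaps.

```python
class RangeWatcher:
    """Toy model of per-range invalidation tracking (hypothetical API)."""

    def __init__(self):
        # cookie -> (start, end), using half-open ranges [start, end)
        self._ranges = {}

    def register(self, cookie, start, end):
        """Ask to be told when any part of [start, end) is invalidated."""
        if start >= end:
            raise ValueError("range must be non-empty")
        self._ranges[cookie] = (start, end)

    def invalidate(self, start, end):
        """Return the cookies of registered ranges overlapping [start, end)."""
        return sorted(
            cookie
            for cookie, (s, e) in self._ranges.items()
            if s < end and start < e  # half-open interval overlap test
        )


watcher = RangeWatcher()
watcher.register("buf_a", 0x1000, 0x3000)
watcher.register("buf_b", 0x8000, 0x9000)

# An unmapping of [0x2000, 0x4000) touches only buf_a.
hit = watcher.invalidate(0x2000, 0x4000)

# An unmapping that falls entirely between the two buffers touches neither.
miss = watcher.invalidate(0x3000, 0x8000)
```

By contrast, a mechanism that reports all mmap() traffic would hand the application every event and leave this filtering to userspace.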

VFS. Jan Kara posted version 3 of a 7-part RFC patch series entitled “Improve VFS to handle better mmaps when blocksize < pagesize”. This is intended to solve problems that arise with mmap()’d writes when the blocksize is less than the pagesize. Jan explains that we would like to use page_mkwrite() to allocate such blocks (the filesystem can return a page fault in certain error situations), but cites a situation where only one block is allocated for a page that – on later write – suddenly needs additional blocks allocated (that we ideally should have allocated ahead of time). So far, apparently ext2 and ext4 have “survived some beating”, so Jan is seeking further comments.
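The arithmetic behind that situation can be sketched as follows. This is only an illustration: the 4096-byte page, the 1024-byte block, the write offsets, and the helper name blocks_touched are all assumptions chosen for the example, not values taken from Jan’s patches.

```python
def blocks_touched(page_size, block_size, offset, length):
    """Return the indices (within one page) of the filesystem blocks
    covered by a write of `length` bytes starting at byte `offset`."""
    if page_size % block_size != 0:
        raise ValueError("page size must be a multiple of block size")
    if length <= 0 or offset < 0 or offset + length > page_size:
        raise ValueError("write must lie within the page")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


PAGE_SIZE = 4096   # assumed page size
BLOCK_SIZE = 1024  # assumed filesystem block size

blocks_per_page = PAGE_SIZE // BLOCK_SIZE

# A first small write near the start of the page needs only one block.
first_write = blocks_touched(PAGE_SIZE, BLOCK_SIZE, offset=0, length=100)

# A later write to the same (already writable) page spans further blocks.
later_write = blocks_touched(PAGE_SIZE, BLOCK_SIZE, offset=1000, length=2000)

# Blocks that were not allocated the first time around.
newly_needed = sorted(set(later_write) - set(first_write))
```

With these numbers a page holds four blocks, the first write allocates block 0, and the later write needs blocks 1 and 2 as well, which is the kind of late allocation the paragraph above describes.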

VMI. Alok Kataria posted to let everyone know that the folks at VMware have been performing experiments to compare the performance of VMware’s paravirtualization technique (VMI) with modern hardware MMU technologies in recent Intel and AMD processors, on VMware’s hypervisor. They found that in most of the benchmarks, the hardware EPT/NPT technologies are on par with or provide better performance than the older VMI approach. For this and other reasons explained in the email, VMware have decided to discontinue support for VMI and they request comments on how best to go about “retiring” the VMI code from mainline Linux in due course. Various others were appreciative of the heads up and supportive of removing code that won’t be supported in future.

In today’s pull requests: a series of tracing fixes for 2.6.32 from Steven Rostedt, some MFD updates for 2.6.32 from Samuel Ortiz (requested twice, the second time without one of the drivers being incorrect), some FUSE updates from Miklos Szeredi, some Blackfin patches from Mike Frysinger, a request to pull the async_tx tree in order to receive dmaengine, async_tx, and RAID6 updates from Dan Williams (the Intel one), some networking and SPARC updates from David Miller (including John Linville’s latest wireless updates), some sound fixes from Takashi Iwai, some further tracing updates from Steven Rostedt, some timechart patches from Arjan van de Ven, some libata updates from Jeff Garzik, some further tracing updates from Ingo Molnar, some scheduler updates from Ingo Molnar (which include the remainder of the rework on the load balancer rewrite, but which are very new and potentially have some risk), some additional x86 fixes from Peter Anvin, and some x86/mce fixes from Peter Anvin.

In today’s miscellaneous items: a Makefile patch from Caveh Jalali fixing build problems for external modules on certain architectures, ongoing discussion of mm-of-the-moment (mmotm) merge plans for 2.6.32, a series of USB console fixes for 2.6.32 from Jason Wessel, version 2 (and then 3, and then 4) of an RFC patch checking for negative f_pos handling from Kamezawa Hiroyuki (the 4th version introduces a new flag S_VERYBIG for which negative offsets will be allowed – covering certain special system files), version 3 of a patch series moving use_mm/unuse_mm from “aio” into the core “mm” directory from Michael S. Tsirkin, a defense of DRBD from Lars Marowsky-Bree (in response to a – typically blunt – commentary from Christoph Hellwig), a suggestion from Mel Gorman that having an in-kernel user of the altered hugetlbfs interface might need to be a prerequisite to merging of the recent hugetlb patches, a fix for softirq_to_name from Li Zefan, a question from Jan Kara as to whether the nobh_ versions of various functions in fs/buffer.c are still useful (was it because buffer heads consume memory? if so, why are only ext2/3/4 using them, and only in a limited capacity?), a patch killing off kernel markers (“now that the last users of markers have migrated to the event tracer code”) from Christoph Hellwig, a patch supporting the recent addition to qemu of a VIRTIO_BLK_F_FLUSH flag marking a virtual disk as having a volatile write cache, a large number of mmotm merge plan comments from Oleg Nesterov, a report of a 2.6.31-rt10 crash from John Kacur (a stack corruption bug during a “make modules_install install”), a patch adding support for dumping the stack and VM state on OOM kill from David Rientjes, and version 3 of the “compcache” compressed swap patches from Nitin Gupta.

Finally today, Linus rants about how everything could be implemented as a system call (including, in some ideal world, even representing page faults as pseudo-system calls for tracing purposes), and criticizes “idiotic packet interface[s]” (he was replying to a thread discussing fanotify) that are “just a fancy way to do ioctl’s, and everybody knows that ioctl’s are bad and evil. Why are fancy packet interfaces suddenly much better?”. Why indeed :) Arjan van de Ven decided that adding page faults to his timechart utility was probably a good idea, based on Linus’ tracing comment.

The latest kernel release was 2.6.31.

Ingo Molnar notes that a previous round of PCI updates from Jesse Barnes have been causing “nasty bootup crashes” in the PCI code for -tip. He cites an example of such a failure.

Andrew Morton posted an mm-of-the-moment for 2009-09-17-18-00.

Stephen Rothwell posted a linux-next tree for September 17th. Since Wednesday, the blackfin tree changed location and owner, the ia64 tree returned after the author came back from vacation, and the reiserfs-bkl tree (part of the ongoing effort to kill off the Big Kernel Lock or BKL) was temporarily removed. Stephen reports that conflicts are still bouncing from one tree to another as Linus merges trees: the tty.current, input-current, blackfin, ia64, ext4, rr, and nfsd trees had issues, while the thumb-2, microblaze, sh, pci, and driver-core trees lost their previous issues. The total sub-tree count remained steady at 140 trees in the latest compose.

Eric Paris reports that linux-next trees after September 14th are unbootable on his KVM guests. He posts a bisect that comes down to a series of scheduler fixes, along with the panic message from his Fedora system.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/16 Linux Kernel Podcast

October 3rd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090916.mp3

For Wednesday, September 16th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Blackfin, Fanotify, KVM, and tracing.

Blackfin. Daniel Walker ranted about Mike Frysinger’s posting of Blackfin patches to the LKML. Mike responded that he hadn’t seen any complaints from others about large patch series being posted, and that those with “sane” mail clients wouldn’t have to deal with much more than skipping an entire thread anyway. Andrew Morton then steered the conversation towards why Blackfin wasn’t being pulled into linux-next, to which Mike replied with a request for further information about the process for getting his patches into Stephen’s tree.

Fanotify. In response to comments from Linus on the previous afternoon (in which Linus questioned “what’s so wonderful about fanotify that we would actually want yet-another-filesystem-notification-interface”), Eric Paris posted a rather detailed overview of the kinds of features offered by his interface. Chiefly, Eric cites the fact that fanotify passes an open fd with each event rather than “some arbitrary ‘watch descriptor’”, along with an extensible data format, and a commitment from several anti-malware companies to use the new interface once it is available in the mainline kernel. Jamie Lokier sent a rather terse (however quite lengthy) response that criticised various features not currently available with fanotify (such as subtree notification) and suggested that his main concern was avoiding mistakes made with the previous inotify and dnotify mechanisms, while encouraging Eric to consider other use cases not covered by anti-malware consumers.

KVM. Today being a KVM Wednesday, there were a number of patches posted. First off were some rather cool looking patches from Avi Kivity implementing “just in time” MSR switching. These aim to reduce the need to perform expensive MSR update operations on guest pre-emption. KVM already optimizes to avoid such writes on every guest entry/exit, but it will now defer to the very last possible moment before performing such an update. Not to be outdone, Joerg Roedel posted some nested SVM “fixes and cleanups” as well.
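The general idea of deferring an expensive register update until it is actually needed can be sketched with a small simulation. This is a toy model, not KVM code: the class ToyMsr, the simulate() helper, and the register values are all hypothetical, and the model simply counts how many hardware writes an eager policy and a deferred policy would perform.

```python
class ToyMsr:
    """Toy model of one register shared between host and guest."""

    def __init__(self, host_value):
        self.host_value = host_value
        self.hardware = host_value  # value currently loaded in "hardware"
        self.writes = 0             # number of (expensive) hardware writes

    def load(self, value):
        """Write the register only if the loaded value would change."""
        if self.hardware != value:
            self.hardware = value
            self.writes += 1


def simulate(cycles, lazy, host_value=0x10, guest_value=0x20):
    """Run `cycles` guest entry/exit pairs, then one point at which the
    host really needs its own value. Return the number of writes."""
    msr = ToyMsr(host_value)
    for _ in range(cycles):
        msr.load(guest_value)      # guest entry needs the guest value
        if not lazy:
            msr.load(host_value)   # eager policy: restore on every exit
        # deferred policy: leave the guest value loaded for now
    msr.load(host_value)           # the last possible moment
    return msr.writes


eager_writes = simulate(cycles=5, lazy=False)
lazy_writes = simulate(cycles=5, lazy=True)
```

In this model the eager policy performs ten writes over five entry/exit cycles, while the deferred policy performs two: one on the first entry and one when the host finally needs its own value back.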

Tracing. Steven Rostedt inquired as to collective viewpoints on a dedicated tracing list, to be held on vger.kernel.org. The idea would be to have a separate place for discussions related to users of tracers and the “perf” performance counters utility. This “will not be a place for kernel development”, according to Steven, who wishes to address the issue of LKML being intimidating for those having questions, without causing development fragmentation by taking important discussions elsewhere.

In today’s pull requests: an XFS update from Alex Elder (including a large number of the fixes previously discussed from Christoph Hellwig), some networking and SPARC fixes from David Miller (containing quite a few important looking fixes in both cases), an official request to merge the latest version of Andi Kleen’s hwpoison patches into 2.6.32, part two of some AMD64 EDAC updates from Borislav Petkov, some ext3 fixes from Jan Kara, an email from Roland Dreier (in which he also reminds us that he’s been showering and brushing his teeth, so shouldn’t be entirely ignored) asking Linus where his ummunotify request stands, some DLM updates for 2.6.32 from David Teigland, some tracing updates from Steven Rostedt, and the latest round of wireless patches from John Linville (who describes them as being “nothing too controversial”).

In today’s miscellaneous items: some further kmem and hwpoison bits from Fengguang Wu, a rant from Daniel Walker about Mike Frysinger’s posting of Blackfin architectural patches to the LKML (which nobody else endorsed), a patch marking the SLQB allocator as “broken” on PowerPC and s390 from Pekka Enberg, a fix to use the correct export symbol to walk a system ram range in infiniband from Kamezawa Hiroyuki, a question about explicitly putting a PCI device into an ACPI D0 state from Michal Witkowski, a patch to disable preemption within stop_machine from Xiao Guangrong, a suggestion from Balbir Singh that the memcg patches in Andrew’s mm tree be merged for 2.6.32, version 3 of a patch series from Jiri Olsa implementing multiple pids within the tracing set_pid_ftrace file, a patch changing the name of the kernel thread managing an md raid device according to the type of RAID level in use (rather than the default of merely always using “5”) from Zen Chen Jin, an RFC patch series from Sheng Yang entitled “Xen Hybrid extension support” intending to allow guests to run in other than ring0 context (thus avoiding the TLB flush overhead in context switching between guest and hypervisor), some comments from Vivek Goyal concerning the fairness (or lack thereof) of the ioband patches when running on rotational media (as compared with his own IO scheduler based IO controller patches), a regression in kallsyms reported by Paul Mundt (who says that both Sam Ravnborg and Lai Jiangshan have yet to respond), version 2 of a patch adding support for walltime to ftrace from Zhao Lei, an RFC patch to handle a negative f_pos when manipulating /dev/kmem from Kamezawa Hiroyuki, some updates to the page-types utility from Fengguang Wu, version 4 of the post merge per-bdi writeback patches from Jens Axboe, the latest version of the “Memory Protection Units” (MPU – a simpler MMU alternative implementation) from Mike Frysinger and originally authored by Bernd Schmidt, a patch finally updating the MAINTAINERS file with the new location of the ARM linux mailing lists on infradead from Joe Perches, some linker script cleanups from Tim Abbott (enabling ksplice pre-requisites), a patch warning when selecting symbols with unmet direct dependencies in kbuild from Catalin Marinas, a fix for degraded performance when all inodes are under writeback from Jan Kara, a reiserfs patch also from Jan Kara, a defense of the vnet bus patches from Gregory Haskins (in which he lays out the case for kernel-to-kernel virtualizable communication), a fix for an oprofile related ring buffer regression from Christian Borntraeger, and some futex comments from Darren Hart.

Finally today, various responses came in related to the “Tricks to speed up kernel builds”. Thomas Fjellstrom says he prefers using icecream with make -jX, while David Lang provides further build analysis.

In today’s announcements: Greg Kroah-Hartman posted review patches for the stable 2.6.27.35 and 2.6.30.8 kernels, and Jakub Narebski announced that the Git User’s Survey 2009 had ended (for which results will be available soon). On that note, Junio C Hamano announced the release of git version 1.6.4.4 (which includes an important fix for users of github occasionally experiencing an HTTP 500 error response).

The latest kernel release was 2.6.31.

Andreas Mohr reported a regression in which USB autosuspend no longer functions after performing a suspend-to-RAM and resume cycle.

Stephen Rothwell posted a linux-next tree for September 16th. Since Tuesday, conflicts continue to bounce between various trees as Linus continues to perform various merges. Linus’ tree had a build failure (for which a patch was applied), the rr tree also had a build failure, while the drbd and staging trees lost their conflicts. The total sub-tree count remains steady at 140 trees in the latest compose. Stephen reminds everyone not to post patches destined for 2.6.33 until at least 2.6.32-rc1 has been released.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags: