Archive for August, 2009

2009/08/09 Linux Kernel Podcast

August 13th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090809.mp3

For the weekend of August 9th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: clone_with_pids(), HTC Dream, Nested SVM, and Performance Counters.

clone_with_pids(). Sukadev Bhattiprolu posted version 4 of a 7-part patch series implementing a new clone_with_pids() system call for use with application checkpoint/restart support. The idea is to request that the kernel give a newly created task or tasks a specific set of process IDs, so that code being resumed from a checkpoint sees the same process IDs it had before. Sukadev is interested in feedback on the proposed system call interface and offers two alternatives for consideration.

HTC Dream. Pavel Machek posted yet more patches for the HTC Dream. At this point he has done a large amount of work on pushing this support into the staging tree. The latest effort included support for input devices connected to GPIO pins, and a number of other fixes. Separately, Jiri Slaby posted a buffer overflow fix for the Dream, which he is unable even to build test as he doesn’t have an ARM toolchain (this suggests he doesn’t have hardware either).

Nested SVM. Joerg Roedel posted version 2 of a series of nested SVM cleanups. The patchset has been tested with the use case of KVM within KVM and has shown apparently no regressions (with the first-level guest using nested and shadow paging). The latest version of the patch enables nested SVM support by default, although the user must still invoke qemu with -enable-nesting.

Performance counters. There were a number of small patches to performance counters over the weekend, from a number of people, suggesting that many are starting to play with these now. Of the patches, there was support for displaying per-thread event counters from Brice Goglin, and a fix to avoid oopsing on PowerPC CPUs without performance counter hardware support from Paul Mackerras.

In today’s miscellaneous items: some reposted patches implementing asm-generic and dma-mapping-common for SPARC from Fujita Tomonori, a futex bugfix from Darren Hart, a series of fsnotify patches from Eric Paris, a series of patches converting parts of the kernel over to using printk_once from Marcin Slusarz, version 2 of a patch fixing an oops in identify_cpu() on CPUs without the CPUID instruction on x86 from Ondrej Zary, some timer, tracing, core, and x86 fixes from Ingo Molnar, some critical KVM updates for 2.6.31-rc6 from Avi Kivity (including a guest-initiated DoS fix), a winbond IR driver from David Hardeman, a possible regression in XFS in 2.6.30.4 raised by Justin Piszcz, a number of updates to the staging tree and some USB fixes (including addressing some EHCI warnings folks are seeing – Greg included – and a number of other fairly minor fixes) from Greg Kroah-Hartman, some RT fixes for ARM from Uwe Kleine-König, some fixes for SDHCI (high speed and 4-bit SD cards) from Anton Vorontsov, a few “relatively small” bug fixes for btrfs from Chris Mason, a pull request for some wireless updates from John Linville, version 5 of a patch adding trace events to the page allocator from Mel Gorman, version 4 (apparently “for the upstream community, this is revision 3” – worth fixing that to adopt one numbering scheme soon) of support for the Intel RAR Register from Mark Allyn, a lockdep warning in 2.6.31-rc5-rt1.1 from Clark Williams, an update to CPU topology detection for AMD Magny-Cours from Andreas Herrmann, a fix to a memory leak in the ring_buffer free code from Eric Dumazet (which was immediately released as a pull request from Steven Rostedt), version 3 of a patch allowing file truncations on files with suid and write permissions set, which previously incorrectly failed with EPERM, from Amerigo Wang, a patch changing superblock s_maxbytes to an loff_t type from Jeff Layton, and yet another round of DRM fixes for 2.6.31-rc6 from Dave Airlie.

Finally today, Robert P. J. Day inquired as to whether any official standard existed for determining when/if tools should be moved into the top level directory of the same name. As an example, he cited Documentation/fs/slabinfo.c as a candidate.

In today’s security items: A read buffer overflow fix for FAT from Roel Kluin, and the aforementioned KVM guest DoS fixes from Avi Kivity.

The latest kernel release is 2.6.31-rc5, which was released over a week ago.

Hugh Dickins wonders if CONFIG_PREEMPT_RCU is supposed to be working in next/mmotm at the moment, because he suspects it is failing on his PowerPC G5 system, as evidenced by a parallel kernel compilation test that fails in what appears to be a manner consistent with RCU failing to reap the “filp” SLAB. Separately, Martin Schwidefsky wondered whether there was a race in the case of RCU and NOHZ being defined at kernel build time. Martin posted an example interaction showing how this might happen and requested input from Paul E. McKenney, who is the inventor and implementor of RCU support.

Rafael J. Wysocki posted a list of regressions between 2.6.29 and 2.6.30 and also between 2.6.30 and 2.6.31-rc5-git5. The former list of regressions appears to be leveling off for the older kernel (a total of 37 unresolved bugs are cited from the upstream kernel.org bugzilla); however, the more recent regressions have increased, with a total of 24 unresolved regressions. Of course, these are just regressions for which there is a tracking bug.

Stephen Rothwell posted a linux-next tree for August 7th. Since Thursday, the following trees gained conflicts and/or build failures: net, security-testing, tip. The following trees lost conflicts and/or build failures: rr, agp. The total sub-tree count remains steady at 138 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/06 Linux Kernel Podcast

August 13th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090806.mp3

For Thursday, August 6, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: AlacrityVM, blk-iopoll, CPU features, PCI Identifiers, Performance Counters, and Tux3.

AlacrityVM. Michael S. Tsirkin replied to Gregory Haskins’ announcement of “AlacrityVM” (which is a fork of KVM) with a suggestion that the Alacrity folks work to start merging with the host side of the project, copy the kvm lists on development, and perhaps update the comparison graphs between KVM and Alacrity to reflect a more “apples to apples” comparison.

blk-iopoll. Jens Axboe posted a patch series implementing a polled completion API for the block layer, with the hope of targeting a merge in 2.6.32. As he puts it, “basically this implements NAPI for block devices, and much of the core is essentially lift from there [the network code]”. Jens has seen good performance results on SSD devices, with a large reduction in the interrupt rate (for example, a 28% reduction on a fast box doing 50k IOPS, even with interrupt coalescing support enabled in the hardware, and up to 95% fewer interrupts on a slow box doing 30k IOPS). Sounds like fun, and was hinted at in the recent “state of the kernel” address at this year’s Linux Symposium.
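
To make the NAPI comparison a little more concrete, here is a toy Python model of why polling cuts the interrupt count. It is not code from Jens’ patches and not the blk-iopoll API: the function names are hypothetical, and the burst sizes and the budget of 32 are made-up illustration values rather than anything Jens measured. It assumes completions arrive in bursts, that the first completion of a burst raises an interrupt which masks further interrupts, and that a poll pass reaps at most "budget" completions before deciding whether to keep polling.

```python
def count_interrupts_per_completion(bursts):
    """Classic model: every completed request raises its own interrupt."""
    return sum(bursts)


def count_interrupts_polled(bursts, budget=32):
    """NAPI-style model of polled completions.

    The first completion in a burst fires one interrupt, which masks
    further interrupts and schedules polling. Each poll pass reaps at
    most `budget` completions; a pass that reaps fewer than `budget`
    means the queue is drained, so interrupts are unmasked again.
    Returns (interrupts, poll_passes).
    """
    interrupts = 0
    poll_passes = 0
    for pending in bursts:
        if pending == 0:
            continue              # idle period: nothing fires
        interrupts += 1           # one IRQ wakes the poller for this burst
        while True:
            reaped = min(pending, budget)
            pending -= reaped
            poll_passes += 1
            if reaped < budget:   # drained: go back to interrupt mode
                break
    return interrupts, poll_passes


if __name__ == "__main__":
    bursts = [1, 4, 40, 0, 7]     # made-up completion bursts
    print(count_interrupts_per_completion(bursts))
    print(count_interrupts_polled(bursts))
```

The point of the budget is fairness: a busy device keeps getting polled, but never monopolizes the CPU for more than one bounded pass at a time.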

CPU features. Kevin Winchester noted that his AMD64 system incorrectly reports having X86_FEATURE_LAHF_LM, which the CPU does not actually support (as evidenced by test code which fails with an “illegal instruction”). He tracked this down to an AMD erratum that states that the BIOS should program an MSR to indicate that this feature is present, which it might be erroneously doing in his particular case. He suggests that the kernel could automatically remove this feature flag from early Athlon 64 processors known not to support it.
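
For anyone wanting to see what their own kernel reports, the feature shows up as the "lahf_lm" flag in /proc/cpuinfo on x86. The Python sketch below only parses that text; the helper names and the sample text are made up for illustration, and it says nothing about whether the instruction really works, which is exactly the gap Kevin’s test code exposed.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def reports_lahf_lm(cpuinfo_text):
    """True if the kernel *claims* LAHF/SAHF support in 64-bit mode."""
    return "lahf_lm" in cpu_flags(cpuinfo_text)


# Hypothetical sample, not taken from Kevin's machine.
SAMPLE = (
    "processor\t: 0\n"
    "model name\t: Example AMD64 CPU\n"
    "flags\t\t: fpu vme de pse tsc msr lm 3dnowext 3dnow lahf_lm\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    print("kernel reports lahf_lm:", reports_lahf_lm(text))
```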

PCI Identifiers. Dave Jones noted the inconsistent approach to handling pci_ids.h, a file containing global PCI identifiers. Officially, this file is supposed to only have entries for drivers that need to share a PCI ID with other drivers (for example, for multi-port cards or alternative drivers), but it has turned into a kind of free-for-all, which Dave aims to fix with a comment explaining when to add new entries to this file.

Performance counters. There were a number of updates and patches to the perfcounters code. These included symbol parsing fixes, reporting fixes, and other updates from third parties. Included amongst these was a patch from Frederic Weisbecker implementing support for ftrace event record sampling.

Tux3. In continuing discussion of the tux3 filesystem, and its future, Daniel Phillips had mentioned how he is more likely to put greater effort into tux3 merging if invited to do so. Otherwise, he says, “if we are not invited to merge, nobody has any cause to complain about progress slowing down”. This caused Ingo Molnar to send a lengthy reply politely explaining that in his 14 years of Linux hacking, he had never seen nor had such an invitation. Linux doesn’t work this way, but instead relies upon people requesting to merge.

In today’s miscellaneous items: version 4 of the trace events for the page allocator patches from Mel Gorman, a patch from Li Zefan allowing one to specify which filter type should be used for TRACE_EVENTs (existing support allowed only customized filters for static and dynamic strings), some test-for-null kmalloc/kzalloc checks added in PowerPC from Julia Lawall, a minor update to the CPU topology documentation from Andreas Herrmann (adding mention of new attributes for the recent multi-node processor support), a suggestion from Joe Perches that the MAINTAINERS file more prominently mention the linux-arm mailing list (which Russell King had previously suggested he saw no signs of people moving to), a patch killing the BKL in compat ioctl handling from Arnd Bergmann, a number of /proc/kcore cleanup patches (6 patches actually) from Kamezawa Hiroyuki aimed at removing many per-arch hooks and supporting e.g. VM hotplug, a lengthy question email concerning the correct way to handle DMA and cache on ARMv7 systems from Laurent Pinchart, a patch implementing __[un]register_chrdev() from Tejun Heo allowing one to specify a subset of minor numbers to register and unregister (used by the ALSA OSS cleanups), a new ALS (Ambient Light Sensor) device class in sysfs from Zhang Rui, some tracing fixes for 2.6.32 from Frederic Weisbecker, version 2 of the “crashkernel=auto” patches from Amerigo Wang, some input updates from Dmitry Torokhov, some more DRM fixes from Dave Airlie, callchain support in performance counters and allowing performance counters to access user memory at interrupt time for PowerPC from Paul Mackerras, a request to track down a problem with shmem and TTM from Thomas Hellstrom, and a patch implementing devtmpfs_wait_for_dev() from Ming Lei that builds upon yesterday’s re-posting of devtmpfs and allows the kernel to generically wait for a root device to appear without polling and using other hacks.

A number of people have now noticed that one needs to set CONFIG_SYSFS_DEPRECATED_V2 on recent RT kernels if testing on e.g. older Enterprise Linux distribution releases, such as RHEL5.

The latest kernel release is 2.6.31-rc5, which was released over a week ago.

Andrew Morton posted an mm-of-the-moment for 2009-08-06-00-30.

Stephen Rothwell posted a linux-next tree for August 6th. Since Wednesday, Stephen has added support for signed next-yyyymmdd tags, and three minor conflicts were addressed. The tree continues to have 138 sub-trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/05 Linux Kernel Podcast

August 13th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090805.mp3

For Wednesday, August 5th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CPUs, devtmpfs, and KVM.

CPUs. Gautham R Shenoy posted an RFC patch series implementing an idle state framework for offline hotplug CPUs. As Gautham points out, when we go into an offline transition state on current systems, we put the affected CPU(s) into a HLT loop (or the equivalent) rather than using the lower C-states that are available. Previous patches have proposed various alternatives – including putting the CPUs into the lowest power C-states available – but the guys at IBM favor giving the user a choice over which state will be chosen. The patch implements a new “available_offline_states” entry in sysfs, from which one can determine a valid low-power state and configure via “preferred_offline_states”.

devtmpfs. Greg Kroah-Hartman reposted “devtmpfs”, which is a patch series originally created by Kay Sievers. Unlike the earlier devfs, this patch series doesn’t attempt to implement device filesystem functionality entirely in the kernel. Instead, the patches provide an implementation that makes life easier for bootstrapping a system by supplying a pre-populated tmpfs filesystem on boot, containing entries for all the initial hardware devices detected. This can be used to boot a system without “complex userspace bootstrap logic to provide a working /dev”. Once devtmpfs is populated, udev takes over and can freely create, manage, and delete any entries it likes as usual. For those who don’t want to run udev, devtmpfs also offers a cleaner way out.

KVM. Fengguang Wu mailed to let everyone know that Jeff Dike had discovered that KVM pages are being refaulted in 2.6.29. Quoting Fengguang, who cited Jeff, “Lots of pages being discarded due to memory pressure only to be faulted back in soon after. These pages are nearly all stack pages. This is not consistent – sometimes there are relatively few such pages and they are spread out between processes”. Fengguang posted a patch that “drastically reduces” the problem by respecting the referenced bit of all anonymous pages, but suspects that it may re-introduce a previous scalability issue. Discussion continued at some length between the various KVM folks on this one.

In today’s miscellaneous items: a new version 0.12 of the Ceph distributed filesystem from Sage Weil (including several fixes, and some documentation), some networking updates from David Miller (including a fix for a lockdep regression that was triggering for a number of people, and was discovered by Ingo Molnar in the previous day’s networking fixes), automatic crash kernel memory allocation from Amerigo Wang (via the new crashkernel=auto boot parameter), some minor s390 updates from Martin Schwidefsky, some OProfile updates from Robert Richter, a suggestion to set up a patchwork (quilt) instance for linux-alpha (although Jeff Garzik cannot have been the only person to wonder if a demonstrated need exists for this), an update to checkincludes.pl from Luis R. Rodriguez that can remove duplicate header inclusions in place (useful, he says, for porting “crap” drivers – he also now closes files as soon as he’s done with them rather than keeping file descriptors lying around), conditional support for MSI in sata_nv from Tony Vroon (so far only for MCP55), some build system fixes from Andi Kleen (mcount handling, gold linker support, gcc 4.5 support), an x86 IOAPIC RFC from Cyrill Gorcunov that will only panic on irq-pin binding if needed (i.e. allow failure in the case of PCI), yet another version of the HWPOISON patches from Andi Kleen, version five of the ZERO_PAGE patches from Kamezawa Hiroyuki (with minor fixes), a version 3 of the “security processor” kernel driver from Intel (now with additional support for re-distributing the no-longer-built-in firmware files), and some DRM fixes from Dave Airlie.

Finally today, Dave Airlie expressed some obvious frustration (citing “shitty scripts”) at the lack of verbosity for make V=1 builds. The builds currently fail to display all scripts that are being executed during a build – in particular, in Dave Airlie’s case, the ftrace function pre-patching script.

In today’s announcements: SystemTAP version 0.9.9. Josh Stone announced that version 0.9.9 of SystemTAP is now available. It features faster script compilation, improved userspace probing, support for new DWARF_OPs, self-monitoring markers, enhanced variable access, new SNMP tapset, new dentry tapset, bug fixes…and much more.

linux-2.6.31-rc5-rt1.1. John Kacur announced that he and Clark Williams had put together an unofficial preempt-rt kernel release while Thomas Gleixner was out at summer camp (Thomas volunteers every summer with a local camp).

The latest kernel release was 2.6.31-rc5, which was released over a week ago.

Kosaki Motohiro experienced a “poison overwritten” issue with -rc5, which was triggered by the netdev SKB allocation code, but was not able to reproduce it.

Stephen Rothwell posted a linux-next tree for August 5th. Since Tuesday, the tree gained a few minor conflicts, and remains steady at 138 sub-trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/04 Linux Kernel Podcast

August 10th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090804.mp3

Apologies for lagging again, what can I say? I spent most of the weekend working on work stuff and had a root canal last week to deal with. Here’s a hint to anyone considering avoiding the dentist for 6 years – it will catch up with you in the end. Especially if you’re British like this poor author.

For Tuesday, August 4th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Initdev, OOM, poisonous hardware, SIMPLE_PM_OPS, and TTY.

Initdev. If at first you don’t succeed. David VomLehn reposted a patch series intended to implement generic support for delaying the kernel boot while waiting for devices to be enumerated and set up, as well as promptly continuing once the lack of an optional device has been determined. The idea behind these patches may be specific to Cisco, but they could help reduce the need for hacks such as scsi_wait_scan and “rootwait”. David says that the 7-part patch series could go in as a unit, but has split it out for easier review. And on that note, he thanks Alan Stern for his comments on the last posting.

OOM. Kosaki Motohiro brought up the change in oom_adj semantics again. This time, he describes the shared oom_adj for processes sharing a struct mm as a regression that requires fixing prior to 2.6.31 final. Other patches floating around add a new “oom_adj_child” that will affect only settings for subsequent processes, but it is not clear that Kosaki is happy with this. It certainly feels a little like someone needs to make a call on exactly how the oom_adj pseudofilesystem entries are going to be cleaned up, documented, and exposed through to userspace in a fashion that won’t cause livelocks.

Poisonous hardware. Andi Kleen posted the latest version of HWPOISON, which is a series of patches intended to support Intel MCA recovery (recovery from certain classes of hardware memory error). HWPOISON implements a high level machine check handler that can catch accesses to pages that have gone bad (mm/memory-failure.c) and can often do something about it. The latest version introduces a new VFS operation of “error_remove_page” that will trigger on a page used by a filesystem going bad. Andi requests further testing.

SIMPLE_PM_OPS. Albin Tonnerre posted in regard to the ongoing migration to the new dev_pm_ops dynamic power management interface, suggesting that the current migration was too prone to regression (since all fields must be assigned in order for the new PM code to function), and proposing a SIMPLE_PM_OPS macro that can be used to initialize a dev_pm_ops with at least sane defaults.

TTY. Linus Torvalds continues to be fairly involved with the ongoing TTY layer saga (perhaps he should just own that stack and be done with it). He posted some ldisc locking rewrite patches, which (new TTY maintainer) Greg Kroah-Hartman picked up into the TTY tree, and then promptly sent back to Linus as a merge request. Greg noted, “As you wrote these, I think you know what they fix :) ”.

In today’s miscellaneous items: Some ALSA fixes (Takashi Iwai), some block bits from Jens Axboe (one pre-requisite topology fix, and removing the long overdue “experimental” label from bsg), a large number of networking fixes from David Miller (several regressions targeted), some performance counters fixes from Ingo Molnar (including a useful perf top fix from Arnaldo Carvalho de Melo to ignore mwait_idle_with_hints), some scheduler fixes (also Ingo Molnar), some timer fixes (also Ingo Molnar), a fair number of x86 fixes (also Ingo Molnar), version 5 of the GPIO regulator patches from Roger Quadros, some NILFS2 fixes from Ryusuke Konishi, a “bundle of fixes” (most of which target s390 systems) for SCSI from James Bottomley, an analysis of lockdep behavior from Peter Zijlstra and David Howells, some patches to convert alpha to asm-generic from Christoph Hellwig, the ability to export and unexport named GPIO devices in GPIOlib from Ben Dooks, a regression in AoE support from Bruno Premont, a new version 0.2 of Stefani Seibold’s “kfifo” API, some DRM fixes from Dave Airlie, a suggestion from Vivek Goyal that benchmarks of his IO scheduler based IO controller should be repeated several times and averaged (as some of the figures don’t seem to quite make consistent sense), an updated version 2 of the vbus_enet driver from Gregory Haskins as used in the Novell “AlacrityVM” fork of KVM (which uses the new “vbus”), some new trace events in the VM (Mel Gorman), and a rant from Pavel Machek that the linux-arm-kernel mailing list is not allowing him to crosspost with LKML when there are ARM regressions to discuss.

Finally today, Arnd Bergmann reminds us of the reason why one cannot perform a rename() over a directory symlink, by referencing several Open Group specifications.

The latest kernel release was 2.6.31-rc5 and still is current.

Andrew Morton posted an mm-of-the-moment for 2009-08-04-14-22. Several people encountered problems with the previous one, especially in SLQB.

Stephen Rothwell posted a linux-next tree for August 4th. Since Monday, a number of trees lost build failures (md-current, cpufreq, tip, and oprofile), while the tree otherwise remained consistent at 138 sub-trees in the compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/03 Linux Kernel Podcast

August 4th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090803.mp3

For Monday, August 3rd, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Caching get_current() and get_thread_info(), kfifo, KSM, Tracing, userspace DMA, and a standard VME implementation.

Caching get_current() and get_thread_info() values. Linus Torvalds, upon examining some of the assembly code generated for recent kernel compiles (in particular the accessor code to read these values), decided that it was long since time to provide some form of caching of these values. So, Linus being Linus, he hacked up a new version of percpu_read called percpu_read_stable. The former will continue to generate code accessing the percpu variable every time it is used to retrieve one of these thread related information objects, whereas the latter “stable” version allows the value to be cached if the compiler and assembler can arrange for that to be the case.

kfifo. Stefani Seibold posted an RFC patch series implementing a new “generic kernel FIFO implementation”, known as “kfifo”. According to Stefani, the current kernel FIFO API is not very widely used because it has too many constraints (there are only 13 files using it in 2.6.30 according to Stefani). Stefani views FIFOs as a kind of basic type akin to a list and so wants to remove the constraints (such as requiring a lock whether additional locking is needed or not). He implements kfifo_alloc, kfifo_free, kfifo_reset, and so forth, all using the struct kfifo type. For further information see the original list posting, or the inevitable Linux Weekly News coverage.
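
For readers who have not met this kind of structure before, here is a minimal Python sketch of a byte FIFO built on a ring buffer. It is an illustration only, not Stefani’s proposed API: the class and method names are hypothetical, and the power-of-two size with free-running "in" and "out" counters is simply one common way to build such a buffer, assumed here for simplicity. Like the proposal, it does no locking of its own.

```python
class ToyFifo:
    """A tiny byte FIFO on a power-of-two ring buffer (illustrative only)."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._buf = bytearray(size)
        self._mask = size - 1   # index wrap-around via bitwise AND
        self._in = 0            # total bytes ever written
        self._out = 0           # total bytes ever read

    def __len__(self):
        """Number of bytes currently stored."""
        return self._in - self._out

    def put(self, data):
        """Store as much of `data` as fits; return the number of bytes stored."""
        room = len(self._buf) - len(self)
        data = bytes(data)[:room]
        for byte in data:
            self._buf[self._in & self._mask] = byte
            self._in += 1
        return len(data)

    def get(self, count):
        """Remove and return up to `count` bytes, oldest first."""
        count = min(count, len(self))
        out = bytes(self._buf[(self._out + i) & self._mask] for i in range(count))
        self._out += count
        return out

    def reset(self):
        """Discard all stored bytes."""
        self._in = self._out = 0


if __name__ == "__main__":
    fifo = ToyFifo(8)
    fifo.put(b"abcdef")
    print(fifo.get(4))
    fifo.put(b"ghijkl")     # wraps around the end of the buffer
    print(fifo.get(8))
```

Leaving locking to the caller is the design choice Stefani argues for: a single reader and single writer can often get by without a lock at all, so forcing one on every user is wasted overhead.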

KSM. Kernel Shared Memory allows one to scan physical pages of memory and reconcile identical copies of data as copy on write pages. It is especially useful for virtual machines, where two independent machine images would not otherwise automatically share data pages that happened to be identical. A lot of work has recently gone into KSM, especially for obvious KVM uses. This year’s Linux Symposium featured a presentation from Andrea Arcangeli on the subject in fact. Meanwhile, Hugh Dickins has been cleaning up KSM for 2.6.32 and has posted a series of updates that he would like to apply both for the upcoming merge window, and also into Andrew’s mm-of-the-moment (mmotm). These include a number of fixes (including an endless OOM loop) and the removal of the VM_MERGEABLE_FLAGS, as well as additional tunables and documentation.
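
The merge-then-copy-on-write idea can be sketched in a few lines of Python. This is a toy model under loud assumptions: the class, its methods, and the "vm1"/"vm2" owners are all hypothetical, pages are compared by their full contents in a dictionary, and nothing here reflects how the real KSM code scans or tracks pages.

```python
class ToyMemory:
    """Toy model of merging identical pages and breaking sharing on write."""

    def __init__(self):
        self.frames = {}      # frame id -> page contents
        self.refs = {}        # frame id -> number of mappings using it
        self.mappings = {}    # (owner, page number) -> frame id
        self._next_frame = 0

    def _alloc(self, content):
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = bytes(content)
        self.refs[frame] = 1
        return frame

    def _unref(self, frame):
        self.refs[frame] -= 1
        if self.refs[frame] == 0:       # last user gone: free the frame
            del self.refs[frame], self.frames[frame]

    def map_page(self, owner, page_no, content):
        self.mappings[(owner, page_no)] = self._alloc(content)

    def read(self, owner, page_no):
        return self.frames[self.mappings[(owner, page_no)]]

    def merge_identical(self):
        """Point mappings with identical contents at one frame; return frames freed."""
        canonical = {}                  # contents -> frame chosen to keep
        freed = 0
        for key, frame in list(self.mappings.items()):
            keep = canonical.setdefault(self.frames[frame], frame)
            if keep != frame:
                self.mappings[key] = keep
                self.refs[keep] += 1
                before = len(self.frames)
                self._unref(frame)
                freed += before - len(self.frames)
        return freed

    def write(self, owner, page_no, content):
        """Copy on write: a shared frame is never modified in place."""
        key = (owner, page_no)
        frame = self.mappings[key]
        if self.refs[frame] > 1:
            self._unref(frame)
            self.mappings[key] = self._alloc(content)
        else:
            self.frames[frame] = bytes(content)
```

Two guests mapping the same zero-filled page end up sharing one frame after a merge pass, and a later write by either guest quietly gives that guest its own private copy again.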

Tracing. Lai Jiangshan posted an RFC patch modification to the crash utility in order to allow it to read in-flight kernel trace buffers captured from the crash dump image. This allows tracing to, quote, “act as a flight recorder” in preserving the buffers post-mortem. The work was well received by Christoph Hellwig who described it as “Nice!” (high praise indeed!). Christoph also took the opportunity to suggest that the impressive “CC” list suggested it is time for a linux-trace mailing list. Your author was grateful as always to be on the CC, although I do make an effort to read most mails sent to the LKML :)

Userspace DMA. Leon Woestenberg mailed to ask whether it was acceptable to have a PCI device DMA-read from pages that belong to a file mmap()ed by userspace, why get_user_pages() might fail in the process of allocating, and what one should do in general when fewer pages are returned than requested. He wants to implement a userspace scatter-gather buffer and DMA directly into it without having an in-kernel copy operation (or splice perhaps). Hugh Dickins responded that it was appropriate to DMA into userspace directly, and that the general idea was right (including example code), although perhaps Leon is having a low-memory situation in userspace, lacks appropriate permissions, or has not set up the mmap size correctly.

VME. Greg Kroah-Hartman posted a 5 part patch series on behalf of Martyn Welch, who had implemented VME Bus support for the staging tree. Apparently, Martyn has been working with the three different existing implementations of VME support on Linux systems to merge them into a single official one. He sought out and received all the appropriate legal agreements in the process – good job there to Martyn and thanks to Greg for assisting in that effort.

In today’s miscellaneous items: a patch to pin kern mounts as writable (Dave Hansen), an XFS status update for July (Christoph Hellwig) containing little in the way of changes aside from some bug fixes, some wireless updates (John Linville), a patch dropping superfluous casts in nr_free_pages() callers from Geert Uytterhoeven (who I’d obviously forgotten was at Sony these days), a dialogue between Peter Zijlstra and Sherif Fadel concerning the latter’s desire to somehow treat a processor as a “scheduling co-processor” and have the scheduler treat it specially (for which Peter suggested the fix was to “write code”), some questions about whether Linux is booting on certain AMD Geode processors (specifically the SCx200 and LX800) – according to Martin-Eric Racine it has not been booting since 2.6.23 and 2.6.31-rc4 respectively on those Geode CPUs, a request from Ranjith Kannikara for help in decoding ext3 filesystems for some kind of (academic?) forensic recovery project in progress, version 3 of a patch implementing avoidance of access to holes in vmalloc on reading from /proc/kcore (which had been causing crashes) from Kamezawa Hiroyuki, a patch to make block2mtd work with block devices larger than 4GB in size (Tobias Diedrich), a discussion surrounding the addition of a new scanning feature in MMC to detect which cards a controller supports in place of the existing scanning code (in which outgoing maintainer Pierre Ossman uses the line ‘Linux patches generally need to provide the answer to “Why?”, not just be able to avoid “Why not?”’), generic support for ACPI ALS and other ALS devices (Zhang Rui), a trivial fix to kvm_init removing the debugfs entries if the architectural initialization fails, and a patch to scripts/get_maintainer.pl adding optional “git blame” checking of patches (Joe Perches).

Finally today, John Hawley repeated once again the fact that he and the other kernel.org folks are well aware that the code generating the front page links is out of date – and so is not creating links to mmotm/linux-next – and that it is on his radar and will be fixed in due course, just as soon as the fallout from other stuff (the bind exploits, conferences, and so forth) has settled and he has a chance to move onto this activity.

In today’s announcements: AlacrityVM hypervisor project. In a world where there are many hypervisors, one man announced the creation of yet another. The AlacrityVM is based on KVM and is targeted specifically at performance sensitive workloads such as HPC and Real Time. There is more information about the project on the developer.novell.com website. There are already two mailing lists to discuss the project – I’ll be taking a look, as I’m sure will many other Real Time folks interested in shocking virtualization hybrid setups. Separately, Greg posted some guest drivers for the new VM.

The latest kernel release is 2.6.31-rc5, which was released by Linus on Friday.

Stephen Rothwell posted a linux-next tree for August 3rd. Since Friday, the new tty maintainer trees (tty.current, and tty) for Greg Kroah-Hartman have been added, and a net gain in build failures caused Stephen to do his usual herculean effort to fix up various trees and get the compose out of the door. There are now 138 sub-trees in the linux-next tree, with the addition of the tty trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/02 Linux Kernel Podcast

August 4th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090802.mp3

Apologies for lagging behind. Last week was pretty busy and the box hosting the podcasts got attacked by script kiddies over the weekend. Here we go with a mega update round of podcasts for your edutainment.

For the weekend of August 2nd 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Big Kernel Lock, Futexes are tricky, Kbuild, Ksplice, MMC, RAR, TTY, and a tux3 filesystem update.

Big Kernel Lock. Frederic Weisbecker posted some ftrace tracing patches enabling Big Kernel Lock tracing and filter regex support for doing so. Using these patches, one can monitor BKL events (presumably with an eye toward removing them over time as part of the BKL reduction efforts underway). Separately, Frederic also emailed to “announce” version 2 of the reiserfs/kill-bkl tree, which one assumes benefits from his ftrace patches in isolating and removing dependencies upon the Big Kernel Lock. His posting includes some benchmarks, and a lot more detail about his work.

Futexes are tricky. Eric Dumazet posted a nice summary of an ongoing infinite loop futex bug that several folks have been seeing on their kernels recently. As Eric says, clone() provides special support for the TID of created threads, allowing one to request that an integer in user memory be updated on creation (with its TID), and cleared on thread death. But because this integer location is in userspace memory, we need to be careful that the kernel doesn’t keep this pointer after an execve() – since the userspace is entirely replaced with another one. The fix is straightforward (setting clear_child_tid to NULL on execve to ensure the kernel will not try to write into it afterward).

Kbuild. Robert P J Day asked again whether there was any interest in a Kbuild “maturity level” that would enable one to attribute various levels of code maturity to kernel configuration options. For example “maturity DEPRECATED”, “maturity OBSOLETE”, or even “maturity BLEEDING_EDGE”. In many ways this feels like the existing EXPERIMENTAL dependency, but it is not a dependency. Instead, this is an entirely new kind of Kbuild attribute that is not as much “just plain hacky” as the existing support for enabling EXPERIMENTAL features. Speak now, or forever hold your peace. Robert would like your opinion.

Ksplice. Tim Abbott posted a preparatory patch series that cleans up the kernel’s explicit references to sections such as .data.page_aligned, .bss.page_aligned, and .data.init_task, replacing them with macros so that they can later be renamed if compiling with GCC -ffunction-sections -fdata-sections. This is a necessary prerequisite for ksplice getting into the upstream kernel, although of course it also helps those who want to compile the kernel using split out sections for reasons of memory footprint optimization and removal of unwanted code, for example on embedded systems.

MMC. Pierre Ossman (outgoing MMC maintainer) posted to let everyone know about the current patches that have been lingering in his inbox. These include patches for Intel Moorestown, an “aggressive clocking framework”, “prevent dangling blockdevice from accessing stale queues”, and a number of other patches. Many of them have hints about merge suitability and the idea was clearly to provide Andrew Morton (who has previously said that he now becomes the “de facto” MMC maintainer) with some hints to help him to get going. One hopes that someone else will step up and be in a position to take over rather than having Andrew keep stewardship of yet another kernel subsystem. Separately, Pierre posted his last “git pull” request for MMC.

RAR! Ossama Othman posted a series of patches implementing support for Intel Moorestown supported Restricted Access Regions (RAR). These are regions of physical memory that cannot be accessed by the CPU (and thus Linux) once they have been locked down using RAR. Ossama notes that he didn’t want to try fine grained page level allocations to lock down these regions, but instead wanted a blunt approach using a simple allocator. He has recently also discovered the lib/genalloc.c allocator though and suggests a future version may convert to using that allocator instead of providing his own. Comments are appreciated.

TTY. As hinted previously, it looks like Greg Kroah-Hartman really was stricken with some kind of horrible bug and decided to take on TTY maintainership. He posted a patch in which he says “Clearly, I am a glutton for punishment. I’ll see if I can see Alan’s changes through to the end, otherwise I’ll be fending off a lot of bug reports for usb-serial devices.” This suggests he has reluctantly concluded he might be the best person for the job right now. The MAINTAINERS patch includes a quilt tree, which he has asked Stephen Rothwell to begin pulling into linux-next immediately. Stephen followed up to say that he would be doing just that.

Tux3. The tux3 filesystem aims to offer various write-anywhere, atomic commit, fully versioned features of the kind that one can find in other modern file system projects currently under development. It is based on tux2, which was never released (apparently due to “evil patents sighted”) and is developed by (the highly vocal) Daniel Phillips – these days largely in his spare time. There have recently been a number of postings on the tux3 mailing list, cross posted to LKML, concerning the future of the project and how more volunteers could be encouraged to help, given that Daniel is short on time himself. He has posted some janitorial projects that those interested in getting involved can look into. More generally, Ted Ts’o suggested that Daniel consider what exactly tux3 offers as its main selling points over and above other filesystems – perhaps echoing the thoughts of others who might be wondering exactly why another filesystem is needed to compete with btrfs.

In today’s miscellaneous items: some followup benchmarks from Gui Jianfeng of Vivek Goyal’s IO scheduler based IO controller (version 7 thereof) showing even better performance figures (for fairness set to zero as before), some ALSA trivial fixes (Takashi Iwai), some Fujitsu laptop specific fixes (Jonathan Woithe), some kmemleak detected memory leak fixes in case radeon_driver_load_kms fails on startup (Xiaotian Feng), some (network related) kmemleak reports from Zdenek Kabelac, an explanation of the relative support status of RTL8192SE parts vs. other Realtek network chipsets (Bartlomiej Zolnierkiewicz), a patch from Roger Quadros following up on his own previous question concerning use of generic CPU GPIO pins to drive voltage regulator circuitry, a fix from Robert Richter to avoid losing samples within the ring_buffer code if a padding event is returned from ring_buffer_consume calling rb_buffer_peek (leading to rb_advance_reader() being called twice) – he also posted a fix for rb_buffer_peek itself, some block bits from Jens Axboe (described as “some minor bits”), some additional x86 fixes from H. Peter Anvin, version 3 of the previously featured ummunotify, a comment from Frederic Weisbecker that the “Big Kernel Lock” page on the upstream RT wiki was not writeable (which turned out to mean that anonymous writes are not allowed without creating a user account first, as he later noted in a followup message), some x86 IOAPIC simplification and bugfixes (Cyrill Gorcunov), a ctags usability fix (Stefani Seibold), a question about “problems with CONFIG_KVM_GUEST” from Ted Ts’o (including the exact commands that he is using to start up the qemu backed KVM guests for ext4 testing), a series of “semantic patch” changes fixing various minor readability problems (Julia Lawall), a patch moving resource counters to percpu counters (Balbir Singh), a watchdog fix (Wim Van Sebroeck – only a single fix for a specific device that needs additional interrupt handling logic), some md fixes (Neil Brown), some earlyprintk improvements for those seeking to debug over EHCI USB devices on x86 platforms (Jason Wessel), some XFS fixes (Felix Blyakher), a fix for the /proc/kcore reading panic (Kamezawa Hiroyuki), a new “virtio” IDs file to make assigning new IDs much easier and less error prone than it had been with them scattered all over the kernel sources (Fernando Luis Vazquez Cao), and a question about mm/linux-next releases being on the front page of kernel.org once again (Dave Young).

A security item: Ulrich Drepper pointed out an information leak in sigaltstack caused because the stack_t data structure was defined “before people cared much about 64-bit architectures”. It has a hole in the middle that can leak information to userspace.
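The hole in stack_t is alignment padding, and its size can be measured from userspace. The sketch below rebuilds the structure with Python's ctypes, assuming the common Linux field order (pointer, int, size_t); the StackT class and the two helper functions are illustrative names, not anything from the kernel or the posting.

```python
import ctypes


class StackT(ctypes.Structure):
    # Field order follows the common Linux definition of stack_t (an assumption;
    # some architectures order the fields differently).
    _fields_ = [
        ("ss_sp", ctypes.c_void_p),
        ("ss_flags", ctypes.c_int),
        ("ss_size", ctypes.c_size_t),
    ]


def padding_bytes(struct_type):
    """Bytes in the struct that belong to no field (alignment padding)."""
    used = sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - used


def hole_after_flags():
    """Gap between the end of ss_flags and the start of ss_size."""
    end_of_flags = StackT.ss_flags.offset + StackT.ss_flags.size
    return StackT.ss_size.offset - end_of_flags
```

On a typical 64-bit build the int is followed by padding so that the size_t lands on an 8-byte boundary, whereas on a 32-bit build all three fields are the same width and no gap appears. If a structure like this is copied out wholesale, whatever happened to be in the padding bytes goes along with it.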

In today’s announcements: Linux 2.6.31-rc5. Linus Torvalds announced Release Candidate 5 of the upstream Linux kernel, on Friday evening at 17:49 (PDT). Linus says he wanted to push this now because there are a number of fixes for regressions, whereas he’s not so sure about some of the stuff still queued, so he wants to get this out there first. Gene Heskett has already posted a fresh crash report from before his “amanda” backup process ran, showing a “bad page state” in a tar task. Gene is seeking suggestions to track this down.

Upstart version 0.6.3. Scott James Remnant announced version 0.6.3 of upstart, which includes a bugfix for a job’s main process being terminated while it is already in a stopping state, and a number of other bug fixes. Scott strongly suggests that those distributions using 0.3 migrate to 0.6 in order to help detect and correct bugs in the code, and to benefit from various features present in newer releases of the legacy init replacement.

The latest kernel release is 2.6.31-rc5, which was released by Linus on Friday evening.

Rafael J. Wysocki posted a list of regressions introduced between 2.6.29 and 2.6.30, and also a list of regressions introduced since 2.6.30. The list shows a recent drop in unresolved regressions (coincidental with Linus’ latest release, which was heavy on regression fixes), but there are still a number of nasty problems there. Separately, Rafael posted individual bugzilla updates.

Stephen Rothwell posted a linux-next tree for July 31st. Since Thursday, the requested OProfile tree was added, Catalin Marinas’ kmemleak tree was undropped as the build failure was resolved, and a number of other trees overall lost their build failures. With the addition of the oprofile tree, there are now 136 sub-trees in the linux-next compose. I shall discontinue notifying that the powerpc tree doesn’t build in an allyesconfig build configuration because it hasn’t done so in many months at this point.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2009/07/30 Linux Kernel Podcast

August 4th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090730.mp3

Apologies for lagging behind. Last week was pretty busy and the box hosting the podcasts got attacked by script kiddies over the weekend. Here we go with a mega update round of podcasts for your edutainment.

For Thursday, July 30th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFQ, dm-ioband, MMC, OOM, and the Montreal Power management mini-summit.

CFQ. Shan Wei posted a first attempt at documenting the CFQ (Completely Fair Queuing) IO elevator’s tunables. His document goes into quite some detail concerning how one can define timings, data sizes, idle timeouts, and quantum sizes and is worth reading.
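For those who want to look at the tunables alongside the documentation, an elevator's settings appear as small text files under /sys/block/<device>/queue/iosched/ when that elevator is active for the device. A minimal sketch for dumping them follows; read_tunables and iosched_dir_for are hypothetical helper names, and which files show up depends on the kernel and the selected scheduler.

```python
import os


def read_tunables(iosched_dir):
    """Return {tunable_name: contents} for each regular file in iosched_dir."""
    tunables = {}
    for name in sorted(os.listdir(iosched_dir)):
        path = os.path.join(iosched_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as handle:
                tunables[name] = handle.read().strip()
        except OSError:
            # Some sysfs attributes are write-only or need privileges; skip them.
            continue
    return tunables


def iosched_dir_for(device):
    # Assumed sysfs layout; the device name is whatever the caller supplies.
    return os.path.join("/sys/block", device, "queue", "iosched")
```

Calling read_tunables(iosched_dir_for("sda")) on a system where the directory exists would return a dictionary of names and values; the device name "sda" is only an example.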

dm-ioband. Ryo Tsuruta announced version 1.12.2 of the dm-ioband patches, which is a minor update and rebase to the current dm-devel tree. Ryo asked Alasdair Kergon for his thoughts in regard to merging the dm-ioband patches into upstream. Let’s not forget that there are about three different competing IO bandwidth limiting patchsets at the moment, so if this ends up upstream it will have a significant headstart on the others (for example Vivek Goyal’s IO controller based bandwidth patches) that are still in RFC.

MMC. Segher Boessenkool followed up to the previous discussion of MMC specs with a note that the MMC Association has merged with JEDEC, and so the specs are now available freely online at the JEDEC website. This should aid whoever decides to pick up maintainership of the MMC stack.

OOM. David Rientjes followed up to previous discussion of his patches introducing a new per-task oom_adj_child. He notes that the OOM killer now only relies upon the highest oom_adj score for multiple threads sharing an mm since otherwise there would be an inconsistency in reporting oom_score *and* a livelock potential in the case that one thread set OOM_DISABLE. David suggests that while this is a behavioral change for those who expect setting a thread oom_adj to not affect other threads, it is better than a livelock, and there exists now an inheritable parameter oom_adj_child that one can use instead to inform the OOM killer about the default oom_adj value for newly cloned tasks.

The Montreal Linux Power Management Mini-Summit. Len Brown posted some minutes from the power management “mini-summit” that took place concurrent with the Montreal Linux Symposium a couple of weeks ago. This time, the summit had been open to non-invitees to drop in if they felt so inclined. Some did so. The topics of discussion apparently included ACPI platform BIOS compatibility fixes, hibernation, suspend/resume framework rework by Rafael J. Wysocki, power aware scheduling, tools, and a lot more. The full email was quite long and thoroughly comprehensive, so look for it if you have further interest.

In today’s miscellaneous items: some fixes from Huang Ying that enable the MCE testsuite to work properly, a suggestion from Pavel Machek that the uid mount options for ext2 and ext3 default to a uid other than root (for example, -1), which is less “dangerous”, some chiding of Thomas Hellstrom (also from Pavel Machek) that he was “slowing the kernel down” without adequately explaining himself in making x86 use clflush() instead of wbinvd() in changing memory mappings, a missing mutex lock discovered using a French “semantic patch” tool (Stoyan Gaydarov) – the tool, Coccinelle, looks worthy of investigation, a request to pull powerpc updates (Benjamin Herrenschmidt), a minor attack on grubby (the tool used on Fedora systems to automatically add entries to GRUB), some GFS2 fixes (Steven Whitehouse – including some pre-pull requests), a confirmation that Eric Dumazet’s fix to the infinite loop in get_futex_key solves the problem in 24 hours of testing (Jens Rosenboom – who adds a “tested-by” signoff), a suggestion from Chris Mason that it’s ok to now send READA hints (instead of the READs currently being sent down) in bios to the elevator since it won’t become a transient failure any more like it used to on occasion, some scheduler fixes (Gregory Haskins), a quota fix to silence a lockdep warning (Jan Kara), some UDF fixes (also Jan Kara), some questions about interrupt state preservation (Michael S. Zick), some btrfs updates (Chris Mason), a simple fix to add new devices to a bus’ list before probing (Alan Stern), some tracing and timers fixes (Thomas Gleixner), a question about Thinkpad X20 docking support – or lack thereof – (Meelis Roos), a CPU hotplug fix so memory allocated at insertion is not lost on removal for CPUMASK_OFFSTACK=y (Li Zefan), some lguest and virtio fixes (Rusty Russell), version 13 of the per-bdi writeback flusher threads patch series from Jens Axboe, and an inquiry from Roger Quadros as to the best way to control a voltage regulator using GPIO pins of his (one presumes embedded) CPU with the regulator framework.

Finally today, Zhao Lei posted version 4 of his calendar time to broken down time for universal use patches, including use of a long type for tm.tm_year. Evidently the concerns about supporting beyond 2 billion years were serious enough to warrant the Linux kernel’s calendar support outliving our own Sun.
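The arithmetic behind the int versus long question is easy to check. The sketch below assumes the input is a signed 64-bit count of seconds (the summary does not say which type the patches convert from) and uses a mean Gregorian year, so the result is approximate; all names in it are illustrative.

```python
INT_MAX = 2**31 - 1                  # largest value of a 32-bit signed int
TIME64_MAX = 2**63 - 1               # largest signed 64-bit seconds count
SECONDS_PER_YEAR = 31_556_952        # mean Gregorian year: 365.2425 days * 86400


def approx_year(seconds_since_1970):
    """Rough calendar year for a seconds count (ignores leap-day placement)."""
    return 1970 + seconds_since_1970 // SECONDS_PER_YEAR


def fits_in_int(year):
    # The C struct tm convention stores years as an offset from 1900.
    return -INT_MAX - 1 <= year - 1900 <= INT_MAX
```

Under these assumptions approx_year(TIME64_MAX) comes out in the hundreds of billions, well past the roughly 2.1 billion that a 32-bit int can hold, so fits_in_int() fails for it. That is the gap a wider tm_year closes.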

In today’s announcements: Greg Kroah-Hartman announced linux 2.6.27.29 and .30.4, encouraging all users of these “stable” series kernels to upgrade.

The latest kernel release was 2.6.31-rc4, which was superseded by 2.6.31-rc5 by the time Friday came around.

Andrew Morton released an mm-of-the-moment for 2009-07-30-05-01, against -rc4.

Stephen Rothwell posted a linux-next tree for July 30th. Since Wednesday, the kmemleak tree is really included this time (it was accidentally omitted even though yesterday’s announcement said it had been incorporated), but it was dropped shortly thereafter on a temporary basis due to a build problem. The tree fails to build in an allyesconfig build configuration on powerpc and a number of other gains and losses were recorded in other subtrees. The total tree count within the linux-next compose remains at 135 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2009/07/29 Linux Kernel Podcast

August 4th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090729.mp3

Apologies for lagging behind. Last week was pretty busy and the box hosting the podcasts got attacked by script kiddies over the weekend. Here we go with a mega update round of podcasts for your edutainment.

For Wednesday, July 29th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Clocksources, kmap vs kmap_atomic, nested virtualization, page cache readahead, and the tty layer deathmarch.

Clocksource. As mentioned previously, Martin Schwidefsky had found the clocksource switching code to be broken on his systems and was working on providing a fix. This has now turned into what he calls a “full fledged code rework” that is currently providing him with clocksource switching on s390 and Athlon systems – the latter having a broken TSC clocksource. Martin’s code rework is using stop_machine to effect the actual switch (and a watchdog to trigger a switch if needed), which is expensive but should not be called all that often. Martin thanks the usual suspects (John Stultz in particular) for their aid.

Kmap vs. kmap_atomic. Laurent Pinchart previously inquired about the relative merits of using fixed and persistent mappings vs. atomic in-interrupt mappings for copying in-flight video data pages received over USB. Jonathan Corbet (LWN) followed up suggesting that Laurent consider not performing the actual copy from within interrupt context but instead avail himself of more straightforward solutions outside of the interrupt handler. Of course, with threaded interrupt handlers, this situation could change somewhat. Laurent followed up with a round of discussions concerning the best ways to allocate DMA-able memory and ensure it is present – various others have helped to straighten out his understanding and use of technical terms accordingly.

Nested virtualization. Joerg Roedel posted a 12 part patch series implementing nested SVM cleanups and allowing one to successfully perform “KVM in KVM”, even when using nested SMP. The patches include support for intelligently handling the nesting of e.g. vmexit, vmrun, and nested intercepts so that the performance impact of the nested virtualization is not as significant as otherwise might have been the case. There was considerable discussion surrounding these patches, including various implementation questions.

Page cache readahead. Lars Ellenberg, after conducting some profiling runs, posted asking why __do_page_cache_readahead submits READ bios and not READA bios to the block layer. Lars noted that he “was surprised that READA is basically only used for filesystem internal meta data (and not even for all file systems), but _never_ for file data”. Various others followed up suggesting that it used to be the case that more READA requests had been submitted, but that this had other problems and so the kernel had switched at some point. Jens Axboe suggested that the behavior change wasn’t intentional but that he had done various testing in the past – he posted a link to some patches.

TTY. Discussions (or rather somewhat “heated” dialog) continued between Alan and Linus concerning the TTY layer and various outstanding issues with it (add to that a report today from Mikael Pettersson that the GCC testsuite breaks on -rc4 based systems – in GNU “expect”). Alan continued to raise a number of issues with Linus’ description of the problems and his proposals to work around some of them. As part of the TTY layer discussions and testing of fixes, Andrew Morton provided Gene Heskett with an updated link to the latest “git-quick” tutorial, which he noted “some dope” had removed from its previous location on kernel.org. Gene will use this to pull a particular version of the kernel tree containing a fix that Linus would like him to test on his system(s). Separately, Catalin Marinas mentioned a new VT memory leak.

In today’s miscellaneous items: A suggestion of a possible SLAB entry leak (task_delay_info) from Paul Rolland in -rc4 (for which he provided OOM output, and slabtop information), a confirmation that the previous infinite loop in get_futex_key is still present (Jens Rosenboom – who sent a reproducer), a note (Sam Ravnborg) that he thinks online mailing list archives for kernel.org are generally set up manually and not using some automated process (I can confirm that this appears to be true for archives such as gmane.org), a note (Dave Airlie) that a recent bug really only affects combination Intel IGP/AGP systems (AGP and IOMMU) to which David Woodhouse responded that he wondered how this would affect those with plugin cards on IOMMU systems, some staging radeon kms updates (also Dave Airlie – who is pulling these into F12), a question concerning network hangs on 2.6.12 (allegedly also affecting less-dead, more recent kernels), a promised followup posting from Gui Jianfeng concerning Vivek Goyal’s latest IO scheduler patches (this time showing a performance loss of up to 4.9%, which is lower than previous test runs – there are of course also performance gains in some tests), a patch from Mel Gorman that improves hugepage allocation success rates, a patch from Stanislaw Gruszka (and another similar one targeting POSIX CPUCLOCKs) that aims to improve itimers periodic ticks precision, a conversion of the hv driver in staging to use struct hv_driver (Nicolas Palix), some post scheduling fixes from Gregory Haskins (and Steven Rostedt and Peter Zijlstra), a repost of additional locking support in kmemleak from Catalin Marinas, the addition of LZO compression support for initramfs (Albin Tonnerre), some tracing cleanups (Frederic Weisbecker), a new bug (13850) oops reading /proc/kcore (Mike Smith), ongoing discussion of the fanotify patches (Eric Paris, and also others), a pull request for wireless updates (John Linville), page allocator trace events from Mel Gorman, some power management fixes (Rafael J. Wysocki), and some clarity on the get_futex_key infinite loop problem (a suggestion from Eric Dumazet that execve() probably forgets to clear clear_child_tid).

In today’s announcements: Git version 1.6.3.4 and Git 1.6.4. Junio C Hamano once again delighted us with the announcement of several git releases. Some fairly annoying bugs are fixed in these releases, which may be of interest.

Also today, 2.6.31-rc4-rt1. Thomas Gleixner announced the latest release of the “RT” kernel patches ported to 2.6.31-rc4. As covered previously, there are a large number of changes to the underlying architecture in the latest update.

The latest kernel release was 2.6.31-rc4, although a newer rc5 was subsequently released on Friday evening.

Stephen Rothwell posted a linux-next tree for July 29th. Since Tuesday a new kmemleak tree has been provided by Catalin Marinas, Alan Cox’s former ttydev tree has been removed (at Alan’s invocation), the tree still fails to build in an allyesconfig build configuration on powerpc, and the drbd and staging trees gained conflicts against other trees. The subtree count remains steady at 135 trees since the removal of one tree was offset by the addition of another.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2009/07/28 Linux Kernel Podcast

August 3rd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090728.mp3

Apologies for lagging behind. Last week was pretty busy and the box hosting the podcasts got attacked by script kiddies over the weekend. On the plus side, I did use the extra downtime to complete automating of my home – I now have (all using Linux) the ability to remotely control my fridge (X10), however the heck that’s useful. Anyway, here we go with a mega update round of podcasts for your edutainment.

For Tuesday, July 28th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Btrfs, dev_pm_ops, MMC, NMIs, NULL pointers, OProfile, Performance Counters, and the TTY layer deathmarch.

Btrfs. Chris Mason requested that Linus pull some late breaking btrfs fixes for 2.6.31 that greatly enhance the way free extents are tracked in RAM – reducing SLAB use in one weekend test from 1GB to 10MB. These patches come originally from work that Josef Bacik has been refining since 2.6.28 or so. Since btrfs is still widely under development, this should be ok to pull.

Device Power Management. Ben Dooks posted to ask about the ongoing conversion to dev_pm_ops (the new device dynamic power management capable code), and specifically whether suspend/resume code should be wrapped in CONFIG_PM conditions (and whether a NULL dev_pm_ops should be supplied if it is configured out of the kernel) and whether such changes could be made during RC, or should wait for the following merge window to open.

MMC. As was raised recently, a new MMC maintainer is being sought since Pierre Ossman is very busy these days and doesn’t want to be a blocker to progress. He does note that he is around and willing to be involved, but the discussion is ongoing. Pierre adds that he generally uses the specs on sdcard.org but that he does also have MMC 4 specs that he cannot pass on (kindly provided to him previously by Nokia). The lack of a formal maintainer didn’t stop Adrian Hunter from posting a series of updates, especially for OMAP.

NMIs. Paul Mackerras brought up a previous PowerPC discussion topic and asked how it might apply to x86 kernels. On PowerPC, as Ben Herrenschmidt noted, there might be a problem if we get a PMU (Performance Monitoring Unit) interrupt and try to do a stack trace of userspace in the interval between when we call switch_mm() and when we call switch_to (all within sched.c). If an NMI occurs in that interval, we’ll see registers from the old task but userspace for the new task, so the stack trace will be “completely bogus”. Paul wonders whether this is also a problem on x86, or if there’s some reason it won’t hit.
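The window Paul describes can be pictured with a toy two-step context switch. This is only a model: the Cpu class, context_switch(), and is_consistent() are hypothetical names, with the two assignments standing in for switch_mm() and switch_to.

```python
class Cpu:
    """Toy CPU state: which task owns the live registers and the address space."""

    def __init__(self, task):
        self.regs_owner = task
        self.mm_owner = task

    def sample(self):
        # What a PMU interrupt handler would see if it fired right now.
        return (self.regs_owner, self.mm_owner)


def context_switch(cpu, next_task, interrupt_in_window=False):
    """Switch tasks in two steps; optionally take a sample between them."""
    samples = []
    cpu.mm_owner = next_task          # step 1: models switch_mm()
    if interrupt_in_window:
        samples.append(cpu.sample())  # NMI lands between the two steps
    cpu.regs_owner = next_task        # step 2: models switch_to()
    samples.append(cpu.sample())      # a sample after the switch completes
    return samples


def is_consistent(sample):
    # A userspace stack walk only makes sense if both belong to the same task.
    regs_owner, mm_owner = sample
    return regs_owner == mm_owner
```

A sample taken inside the window pairs the old task's registers with the new task's address space and fails is_consistent(), which is the “completely bogus” trace in miniature; a sample taken after both steps passes.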

NULL pointers. On Monday, Alan Cox raised the idea of catching code exploits that attempt to “jump through NULL”. The idea is clarified somewhat to mean adding a default hardware breakpoint, catching it, and having a handler make an appropriate decision. Andi Kleen argued that hardware breakpoints were a rare resource and that this would upset those who rely upon them, while Alan countered that those who really need all available hardware breakpoints can suitably configure their systems to do so – perhaps losing this feature. Andi also explained (to Kees Cook) how this could not easily be done using page tables alone due to races between different threads.

OProfile. Robert Richter posted a 26 part patch series implementing performance counter multiplexing for OProfile. Quoting Robert, “The number of hardware counters is limited. The multiplexing feature enables OProfile to gather more events than counters are provided by the hardware. This is realized by switching between events at an user specified time interval”. Obviously this is not the same as truly having additional hardware counters, but OProfile is already a snapshot based performance profiling tool, so this approach would seem to be valid. The patch adds a new file (in /dev/oprofile/time_slice) that can be used to specify the interval. Separately, Robert posted a series of updates for -tip, on several branches.
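The idea of switching between events on a fixed time slice can be sketched as a round-robin schedule. The code below is a simplified illustration and not Robert's implementation: multiplex() and scale_count() are hypothetical names, and the extrapolation in scale_count() is a common way of interpreting multiplexed counts that the patch series may or may not use.

```python
def multiplex(events, num_counters, num_slices):
    """Round-robin events onto counters; return slices each event was active."""
    if not events:
        return {}
    active = {event: 0 for event in events}
    start = 0
    for _ in range(num_slices):
        # Program the next group of events for this time slice.
        for i in range(min(num_counters, len(events))):
            active[events[(start + i) % len(events)]] += 1
        start = (start + num_counters) % len(events)
    return active


def scale_count(raw_count, active_slices, num_slices):
    """Estimate the full-run count from the slices an event was counted in."""
    if active_slices == 0:
        return None                   # never scheduled, nothing to extrapolate
    return raw_count * num_slices / active_slices
```

With four events, two counters, and four slices, each event is live for half of the run, so a raw count would be doubled to estimate the total. This also shows why multiplexing is an approximation: anything that happened while an event was switched out is inferred rather than observed.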

Performance Counters. Anton Blanchard raised the obvious point that current perfcounters code only supports tracking executable code, and not data. But he suggests that it won’t be long before we will want to also track data maps – for example to monitor TLB miss rates (hugepage conversion suggestions) or other TLB miss issues – and so he posts a kind of RFC patch that begins to implement such support. He requests review comments on the general idea. Separately, the issue of POSIX signalling and delivery to specific threads was raised again today. This is the issue that a performance counter signal event might not be delivered to the same thread it pertains to, but merely to a thread that forms part of a running userspace process. Andi Kleen and others debated how this might fit in with POSIX and whether a new sigaction flag should be introduced to guarantee delivery to the correct thread.

TTY. Of course the big news today was the change in maintainership of the TTY layer, or rather lack of it. Alan Cox has been heroically fighting battles with the tty layer for some time, trying to beat it into shape (as covered previously), but was finally pushed to breaking point by ongoing heated dialog concerning recent regressions caused by the code having to support various assumptions not necessarily part of any official standard (e.g. emacs file close flushing semantics, and other recent issues). He responded to one particular email from Linus (in which Linus chided Alan for “making idiotic excuses”) with a patch removing himself as maintainer, and suggesting that Linus “have fun”. Later, Greg Kroah-Hartman made some musings suggesting he might be interested in poking in this particularly unpleasant subsystem. As Linux Weekly News noted, this is one subsystem that even scares Ingo Molnar, so it’ll be interesting to see who dares to try fixing it next. Not I!

In today’s miscellaneous items: A resend of a patch from Jon Hunter that enables long sleep times for tickless kernels on 32-bit platforms (as covered previously – increasing from the previous maximum sleep time of 2.15 seconds), an fbcon bugfix correcting a problem with rotating upside down (Stefani Seibold), a new version of the uid mount option for ext2/ext3 patches that uses the specified uid for the files on disk also (and not just for mounts) – rather than root – and allows also for this to be configured at runtime (Ludwig Nussel – as suggested by Andreas Dilger), a confirmation from Gui Jianfeng that he will re-run his tests against Vivek Goyal’s IO scheduler IO controller using the latest V7 version (there are efforts to find out where the 7% performance hit has been introduced), some tracing fixes (Lai Jiangshan), a legal question surrounding linking initramfs images containing proprietary drivers directly into the kernel (Subodh Nijsure – who was told that the LKML is a technical list and not a place for legal advice), a second round of mcheck/EDAC “marriage” patches (Borislav Petkov), a patch to deny use of CLONE_PARENT|CLONE_NEWPID in combination as part of a clone operation (Sukadev Bhattiprolu – who wants to wait at least “until the required semantics of the pid namespaces are clear” before touching this again), some i2c fixes (Jean Delvare), some hwmon fixes (also Jean Delvare), a note that Jesse Barnes is on vacation so Matthew Wilcox is handling PCI updates until August 6th, a lengthy debate about whether a new MAINTAINERS file section was needed to somehow indicate individual wireless driver writers in addition to John Linville as the subsystem maintainer (which David Miller seemed to think would instead only create confusion for those sending patches – which all need to go via John anyway), a series of patches converting IPVS to use pr_fmt, some USB, driver core, and staging fixes (Greg Kroah-Hartman), some interesting patches from Mel Gorman that add trace events for the page allocator, some libata fixes from Jeff Garzik (mostly one-liners aside from pata_at91), a series of stable review patches from Greg Kroah-Hartman for 2.6.27.29 and .30.4, and Peter Zijlstra was happy to discover that his CFS group scheduler fairness fix (aiming to restore fairness that has been apparently broken since .29-rc1) worked fine even though it had only been compile tested before he posted.
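The 2.15 second figure quoted for Jon Hunter's tickless sleep patch is what one gets if the sleep interval is held as a nanosecond count in a signed 32-bit integer. That reading is an assumption, not something the summary states, and the helper below is purely illustrative arithmetic.

```python
NSEC_PER_SEC = 1_000_000_000


def max_interval_seconds(bits, signed=True):
    """Longest interval, in seconds, that fits in an integer nanosecond count."""
    value_bits = bits - 1 if signed else bits
    return (2**value_bits - 1) / NSEC_PER_SEC
```

Under that assumption max_interval_seconds(32) is a little under 2.15 seconds, while a 64-bit count allows intervals measured in centuries, which is the kind of headroom longer tickless sleeps need.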

Finally today, Pavel Machek responded to Ogawa Hirofumi’s concerns about the use of an int type in the calendar time to broken-down time patches noting that support for years up until 2,000,000,000 (2 Billion) is probably more than sufficient, given that our own Sun won’t be lasting a whole lot more than 5 Billion years either. One assumes we’ll all be using Star Dates long before then anyway, and have holographic representations of famous kernel hackers to create interesting everyday plot situations with along the way.

The latest kernel release is 2.6.31-rc4 (except it isn’t any more since -rc5 was released on Friday evening).

Stephen Rothwell posted a linux-next tree for July 28th. Since Monday, the benh-mm tree was dropped (merged), the tree fails to build in an allyesconfig build configuration on powerpc systems, and several net build failures were introduced. Stephen did a final re-merge of Linus’ tree to get some updates. The total subtree count decreases to 135 trees after dropping benh-mm.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.
