Archive for February 17th, 2010

2010/02/14 Linux Kernel Podcast

February 17th, 2010 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100214.mp3

This podcast is brought to you by the colour blue and way too much coffee, together reminding you to check out the awesome power of the BeagleBoard Open Source hardware project at http://www.beagleboard.org/. My new Rev C. board was responsible for the delay getting this issue out…too much fun was had.

For the weekend of February 14th, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In this issue: Linux 2.6.33-rc8, x86 bootmem, NFS, OOM, Performance Counters, Relaxation, Stack Sizes, and SysFS mutability.

Linux 2.6.33-rc8. Linus Torvalds announced the release of version 2.6.33-rc8 on Friday February 12th 2010 at 11:49 am Best Coast Time (PST), saying that he hoped it would be the last before 2.6.33 final. He added that, “A number of regressions should be fixed, and while the regression list doesn’t make me _happy_, we didn’t have the kind of nasty things that went on before -rc7 and made me worried”. This kernel includes fixes for the netfilter bugs that I discovered, as well as some KMS regression fixes. In a separate discussion thread started by John Hawley (warthog9), it was debated when kernel.org should move over to using xz (LZMA2) as a replacement for bzip2 compression (remember when bzip2 was trendy and new?). John proposed various migration options before the thread veered off into a discussion around when an eventual 3.0 Linux kernel would come, and what that would actually mean in practical terms – just an arbitrary future release? I expect that LWN will have a typically witty writeup of this discussion sometime this week.

Bootmem. Back in October last year, Ingo Molnar had stated that the kernel may not need the “bootmem” allocator on x86. At the time, he noted that there were 5 different allocators on x86, depending upon the boot stage (to say nothing of the other core allocator options): the generic allocator, the early allocator (bootmem), the very early allocator (reserve_early), the very very early allocator (early brk model), and the very very very early allocator (basically just build time allocation). By initializing the x86 page allocator earlier in the boot process, Yinghai Lu attempts to do just what Ingo had suggested, now in version 6 of his patchset.

NFS. Hirofumi Ogawa noticed (2.6.33-rc6) that recent kernels could not mount remote NFS version 3 shares, because of a userspace visible change in the kernel nfsd server. If he specified “vers=3” at mount time, all was well, but the kernel was not falling back to v3 correctly when v4 failed, due to a change in error handling. Bruce Fields noted that this change was actually intentional and that the userspace tools had been updated, but decided to revert the patch that caused this change for the time being – at least until the new versions of the mount tools are much more widespread than right now. Bruce sent a patch entitled (“informingly”) “2.6.33 fix” to Linus.

OOM. David Rientjes posted a patchset re-implementing the OOM killer, in the wake of a number of discussions concerning its brokenness. It includes a complete rewrite of the badness() heuristic, which he then describes in some detail within the corresponding patch. Quoting David, “The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of ‘allowable’ memory. ‘Allowable,’ in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current’s cpuset, or a memory controller’s limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space.”
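To make the arithmetic in that quote concrete, here is a minimal sketch of such a proportional score. It is not David's badness() code: the function name, the parameters, and the clamping at 1000 are all assumptions made purely for illustration.

```c
#include <assert.h>

/*
 * Illustrative only: NOT the kernel's badness() implementation.
 * badness_sketch() and its parameters are hypothetical names.
 * Returns a score from 0 to 1000 expressing how much of the
 * "allowable" memory a task is using in RAM plus swap.
 */
unsigned int badness_sketch(unsigned long rss_pages,
                            unsigned long swap_pages,
                            unsigned long allowable_pages)
{
    unsigned long long used = (unsigned long long)rss_pages + swap_pages;

    assert(allowable_pages > 0);    /* avoid dividing by zero */

    if (used >= allowable_pages)
        return 1000;                /* assumed: clamp at the top of the scale */

    /* Scale the proportion used/allowable onto the 0..1000 range. */
    return (unsigned int)(used * 1000 / allowable_pages);
}
```

With these assumptions, a task holding 400 pages in RAM and 100 in swap out of 1000 allowable pages would score 500, matching the "approximately 50%" reading in the quote.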

Performance counters. Christoph Hellwig had complained that a patch had been merged back in September from Arjan van de Ven entitled “perf_core: provide a kernel-internal interface to get to performance counters”. That was intended to facilitate in-kernel use of the performance counters framework, but it was Christoph’s opinion that it had no users and should be reverted. Ingo Molnar countered that there actually were a growing number of users, now including the latest work by Don Zickus to create a generalized NMI watchdog handler.

Relax. Michael Breuer posted an interesting analysis of the implementation of the function cpu_relax on x86 systems. This function is called during spinlock spinning cycles in order to give the CPU a break (power management, etc.). Apparently, that function currently uses a nop, but both the Intel and AMD documentation recommend the PAUSE instruction instead (partly because it can be detected on recent CPUs and used to give special treatment to guest instances running under virtualization that are wasting CPU cycles when multiple vcpus are allocated and some are spinning away). Arjan van de Ven, and others too, seemed to find this odd, and Artur Skawina wondered if this might be an odd alignment issue. Nonetheless, Michael detects a noticeable performance impact in various tests between these two instructions.
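For anyone who has not looked at a spin-wait loop before, the following is a rough userspace sketch of where such a relax hint sits. It is not the kernel's cpu_relax() or its spinlock code: relax_sketch(), spinlock_sketch and the lock and unlock helpers are hypothetical names, and it assumes a C11 compiler that provides the _mm_pause() intrinsic on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>          /* _mm_pause() */
#endif

/* Hypothetical userspace stand-in for the kernel's cpu_relax(). */
static inline void relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();                /* emits the PAUSE instruction */
#endif
    /* On other architectures this sketch simply busy-waits. */
}

typedef struct {
    atomic_flag locked;
} spinlock_sketch;

void spin_lock_sketch(spinlock_sketch *l)
{
    assert(l != NULL);
    /* Spin until the flag was previously clear, relaxing between tries. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        relax_sketch();
}

void spin_unlock_sketch(spinlock_sketch *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Single-threaded smoke test: lock and unlock twice around a counter. */
int spinlock_demo(void)
{
    spinlock_sketch lock = { ATOMIC_FLAG_INIT };
    int counter = 0;

    spin_lock_sketch(&lock);
    counter++;                  /* critical section */
    spin_unlock_sketch(&lock);

    spin_lock_sketch(&lock);    /* would spin forever if unlock failed */
    counter++;
    spin_unlock_sketch(&lock);

    return counter;
}
```

The hint only matters on the contended path, inside the while loop, which is why the choice of instruction there can show up in benchmarks that spin a lot.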

Stack sizes. The kernel contains various task startup code that will create a vma region for its stack use. Existing kernels size this region based upon the PAGE_SIZE for the architecture, even though that really is independent of the userspace code that will use the stack, and even though the resulting stack could theoretically be larger than the existing rlimits allow. Michael Neuling sent a patch to decouple stack sizing from PAGE_SIZE and to default to basing it upon the rlimit.
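The idea of bounding a stack reservation by the rlimit rather than by PAGE_SIZE can be sketched from userspace with the standard getrlimit() call. This is not Michael's patch: clamp_stack_reservation() is a hypothetical helper, and falling back to the requested size when the limit is unlimited or unreadable is an assumption of this example.

```c
#include <assert.h>
#include <sys/resource.h>

/* The casts below assume rlim_t fits in an unsigned long long. */
static_assert(sizeof(rlim_t) <= sizeof(unsigned long long),
              "rlim_t must fit in unsigned long long");

/*
 * Hypothetical helper: returns the number of bytes to reserve for a
 * stack, never more than the current soft RLIMIT_STACK value.
 */
unsigned long long clamp_stack_reservation(unsigned long long wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return wanted;          /* assumed: keep the request on error */

    if (rl.rlim_cur == RLIM_INFINITY)
        return wanted;          /* no limit configured */

    if (wanted > (unsigned long long)rl.rlim_cur)
        return (unsigned long long)rl.rlim_cur;

    return wanted;
}
```

Note that nothing in this calculation depends on the page size, which is the decoupling being described.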

SysFS. Amerigo Wang posted an RFC patch implementing “mutable sysfs files”. The basic idea is that all potentially “mutable” files (that is to say, files that may be yanked out from underneath you whenever a hotplug or other operation occurs) should use a specific API to avoid warnings.

In today’s miscellaneous items: An interesting discussion started by Salman Qazi (Google) centered around a misunderstanding of the ptrace API (and eventual iteration from Oleg Nesterov that the existing API sucks), a January XFS update from Christoph Hellwig (noting new support for netlink-provided quota communication, better power saving in XFS kernel threads), Mel Gorman posted version 2 (v2r12) of his “Memory Compaction” patch series that is intended to “defragment” memory by reconciling GFP_MOVABLE pages, and another one of Al Viro’s entertaining rants, this time about pohmelfs and its use of direct access to the current->fs->{root,mnt} entries.

In today’s announcements:

Git version 1.6.6.2. Junio C Hamano announced an update to the 1.6.6 series of the Git SCM tool, releasing version 1.6.6.2. This contains a few fixes.

Git version 1.7.0. Junio C Hamano also announced version 1.7.0 of the Git SCM had been released. This is the latest official version and includes a number of behavioral changes to “git push”, “git send-email”, and other commands as previously noted in this podcast. Users should read the release notes before upgrading if they want to make sure they catch all of the improvements.

Linux 2.6.32.8. Greg Kroah-Hartman, apologizing for the slight delay due to a few crashes that had been reported and a need to verify a security fix, as well as various travel plans, announced the release of 2.6.32.8. It contains a few fixes 2.6.32 users really should have on their systems.

The Linux Storage and Filesystems Summit. James Bottomley announced that the annual Linux Storage and Filesystems summit will take place concurrently with the VM summit on the two days before LinuxCon in Boston (Sunday and Monday), on the 8th and 9th of August. Interested parties can visit either the Linux Foundation website, or email agenda topics to the program committee at lsf10-pc@lists.linuxfoundation.org.

Userspace RCU 0.4.1. Mathieu Desnoyers announced the latest release of his Userspace RCU implementation (remember, patent encumbered, but with a waiver for GPL projects). Version 0.4.1 contains a compilation fix for s390.

As a followup to last weekend’s kerneloops statistics, Arjan van de Ven also posted statistics purely for 2.6.33 at that time. In his statistics, he showed that the most popular oops was in memcpy_toiovecend (found 391 times).

The latest kernel release is 2.6.33-rc8.

Andrew Morton announced an mm-of-the-moment mmotm for 2010-02-11-21-15.

Don’t forget to read my latest blog posting on jonmasters.org for more information on using the Cyclades TS-3000 with kgdb for remote target debugging, and don’t forget to support Jason Wessel’s proposed kgdb and kdb merge for 2.6.34. You know it makes sense to get this out there widely.

That’s a summary of the week’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.