Archive

Archive for October 5th, 2009

2009/09/23 Linux Kernel Podcast

October 5th, 2009 jcm

Audio: COMING SOON

For Wednesday, September 23rd, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Modules, dpriv, and running out of space in vm_flags.

Modules. Alan Jenkins and Rusty Russell discussed Alan’s previous patches, which would sort the kernel’s built-in symbol map so that it could be binary searched at module load and dynamic linking time. Rusty thought it might be easier to sort the in-kernel symbols on boot, but Alan pointed out that this could add as much as 7ms to system boot time as “pure overhead” (in his hacked-up prototype), and so for this (and other) reasons he continues to favor sorting the kernel symbols at build time. The module loader will then have a much easier time resolving symbols and linking modules more quickly later on. Separately, Tim Abbott sent Alan (and everyone else) an initial lib/bsearch.c patch implementing a generic binary search function for the kernel, since “there a[re] a large number [of] hand-code binary searches in the kernel…[and] getting binary searches right is difficult”.
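For readers curious what a generic binary search helper involves, here is a minimal userspace sketch in C. It is not Tim’s lib/bsearch.c: the name generic_bsearch is hypothetical, and the argument order simply mirrors the C library’s bsearch(). The classic pitfalls it tries to avoid are an off-by-one when shrinking the range and overflow when computing the midpoint.

```c
#include <stddef.h>

/*
 * generic_bsearch (hypothetical name): search a sorted array of num
 * elements, each size bytes wide, for key. cmp(key, elt) returns <0, 0
 * or >0. Returns a pointer to a matching element, or NULL if none.
 */
void *generic_bsearch(const void *key, const void *base, size_t num,
                      size_t size, int (*cmp)(const void *key, const void *elt))
{
    const char *lo = base;  /* first element of the remaining range */

    while (num > 0) {
        /* Probe the middle of the range; pointer plus offset avoids the
         * classic (lo + hi) / 2 overflow. */
        const char *mid = lo + (num / 2) * size;
        int result = cmp(key, mid);

        if (result == 0)
            return (void *)mid;
        if (result > 0) {
            /* Key is larger: drop mid and everything before it. */
            lo = mid + size;
            num = num - num / 2 - 1;
        } else {
            /* Key is smaller: keep only the elements before mid. */
            num = num / 2;
        }
    }
    return NULL;
}

/* Example comparator for an array of ints. */
static int cmp_int(const void *key, const void *elt)
{
    int a = *(const int *)key;
    int b = *(const int *)elt;
    return (a > b) - (a < b);  /* avoids overflow of a - b */
}
```

Because the range shrinks by at least one element on every iteration, the loop always terminates, including for empty and single-element arrays.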

A privilege-dropping security module. Andy Spencer posted to let everyone know that he is working on a new LSM (Linux Security Module) called dpriv. This module (which can be used by any user, not just root) creates a dynamic runtime “policy” that does not implement MAC (Mandatory Access Control), but can instead be used simply to drop specific access rights at runtime. As an example, Andy shows how one can drop permissions on the root filesystem by writing into /sys/kernel/security/dpriv/stage (followed by writing “commit” into the /sys/kernel/security/dpriv/control “file”). Currently, only file permissions can be controlled using dpriv, and its author would welcome feedback.

Out of space in vm_flags. Nigel Cunningham posted asking for advice now that the new VM_MERGEABLE flag in post-2.6.31 kernels has taken the last bit in vm_flags. Nigel has some code in his alternative suspend framework (TuxOnIce) that also needs a bit. Adding a second flags word (a new long) would solve this, but at least one kernel function is passed the flags value directly, so its prototype and behavior would need to change.
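To illustrate why a full flags word is awkward, here is a hypothetical sketch in C. It models vm_flags as it behaves on 32-bit architectures (where an unsigned long is only 32 bits wide) using uint32_t, and shows the “second word” workaround. The names first_free_bit, struct ext_flags, and ext_flag_set are invented for illustration and are not kernel code.

```c
#include <stdint.h>

/* Model a flags word on a 32-bit architecture (hypothetical, not kernel code). */
typedef uint32_t flags32_t;

/* Return the lowest unused bit number in flags, or -1 if every bit is taken. */
int first_free_bit(flags32_t flags)
{
    for (int bit = 0; bit < 32; bit++)
        if (!(flags & ((flags32_t)1 << bit)))
            return bit;
    return -1;
}

/* The "add another long" workaround: a second word for new flags. */
struct ext_flags {
    flags32_t flags;       /* original word, now completely full */
    flags32_t more_flags;  /* extra word holding flags 32..63 */
};

/*
 * Test a flag numbered 0..63 across both words. Any function that used to
 * take a single flags value by value must now take the whole struct, which
 * is the kind of prototype change the discussion worried about.
 */
int ext_flag_set(const struct ext_flags *f, unsigned int bit)
{
    if (bit < 32)
        return (int)((f->flags >> bit) & 1u);
    if (bit < 64)
        return (int)((f->more_flags >> (bit - 32)) & 1u);
    return 0;  /* out of range: no such flag */
}
```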

In today’s pull requests: some tracing fixes for -tip from Frederic Weisbecker, “two radeon fixes” in the DRM tree from Dave Airlie, some input updates for 2.6.32 from Dmitry Torokhov, some S+Core patches for 2.6.32 from Liqin Chen, round 2 of OCFS2 changes for 2.6.32 (with the reflink() system call removed for the moment to avoid contention) from Joel Becker, some writeback fixes for 2.6.32 from Fengguang Wu, some lguest and virtio fixes for 2.6.32 from Rusty Russell, some USB patches for 2.6.32 from Greg Kroah-Hartman (containing “lots of usb stuff all over the map”), some wireless updates from John Linville, some plan9 filesystem changes for 2.6.32-rc1 from Eric Van Hensbergen, and some NFS client cleanups and bugfixes from Trond Myklebust.

In today’s miscellaneous items: a kmemleak fix from Roland McGrath, version 2 of a patchset converting IA64 over to dynamic per-cpu from Tejun Heo, a patch cleaning up orig_ax handling in getreg() (for e.g. ptrace/core-dump fetches) from Roland McGrath, a patch implementing the previously discussed TRACE_EVENT_ABI (using some suitably cunning macros) from Steven Rostedt by way of Arjan van de Ven, a suggestion from Amerigo Wang that, since C99 has now been around for 10 years, it might be about time to remove the gcc option “-Wdeclaration-after-statement”, a patch adding a generic method of sending quota warning messages to userspace (for use by non-dquot filesystems) from Steven Whitehouse, some memory leak fixes from Jiri Slaby, a patch from Heiko Carstens changing the kernel side of the sys_truncate/sys_ftruncate system calls to avoid what he deems a needless unsigned->signed->unsigned conversion cycle, an RFC userspace RCU implementation from Mathieu Desnoyers “(ab)using futexes to save cpu cycles and energy”, and a patch from Izik Eidus changing some KSM defaults to “better fit into mainline kernel” now that KSM is in the mainline tree.

In today’s announcements: linux-trace-users. David Miller noted that he has created the linux-trace-users mailing list on vger.kernel.org (in reply to Steven Rostedt) for discussion of user issues relating to tracing and the various tracing tools.

SystemTAP 1.0. Josh Stone announced the (long-anticipated) 1.0 release of SystemTAP. This release features experimental support for unprivileged users, cross-compiling for foreign architectures (which gdb has supported forever), and a lot more besides.

The latest kernel release was 2.6.31.

Chris Malley reported machine hangs when using “perf sched record”; a patch from Peter Zijlstra did not seem to make a difference.

Stephen Rothwell posted a linux-next tree for September 23rd. Since Friday, the ocfs2, jdelvare-hwmon, block and usb trees gained problems, while the input-current and rr trees lost theirs and the kmemcheck tree needed an “obvious fix”. The drbd tree was dropped due to a build problem. The total sub-tree count is listed as 140, but does not seem to account for the tree that was removed in the compose. Stephen reminds everyone not to begin pushing patches for 2.6.33 until 2.6.32-rc1 has been released, and also reminds everyone that conflicts are bouncing between trees as Linus merges.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes

2009/09/22 Linux Kernel Podcast

October 5th, 2009 jcm

Audio: COMING SOON

For Tuesday, September 22nd, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Fanotify and modules.

NOTE: Several kernel hackers were at or en route to LPC (Linux Plumbers Conference) and so the traffic volumes were affected.

Fanotify. Fanotify is a new framework originally authored by Eric Paris (thanks a lot!) to help “anti”-malware vendors intercept and authorize certain file operations on the fly, using a generic interface. But fanotify is intended to be useful for other purposes too, and Jamie Lokier pointed out (while reaffirming his total lack of interest in its malware-scanning uses) that this was his reason for “sticking his oar” in recently. He is keen to see non-malware uses (improved inotify and other userspace indexing services, such as updatedb) work out right. Davide Libenzi wondered whether, since malware scanners generally favor “hacking” the syscall table, Linux should just provide a “non racy” mechanism for intercepting and monitoring system calls, perhaps as some kind of ptrace-on-steroids. Argument continued on this and many other topics, including whether literal pathnames were important or whether on-access scanning tools should have to look them up if they care, and whether ioctls or “idiotic packet interfaces” like netlink were best.

Modules. Alan Jenkins posted some module patches intended to speed up symbol resolution during load. In the current Linux module implementation, modprobe (or insmod, if modprobe is not being used) simply loads an ELF image containing a number of sections that the kernel then links itself (in contrast to the old days, when it was all linked in userspace). Alan’s patches sort the tables of built-in kernel symbols so that the module loader can resolve against them using a standard binary search at load time. Using these patches, Alan has eliminated 20% of the CPU cycles and 0.3 seconds of real time from system boot on his EeePC 701. Kudos to Alan, once again.
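As a rough userspace illustration of the technique (not Alan’s patches), the C sketch below keeps a table of name/address pairs sorted by name and resolves names with the C library’s bsearch(). The struct sym type and the helper names are hypothetical simplifications. Here the table is sorted at runtime with qsort(), whereas Alan’s approach does the sorting at build time so that nothing extra has to happen at boot.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry: a name and an address. */
struct sym {
    const char *name;
    unsigned long value;
};

/* Order two entries by name so the table can be binary searched. */
static int sym_cmp(const void *a, const void *b)
{
    const struct sym *x = a, *y = b;
    return strcmp(x->name, y->name);
}

/* Compare a bare name (the search key) against a table entry. */
static int name_cmp(const void *key, const void *elt)
{
    const struct sym *s = elt;
    return strcmp(key, s->name);
}

/* Sort once. (Alan's patches do this step at build time instead.) */
void sort_syms(struct sym *tab, size_t n)
{
    qsort(tab, n, sizeof(*tab), sym_cmp);
}

/* O(log n) lookup, versus the O(n) scan an unsorted table would need. */
const struct sym *find_sym(const char *name, const struct sym *tab, size_t n)
{
    return bsearch(name, tab, n, sizeof(*tab), name_cmp);
}
```

With thousands of exported symbols and many undefined references per module, replacing a linear scan with a logarithmic lookup is where the savings come from.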

In today’s pull requests: some kmemcheck updates from Vegard Nossum, some tracing/workqueue fixes from Anton Blanchard, another round of RFC patches from Zhang Rui implementing the ALS (“Ambient Light Sensor”) sysfs class driver, version 4 of his “compcache” compressed swap patches from Nitin Gupta, some performance counters (“performance events”) fixes from Ingo Molnar, some timer updates from Thomas Gleixner, some tracing/kprobes updates from Frederic Weisbecker, some s390 patches from Martin Schwidefsky, some sound patches from Takashi Iwai, and some regulator patches for 2.6.32 from Liam Girdwood.

In today’s miscellaneous items: some performance events fixes for powerpc from Paul Mackerras, an endorsement for a dedicated tracing mailing list from Li Zefan and Avi Kivity (both in reply to Steven Rostedt, with Avi having kicked things off with an initial question), some futex cleanups (and also a race fix) from Darren Hart, version 5 of the RFC cpuidle POWER infrastructure patches intended to allow flexible management of idle policy from Arun R Bharadwaj, a “philosophical” question concerning which of the two uaccess.h headers (linux or asm) should be included from Robert P. J. Day, some S+Core patches from Liqin Chen (including header files in that architecture’s linker script), an MCE error injection fix in the face of real errors from Huang Ying, a patch implementing a new “kcoredump” module that uses kprobes to perform a kernel “core dump” from anywhere within the kernel (unlike the existing /proc/kcore implementation) from Hui Zhu, another RFC patch implementing a SCHED_EDF (“Earliest Deadline First”) scheduler from “Raistlin”, version 3 of his RFC SLQB on memoryless node configurations patches from Mel Gorman, some further linker script cleanup patches (facilitating ksplice integration) from Tim Abbott, an admission from Rusty Russell that once all other users of CONFIG_PARAVIRT are gone, even lguest may not be enough to keep it around(!), and patches containing the latest implementation of the Ceph distributed filesystem client from Sage Weil.

The latest kernel release was 2.6.31.

Shaohua Li reported a regression in page writeback statistics on a 12-disk test system in kernels following a specific commit; he could not find an immediate fix, and so sought further comments. Xiaotian Feng reported a problem running “startx” to start an X session on the latest git tree, attributing it to an issue with KMS on a Fedora 11 system (Peter Zijlstra suggested either disabling KMS or updating the Fedora system). Darren Hart reported that he is hitting a repeating BUG on boot on his Thinkpad T60p when running the latest 2.6.31-rt11 preempt-rt kernel; Clark Williams thinks he saw it previously but believed it had since been fixed, and so requested that Darren send him his kconfig to verify.

Stephen Rothwell announced that there would be no linux-next release for September 22nd either, as he was still feeling under the weather.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes

2009/09/21 Linux Kernel Podcast

October 5th, 2009 jcm

Audio: COMING SOON

For Monday, September 21st, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Optimization, performance events, and trace events.

Optimization. Johannes Buchner posted some comparisons between kernel builds with varying optimization levels, pre-emption settings, and IO scheduling policies, as they affected overall system boot time. His figures indicated that optimization level and pre-emption setting had “no significant influence on speed”, while “CFQ let [his] system boot several seconds faster”. He posted some graphs on his blog. Of course, these figures come from a single system, but they may be of interest to others. Arjan van de Ven was interested that the IO scheduler mattered, given that (s)readahead is supposed to help with this, to which Johannes replied that he wasn’t using readahead for his measurements (CFQ apparently wasn’t improved by readahead, whereas the other scheduling algorithms might have been).

Performance Events. Ingo Molnar posted a merge request intending to rename “performance counters” to “performance events”, in light of the ever-expanding, all-encompassing nature of the subsystem formerly known as “performance counters”. The tools remain unchanged (as does the ABI that they rely upon), and the rename is largely symbolic, correcting a “missnomer”; it was mostly done using a script that Ingo also included in his posting.

Trace events. Arjan van de Ven posted to bring up a suggestion that Ingo Molnar had made previously, involving the creation of a TRACE_EVENT_ABI, which would be equivalent to TRACE_EVENT except that it would signal a stable interface. But he was running into some issues where TRACE_EVENT was being defined differently “all over the place”, leading to “really nasty hack[s]” just to make an alias. He wondered if Steven Rostedt had any clever ideas for making an alias “without fouling up the whole tracing system”.
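The preprocessor behavior that makes a simple alias attractive can be shown in plain C. In this hypothetical sketch (EVENT and EVENT_ABI are stand-ins, not the kernel’s tracing macros), EVENT_ABI is defined once in terms of EVENT. Because a macro body is only expanded where it is used, the alias automatically follows each redefinition of EVENT across the two “passes”. It does not address the specific difficulties Arjan ran into with the real tracing headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for TRACE_EVENT and TRACE_EVENT_ABI. The alias is
 * defined once, before EVENT even exists; it is expanded where it is used. */
#define EVENT_ABI(name) EVENT(name)

/* Pass 1: each event gets a hit counter and a trace_<name>() function. */
#define EVENT(name) \
    static int name##_hits; \
    int trace_##name(void) { return ++name##_hits; }

EVENT(sched_switch)
EVENT_ABI(power_start)      /* expands using the pass-1 EVENT */
#undef EVENT

/* Pass 2: the same event list, now turned into a table of names. */
#define EVENT(name) #name,

static const char *const event_names[] = {
    EVENT(sched_switch)
    EVENT_ABI(power_start)  /* expands using the pass-2 EVENT */
};
#undef EVENT

/* Look up an event's position in the name table, or -1 if unknown. */
int event_index(const char *name)
{
    for (size_t i = 0; i < sizeof(event_names) / sizeof(event_names[0]); i++)
        if (strcmp(event_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

The catch in the real kernel is that some headers redefine TRACE_EVENT in ways an alias like this cannot simply forward to, which is what made Arjan’s attempts messy.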

In today’s pull requests: some DRM fixes from Dave Airlie (containing “the main chunk of the drm changes for 2.6.32”; Ed Tomlinson wondered what was needed to actually use the R300 3D features), some UBIFS and UBI patches for 2.6.32 from Artem Bityutskiy, some writeback fixes from Jens Axboe, some x86 fixes from Ingo Molnar, some tracing fixes from Ingo Molnar, some scheduler fixes from Ingo Molnar, some performance counters fixes and updates from Ingo Molnar, some core kernel fixes from Ingo Molnar (including a bunch of RCU updates that came from Paul E. McKenney), some “performance events” patches from Ingo Molnar, some HID fixes from Jiri Kosina, some trivial fixes from Jiri Kosina, some kbuild fixes from Sam Ravnborg (including kconfig refactoring), some firewire updates from Stefan Richter, some xen updates from Jeremy Fitzhardinge (including a fix for stack protector NX support on 64-bit processors that either lack the feature or have it disabled in the BIOS), and some ioat/async_tx fixes for 2.6.32 from Dan Williams (the Intel one).

In today’s miscellaneous items: ongoing discussion of the best mechanism for implementing a callback when a swap slot is freed, a fix from Stefan Assmann for a hardware erratum affecting AMD 813x rev. B1/B2/etc. parts that won’t generate interrupts when using legacy boot quirks (Stefan continues the fight against legacy “boot interrupts”; thanks!), a fix from Hugh Dickins for a rare case in which stable_tree_insert() finds a match where the prior stable_tree_search() did not, occasionally causing a page leak, helper functions for filling seq_file buffers without directly exposing the internal implementation from Miklos Szeredi (apparently suggested by Al Viro), some concerns about “weirdness” in the mmapstress03 test in LTP from Geert Uytterhoeven, some musings from Peter Zijlstra on whether wake_up_new_task really needs to play with task priorities (in reply to comments originally raised by a curious Peter Williams), an “alternative implementation” to handle d-cache aliases in performance counters without having to change how x86 does regular allocations (allowing such architectures to avoid unnecessary vmalloc, but necessitating a difference from e.g. sparc, which does) from Peter Zijlstra, a suggestion from Arnd Bergmann that a warning be printed whenever someone attempts to use kernel headers that have not been installed, version 2 of an RFC patch intended to allow use of SLQB on architectures that allow memoryless nodes from Mel Gorman, a patch adding a tracepoint for block request mapping from Jun’ichi Nomura, a patch from John Kacur (to Ingo Molnar) increasing MAX_STACK_TRACE_ENTRIES, intended to avoid lockdep running out of entries and falling over, and a discussion between Trond Myklebust and Jamie Lokier of when to perform access checks in fchdir.

The latest kernel release was 2.6.31.

Ingo Molnar pointed out that an earlier regression in tty_open, last reported in 2.6.31-rc9, was still occasionally rearing its ugly head in -tip testing. Heiko Carstens reported that the latest git tree occasionally saw the “events” kernel thread running on the wrong CPU(!) on s390 with a default kconfig, but it turned out that a fix was already in a patch heading “Linuswards” later in the day. Separately, Heiko posted a patch always showing cpus_allowed in /proc/ /status.

Stephen Rothwell announced that there would be no linux-next tree for the 21st, and possibly not for the 22nd either, as he was “a bit under the weather”.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes

2009/09/20 Linux Kernel Podcast

October 5th, 2009 jcm

Audio: COMING SOON

For the weekend of September 20th, 2009, I’m Jon Masters with a summary of the weekend’s LKML traffic.

In this weekend’s issue: devtmpfs, external module building, NOHZ, and PAUSE.

Devtmpfs. Discussion continued concerning the allegedly “broken by design and implementation” devtmpfs support in recent kernels. Eric W. Biederman’s initial (highly critical and overtly negative) comments gave way to a general discussion as to whether waiting for certain hardware to stabilize could be pushed back out into userspace by modifying utilities such as modprobe. I am reviewing that particular thread at the moment, since I missed it the first time around while at Plumbers. Separately, Kay Sievers posted some driver core patches adding support to devtmpfs for device nodes with non-default permissions.

External module building. Discussion continues about building modules (and other code) outside of the kernel tree. The original discussion had concerned kernels built without certain optimization preferences, but had moved on to a general argument against certain practices common among those building out-of-tree code, such as expecting certain symlinks (e.g. /lib/modules/${krel}/build/arch/${arch}/include) to be present. Arnd Bergmann stated that he was considering adding a warning when one attempts to use the kernel header files without -D__KERNEL__ set. He would like to know whether this would break any legitimate users (other than “make headers_install”, for which he has already made an allowance).

NOHZ. Martin Schwidefsky reposted an RFC patch series adding “arch_needs_cpu” and implementing, essentially, the notion that the kernel need not enter its “tickless” mode if the CPU did some work during the last tick period. If the CPU then goes truly idle, this costs one unnecessary additional tick, but it has been shown to improve performance in Martin’s testing, especially on s390, the architecture he generally works on. He requests feedback.
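Here is a hypothetical sketch in C of the heuristic as described above. It is not Martin’s patch, and not how arch_needs_cpu is actually wired into the kernel: each CPU counts the ticks in which it ran work, and on entering idle it only stops the tick if that count has not moved since the last check. The structure and function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct cpu_tick_state {
    unsigned long busy_ticks_seen;  /* busy-tick count at the last idle check */
    unsigned long busy_ticks_now;   /* running count of ticks in which work ran */
};

/* Record that the CPU ran work during the current tick. */
void note_busy_tick(struct cpu_tick_state *s)
{
    s->busy_ticks_now++;
}

/*
 * Decide, on entering idle, whether to stop the periodic tick. If work ran
 * since the last check, keep ticking for one more period on the bet that
 * the CPU will be busy again soon. If the CPU really is idle, the cost is
 * a single extra tick before tickless mode kicks in.
 */
bool should_stop_tick(struct cpu_tick_state *s)
{
    bool was_busy = s->busy_ticks_now != s->busy_ticks_seen;

    s->busy_ticks_seen = s->busy_ticks_now;
    return !was_busy;
}
```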

PAUSE. Mark Langsdorf posted a patch enabling support for the “Pause Filter” in modern (“family 0x10 models 8+”) AMD CPUs. This feature adds a new field to the VMCB called “Pause Filter Count”, which tracks the number of times a PAUSE instruction is executed and can optionally trigger a PAUSE intercept that the kernel can use to detect heavily contended spinlocks. In testing, Mark found that most spinlocks are held for only around 1000 PAUSE cycles, so he set the default reporting threshold to 3000 cycles in order to detect contended spinlocks. This looks like a nice feature, which Mark is using with virtualized guests to force a VCPU that is obviously just busy-waiting on a contended spinlock to yield.
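Below is a toy model in C of the threshold logic described above. In reality the counting is done by the CPU and the intercept is delivered to the hypervisor; this sketch simply counts PAUSEs in software toward Mark’s default of 3000 and does not model the actual semantics of the VMCB “Pause Filter Count” field. All names are hypothetical.

```c
#include <stdbool.h>

#define PAUSE_FILTER_THRESHOLD 3000u  /* Mark's default, per the summary */

/* Hypothetical model of one VCPU's pause-filter state. */
struct pause_filter {
    unsigned int count;      /* PAUSEs executed in the current spin */
    unsigned int threshold;  /* intercept once count reaches this */
};

/*
 * Called for each PAUSE the guest executes while spinning. Returns true
 * when the hypervisor should intercept (and, for example, yield the VCPU).
 */
bool pause_executed(struct pause_filter *pf)
{
    if (++pf->count >= pf->threshold) {
        pf->count = 0;  /* rearm for the rest of the spin */
        return true;
    }
    return false;
}

/*
 * Simulate a guest spinning for 'spins' PAUSEs before it gets the lock;
 * returns how many intercepts the hypervisor would have taken.
 */
unsigned int simulate_spin(struct pause_filter *pf, unsigned int spins)
{
    unsigned int exits = 0;

    pf->count = 0;  /* each new spin starts from zero */
    for (unsigned int i = 0; i < spins; i++)
        if (pause_executed(pf))
            exits++;
    return exits;
}
```

With a threshold three times the typical hold time, short uncontended spins (around 1000 PAUSEs) never trigger an exit, while long waits do.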

In the weekend’s pull requests: some sh updates for 2.6.32-rc1 from Paul Mundt, some md updates for 2.6.32 from Neil Brown (mostly RAID6 offload), version 5 of the S+Core architecture support tree from Liqin Chen, some watchdog updates from Wim Van Sebroeck, some 2.6.32 ext4 updates from Ted Ts’o, some x86 platform support updates for 2.6.32 from Thomas Gleixner (laying the groundwork for upcoming Intel Moorestown support, as Thomas has previously commented upon in variously varied ways), some i2c updates for 2.6.32 from Jean Delvare, some ISDN patches for 2.6.32 from Tilman Schmidt, some tracing/syscalls patches from Frederic Weisbecker, some ACPI and SFI patches for 2.6.32 from Len Brown, some timechart patches from Arjan van de Ven (who later added a new “perf timechart record” patch, ‘similar to “perf sched record”’), some tracing cleanups for 2.6.32 from Steven Rostedt, some performance counters patches for 2.6.32 from Ingo Molnar (who notes that two of the commits had “mingo” as the owner due to a broken git configuration), some driver core and TTY patches from Greg Kroah-Hartman, some i2c patches from Ben Dooks, and some includecheck fixes from Jaswinder Singh Rajput.

In the weekend’s miscellaneous items: an iocontroller patch fixing a system hang from Gui Jianfeng, a couple of tracing (profiling) updates from Frederic Weisbecker (sent a couple of times for good measure), some ftrace updates from Li Zefan, a new --input option for the performance counter tools allowing one to pass input files from Mike Galbraith, some feedback from Ryo Tsuruta to Vivek Goyal concerning benchmarks Vivek had previously run of Ryo’s ioband patches (which compete with Vivek’s io controller patch series), a crash report from Ingo Molnar in -tip testing (dev_attr_show), a series of MCE ring buffer fixes from Huang Ying, the removal of markers from Ingo’s tree by way of him applying Christoph’s “kill markers” patch, a typically very helpful reply from Alan Jenkins (to Vprabu Vprabu) on the nature of loadable modules and how they work in recent kernels (as well as how they fit in with hot/coldplugging; Kay Sievers added some additional comments too), an EDAC build error fix from Ted Ts’o, a patch preventing vgacon_deinit from touching hardware in the case of inactive consoles from Francisco Jerez, version 2 of the Dynamic Logical Partitioning support patches from Nathan Fontenot, a patch for m68knommu allowing kernel command line parameters to be passed through from the U-Boot firmware bootloader from Lennart Sorensen, an RFC “hatchet job” for SLQB on memoryless node configurations (as pertains to PPC and s390 systems) from Mel Gorman (not a full implementation yet, since it only boots on “at least one machine” Mel has available), a patch preventing the scheduler from immediately rescheduling a yielding process if another process is available from Mark Langsdorf, some more linker script cleanup patches (facilitating ksplice integration) from Tim Abbott, a criticism of “global” events in fanotify from Andreas Gruenbacher (since “virtual machines” in separate namespaces will confuse matters), another round of ISCSI TCM/ConfigFS patches from Nicholas A. Bellinger, some patches from Rob Landley removing a perl dependency (always a good idea) introduced in 2.6.25, making it possible to build at least a “bootable” kernel without perl, some interesting patches from Suresh Siddha purporting to migrate x86 Intel systems with fewer than 8 logical CPUs over to “flat mode” APIC routing (which I didn’t think was necessary until you had more than 8 logical CPUs, but maybe it’s not worth having the two different modes any more?), a new per-cpu notifier that is called whenever the kernel is about to return to userspace from Avi Kivity (sounds very useful), a note from Avi Kivity that the KVM folks reached a similar conclusion to the VMware guys: many of the paravirtualization hooks are no longer necessary for performance now that native EPT/NPT support is available (causing Alok Kataria to send along the patches actually removing the VMware VMI code, for general review, and causing Ingo Molnar to remind everyone that this needs to be handled “carefully” over a few kernel cycles with a proper sunset period), fscache support for plan9 filesystems from Abhishek Kulkarni, a patch exporting a couple of tracing symbols otherwise leading to build errors from Peter Zijlstra, some comments on the modules.builtin implementation from Sam Ravnborg, a patch adding coretemp support for the Core i5 (Lynnfield model 0x1E) CPU from Robert Hancock, and a question from Ted Ts’o as to what the pgpgin/s and pgpgout/s columns in “sar” should actually be measuring.

The weekend’s quote of the day: Greg Kroah-Hartman responded to some rather hostile comments from Eric W. Biederman (referring to devtmpfs Kool-Aid) with the line “Oh, we have official team drinks now? Great, sign me up, can I pick a t-shirt logo as well? :)”.

In the weekend’s announcements: fopsbench. Tobias Oetiker announced the release of a new benchmarking tool for measuring the response time of various filesystem operations, which he calls “fopsbench”. He includes some example output, though I admit to being unclear whether bonnie++ already provides these measurements.

2.6.31-rt11. Thomas Gleixner announced version 2.6.31-rt11 of the preempt-rt patchset, which (amongst other things) includes a latencytop fix and some IRQ fixes. There are still some scheduler issues that Peter Zijlstra is continuing to work on.

The latest kernel release was 2.6.31.

David Miller reported a regression on sparc64 that he suggested might be attributable to recent percpu changes from Tejun Heo, which he verified by reverting the offending commit in his local tree. Robert Hancock reported that recent git trees fail to boot with a DMAR error on his Intel Lynnfield CPU-based Asus laptop.

Stephen Rothwell posted a linux-next tree for September 18th. Since Thursday, the input-current and rr trees gained issues, while the blackfin, ext4, nfsd, xfs, and i2c trees lost theirs. Problems continued to bounce from one sub-tree to another as Linus kept merging for 2.6.32. The total tree count in the latest compose remained at 140 trees, and Stephen repeated his call to avoid adding features intended for 2.6.33 until 2.6.32-rc1 is out.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes