2010/03/21 Linux Kernel Podcast

March 21st, 2010 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100321.mp3

For the weekend of March 21st, 2010, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Linux 2.6.34-rc2, 64-bit system calls, core dumping to a pipe, exported symbols, page cache control, and performance counters for KVM guests amongst other things.

Linux 2.6.34-rc2. Although there is no official announcement as of this writing, Linus’ git tree currently contains a 2.6.34-rc2 release that he created on Friday March 19th 2010 at 6:17pm Best Coast Time (PDT). Once the announcement is officially made, there will be more detail.

64-bit system calls. Benjamin Herrenschmidt raised a question in a thread entitled “64-syscall args on 32-bit vs syscall()”, concerning the ability of existing kernels to handle 64-bit parameters passed to system calls from a 32-bit userspace. A problem arises on platforms such as POWER and its smaller cousin, PowerPC, in which arguments are usually passed in registers rather than on the stack (unless a large number are passed). When passing a 64-bit value (as when calling fallocate() within hdparm), GCC may use two sequential 32-bit registers, and that register pair must itself be aligned on an even boundary. But the syscall() function within glibc takes the system call number as its first argument and then shifts the remaining arguments down by one register, so an alignment gap that the compiler inserted ends up in the wrong place and the arguments are off by one. Benjamin had a proposal for modifying the existing syscall() interface in a way he thought would be backward compatible (perhaps confined to P{ower,OWER}{PC,} initially), but Ulrich Drepper wasn’t quite so trigger happy about making changes. Peter Anvin favored using explicit versioning to isolate any syscall() interface changes. Separately, Torok Edwin posted some perf (the Performance Counters userspace utilities in the “perf” directory) patches enabling callgraph tracing of 32-bit processes when running 64-bit kernels.
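To make the off-by-one concrete, here is a small sketch in C. It is a simplified model rather than the real PowerPC ABI: registers are treated as numbered 32-bit slots starting at 0, and the helper names (assign_slots, offset_slot_direct, offset_slot_via_syscall) are made up for illustration. It compares where the 64-bit offset argument of fallocate(fd, mode, offset, len) lands when the arguments are laid out directly, versus when they pass through a syscall(nr, ...) wrapper that drops the system call number and shifts everything down by one slot.

```c
#include <stddef.h>

/* Width of an argument, in 32-bit register slots. */
enum arg_width { ARG32 = 1, ARG64 = 2 };

/*
 * Simplified model (an assumption, not the exact ABI): slots are numbered
 * from 0, and a 64-bit argument must start on an even slot, so the compiler
 * skips one slot when the next free slot is odd.
 */
static void assign_slots(const enum arg_width *args, size_t n, int *first_slot)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        if (args[i] == ARG64 && next % 2 != 0)
            next++;                 /* padding slot to align the pair */
        first_slot[i] = next;
        next += (int)args[i];
    }
}

/* Layout the kernel expects: (fd, mode, offset, len). */
static int offset_slot_direct(void)
{
    const enum arg_width args[] = { ARG32, ARG32, ARG64, ARG64 };
    int slot[4];

    assign_slots(args, 4, slot);
    return slot[2];                 /* slot where "offset" starts */
}

/*
 * Layout produced for syscall(nr, fd, mode, offset, len): the compiler
 * aligns "offset" with nr occupying slot 0, then the wrapper removes nr
 * and moves every remaining slot down by one.
 */
static int offset_slot_via_syscall(void)
{
    const enum arg_width args[] = { ARG32, ARG32, ARG32, ARG64, ARG64 };
    int slot[5];

    assign_slots(args, 5, slot);
    return slot[3] - 1;             /* position after the shift */
}
```

In this model the two functions disagree by exactly one slot, which is the mismatch described in the thread: the padding that was correct in the five-argument layout is wrong once the first argument has been removed.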

Core dumping to a pipe. Neil Horman posted the 4th version of a patch series entitled “exec: refactor how call_usermodehelper works, and update the sense of the core_pipe recursion check”. In addition to addressing some existing race conditions in the implementation, Neil was interested in reworking the call_usermodehelper() function to handle core dumping to a pipe. In the existing arrangement, every running process must have a non-zero core dump ulimit for the pipe dump to work as planned. But Neil has had enough requests to be more flexible, and has come up with the idea of adding a function callback to call_usermodehelper (umh) that will be invoked after the task has been forked but prior to the exec() call that starts the userspace code (at that point the task is, in userspace nomenclature, just about referable to as a process; the two are the same thing, however). That function pointer can, in the case of do_coredump, fiddle with ulimits.
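The shape of that idea (fork, run a caller-supplied hook in the child, then exec) can be sketched in ordinary userspace C. This is only an analogy under stated assumptions: it is not Neil's patch and not the kernel's call_usermodehelper() interface, and the names spawn_with_init, child_init_fn and zero_core_limit are hypothetical.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hook run in the child after fork() and before exec(); 0 means success. */
typedef int (*child_init_fn)(void *data);

/*
 * Fork, let the optional hook adjust the child's state, then exec the
 * program. Returns the child's exit status, or -1 on failure or if the
 * child did not exit normally.
 */
static int spawn_with_init(const char *path, char *const argv[],
                           child_init_fn init, void *data)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (init != NULL && init(data) != 0)
            _exit(126);             /* hook failed */
        execv(path, argv);
        _exit(127);                 /* exec failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example hook: lower the soft core dump limit to zero in the child only. */
static int zero_core_limit(void *data)
{
    struct rlimit rl;

    (void)data;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = 0;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

The point of the hook is that the limit change applies only to the freshly forked child, so nothing about the parent or any other running process needs to be configured in advance.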

Exported symbols. Robert P. J. Day inquired whether the kfifo implementation should really be exporting as many symbols as it does. Tilman Schmidt alluded to the reasoning behind this in mentioning inlined functions. For background, whenever a module needs to make use of some function provided by the kernel, that function must explicitly be exported through EXPORT_SYMBOL or a similar macro; simply leaving off the C keyword “static” does not have the desired effect. Sometimes, symbols are exported solely because they are called by inline functions that are defined in headers, get compiled into module files, and therefore need the corresponding export. For example, an inline function called “foo” might need an export “_foo”. In order to clarify the situation, this author suggested a new EXPORT_SYMBOL_INTERNAL export to clearly label these use cases, such that symbols are not used where they are not intended.
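The “foo” and “_foo” situation looks roughly like the following sketch. It is illustrative only and is not taken from kfifo: struct ring and both functions are hypothetical, and EXPORT_SYMBOL is replaced with a harmless stand-in so that the fragment compiles outside a kernel tree (in a real build the kernel headers supply the macro).

```c
/* Stand-in for the kernel macro so this compiles in userspace (assumption). */
#ifndef EXPORT_SYMBOL
#define EXPORT_SYMBOL(sym) struct export_stub_##sym
#endif

/* Hypothetical structure, for illustration only. */
struct ring {
    unsigned int pos;
    unsigned int size;
};

/* Out-of-line helper; it lives in core kernel code, not in the header. */
unsigned int _foo(struct ring *r);

/*
 * Header-style inline wrapper. It is compiled into every module that
 * includes the header, so each such module ends up calling _foo() itself.
 */
static inline unsigned int foo(struct ring *r)
{
    if (r->pos + 1 < r->size)
        return ++r->pos;            /* fast path, fully inlined */
    return _foo(r);                 /* slow path needs the exported symbol */
}

/* Core implementation of the slow path: wrap around to the start. */
unsigned int _foo(struct ring *r)
{
    r->pos = 0;
    return r->pos;
}
/* Exported only because the inline foo() above calls it from modules. */
EXPORT_SYMBOL(_foo);
```

Nobody is meant to call _foo() directly, yet it must be exported anyway, which is the kind of case a distinct “internal” export label would make visible.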

Page cache control. Balbir Singh posted a patch exposing a cache= kernel command line parameter that can be used to control page cache operation, and effectively disable it entirely in certain situations. This is of particular benefit to virtualized guests (especially those not wanting to enter into direct reclaim frequently), which otherwise might have their page cache data effectively stored twice: once in the host, and once in the guest itself. Now, there being no such thing as a free lunch, Avi Kivity pointed out that this would slow down guests booted with cache=off because they would now need to use a virtio call to pull in more pages. However, guest memory utilization was shown to fall considerably, as might be expected without a page cache. Both Avi and Balbir seemed to agree that the tunable knob allows the trade-off between more memory overhead in the VM and a slight loss in performance to be decided according to the needs of a specific environment: workload, IO types, filesystems, and a number of other items mentioned by both. Randy Dunlap specifically requested that documentation be added also.

Performance Counters for KVM guests. Yanmin Zhang posted a patch entitled “Enhance perf to collect KVM guest os statistics from host side”, intended to facilitate the collection of performance counter statistics from the host when using Linux guest instances, with the exception of guest userspace. Avi Kivity was excited that this patch did not require the exact same kernel on both the host and the guest (he called that “critical”, noting that, “I can’t remember the last time I ran same kernels”). There did seem to be some agreement between Avi and Ingo Molnar that having a vmchannel client in the host kernel exporting various data for tracing to guest kernels would make life easier for the implementers of such features, but potentially opened up another DoS target and needed to be avoided. Instead, Ingo suggested that the host perf tools connect to the qemu instances managing guest instances and communicate over a well-known UNIX socket. The conversation went off onto a tangent about obtaining guest instance information using libvirt, whether there were other tools in common usage to manage guest instances other than starting them directly using the modified qemu, and the relative benefits of shipping all KVM kernel and userspace code in a single project. This gave Ingo an opportunity to get in another mention of what he considers to be “ugly” separation between glibc and the kernel. The entire thread is certainly worth reading, at dozens of posts and likely growing.

In today’s miscellaneous items:

*). A fix for allmodconfig with Xilinx soft core FPGA systems.

*). A device power management documentation update from Rafael J. Wysocki.

*). Version 7 of Andrea Righi’s per memory cgroup dirty page limit patch. Andrea provided some documentation updates that were discussed also. Separately, and on the note of cgroups, the CFQ_GROUP_IOSCHED configuration option was made visible in a patch from Li Zefan.

*). A bunch of scheduler and cpusets fixes from Oleg Nesterov, who also noted that there were remaining issues – including a potential lockup in do_fork() caused by receiving a signal from an IRQ or an RT thread pre-emption event because the runqueue lock (rq->lock) cannot be taken in the interim. Oleg asked the maintainers very nicely to please review his patches and comment, although there have been no comments posted in the last week on these.

*). Michael Braun reported an issue involving an interaction (or lack thereof) between the kernel crypto subsystem and the SLOB allocator. He finds that there is “general memory corruption” when using SLOB that isn’t present with the other allocators. Herbert Xu (and by extension, Pekka Enberg, since it was he who inquired as to whether these options were enabled) asked Michael to turn on some allocator debugging options and provide the relevant debugging output to facilitate further analysis.

*). A fix from Suresh Siddha ensuring that legacy PIC interrupts are handled on all CPUs, and not just the boot CPU, when using the “noapic” kernel boot option. This addresses a bug originally raised by Ingo Molnar.

*). A patch from Dmitry Torokhov re-implementing sysrq as an input handler, rather than as a custom hack in the legacy keyboard driver. Henrique de Moraes Holschuh wondered aloud whether this would introduce any problems for SAK (Secure Attention Key), which should be uninterruptible. That piece does not yet seem fully resolved in the thread.

*). A patch converting alpha to use clocksource rather than arch_gettimeoffset from John Stultz.

*). A misaligned percpu allocation when using lock events through perf on a particular SPARC box was reported by Frederic Weisbecker.

In today’s announcements:

Kernel.org. John (warthog9) Hawley announced the general availability of various SSL based services on kernel.org. Quoting John, “[t]his should help provide an additional level of security, in particular for our dynamic content like the wiki’s, patchwork and bugzilla”. John noted that the SSL certificates were generously donated by Thawte, and included a quote from the latter in which they state that they are, “proud of [our] open source lineage”. As of this writing, services officially using SSL (through explicit redirection) include Bugzilla, the Wikis, Account Requests, and Patchwork, while services that can use SSL if requested using the appropriate address currently include the main www.kernel.org, boot.kernel.org, git.kernel.org, and android.git.kernel.org. Services not using SSL include mirrors.kernel.org (due to the volume of traffic incurred), and the geo-DNS entries, because covering those would unreasonably expand the number of SSL certificates required.

Loop-AES. Jari Ruusu announced version 3.3a of the loop-AES file/swap utility. Details: http://loop-aes.sourceforge.net/

LTP. Rishikesh K Rajak sent an announcement saying that the previous ltp-cvs commit list would be supplemented by a new ltp-commits list that includes git commits also. The name would suggest that it may be somewhat VCS agnostic. Details: http://lists.sourceforge.net/lists/listinfo/ltp-commits

SCST. Vladislav Bolkhovitin posted to announce that the “new SCST SysFS-based interface has become fully usable, so you can start migrating to it and update your target drivers, dev handlers and management utilities”. For further information, please see: http://scst.sourceforge.net/

TCM. Nicholas A. Bellinger announced the release of version 3.4.0-rc1 of the Target_Core_Mod/ConfigFS infrastructure project, which includes a new Open-FCoE.org based target module (tcm_fc) for TCM/ConfigFS 3.x (mentioned in a separate release announcement). As of the latest release, the TCM/ConfigFS project is now tracking upstream Linux development once again. For further information: http://www.linux-iscsi.org/index.php/Target_Core_Mod/ConfigFS

RT 2.6.33.1-rt11. Thomas Gleixner announced the latest RT kernel patch version 2.6.33.1-rt11 is now available. Since he had been traveling, Thomas had made a few interim releases (rt6 through rt11), the sum of which he summarized. For further detail: http://www.kernel.org/pub/linux/kernel/projects/rt

TuxOnIce 3.1. Nigel Cunningham announced the 3.1 release of TuxOnIce. This is a series of alternative software suspend and resume patches that have been out of the kernel tree for some time, but have their various supporters. The latest patches include LZO compression support, UUID support for detecting suspend images without using a resume= parameter, and other fixes.

The latest kernel release is 2.6.34-rc2.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.
