
Archive for October 12th, 2009

2009/09/29 Linux Kernel Podcast

October 12th, 2009 jcm

Audio: COMING SOON

For Tuesday, September 29th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Consoles, IO bandwidth throttling, and VMI.

Consoles. Alan Stern raised the issue of console device naming, especially as it pertains to removable devices such as those attached via USB. He feels that these devices should acquire a name (e.g. ttyUSB0, exposed through /dev/ttyUSB0), that the name should persist even after the device has been removed from the system, and that the kernel should open each console device as soon as it is registered as a console, so that the underlying data structures are not released when the device is removed. He has raised this issue on several previous occasions. In response, Alan Cox described how he had originally envisioned this could eventually be achieved (by fixing the lifetime issues of console devices).
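Alan Stern’s lifetime argument comes down to reference counting: if the console layer holds its own reference from registration onward, unplugging the device cannot free the structures underneath it. Here is a minimal Python sketch of that idea. The class and function names are hypothetical, and it models the concept rather than the kernel’s tty or console code.

```python
class TtyDevice:
    """Hypothetical reference-counted device, in the spirit of the kernel's kref."""

    def __init__(self, name):
        self.name = name      # e.g. "ttyUSB0"
        self.refcount = 1     # reference held by the driver/bus
        self.freed = False

    def get(self):
        assert not self.freed, "use after free"
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # underlying structures would be released here


def register_console(dev):
    # Take a reference as soon as the device becomes a console.
    return dev.get()


def unplug(dev):
    # Removing the USB device drops only the driver's reference.
    dev.put()
```

In this model the device survives removal while the console still holds it, and only the console’s final `put()` frees it.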

IO bandwidth throttling. Nauman Rafique posted to let Vivek Goyal (and everyone else) know that he and others had discussed Vivek’s IO controller patches with Jens Axboe at this year’s Linux Plumbers Conference. In particular, they had discussed Jens’ concerns about the size of the patches, and the soft requirement that the patches work with all existing IO schedulers. Nauman proposed merging pieces of the existing patches incrementally. Vivek is on board with supporting only CFQ initially if needed, but wonders whether that will really fly; in other words, whether the implementation shouldn’t instead live at a level above the IO scheduler (as favored by Andrew Morton). Jens Axboe concurred that he was not in favor of directly expanding the CFQ IO scheduler for this support because “some enterprise storages are better performed with NOOP rather than CFQ, and I think bandwidth control is needed much more for such storage system”. Jens suggests making the throttling policy user selectable at a higher layer, much like IO scheduler selection today.
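For reference, the per-device IO scheduler selection that this is compared to is exposed in sysfs as /sys/block/<dev>/queue/scheduler. That file lists the available schedulers and shows the active one in square brackets. The sketch below parses that format. The sample string and the default device name "sda" are illustrative assumptions, not taken from the thread.

```python
import re


def parse_scheduler_line(line):
    """Parse the contents of /sys/block/<dev>/queue/scheduler.

    Returns (available, active), where the active scheduler is the entry
    shown in square brackets, e.g. "noop anticipatory deadline [cfq]".
    """
    available, active = [], None
    for token in line.split():
        m = re.fullmatch(r"\[(\S+)\]", token)
        if m:
            active = m.group(1)
            available.append(active)
        else:
            available.append(token)
    return available, active


def read_scheduler(dev="sda"):
    # Reading needs no privileges; writing a scheduler name to this file
    # (as root) switches the scheduler for that device at runtime.
    with open(f"/sys/block/{dev}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```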

VMI. Discussion continued on the fate of VMware’s VMI support in the kernel. VMI is a paravirtualization optimization that is no longer needed for performance reasons and will not be supported in future VMware product releases. But, as H. Peter Anvin, Gerd Hoffmann, and others had noted, many people are still using VMI on existing systems, so the code should not be “zapped” quite so quickly. Instead, Alok Kataria (of VMware itself) suggested noting the intention to remove support in the feature-removal file. He also recommended that all distributions disable VMI going forward so that seamless “Live Migration” works for their customers and users. Arjan van de Ven took the opportunity to note that vendors have a habit of ignoring the “default” Kconfig options. VMware therefore should not assume that an option disabled by default in Kconfig would automatically be picked up by Linux distros. Chris Wright liked the deprecation idea and suggested a runtime warning.

In today’s pull requests: some percpu fixes from Tejun Heo, some DRM updates from Dave Airlie (including support for video= KMS mode setting on the kernel command line, and sent twice because the first attempt was missing a subject), and some networking fixes from David Miller.

In today’s miscellaneous items: a note from Kamezawa Hiroyuki that the vmalloc figures in /proc/meminfo are correct (they simply refer to the theoretical address space of very many terabytes), another “lumpy” page writeback patch from Fengguang Wu, a trivial Intel TXT bug warning fix from Shane Wang, a series of hotplug and TSC cleanup patches from Zachary Amsden for dynamic structure allocation and removal at KVM module load and unload, some RCU patches from Paul E. McKenney simplifying rcu_barrier() with the goal of ensuring that offlined CPUs never have RCU callbacks queued, a vsprintf format string option from Joe Perches for pretty-printing UUID and GUID values, the latest version of Andy Spencer’s “permission masking security module” (formerly known as “dpriv”), a fix from Frederic Weisbecker for a mutex locking problem in a previous BKL (Big Kernel Lock) removal patch, some fatfs-2.6 patches resent by Ogawa Hirofumi (who also replied to the “Simon and Garfunkel” corruption reported previously), some CPU affinity problems with KVM reported by Haneef Syed, some SCSI header cleanup patches from Michael S. Tsirkin, and a suggestion from Arjan van de Ven that GFP_NOWAIT allocations are what one poster wanted. That poster, who seemed to be partially re-implementing “perf” without knowing it existed, had asked for ZONE_NORMAL allocations that might not be satisfiable immediately, but without using the GFP_ATOMIC flag, which can dip into and exhaust the system’s emergency pools and should generally not be used for large allocations. Also: a NULL pointer fix in the swap core from Suresh Jayaraman, some NOHZ performance optimization patches from Martin Schwidefsky, some patches implementing full NAT support for IPVS from Hannes Eder, general discussion of supporting multiple simultaneous ports in virtio_console, some connector permission checking patches from Philipp Reisner, ongoing discussion of Taro Okumichi’s dynamic kernel source browser, and a question from Robert P. J. Day as to whether he should extend his scripts that look for broken Kconfig references to also detect and report unreferenced header files throughout the kernel tree.
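A pretty-printer that handles both UUIDs and GUIDs has to deal with byte order. The same 16 bytes print differently depending on whether the first three fields are read big-endian (RFC 4122 UUID) or little-endian (Microsoft-style GUID). The illustration below uses Python’s standard uuid module, not the kernel’s vsprintf code.

```python
import uuid

raw = bytes(range(16))  # 00 01 02 ... 0f, in storage order

# Big-endian (RFC 4122) interpretation: bytes printed in storage order.
as_uuid = uuid.UUID(bytes=raw)

# Little-endian GUID interpretation: the first three fields
# (4, 2 and 2 bytes) are byte-swapped before printing.
as_guid = uuid.UUID(bytes_le=raw)

print(as_uuid)  # 00010203-0405-0607-0809-0a0b0c0d0e0f
print(as_guid)  # 03020100-0504-0706-0809-0a0b0c0d0e0f
```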

Finally today, a detailed analysis from Ted Ts’o of a hard lockup and ext4 corruption, along with Ted’s sympathy for the experience (though of course he was in no way responsible for it). From the sound of it, Ted suspects the hardware of literally writing to the wrong location on disk. According to the original poster, Andy Isaacson, it was a USB drive of some kind; Andy posted some dumpe2fs output online for Ted (and anyone else) to look at.

In today’s announcements: Userspace RCU. Mathieu Desnoyers followed up on earlier discussion of the feasibility of a userspace RCU implementation (for doing RCU in application-level code) with an implementation called “urcu”. He posted it on the LTTng website and says version 0.1 should work on both x86 32/64 and PowerPC. More information is at http://lttng.org/urcu.

The latest kernel release was 2.6.32-rc1 (remember that EXTRAVERSION unintentionally got set to “rc2”).

Eric Dumazet tracked down a problem in the cmpxchg() function, which doesn’t handle 64-bit values on X86_32 and doesn’t generate an error when asked to. He suggests either replacing a use of cmpxchg() in a problem-triggering patch from Peter Zijlstra with cmpxchg64(), fixing cmpxchg() to handle 64-bit values, or reverting Peter’s patch. Linus noted the potential for “very nasty silent failure” and really wanted cmpxchg() itself fixed, even if the fix just generates a warning or a link-time failure when it is used with 64-bit types. Arjan van de Ven suggested using the alternatives() implementation to patch the code according to the type of CPU found at runtime, for which Linus posted an “untested” patch.
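The difference between a silent failure and a loud one is easiest to see in a toy model. The Python sketch below is hypothetical and is not the kernel’s x86 cmpxchg code. It assumes a backend that supports only 1-, 2- and 4-byte operands, and it contrasts a dispatcher that quietly does nothing for 8-byte operands with one that refuses outright. Refusing outright is roughly the behaviour Linus asked for, although the kernel would do it at build or link time.

```python
SUPPORTED_SIZES = {1, 2, 4}  # assumption: a 32-bit backend without 8-byte support


class Memory:
    """A single memory location holding an integer."""

    def __init__(self, value):
        self.value = value


def _exchange(mem, old, new):
    # Compare-and-exchange: store `new` only if the current value is `old`,
    # and always return the previous value.
    prev = mem.value
    if prev == old:
        mem.value = new
    return prev


def cmpxchg_silent(mem, old, new, size):
    if size not in SUPPORTED_SIZES:
        # Nothing is written, yet returning `old` makes the usual
        # "if cmpxchg(...) == old" success check pass.
        return old
    return _exchange(mem, old, new)


def cmpxchg_strict(mem, old, new, size):
    if size not in SUPPORTED_SIZES:
        raise TypeError(f"cmpxchg: unsupported operand size {size}")
    return _exchange(mem, old, new)
```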

Stephen Rothwell posted a linux-next tree for September 29th. Since Monday, the linux-next fixes tree still contains a powerpc/kvm fix, there is still a reverted scsi commit causing a build failure and there is still a removed patch that had caused various includes of autoconf.h to go away and break various people’s builds. So, very similar to the previous day’s tree. The total subtree count remained steady at 139 trees in Tuesday’s compose.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/28 Linux Kernel Podcast

October 12th, 2009 jcm

Audio: COMING SOON

For Monday, September 28th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: External process limits, page writeback, and QUEUE_FLAG_VIRT.

External process limits. Neil Horman posted a patch implementing a new procfs interface in /proc/pid/limits, allowing process limits to be set from outside of a running process. As Neil describes, modern tasks (also known as processes outside the kernel) can be long lived and it would be beneficial to be able to set limits without having to kill a task and restart it, just to set a limit. The new interface takes a simple format that can be written using the command line: “ “.
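Neil’s point is that the standard interfaces only let a process adjust its own limits. For comparison, here is what that looks like with Python’s standard resource module, which wraps getrlimit()/setrlimit() and always acts on the calling process. The write format of the proposed /proc/pid/limits interface is not reproduced here. This is a Linux-only sketch.

```python
import resource


def lower_open_file_limit(new_soft):
    """Lower this process's soft RLIMIT_NOFILE; return the previous (soft, hard).

    Lowering a soft limit is always permitted, and raising it back is allowed
    up to the hard limit. Only the calling process is affected, which is the
    restriction an external interface would remove.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, soft)  # never try to raise the limit here
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return soft, hard
```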

Page writeback. Discussion continued on best practices for page writeback, that is, the handling of pages of “dirty” memory containing data that needs to be committed back to disk. Specifically, the discussion centered on what priority “background” writes (the periodic writeback of dirty pages) should receive, and how they should be treated compared with synchronous writes explicitly requested by a process calling fsync. Fengguang Wu noted that, as things stand, the kernel makes no such distinction for synchronous writes, and the code path still goes through balance_dirty_pages_ratelimited. Dave Chinner had advocated handling the two cases differently, by having the VFS layer stop “throw[ing]” away the indication of whether an operation is synchronous or asynchronous in nature.

This whole debate of course also relates to the wider ongoing issue of IO bandwidth management and the various patches proposed there. On that note, Ryo Tsuruta and Vivek Goyal continue their back-and-forth on the relative merits of their differing approaches, including debate over fairness on rotational vs. SSD media. One set of patches (proposed by Corrado Zoccolo) would split IO requests into three queues (sync sequential, sync seeky, and async), which would then be serviced in turn, round-robin fashion.
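The round-robin scheme in Corrado’s proposal can be modelled in a few lines. This Python sketch uses invented queue names and a one-request-per-turn quantum. Both are simplifying assumptions: a real scheduler would use time slices. It is not Corrado’s CFQ code.

```python
from collections import deque


def round_robin_dispatch(queues, order=("sync_seq", "sync_seeky", "async")):
    """Dispatch one request at a time from each non-empty queue in turn.

    `queues` maps a queue name to an iterable of requests; empty queues are
    simply skipped until all queues are drained.
    """
    pending = {name: deque(queues.get(name, ())) for name in order}
    dispatched = []
    while any(pending.values()):
        for name in order:
            if pending[name]:
                dispatched.append(pending[name].popleft())
    return dispatched
```

For example, with three sequential requests, one seeky request and two async requests, the seeky request is serviced in the first round instead of waiting behind the whole sequential stream.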

QUEUE_FLAG_VIRT. Rusty Russell requested the removal of the QUEUE_FLAG_VIRT patches. Within the context of virtio_blk, these had been intended to cause the corresponding IO request queue to be unplugged immediately, and they were resulting in a lot of overhead on virtualized platforms such as KVM. Although Rusty could not reproduce the “extreme regressions” seen by some Fedora users, he nonetheless wanted to remove the patches until the virtio-blk overhead is low enough for this flag to make sense. At that point, it would be based on a feature flag provided by the host kernel rather than being automatic.
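Counting notifications shows why immediate unplugging can be expensive under virtualization. Below is a deliberately simplified Python model with hypothetical names. It assumes one host notification per unplug and ignores request merging and latency.

```python
class RequestQueue:
    """Toy model of block-layer plugging (hypothetical, not kernel code).

    Each call to _kick() stands in for one notification to the host,
    which is assumed to be the expensive operation under virtualization.
    """

    def __init__(self, unplug_immediately):
        self.unplug_immediately = unplug_immediately
        self.plugged = []
        self.completed = []
        self.kicks = 0

    def submit(self, request):
        self.plugged.append(request)
        if self.unplug_immediately:
            self.unplug()

    def unplug(self):
        if self.plugged:
            self.completed.extend(self.plugged)  # hand the batch over at once
            self.plugged.clear()
            self._kick()

    def _kick(self):
        self.kicks += 1
```

Plugging trades a little latency for fewer, larger batches. Whether that wins depends on how expensive each notification is, which is why a host-provided feature flag is a natural place to make the choice.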

In today’s pull requests: some parisc updates from Kyle McMartin, some wireless fixes from John Linville, and some PM fixes for 2.6.32 from Rafael J. Wysocki.

In today’s miscellaneous items: a patch from Dimitri Sivanich implementing support for limiting IRQ affinity to specific CPU domains, and a patch from Huang Ying differentiating fake “injector” Machine Check Exceptions from the real deal so as not to confuse the handler when a real MCE occurs (which inspired Hidetoshi Seto to post a 5-part patch series based upon it). There was also a discussion from Fengguang Wu and Andi Kleen of problems with non-atomic page flag modification in the HWPOISON (and other) patches, ongoing discussion of how to handle the lack of additional bits in vm_area_struct, a patch from John Kacur moving the common histogram functions of the “perf” utility into their own file, some permission elevation issues with the O_NODE open flag pointed out by Jamie Lokier, a patch from Frederic Weisbecker pushing the blk tracepoint calls further down the stack (to avoid ugly locking issues), a trivial percpu patch from Tejun Heo, and the “final scan results” from Robert P. J. Day showing bad Kconfig entries that select non-existent variables. Martin Schwidefsky confirmed that the offending “sched_clock: Make it NMI safe” commit reported by Arjan van de Ven was causing problems on some x86_32 systems. Rounding things out: some conspiracy theories about subverting Intel’s TXT (Trusted Execution Technology) by pulling liquid-nitrogen-cooled RAM sticks from a running system after S3 suspend and recovering their contents, and a patch from Randy Dunlap raising the recommended patch size limit in the SubmittingPatches documentation from its value of 40 kB (“so last millennium”) to the more respectable 300 kB we tend to see today.
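The concern with non-atomic page flag modification is the classic lost update. Below is a deterministic Python replay of one bad interleaving. The flag names are hypothetical and this is not the kernel’s page-flags code; the interleaving is forced rather than left to chance.

```python
PG_A = 1 << 0  # hypothetical flag bits
PG_B = 1 << 1


def nonatomic_interleaved(flags):
    # Two CPUs each set a different bit with a plain read-modify-write.
    cpu0 = flags          # CPU 0 reads the flags word
    cpu1 = flags          # CPU 1 reads the same old value
    cpu0 |= PG_A          # CPU 0 sets its bit in a private copy
    cpu1 |= PG_B          # CPU 1 sets its bit in a private copy
    flags = cpu0          # CPU 0 writes back
    flags = cpu1          # CPU 1 writes back, silently discarding PG_A
    return flags


def atomic_sequence(flags):
    # With atomic set_bit()-style updates each read-modify-write is
    # indivisible, so the two updates are serialized and both survive.
    flags |= PG_A
    flags |= PG_B
    return flags
```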

Finally today, Barry Song posted a patch intended to help address the Y2K38 problem: the 32-bit UNIX time_t overflows in January 2038 (and the entire world will end, at least in the eyes of the media; can you imagine what CNN and Faux News types would do with 3-D holographic TVs showing red neon effects with dire warnings of impending doom in 2038?).
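The arithmetic behind the 2038 date is easy to check. A signed 32-bit time_t tops out at 2^31 - 1 seconds after the 1970 epoch, and one second later it wraps around to -2^31. The short Python check below emulates 32-bit wraparound with plain integers rather than using a real 32-bit time_t.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(n):
    """Interpret n as a signed 32-bit integer (two's complement wraparound)."""
    return (n + 2**31) % 2**32 - 2**31


last_ok = EPOCH + timedelta(seconds=INT32_MAX)
wrapped = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_ok)  # 2038-01-19 03:14:07+00:00
print(wrapped)  # 1901-12-13 20:45:52+00:00
```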

In today’s announcements: Taro Okumichi announced that he had written a “gcc-tracer” and “html-formatter” that could be used to browse kernel source code (currently only init/main.c). I haven’t looked to see how this differs from what LXR has been able to do forever.

The latest kernel release was 2.6.32-rc1. As mentioned by a number of people, Linus had accidentally set the EXTRAVERSION in the kernel Makefile to “-rc2”. In his typical self-deprecating manner, Linus referred to himself as a “moron” and said he’d “try not to do that again”. But he also noted that the git tags were actually correct (so the history shouldn’t be too confusing). For his part, Stephen Rothwell noted that he wouldn’t be applying a patch to linux-next to set it to -rc1, in order to ensure all bug reports are “consistently confusing :-) ”.

Frans Pop noted some weird vmalloc numbers in /proc/meminfo and asked: “is it me or are VmallocTotal and VmallocChunk off by a factor of 10,000 or so?”.

Eric Dumazet noted some very unusual process time accounting behavior in 2.6.32-rc1 and posted the source of a reproducer program. Linus pointed the finger at some of the usual suspects, noting that overall process times were accurate but that somewhere along the line the per-task stats were not.

Michael Tokarev reported some issues on Pentium III systems running 2.6.31. He found that a “real PIII” machine had no problems booting, whereas a machine with an Intel PIII Celeron would hang on boot “pretty consistent[ly]”. He noted that the CPU flags differed between the two: the Celeron did not list the “apic” flag, even though APIC support was enabled in his config.

Pavel Machek reported a problem with the behavior of CROSS_COMPILE, especially when using ccache. The resulting error message referred to running “make mrproper” (which was itself broken) and contained a typo. “Ouch”, indeed.

Stephen Rothwell posted a linux-next tree for September 28th. Since Sunday, the fixes tree for linux-next had a build fix for powerpc/kvm, a scsi commit causing boot failures got reverted and a patch removing various includes of autoconf.h that caused subsequent build failures was also removed. The total subtree count remained steady at 139 trees in the Monday compose.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags: