Archive for September, 2009

2009/09/15 Linux Kernel Podcast

September 26th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090915.mp3

NOTE: We’re 10 days behind, but you’ll notice we’re moving several days forward at a time, and so should be up to date early next week. Patience really is a virtue! (you’re welcome to volunteer to help me out!). I will record and upload the missing audio versions on my day off on Monday, as I prepare to return from the Linux Plumbers Conference.

For Tuesday, September 15th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Mudflap, POSIX O_SYNC vs. O_DSYNC, and speeding up kernel compiles.

Mudflap. Janboe Ye posted version 3 of the “mudflap” patch series. This is a patch series intended to catch use-after-free type situations (in previously SLAB allocated memory regions) by taking advantage of the gcc “mudflap” code generation feature to catch certain kinds of memory access (using hardware breakpoints, and other underlying hardware features) and verify that they are not attempting to access data that has previously been explicitly freed. The latest version includes further efforts at architecture independence following previous comments from Pekka J. Enberg on the previous iterations.

POSIX O_SYNC and O_DSYNC semantics. Christoph Hellwig posted the latest iteration of a patch series implementing full O_SYNC support (as opposed to the existing behavior within the kernel in which O_SYNC requests are actually implemented in the form of O_DSYNC semantics). This patch series relies upon previous work done by Jan Kara and has been discussed several times before.

Speeding up kernel compiles. Ozan Caglayan posted a question concerning preferred methods for speeding up kernel compiles. He suggested various methods that one might use (building inside a RAM-backed tmpfs, using ccache, switching to a performance power management governor, passing -j to make, and so forth), but wondered out loud what others are doing. Specifically, Ozan would love to know if others are really using icecream or distcc for their daily test builds.

In today’s pull requests: some security and credential fixes (including various KEYS fixes from David Howells) from James Morris, some percpu fixes for 2.6.32 from Tejun Heo (including sparse congruent allocation in the kernel’s vmalloc area, and a note that all arches other than IA64 are now using the new percpu data allocator introduced previously), some IDE fixes from David Miller, some UWB fixes from David Vrabel, an official request to merge the new DRBD distributed replicated block device patches from Philipp Reisner, round 1 of some hwmon updates for 2.6.32 from Jean Delvare, some tracing fixes from Steven Rostedt, a large number of staging patches from Greg Kroah-Hartman (who recently posted over 700 patches to his list), some SLAB fixes from Pekka J. Enberg, some

In today’s miscellaneous items: some ftrace documentation updates from Mike Frysinger (Steven Rostedt also posted some documentation updates), the addition of histograms showing potential and effective wakeup latencies (especially in the preempt-rt kernel tree) from Carsten Emde, some further /proc/kmem cleanups (and some HWPOISON bits also) from Fengguang Wu, version 2 of the previously covered cpuidle menu governor performance optimization patches from Arjan van de Ven, a patch enhancing support for catching unsupported relocations when using CONFIG_RELOCATABLE on PowerPC 64-bit systems from Ben Herrenschmidt, a trivial performance counters fix (for a buffer overflow in perf_copy_attr) from Xiao Guangrong, a patch adding further use of unreachable() to the MIPS architecture from Ralf Baechle, a tracing patch to set_pid_ftrace from Jiri Olsa adding the ability for ftrace to simultaneously trace multiple independent processes, some ftrace and systemtap integration patches from Atsushi Tsuji, version 3 of a patch series implementing support for choosing which power state a POWER (pseries) CPU will enter when going offline (affording greater flexibility in allowing a CPU to remain assigned to a particular LPAR or be returned to the pool), version 3 of the post-merge per-bdi writeback patches from Jens Axboe (to which Nick Piggin followed up with a 5-part patch series of fixes), and the latest iteration of an RFC patch series from Corrado Zoccolo implementing scalable CFQ IO slice sizing proportional to the number of processes performing IO operations.

In today’s announcements: Linux 2.6.27.34 and Linux 2.6.30.7. Greg Kroah-Hartman announced the latest round of stable kernel releases – 2.6.27.34 and 2.6.30.7 – and encouraged existing users of these kernels to upgrade. He has also recently begun contributing to an LWN column entitled “ask a kernel hacker” in which he talks about the process for stable kernel releases.

2.6.31-rt10. Thomas Gleixner announced version 2.6.31-rt10 of the preempt-rt Real Time patches. The latest version updates to 2.6.31, includes timekeeping and locking fixes, a KVM fix, tracing, performance counters fixes, and more. There are several known issues, including a scheduler load balancing problem that Peter Zijlstra has been looking at for a while. Thomas later followed up to note that, as mentioned elsewhere, latency histograms are back!

The latest kernel release was 2.6.31.

Andrew Morton posted some 2.6.32 merge plans for -mm, noting that there is an unusually large number of actual memory management patches in the tree.

Chuck Ebbert reported a regression in 2.6.30. In this kernel (and presumably also in more recent kernels, including the recently released 2.6.31) binfmt_misc is taking precedence over binfmt_script as a binary format handler when a miscellaneous handler (and pattern) are defined that should otherwise be caught and handled by the generic script handler.

Ingo Molnar reported that the previous regression involving SLAB corruption was more repeatable than once in 1000 reboots. It turns out that he had another machine that managed to fall over, also in bdi_alloc_work, during an overnight test run. Ingo goes through a process of elimination (non-SMP, different distros, different hardware, etc.) and concludes that this is a BDI bug, although he doesn’t have much else other than logs to go on.

Tobias Oetiker draws attention to “unfair IO behavior” for high load interactive use in kernel 2.6.31. He points out that his use case is different from similar reports, since he has a busy NFS server with a lot of processes accessing many small and medium sized files for reading and writing (as opposed to the case of, for example, several concurrent dd processes performing read and write operations). Tobias notes that iostat reports “huge” wMB/s and “ridiculously low” rMB/s.

Stephen Rothwell posted a linux-next tree for September 15th. Since Monday the sh, rr, and drbd trees gained issues, while the net, input, block, slab, and block trees lost their issues. The total sub-tree count remains steady at 140 trees in the latest linux-next compose.

Stephen repeated previous warnings that code intended to be merged into 2.6.33 should not be pushed for linux-next inclusion until after 2.6.32-rc1 has been released. He also noted that conflicts continue to bounce between trees as Linus continues to perform various merging activity.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/14 Linux Kernel Podcast

September 24th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090914.mp3

For Monday, September 14th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFS, Fanotify, Huge pages for device drivers, and kthreads.

CFS vs. BFS. Nikos Chantziaras announced that Phoronix has published some comparison benchmarks between the current mainline CFS task scheduler, and Con Kolivas’ recently announced “BFS” task scheduler. The benchmarks aim to cover “real-life” applications, such as timed compilations, a game, email benchmarks, and so forth. The results speak well for BFS, but this is not a surprise since the tests were performed on an (Ubuntu) desktop system with a hardware configuration friendly toward the design goals for BFS. Without meaning to imply any bias, it is this author’s opinion that such tests should be expanded to cover much larger systems before drawing generalizable conclusions. Having said that, BFS clearly has desktop uses.

Fanotify. Eric Paris wondered aloud whether he should migrate the fanotify implementation away from its current combination of socket, bind, and setsockopt calls over to set-in-stone system calls. A total of 9 new system calls would be required, and Eric obviously expresses hesitation toward this approach, but he is seeking a “ruling from on high” as to the best direction here.

Huge pages for device drivers. Alexey Korolev posted version 3 of a three part patch series aimed at implementing support for huge TLB mappings for the sharing of data between device drivers and userspace. This is especially useful when a large amount of data needs to be shared, as is the case with (for example) video capture, frame buffer, and other devices. The patches are layered upon Eric Munson’s earlier ANON_HUGETLB patchset. Alexey also posted an example HugeTLB driver that makes use of the latest patch series.

Kthreads. Pavel Vasilyev sent a complaint that the calls to set_user_nice configuring all kernel threads to run with a -5 nice level by default in kthread.c had been removed by Ingo Molnar. In fact, they had been “removed” by Mike Galbraith in the course of another patch, since he felt that most kthreads didn’t use enough CPU for their weighting to really be an issue in getting their work done. This was disputed by others, including Chris Friesen. Pavel posted some benchmarks, and the discussion headed toward deciding default priorities for certain kernel threads in particular, such as ksoftirqd.

In today’s pull requests: three viafb trees from Jon Corbet, version 4 of the S+Core architecture support patches from Liqin Chen (based upon work he has been doing with Arnd Bergmann and based against 2.6.31-rc7), some credential and SELinux fixes (originally posted by Eric Paris) from James Morris, some input updates for 2.6.32 from Dmitry Torokhov, a merge plan for sh support in 2.6.32 from Paul Mundt (who says, “better late than never..”) (including support for various new boards and processors, runtime PM support, and a “shiny” new DWARF unwinder), some GFS2 updates from Steven Whitehouse, some nilfs2 updates from Ryusuke Konishi, a simple ring buffer PowerPC build failure fix from Steven Rostedt, some DLM updates for 2.6.32 from David Teigland, some O_SYNC cleanups from Jan Kara, some tracing fixes from Steven Rostedt, part 1 of a number of AMD64 EDAC updates from Borislav Petkov, some SLAB allocator “fixes and cleanups” from Pekka J Enberg, some UDF fixes from Jan Kara, some KVM updates for 2.6.32-rc1 from Avi Kivity (simply another iteration of the previous patches, also including a reminder that Marcelo Tosatti is joining Avi as co-maintainer for KVM), some PM updates for 2.6.32 from Rafael J. Wysocki, a number of updates to Blackfin architecture support for 2.6.32 from Mike Frysinger, some x86/txt (Intel “Trusted eXecution Technology”) patches from Peter Anvin, and another round of security credential fixes from James Morris (including several fixes to the KEYS support that have been posted recently by James Morris).

In today’s miscellaneous items: some fixes to /dev/mem supporting partial read and write operations from Fengguang Wu, a patch to incorporate the currently set ARCH environment variable in the deb-pkg build target by Wei Chong Tan, some tracing fixes from Li Zefan, a macro fix from Ferenc Wagner correcting problems experienced by macro users of the container_of macro, version 2 of the post merge per-bdi writeback patches from Jens Axboe (the aforementioned per-bdi writeback flusher patches having now made their way into mainline – separately Jens wondered what bdev->bd_inode_backing_dev_info was ever intended to be used for), signal tracing in ftrace from Jiri Olsa, the latest version of the IO bandwidth controller and BIO tracking patches (dm-ioband and blkio-cgroup) from Ryo Tsuruta, a number of kbuild updates from Andi Kleen, version 2 of the __builtin_unreachable patches from David Daney (who seems to be in contact with Roland McGrath over Roland’s alternative implementation that only supported the x86 case), version 5 of the reflink system call from Joel Becker (allowing copy-on-write reference-counted links to files), latency tracing histogram support (targeting the preempt-rt kernel patches) for displaying potential and effective wakeup latencies from Carsten Emde, and some further /proc/kmem patches from Fengguang Wu.
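
For listeners unfamiliar with container_of, here is a simplified userspace rendition of the idea: given a pointer to a member embedded in a structure, subtract the member's offset to recover a pointer to the enclosing structure. This is only a sketch and says nothing about what Ferenc's fix actually changed; the kernel's real macro performs extra type checking, and `struct task`, `struct list_node`, and `task_from_node` are invented for the example.

```c
#include <stddef.h>

/* Simplified version; the in-kernel macro also type-checks `ptr`. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical embedded list node, loosely modelled on kernel list usage. */
struct list_node {
    struct list_node *next;
};

/* Hypothetical structure that embeds a list node as a member. */
struct task {
    int id;
    struct list_node node;
};

/* Recover the enclosing struct task from a pointer to its embedded node. */
struct task *task_from_node(struct list_node *n)
{
    return container_of(n, struct task, node);
}
```

Code that walks a list of `struct list_node` pointers can call `task_from_node()` on each node to get back to the object that owns it, which is the pattern the kernel's linked lists rely upon.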

Finally today, Nicolas Pitre noted that he has a new email address. This is due to some “problems at cam.org”. His new address is nico@fluxnic.net.

In today’s announcements: The IO Controller Mini-Summit 2009. Ryo Tsuruta announced that there will be a dialin number for those wishing to participate in the IO Controller Mini-Summit, running concurrently with this year’s kernel summit in Tokyo, Japan, on October 17th. Mike Snitzer was one of many who wondered (but in this case, aloud, on LKML itself) why discussion of the various IO controller proposals had to wait until after the 2.6.32 merge window closed, was taking place off LKML, and was happening at a venue where limited participants will be able to attend to offer their input.

The latest kernel release was 2.6.31.

Andrew Morton posted an mm-of-the-moment for 2009-09-14-01-57.

Ingo Molnar found a SLAB corruption bug being reported by kmalloc that happens after about “1000+ successful random bootups” and so is “not bisectable at all”. He believes that it may be related to ongoing security related SLAB troubles, although SELinux doesn’t appear to be directly at fault. Stephan von Krawczynski reported an issue with the 2.6.31 IPv4 implementation as compared with 2.6.30.X kernels when configuring a particular vlan setup.

Stephen Rothwell posted a linux-next tree for September 14th. Since Friday, the sparc, nfsd, security, tip, and oprofile trees lost their issues, while the rr tree still has a build failure and the block tree gained another. Stephen repeats his standard warning that developers not post patches intended for 2.6.33 until 2.6.32-rc1 has been released, and also notes that a number of build failures are currently bouncing from one tree to another.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/13 Linux Kernel Podcast

September 24th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090913.mp3

I’m at the Linux Plumbers Conference, uploading a few updates between tracks. We’ll get there – sorry I’m behind again, but this isn’t an easy task!

For the weekend of September 13th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Cpuidle, Dynamic Logical Partitioning, Localmodconfig, RO/NX protection for loadable kernel modules, and Timechart.

Cpuidle. Arjan van de Ven posted a patch implementing a new variant of the menu cpuidle governor to boost IO performance, “[balancing] power savings, energy efficiency and performance impact.” The raison d’etre for this patch series stems from poor performance of the existing cpuidle governor code on recent Intel Nehalem server systems. Arjan includes some benchmarks, showing as much as a 1.8 times performance improvement over the existing code, with only a very minimal overhead as compared with running a system without dynamically changing power C-states.

Dynamic Logical Partitioning (DLPAR). Nathan Fontenot posted a five part patch series implementing “Dynamic Logical Partitioning” (DLPAR) support for IBM pseries PowerPC 64-bit systems supporting dynamic hardware provisioning (addition, removal, and reconfiguration of processor and memory resources via the underlying hypervisor). The patches implement a number of probe/release sysfs files on such systems that can be used to signal the addition or removal of CPU and memory resources as required.

Localmodconfig. As covered previously, Steven Rostedt has implemented a Kbuild extension providing a Kconfig build target of “localmodconfig”. This will build a kernel that includes modular support for all locally present hardware, and is useful for kernel testing (without requiring a full build with support for non-present hardware). Steven posted an official merge request for 2.6.32.

RO/NX protection for loadable kernel modules. Siarhei Liakh posted version 6 of a patch series implementing RO/NX page level protections for loadable kernel modules. This patch is a logical extension of the existing in-kernel page level protections for kernel executable code, but now adds such protections to loadable kernel modules by way of modifying the kernel module loading code.
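
The kernel-side mechanics are specific to the module loader, but the underlying idea of page level protection can be sketched in userspace with mmap and mprotect: fill a page while it is writable, then drop the write permission and never grant execute. This is an analogy only, not code from the patch series, and the helper name `make_ro_nx_page` is an assumption made for the example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogy (hypothetical helper): map one anonymous page, copy
 * `contents` into it, then make the page read-only and non-executable.
 * Returns the mapping on success or NULL on failure.
 */
void *make_ro_nx_page(const char *contents)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The page starts zero-filled, so copying at most page-1 bytes
     * always leaves the string NUL-terminated. */
    strncpy(p, contents, (size_t)page - 1);

    /* PROT_READ alone: later writes fault, and on hardware that enforces
     * no-execute, so do instruction fetches from this page. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}
```

After `make_ro_nx_page("module data")` returns, the contents can still be read, but an attempt to modify them would raise SIGSEGV, which is roughly the kind of guarantee the patches aim to give module text and read-only data.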

Timechart. Arjan van de Ven, known for his previous work using utilities such as bootchart (a tool to visualize the system boot process), announced the release of a new “timechart” utility that can be used to visualize exactly what is happening on a running system. It builds upon various timestamp counters added to the existing performance counters code, and a few other changes that Arjan has proposed to be added to the performance counters “perf” userspace utility.

In today’s pull requests: some updates to infiniband for 2.6.32 from Roland Dreier, some updates to ummunotify, which is a new character device that allows a userspace library to register for MMU notifications, also from Roland Dreier, some updates to libata-dev from Jeff Garzik (mostly trivial and driver updates), some powerpc updates for 2.6.32 from Ben Herrenschmidt (which Ben points out include a dependency upon various IOMMU and swiotlb bits currently in Ingo’s tree – so he is happy for Linus to delay pulling until after pulling from Ingo’s tree), some kmemleak patches from Catalin Marinas, some s390 patches for 2.6.31+ from Martin Schwidefsky (including “call home support”), the merge plan for linux-2.6-block bits in 2.6.32 from Jens Axboe (including updates to CFQ, cleanup of BIO IO flags, and various others), some tracing updates from Steven Rostedt, an official request to pull the per-bdi writeback flusher threads patches from Jens Axboe (including various benchmarks), some sound updates for 2.6.32 from Takashi Iwai, an official merge request for the HWPOISON patch series from Andi Kleen (these extend existing MCE handler coverage to catch transient bit errors in memory pages and take pre-emptive action by killing or otherwise signalling tasks experiencing memory corruption, in certain cases), some NFS client changes for 2.6.32 from Trond Myklebust, some OCFS2 updates for 2.6.32 from Joel Becker, some networking and SPARC updates (including initial LEON processor support from Konrad Eisele, basic software performance counters support from Jens Axboe, and very simple hardware counter support for UltraSPARC IIIi and Niagara-2) from David Miller, some KVM updates (including support for MCE injection, irqfd/ioeventfd mechanisms for communicating with guests, “unrestricted guests” on Intel systems – improving real-mode support – nested SVM improvements, syscall and sysenter emulation support, support for 1GB pages on AMD systems, and x2apic support for improved SMP performance all around) from Avi Kivity, some documentation updates from Jonathan Corbet, an initial round of SCSI updates for 2.6.32 from James Bottomley, some suspend updates from Rafael J. Wysocki, and several ftrace updates from Steven Rostedt (including a new mechanism to automate discovery of the output format being used by ftrace).

In today’s pull requests from (the village of) Ingo Molnar: core, debug, futexes, Oprofile updates for 2.6.32, some threaded IRQ updates for 2.6.32, performance counters updates for 2.6.32, scheduling updates, tracing updates for 2.6.32, x86 updates for 2.6.32, and probably a lot more.

In today’s miscellaneous items: a patch splitting out separated read and write statistics for in flight block IO (bio) requests from Nikanth Karthikesan, a fix to fanotify support in the networking tree adding defines for the fanotify socket number declarations from Eric Paris, a fix to the io controller group code “root only” group optimization from Gui Jianfeng, some /dev/mem cleanups from Fengguang Wu, version 20 of the per-bdi writeback flusher threads patch series from Jens Axboe (mostly a rebase to 2.6.31 – Jens also posted a series of patches separately that should be applied post-merge of the per-bdi writeback flusher threads patch series), a patch removing automated scaling of readahead size on md if the RAID chunk size is greater than or equal to 4MB from Fengguang Wu, some tracing cleanups from Jiri Olsa, some documentation updates from Randy Dunlap, a kernel-level configfs enabled generic target engine for Linux (allowing for persistent registrations and multipath configurations within the iSCSI stack) from Nicholas A. Bellinger (he includes links to online wiki resources providing documentation), an RFC patch series implementing a trace module extension for crash (allowing for flight-recorder style usage) from Lai Jiangshan, support for timestamps on fork events in performance counters from Arjan van de Ven (as a required dependency of his timechart tool), an update to fix a situation in which creds->security could be NULL if SELinux is disabled from Eric Paris, some tree RCU fixes from Paul E. McKenney, some trivial warning cleanups from Felipe Contreras, some RFC PCI/ACPI run-time PM patches from Rafael J. Wysocki, and some ongoing discussion of the BFS.

In today’s announcements: Linux 2.4.37.6. Willy Tarreau announced version 2.4.37.6 of the Linux kernel. This release focuses mostly on fixing the various vulnerabilities causing information leakage to userspace, and so it is obviously a good idea to upgrade if you are running a 2.4 series kernel.

Git 1.6.4.3. Junio C Hamano announced the release of version 1.6.4.3 of the GIT SCM utility used in Linux kernel development (and other projects). The latest release includes a fix for an unnecessary error message during the git clone operation of an empty directory, and a number of other fixes.

The latest kernel release was 2.6.31.

Gene Heskett reported a potential regression on 2.6.31, which occurs when enabling the latency tester, and manifests in the form of zeroed CPU usage statistics reported in the gkrellm utility. Tetsuo Handa noticed a potential memory leak in load_module that was being reported by kmemleak. Parag Warudkar noted that the boot time trace testing recently increased from 3 seconds to 41 seconds and seems to be performing some tests three times.

Stephen Rothwell posted a linux-next tree for September 11th. Since Thursday, the rr and block trees have build failures, while the scsi-post-merge tree lost its build failure. Stephen reminds everyone that they should not begin pushing patches intended for 2.6.33 into linux-next until at least 2.6.32-rc1 has been released. The total sub-tree count remains steady today at 140 trees in the latest linux-next compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

Podcasting on the train

September 18th, 2009 jcm No comments

I am headed from Boston, MA to Portland, OR for Linux Plumbers Conference, on the train, following the Lewis and Clark trail and reading their journals along the way. Looking forward to meeting more victims of my summaries next week :)

I have a cache of LKML traffic that I will write summaries for along the way, but at the moment I have only a non-tetherable iPhone and cannot upload until Monday. But on the plus side, the view is beautiful. Have a great weekend!

Jon.

Categories: Uncategorized Tags:

LKML Podcast Update – 2009/09/17

September 17th, 2009 jcm No comments

There have been a number of DoS attacks taking place against the server hosting this site, which recently resulted in a spate of OOM conditions on the virtual machine. Today’s OOM killing of mysql resulted in a small database corruption that is now corrected. Those visiting the site earlier would have seen a wordpress configuration webpage, caused by the “wp_options” table being corrupt.

Sorry for the disruption, and keep listening!

Jon.

Categories: Uncategorized Tags:

2009/09/10 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090910.mp3

For Thursday, September 10th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: BFS, Checkpoint and restart, MMAP and Performance Counters.

BFS. Jens Axboe posted a link to his new “latt” tool that he has been using to perform some scheduling latency benchmarks and comparisons between BFS and the mainline scheduler, since it was of interest to a number of folks. He has since converted the link to a file explaining where to find the new git tree containing the source, which is not on the standard kernel.org website. On the subject of BFS, Ingo Molnar posted another round of scheduler comparison benchmarks entitled bfs-vs-tip-oltp-v2 in which he thanked Con Kolivas for providing incentive to examine scheduler latencies once again, but noted that Con’s alternative BFS “isn’t particularly strong in this graph” either.

Apologies to those who disliked my previous BFS commentary. No source of information is completely unbiased, and I do feel it is completely appropriate to discuss any potential performance issues without restraint. However, I do not want to offend anyone too much in the process.

Checkpoint and restart. Sukadev Bhattiprolu posted an RFC patch series with an updated version of his new clone_with_pids system call. This is used in the latest incarnation of the checkpoint and restart patches to re-create tasks within a given namespace using the same process IDs as were in use prior to taking a checkpoint. Obviously, such support is a precursor to tasks being restarted without explicitly supporting a change in process descriptor ID.

MMAP. Brian McGrew posted asking a question about creating large shared page mappings and the overhead incurred in doing so. He is replacing previous use of physical mapped memory (this is presumably involving an embedded device) with a form of software emulation in which many tasks will share the same direct physical pages via mmap. He finds that creating 4MB, 16MB, 64MB or even 256MB mappings is fine, but allocating 1GB introduces huge overhead. It is very likely (in my opinion) that he is on a 32-bit system and isn’t locking every page using an mlock, and a few other things. But perhaps this is some other issue that is worth looking into.
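
To make the speculation above concrete, here is a minimal sketch of creating a shared mapping and then attempting to pin it with mlock, so that every page is faulted in up front rather than on first touch. Brian's actual code was not posted in a form I can reproduce, so the anonymous mapping, the helper name `map_shared_locked`, and its `locked` out-parameter are all assumptions for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create a shared anonymous mapping of `len` bytes
 * (visible to children created by a later fork) and try to lock it into RAM.
 * On return, *locked is 1 if mlock succeeded and 0 otherwise.
 * Returns the mapping, or NULL if mmap itself failed.
 */
void *map_shared_locked(size_t len, int *locked)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /*
     * mlock faults in and pins every page of the range. It can fail (for
     * example when the range exceeds RLIMIT_MEMLOCK), in which case the
     * mapping is still usable and pages are faulted in lazily instead.
     */
    *locked = (mlock(p, len) == 0);
    return p;
}
```

Timing `map_shared_locked()` for 256MB and again for 1GB, with and without the mlock succeeding, would be one way to see where the overhead described above is coming from.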

Performance counters. Masami Hiramatsu posted some updates to the kprobes based event tracer which will allow users to add trace events dynamically on ftrace and use those events with the new performance counters “perf” tools. This patch series continues the trend toward turning perf into a swiss army knife of Linux kernel interaction – and who knows where it might end. We had another such example also from Frederic Weisbecker, who posted an RFC patch series implementing hardware breakpoints on top of performance counters.

In today’s miscellaneous items: some tracing and ring buffer updates for 2.6.32 from Steven Rostedt, some trace filters updates from Tom Zanussi, an Android build fix from Kosaki Motohiro, some gconfig build updates disabling “typeahead find” search in treeview from Diego Eli Petteno, an update on GFS2 from Steven Whitehouse (in which he essentially says the tree will be as it is now unless last minute bugs are reported), some crypto updates for 2.6.32 from Herbert Xu (including a completed hash algorithm transition over to shash), some internal PCI hotplug interface cleanups from Alex Chiang, some cpuset and hotplug fixes from Oleg Nesterov, and some /dev/mem (and also /dev/kmem) cleanups from Fengguang Wu.

Finally today, Andreas Mohr posted some weird Xorg tty experiences from 2.6.31-rc6, which is likely so ancient at this point that it has long since been fixed in the recent tty layer work.

The latest kernel release is 2.6.31.

Andrew Morton released an mm-of-the-moment for 2009-09-09-22-56.

David Tees posted a question concerning an ext4 error he was seeing in his logs from ext4_mb_generate_buddy. He wondered if anyone had suggestions concerning how serious this actually is, and what to do other than his anticipated reboot and fsck cycle.

Zhenyu Wang sent a very detailed followup addressing why some folks might have experienced strange “blanking” problems on MacBook 2,1 systems running 2.6.31-rc7. This was due to an issue with the Intel 945GM chipset and the way that the MacBook integrated TV DAC routed signals. His description was quite elaborate, and he apologized for the delay in providing this helpful detail.

Greg Kroah-Hartman posted some stable review patches for the forthcoming 2.6.27.34 and 2.6.30.7 stable series kernels. The deadline for posting replies has already lapsed at this point, however. One wonders if the review window could be slightly larger anyway.

Stephen Rothwell posted a linux-next tree for September 10th. Since Wednesday, the acpi and security-testing trees lost issues, while the rr, block, and scsi-post-merge trees had some issues. The total sub-tree count remains steady at 140 trees in this compose.

Stephen reminds everyone (in a thread entitled “linux-next: merge window reminder”, and in today’s linux-next announcement) not to add code intended to hit 2.6.33 until 2.6.32-rc1 has been released, so that folks adding bits for post-rc1 have a chance.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/09 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090909.mp3

For Wednesday, September 9th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Linux 2.6.31, Compcache, MMAP, and unreachable code.

More on Linux 2.6.31 in a moment, but first these other top stories.

Compcache. Nitin Gupta posted version 2 of his “compcache” compressed in-memory swap device. This is used preferentially prior to a backing disk since it is faster and can store more data in a compressed form than would be the case in simply having more free memory in the system pagecache. Since the previous release Nitin has switched to using struct page references rather than 32-bit PFNs (to make the code 64-bit safe), and a variety of other cleanups. Testing shows up to a 33% performance improvement in certain idealized test conditions. Presumably this is now targeting 2.6.32.

MMAP. Lee Schermerhorn noticed some “very erratic behavior” affecting certain (AIM7) workloads on both distribution and mainline kernels, chiefly on larger systems such as an 8-socket, 32-core x86_64 platform with 256GB of RAM. Lee noticed a comment in mm/mmap.c:vma_adjust suggesting that there isn’t a need to take the anon_vma lock when only adjusting the end of a vma (as with brk()). The comment “questions whether it’s worth[while] to optimize for this case” but “apparently, on the newer, larger, x86_64 platforms, with interesting NUMA topologies, it is worth[while]”. The patch is a one-liner, but can double performance for the test workload, or at least stabilize the results.

Unreachable code. Roland McGrath posted a two part patch series introducing an UNREACHABLE macro that can be used to inform GCC that a particular code path cannot be reached in normal code execution. Although GCC itself has heuristics to determine when this is the case, it cannot catch assembly level impacts or certain other side-effects. Roland suggests folks begin looking for infinite for loops in the kernel and start to replace them since it takes a bit of enlightened reasoning to make the changes beyond a simple find/replace. He starts off in patching the BUG() macro to use his UNREACHABLE macro.
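
For those who have not seen the underlying compiler feature, the sketch below shows the general shape of such a macro in plain userspace C, built on GCC's `__builtin_unreachable()`. It is not Roland's patch: the fallback branch, the `enum color` type, and the `color_to_rgb` function are invented for illustration.

```c
/*
 * Illustrative UNREACHABLE() macro. __builtin_unreachable() is available in
 * GCC 4.5 and later and in clang; older compilers get an infinite loop,
 * which likewise never returns.
 */
#if defined(__clang__) || \
    (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)))
#define UNREACHABLE() __builtin_unreachable()
#else
#define UNREACHABLE() do { } while (1)
#endif

/* Hypothetical example type. */
enum color { RED, GREEN, BLUE };

/* Map a color to a 24-bit RGB value. */
int color_to_rgb(enum color c)
{
    switch (c) {
    case RED:
        return 0xff0000;
    case GREEN:
        return 0x00ff00;
    case BLUE:
        return 0x0000ff;
    }
    /*
     * Every enumerator is handled above, so control cannot get here for a
     * valid argument. Telling the compiler so avoids a "control reaches end
     * of non-void function" warning and lets it drop the dead path.
     */
    UNREACHABLE();
}
```

The caveat is that reaching an `UNREACHABLE()` at run time is undefined behavior, which is why it suits places such as the tail of BUG(), where the preceding code is already known never to return.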

In today’s miscellaneous items: an update to the documentation for procfs covering the additional “time spent by a cpu servicing a guest” in /proc/stat from Eric Dumazet, an update concerning hid in 2.6.32 from Jiri Kosina (including mention of a rewrite of the debugging stub), a question about turning off ext4’s delayed allocation features from Clemens Eisserer, a trivial aoe fix from Jens Axboe, updated support for the “switch” command within compliant SD cards from Wolfgang Mues, some writeback fixes from Fengguang Wu, some updates concerning the sound tree in 2.6.32 (chiefly these will comprise driver updates, and many users won’t notice that), a trivial fix freeing the old name within kobject_set_name in the case of ENOMEM from Sebastian Ott, some internal PCI interface cleanups from Alex Chiang, some Xen bugfixes addressing spinlock bugs and stackprotector support from Jeremy Fitzhardinge, some cleanups to trace.h from Li Zefan, a fix to an unintended behavioral change in net_device_ops from Martin Decky, a fix for paravirt ops alternatives patching on 486 systems (previously failing in text_poke_early) from Ben Hutchings, and a fix to ensure the raw_time clocksource is updated in timekeeping_suspend from Janboe Ye.

Finally today, Ingo Molnar replied to the “Epic regression in throughput since v2.6.23” thread from Serge Belyshev with an assertion that he believes he has found the issue and has a fix in -tip that should be of interest. He would like folks to re-test and see if the fix improves scheduler performance.

In today’s announcements: Linux 2.6.31. Linus Torvalds announced the release of version 2.6.31 of the Linux kernel. In pointing to the kernelnewbies.org website for the full breakdown of changes, Linus took the opportunity to call out a few specifics. Amongst these were the “painful” changes to the new fsnotify backend to both inotify and dnotify, ongoing work on KMS, debug and performance counters work, and much much more. Linus announced the opening of the 2.6.32 merge window, but with the caveat that folks really should wait a few days to test and play with 2.6.31 before moving on to 2.6.32.

Greg Kroah-Hartman announced stable kernel releases 2.6.30.6 and 2.6.27.32, both containing a raft of updates, followed later in the day by 2.6.27.33, which contains a fix for building ocfs2 that some folks were hitting.

The latest kernel release is 2.6.31, released at 16:06 (BCT).

David Miller noticed that __hw_perf_counter_init on x86 systems might be leaking active_counters on error conditions, causing the LAPIC NMI watchdog to never get re-enabled even after all performance counter users go away.

Stephen Rothwell posted a linux-next tree for September 9th. Since Tuesday, the acpi, rr, security-testing, and scsi-post-merge trees had issues, while the async_tx, wireless, drm, tip and tty trees lost their issues. The total sub-tree count remains steady at 140 trees in this compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: Uncategorized Tags:

2009/09/08 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090908.mp3

For Tuesday, September 8th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: BFS, Inotify, OCFS2, and Tasklets.

BFS. Frans Pop posted an email thread entitled “Another BFS versus CFS shakedown” in which he says that he tried “very consciously” to pay attention to interactivity. His results seem to show what others have implied – that BFS falls down in many literal timing tests but seems to nonetheless offer a very smooth and interactive desktop experience, which is what Con was getting at in posting the proof of concept. Frans BCC’d Con, since he wasn’t sure whether he would want to actually participate in an LKML discussion.

Inotify. Giuseppe Scrivano posted a patch intended to extend inotify to support file descriptors in addition to plain old paths. The example cited is watching standard input from within GNU tail, in which case tail must perform an entirely different internal process for watching the standard input stream because it is not necessarily represented by a known file path. The proposal is to add a new system call entitled sys_inotify_add_watch_fd, which does roughly what it would imply.
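For context, the existing interface is keyed on a path name, which is why a stream that has no known path (like standard input) cannot be watched directly. The sketch below uses only the long-standing inotify_init() and inotify_add_watch() calls; the helper name watch_path_for_writes() is made up for illustration, and nothing here reflects the proposed sys_inotify_add_watch_fd interface.

```c
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: start watching `path` for modifications.
 * Returns the watch descriptor (>= 0) and stores the inotify file
 * descriptor in *ifd, or returns -1 on any failure.
 */
static int watch_path_for_writes(const char *path, int *ifd)
{
    int fd = inotify_init();
    int wd;

    if (fd < 0)
        return -1;

    /* The existing API needs a path name here, not an open descriptor. */
    wd = inotify_add_watch(fd, path, IN_MODIFY);
    if (wd < 0) {
        close(fd);
        return -1;
    }

    *ifd = fd;
    return wd;
}
```

A caller would then read() struct inotify_event records from the returned inotify descriptor. A program such as tail has no path to hand to this helper when reading from a pipe, which is the gap the proposal is aiming at.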

OCFS2. Joel Becker posted an update about forthcoming OCFS2 patches, in which he noted that he currently has 85 patches queued up for the forthcoming 2.6.32 merge window, and that that will probably grow to over 100 patches. Amongst these is a “big ticket” item in the form of the reflinkat() system call, which had been discussed at this year’s filesystem workshop and is mentioned in this week’s Linux Weekly News.

Tasklets. Steven Rostedt replied to the ongoing backlash against using tasklets as interrupt handler “bottom halves” and the assertion from Stephen Hemminger that using process context for such processing is too slow, with a note that he plans to present on just this topic at LPC (Linux Plumbers Conference), demonstrating that process context is far from “too slow”. Meanwhile, Ingo Molnar pointed out how one might use performance counters on Intel systems to produce real-life measurements of any overhead.

In today’s miscellaneous items: some performance counters updates from Markus Metzger that split sample creation and output functions for performance, a fix for randomized stack configurations such that the kernel won’t accidentally pick an unfortunate mmap_base address starting in the stack reserved area from Michal Hocko, version 19 of the per-bdi writeback flusher threads patches from Jens Axboe, a fix for a PCI reference leak in the quirks code from Jiri Slaby, a fix to ensure data stored into an inode is properly seen before it is unlocked (fixes a corruption issue with ext3 over NFS) from Jan Kara, support for D-cache aliasing CPUs (such as many SPARCs) from David Miller, version 3 of a patch adding FAT root directory timestamps to the volume label from Jorg Schummer, a question concerning limiting the DMA mode picked for legacy IDE devices from Alan Stern, an ACPI 4.0 compliant power meter from Darrick J. Wong, a patch to make tmpfs depend upon shm from Hugh Dickins, some rcutorture updates from Paul E. McKenney, version 0.14 of the Ceph distributed filesystem, a fix for paravirt-alternatives support on i486 systems, since older processors like the 486 don’t necessarily invalidate pre-fetched instructions possibly containing paravirt ops information, from Ben Hutchings, some very helpful flexible array documentation from Jon Corbet, some compiler (fPIC) options checks from Jory A Pratt, an August XFS status update from Christoph Hellwig (including 2.6.32 comments), a respin of the data=guarded patches for ext3 filesystems from Chris Mason, and a question concerning maintenance plans for 2.6.27 after 2.6.32 is released from Luis R. Rodriguez.

In today’s announcements: 2.6.31-rc9-rt9.1. John Kacur announced version 2.6.31-rc9-rt9.1, since Thomas Gleixner was on vacation. This is largely the same as the rc8 tree but contains a couple of other fixes also.

The latest kernel release is 2.6.31-rc9.

Serge Belyshev posted an email thread entitled “Epic regression in throughput since v2.6.23” in which he suggests a 10% performance degradation in tests between 2.6.23 and 2.6.31. He also comes out in favor of BFS, but it isn’t clear what kind of hardware he is using, nor how scalable the figures are.

Stephen Rothwell posted a linux-next tree for September 8th. Since Monday, the edac-amd tree has been removed temporarily (at the request of the maintainer), the v4l-dvb and trivial trees lost conflicts, while there remain a number of issues with acpi, async_tx, wireless, drm, security-testing, and scsi-post-merge. The total subtree count falls to 140 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/07 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090907.mp3

For the US Labor Day weekend of 2009, I’m Jon Masters with a summary of today’s LKML traffic. Happy Labor Day everyone. I spent my weekend in Maine, hiking the Knife Edge of Katahdin with my favorite AMC hiking buddies.

In today’s issue: BFS, boot interrupts, KVM, modules, tasklets, and VFS.

BFS. There has been some debate recently about (and LWN has great coverage of) a new scheduling algorithm proposed by Con Kolivas (who pops up every few years in between getting disgruntled and saying that he won’t do so again) called the “Brain Fuck Scheduler”. It is intended to be really simple, and the initial posting came with all kinds of assertions about how it would perform better under typical desktop load conditions. Ingo Molnar got around to performing some tests under various workloads and found that, quoting, “BFS is slower than mainline in virtually every measurement” that he performed. He also discovered it performed worse in desktop interactivity tests, but he encouraged others to not take anyone’s word for it and run their own tests, including links to current versions of the upstream scheduler and Con’s patches. It should be noted that Ingo is the maintainer of the upstream scheduler and so is bound to be in part referring to himself.

Boot interrupts. Stefan Assmann posted a two part patch series disabling boot interrupts on Intel X58 and 55x0 systems. These are necessary, as he reminds everyone, because systems will otherwise generate legacy compatible interrupts that will simultaneously arrive at both the PIC and primary IO-APIC, even if the former is not in use and the latter’s corresponding line is masked. Needless to say, this can and has caused some pretty hairy issues (especially for the RT kernel) and so patches like these are most welcome. These patches, like his others, poke at generally hidden PCI configuration devices that must first be made accessible before allowing disabling of boot interrupts.

KVM. Jan Kiszka inquired of K. Prasad as to his ongoing work into implementing a generic hardware breakpoint support infrastructure that could be used to also handle the contextual save and restore of hardware debug registers upon switching within KVM from host to guest CPU environment. Avi Kivity had, apparently, previously suggested that these registers might be restored from current->thread.debugregX without having to explicitly save/restore, but Jan feels it might be better to just do this generically in K. Prasad’s code.

Modules. Michal Marek posted a two part patch series modifying kbuild to generate modules.builtin files that can be parsed by module-init-tools and used to recognize drivers and other optionally modular components that have in fact been built into the running kernel. This then allows those to be listed within lsmod and other tools.

Tasklets. Luis R. Rodriguez raised the issue of using tasklets as containers for “bottom half” (a reference to old-school top/bottom half handlers) interrupt handler processing. He cites an older LWN story on the efforts by Steven Rostedt and others to remove tasklets or otherwise move them into a process context of their own. Luis feels there is particularly no reason for tasklets in wireless drivers and that instead much of the work can be moved out to a process context. This will of course be easier if the interrupts can be threaded and could do all of their handling without separate contexts, though this is not always possible in performance critical situations.

VFS. Linus Torvalds posted an 8 part patch series aimed at cleaning up VFS name lookup permission checking, with the stated goal of eventually doing multiple path component lookups in one go without taking the per-dentry lock or toggling the per-dentry atomic count for each component. The existing code is pretty horrific in terms of cacheline “ping-pong” on the common top-level dentries that “everybody looks up” and Linus is already able to show a roughly 3% performance hit on a single-socket Nehalem system. Included within his patch is an observation that there was never a need for the IMA code to call ima_path_check repeatedly during path lookups, only on the final path.

In today’s miscellaneous items: Jason Gunthorpe posted some scathing commentary on the existing implementation of the pubek sysfs file for reading from TPM, and a bunch of fixes to “do it again”, some input updates from Dmitry Torokhov, a fix implementing “make file.s_c” building of dual C and assembly hybrid files from Amerigo Wang, the second patch series for ioatdma implementing RAID5/6 offload support from Dan Williams (in followup to the previous day’s patches), the 18th version of the per-bdi writeback flusher threads from Jens Axboe, a lot of helpful cleanups (mostly to x86) from Jan Beulich, some performance counters fixes from Ingo Molnar, some AMD-IOMMU passthrough support patches (iommu=pt) and page table/page fault handling updates for the same from Joerg Roedel, version 6 of the crashkernel=auto patches from Amerigo Wang, version 2 of an RFC patch series reducing the number of calls to global_page_state from balance_dirty_pages to reduce cache pressure from Richard Kennedy, some SPARC and networking updates from David Miller, a common method for reading and parsing user input within the tracing code from Jiri Olsa, a patch adding a boot option to disable the automatic VT cursor on boot (for use with graphical splash screens) from Matthew Garrett, some RFC sysfs documentation also from Matthew Garrett, version 3 of a patch removing a sleep in TASK_TRACED under a lock known as ->cred_guard_mutex from Oleg Nesterov, a single PCI fix for broken resource alignment calculations from Jesse Barnes, a cpuidle fix from Sanjeev Premi, some directory lookup optimizations for the performance counters perf tool from Ulrich Drepper by way of Arnaldo Carvalho de Melo, some tracing fixes from Frederic Weisbecker, a critical OCFS2 fix for rc8 from Joel Becker (correcting the handling of cancel requests rather than erroring out), a series of 18 patches from Steven Rostedt that had started out as a simple bugfix but turned into a significant rework to better handle switching per-cpu ring buffers, a new CROSS_COMPILE option in kconfig facilitating easier configuration of a cross compilation environment from Roland McGrath, a fix for ext2_rename correcting unbalanced use of kmap and kunmap (causing pkmap slots to get exhausted) from Nicholas Pitre, a fix to the RCU kconfig help text from Valdis Kletnieks, a SLUB RCU fix for 2.6.31 from Pekka J. Enberg, some firewire fixes from Stefan Richter, another version of a patch adding support for LZO-compressed kernel images from Albin Tonnerre, an update on the async_tx.git/next tree and merge plans for 2.6.32 from Dan Williams (the Intel one), some USB console fixes correcting an oops from Jason Wessel, an update on a suspend saga affecting the Sharp Zaurus from Pavel Machek, a fix for building User Mode Linux with bash 4 from Paul Bolle, some minor firewire fixes from Stefan Richter, some IDE patches from David Miller, an important IMA security fix from James Morris, a large number of linker script fixes and cleanups from Tim Abbott, some drm fixes for 2.6.31 final from Dave Airlie, perf trace filtering support from Li Zefan, some documentation updates rendering consistent the default mountpoint for making available debugfs from GeunSik Lim, a patch to fix error handling in load_module from Kamalesh Babulal, version 5 of the clone_with_pids() system call from Sukadev Bhattiprolu, a fix for the case where ACPI state C2 is mapped to C3 from Luming Yu, a patch adding locking to ext3_do_update_inode to avoid a race from Chris Mason, some fixes for handling hot remove of mmaped files from Eric W. Biederman, a summary of the current VFS scalability queue from Nick Piggin, and a patch from Adrian Hunter aiming at making write_cache_pages more sequential in flushing back pages.

In today’s announcements: Linux 2.6.31-rc9. Linus Torvalds announced the release of version 2.6.31-rc9 of the Linux kernel. He was originally planning on shipping a final 2.6.31 already, but some fundamentals (such as broken inotify support) necessitated holding off for a few more days. He requests a final round of testing prior to the 2.6.31 release.

util-linux-ng 2.16.1. Karel Zak announced util-linux-ng version 2.16.1. The latest release includes a number of updates, amongst them a modules.dep parser that is particularly hairy but unfortunately “necessary” for ext2/3/4 detection.

The latest kernel release was 2.6.31-rc9, which was released on Saturday.

Rafael J. Wysocki posted a list of regressions from 2.6.30 to 2.6.31-rc9. These include 27 unresolved issues at this time, including inotify and page allocator problems that aren’t closed yet. The outstanding list of regressions between 2.6.29 and 2.6.30 also contained 27 items, and almost all of them are driver issues dating back for some time.

Tarkan Erimer reported an oops in the ALSA stack when running 2.6.31-rc7-git1-rt9. Christoph Lameter reported 5 second “hiccups” on CIFS with 2.6.31-rc8. Luis R. Rodriguez passed along some kmemleak reports from 2.6.31-rc8 which were affecting process_zones(). Gene Heskett couldn’t reliably run 2.6.31-rc9 without various segfaults taking down his mail.

Greg Kroah-Hartman announced the 2.6.27.32 and 2.6.30.6 stable review patches.

Stephen Rothwell posted a linux-next tree for September 4th. Since Thursday, the xfs, acpi, security-testing, and staging trees had issues (all new, except for acpi). The total sub-tree count remains steady at 141 trees.

Stephen Rothwell posted a linux-next tree for September 7th. Since Friday, the acpi, async_tx, mtd, battery, slab, trivial, percpu, tty, and scsi-post-merge trees gained issues, while security-testing lost its build failure but gained another for which Stephen reverted the offending commit. The total sub-tree count remains steady at 141 trees in the latest compose.

Valdis Kletnieks reports some “weirdness” in linux-next affecting KVM and bisected back to a patch from Beth Kon entitled “KVM: PIT support for HPET legacy mode”, which is causing hangs or triple fault reboots on a Dell Latitude D820 laptop.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/03 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090903.mp3

For Thursday, September 3rd, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFQ, Matchreply, PCI, RCU tree scalability, and Tracepoints.

CFQ. Corrado Zoccolo posted an RFC patch series modifying the CFQ IO scheduler to adapt its time slice depending upon the number of processes that are currently performing IO. Effectively, rather than using fixed time slices, the IO time slice is scaled to a fraction based on the number of processes performing IO and rescaled whenever that changes. The attached figures appear impressive.
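As a rough illustration of the arithmetic involved (not the formula from Corrado’s patches, which isn’t described here), one could imagine dividing a target latency budget among the active IO processes and clamping the result. Every name and constant in this sketch is an assumption chosen for the example.

```c
/* Hypothetical constants, chosen for illustration; not from the patch. */
#define BASE_SLICE_MS      100  /* fixed slice used when few processes do IO */
#define MIN_SLICE_MS        10  /* never shrink a slice below this */
#define TARGET_LATENCY_MS  300  /* budget for one full round of slices */

/*
 * Divide the latency budget among the processes currently doing IO,
 * then clamp the result to the range [MIN_SLICE_MS, BASE_SLICE_MS].
 */
static unsigned int scaled_slice_ms(unsigned int nr_io_procs)
{
    unsigned int slice;

    if (nr_io_procs == 0)
        return BASE_SLICE_MS;

    slice = TARGET_LATENCY_MS / nr_io_procs;
    if (slice > BASE_SLICE_MS)
        slice = BASE_SLICE_MS;
    if (slice < MIN_SLICE_MS)
        slice = MIN_SLICE_MS;
    return slice;
}
```

With these made-up numbers, up to three processes each keep the full 100 ms slice, six processes get 50 ms each, and a hundred processes hit the 10 ms floor. The slice would be recomputed whenever the number of processes performing IO changes.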

Matchreply. Tejun Heo posted a simple script that he has been using “for a couple of years now” to solve the problem of receiving many duplicated messages from different mailing lists. It indexes Maildirs and hooks into procmail to catch duplicates and redirect them into a separate folder.

PCI. Tejun Heo posted a two part patch series splitting out pci_add_dynid support from store_new_id such that in-kernel code can add PCI IDs dynamically. It will be used by pci-stub to initialize initial IDs via module parameters and allows one to (for example) prevent built-in drivers from attaching to devices with certain IDs handled by loadable modules.

RCU tree scalability. Paul McKenney replied to Nick Piggin’s earlier RCU tree scalability concerns, saying that he believes that Nick is routinely driving up the number of callbacks queued on a given CPU to above 10,000, which would cause excessive calls to force_quiescent_state (400,000 calls per second, for example). He removes the grace period machinery from rcutree __call_rcu, which apparently was a previous effort to avoid implementing synchronize_rcu_expedited.

Tracepoints. Jason Baron, the cunning fox that he can be, posted a 4 part patch series implementing a new “jump label” optimization for tracepoints. The current tracepoint code is implemented using a global variable conditional for each tracepoint, which can become painfully hairy under memory pressure or with large numbers of tracepoints built into the kernel. To better handle this, in discussion with Roland McGrath and Richard Henderson, Jason and co. created a new “asm goto” statement that allows branching to a label. Using some code patching they effectively make switching tracepoints on and off a simple case of patching a jump instruction, conditionally.
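The “global variable conditional” pattern that the series wants to replace looks roughly like the user-space sketch below. The tracepoint name, the probe and the hit counter are all hypothetical, and the real kernel macros are considerably more involved. The sketch does not attempt the “asm goto” replacement itself.

```c
/* Hypothetical tracepoint state: one global enable flag per tracepoint. */
static int trace_sched_switch_enabled;
static unsigned long trace_hits;

/* Hypothetical probe; a real one would record the event somewhere. */
static void trace_sched_switch_probe(int prev, int next)
{
    (void)prev;
    (void)next;
    trace_hits++;
}

/*
 * Every call site pays for a memory load, a test and a conditional
 * branch on the flag, even while the tracepoint is switched off.
 */
static inline void trace_sched_switch(int prev, int next)
{
    if (__builtin_expect(trace_sched_switch_enabled, 0))
        trace_sched_switch_probe(prev, next);
}
```

The jump label approach, as described, would instead patch a jump instruction at each call site when the tracepoint is toggled, so the disabled case no longer needs to load and test a flag in memory.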

In today’s miscellaneous items: some kmemleak patches from Luis R. Rodriguez, some networking updates from David Miller, some sound updates from Takashi Iwai, some AMD Magny-Cours CPU support fixes from Andreas Herrmann, some block fixes for 2.6.31 from Jens Axboe (fixing the max_sectors_kb greater than 512KB issue mentioned previously), another bug report against reading /proc/kcore from Nick Craig-Wood, version 3 of Peter Zijlstra’s load-balancing and cpu_power patches, a fix to allow setrlimit on non-current tasks from Jiri Slaby, a fix to avoid sleeping in TASK_TRACED under the ->cred_guard_mutex lock from Oleg Nesterov, version 3 of the VMware virtual HBA support patches (including relatively minor fixes since version 2) from Alok Kataria, a fix to avoid truncation of the value in abs() if it is greater than 2^32 from Rolf Eike Beer (on 64-bit systems), a bunch of suggestions for asm-generic update candidates in various architecture trees from Robin Getz, and the latest round of rants about Linux software RAID (but on that subject, Dan Williams posted a 29 part patch series beginning the road towards RAID support in ioatdma).

Finally today, Amerigo Wang posted a series of patches implementing gcov support within kbuild such that “make foo/fbar.c.gcov” becomes possible.

In today’s announcements: Autofs version 5.0.5. Ian Kent announced version 5.0.5 of the autofs utilities. It’s been a long time, apparently, but better late than never, and that update seems fairly comprehensive.

The latest kernel release was 2.6.31-rc8.

Frank A. Kingswood reported another “inconsistent lock state” regression against 2.6.31-rc8, complete with a backtrace, in the JBD code.

Andrew Morton released an mm-of-the-moment for 2009-09-03-16-35.

Greg Kroah-Hartman posted an update on the staging tree for the upcoming 2.6.32 merge window. He reminds everyone that staging is not a dumping ground for dead code (citing the Ethernet Power Link driver as an example of an unmaintained driver that will be removed in the .32 cycle and warning that Android and others face a similar fate in the not too distant future if nothing changes soon).

Stephen Rothwell posted a linux-next tree for September 3rd. Since Wednesday, the xfs and net trees lost their issues, while the acpi, security-testing, tip, percpu and sfi trees gained several problems. The total subtree count remains steady at 141 trees in the latest compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags: