Archive for September, 2009

2009/09/02 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090902.mp3

For Wednesday, September 2nd, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Fsync, memory controller groups, and tree RCU scalability.

Fsync. Christoph Hellwig noticed that there is a disconnect in necessary fsync handling between older and newer filesystems. Many modern filesystems only update and write out metadata once other IO commits have taken place. They sometimes implement a wait inside their ->fsync methods but this is suboptimal because it happens under the i_mutex lock and must wait for an entire file to be flushed out. Instead, it can be preferable to simply wait for data writeout completion within O_SYNC handling prior to calling ->fsync. This is what Christoph’s patch does in modifying vfs_fsync_range. He includes a mini-audit of the impact upon existing filesystems and any necessary actions.

Memory cgroups. Kamezawa Hiroyuki notes that there are a few scalability issues with the current res_counter charge and uncharge accounting functions in the memory controller groups code, especially lock contention. He believes that there is a chance to perform batch-uncharge by building up a list of pages that have been affected by paging and accounting them at the time when other large chunks are processed (as a result of unmapping, truncation, at task completion, and so forth). Since it is late in the 2.6.31 cycle, he is willing to wait until the floodgates have opened for 2.6.32. Separately, Kamezawa also cleaned up multiple calls to res_counter_soft_limit_excess.
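The batching idea can be illustrated outside the kernel. What follows is a minimal Python sketch under stated assumptions, not the memory controller code: ResCounter and UnchargeBatch are hypothetical names invented here, and a threading.Lock stands in for the res_counter lock. It only shows how collecting pages and uncharging them in one go reduces the number of times the contended lock is taken.

```python
import threading


class ResCounter:
    """Hypothetical stand-in for a lock-protected usage counter."""

    def __init__(self, usage):
        self.usage = usage
        self.lock = threading.Lock()
        self.lock_acquisitions = 0  # instrumentation for this sketch only

    def uncharge(self, amount):
        # Every call takes the lock once, however many pages it covers.
        with self.lock:
            self.lock_acquisitions += 1
            self.usage -= amount


class UnchargeBatch:
    """Hypothetical helper that collects uncharges and applies them together."""

    def __init__(self, counter):
        self.counter = counter
        self.pending = 0

    def add(self, pages=1):
        # No lock is taken here; the pages are only remembered.
        self.pending += pages

    def flush(self):
        # One lock acquisition covers everything collected so far.
        if self.pending:
            self.counter.uncharge(self.pending)
            self.pending = 0
```

Uncharging ten pages one at a time takes the lock ten times, while adding ten pages to a batch and flushing once takes it a single time for the same final usage.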

Tree RCU scalability. Nick Piggin posted saying that he is testing out the scalability (or lack thereof) of various VFS code paths, and that he is noticing a problem with call_rcu. According to Nick, __call_rcu is taking 54 times more CPU to do 8 times the amount of work when going from 1 to 8 threads, or a factor of 6.7 slowdown when using tree RCU. Nick obviously requested further information from Paul McKenney, RCU inventor and chief guru.
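The slowdown figure is just the ratio of the two numbers Nick quotes. As a small worked example (the function name is made up for illustration), dividing the growth in CPU time by the growth in work gives the extra cost per unit of work:

```python
def per_unit_slowdown(cpu_ratio, work_ratio):
    """Return how much more CPU each unit of work costs after scaling up."""
    if work_ratio <= 0:
        raise ValueError("work_ratio must be positive")
    return cpu_ratio / work_ratio


# 54 times the CPU for 8 times the work.
print(per_unit_slowdown(54, 8))  # prints 6.75
```

The 6.7 quoted in the posting is this value with the second decimal dropped.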

In today’s miscellaneous items: some further fake numa node creation patches for powerpc from Ankita Garg, version 17 of the per-bdi writeback flusher threads patches from Jens Axboe, version 4 of a patch making O_SYNC handling use the standard syncing path from Jan Kara, some x86 performance counters updates from Markus T. Metzger, an updated version of the previous day’s walltime clock syncing patches for KVM guests from Glauber Costa, full NAT support for IPVS with netfilter matching support from Hannes Eder, a rework of the GPE handling in the ACPI code from Matthew Garrett, additional warnings within Documentation/md.txt (largely fueled by Pavel Machek’s ongoing rants about RAID support), a summary of merge plans for RDMA in 2.6.32 from Roland Dreier (who suspects rc8 signals impending merge window craziness), some performance counters fixes for POWER7 support from Paul Mackerras, and a query from Luis R. Rodriguez as to whether kmemleak.h really needed to be exported to userspace.

Finally today, Frederic Riss questions whether ARM kprobes unregistration is SMP safe. The current code makes use of an illegal instruction to trigger the kprobe code, and Frederic cannot see how one avoids a situation in which a probe is being unregistered as another core takes an illegal instruction. He wonders whether stop_machine should be in use instead.

The latest kernel release was 2.6.31-rc8.

Inotify continues to be a pain upstream. Tej Bewith posted a git bisection in which he claims that a recent fix from Eric W. Biederman to ensure NULL termination actually prevented his system from booting.

Maciej Rutecki posted a potential regression against USB in 2.6.31-rc8. Apparently, since rc7, a Debian testing box experiences unreliable detection and handling of plugin flash drives on KDE4 (one assumes with identical userland between the two kernels).

Wu Zhangjin discovered a kernel panic on 2.6.31-rc7-rt8 in the SetPageLRU code, running on MIPS.

Stephen Rothwell posted a linux-next tree for September 2nd. Since Tuesday, the tree gained a few build failures (xfs, acpi, v4l-dvb, net, block). The total subtree count remains steady at 141 trees in the current compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/09/01 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090901.mp3

For Tuesday, September 1st, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFQ, Flexible arrays, IO controllers, kthreads, KVM, NOHZ, and POSIX.

CFQ. Jeff Moyer responded to a bug posting against 2.6.30 in which it had been discovered that the CFQ IO scheduler could (under certain circumstances) skip over incoming requests (mostly those issued out of order) and dramatically diminish the performance of, for example, packet writing to a DVD. Jeff’s patch causes a new next_req to be chosen in cfq_dispatch_insert so that there will always be a request to handle if there are some left in the queue. He attached results taken with the patch applied, and they speak for themselves.

Flexible arrays. David Rientjes posted some updates to his “flex arrays”, changing the way that static definitions are done because the existing implementation of FLEX_ARRAY_INIT had no way to determine whether its parameters were valid (since it simply served as a struct initializer). Instead, the new DEFINE_FLEX_ARRAY interface (which can be prefixed with ’static’ for file scoping purposes) performs checks on its parameters, which include a new “name” parameter specifying the name of the resultant structure that will be defined by the macro call.

IO controllers. On another IO related note, Vivek Goyal posted an update in regard to dm-ioband testing. He took one 40GB SATA drive (without hardware queueing) and created two partitions on the disk, to each of which he associated a new ioband device, at weight 200 and 100 respectively. Vivek assumed that this would result in the first device seeing double the IO bandwidth of the second, but this is not what happened in practice. He attached the scripts that he used to generate the tests and requested clarification from Ryo Tsuruta.
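Vivek’s expectation follows from treating the weights as proportional shares. A minimal sketch of that arithmetic, assuming purely proportional division (the device names below are placeholders, not taken from his scripts):

```python
def expected_shares(weights):
    """Split bandwidth in proportion to each device's weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: weight / total for name, weight in weights.items()}


shares = expected_shares({"ioband1": 200, "ioband2": 100})
# ioband1 should see roughly two thirds of the bandwidth and ioband2
# one third, which is double for the first device.
print(shares)
```

This is only the expected outcome; the point of the posting is that the measured results did not match it.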

Kthreads. Ingo Molnar noticed a synchronization problem at boot time involving kthreads in which there appears to be a race between the initial task (which becomes the idle thread of CPU0) and the init task (which, as he points out, is not the same as the initial task). Although the BKL protects the interaction between these two tasks, little protects which will run first, and there is a possibility that init might run sooner than rest_init, with a resultant ksoftirqd creation failing due to a NULL kthreadd_task. Ingo adds a completion variable to avoid this situation and tags the patch for -stable.
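The shape of the fix can be shown with a userspace analogy. This is a minimal sketch, assuming Python threads in place of kernel tasks and a threading.Event in place of a kernel completion; the function and variable names are illustrative and are not taken from Ingo’s patch. The init thread is deliberately started first, yet it cannot proceed until the other thread has published the kthreadd task.

```python
import threading


def boot_sequence():
    """Model the ordering between rest_init and the init task."""
    kthreadd_ready = threading.Event()  # stand-in for a completion variable
    state = {"kthreadd_task": None}
    log = []

    def rest_init():
        # Publish the task first, then signal (like complete()).
        state["kthreadd_task"] = "kthreadd"
        kthreadd_ready.set()

    def kernel_init():
        # Block until rest_init has signalled (like wait_for_completion()).
        kthreadd_ready.wait()
        if state["kthreadd_task"] is None:
            log.append("ksoftirqd creation failed")
        else:
            log.append("ksoftirqd created via " + state["kthreadd_task"])

    init_thread = threading.Thread(target=kernel_init)
    rest_thread = threading.Thread(target=rest_init)
    init_thread.start()  # started first on purpose
    rest_thread.start()
    init_thread.join()
    rest_thread.join()
    return log
```

Without the wait, the outcome would depend on which thread happened to run first, which is the race described above.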

KVM. Glauber Costa, in likely earning himself a few beers, posted two patches that introduce a worker thread fired by kvmclock that will update the guest wallclock time periodically to be in sync with the host’s wallclock. This allows system administrators to set only the host wallclock time and avoid having to run NTP within guest VMs to deal with changes in time.

NOHZ. Josh Triplett posted in regard to the tickless kernel and the reality that the kernel is only truly tickless (running without a timer interrupt) when it is running only the idle task (at other times, the system will still be interrupted every 1/HZ seconds for a timer interrupt). Josh points out that on a system largely doing number crunching, these interrupts can add up to something quite unpleasant – as much as an 8% overhead in his case. With a simple sledgehammer approach, Josh posts a patch that forces the kernel to remain tickless all of the time. The patch as it stands breaks RCU, process accounting, POSIX CPU timers, and other things, but he wants to encourage discussion and debate about the best way forward for development.
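To get a feel for how a periodic tick adds up, here is a back-of-the-envelope sketch. The 8% figure is the one Josh reports; the HZ value of 1000 is an assumption made here for illustration (his posting, as summarized above, does not state it), and the function names are made up.

```python
def tick_overhead(hz, cost_per_tick_seconds):
    """Fraction of each second lost if every tick costs the given time."""
    return hz * cost_per_tick_seconds


def cost_per_tick(hz, overhead_fraction):
    """Effective cost of one tick that would explain a given overhead."""
    return overhead_fraction / hz


# Assuming HZ=1000, an 8% overhead corresponds to an effective cost of
# about 80 microseconds per tick.
print(cost_per_tick(1000, 0.08))
```

The effective cost in such an estimate lumps together the interrupt handler itself and any indirect effects on the interrupted workload, so it should not be read as a measurement of the handler alone.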

POSIX. Jim Meyering noticed that getdents and readdir return a dirent.d_ino inode number that differs from the st_ino reported by stat for a mount point in use by a mounted filesystem. This he claims is in violation of POSIX 2008 and caused him to disable an optimization in coreutils ‘ls -i’. He attaches a snippet of the recent POSIX specification and encourages that “Linux can catch up before too long”, since the only system currently taking advantage of strict compliance seems to be (somewhat more ironically) Cygwin.
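The discrepancy can be checked from userspace by comparing the two inode numbers for every entry in a directory. This is a minimal sketch, assuming a Unix system where Python’s os.scandir reports the directory entry’s inode via DirEntry.inode(); the function names are made up for this example.

```python
import os


def compare_inodes(directory):
    """Return sorted (name, d_ino, st_ino) tuples for each directory entry."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            d_ino = entry.inode()                 # taken from the directory entry
            st_ino = os.lstat(entry.path).st_ino  # taken from a fresh lstat() call
            rows.append((entry.name, d_ino, st_ino))
    return sorted(rows)


def mismatches(directory):
    """Names whose directory-entry inode differs from their lstat() inode."""
    return [name for name, d_ino, st_ino in compare_inodes(directory)
            if d_ino != st_ino]
```

Calling mismatches on a directory that contains mount points (the root directory is a likely candidate) may list those mount points, since lstat reports the root inode of the mounted filesystem while the directory entry still carries the inode of the underlying directory.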

In today’s miscellaneous items: a correction to the documentation in Documentation/numastat.txt from Minchan Kim, new sysfs ALS (Ambient Light Sensor) patches from Zhang Rui, version 2 of a patch adding support for KPF_KSM page type recognition to the page-types utility from Fengguang Wu, version 2 of his load-balancing and cpu_power patches from Peter Zijlstra, version 16 of the per-bdi writeback flusher threads patches from Jens Axboe, a patch removing an explicit assumption of the presence of cpu0 in the percpu code from Tejun Heo (especially useful on SPARC systems – this patch was later requested as part of a pull request sent out by Tejun), a patch allowing for max_sectors_kb to exceed the default of 512 from Nikanth Karthikesan, a fix to avoid dangling blocks not used during a write operation on reiserfs from Jan Kara, a simple nilfs2 bugfix pull request from Ryusuke Konishi, a fix to ensure GCC flags don’t get squashed in the Makefiles by Jory A. Pratt, a new version of a fix to vmscan that moves pgdeactivation modification to shrink_active_list from Hugh Dickins, a fix for the anti-fragmentation patches from Mel Gorman that will once again unbreak nommu, and the addition of some XFS compatibility ioctls as well as an XFS pull request containing those from Felix Blyakher. Xiaohui Xin posted a detailed RFC for Virtual Machine Device Queues (VMDq) support on KVM for which there was not room in this episode – look for that in a later edition.

Finally today, Roland Dreier and David Miller discussed the setup of the new linux-rdma@vger.kernel.org mailing list and how it can be advertised, archived, and generally advocated as the new list for RDMA topics.

The latest kernel release was 2.6.31-rc8.

Stephen Rothwell posted a linux-next tree for September 1st. Since Monday, the pxa, xfs, i2c, and dwmw2-iommu trees lost their conflicts and build failures, while the pci, acpi, and block trees gained failures for which Stephen mostly used other versions as necessary. The total subtree count remains steady at 141 trees in the latest compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/31 Linux Kernel Podcast

September 15th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090831.mp3

For Monday, August 31st, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: KVM, Poisonous hardware, and XFS.

KVM. Avi Kivity announced that, from now on, he will be sharing KVM maintainership with Marcelo Tosatti. They will commit on alternating weeks, or something along these lines, which is aimed to provide Avi with more time to develop new features and improve the overall maintainership role.

Poisonous hardware. Fengguang Wu provided some memory cgroup patches implementing support for HWPOISON (detected known bad physical pages) testing. The idea here is that adding specific tasks into a memory cgroup allows for only a sub-set of running tasks to have errors injected, providing for measurement of the system response to such situations without running the risk of core system processes and daemons being killed during basic tests.

XFS. Michael Tokarev noted that XFS doesn’t provide a compat_ioctl layer for the ioctl calls used for resizing (via the xfs_growfs command), meaning that there is no easy way to perform online resizing of XFS volumes when using a 64-bit kernel and a 32-bit userspace environment. Michael obviously wonders if there is any plan to add such support through compat_ioctl wrappers.

In today’s miscellaneous items: some perf tools cleanups (creating a library for certain functions) from Frederic Weisbecker, some x86 header cleanups from Ying Huang, various v4l/dvb fixes for 2.6.31 from Mauro Carvalho Chehab, a patch moving the page-types utility from Documentation/vm to tools/vm from Fengguang Wu (who also added support for recognizing KPF_KSM pages), a trivial KVM symbol offset calculation fix (substituting __pa for __pa_symbol) from Glauber Costa, the addition of the new KPF_HWPOISON page flag for hardware detected memory corruption marking from Fengguang Wu (part of Andi Kleen’s ongoing HWPOISON effort), a fix using native_rdmsr|wrmsr_safe_regs prior to reading or writing to the MSR for an x86 AMD K8 erratum fix from Borislav Petkov (based on an idea from Peter Anvin), version 5 of the ALS (Ambient Light Sensor) support patches from Zhang Rui, a tracing/filters memory allocation fix from Li Zefan, another attempt at cleaning up kcore on mmotm from Kamezawa Hiroyuki, some “fake numa” fixes for powerpc from Ankita Garg, version 15 of the per-bdi writeback flusher threads patches from Jens Axboe, some patches implementing optional delays during ALUA state transition from Nicholas A. Bellinger, ongoing discussion about what to do with KVM guest page table metadata and whether this could provide safe hinting to the host, and an ongoing rant about RAID continued.

The latest kernel release was 2.6.31-rc8.

Paul Mundt discovered a page allocator regression on nommu systems, which he says is caused by a recent patch from Mel Gorman (entitled “move check for disabled anti-fragmentation out of fastpath”). It causes a failure during initramfs unpacking on his development board.

Mario Holbe discovered a regression between 2.6.26 and 2.6.30 in which device-mapper would no longer handle devices with identical UUIDs. This is typically an unlikely situation, but it can happen, especially when using backup images and mounting them onto a running system.

Stephen Rothwell posted a linux-next tree for August 31st. Since Friday the pxa, sound, and sfi trees lost their conflicts while the i2c, drm, and dwmw2-iommu trees gained conflicts or build failures for which temporary fixes were applied. The total subtree count remains steady at 141 trees.

Jiri Slaby noticed that an ongoing suspend race in linux-next seems to be caused by might_sleep() calls in flush_workqueue() and flush_cpu_workqueue(), which he discovered through painstaking code instrumentation. As he points out, due to the number of suspend cycles required, bisection is tricky, but he has at least provided some data points to aid in debugging.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/26 Linux Kernel Podcast

September 2nd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090826.mp3

For Wednesday, August 26th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFS, Cpuidle, and Hardware Breakpoints.

CFS. Peter Zijlstra noted that the preempt-rt kernel (also known as ‘-rt’ in this particular instance) runs hardware interrupts and softirqs as real time tasks, concurrent with the task load balancing, which is itself done from a softirq task context. This means that there is likely to be at least one (and probably several) real time task running at task load balance time. Recently, it has been observed that load balancing fails on preempt-rt systems in various subtle ways due to the interaction between RT tasks and CFS as described. Peter “solves” this problem for now by ignoring RT tasks when it comes to calculating the individual CPU load, thereby reducing the likelihood for Real Time tasks to bounce around, evacuating “a significant number” of other tasks as they become runnable.

Separately, Peter, who continues to be one of the strongest voices against the “Offline” scheduler proposal, was joined by Ingo Molnar, who stated, “If you on the other hand were approaching this issue with pragmatism and with intellectual honesty, if you were at the end of a string of patches that gradually improved latencies but couldn’t get them below a certain threshold, and if scheduler developers couldn’t give you any ideas what else to improve, and _then_ suggested some other solution, you might have a point. You are far away from being able to claim that”. Clearly, he’s not a big fan then.

Cpuidle. Arun R Bharadwaj posted version 2 of his patch series implementing cpuidle support for POWER systems, including a sample implementation for the IBM pseries platform. Arun’s tests show improved idle handling with the CPU idle overhead now being a fraction of what it had been with the older pseries_dedicated_idle_sleep loop implementation.

Hardware breakpoints. Frederic Weisbecker replied to yesterday’s posting by K. Prasad of updates to the hardware breakpoint infrastructure noting that he was wrong to request the new API that he had, and that he would instead prefer if the perf tools could selectively arm and disarm existing breakpoints, rather than having their own API to create and manage new ones. His email began, “You will hate me but…”, which is always a good sign.

In today’s miscellaneous items: some networking fixes (smc91x, oom in virtio_net), version 5 of the crashkernel=auto patchset from Amerigo Wang, some tracing fixes fixing a bug in splice_read for the ring_buffer (and allowing rb_get_reader_page to be called by blockable code) from Lai Jiangshan, some vdso32 fixes from Jan Kratochvil, the latest iteration of Rafael J. Wysocki’s asynchronous suspend and resume patches for system sleep state transitions such as suspend to RAM, an update to yesterday’s hardware breakpoint patches from K. Prasad, a fix for a memory leak in IMA from Eric Paris, some thermal management improvement patches (including documentation updates) from Frans Pop, a fix to avoid returning to userspace with the BKL held from within the vt code from Henrik Kretzschmar, a fix to avoid complaints from the perf tools if root owns perf.data from Pierre Habouzit, a module fix for symbol_put_addr such that it uses dereference_function_descriptor on architectures using function descriptors, such as powerpc, from Rusty Russell, round 4 of the pending KVM updates for 2.6.32 from Avi Kivity (who’s vying with Ingo Molnar for patchcount), version 3 of the per-process OOM killer rework (true per-task oomadj, etc.) patches from Kosaki Motohiro, a fix for unintended panics in the MCE handler code from Hidetoshi Seto, a fix to a sys_umount induced perpetual freeze in the filesystem freeze code, some tracing updates for the forthcoming merge window from Steven Rostedt, and some S+Core updates from Liqin Chen (who seems to have convinced his company to set up a public git server – another awesome sign of a developer who has gotten involved with the community with helpful assistance and mentoring activities from Arnd Bergmann).

In today’s announcements: 2.6.31-rc7-rt8. Thomas Gleixner announced the latest version of the preempt-rt patch, which is updated to Linus’ latest kernel, includes the previous performance counters crash fix from Peter Zijlstra, and contains two other fixes. There are still known issues with ARM highmem and scheduler load balancing “oddities” for which Peter Zijlstra is working on some magic fixes, as covered earlier in this episode.

Ryo Tsuruta announced the IO Controller mini-Summit of 2009, which will be held immediately prior to this year’s kernel summit. On the agenda are topics that include reconciling the multiple IO controller development projects, extensions to struct bio, and formalizing an I/O tracking and charging policy. There is a website with a wiki for registering (or you can email Ryo). What is not entirely clear is how many of those working on IO controller patches will actually be present at this year’s kernel summit to participate in the debate.

The latest kernel release is 2.6.31-rc7, which was released last week.

Rafael J. Wysocki followed up to his previous posting of known regression lists with individual replies to many of the outstanding kernel bugs affecting recent kernel releases. Several bugs were subsequently closed. Meanwhile, Darren Hart posted about a trace-cmd memory corruption in Steven Rostedt’s tracing utility. Finally, Eric W. Biederman reported concerns around inotify in 2.6.31-rc6 not noticing deleted files. There has been some rework in this area recently, so it’s quite possible that a regression exists in that code.

Stephen Rothwell posted a linux-next tree for August 26th. Since Tuesday, the uwb tree gained a build failure. The total subtree count remains steady at 141 sub-trees in the latest linux-next tree compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/30 Linux Kernel Podcast

September 2nd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090830.mp3

For the weekend of August 30th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Discard, IO scheduler based IO controller, and offline scheduling.

Discard. Christoph Hellwig posted a 7 part patch series implementing his latest ponderings on the best way to implement discard support. This is what happens when disk blocks are no longer in use or needed by a filesystem and can be explicitly returned to the disk as such. If the drive firmware is aware of an explicit discard event, then it can intelligently handle garbage collection and management of flash blocks on SSD devices. In Christoph’s latest patches, blkdev_issue_discard becomes a lot more generic and he mentions that he would like to see some progress on the block layer. Regular listeners may be aware of issues with fundamental ATA commands such as TRIM affecting the actual implementation of generic discard support. Christoph references this problem but still wants to get the other pieces sorted.

IO Scheduler based IO controller. Vivek Goyal posted version 9 of his IO scheduler based IO controller patches, as another RFC. These patches have been floating around for some time now, and represent just one of three competing implementations of an IO controller for Linux. They allow applications to be grouped using cgroups and assigned particular limited amounts of disk time and bandwidth resources that they may not exceed. The latest version of these patches contains a number of fixes and ever growing documentation.

Offline scheduling. Gregory Haskins, Rik van Riel, Thomas Gleixner (aka “the usual suspects”) and others continued debate over Raz Ben Yehuda’s earlier “offline” scheduler proposal. In that proposal, Linux would support offlining a CPU and rededicating it to run an exclusive task free from any of the overheads typically associated with maintaining CPU state (for example, free from any of the usual percpu kernel threads involved) and subject to certain limitations. The conversation had shifted away from that proposal though, and by the weekend folks such as Gregory Haskins were wondering whether one might instead (ab)use the nohz tick disabling code to wrap around a specific task running on a given CPU, gaining the benefits of running uninterrupted but without having to perform many other modifications to the task itself. But as others would later point out, there’s more to truly isolating CPUs than running tasks in a tickless kernel environment.

In today’s miscellaneous items: a suggestion from Peter Anvin that msr_safe functions not return a nonsensical value of -EFAULT but instead use -EIO, some early boot fixes and a conversion of PCI init to x86_init for Intel Moorestown support from Thomas Gleixner (and also some platform_setup based patches for Intel Moorestown from Jacob Jun Pan), version 3 of an asynchronous raid6 acceleration through hardware offloading patch from Dan Williams (the Intel one), a fix for kbuild to detect stack protector support when building on x86_64 systems with an IA32 target, the addition of user and system time measurements in task status files from Tatsuhiro Aoshima, some ACPI fixes from Len Brown, some minor tracing updates for 2.6.32 from Steven Rostedt, version 2 of the hardware breakpoints fixes from K. Prasad, a single wireless fix from John Linville (correcting a regression in the ipw2200 firmware loading code), a fix for rpc_task_force_reencode from Trond Myklebust, a divide-by-zero fix in the performance counters code from Peter Zijlstra (who is obviously not quite yet satisfied at finding these bugs), a patch reducing the kernel stack footprint of the firewire core kernel thread from Stefan Richter, a fix for earlyprintk=dbgp from Jan Beulich, some x86 fixes from Ingo Molnar, version 2 of the previously covered “offline state framework” allowing for a choice of the state an offlined CPU will be placed into from Gautham R Shenoy, an update to the module macro comments syncing real-world use of multiple MODULE_AUTHOR statements with inline documentation from Johannes Berg, another iteration of Mel Gorman’s patches implementing multiple free-lists in the percpu structure used by the low level page allocator (now that such allocations are dynamic), and some continuing rambling about the reliability of various kinds of block device and the failure modes thereof impacting data integrity.

In today’s announcements: Thomas Gleixner announced yet another iteration of the preempt-rt kernel patches for 2.6.31-rc7. In the rt8 release, Thomas includes some further fixes and notes ARM highmem, scheduler load balancing, and latency tracer issues as known issues. He encourages everyone to test this latest release in particular since he wants to release a .31-rt within 24 hours of Linus posting the final 2.6.31 release.

Junio C Hamano announced git version 1.6.4.2, the SCM used by the Linux kernel, which contains a number of fixes.

The latest kernel release is 2.6.31-rc8, which was released by Linus Torvalds on Thursday evening.

Eric Paris continues to have fun with inotify. Quoting Eric, “Knocking on wood failed. I actually screwed inotify up worse in -rc8 than ever before (2 new regressions including inotify never gets events!!). Sorry to everyone out there trying to use -rc8”. Eric has been firing off patches in mitigation. Several others reported issues with inotify, including Jeff Chua, who claims that the latest patches adding a terminating byte onto name_len break Samba. Meanwhile, Luis R. Rodriguez reported a number of kmemleak warnings related to ACPI, ext4, and tty use.

Stephen Rothwell posted a linux-next tree for August 28th. Since Thursday, the v4l-dvb tree lost one conflict and gained another, the sound tree gained a build failure, and the rr tree lost its conflict. The total subtree count remains steady at 141 subtrees in the latest linux-next tree compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/27 Linux Kernel Podcast

September 2nd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090827.mp3

For Thursday, August 27th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: RAID, Stable review patches, and syncing.

RAID. Pavel Machek (in a typically provocative style) replied to an email thread that had been discussing how to document potential unreliabilities in using ext2 and ext3 on certain kinds of backing block device, with the subject “raid is dangerous but that’s secret”, and a direct comparison between the kernel community and some hypothetical automotive manufacturer being aware of their ABS having flaws but deciding not to tell anyone about it anyway. He adds that he expects “slightly higher moral standards”, though one wonders whether documentation alone can ever truly guarantee that to be the case.

Stable review patches. Luis R. Rodriguez wondered aloud whether more should be done to encourage and enable users to test stable patches prior to them hitting the stable tree. Although Greg and many others do a great job with the stable updates, Luis is interested to find out whether there should be more exposure to online resources, such as the stable-review vger mailing list.

SYNC. Christoph Hellwig, Jamie Lokier, and Ulrich Drepper had a debate surrounding the O_SYNC and O_DSYNC flags one can pass to file operations. Christoph wondered aloud whether it might be possible for Linux to change the meaning of O_SYNC and start using O_DSYNC within the C library on newer kernels. Ulrich followed up saying that it is not possible to change the meaning of existing flags, which the various Open Group standards apparently don’t mandate should be handled by the underlying system call, which is free to not handle them if it would like to do so. Uli suggested adding a new sys_newopen that would fail when given unknown flags as parameters.
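For reference, this is how the two flags are passed from userspace. The sketch below is a minimal Python example, assuming a Unix system where os.O_DSYNC and os.O_SYNC are both defined; the helper names are invented here. Per POSIX, O_DSYNC makes each write wait until the data (and the metadata needed to read it back) is on stable storage, while O_SYNC additionally waits for the remaining file metadata.

```python
import os


def open_for_sync_write(path, data_only=True):
    """Open path for writing with O_DSYNC (data_only) or O_SYNC semantics."""
    sync_flag = os.O_DSYNC if data_only else os.O_SYNC
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    return os.open(path, flags, 0o600)


def write_durably(path, payload, data_only=True):
    """Write all of payload through a synchronous file descriptor."""
    fd = open_for_sync_write(path, data_only)
    try:
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, payload[written:])
        return written
    finally:
        os.close(fd)
```

Whether the kernel actually treats the two flags differently is exactly what the thread is about, so the example says nothing about the behavior of any particular kernel version.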

In today’s miscellaneous items: a fix to the perf tools to handle importing task comms (running task names) for ftrace event tracing from Frederic Weisbecker, a SCSI driver for VMware’s virtual HBA from Alok Kataria, a fix to the RTC code for certain devices, explicitly marking them as “wakeup capable” from Anton Vorontsov, some further syscall tracing patches from Frederic Weisbecker, some kmemleak patches for 2.6.32 from Catalin Marinas, version 5 of the vhost kernelspace virtio server optimization from Michael S. Tsirkin, some early CPU load-balancing rework patches from Peter Zijlstra, a patch removing bd_mount_sem from Fernando Luis Vazquez Cao (which also causes a small user-visible semantic change, returning -EBUSY if a filesystem is frozen rather than blocking the mount task in an uninterruptible state for an indeterminate time interval), some plan 9 fixes from Eric Van Hensbergen, a fix to the AFS implementation that corrects handling of symbolic links that previously resulted in crashes from David Howells, some fixes to inotify from Eric Paris (including a terminating null on filename fix), version 3 of the cpuidle patches for POWER from Arun R Bharadwaj, a few m68k fixes, a fix to task group weight ratio calculations such that they don’t result in a divide-by-zero from Peter Zijlstra, a slow-work conversion patch for libata from Jens Axboe, a patch to performance counters removing any hint of ABI preservation guarantees in trace-events from Peter Zijlstra, and some tracing fixes for s390 systems from Hendrik Brueckner.

In today’s announcements: Linux version 2.6.31-rc8. Linus Torvalds announced the release of Linux 2.6.31-rc8 on Thursday evening at 6:24pm PDT (Best Coast Time). In his announcement, Linus says that this should really be the final RC since things have “really been quieting down”. He’s right that most of the recent stuff hitting the tree has been relatively trivial, and so this should result in a 2.6.31 final release hitting the intertubes as forecast around the US Labor Day weekend holiday (when many of us will be away anyway).

The latest kernel release is 2.6.31-rc8, released this evening.

Andrew Morton posted an mm-of-the-moment for 2009-08-27-16-51 (which superseded the one he also posted at 2009-08-27-00-57).

Gregory Haskins noticed that the ftrace function-graph tracer had suddenly stopped working for him in moving from rc6 to rc7, although others could not reproduce this behavior (including Ingo Molnar, who said as much).

Stephen Rothwell posted a linux-next tree for August 27th. Since Wednesday, the uwb tree lost its build failure and the tip tree lost a conflict. The total subtree count remains steady at 141 in today’s linux-next compose.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/25 Linux Kernel Podcast

September 2nd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090825.mp3

For Tuesday, August 25th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: AlacrityVM, CFS, and Hardware Breakpoints.

AlacrityVM. Gregory Haskins posted some updated networking benchmarks for his “AlacrityVM” hypervisor, which is a fork of KVM. These were taken after a switch in the original code from get_user_pages to an explicit switch_mm and then a copy_[to|from]_user cycle at the suggestion of Michael Tsirkin. Greg says that the latest changes, “moves us ever closer to the goal of native performance under virtualization”.

CFS. Bharata B Rao posted an RFC patchset implementing CFS Hard limits. This aims to modify CFS such that it will give each task a maximum of the CPU time allocated to it, rather than dividing out available time on a fairness basis. Using this patchset, one can more easily implement pay-per-use schemes and prevent a virtual machine instance from consuming more CPU resources than planned, among other uses. The idea for hard limits was previously discussed in an email thread entitled “CPU hard limits” that Bharata had started back on June 4th. This patchset is an early implementation based upon that RFC.

Hardware Breakpoints. K. Prasad posted some updates to his previous hardware breakpoints API. The latest patches introduce per-cpu kernel-space Hardware Breakpoint requests and allow those kernel breakpoints that are defined to be modified through a new API, for use with the perf tools, that will now be able to set and modify existing hardware breakpoints.

In today’s miscellaneous items: some trivial networking fixes from David Miller, some SPARC fixes from David Miller, a new toggle option to be able to disable IMA at runtime for debugging purposes (due to an ongoing memory leak debugging exercise in Fedora) from Kyle McMartin, some syscall tracing fixes from Frederic Weisbecker (a git tree containing similar fixes – or perhaps identical – to those posted on Monday), a proposed update to console_print such that it would have a printk-like interface from Anirban Sinha, a fix from Yinghai Lu to enable x86 to use hard_smp_processor_id to get the apic id in identify_cpu, version 2 of the fixes from Jan Beulich intended to allow binutils prior to 2.17 to properly link the kernel, some ext3 updates from Jan Kara, version 2 of the O_NOSTD patchset originally posted on Monday from Eric Blake, further work on the walltime timer-source option for ftrace from Zhao Lei, some sound fixes from Takashi Iwai, an infinite loop problem report with checkpatch in post-2.6.28 parsing include/linux/inetdevice.h from Eric Dumazet, and the usual round of craziness from Ingo Molnar (fixes for core kernel, performance counters, timers, tracing, and, of course, also x86).

Finally today, Will Brown had posted a problem report surrounding epoll, claiming that it “frequently fails to notify connects at connect bursts”, to which Davide Libenzi followed up adding that multiple quasi-simultaneous events for the same fd are merged, and giving an example for the correct way to handle POLLIN using a loop to catch all the connect events.
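The pattern Davide describes can be sketched in a few lines. This is not his posted example; it is a minimal illustration, assuming a non-blocking listening socket and Python's selectors module (which uses epoll on Linux). The names accept_all and demo are made up for this sketch. The point is that one readiness notification may stand for several pending connections, so the handler keeps calling accept until the kernel reports that nothing is left.

```python
import selectors
import socket


def accept_all(listener):
    """Drain every pending connection from a non-blocking listening socket.

    A single readiness event may cover several merged connects, so keep
    accepting until accept() reports that the queue is empty.
    """
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            return accepted  # nothing left to accept for now
        accepted.append(conn)


def demo(n_clients=3):
    """Fire a small connect burst at a local listener and count the accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)

    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)

    clients = [socket.create_connection(listener.getsockname())
               for _ in range(n_clients)]

    accepted = []
    while len(accepted) < n_clients:
        events = sel.select(timeout=5)
        if not events:
            break  # timed out; give up rather than spin forever
        for key, _mask in events:
            accepted.extend(accept_all(key.fileobj))

    count = len(accepted)
    for sock in clients + accepted:
        sock.close()
    sel.unregister(listener)
    sel.close()
    listener.close()
    return count
```

A handler that calls accept only once per notification can leave connections sitting in the queue, which matters most with edge-triggered notification, where no further event arrives for connections that were already pending.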

Today’s quote of the day goes to Peter Zijlstra, who implies an obvious love for Christoph Lameter’s enthusiasm for the offline scheduler patchset in saying, “Christoph, stop being silly, this offline scheduler thing won’t happen, full stop. It’s not a maintainable solution, it doesn’t integrate with existing kernel infrastructure, and its plain ugly”.

The latest kernel release is 2.6.31-rc7, which was released last week.

Someone called “mailing54” (who identifies as “Tobias” but doesn’t bother to list an actual name, or valid whois records from which to obtain a name) reports a regression between rc6 and rc7 of the 2.6.31 kernel. The more recent kernel is apparently unable to boot on a Macbook 2,1 using Debian packages built by the Ubuntu kernel team. Meanwhile, Pawel Golaszewski posted an oops occurring on a variety of kernels after a few minutes indicating serious corruption in pick_next_task_fair which seems a bit like a local system (memory?) issue at first glance.

Rafael J. Wysocki posted a list of regressions between 2.6.29 and 2.6.30 and from 2.6.30 up to 2.6.31-rc7-git2. These show that the number of regressions between 2.6.29 and 2.6.30 has more or less leveled off (still largely affecting individual device drivers, not core kernel), and the number of regressions reported from 2.6.30 up until the current development tree is looking a lot better than it has been. The unresolved count actually fell in the latest statistics once again, standing at 26. Some of these include some pretty core kernel regressions that still need love.

Stephen Rothwell posted a linux-next tree for August 25th. Since Monday, the voltage, omap, and tip trees lost their build failures and conflicts and the total subtree count remains steady at 141 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/08/24 Linux Kernel Podcast

September 2nd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090824.mp3

For Monday, August 24th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Compcache, Git, Lazy workqueues, O_NOSTD, and races in copy_process vs. de_thread.

Compcache. Nitin Gupta posted the latest version of the compcache patches. These implement a compressed, RAM-backed block device upon which one can mount a swap device. Nitin cites this as a big win over regular swap to disk because disks are far slower than RAM (even with compression), and can be damaged by excessive swap activity (for flash drives). Even systems not using swap are improved because they can run more applications with the same RAM installed. The patchset creates a series of /dev/ramzswapX files, which can each have an additional backing swap disk device for use whenever an incompressible page is found or a memory limit for a given device is reached. Nitin includes some performance metrics and makes a variety of claims regarding improvement.
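To make the fallback behavior concrete, here is a toy userspace model of the idea. It is not the compcache code: the class name CompressedSwap, the use of zlib, and the "half a page" incompressibility threshold are all assumptions chosen for illustration. Pages that compress well are kept in RAM, while incompressible pages, or pages arriving after the memory limit is reached, go to the backing store.

```python
import zlib

PAGE_SIZE = 4096


class CompressedSwap:
    """Toy model of a compressed RAM store with an optional backing device."""

    def __init__(self, mem_limit, backing=None):
        self.mem_limit = mem_limit  # max bytes of compressed data held in RAM
        self.ram = {}               # slot -> compressed page
        self.backing = backing if backing is not None else {}  # slot -> raw page
        self.ram_used = 0

    def discard(self, slot):
        """Forget whatever is stored in this slot, in RAM or on backing."""
        packed = self.ram.pop(slot, None)
        if packed is not None:
            self.ram_used -= len(packed)
        self.backing.pop(slot, None)

    def write(self, slot, page):
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one page of data")
        self.discard(slot)  # overwriting a slot releases its old contents
        packed = zlib.compress(page)
        # Hypothetical threshold: treat a page as incompressible if it does
        # not shrink to less than half a page.
        incompressible = len(packed) >= PAGE_SIZE // 2
        over_limit = self.ram_used + len(packed) > self.mem_limit
        if incompressible or over_limit:
            self.backing[slot] = page  # fall back to the backing device
        else:
            self.ram[slot] = packed
            self.ram_used += len(packed)

    def read(self, slot):
        if slot in self.ram:
            return zlib.decompress(self.ram[slot])
        return self.backing[slot]
```

In this model a page of zeros stays in RAM as a few dozen bytes, whereas a page of random data lands in the backing dictionary unchanged.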

Git. Peter Zijlstra brought up the fact that git send-email and similar commands don’t always do the right thing (especially with threading) by default without some custom configuration. He had asked before and wondered why Junio C Hamano hadn’t yet effected any changes to the defaults. Junio replied that he previously thought that nobody was objecting to the existing behavior, but that a forthcoming 1.7.0 version of the git tools will make various changes to the defaults for those just tuning in.

Lazy workqueues. Jens Axboe posted version 2 of his lazy workqueues patchset. The patchset aims to reduce the dramatic number of kernel threads created on modern (multi-processor) Linux systems by selectively creating kernel threads on demand, as they are needed to handle work, rather than registering a workqueue per processor that sits mostly idle. These threads then stick around until a configurable period of idleness has elapsed, before being reaped. It’s like having the existing kernel threads, but they are only ever fully created if they are going to be used. Linux Weekly News has a more detailed writeup.
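The create-on-demand, reap-when-idle lifecycle can be modeled in userspace. The sketch below is only an analogy using Python threads, not the kernel patchset; LazyWorker, idle_timeout, and spawn_count are names invented for this illustration. No thread exists until work is submitted, and the thread exits once it has seen no work for the configured idle period.

```python
import queue
import threading
import traceback


class LazyWorker:
    """Worker thread that is created on first use and exits after idling."""

    def __init__(self, idle_timeout=0.2):
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._thread = None          # no thread until work arrives
        self._idle_timeout = idle_timeout
        self.spawn_count = 0         # how many times a thread was created

    def submit(self, fn, *args):
        with self._lock:
            self._queue.put((fn, args))
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self.spawn_count += 1
                self._thread.start()

    def _run(self):
        while True:
            try:
                fn, args = self._queue.get(timeout=self._idle_timeout)
            except queue.Empty:
                # Idle period elapsed. Re-check under the lock so that a
                # concurrent submit() is never left without a worker.
                with self._lock:
                    if self._queue.empty():
                        self._thread = None
                        return
                continue
            try:
                fn(*args)
            except Exception:
                traceback.print_exc()  # keep the worker alive on task errors
```

Submitting two tasks back to back uses one thread; submitting again after the idle period has passed creates a fresh one, which is the trade the patchset makes between thread count and wakeup latency.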

O_NOSTD. Eric Blake, fed up with the legacy behavior of e.g. dup, open, pipe2, and so forth (which will always assign the lowest available file descriptor, even if it is one of the special 0-2 range), posted a patch adding a new O_NOSTD flag that will cause newly opened file descriptors not to have values lower than three. This means that, if a task intentionally closes its file descriptors prior to forking, the child can insist that those file descriptors not be immediately recycled without having to sanity check the values of every descriptor returned. Linux Weekly News has a more detailed writeup on this issue also, so refer to that for further detail.
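For context, the sanity check that applications have to do today, without any such flag, looks roughly like the following. This is a sketch of the existing workaround using the standard F_DUPFD fcntl command, not of Eric's patch; the helper names dup_at_least and open_above_std are made up for this example.

```python
import fcntl
import os


def dup_at_least(fd, min_fd=3):
    """Return a descriptor >= min_fd that refers to the same open file as fd.

    If fd is already high enough it is returned unchanged; otherwise it is
    duplicated to the lowest free descriptor >= min_fd and the original is
    closed.
    """
    if fd >= min_fd:
        return fd
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, min_fd)
    os.close(fd)
    return new_fd


def open_above_std(path, flags=os.O_RDONLY):
    """Open path, making sure the result is not one of descriptors 0-2."""
    return dup_at_least(os.open(path, flags))
```

Every call site that opens a descriptor needs this extra step, which is the boilerplate a flag on the opening call itself would remove.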

Races in copy_process vs. de_thread. Hiroshi Shimamoto noticed that copy_process uses signal->count as a reference counter, but it is not in fact a protected reference count. If copy_process(CLONE_THREAD) races with de_thread() then this count logic in signal->notify_count breaks, with the execing thread hung forever in kernel space. The more general problem was discovered when debugging some issues surrounding the GFS code locking up.

In today’s miscellaneous items: some lockdep fixes for reiserfs (related to killing BKL usage) from Frederic Weisbecker, a fix for an infinite loop regression in the NFSv4 implementation from Trond Myklebust, some syscall tracing fixes (moving callbacks for syscall tracepoints to the definition site and adding generic TRACE_EVENTs which capture syscall arguments) from Josh Stone, a conversion on x86/x86-64 to define and use NR_syscalls, especially in syscall event tracing from Jason Baron, some OCFS2 fixes for 2.6.31-rc7 (including a quota file corruption bugfix, and a fix for an oops on failed mounts) from Joel Becker, some inotify and idr race fixes from Eric Paris, some temporary (until a pending rework is actually completed) RCU fixes for notrace and hotplug CPU from Paul McKenney, some error and documentation fixes for ext3 from Jan Kara, some tracing documentation updates to tracepoint-analysis.txt from Mel Gorman, some ARM subarch submissions for the forthcoming merge window from Kevin Hilman, a fix for inverted handling of SNAT netfilter targets in 2.6.30 from Maximilian Engelhardt, some documentation updates for ext2/ext3 on block devices where reliable operation may not be guaranteed from Pavel Machek, some HTC Dream updates from Pavel Machek, some AVR32 updates (fixing unaligned use of memcpy) for 2.6.31 from Haavard Skinnemoen, ongoing discussion of build fixes for x86 building with older binutils releases including Peter Anvin, some input updates from Dmitry Torokhov (who also noted some potential issues with cancel_delayed_work()’s use of del_timer_sync), some networking fixes from David Miller, a fix from Li Zefan that handles a race within the tracing code when a module is unloaded before ftrace_profile_disable() is called, and Ryo Tsuruta posted version 11 of his blkio-cgroup patchset.

The latest kernel release is 2.6.31-rc7, which was released last week.

Andrew Morton released an mm-of-the-moment for 2009-08-24-16-24. It contains a number of patches against 2.6.31-rc7.

Ravikiran G Thirumalai discovered a 2.6.31-rc7 regression on vSMPowered systems due to a buggy flat_phys_pkg_id as the scheduling domains appear to build incorrectly. He found the faulty commit and provided a three line fix.

Harald Dunkel reported a crash in reiserfs mounted over NFS on 2.6.30.5 under a Linux-HA clustering stress test, for which he posted a backtrace.

Stephen Rothwell posted a linux-next tree for August 24th. Since Friday, a new kconfig tree was added (which might be the source of the following issue that Steven Rostedt reported), the configfs, nfs, and suspend trees lost conflicts and build failures, and the kconfig, rr, voltage, and tip trees gained conflicts and/or build failures for which Stephen took evasive action. The total number of subtrees now increases to 141 due to the new kconfig tree.

Steven Rostedt discovered a problem building on the linux-next master branch in the module stage 2 due to crc16 not being available in net/bluetooth/l2cap.ko, which suggests config dependency issues.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags: