2009/09/01 Linux Kernel Podcast

September 15th, 2009 jcm

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090901.mp3

For Tuesday, September 1st, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: CFQ, Flexible arrays, IO controllers, kthreads, KVM, NOHZ, and POSIX.

CFQ. Jeff Moyer responded to a bug posting against 2.6.30 in which it had been discovered that the CFQ IO scheduler could (under certain circumstances) skip over incoming requests (mostly those issued out of order) and dramatically diminish the performance of, for example, packet writing to a DVD. Jeff’s patch causes a new next_req to be chosen in cfq_dispatch_insert so that there will always be a request to handle if there are some left in the queue. The results attached to his patch speak for themselves.

Flexible arrays. David Rientjes posted some updates to his “flex arrays”, changing the way that static definitions are done because the existing implementation of FLEX_ARRAY_INIT had no way to determine whether its parameters were valid (since it simply served as a struct initializer). Instead, the new DEFINE_FLEX_ARRAY interface (which can be prefixed with ‘static’ for file scoping purposes) performs checks on its parameters, which include a new “name” parameter specifying the name of the resultant structure that will be defined by the macro call.

IO controllers. On another IO related note, Vivek Goyal posted an update in regard to dm-ioband testing. He took one 40GB SATA drive (without hardware queueing) and created two partitions on the disk, to each of which he associated a new ioband device, at weight 200 and 100 respectively. Vivek assumed that this would result in the first device seeing double the IO bandwidth of the second, but this is not what happened in practice. He attached the scripts that he used to generate the tests and requested clarification from Ryo Tsuruta.
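The expectation behind that test can be written down as simple arithmetic. The sketch below assumes that bandwidth is meant to be shared strictly in proportion to weight; the device names are hypothetical and nothing here calls dm-ioband itself.

```python
def expected_shares(weights):
    """Return each device's expected fraction of total IO bandwidth,
    assuming bandwidth is divided strictly in proportion to weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical device names; the weights are the ones quoted in the post.
shares = expected_shares({"ioband1": 200, "ioband2": 100})
ratio = shares["ioband1"] / shares["ioband2"]

for name, share in shares.items():
    print(f"{name}: {share:.1%} of total bandwidth")
print(f"expected ratio: {ratio:.1f}x")
```

Under that assumption the first device should see two thirds of the bandwidth and the second one third, which is the 2:1 split the measurements did not show.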

Kthreads. Ingo Molnar noticed a synchronization problem at boot time involving kthreads in which there appears to be a race between the initial task (which becomes the idle thread of CPU0) and the init task (which, as he points out, is not the same as the initial task). Although the BKL protects the interaction between these two tasks, little protects which will run first, and there is a possibility that init might run sooner than rest_init, with a resultant ksoftirqd creation failing due to a NULL kthreadd_task. Ingo adds a completion variable to avoid this situation and tags the patch for -stable.
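As a rough user-space analogy only (this is not the kernel patch), the idea of a completion can be sketched with Python's threading.Event. The function and variable names below are illustrative assumptions: one thread stands in for init, and it blocks until the thread that spawned it has published kthreadd_task.

```python
import threading

kthreadd_task = None                # stands in for the kernel's kthreadd_task pointer
kthreadd_done = threading.Event()   # plays the role of the completion variable
result = {}


def init_task():
    # Without this wait, init could run before kthreadd_task is set
    # and observe None (the NULL pointer in the real bug).
    kthreadd_done.wait()
    result["saw_kthreadd"] = kthreadd_task is not None


def rest_init():
    global kthreadd_task
    init = threading.Thread(target=init_task)
    init.start()                    # init may be scheduled immediately
    kthreadd_task = "kthreadd"      # publish the task only after init exists
    kthreadd_done.set()             # signal the completion; init may now proceed
    init.join()


rest_init()
print(result)
```

The ordering of the two threads is still left to the scheduler; the event only guarantees that the dependent step cannot run until the value it needs has been published.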

KVM. Glauber Costa, in likely earning himself a few beers, posted two patches that introduce a worker thread fired by kvmclock that will update the guest wallclock time periodically to be in sync with the host’s wallclock. This allows system administrators to set only the host wallclock time and avoid having to run NTP within guest VMs to deal with changes in time.

NOHZ. Josh Triplett posted in regard to the tickless kernel and the reality that the kernel is only truly tickless (running without a timer interrupt) when it is running only the idle task (at other times, the system will still be interrupted every 1/HZ seconds for a timer interrupt). Josh points out that on a system largely doing number crunching, these interrupts can add up to something quite unpleasant – as much as an 8% overhead in his case. With a simple sledgehammer approach, Josh posts a patch that forces the kernel to remain tickless all of the time. The patch as it stands breaks RCU, process accounting, POSIX CPU timers, and other things, but he wants to encourage discussion and debate about the best way forward for development.
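To get a feel for how a per-tick cost scales with HZ, here is a back-of-the-envelope sketch. The HZ value and the per-tick cost are hypothetical inputs chosen for illustration; they are not Josh's measurements, and the model ignores indirect costs such as cache disruption.

```python
def tick_overhead(hz, cost_per_tick_s):
    """Fraction of each CPU-second spent servicing timer ticks,
    assuming every tick costs the same fixed amount of time."""
    return hz * cost_per_tick_s


HZ = 250                 # hypothetical timer frequency (ticks per second)
COST_PER_TICK = 20e-6    # hypothetical cost of one tick, in seconds

overhead = tick_overhead(HZ, COST_PER_TICK)
print(f"{HZ} ticks/s at {COST_PER_TICK * 1e6:.0f} us each: {overhead:.2%} of CPU time")
```

Because the overhead is linear in both inputs, doubling HZ or doubling the cost of each tick doubles the time lost.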

POSIX. Jim Meyering noticed that, for a mount point with a filesystem mounted on it, getdents and readdir return a dirent.d_ino that differs from the st_ino inode number reported by stat. He claims this violates POSIX 2008, and it caused him to disable an optimization in coreutils ‘ls -i’. He attaches a snippet of the recent POSIX specification and hopes that “Linux can catch up before too long”, since the only system currently taking advantage of strict compliance seems to be (somewhat ironically) Cygwin.
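One way to look for this discrepancy from user space is sketched below. It assumes a Unix system, where os.scandir exposes the directory entry's inode number through DirEntry.inode(); the helper name inode_mismatches is made up for this example.

```python
import os


def inode_mismatches(directory):
    """List (name, d_ino, st_ino) for entries whose directory-entry inode
    differs from the inode reported by lstat on the same path."""
    mismatches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            d_ino = entry.inode()                  # inode from the directory entry
            st_ino = os.lstat(entry.path).st_ino   # inode from a stat-family call
            if d_ino != st_ino:
                mismatches.append((entry.name, d_ino, st_ino))
    return mismatches


if __name__ == "__main__":
    # Scanning the root directory is just an example; any directory that
    # contains mount points is a reasonable place to look.
    for name, d_ino, st_ino in inode_mismatches("/"):
        print(f"{name}: d_ino={d_ino} st_ino={st_ino}")
```

On a system showing the behavior described above, entries that are mount points would be the ones expected to appear in the output.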

In today’s miscellaneous items: a correction to the documentation in Documentation/numastat.txt from Minchan Kim, new sysfs ALS (Ambient Light Sensor) patches from Zhang Rui, version 2 of a patch adding support for KPF_KSM page type recognition to the page-types utility from Fengguang Wu, version 2 of his load-balancing and cpu_power patches from Peter Zijlstra, version 16 of the per-bdi writeback flusher threads patches from Jens Axboe, a patch removing an explicit assumption of the presence of cpu0 in the percpu code from Tejun Heo (especially useful on SPARC systems; this patch was later requested as part of a pull request sent out by Tejun), a patch allowing max_sectors_kb to exceed the default of 512 from Nikanth Karthikesan, a fix to avoid dangling blocks not used during a write operation on reiserfs from Jan Kara, a simple nilfs2 bugfix pull request from Ryusuke Konishi, a fix from Jory A. Pratt to ensure GCC flags don’t get squashed in the Makefiles, a new version of a fix to vmscan that moves pgdeactivation modification to shrink_active_list from Hugh Dickins, a fix for the anti-fragmentation patches from Mel Gorman that will once again unbreak nommu, and the addition of some XFS compatibility ioctls as well as an XFS pull request containing those from Felix Blyakher. Xiaohui Xin posted a detailed RFC for Virtual Machine Device Queues (VMDq) support on KVM for which there was not room in this episode; look for that in a later edition.

Finally today, Roland Dreier and David Miller discussed the setup of the new linux-rdma@vger.kernel.org mailing list and how it can be advertised, archived, and generally advocated as the new list for RDMA topics.

The latest kernel release was 2.6.31-rc8.

Stephen Rothwell posted a linux-next tree for September 1st. Since Monday, the pxa, xfs, i2c, and dwmw2-iommu trees lost their conflicts and build failures, while the pci, acpi, and block trees gained failures for which Stephen mostly used other versions as necessary. The total subtree count remains steady at 141 trees in the latest compose.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.
