2010/03/14 Linux Kernel Podcast
Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100314.mp3
For the weekend of March 14th 2010, I’m Jon Masters with a summary of the week’s LKML traffic.
In today’s issue: The 2.6.34 merge window, anonymous inodes, ATA 4KiB sector issues, cpuhogs, ext4, PCI, and USB console support.
The 2.6.34-rc1 merge window. Linus Torvalds announced the release of the first 2.6.34 RC kernel on Monday, March 8th 2010 at 12:33pm Best Coast Time (PST). In closing the merge window early, he hoped to make a point, in line with his previous comments, about submitting merge requests in a timely fashion. Quoting Linus, “but in general the merge window is over. And as promised, if you left your pull request to the last day of a two-week window, you’re now going to have to wait for the 2.6.35 window.” According to Linus, nearly two thirds of the changes are in drivers (when factoring in 50% drivers/ code, 5% sound/ code, and 10% firmware). Of the remaining bits, about half is architectural and the rest is, well, the rest. So far, about 850 developers are involved. Linus again referred to his Fedora Nouveau rant, ending with a reminder of the need to upgrade libdrm/nouveau_drv versions if using that driver.
Several architecture maintainers gave their excuses and requested pulls later, but Linus drew the line at a request from James Bottomley to pull SCSI pieces two days later, on March 10th. James noted that he had been en route back from India, nobody had told him the merge window would close early, and that the only commit added to his tree since the merge window closed on Monday was a bug fix. Linus said he was “not going to pull” and that the whole point of closing the merge window early was to deal with people posting pull requests late that “should have been ready when the merge window _opened_”. James objected to the unpredictability of the merge window closing, but Linus said that “WAS THE WHOLE F*CKING POINT!”, in order to avoid last minute pull requests, and added that in future he would not even say how long the merge window was going to be, so that requests would be ready the moment the window opened. Unfortunately for James, Linus wanted to make a point and he seemed to meet Linus’ criteria for doing so. Doug Gilbert later pointed out that people should not attack James just because he was the subject of “yet another Linus rant”.
Anonymous inodes. Dmitry Torokhov recently started a thread entitled “S[E]Linux going crazy in 2.6.34-rc0” (but note the corrected capitalization of “SELinux”). He was experiencing a side effect of some recent work by Al Viro, as well as others, to switch various subsystems such as inotify over to use anon inodes rather than their own “filesystem” type. Previously, inotify had used its own filesystem called simply and obviously “inotifyfs”. This allowed SELinux rules to match various notification events against an “inotify_t” filesystem type. But with the trend to convert to anonymous inodes, there is no longer an easy way to write SELinux rules to confine applications (if that is what you actually want to do), and the existing rules go insane, as this author recently saw on a rawhide system that happened to be running SELinux. Eric Paris proposed two kinds of workaround – type a and type b – either “revert” everything back to how it used to be, or create support for differing security contexts for anonymous inodes. The latter seems more likely to happen, though the thread dried up at that point and nothing further was said on the topic until Eric Paris sent a pull request for some notify bits a week later.
ATA 4 KiB sector issues. Tejun Heo started a new thread entitled “ATA 4 KiB sector issues”, in which he lamented the current state of support for larger sector size ATA devices (those using 4 KiB rather than 512 bytes as their natural unit of size – someone please add a comment to this article with a description for the term used to describe the natural size of a disk, its “word size”). Apparently, the transition will be “quite painful”. In his lengthy email, the gist of which is covered by an article on the kernel.org wiki at: http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues, Tejun covers the issue of backwards compatibility, DOS partition table support, and that beast of beasts – Windows. Interestingly, I didn’t see a specific mention of the issue of unaligned writes when using journalled filesystems and ensuring commits have hit the disk, but I’m sure that’s covered somewhere in there. I suspect this is now required reading if you work on disk and block bits. James Bottomley added some useful notes about the lack of bootloader support, etc.
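To make the DOS partition table problem a little more concrete, here is a minimal sketch of the alignment arithmetic. It assumes a drive that presents 512-byte logical sectors on top of 4 KiB physical sectors with no additional alignment offset, and the function names are hypothetical, chosen only for this illustration. The traditional DOS layout starts the first partition at LBA 63, which does not fall on a 4 KiB boundary.

```python
# Illustrative only: assumes 512-byte logical sectors on 4 KiB physical
# sectors, with logical sector 0 aligned to the start of a physical sector.
LOGICAL_SECTOR = 512     # bytes per sector as addressed by the host
PHYSICAL_SECTOR = 4096   # bytes per sector as written by the drive


def is_aligned(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return True if a partition starting at start_lba begins on a
    physical sector boundary."""
    return (start_lba * logical) % physical == 0


def next_aligned_lba(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Round start_lba up to the next LBA on a physical sector boundary."""
    per_physical = physical // logical          # 8 logical sectors per physical
    return -(-start_lba // per_physical) * per_physical  # ceiling division
```

With these assumptions, is_aligned(63) is false while is_aligned(2048) is true, and next_aligned_lba(63) gives 64. A filesystem block written to a misaligned partition can straddle two physical sectors, forcing the drive to read, modify, and rewrite both.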
CPU Hogs. Tejun Heo posted a patchset intended to generalize the case of monopolizing a CPU (or a set of CPUs) with a single kernel thread. The cpuhog functionality can be used by any kernel code that needs to grab one or more CPUs exclusively for some period of time, such as [k]stop_machine, which does just this during module load in order to ensure that it is safe to fiddle with the kernel symbol table. For good measure, Tejun also fixes the kernel migration threads to use cpuhog while he’s at it. LWN had a writeup on this topic later, and your author already has a pet project in mind that should benefit from using this patchset. Thanks Tejun Heo!
ext4. Christian Borntraeger posted asking about e4defrag support for compatible ioctls (as is the case on his system, with a 64-bit x86_64 kernel and 32-bit IA32 userspace environment). He suggested, “[l]et[']s just wire up EXT4_IOC_MOVE_EXT for the compat case.” This led Jeff Garzik to wonder aloud what the overall status was of ext4 defragmentation support. Jeff noted that he had actually poked at defragmentation support himself in the past and was “hopeful that I will see defragging in a Linux distribution sometime in my lifetime”. Eric Sandeen noted that such support had previously been in Fedora (briefly) but was removed because he (Eric) wasn’t so happy with the code. Since I happen to know Jeff has a good many years ahead of him, one hopes that he will get to see many great things, including ext4 defragmentation. Separately, Michael Tokarev pointed out another 32-bit userspace on 64-bit kernel issue with compatible ioctls, this time affecting AIO. Jeff Moyer was on the case with an initial test patch that he could use successfully with the libaio test harness built with -m32 while he continues to work in general on further AIO cleanups for the longer term.
PCI. Alex Chiang posted an updated patch based upon some awesome work that Matthew Wilcox had done to provide sysfs PCI slot to device mapping directory entries that can be used to determine which physical slot a device is actually installed in within the chassis of a given system. This will be of use to a number of projects, including efforts to name network interfaces according to the slot they reside in (rather than their MAC address) for distributions needing to support single system images – at least, that’s one possibility that comes to mind. I have pinged a few people myself to see if this will be of use to that effort in general, and there are bound to be many more.
USB Console. Jason Wessel posted a 6 part patch series entitled “usb console improvements series”, containing “aggregated and ported…usb patches I have previously posted which are not mainlined into a single series aimed at providing a stable [USB] console”. Jason began with a recap about what the problem with USB consoles currently is – that they are not synchronous (as opposed to regular serial UART consoles, which are) and so will drop data on the floor if there is no room to buffer it when interrupts are disabled. The new code introduces intentional delay loops calculated through empirical testing using an FTDI USB part (a common part on many embedded boards, such as the BeagleBoard JTAG debugger sitting on this author’s desk).
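The difference between the two console behaviours can be shown with a toy model. This is only a sketch and not the code from Jason’s series: the ToyConsole class, its fixed-size buffer, and the drain() step standing in for the hardware sending bytes are all assumptions made up for this illustration.

```python
from collections import deque


class ToyConsole:
    """Hypothetical console with a bounded transmit buffer."""

    def __init__(self, capacity):
        self.capacity = capacity   # bytes the transmit buffer can hold
        self.fifo = deque()        # bytes waiting to be transmitted
        self.sent = bytearray()    # bytes that reached the wire
        self.dropped = 0           # bytes lost because the buffer was full

    def drain(self, count=1):
        """Stand-in for the hardware transmitting up to count bytes."""
        for _ in range(min(count, len(self.fifo))):
            self.sent.append(self.fifo.popleft())

    def write_async(self, data):
        """Asynchronous write: never waits, so overflow is dropped."""
        for byte in data:
            if len(self.fifo) < self.capacity:
                self.fifo.append(byte)
            else:
                self.dropped += 1

    def write_sync(self, data):
        """Synchronous write: waits for room, then flushes everything."""
        for byte in data:
            while len(self.fifo) >= self.capacity:
                self.drain()       # in a real driver: poll or delay
            self.fifo.append(byte)
        self.drain(len(self.fifo))
```

Writing 12 bytes to a ToyConsole(8) with write_async() and no draining in between loses the last 4 bytes, whereas write_sync() delivers all 12 at the cost of waiting in the write path.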
In today’s miscellaneous items:
* some early dev_name() patches from Paul Mundt allowing early platform device code to use dev_name() before the guts of the driver core are online.
* This author was bitten by a recent bad commit from Al Viro that caused opendir() to succeed on regular files. I posted a question about it and was told that it had already been fixed. Indeed, it had.
* Debate continued about reducing the number of memory allocators in use on x86 systems, per a previous note from Ingo that there were 5 possibilities depending upon phase of boot and this needed to be reconciled.
* A rant from Finn Thain about a “coding style” fix patch for Macintosh that reduced a comment length to fit in 80 characters. Finn thought this was an utter waste of time, and repeated a comment often heard elsewhere, “checkpatch.pl is great but code that fails it is NOT always wrong.” and, “‘Check patch’ is a good idea but ‘check existing code’ is a waste of everyone’s time.” Sometimes, cleanup patches do more harm than good; for example, a well intentioned “if” cleanup this week completely misunderstood how the indentation is supposed to work and was also summarily rejected. Ben Herrenschmidt’s only response to this mini-rant was “Amen !”.
* Mitake Hitoshi concurred with Guangrong Xiao’s posted results showing an *improvement* in performance of userspace mutexes when lock trace events were enabled. Reproducer code was posted and confirmed.
* Some useful documentation was provided on Linux’s circular buffering and memory barriers support from David Howells.
* Johannes Berg proposed marking, in the environment variables of a kernel-emitted uevent, whether the event resulted from a request_firmware() or a request_firmware_nowait() request (to handle the case of built-in drivers requesting firmware not in an initramfs). Kay Sievers pointed out that many events are re-triggered during boot and so the firmware loader cannot know what state the system is in, and therefore it might be better to leave requests for unsatisfiable firmware around “forever” until they are cancelled from userspace rather than trying to cunningly work around the issue of firmware not being present in an initrd context with special uevent environment variables.
* and the jabs at SELinux security labeling continued with Al Viro coming up with a few amusing retorts in the “Upstream first policy” thread and Ingo Molnar comparing SELinux relabeling wait times to fire doors, “we should prefer a one inch thick fire door that opens and closes fully automated to a five inches thick fire door that people keep always-open with a chair”. Ingo contends that all too often, people “turn off the whole thing” because of various frustrations and so there is less overall security than might be the case with a slightly less perfect system. Dave Airlie called SELinux relabels “the new fsck” and called for journalling.
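As an aside on the opendir() item above, the expected behaviour is easy to check from userspace. The following is a small sketch rather than the reproducer I actually used; the helper name is hypothetical, and it relies on Python’s os.listdir() raising NotADirectoryError when the underlying call fails with ENOTDIR.

```python
import os
import tempfile


def listing_regular_file_fails():
    """Return True if opening a regular file as a directory is refused."""
    with tempfile.NamedTemporaryFile() as regular_file:
        try:
            os.listdir(regular_file.name)
        except NotADirectoryError:
            return True    # correct: the kernel reported ENOTDIR
        return False       # wrong: a regular file was listed as a directory
```

On a correctly behaving kernel, listing_regular_file_fails() returns True.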
In today’s announcements:
Benchmarks. Anca Emanuel announced some new Phoronix benchmarks for kernels 2.6.24 through 2.6.33, showing that performance has generally improved by 770% from 2.6.29 to 2.6.30 and only regressed very slightly in 2.6.32. Regretfully, however, 2.6.33 does not perform nearly so well, and, according to the Phoronix quote, “PostgreSQL performance atop the EXT3 file-system has falled off a cliff”. Full details are available on the http://www.phoronix.com/ website.
RT 2.6.33-rt6. Thomas Gleixner announced the release of version 2.6.33-rt6 of the RT patchset that he and others are continuing to develop against the 2.6.33 series kernel. As he mentions, there was an -rt5, but it was more of a separation point in the git tree. With the merging of some bits into that older tag, MIPS support rejoins the RT tree thanks to Wu Zhangjin. As usual, the RT patch is available on the kernel.org website, in the section devoted to such projects, or in the head (rt/head) and stable (rt/2.6.33) branches of the “tip” tree maintained by Ingo Molnar. Details: http://www.kernel.org/pub/linux/kernel/projects/rt/
The latest kernel release is 2.6.34-rc1.
Andrew Morton posted an mm-of-the-moment (mmotm) for 2010-03-09-19-15. Hiroyuki Kamezawa posted an updated version of his OOM notifier memory cgroup patches against this latest tree. Andrew later posted an mmotm for 2010-03-11-13-13. And in other “mm” news, Mel Gorman posted the 4th version of his “memory compaction” patches.
Greg Kroah-Hartman posted some review patches for stable kernels 2.6.33.1, and for 2.6.32.10. These were subsequently released.
Finally today, Robert P. J. Day asked whether it was still worth him running his “cleanup” scripts (that look for problems with kernel config options) after each merge window closes. Randy Dunlap thought “yes”, and was even more happy that Robert had posted his scripts for him and others to use. Details: http://www.crashcourse.ca/wiki/index.php/Kernel_cleanup_scripts Robert followed up later with another email saying that most of his popular cleanup scripts have now been posted, which is great.
That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.

