
2010/02/21 Linux Kernel Podcast

March 15th, 2010 jcm

Audio: http://media.libsyn.com/medi/jcm/linux_kernel_podcast_20100221.mp3

For the weekend of February 21st, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In today’s issue: AMD TSC, anon_inode flags, extents, LSI MegaRAID, md RAID, SSE, UML, and XZ.

AMD TSC. Mark Langsdorf (AMD) posted a patch entitled “Option to synchronize P-states for AMD family 0xf”, in which he reminded readers that AMD Family 0xf processors (that is, AMD Athlon 64s and AMD Opterons) do not have P-State and C-State invariant TSCs. That is to say, the TSC increments at the current frequency of the CPU core, and not at some fixed frequency that would be more useful to those using it as a timing source. It is nonetheless possible to scale the TSC readings for use as a time source, if all CPUs in the system adjust their frequency at the same time and by the same amount. To do this, Mark modifies the PowerNow! driver with a new “tscsync” parameter. He reminds us that there are many other possible clock sources in a system, but in some situations customers want something particularly lightweight, like the TSC.
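To make the scaling idea concrete, here is a minimal sketch of the arithmetic only. It is not taken from Mark's patch or from the PowerNow! driver: the function name, the segment layout, and the frequencies are all hypothetical, and it simply assumes we know how many TSC ticks elapsed at each core frequency.

```python
def tsc_to_nanoseconds(segments):
    """Convert TSC tick counts into elapsed nanoseconds.

    `segments` is a hypothetical list of (tsc_delta, freq_hz) pairs: the
    number of ticks counted while the core ran at freq_hz. On a CPU whose
    TSC is not P-state invariant, the tick rate follows the core frequency,
    so each segment has to be scaled by its own frequency.
    """
    total_ns = 0
    for tsc_delta, freq_hz in segments:
        # ticks / (ticks per second) = seconds; keep integer nanoseconds
        total_ns += tsc_delta * 1_000_000_000 // freq_hz
    return total_ns


# Illustrative numbers only: 2,000,000 ticks at 2 GHz, then 1,000,000 at 1 GHz.
segments = [(2_000_000, 2_000_000_000), (1_000_000, 1_000_000_000)]
print(tsc_to_nanoseconds(segments))
```

Each segment here covers one millisecond, so the total is 2,000,000 ns even though the raw tick counts differ. The sketch also hints at why synchronized frequency changes matter: if every core switches together, one shared list of segments describes them all.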

anon_inode flags. Matt Helsley noted that existing anon_inode interfaces often do not support flags that can be set by using fcntl(). He proposed a series of 4 patches to signalfd, timerfd, epoll, and eventfd that would allow the same flag behavior as their corresponding creation syscalls. Davide Libenzi, the original author of the anon_inode bits, signed off.

Extents. Jari Sundell reported an issue with sparse files on ext4 in which many extents were not merged by the filesystem even though they were placed sequentially on disk. This manifested as 3000 or more extents for a 250MB bittorrent download file (aside: bittorrent pulls many file pieces at once from many different sources and so relies heavily on sparse files).
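For readers unfamiliar with extent merging, the following sketch shows what "merged" means here. It is a simplified model, not ext4's actual logic: the `Extent` type and `merge_extents` function are hypothetical, and real filesystems apply further limits (such as a maximum extent length) that are ignored.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical: int   # first logical block within the file
    physical: int  # first physical block on disk
    length: int    # number of blocks covered


def merge_extents(extents: List[Extent]) -> List[Extent]:
    """Coalesce extents that are adjacent both in the file and on disk."""
    merged: List[Extent] = []
    for ext in sorted(extents, key=lambda e: e.logical):
        if merged:
            last = merged[-1]
            contiguous_in_file = last.logical + last.length == ext.logical
            contiguous_on_disk = last.physical + last.length == ext.physical
            if contiguous_in_file and contiguous_on_disk:
                # Extend the previous extent instead of keeping a new one.
                merged[-1] = Extent(last.logical, last.physical,
                                    last.length + ext.length)
                continue
        merged.append(ext)
    return merged


# Two pieces written back to back on disk, plus one placed elsewhere.
pieces = [Extent(0, 100, 4), Extent(4, 104, 4), Extent(8, 500, 2)]
print(merge_extents(pieces))
```

In this model the first two pieces collapse into a single eight-block extent, while the third stays separate because its physical location does not follow on from the others.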

MegaRAID. LSI posted to let everyone know that they were interested in an overhaul of the MegaRAID driver to support future HBAs. Rather than make a lot of changes to the existing code, they were interested in, and were encouraged to create, a new driver for the newer parts. Matthew Wilcox may have detected a hint of the reasoning behind why they had been a little resistant to giving up a single heavily hacked driver, and suggested an approach that could be used to “make your management happy”: effectively combining two drivers into a single object file, with two separate sets of PCI tables being handled and different functions within. Whatever the eventual decision, the thread ended there with no followup.

md. Justin Piszcz started a discussion thread entitled “Linux mdadm superblock question”, in which he asked about RAID superblock types. The older version 0.90 superblock format supports autoassemble within the kernel, whereby the kernel can automatically create the appropriate RAID device without having to use tools within an initrd/initramfs (in that case an initramfs is not required; otherwise you need one if you want to use RAID). Justin wanted to know whether there were any benefits for a < 2TB RAID1 boot volume in moving to a higher versioned superblock without autoassemble support.

The conversation led Peter Anvin to point out some issues with a recent change in mdadm, which now apparently creates version 1.1 superblocks by default. Peter noted that the 0.90 superblock format doesn’t make it possible to easily distinguish RAID partitions from whole volume RAID devices, but the problem with migrating to 1.1 is that 1.1 uses the bootblock for its superblock, and so can cause problems with bootloaders such as grub that result in people having to regenerate their entire disk if they want to easily boot with it. Version 1.2 of the md RAID superblock uses the same format as 1.1 but at a different location than the bootblock, and so Peter favors using 1.0 or 1.2, but not 1.1, as the mdadm default.
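The placement difference can be sketched in a few lines. The offsets below follow the md(4) manual page description (0.90 and 1.0 near the end of the device, 1.1 at the start, 1.2 4K from the start). The exact end-of-device arithmetic is an assumption on my part, and the function names are hypothetical, so treat this as an illustration rather than a reference.

```python
SECTOR = 512  # bytes


def superblock_offset(version: str, device_bytes: int) -> int:
    """Approximate byte offset of the md superblock for a given format."""
    sectors = device_bytes // SECTOR
    if version == "0.90":
        # Assumed: a 64 KiB aligned block, at least 64 KiB from the end.
        reserved = 128  # 64 KiB expressed in sectors
        return ((sectors & ~(reserved - 1)) - reserved) * SECTOR
    if version == "1.0":
        # Assumed: at least 8 KiB from the end, aligned down to 4 KiB.
        return ((sectors - 16) & ~7) * SECTOR
    if version == "1.1":
        return 0      # very start of the device
    if version == "1.2":
        return 4096   # 4 KiB from the start
    raise ValueError(f"unknown superblock version: {version}")


def overlaps_boot_sector(version: str, device_bytes: int) -> bool:
    """True if the superblock starts inside the first 512-byte sector."""
    return superblock_offset(version, device_bytes) < SECTOR


one_gib = 1 << 30
for version in ("0.90", "1.0", "1.1", "1.2"):
    print(version, superblock_offset(version, one_gib),
          overlaps_boot_sector(version, one_gib))
```

Only 1.1 starts inside the first sector, which is where a bootloader would expect to find its boot block. That collision is the one Peter describes.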

The entire md RAID thread is worth reading because it took a tangent off into a lengthy debate about the merits of using (or being required to use) initramfses, the time taken to boot using an initramfs (or not using one: the plan is to remove autoassembly from the kernel for good, so good luck booting without an initramfs if you want RAID in the longer term), and tools such as AEUIO that can build a customized initramfs image. Of course, every distro and his dog has also re-invented initramfs creation.

SSE. There’s a long-standing philosophy of avoiding floating point (FP) or other general usage of optional compute units such as SSE, SSE2, and so forth from within the kernel itself. Using these units requires saving state, and that isn’t typically done (for performance reasons). However, these optional units can often handle very large word sizes and so can be useful for those seeking to optimize existing kernel routines. Luca Barbieri posted, starting a new thread entitled “use SSE for atomic64_read/set if available” to do just that on x86-32 systems as an alternative to some of the more complex code being used today (including disabling pre-emption very briefly). Peter Anvin and Luca got into a somewhat lengthy debate about FPU etiquette (especially with regard to Peter’s view that kernel_fpu_begin() and kernel_fpu_end() be wrapped around kernel calls to the FPU, and Luca’s view that this expensive state change could be skipped in the case that only specific registers need to be saved and restored in such situations as in his patch). Peter Zijlstra, though not objecting to a cleanish implementation, suggested that one might want to “run a 64bit kernel already”. In the end Luca decided to re-write his other patches explicitly in assembly to avoid future complications with GCC changes, and to hold off on the SSE piece in question until another day.

UML. Remember the work a few weeks back to bring initial task userspace stack sizes in line with those permitted by rlimit? Well it turns out that the patch was a little too restrictive and was causing UML (User Mode Linux) to segfault on startup. The issue was raised by a number of people, including Adam Nielsen, who was also told that it is not possible to run 32-bit UML instances on a host 64-bit kernel or vice versa. They must match.

xz. Discussion continued on the potential for migrating kernel.org over to use XZ format compressed files. Phillip Lougher offered some defense of the venerable gzip format, emphasizing its cross-platform nature (there are even completely separate implementations available in Java for the inclined), and Andi Kleen pointed out the relative availability of tools that handle gzip or bzip2 files vs. xz, but others seemed to agree that various contrived scenarios not directly relevant to kernel developers don’t warrant holding off an eventual migration to some better compression format.

In today’s miscellaneous items: An updated version of the OOM killer rewrite was posted by David Rientjes (including a patch that treats tasks running on different sets of CPUs as unlikely to be interfering with one another), the third round of KVM patches for 2.6.34 from Avi Kivity (including 1GB page size support, and an initial implementation of “Hyper-V” support for those desperate enough to need or want to run a Microsoft virtual machine guest), some seqlock implementation cleanups from Thomas Gleixner, a “foruth [sic] general posting of the newest version of the AppArmor security module” that is essentially a rewrite of the existing AppArmor code to use the existing hooks in the LSM security infrastructure rather than custom VFS patching, Grant Likely posted “basic ARM device tree support” (yaaaay!), Denys Vlasenko posted another attempt at supporting split out function and data ELF sections (one section per function or data item, something that is great for Ksplice), and Microsoft revived their work on Hyper-V recently (Hank Janssen seems to be trying really really hard to do the right things).

In today’s announcements:

Gujin 2.8. Etienne Lorrain announced a new release of the Gujin bootloader. It has some really nice options for device emulation, El-Torito emulation for booting Live-CD images, and a lot more besides.

RT patchset 2.6.32.12-rt21. Thomas Gleixner announced an updated RT patchset containing “fixes and cherry-picks from all over the place”, as well as some tracer fixes. The short log includes two scheduler fixes, some futex fixes, and some architectural stuff for ARM support.

RT patchset 2.6.33-rc8. Thomas Gleixner also announced the first RT release for the 2.6.33 stable series kernel. Thomas says he is pretty excited about the stability of this latest patch series, and the overall patch size is still falling quite considerably. He ends, “We are zooming in, but there is still a way to go”.

util-linux-ng 2.17.1. Karel Zak announced the release of util-linux-ng 2.17.1. This latest release includes an option to fdisk to disable DOS-compatible mode from the command line.

The latest kernel release was 2.6.33-rc8.

Finally today, the end of an era. Christine Caulfield announced that she is orphaning DECnet support in the kernel, due to “lack of time, space, motivation, hardware and probably expertise”. Apparently, “judging from the deafening silence on the linux-decnet mailing list [she] suspect[s] it’s either not being used anyway, or the few people that are using it are happy with their older kernels”.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.
