2010/03/07 Linux Kernel Podcast

March 18th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100307.mp3

For the weekend of March 7th, 2010, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Console, DRM, ext4, integrating tools, sensors, split function and data sections, union mounts, and versioning.

Console. Eric W. Biederman posted an intuitive patch for /dev/console opening, effectively ensuring that it is always available even if the root filesystem has no /dev. “This effectively guarantees that there will be a device node, and it won’t be on a filesystem that we will ever unmount”. Al Viro replied “hell yeah”, and took the patch “with thanks”.

DRM. This week’s thread length of the week prize goes to a thread entitled, “drm request 3”, in which Dave Airlie tried to pull some patches into the 2.6.34 merge window. These contained, “[f]ixes for default y + CONFIG_STAGING + CONFIG_DRM_NOUVEAU enabled”. Linus wasn’t very happy when he booted with these patches (nouveau interface version 0.0.16) and saw an error message saying “[drm] wrong version, expecting 0.0.15”. This led to a rant about backwards compatibility, and about the fact that he hadn’t even been warned it would break existing user space (in his case, Fedora 12). Linus even found that the commit that introduced the breakage did so explicitly, but again noted, ‘why the hell wasn’t I made aware of it before-hand? Quite frankly, I probably wouldn’t have pulled it. We can’t just go around break people[s] setups. This driver is, like it or not, used by Fedora-12 (and probably other distros). It may say “staging”, but that doesn’t change the fact that it’s in production use by huge distributions. Flag days aren’t acceptable’. This led on to a thread in which Linus and others (including Jeff Garzik) noted that Fedora 12 was shipping this driver in “production” and so more should be done to ensure that the kernel could be tested on older systems, while others said the driver was all along a “use at your own risk” driver (Jesse Barnes). Personally, this author solved the problem by using another graphics chipset a long time ago. Daniel Stone probably had the best solution, “fuck it, it’s Friday. To the pub”.

The DRM thread also deviated into a discussion of “Upstream first” as a distro policy, and then onto specific patches in other distributions that aren’t in upstream. For example, Ubuntu carrying AppArmor. That led on to yet another tangent in which James Morris felt he was being personally attacked for the lack of the patches being upstream. Ingo Molnar (and later, Linus, who seemed to share a similar viewpoint – that there needn’t be only one security answer) decided to weigh in, noting that it had been “a few reasonable months after the last big security flamewar”, and wanting to see a “rehash or fair summary of the pathname versus labels arguments” (referring to the fact that SELinux uses file labeling and complex rules, while AppArmor uses simple file paths). Ingo feels that pathnames are a “far more fitting abstraction to any ‘human based security process’ on Linux than ‘labels’”. Ingo called out that there was a lot of security research based on labels but essentially said none of that mattered due to the difficulty of practically using label based security. Quoting Ingo again, “[i]n other words: [I] see [SEL]inux’s main failure in that it somewhat blindly aims for a security model that is sees as the technical most secure, while not being intellectually open to the fact that we very likely _cannot know in advance_ which of the models will make Linux more secure in the long run”. It would seem Ingo would like AppArmor to be less of a “hostile competitor” and more of a “natural ally” to SELinux. The idea is that there can be two different security mechanisms for different use cases.

Ext4 performance concerns. Justin Piszcz had recently raised the issue of the relative performance of ext4 for “large” writes vs. XFS. Justin was seeing almost half the write throughput when using ext4 as opposed to XFS and was concerned. After asking various questions, to which the replies included that he should use “nice” numbers of disks (e.g. 9 for the specific RAID case he was looking at) that made no difference, the thread seemed to dry up without any concrete conclusions other than that a performance issue exists and requires some further investigation using blktrace, etc.

Integrating tools. Ingo Molnar, in a thread entitled “Re: KVM usability”, made some remarks about the relative virtues of having “unified repositor[ies]” in which both the kernel and userspace tools are combined in one place, such as with the Performance Counters tools. Ingo believes that one reason why Apple can “consistently out-develop Linux” is “in part due to there not being a strict [C]hinese [W]all between the Apple kernel, libraries and applications – it’s one coherent project where everyone is well-connected to each piece”. This may be true, but it’s just as likely in this author’s opinion that Apple is benefitting from that, coupled with the fact that it owns every piece and can hand down edicts from on high about what every piece will do, and when. In any case, the thread is worth reading – it was surprisingly short given the potentially contentious comments that could have made great flamebait.

Sensors. Dima Zavin (Google) replied to Jean Delvare’s attempt to have the ALS (Ambient Light Sensors) subsystem pulled, saying that the kernel was on the road toward having one subsystem under drivers/ for ALS, one for Proximity sensors, one for Accelerometers, etc. all with similar interfaces, and that a better approach would be a single “sensors” subsystem. He offered to help work on just that. Jean was interested, but didn’t want to hold up having the ALS patches pulled, favoring reworking them later on. He was subsequently dismayed when Linus and others started asking why ALS wasn’t just using the input subsystem for events, saying that he didn’t care where the code went but that discussions had been ongoing for 5 months already and he didn’t want to hold things up for another 5 months when people decided to bring this up during the merge window rather than before. The conversation then took a tangent into different rate devices (some of these “sensors” can operate at many KHz, above what the “input” subsystem is intended for). Linus contended that these devices, just like joysticks, were input devices. The conversation appears to have stalled at this point without a resolution.

Split function and data sections. As some of you will know, various attempts have been made over the past year to add support for compiling the kernel with the GCC options “-ffunction-sections” and “-fdata-sections”. These cause the compiler to generate one ELF section for each function or data object, and make life very easy for optimization tools (that can remove whole sections) as well as kernel patching utilities such as Ksplice. Tim (Ksplice) Abbott was happy with the latest round of patches, though he did have some questions about the “rename kernel’s magic sections with compatbility with -ffunction-sections -fdata-sections” patch series, especially about where certain renames were being used. For example, he wondered aloud how renaming “.text.reset” to “.text..reset” would affect AVR32 systems, because he couldn’t see how the original “.text.reset” was being populated anyway (answer: it wasn’t). As Tim mentioned, he wanted input from Haavard Skinnemoen, who provided the comment on “.text.reset” amongst other feedback.

Union mounts. Valerie Aurora posted version 1 of an RFC patch series (against Al Viro’s for-next tree) entitled, “Union mount core rewrite”. This, as it implies, is a complete rewrite of parts of the code implementing union mounts. Val has previously written about the goals and implementation of her work in various LWN articles. Separately, Val wondered aloud whether it was now possible to have multiple read-only layers in union mounts.

Versioning. Paul McKenney posted a patch placing the SHA1 git hash of the latest commit in the kernel version line on boot if available, or “[Not git tree]” in the case that a non-git tree was used to build.

In today’s miscellaneous items:
Large numbers of git pull requests started to come in for 2.6.34 (including everything from core kernel to networking and sound), there were some further nested SVM patches from Joerg Roedel, a large number of KVM updates (including a lot of PowerPC bits, Microsoft Hyper-V patches, and some x86 emulator cleanup), a new “platform-drivers-x86” git tree reference was added to the MAINTAINERS file (as maintained by Matthew Garrett, who posted a pull request for the latest bits also), a new generic x86 “NMI Watchdog” built upon performance events from Don Zickus (by way of Ingo Molnar actually making the pull request for Don’s previously posted patches), version 3 of the memory controller groups dirty page limits patches from Andrea Righi, an affirmation from Andrew Morton that the “Linux Checkpoint-Restart” patches could be posted to LKML following 2.6.34-rc1 (Oren Laadan also mentioned how the patches will refuse to do a checkpoint if they believe they cannot do so safely, reporting this back to userspace), the latest “compat-wireless” tree for stable kernel (2.6.32) users that contains the latest 2.6.33 bits from Luis R. Rodriguez, version 3 of a patch series providing for 512KB readahead rather than 128KB from Fengguang Wu, various trivial and staging patches from Greg Kroah-Hartman (as an aside, Alan Stern raised some concerns about the way Greg’s scripts generate those patches), a request to pull the Ceph distributed file system client into 2.6.34 (along with various input about changes made since the 2.6.33 merge request) from Sage Weil, some Performance (perf) Counters “live mode” patches from Tom Zanussi that allow perf data to be directly processed as it is captured “without ever touching the disk”, some paravirt (PV) extension patches for HVM (Hybrid virtualization support) in Xen from Sheng Yang, and Ted Ts’o complained about dynamic device filesystems with initramfses in a mini-rant about how 2.6.33 could not boot with an LVM root on his Ubuntu 9.10 userspace. He added that, “of course, the initramfs environment is so crappy that there are no debugging aids — not even a working pager”.

In today’s announcements:

Git 1.7.0.2. Junio C Hamano announced the latest maintenance releases of Git, versions 1.7.0.{1,2}. The second posting (.2) had a few minor patches since .1, including fixing support for GIT_PAGER. Whether or not it is technically an SCM, I will cease using that term in this podcast, following some feedback from listeners.

LTP. The Linux Test Project was released for February 2010. The latest release comes with a reminder that there “has been multiple chnges for building/installing the test suite after the recent changes in Makefile infrastructure”. This month’s release didn’t come with any corrupt script warnings.

Userspace RCU 0.4.2. Mathieu Desnoyers announced version 0.4.2 of his Userspace RCU “urcu” library. It includes some patches from Paolo Bonzini adding generic uatomic ops support for architectures not explicitly supported by liburcu, including (effectively free) support for IA64 and Alpha when using GCC versions 4.0-4.5, and a bugfix in urcu-bp which is the “User-Space Tracing” version of the urcu library. Mathieu has asked me to point out that a patent exemption was made to cover use of RCU in LGPL code such as urcu, so my previous comments about GPL patent concerns were a little too severe.

The latest kernel release was 2.6.33.

Andrew Morton posted an mm-of-the-moment (mmotm) for 2010-03-04-18-05.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2010/02/28 Linux Kernel Podcast

March 18th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100228.mp3

For the weekend of February 28th 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In today’s issue: Linux 2.6.33, ACPI, Cgroups, Checkpoint and Restart, OF Device Tree, Firmware, and x86 embedded.

Linux 2.6.33. Linus Torvalds announced the final release of 2.6.33 on Wednesday February 24th at 12:06pm Best Coast Time (PST). The final release includes a relatively small number of final fixes on top of rc8. As Linus says, the most notable thing may be the Nouveau integration and modesetting support. Others may notice the mainlining of DRBD and the fact that the AS IO scheduler is now gone (“since keeping it around and just causing confusion seemed to not be worth it any more. You’re supposed to use CFQ instead”). Daniel Walker asked Linus whether he still planned to try a one week merge window this time, to which Linus said, “No. But I might do a ten-to-twelve day thing or something like that – just to make sure that anybody who tries to game the system and send their merge request late will get summarily ignored. So I’m going to stop being so predictable that people can tell that exactly two weeks after the last release is where the merge window closes, and if people want to make sure their stuff merged, I had better have a merge request in my inbox earlier than thirteen days after the release.” The pull requests started pretty much immediately, and with the usual vigor. Separately, Con Kolivas announced 2.6.33-ck1, which includes his BFS scheduler and various other “desktop” focused bits.

ACPI. Rafael J. Wysocki posted an RFC patch concerned with removing race conditions from ACPI event handlers. The first race concerns the execution of handlers while they are being removed, the second is a locking issue.

Cgroups. Andrea Righi posted an intriguing RFC patch series intended to provide per-cgroup dirty page limits. The idea is that the maximum number of dirty pages a cgroup is allowed to have can be limited, and if a cgroup exceeds this count, it will be forced to perform write-out immediately.

Checkpoint and restart. Oren Laadan posted version 19 of his “Linux Checkpoint-Restart” patchset. As a reminder, these patches are intended to allow systems to handle failures by taking whole system checkpoints and restarting all activity from that point in the event of failure. The latest patchset is intended to address previous concerns from Andrew Morton and others, and is apparently able to checkpoint and restart both screen and vnc sessions, and support live migration of network servers between hosts. The project has a checklist of TODOs on its wiki: http://ckpt.wiki.kernel.org/.

OF Device Tree. Grant Likely asked Linus to pull in his OF device tree rework for 2.6.34. Grant has recently been working on ARM support, in addition to the PowerPC, Microblaze, and SPARC changes covered in this pull. Hopefully, OF device tree emulation will finally provide one mechanism for supplying data to the kernel that can be common across many different architectures, in addition to those that do “real” OpenFirmware in the vendor firmware.

Firmware. There was some discussion about kernel firmware versioning, and whether kernel firmware should be wrapped in a container format making it more suited to SO library style versioning. This happened in response to the folks behind the open sourcing of the Atheros WiFi firmware seeking advice on the best way to handle compatible and incompatible versions. David Woodhouse has advocated for the use of more library-like versioning, but was not a big fan of introducing the complexity of such wrappers. In the end it was decided that the kernel developer maintained linux-firmware package should provide firmware files of the form foo-$(API). Those wanting a sub-versioned file like foo-$(API)-$(VAR) could provide one if they so wish.
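As a rough illustration of the naming scheme that was agreed, the sketch below builds the file names a driver might look for, most specific first. The helper name, the example firmware name “foo”, and the lookup order are assumptions made for illustration only; none of this is taken from the linux-firmware package or from any driver.

```python
def firmware_candidates(name, api, variant=None):
    """Return candidate firmware file names, most specific first.

    Hypothetical helper: it follows the foo-$(API) and
    foo-$(API)-$(VAR) patterns from the discussion. Trying the
    sub-versioned name before the plain one is an assumption.
    """
    candidates = []
    if variant is not None:
        # Optional sub-versioned file, e.g. foo-2-b
        candidates.append(f"{name}-{api}-{variant}")
    # The file every API version is expected to provide, e.g. foo-2
    candidates.append(f"{name}-{api}")
    return candidates


print(firmware_candidates("foo", 2, "b"))
```

An incompatible firmware change would bump the API number and so produce a different file name, which is what lets old and new kernels coexist with one firmware package.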

x86 embedded. Graeme Russ posted a very detailed and well reasoned description of his embedded x86 port, which is not in any way based upon PC hardware, in which he uses U-Boot to transition to 32-bit Protected Mode and directly calls the kernel’s “32-bit BOOT PROTOCOL” described in Documentation/x86/boot.txt. He was having some issues handling kernel relocation, though, that turned out to be due to differences between the documentation of the bzImage format and the current reality. Peter Anvin was his usual, very helpful self.

In today’s miscellaneous items: A fix for SPARC32 from Rob Landley (apparently, SPARC32 has been broken since 2.6.28, which isn’t surprising since this author and most other Linux SPARC users seem to be running SPARC64 kernels), various debugging from Thomas Gleixner and John Kacur on the recent 2.6.33 RT patch, version 6 of a patch series intended to add lockdep-based diagnostics to rcu_dereference() from Paul McKenney, a series of PPS implementation patches from Rodolfo Giometti (useful for those needing accurate time sources on a serial line), a patch to increase readahead size to a default of 512K from Fengguang Wu (the previous default was 128K), a bunch of s390 updates for 2.6.33 final from Martin Schwidefsky (including kernel image compression “finally…after only 10 years”), some patches intended to document the rfkill sysfs ABI from Florian Mickler, some more nested SVM (virtualization within virtualization on AMD compatible systems) from Joerg Roedel intended to aid running Microsoft Hyper-V with nested SVM (which doesn’t quite work yet even with these according to Joerg), a number of rather cool gdb and early debug updates from Jason Wessel (who has now split kdb and early debug out into two separate trees), version 4 of the “concurrency managed workqueue” from Tejun Heo, a discussion about order 1 allocation failures started by Frans Pop (the failures were under GFP_ATOMIC, but Frans felt that they were particularly ugly given plenty of cache was available for reclaim), David Howells proposed removing EXPERIMENTAL from NFS_FSCACHE in order that it could be compiled into the standard Ubuntu kernel (since, as he says, “As Arjan van de Ven pointed out…the EXPERIMENTAL flag doesn’t mean that much any more”), and a lengthy discussion of linux-next “requirements” that is worth reading, if you have the time.

In today’s announcements:

iproute2. Stephen Hemminger announced release 2.6.33 of the iproute2 utilities that “includes bug fixes and support for all the new features in kernel 2.6.33. This integrates a number of minor bug fixes from Debian aswell”. The update is available at http://devresources.linux-foundation.org/.

RT 2.6.33-rt4. Thomas Gleixner announced version 2.6.33-rt{2,3,4} of the RT kernel patchset. This updates to Linus’ latest tree and includes a number of fixes to bugs reported by John Kacur and others. It is available from the usual location: http://www.kernel.org/pub/linux/kernel/projects/rt/ Thomas noted that “rt/2.6.33 branch is now stabilization only. The rt/head branch will follow linus tree from now on, so it will inherit all (mis)features which come in the merge window”. Separately, John Stultz announced that he had forward ported Nick Piggin’s VFS scalability patches to 2.6.33-rc8-rt2, and that it applies to 2.6.33 without any collisions. He requested feedback as he had yet to do any serious stress testing with the patchset.

The latest kernel release was 2.6.33.

Greg Kroah-Hartman released an updated stable Linux 2.6.32.9.

Finally today, Mikael Abrahamsson suggested that some TLC be given to the Wikipedia article on the Linux kernel as it “doesn’t even mention the new -rc system” (in the “development model” section of the article). He wondered if anyone who knew exactly what was going on could write up the new world order on that wiki page for the rest of the world to see. That does not seem to have happened as of this writing.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2010/02/21 Linux Kernel Podcast

March 15th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100221.mp3

For the weekend of February 21st, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In today’s issue: AMD TSC, anon_inode flags, extents, LSI MegaRAID, md RAID, SSE, UML, and XZ.

AMD TSC. Mark Langsdorf (AMD) posted a patch entitled “Option to synchronize P-states for AMD family 0xf”, in which he reminded readers that AMD Family 0xf processors (that is AMD Athlon 64s and AMD Opterons) do not have P-State and C-State invariant TSCs – that is to say the TSC increments at the current frequency of the CPU core, and not at some fixed frequency that would be more useful to those using it as a timing source. It is nonetheless possible to scale the TSC readings to be used as a time source, if all CPUs in the system adjust their frequency at the same time and by the same amount. To do this, Mark modifies the PowerNow! driver with a new “tscsync” parameter. He reminds us that there are many other possible clock sources in a system, but customers want something particularly lightweight in some situations, like the TSC.
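To make the scaling idea concrete, here is a minimal sketch of how tick counts from a non-invariant TSC could be turned into elapsed time, assuming the core frequency is known for each interval and every CPU changes frequency together. The function name and the sample figures are invented for illustration; this is not how the PowerNow! patch itself is written.

```python
NSEC_PER_SEC = 1_000_000_000


def elapsed_ns(segments):
    """Convert TSC tick counts into nanoseconds.

    segments is an iterable of (tsc_ticks, core_freq_hz) pairs, one
    pair per interval during which the core frequency was constant.
    Hypothetical helper: each interval is scaled by its own frequency
    and the results are summed.
    """
    total = 0
    for ticks, freq_hz in segments:
        total += ticks * NSEC_PER_SEC // freq_hz
    return total


# 2,000,000 ticks at 2 GHz, then 1,000,000 ticks at 1 GHz:
# each interval lasts one millisecond.
print(elapsed_ns([(2_000_000, 2_000_000_000), (1_000_000, 1_000_000_000)]))
```

The same tick count means different amounts of time at different frequencies, which is why the scaling only works if all CPUs switch frequency in lockstep.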

anon_inode flags. Matt Helsley noted that existing anon_inode interfaces often do not support flags that can be set by using fcntl(). He proposed a series of 4 patches to signalfd, timerfd, epoll, and eventfd that would allow the same flag behavior as their corresponding creation syscalls. Davide Libenzi, the original author of the anon_inode bits, signed off.

Extents. Jari Sundell reported an issue with sparse files on ext4 in which many extents nonetheless sequentially placed on disk were not merged by the filesystem. This manifested in the form of 3000 or more extents for a 250MB bittorrent download file (aside: bittorrent pulls many file pieces at once from many different sources and so relies heavily on sparse files).
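The kind of merging Jari expected can be sketched as follows: two extents are combined when the second begins exactly where the first ends, both in the file and on disk. This is an illustrative model only; the Extent type and merge_extents() are made-up names, and real ext4 limits (such as a maximum extent length) are ignored.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical: int   # first logical block within the file
    physical: int  # first physical block on disk
    length: int    # number of blocks covered


def merge_extents(extents: List[Extent]) -> List[Extent]:
    """Merge extents that are contiguous both logically and physically."""
    merged: List[Extent] = []
    for ext in sorted(extents):  # sorts by logical start first
        if merged:
            last = merged[-1]
            if (last.logical + last.length == ext.logical
                    and last.physical + last.length == ext.physical):
                # Adjacent in the file and on disk: extend the previous extent.
                merged[-1] = last._replace(length=last.length + ext.length)
                continue
        merged.append(ext)
    return merged


pieces = [Extent(4, 104, 4), Extent(0, 100, 4), Extent(8, 200, 2)]
print(merge_extents(pieces))
```

In this example the first two pieces collapse into one extent of eight blocks, while the third stays separate because it lives elsewhere on disk.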

MegaRAID. LSI posted to let everyone know that they were interested in an overhaul of the MegaRAID driver to support future HBAs. Rather than make a lot of changes to the existing code, they were interested in, and were encouraged, to create a new driver for the newer parts. Matthew Wilcox may have detected a hint of reasoning behind why they had been a little resistant to not having a single heavily hacked driver and suggested an approach that could be used to “make your management happy” in effectively combining two drivers together into a single object file with two separate sets of PCI tables being handled and different functions within. Whatever the eventual decision, the thread ended there with no followup.

md. Justin Piszcz started a discussion thread entitled “Linux mdadm superblock question”, in which he asked about RAID superblock types. The older version 0.90 superblock format supports autoassemble within the kernel, whereby the kernel can automatically create the appropriate RAID device without having to use tools within an initrd/initramfs (an initramfs is not required in that case; otherwise one is needed if you want to boot from RAID). Justin wanted to know whether there were any benefits for a < 2TB RAID1 boot volume in moving to a higher versioned superblock without autoassemble support.

The conversation led Peter Anvin to point out some issues with a recent change in mdadm, which now apparently creates 1.1 version superblocks by default. Peter noted that the 0.9 superblock format doesn’t make it possible to easily distinguish RAID partitions from whole volume RAID devices, but the problem migrating to 1.1 is that 1.1 uses the bootblock for its superblock and so can cause problems with bootloaders such as grub that result in people having to regenerate their entire disk if they want to easily boot with it. Version 1.2 of the md RAID superblock uses the same 1.1 superblock format but at a different location than the bootblock, and so Peter favors a default of using 1.0 or 1.2, but not 1.1 as the mdadm default.
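Peter’s objection can be summarised with a small lookup, sketched below. The placements follow the mdadm manual’s description of the formats (0.90 and 1.0 near the end of the device, 1.1 at the start, 1.2 4K from the start); the function name and the 512-byte boot block size are assumptions made for illustration.

```python
# Byte offset from the start of the device for start-located formats.
START_OFFSETS = {"1.1": 0, "1.2": 4096}

# Formats that keep their superblock near the end of the device.
END_LOCATED = {"0.90", "1.0"}


def clobbers_boot_block(version, boot_block_bytes=512):
    """Return True if the superblock would land inside the boot block.

    Hypothetical helper; boot_block_bytes is an assumed size for the
    area a bootloader wants at the very start of the device.
    """
    if version in END_LOCATED:
        return False
    return START_OFFSETS[version] < boot_block_bytes


for version in ("0.90", "1.0", "1.1", "1.2"):
    print(version, clobbers_boot_block(version))
```

Only 1.1 collides with the start of the device, which matches Peter’s preference for 1.0 or 1.2 as the default.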

The entire md RAID thread is worth reading because it took a tangent off into a lengthy debate about the merits of using (or being required to use) initramfses, time taken to boot using an initramfs (or if not using one – the plan is to remove autoassembly from the kernel for good, so good luck booting without an initramfs if you want RAID in the longer term), and tools such as AEUIO that can build a customized initramfs image. Of course, every distro and his dog have also re-invented initramfs creation.

SSE. There’s a long-standing philosophy of avoiding floating point (FP) or other general usage of optional compute units such as SSE, SSE2, and so forth from within the kernel itself. Using these units requires saving state, and that isn’t typically done (for performance reasons). However, these optional units can often handle very large word sizes and so can be useful for those seeking to optimize existing kernel routines. Luca Barbieri posted, starting a new thread entitled “use SSE for atomic64_read/set if available” to do just that on x86-32 systems as an alternative to some of the more complex code being used today (including disabling pre-emption very briefly). Peter Anvin and Luca got into a somewhat lengthy debate about FPU etiquette (especially with regard to Peter’s view that kernel_fpu_begin() and kernel_fpu_end() be wrapped around kernel calls to the FPU, and Luca’s view that this expensive state change could be skipped in the case that only specific registers need to be saved and restored in such situations as in his patch). Peter Zijlstra, though not objecting to a cleanish implementation, suggested that one might want to “run a 64bit kernel already”. In the end Luca decided to re-write his other patches explicitly in assembly to avoid future complications with GCC changes, and to hold off on the SSE piece in question until another day.

UML. Remember the work a few weeks back to bring initial task userspace stack sizes in line with those permitted by rlimit? Well it turns out that the patch was a little too restrictive and was causing UML (User Mode Linux) to segfault on startup. The issue was raised by a number of people, including Adam Nielsen, who was also told that it is not possible to run 32-bit UML instances on a host 64-bit kernel or vice versa. They must match.

xz. Discussion continued on the potential for migrating kernel.org over to use XZ format compressed files. Phillip Lougher offered some defense of the venerable gzip format, emphasizing its cross-platform nature (there are even completely separate implementations available in Java for the inclined), and Andi Kleen pointed out the relative availability of tools that handle gzip files or bzip2 vs. xz, but others seemed to agree that various contrived scenarios not that relevant directly to kernel developers don’t warrant holding off an eventual migration to some better compression format.

In today’s miscellaneous items: An updated version of the OOM killer rewrite was posted by David Rientjes (including a patch that treats tasks running on different sets of CPUs as unlikely to be interfering with one another), the third round of KVM patches for 2.6.34 from Avi Kivity (including 1GB page size support, and an initial implementation of “Hyper-V” support for those desperate enough to need or want to run a Microsoft virtual machine guest), some seqlock implementation cleanups from Thomas Gleixner, a “foruth [sic] general posting of the newest version of the AppArmor security module” that is essentially a rewrite of the existing AppArmor code to use the existing hooks in the LSM security infrastructure rather than custom VFS patching, Grant Likely posted “basic ARM device tree support” (yaaaay!), Denys Vlasenko posted another attempt at supporting split out function and data ELF sections (one section per function or data item – something that is great for Ksplice), and Microsoft revived their work in Hyper-V recently (Hank Janssen seems to be trying really really hard to do the right things).

In today’s announcements:

Gujin 2.8. Etienne Lorrain announced a new release of the Gujin bootloader. It has some really nice options for device emulation, El-Torito emulation for booting Live-CD images, and a lot more besides.

RT patchset 2.6.32.12-rt21. Thomas Gleixner announced an updated RT patchset containing “fixes and cherry-picks from all over the place”, as well as some tracer fixes. The short log includes two scheduler fixes, some futex fixes, and some architectural stuff for ARM support.

RT patchset 2.6.33-rc8. Thomas Gleixner also announced the first RT release for the 2.6.33 stable series kernel. Thomas says he is pretty excited about the stability of this latest patch series, and the overall patch size is still falling quite considerably. He ends, “We are zooming in, but there is still a way to go”.

util-linux-ng 2.17.1. Karel Zak announced the release of util-linux-ng 2.17.1. This latest release includes an option to fdisk to disable DOS-compatible mode from the command line.

The latest kernel release was 2.6.33-rc8.

Finally today, the end of an era. Christine Caulfield announced that she is orphaning DECnet support in the kernel, due to “lack of time, space, motivation, hardware and probably expertise”. Apparently, “judging from the deafening silence on the linux-decnet mailing list [she] suspect[s] it’s either not being used anyway, or the few people that are using it are happy with their older kernels”.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


Updates coming!

March 4th, 2010 jcm No comments

Folks,

Sorry for the delay. I should have updates out before the end of the week. Thanks. Remember, this is a spare time project and takes a lot of effort to do properly.

Jon.

Categories: Uncategorized Tags:

2010/02/14 Linux Kernel Podcast

February 17th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100214.mp3

This podcast is brought to you by the colour blue and way too much coffee, together reminding you to check out the awesome power of the BeagleBoard Open Source hardware project at http://www.beagleboard.org/. My new Rev C. board was responsible for the delay getting this issue out…too much fun was had.

For the weekend of February 14th, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In this issue: Linux 2.6.33-rc8, x86 bootmem, NFS, OOM, Performance Counters, Relaxation, Stack Sizes, and SysFS mutability.

Linux 2.6.33-rc8. Linus Torvalds announced the release of version 2.6.33-rc8 on Friday February 12th 2010 at 11:49 am Best Coast Time (PST), saying that he hoped it would be the last before 2.6.33 final. He added that, “A number of regressions should be fixed, and while the regression list doesn’t make me _happy_, we didn’t have the kind of nasty things that went on before -rc7 and made me worried”. This kernel includes fixes for the netfilter bugs that I discovered, as well as some KMS regression fixes. In a separate discussion thread started by John Hawley (warthog9), it was debated when kernel.org should move over to using xz (LZMA2) as a replacement for bzip2 compression (remember when bzip2 was trendy and new?). John proposed various migration options before the thread veered off into a discussion around when an eventual 3.0 Linux kernel would come, and what that would actually mean in practical terms – just an arbitrary future release? I expect that LWN will have a typically witty writeup of this discussion sometime this week.

Bootmem. Back in October last year, Ingo Molnar had stated that the kernel may not need the “bootmem” allocator on x86. At the time, he noted that there were 5 different allocators on x86, depending upon the boot stage (to say nothing of the other core allocator options): the generic allocator, the early allocator (bootmem), the very early allocator (reserve_early), the very very early allocator (early brk model), and the very very very early allocator (basically just build time allocation). By initializing the x86 page allocator earlier in the boot process, Yinghai Lu attempts to do just what Ingo had suggested, now in version 6 of his patchset.

NFS. Hirofumi Ogawa noticed (2.6.33-rc6) that recent kernels could not mount remote NFS version 3 shares, because of a userspace visible change in the kernel nfsd server. If he specified “vers=3″ at mount time, all was well, but the kernel was not falling back to v3 correctly when v4 fails due to a change in error handling. Bruce Fields noted that this change was actually intentional and that the userspace tools had been updated, but decided to revert the patch that caused this change for the time being – at least until the new versions of the mount tools are much more widespread than right now. Bruce sent a patch entitled (”informingly”) “2.6.33 fix” to Linus.

OOM. David Rientjes posted a patchset re-implementing the OOM killer, in the wake of a number of discussions concerning its brokenness. It includes a complete rewrite of the badness() heuristic, which he then describes in some detail within the corresponding patch. Quoting David, ‘The baseline for the heuristic is a proportion of memory that each task is currently using in memory plus swap compared to the amount of “allowable” memory. “Allowable,” in this sense, means the system-wide resources for unconstrained oom conditions, the set of mempolicy nodes, the mems attached to current’s cpuset, or a memory controller’s limit. The proportion is given on a scale of 0 (never kill) to 1000 (always kill), roughly meaning that if a task has a badness() score of 500 that the task consumes approximately 50% of allowable memory resident in RAM or in swap space.’
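As a rough illustration of the proportional scoring David describes, here is a minimal Python sketch. The function name and its parameters are hypothetical, and it only models the baseline proportion from the quote above; whatever other adjustments the actual patch applies are not shown.

```python
def badness_score(rss_pages, swap_pages, allowable_pages):
    """Return a 0..1000 score: the share of allowable memory a task uses.

    Hypothetical helper: models only the baseline proportion, nothing else.
    """
    if allowable_pages <= 0:
        raise ValueError("allowable_pages must be positive")
    used = rss_pages + swap_pages
    score = (used * 1000) // allowable_pages
    # Clamp to the documented scale of 0 (never kill) to 1000 (always kill).
    return max(0, min(1000, score))


if __name__ == "__main__":
    # A task holding half of the allowable memory (RAM plus swap).
    print(badness_score(rss_pages=400, swap_pages=100, allowable_pages=1000))
```

With these made-up numbers the task uses 500 of 1000 allowable pages, which corresponds to the "score of 500 means roughly 50%" reading in the quote.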

Performance counters. Christoph Hellwig had complained that a patch had been merged back in September from Arjan van de Ven entitled “perf_core: provide a kernel-internal interface to get to performance counters”. That was intended to facilitate in-kernel use of the performance counters framework, but it was Christoph’s opinion that it had no users and should be reverted. Ingo Molnar countered that there actually were a growing number of users, now including the latest work by Don Zickus to create a generalized NMI watchdog handler.

Relax. Michael Breuer posted an interesting analysis of the implementation of the function cpu_relax on x86 systems. This function is called during spinlock spinning cycles in order to give the CPU a break (power management, etc.). Apparently, that function currently uses a nop, but both the Intel and AMD documentation recommend the PAUSE instruction instead (partly because it can be detected on recent CPUs and used to give special treatment to guest instances running under virtualization that are wasting CPU cycles when multiple vCPUs are allocated and some are spinning away). Arjan van de Ven, and others too, seemed to find this odd, and Artur Skawina wondered if this might be an odd alignment issue. Nonetheless, Michael detects a noticeable performance impact in various tests between these two instructions.

Stack sizes. The kernel contains various task startup code that will create a vma region for its stack use. Existing kernels make this size determination based upon the PAGE_SIZE for the architecture, even though this really is independent of the userspace code that will use the stack, and even given existing rlimits that might see the stack theoretically larger than has been allowed by system limits. Michael Neuling sent a patch to decouple stack sizing from PAGE_SIZE and to default to basing it upon the rlimit.

SysFS. Amerigo Wang posted an RFC patch implementing “mutable sysfs files”. The basic idea is that all potentially “mutable” files (that is to say, files that may be yanked out from underneath a user whenever a hotplug or other operation occurs) should use a specific API to avoid warnings.

In today’s miscellaneous items: An interesting discussion started by Salman Qazi (Google) centered around a misunderstanding of the ptrace API (and eventual reiteration from Oleg Nesterov that the existing API sucks), a January XFS update from Christoph Hellwig (noting new support for netlink provided quota communication, better power saving in XFS kernel threads), Mel Gorman posted version 2 (v2r12) of his “Memory Compaction” patch series that is intended to “defragment” memory by reconciling GFP_MOVABLE pages, and another one of Al Viro’s entertaining rants, this time about pohmelfs and its use of direct access to the current->fs->{root,mnt} entries.

In today’s announcements:

Git version 1.6.6.2. Junio C Hamano announced an update to the 1.6.6 series of the Git SCM tool, releasing version 1.6.6.2. This contains a few fixes.

Git version 1.7.0. Junio C Hamano also announced version 1.7.0 of the Git SCM had been released. This is the latest official version and includes a number of behavioral changes to “git push”, “git send-email”, and other commands as previously noted in this podcast. Users should read the release notes before upgrading if they want to make sure they catch all of the improvements.

Linux 2.6.32.8. Greg Kroah-Hartman, apologizing for the slight delay due to a few crashes that had been reported and a need to verify a security fix, as well as various travel plans, announced the release of 2.6.32.8. It contains a few fixes 2.6.32 users really should have on their systems.

The Linux Storage and Filesystems Summit. James Bottomley announced that the annual Linux Storage and Filesystems summit will take place concurrently with the VM summit on the two days before LinuxCon in Boston (Sunday and Monday), on the 8th and 9th of August. Interested parties can visit either the Linux Foundation website, or email agenda topics to the program committee at lsf10-pc@lists.linuxfoundation.org.

Userspace RCU 0.4.1. Mathieu Desnoyers announced the latest release of his Userspace RCU implementation (remember, patent encumbered, but with a waiver for GPL projects). Version 0.4.1 contains a compilation fix for s390.

As a followup to last weekend’s kerneloops statistics, Arjan van de Ven also posted statistics purely for the 2.6.33 kernel at that time. In his statistics, he showed that the most popular oops was in memcpy_toiovecend (found 391 times).

The latest kernel release is 2.6.33-rc8.

Andrew Morton announced an mm-of-the-moment (mmotm) for 2010-02-11-21-15.

Don’t forget to read my latest blog posting on jonmasters.org for more information on using the Cyclades TS-3000 with kgdb for remote target debugging, and don’t forget to support Jason Wessel’s proposed kgdb and kdb merge for 2.6.34. You know it makes sense to get this out there widely.

That’s a summary of the week’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

2010/02/07 Linux Kernel Podcast

February 10th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100207.mp3

This podcast is brought to you by the awesome power of Jason Wessel’s kgdb patches, helping to support those who believe in kernel debuggers find hard to reach kernel bugs since 2009. Kernel debuggers: the way of the future.

For the weekend of February 7th, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In today’s issue: Linux 2.6.33-rc7, regressions, Google Summer of Code, IMA, OOM, and sys_membarrier.

Linux 2.6.33-rc7. Linus Torvalds announced the 2.6.33-rc7 release of the Linux kernel on Saturday, February 6th, 2010 at 2:44pm (14:44) Best Coast Time (PST). In his announcement, Linus remarked, “I have to admit that I wish we had way fewer regressions listed by this time, so I hereby would like to point every developer to” a link to a recent post to the linux wireless mailing list archive on gmane.org showing a copy of a recent email from Rafael J. Wysocki detailing known kernel regressions between 2.6.32 and 2.6.33-rc6 as posted originally to the LKML. He added, “But we’ve certainly fixed a few things, and it’s been a week, so here’s -rc7″. Most of the changes are in PowerPC defconfigs (default configs), but there are even more i915 updates, radeon KMS updates, and lots of other smaller bits all over the tree. Linus also wondered (in another email) whether it was worth making the .gz files any more given that bzip2 has been around more than long enough by now. Some thought the gzip files were still useful on systems without bzip2 or for some really slow systems that apparently handle gzip files more easily.

Regressions. Rafael J. Wysocki followed up to Linus’ 2.6.33-rc7 announcement (as he had also done with 2.6.33-rc6) with a list of outstanding regressions between 2.6.32 and 2.6.33-rc7. There are currently 20 “unresolved” issues in the list of regressions given. Rafael also noted that Maciej Rutecki has, “generously volunteered to work on the tracking of kernel regressions”. The work done by Rafael (and now, hopefully Maciej also) is very valuable to the community and we really do owe them our gratitude for helping out. Arjan van de Ven also posted a list of oops and warning reports on kerneloops.org from the week, including a very common ext4/quota issue in Fedora.

Google Summer of Code. Luis Rodriguez stated that, “Google has confirmed it will have a Google Summer of Code for 2010″, then mentioned that last year’s effort (4 suggested projects, of which 3 were accepted) resulted in only one success. Witold Sowa followed up saying that he didn’t know he was the only student who completed his project, but that the work to add an AP mode to NetworkManager, “with use of wpa_supplicant’s newly developed AP mode” was relatively easy to accomplish and so he had worked on other things also. Apparently, the initial GSoC work is now available in NetworkManager. Nonetheless, it sounds as if Luis is keen to see a higher than 33% success rate if any entries are accepted this year under the Linux Foundation.

IMA. Mimi Zohar replied to an email from Shi Weihua concerning a NULL pointer dereference bug in the IMA security code (ima_file_free), which Al Viro and others had previously discussed solutions for.

OOM. Lubos Lunak and David Rientjes resurrected the OOM killer discussion again after Lubos posted some analysis of various KDE processes running on his system, and wondered why the OOM killer uses VmSize rather than RSS to determine tasks that should be killed (in other words, why should it not favor tasks actually resident in memory at the time?). This discussion has been had recently, and David Rientjes explained that the kernel favors overall VmSize in its calculations so as to catch memory leakers as a preference (which are often not resident at the time). David did seem to like the suggestion of catching the child with the highest badness calculation before killing its parent, and posted an untested patch. He also suggested that the KDE process tree example was “a textbook case for using /proc/pid/oom_adj to ensure a critical task, such as kdeinit is to you, is protected from getting selected for oom kill”. Lubos replied with some very good points about how simply setting oom_adj doesn’t scale, and Balbir Singh was amongst those still favoring a switch to RSS-like accounting but with support for shared pages (for example “PSS”) eventually. Rik van Riel noted that he had no strong opinion one way or the other. David posted various patches proposing an alternative fine grained oom_adj mechanism.
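For anyone who wants to look at the two figures being argued over, the sketch below pulls VmSize and VmRSS out of the text of /proc/<pid>/status. It is only an illustration in Python: the helper name is made up, and the sample text (including the process name and sizes) is invented rather than taken from Lubos’s system.

```python
def parse_vm_fields(status_text):
    """Return {'VmSize': kB, 'VmRSS': kB} parsed from /proc/<pid>/status text."""
    wanted = ("VmSize", "VmRSS")
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Values look like "  123456 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


# Invented sample: a process that maps far more memory than it keeps resident.
SAMPLE = """Name:\tkdeinit4
VmSize:\t  512000 kB
VmRSS:\t   20480 kB
"""

if __name__ == "__main__":
    vm = parse_vm_fields(SAMPLE)
    print(vm["VmSize"], vm["VmRSS"])
```

On a live Linux system the same function can be fed the contents of /proc/self/status. A large gap between the two numbers is exactly the case where a VmSize-based and an RSS-based heuristic would rank a task very differently.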

sys_membarrier. Mathieu Desnoyers posted a three part patch series implementing sys_membarrier, a new system call that can be used to “distribute the overhead of memory barriers asymmetrically”. In particular, he wants it for his urcu userspace RCU implementation (for use within the synchronize_rcu call). Sensibly, Mathieu proposes incremental additions to each architecture (even though he believes that it “should be portable to other architectures as-is”), reserving the system call numbers now, then implementing gradually.

In today’s miscellaneous items: Matti Aarnio posted to let everyone know that a recently discovered hole in the bayesian filtering system as used by the vger.kernel.org mailing list server to reduce SPAM has been plugged (it had been possible to reach the list using a specific “backend” majordomo domain), Catalin Marinas decided to simply patch the USB HCD driver that had resulted in cache coherency problems when using USB storage (and noted that a followup posting to linux-arch would call for a flush_dcache_range function), some miscellaneous rewrites of obsolete syscall handlers to use generic versions from Christoph Hellwig, a request for an opinion on merging the kFIFO rewrite in 2.6.34 from Stefani Seibold, a potential issue with the kernel implementation of LZO compression reported by Nigel Cunningham (for which he will switch back to LZF in TuxOnIce again for the moment), Stephen Rothwell wondered aloud whether Linus would really be interested in taking the percpu changes currently sitting in percpu “next”, and Mathieu Desnoyers announced he is switching email from his academic address in Montreal (where he recently completed his PhD around LTTng) to a consulting firm he is involved with at http://efficios.com.

In today’s announcements: Greg Kroah-Hartman posted review patches for the 2.6.32.8 stable series kernel.

Scott James Remnant announced the release of upstart version 0.6.5. It includes a large number of fixes, amongst which is the completion of the splitting out of libnih into its own project. There is a new /sbin/reload command for reloading upstart daemons, a restored sync() before reboot, improved documentation, and more goodies.

Junio C Hamano announced version 1.7.0.rc2 of the Git SCM, which includes a number of forthcoming behavior changes as mentioned in this podcast when discussing the rc1 release from the previous week.

Subrata Modak announced that the Linux Test Project (LTP) for January 2010 has been released. It now contains over 3000 tests. Separately, Garrett Cooper noted a rather severe bug in the top level LTP Makefile that could result in an “rm -rf /” in the wrong circumstances, suggesting that all LTP users comment out three lines from that file.

Willy Tarreau (re-)announced the release of 2.4.37.9. The previous 2.4.37.8 hadn’t actually contained the required e1000 backport with a CVE fix that had triggered the previous release. Willy noted, “I don’t know how I managed to do that because it once was OK and I could successfully build it. Well, whatever I did, the result is wrong and the issue it was supposed to fix is still present in 2.4.37.8. So here comes 2.4.37.9 with the real fix this time”.

The latest kernel release is 2.6.33-rc7.

Andrew Morton posted an mm-of-the-moment (mmotm) for 2010-02-03-20-09.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

2010/01/31 Linux Kernel Podcast

February 10th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100131.mp3

This podcast is brought to you by the power of Al Viro’s ima_file_free fix, saving in-progress crashed podcast recordings since February 2010, and now powering the all new 2010 2.6.33 series Linux kernel with all wheel drive.

For January 31st, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

In this week’s issue: Linux 2.6.33-rc6, ide2libata, kFIFO, lock types, netfilter connection tracking, netperf regressions, sparse, and USB storage.

Linux 2.6.33-rc6. Linus Torvalds announced Linux 2.6.33-rc6 on Friday January 29th 2010 at 2:20pm (14:20) Best Coast Time (PST), again describing it as containing “nothing earth-shattering”. About 50% of the changes were architecture updates, and 40% were drivers, with the remaining being mostly filesystem and networking updates. He called for people seeing regressions to begin making “loud noises”, since ‘things mostly should “just work”‘.

ide2libata. Bartlomiej Zolnierkiewicz posted a 68 part patch series entitled “ide2libata” that does roughly what it sounds like – it facilitates a conversion of sorts such that legacy IDE driver code can use a small “translation” layer to share source with the libata codebase. It doesn’t remove IDE but it does (allegedly) make it far easier to maintain both until IDE finally does go away. Alan Cox and others weren’t convinced. Alan thought that, “it will be a nightmare for maintenance with all the includes and the like plus the ifdefs making it very hard to read the drivers and maintain them”. He saw value in the effort, but more as a means to find subtle differences between drivers, and thought IDE was “drifting” a little too much to truly be described as in “maintenance mode” at this point.

kFIFO. Stefani Seibold posted an “enhanced reimplementation of the kfifo API”, which is apparently the last in the series of RFC patches intended to rework the kFIFO implementation (to be generic) without changing the existing API. Stefani included some analysis of the impact of the patch upon text section usage and found that it wasn’t much larger, but that the “hand optimized” inline code was substantially faster than the previous implementation.

lock types. Mitake Hitoshi posted an RFC patch (mostly for the review of Peter Zijlstra) that adds lock type information to the output of lockdep, as used by tools such as perf. As he points out, “Of course, as you told, type of lock dealing with is clear for human. But it is not clear for programs like perf lock”. On a related note, Frederic Weisbecker stated that he really liked the perf lock report layout, but would love to see a tree view that “can tell you which lock is delaying another one”. He gave various examples of how this might be visualized as well as describing the benefits.

netfilter connection tracking. I discovered that one of my test systems was falling over on all recent 2.6 series kernels, when using KVM. I wasn’t alone (as I would find out later, looking at Fedora bug reports). The backtrace was variable, but typically involved some kind of IPv6 packet. After mailing the netfilter guys (”PROBLEM: reproducible crash KVM+nf_conntrack all recent 2.6″) and getting some general advice, I spent the entire weekend solid debugging the issue with the aid of Jason Wessel’s kgdb-next tree. The problem was that libvirt (the KVM server management daemon) would attempt to create a second network namespace (netns) on startup – just to see if it would be possible to also support containers – and autostart KVM guests started at that moment would crash because conntrack was missing various chunks of support code for dealing with multiple namespaces. This resulted in hash corruption, kmem caches that would get corrupted, and eventual panics.

netperf regressions in 2.6.33-rc1. Lin Ming performed a bisect analysis and determined that a “sched: Rate-limit newidle” commit had once again introduced a loopback regression (on the order of 50%) in the netperf benchmark, when run on an Intel Nehalem system. Lin assumed that this was due to a large amount of rescheduling IPI (inter-processor interrupt) traffic, as evidenced by the perf top data, and /proc/interrupts output. Others could not reproduce this issue.

sparse. Tejun Heo posted a series of percpu patches intended to instrument modular use of percpu data, for the benefit of the sparse source checker utility recognizing that such data lives in a separate data section. Tejun included various descriptions within the individual patches, which only affect building when using the sparse checking tool.

USB mass storage. Catalin Marinas posted a message (mostly aimed at Matthew Dharm) concerning cache coherency of the kernel’s USB mass storage driver. In the case of Harvard Architecture (split I/D caches) ARM processor cores, when using PIO based USB host controllers, root mounted filesystems generating a page fault will only fault the requested page into the data cache, but the USB storage driver fails to call flush_dcache_page to ensure I-cache visibility and results in incoherency between the two. Catalin asked Matthew if he might add support for explicit flushes when doing PIO rather than DMA for IO. Oliver Neukum thought that this belonged in the HCD driver rather than USB storage, due to the wide range of possible underlying layers beneath USB storage, and Matthew Dharm agreed, “Given that an HCD can choose, on the fly, if it’s using DMA or PIO, the HCD driver is the only place to reasonably put any cache-synchronization code. That said, what do other SCSI HCDs do?”.

In today’s miscellaneous items: Chinang Ma posted a comparative performance analysis between RHEL5.4 kernel 2.6.18 and upstream 2.6.33-rc4 in which he found a 0.8% OLTP performance regression, Simon Kagstrom sent a “provoke crash” mail in which he described a module to force crashes for testing, Mark Lord wondered why he was seeing a large number of “page allocation failure” messages on upgrade from 2.6.31.5 to 2.6.32.5, a continuation of previous style discussions concerning 80 character line length “limits” in the kernel, a question from Andi Kleen as to whether the PnP probe code (for PS/2 mice in this particular instance) is racy as he experiences variable probe behavior, Christoph Lameter posted version 15 of “one of these year long projects to address fundamental issues in the Linux VM”, aka “SLAB fragmentation reduction”, Alex Chiang posted a patch to increase the maximum number of Infiniband HCAs per system from 32 to 64 in a “backwards-compatible manner” (hence only raising the limit to 64), and Al Viro posted an informative message entitled “Open Intents, lookup_instantiate_filp() And All That Shit(tm)” on his plans for handling atomic file open+possible create for NFS in the grand future.

In today’s announcements: Greg Kroah-Hartman announced the release of the 2.6.32.7 kernel (having previously announced the 2.6.32.6 earlier in the week and posting a series of review patches for 2.6.32.7). He also announced the 2.6.27.45 “long term release” kernel.

Clark Williams announced the latest version 0.63 of the rt-tests package is now available. This includes various utilities used to verify and experiment with the RT patchset that Thomas Gleixner and others maintain.

Mathieu Desnoyers announced the release of version 0.4.0 of his Userspace RCU library, which includes a few “minor API changes” as previously described. urcu is available for download at http://lttng.org/urcu.

Junio C Hamano announced version 1.7.0-rc1 of the Git SCM. The forthcoming release has a number of items in the draft release notes, including some behavior changes to “git push”, “git send-email” (no deep threads by default), “git status”, “git diff”, and various other goodies.

The latest kernel release was 2.6.33-rc6.

Andrew Morton posted an mm-of-the-moment (mmotm) for 2010-01-28-01-36.

Willy Tarreau announced version 2.4.37.8 of the 2.4 series kernel. It mainly includes fixes for a recently discovered vulnerability in the e1000 network driver that could allow a carefully crafted frame to skip over filtering.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

2010/01/24 Linux Kernel Podcast

February 10th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100124.mp3

For the weekend of January 24th, 2010, I’m Jon Masters with a summary of the week’s LKML traffic.

Linux 2.6.33-rc5. Linus Torvalds announced the release of the 2.6.33-rc5 kernel, noting that he didn’t “think there is anything earth-shaking here”. Mostly, the only new stuff was in the i915 and (new) DVB “Mantis” driver. Rafael J. Wysocki followed up with his usual list of regressions since the release of 2.6.32, for which there were no known fixes yet in Linus’ tree. The number has fallen a little, but there were still 23 unresolved.

devtmpfs. The devtmpfs filesystem is a shared memory filesystem used to mount /dev nodes that are needed even before udev starts on modern Linux systems (or for those systems that do not use udev, to provide a minimum environment). The suggestion had been made to remove the EXPERIMENTAL flag on its configuration option and enable it by default. The latter received complaints as a change in behavior that would be visible to users, even if many of them would need to have devtmpfs enabled for the most recent Linux distributions.

Interruptions. Steven Rostedt, and Peter Zijlstra did some analysis of the kernel source tree, looking for inappropriate setting of TASK_*INTERRUPTIBLE (which should never be done explicitly, and in general one should always use the set_current_state macro). They found a fairly large number of incorrect code paths and posted a list of “examples of likely bugs”. David Daney replied, asking what kind of barrier should be implied in using set_current_state, as pertains to the visibility of this assignment by other CPUs.

IO error semantics. Nick Piggin started a thread entitled “IO error semantics”, in which he raised the ugly issue of kernel IO error handling behavior once again, as he said he had done during Andi Kleen’s posting of HWPOISON patches. Nick sought to clearly define specific anticipated behaviors in response to “read IOs”, “write IOs”, and so forth – how many retries? etc. He also made the point that write IO errors should not invalidate the data before an IO error is returned to “somebody” (fsync or synchronous write syscall).

NOIO. Rafael J. Wysocki posted an initial PM patch implementing forced GFP_NOIO during suspend operations (preventing the kernel from attempting to allocate memory by going to e.g. disk to offload some existing unused pages), this was largely in reaction to specific issues with the Nvidia closed source binary driver, but was something that had apparently been on the cards for some time. The problem with the patch was that it changed the VM according to the state of the system, rather than relying upon drivers to do the right thing in using explicit GFP_NOIO allocations during suspend and resume routines.

In the week’s miscellaneous items: Tejun Heo posted version 3 of his concurrency managed workqueue patches, Peter Anvin proposed the rapid removal of CONFIG_X86_CPU_DEBUG (since all such information is already exposed elsewhere), the addition of “nopat” boot option documentation to Documentation/kernel-parameters.txt by Jiri Kosina, ongoing discussion of generalization of certain PCI functions in the wake of an intention to merge various Xilinx PCI support bits, a cache coherency problem with mmaped writes on ARM systems posted by Anfei Zhou, a patch correcting priority inheritance deboosting in the RT kernel patchset to be POSIX compliant, Dimitry Golubovsky inquired as to the current state of UML (User Mode Linux, not the silly and pointless modelling technique) development, some Restricted Access Register (Intel MID platform) patches from Mark Allyn, and a large number of floppy (yes, floppy) cleanups from Joe Perches.

In the week’s announcements: Linux 2.6.31.12 and 2.6.32.5 (preceded by the 2.6.32.4 kernel earlier in the week) were released by Greg Kroah-Hartman. Greg stated that he no longer intended to update the .31 stable kernel short of “something really odd happening”. Greg repeated his previous assertions that the .27 kernel would live on as a “long term” stable release (but probably only for 6 more months of viability), and that the .32 kernel would also be a “long term release” because a number of distributions were apparently basing their distributions around it. His efforts depend upon engineers working on those distributions to help.

Len Brown announced that the Linux Power Management Mini-Summit would be held in Boston on Monday, August 9th 2010, the day before the LinuxCon 2010. For further information, refer to http://events.linuxfoundation.org/.

Mathieu Desnoyers (whose excellent PhD thesis was published recently and covered by LWN) announced an updated LTTng 0.187 for the 2.6.32.4 kernel.

Junio C Hamano announced Git 1.6.6.1 is now available from the kernel.org site at http://www.kernel.org/pub/software/scm/git/. The latest version contains fixes for issues such as “git blame” not working when a commit lacked an author name, “git count-objects” not handling packfiles larger than 4G on platforms with a 32-bit off_t, “git rebase -i” not aborting cleanly if it failed to start the user’s EDITOR, some issues with the GIT_WORK_TREE environment variable, and more besides.

Thomas Gleixner announced the release of the 2.6.31.12-rt20 RT patchset. This was a forward port to 2.6.31.12, which included a number of RCU assumption fixes, the aforementioned PI POSIX compliance fix, and so forth. Thomas acknowledged the delay in releasing a new version of the patch, but noted that various locking infrastructure changes had gone upstream (advancing the cause of mainlining various bits of RT). There will be no 2.6.32-rt; the patchset will skip directly over to 2.6.33. He also let us know about a new “housemate” of his: http://tglx.de/~tglx/housemate.png.

Sorry for the delay in getting this episode released.

Updates coming!

February 9th, 2010 jcm No comments

Folks,

A couple of weeks of updates are coming, hopefully tonight. I am planning to get back into a routine here. Thanks for being patient!

Jon.

2010/01/17 Linux Kernel Podcast

January 18th, 2010 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20100117.mp3

For the weekend of January 17th, 2010, I’m Jon Masters with a summary of the past week’s (and some holiday) LKML traffic.

Yep, we’re back, and ready for 2010.

In this weekend’s issue: A new format (version 3.0), async page faults, async suspend and resume, feature removal, mod_timer_msec, page allocator, raw_spinlocks, and Linus’ birthday.

A new format. Ever since I started doing these podcasts last May, it’s been a constant struggle to find the time each day to prepare and produce a show. It often takes upwards of an hour to prepare the material and produce the show. I have asked for volunteers, but clearly this will remain a one man effort for the moment. I’ve been sick for the past week (just working on day job stuff) and had a lot of hassles over the holidays that prevented me from pushing updates. It got me thinking about how I can make this show better – easier to produce, and more reliable. I have decided that in 2010 I will try to produce a daily show, but I will only “commit” to a weekly show covering the events of the past week (which will be longer format and produced on Sundays). Days that don’t have a show will be rolled into the next rather than producing multiple episodes that “catchup” the record (so I won’t bother covering a linux-next tree that’s already a week out of date). I would love to do a show every single day, but I need volunteers to help make that a reality (I need time to write on other projects and read a crazy backlog of books). What really matters is that there’s some kind of reference tracking what happened each day. So you’ll just have to make do with some episodes covering multiple days at a go. Unless you want to volunteer with production…

Async page fault. There was some discussion concerning asynchronous page faults. Hiroyuki Kamezawa originally posted a patch over the holidays that was now in its third iteration. “Asynchronous page fault” support is intended to help highly threaded applications deal with page faults without needing to contend for the per-process (containing multiple threads) mm->mmap_sem. This is achieved through a certain amount of speculative vma handling (knowing that it might be modified or unmapped without the protection of a lock) and falling back to taking the lock if the VMA RB tree is modified while walking it. There were some issues surrounding atomicity with the patches that Hiroyuki posted, while others (including benh) were concerned that some arches make various assumptions about mmap_sem being held. Peter Zijlstra followed up several days later with version 3 of the “speculative pagefault” patch series, making use of RCU for freeing vm_area_struct (VMAs). On yet another VM note, Gleb Natapov posted an updated round of “Add host swap event notifications for PV guest” patches (that allow guests to know about host page faults), and Mathieu Desnoyers posted version 2 of a patch series implementing a new system call that is named sys_membarrier (forces a process-wide barrier).

Async suspend and resume. Rafael J. Wysocki posted some updated benchmarks for his asynchronous suspend and resume patches, for which he replaced the rotational disk with a solid state device and enabled KMS. His results suggest that the asynchronous code roughly halves suspend time (300-350 ms vs. 600-700 ms) and is much faster at resuming (1.1-1.2 seconds vs. 4 seconds). That might give Apple a run for their money.

Feature removal. Robert P. J. Day asked about Documentation/feature-removal-schedule.txt and whether it was going to be updated soon (since some items referred to 2005).

mod_timer_msec. Arjan van de Ven posted a patch (intended for drivers) implementing mod_timer_msec, which can be used to set or change a timer to expire a relative number of milliseconds from now. This allows Arjan (and others) to remove the need for certain drivers to work directly with jiffies and HZ.
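
As a rough illustration of what such a helper hides from a driver, here is a small Python model. The signature of Arjan’s mod_timer_msec is not given above, so the one here is an assumption, as are the Timer class, the HZ value of 250, and passing the current jiffies count in explicitly. The point is only that the millisecond-to-tick conversion (rounded up, so the timer never fires early) lives in one place.

```python
HZ = 250  # assumed tick rate; real kernels are configured with various values


def msecs_to_jiffies(msecs, hz=HZ):
    """Convert milliseconds to timer ticks, rounding up to a whole tick."""
    return (msecs * hz + 999) // 1000


class Timer:
    """Hypothetical stand-in for a kernel timer: just an expiry in jiffies."""

    def __init__(self):
        self.expires = None


def mod_timer(timer, expires):
    """Model of the absolute interface: expiry given directly in jiffies."""
    timer.expires = expires


def mod_timer_msec(timer, delay_ms, now_jiffies, hz=HZ):
    """Model of the relative interface: the caller thinks only in milliseconds."""
    mod_timer(timer, now_jiffies + msecs_to_jiffies(delay_ms, hz))
```

With HZ at 250 a tick is 4 ms, so a 10 ms delay becomes 3 ticks rather than 2.5. A driver calling the relative helper never has to know that.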

Page allocator. Mel Gorman posted a patch from Corrado Zoccolo that divides freed pages into two classes – those that have a high probability of being merged with their next-highest buddy, and those that do not. Pages unlikely to be merged in the near future are handed out from the freelist in preference to those that might be mergeable, which gives the latter time to coalesce. The newly merged buddy pages make higher order allocations available and reduce fragmentation.
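
Here is one way the classification could work, sketched as a toy buddy allocator in Python. It is a guess at the heuristic, not the patch itself: the free_block function, the four-order limit, and the per-order deques are all made up for illustration. A freed block whose buddy is still in use is treated as “likely to merge soon” when the next-highest buddy is already free, and is queued at the tail of its freelist. Allocation is assumed to take from the head, so tail entries are handed out last.

```python
from collections import deque

MAX_ORDER = 4  # toy value: orders 0..3 exist in this model


def buddy_of(idx, order):
    """Index of the buddy of the 2**order-page block starting at page idx."""
    return idx ^ (1 << order)


def free_block(free_lists, idx, order):
    """Free a block; return (index, order, placement) of the resulting entry."""
    # Ordinary buddy merging: coalesce for as long as the buddy is free.
    while order < MAX_ORDER - 1:
        buddy = buddy_of(idx, order)
        if buddy not in free_lists[order]:
            break
        free_lists[order].remove(buddy)
        idx = min(idx, buddy)
        order += 1

    # The buddy is busy. If the next-highest buddy is already free, this
    # block will merge twice as soon as its own buddy is released, so keep
    # it away from the allocator by queueing it at the tail.
    placement = "head"
    if order < MAX_ORDER - 2:
        combined = idx & ~(1 << order)
        if buddy_of(combined, order + 1) in free_lists[order + 1]:
            placement = "tail"

    if placement == "tail":
        free_lists[order].append(idx)
    else:
        free_lists[order].appendleft(idx)
    return idx, order, placement
```

For instance, with pages 2-3 already free as an order-1 block, freeing page 0 lands at the tail (page 1 is its only obstacle to forming an order-2 block), while freeing an isolated page 8 lands at the head and will be reused first.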

raw_spinlocks. John Kacur, noting that Thomas Gleixner’s recent work had freed up the “raw_spinlock” name within the kernel for re-use, posted a number of patches converting existing spinlocks over to raw_spinlocks. This is required in the RT tree, wherein all spinlocks are by default converted to a sleeping variety, but certain locks must explicitly remain non-sleeping. Locks that must always be true spinlocks should use raw_spinlock.

Finally today, Linus’ birthday. It was noted by a few posters that Linus’ birthday is December 28th. Linux Journal had an article, and as these things are wont to do, Linus’ reply took the conversation off on a tangent about 387 co-processors, to which Avi Kivity added some remarks on the original design.

In today’s miscellaneous items: a new version of kFIFO (Stefani Seibold – and an even newer version on January 14th that is re-implemented and apparently does not require any changes to existing users of the old kFIFO API), an XFS status update from Christoph Hellwig (who notes that ongoing work includes support for new event tracing code and mkfs.xfs default support for “lazy superblock counters”), a refresh of the jump labeling patches (v4) from Jason Baron, a patch from Dave Jones removing his name from checkpatch.pl (so people stop asking him about it), initial PCI support for Xilinx Microblaze (and a discussion about generic PCI support file locations), a modpost patch implementing support for ELF objects with greater than 64K sections – recall that this has to be handled specially in ELF – for use when compiling a kernel with -ffunction-sections and having e.g. an allyesconfig build, and a mini-rant from Stefan Richter concerning “Changelog quality” that is worth reading. Finally, Dan Williams (the Intel one) is taking over maintainership of I/OAT.
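
On the ELF point above: the header’s e_shnum and e_shstrndx fields are only 16 bits wide, so the format has an escape hatch. When there are too many sections, e_shnum is written as 0 and the real count is stored in the sh_size field of section header 0. Likewise, e_shstrndx is written as SHN_XINDEX (0xffff) and the real index is stored in that same header’s sh_link field. The Python sketch below shows the lookup. It is not the modpost patch, the function name is made up, and it assumes a little-endian ELF64 image.

```python
import struct

SHN_XINDEX = 0xFFFF  # escape value: "the real index is in section header 0"


def section_count_and_strndx(data):
    """Return (section count, section-name string table index) of an ELF image.

    Illustrative helper for little-endian ELF64 only.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only little-endian ELF64")

    # Elf64_Ehdr: e_shoff at 0x28, e_shnum at 0x3C, e_shstrndx at 0x3E.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shnum, e_shstrndx = struct.unpack_from("<HH", data, 0x3C)

    if e_shoff and (e_shnum == 0 or e_shstrndx == SHN_XINDEX):
        # Elf64_Shdr: sh_size at offset 32, sh_link at offset 40.
        sh_size, sh_link = struct.unpack_from("<QI", data, e_shoff + 32)
        if e_shnum == 0:
            e_shnum = sh_size
        if e_shstrndx == SHN_XINDEX:
            e_shstrndx = sh_link
    return e_shnum, e_shstrndx
```

Any tool that trusts the 16-bit header fields alone will see such an object as having no sections at all, which is why a tool like modpost needs explicit support.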

In today’s announcements: LTTng version 0.186o for 2.6.32-rc8 and Userspace RCU 0.3.3 were both released by Mathieu Desnoyers on January 4th. He posted an updated Userspace RCU 0.3.4 on January 10th, which had some additional fixes.

LTP. Subrata Modak announced the December 2009 Linux Test Project release. It included a number of build system fixes for various distributions.

rt-tests version 0.60. Clark Williams announced the latest version of rt-tests on December 29th. It includes a new ‘pip’ (Priority Inheritance stress test) from John Kacur and adds an unbuffered output option to cyclictest for those parsing the output at runtime. The source is available on git.kernel.org.

smatch 1.54. Dan Carpenter announced version 1.54 of his “smatch” static source code checker tool for C programs such as the Linux kernel. His intention is for “smatch” to become “a smarter version of checkpatch.pl”. It includes cool things like a check for DMA use on the kernel stack.

SystemTap version 1.1. David Smith announced the 1.1 release of SystemTap. This includes better support for gcc 4.5 “richer” debuginfo, amongst other fixes. It’s available from http://sourceware.org/systemtap. Frank Ch. Eigler followed up to note that this release also includes a fix for CVE-2009-4273.

util-linux-ng v2.17. Karel Zak announced the latest 2.17 release of util-linux-ng, which contains a number of new features (an fallocate command, unshare command, wipefs command), and updates to libblkid, blockdev, fdisk, and other fixes.

The latest kernel release is 2.6.33-rc4. There was an rc3 too, but that was pretty small “due to the holidays”. 2.6.33-rc4 had few updates, but 40% of those were in DRM (nouveau and radeon in the staging tree, i915 updated too), and Linus called that out as being “unusual”. There were also some bootloader issues on non-x86 systems, but Linus figured “we’ll sort it out”.

Andrew Morton posted an mm-of-the-moment (mmotm) for 2010-01-15-15-34.

Greg Kroah-Hartman posted some stable review patches for 2.6.32.4, 2.6.31.12 (there had previously been a 2.6.31.10 that was replaced with a build fix in 2.6.31.11), and 2.6.27.44.

Stephen Rothwell posted a linux-next tree for January 14th (and announced that there would be none for January 15th). Since Wednesday, there was a new mtd-current tree, Linus’ tree still had a build failure for which Stephen reverted a commit, the net-current tree lost its conflict, and the tip tree gained a conflict against the kgdb tree. The total sub-tree count increased to 156.

That’s a summary of today’s Linux Kernel Mailing List traffic. For further information, visit www.kernel.org. I’m Jon Masters.
