Archive

Archive for July, 2009

2009/07/27 Linux Kernel Podcast

July 30th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090727.mp3

For Monday, July 27th, 2009 I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Abuse, dynamic ticks, HVM info, and the tty layer.

Abuse. We’ve seen filesystems in userspace, character devices in userspace, and even generic interrupt abstractions (from Thomas Gleixner). Now it seems only logical that we would also receive the blessing that is a block device layer implemented in…userspace. Zachary Amsden posted an (RFC) patch implementing such a scary notion. The interface uses ioctl()s to create the device, then passes bios to and from the kernel. I hate to think what’ll happen if someone tries to do swap-on-userspace block devices and runs into low memory situations. If this ever merges, there’ll be a lot of caveats. But Kudos to Zachary for getting this discussion going – I’m sure there’ll be a nice Linux Weekly News summary of the design if it persists.

Dynamic ticks. As you may be aware, modern Linux kernels support dynamic tick based timer interrupts. Under this new world order, Linux replaces the traditional timer tick with the scheduling of a (High Resolution) timer event to expire whenever it next would actually need to wake up. There is also some special case handling and a concept of going tickless (it’s not possible for the kernel to operate in this mode all of the time) in which the various CPUs might enter a lower power state, and so forth. But one problem that has existed up until now (on 32-bit systems) involved the inability for any sleep period to exceed about 2.15 seconds, due to the wrapping of a 32-bit quantity in the clocksource representing the hardware delivered time. Jon Hunter has a patch series that aims to detect and avoid this – potentially allowing idle systems to sleep for much longer periods of time than a few seconds.
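
The figure of about 2.15 seconds is what you would expect if the wrapping quantity is a signed 32-bit count of nanoseconds. That reading is an assumption on my part rather than something stated in the summary above; a minimal sketch of the arithmetic under that assumption:

```python
# Assumption: the limit comes from a signed 32-bit nanosecond count,
# whose largest positive value is 2**31 - 1.
NSEC_PER_SEC = 1_000_000_000

def max_sleep_seconds(bits: int = 32, signed: bool = True) -> float:
    """Longest interval, in seconds, representable as a nanosecond count."""
    usable_bits = bits - 1 if signed else bits
    return (2 ** usable_bits - 1) / NSEC_PER_SEC

print(round(max_sleep_seconds(), 2))  # 2.15
```

The same function with bits=64 gives a limit measured in centuries, which is why the problem is specific to 32-bit quantities.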

hvminfo? A thread started elsewhere was brought over to LKML by Jan Kiszka in which the various usual suspects have been debating how to properly express hypervisor capability bits, for which it turns out there are quite a few. There has been some concern that simply adding them to /proc/cpuinfo would eventually create a list similar in length to that currently representing CPU feature capability bits. For that reason, /proc/hvminfo has been proposed. This would be of special interest to those with Real Time systems, allowing one to determine whether Real Time determinism can actually be expected.

The tty layer. Those who follow LKML closely (or read LWN) know what’s coming, but I’ll save the plot twist until Tuesday’s summary. In any case, as you may be aware, the TTY (and PTY) layer provide support for a wide range of interactions with Linux systems, even (especially) today. This includes the terminal upon which this podcast is being prepared, the one used by your gnome-terminal, your 3G (also perhaps POTS) modem pppd session, etc. For something we all rely upon, this code has had little love in a long time. As Linux Weekly News reports this week, that is something Alan Cox has been trying to correct. He has reworked the locking, fixed some DoS attacks (and proposed a kernel hack for catching NULL pointer “execution” attempts), and otherwise attempted somewhat of an overhaul. But it seems every time you touch one thing in that code, another breakage rears its ugly head (largely because the various standards are wiggly). Today’s conversations started out being around a breakage in “kdesu”, but ultimately that fix turned out to be the least of Alan’s worries. Emacs makes some (admittedly rather awkward) assumptions about behavior on close() – that the TTY layer has completed processing and delivery to the other end – and then there’s a locking problem experienced by Stephen Rothwell (and probably others) on boot. All in all, the recent experiences had Linus calling to revert patches – something that is harder than it sounds due to layer dependencies, and the fact that actual serious bugs still need to be fixed, especially DoS bugs, whatever happens.

In today’s miscellaneous items: a fix for a “section mismatch” in i386 init caused when CONFIG_HOTPLUG_CPU is enabled, since the code might be needed even after init completes (Robert Richter), some x86 fixes (Peter Anvin), a discussion concerning newer low-voltage MMC parts and how these may (or may not) be handled by the existing codebase (for which there is no maintainer), some updates for microblaze (Michal Simek), some perfcounter fixes for powerpc (either coming in through the powerpc tree or directly via Peter Zijlstra – that is currently to be decided), mention of an infinite loop in get_futex_key in 2.6.31-rc4 (Jens Rosenboom), a suggestion that checkpatch somehow enforce all new config options have a help summary (Johannes Berg), an academic question surrounding the navigation of task page tables in a physical memory image dump file (M. Shuaib Khan), some feedback discussion from Vivek Goyal concerning benchmarks performed on his latest IO scheduler controller patches provided by Gui Jianfeng in which a 7% performance loss was being observed (for “normal” writes), a large number (22) of patches implementing a WM831x hardware monitoring driver, that is fairly intrusive (Mark Brown), version 5 of a patch adding 1GB page support to KVM (Joerg Roedel), a weird lockdep problem in which David Howells believes lockdep is mischaracterizing two different locks as actually being the same one (a false positive), an RFC patch attempting to make AGP work with IOMMU (David Woodhouse), an issue with signal delivery not being guaranteed to reach a specific thread and yet being used to deliver performance counter events (which are thread specific) was raised (Stephane Eranian), a number of fanotify followups (Eric Paris), version 2 of a cleaned up, simplified RCU patch from Paul E. McKenney, some networking updates from David Miller, an attempt at full NAT support for IPVS, and Amerigo Wang noticed that setting CONFIG_SYSFS_DEPRECATED_V2 is required in order to boot recent kernels on older distribution userlands, such as RHEL5.

In today’s announcements: 2.6.31-rc4-rt1. Thomas Gleixner is back with another thrilling installment of the RT patch. As he summarizes in his announcement, a decision was made to skip over .30 and go straight from .29 to .31 in order to avoid having to play catchup (and other reasons). Thomas sent a very detailed and very useful announcement, which I can only summarize here, but for the full detail, refer to his announcement. In the latest release, interrupt threads become a simple extension of the mainline ones, so it’s now possible to schedule thread priority at the device (not interrupt line) level. Also, this patch introduces a major change to RT locking. Gone is the use of cunning compile-time hacks, and in is the introduction of the atomic_spinlock_t (which is similar to the old raw_spinlock – alternative names are welcome, but off-list, since Thomas wishes to avoid bikeshed painting exercises), that is used for a few specific locks that cannot be replaced with sleeping versions. Thomas also takes the opportunity to clean up semaphores (as previously reported), and has begun to use git on an initially limited basis. The next main target, once he returns from 10 days of vacation (helping run a summer camp for kids) will be shooting down the Big Kernel Lock for good.

The latest kernel release is 2.6.31-rc4, which was released last week.

Rafael J Wysocki followed up to his previous postings concerning individual regressions for which bugs on the kernel.org bugzilla instance had been filed with some updates – for example, realizing some issues had been introduced in even older kernel versions than those cited in the original reports. The number of regressions in the current RC remains somewhat concerning.

Stephen Rothwell posted a linux-next tree for July 27th. Since Friday, there are new “drbd” and “benh-mm” trees. The former is obvious (it’s the distributed replicated block device that we’ve covered on numerous occasions prior to now), while the latter is temporary only while an API change is made on Cell architecture. The powerpc tree still fails to build in an allyesconfig build configuration, and overall the tree lost build failures (when one considers also a reverted commit from the “rr” tree, and a patch Stephen did for drbd to resolve a build failure in the latter). The total sub-tree count has increased yet again, up to 134 trees in the latest linux-next tree compose.

That’s a summary of today’s LKML traffic. For further information visit kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/07/26 Linux Kernel Podcast

July 28th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090726.mp3

For the weekend of July 26th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Fanotify, IPI, MM, Mutexes, and Scheduling.

Fanotify. Eric Paris posted to let everyone know (in a thread entitled “fanotify – overall design before I start sending patches”) that he plans to begin sending patches for fanotify in the next week or two. Meanwhile, he would like to receive any helpful comments about the existing design. As he notes in his lengthy email on the subject, fanotify is a notification system originally inspired by the anti-malware vendor’s perceived need to monitor and “approve” certain file operations before they are allowed to proceed. Eric has done a great job of moving this forward from the abstract concept it once was.

IPI (Inter-Processor-Interrupts). Xiao Guangrong posted a patch implementing a lockless version of call_function_data. This is used as part of the generic IPI signalling as part of the core Linux SMP support and presumably represented a bottleneck for Fujitsu. Andrew Morton says it looks good to him, and I have no further information about what this buys in real world performance.

MM. David Rientjes posted a patch that was a variation on the previous theme of inherited oom_adj (“badness”) values for tasks. Rather than have tasks simply inherit the oom_adj value, as required by the existing implementation (in which the value is literally shared between tasks during a clone), David introduces a new oom_adj_child (/proc/pid/oom_adj_child) that can (only) be increased from userspace to make cloned threads more likely to get killed.

[Correction: I mean one doesn't have multiple simultaneous holders of a mutex, since it's not a "counting lock", but I seem to have phrased this badly]

Mutexes. The Real Time patch performs various changes to the core kernel as it converts spinlocks into sleeping mutexes and generally converts semaphore users into mutexes as well (since mutexes are essentially a special case of semaphore in which the lock is either held or not, but without the additional potential for multiple waiters). Thomas Gleixner and others have done a good job (as part of their work on the Real Time patchset) at locating the obvious mutex candidates and converting them over in the patchset, but there still remain various examples in the unpatched upstream kernel of obvious mutex conversion candidate locks. Thomas posted a 37 part patch series aimed at implementing the switch for many of these example locks. He requested one of the patches be merged immediately, since it is merely preparatory pre-.32.
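
The distinction drawn in the correction above (a mutex has at most one holder, a counting semaphore may have several) can be shown with a small userspace analogy. This uses Python's threading primitives, not the kernel's mutex or semaphore implementations, and the helper name try_twice is made up for the illustration:

```python
import threading

mutex = threading.Lock()            # either held or not held
counting = threading.Semaphore(2)   # allows up to two simultaneous holders

def try_twice(lock) -> tuple:
    """Attempt two non-blocking acquires and report which succeeded."""
    first = lock.acquire(blocking=False)
    second = lock.acquire(blocking=False)
    # Release only what was actually taken so the object can be reused.
    for taken in (first, second):
        if taken:
            lock.release()
    return first, second

print(try_twice(mutex))     # (True, False)
print(try_twice(counting))  # (True, True)
```

A semaphore initialised with a count of one behaves like the first case, which is roughly why such users are candidates for conversion.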

Scheduling. Sen Wang posted a complaint about the Real Time scheduler algorithm. Apparently, when enabling rt_bandwidth (throttling), Sen is surprised (and deems it ridiculous) to see the idle task getting picked. But Peter Zijlstra points out that it is sometimes necessary to give other tasks a little time to run, and that this is why the throttle was implemented (I believe it’s around 5%) once Real Time tasks exceed a certain threshold. And, as Peter points out, if one doesn’t like the threshold, it can be disabled. Linux Weekly News previously covered this topic on their Kernel Page in greater detail, back when it was first introduced (unless I’m crazy).
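
The 5% figure matches the usual defaults of the two sysctls that control the throttle, kernel.sched_rt_period_us and kernel.sched_rt_runtime_us. The values below are the mainline defaults as I understand them, so treat them as an assumption and check your own system; a rough sketch of the calculation:

```python
# Assumed defaults, in microseconds. Writing -1 to sched_rt_runtime_us
# disables the throttle entirely.
SCHED_RT_PERIOD_US = 1_000_000
SCHED_RT_RUNTIME_US = 950_000

def non_rt_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period left over for non-realtime tasks."""
    if runtime_us < 0:   # -1 means Real Time tasks are not throttled
        return 0.0
    return 1.0 - runtime_us / period_us

print(f"{non_rt_share(SCHED_RT_RUNTIME_US, SCHED_RT_PERIOD_US):.0%}")  # 5%
```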

In today’s miscellaneous items: some ALSA updates (Takashi Iwai), some ISDN updates (Karsten Keil), a potential SLQB allocator locking bug (Sebastian Andrzej Siewior), a potential serial USB regression (“Hartmut”, who goes by the email address of “e9hack” and doesn’t include a full name), version 2 of the previously mentioned patch series implementing userspace MMU mapping change event notification, version 7 of the IO scheduler IO controller patches (Vivek Goyal) – which includes a group_idling feature similar to CFQ’s slice_idle that is intended to aid with fairness and a lot of other changes, a fix to stop tracing in oops_enter() (Steven Rostedt), another suggestion that built-in modules are included in /sys (Tomas Carnecky), some v4l/dvb fixes (Mauro Carvalho Chehab), a number of consolidated kmemleak updates (Catalin Marinas) that are getting ready for the next merge window, a question concerning ext4 online debugging (Clemens Eisserer), some S390 patches (Martin Schwidefsky), wall time support for the ring_buffer (Zhao Lei), version 2 of the patch series previously covered implementing uid mount options for ext2/3, take two of the FAT root timestamp patches, version 3 of the kcore cleanup patches (Kamezawa Hiroyuki), and the addition of an EXPORT_SYMBOL for kmap_atomic_prot as required by TTM (Thomas Hellstrom). Tejun Heo was unsurprised to learn that the patches he had previously posted – and explicitly said he was unable to test on real hardware – converting IA64 to dynamic per-cpu allocation did indeed not boot on real IA64 systems.

Finally today, Laurent Pinchart would like some advice concerning the preference for using kmap vs. kmap_atomic, and in particular the pressure placed upon the VM by the possibilities. He could use kmap outside of interrupt context, which is expensive (but needed infrequently), or repeatedly use atomic mappings from within the interrupt itself, brief in duration. He is concerned that keeping many pages kmap()ed for a long time is unpleasant but perhaps less so than calling kmap_atomic 4500 times per second for a 640×480fps video stream. Perhaps some folks will offer him advice.
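
For a sense of where a number like 4500 can come from: one page mapping per page of frame data, at 2 bytes per pixel, 4096-byte pages and 30 frames per second, gives exactly that figure. None of those three parameters is stated above, so they are guesses that happen to reproduce the number, not Laurent's actual configuration:

```python
def page_mappings_per_second(width: int, height: int, bytes_per_pixel: int,
                             fps: int, page_size: int = 4096) -> int:
    """Mappings needed per second if every page of every frame is mapped once."""
    frame_bytes = width * height * bytes_per_pixel
    pages_per_frame = -(-frame_bytes // page_size)   # integer ceiling division
    return pages_per_frame * fps

# Assumed parameters: 2 bytes per pixel, 30 frames per second, 4 KiB pages.
print(page_mappings_per_second(640, 480, 2, 30))  # 4500
```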

In today’s announcements: Linux 2.4.37.4. Although many have long since moved to 2.6, the venerable 2.4 series kernel remains widely used (especially in older embedded systems), and as such Willy Tarreau does a great job maintaining it for its users. In the latest release, a build error is fixed, NULL pointer security issues with mmap_min_addr are discussed (in the announcement), and various other minor fixes are provided also. Willy notes that the security fixes only really guard against faulty setuid root tasks, since only suitably privileged tasks can map the zero page in any case.

Containers version 0.6.3. Daniel Lezcano posted to let everyone know about the latest release of the Linux container “lxc” tools.

Git version 1.6.4.rc3. Junio C Hamano posted RC3 of the Git 1.6.4 release. The impending release was already covered in the last podcast.

Man Pages. Michael Kerrisk announced the release of version 3.22 of the kernel man pages. Since you might read his blog, I would like to also draw attention to his forthcoming book (No Starch Press) on the Linux kernel-userspace and glibc APIs. Watch for that in 2010.

The latest kernel release is 2.6.31-rc4, which was released by Linus last week. A number of regressions have been reported (including in tools), so it seems unlikely that we’ll be ready for a .31 final yet.

Rafael J. Wysocki took a break from being merely awesome to be more awesome in compiling a list of existing regressions between 2.6.30 and 2.6.31-rc4. The total number of reported regressions is generally increasing (a bad sign), having doubled over the past month, of which more than half are unresolved. Most of these are driver problems (as perhaps expected), however there are various core kernel concerns in there also. These include a boot failure (Gene Heskett), various suspend/resume problems, more tty layer instabilities, another lockdep limit hitting bug (didn’t we just raise the limit?), and a VM problem. And those are just the regressions Rafael posted without patches; there are a number of other issues for which patches are known to exist.

Greg Kroah-Hartman pushed out another round of stable kernel updates (2.6.30.3 and 2.6.27.28), which aim to resolve a boot problem some have experienced. On a related note John Hawley noted that requests to update the front page of the kernel.org web pages were worthwhile, especially in pointing out “long term” releases (such as .27) however the scripts currently used are aging and nobody has had enough time recently to overhaul those and deal with the other recent activities. One would generally assume geodns is more useful right now but obviously these scripts will get fixed up in due course.

Stephen Rothwell posted a linux-next tree for July 24th. Since Thursday, the tree still fails to build in an allyesconfig build configuration on powerpc, the sound tree lost its conflict, and the ttydev tree lost its build failure but gained another for which a patch was applied. The current sub-tree count in the latest compose remains consistent at 134 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2009/07/23 Linux Kernel Podcast

July 28th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090723.mp3

For Thursday, July 23rd 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: AMD(64), binutils, group scheduler fairness, MMC, (lockless) ring buffer, and removable USB serial devices.

AMD(64). Michael Tokarev posted asking why the kernel would generate the warning message “WARNING: This combination of AMDprocessors is not suitable for SMP” (in which “AMDprocessors” is printed as one word). Andre Przywara (AMD) picked up on the fact that this warning was coming from within KVM guests with the response that the qemu64 CPU model provides guests with an AMD K7 CPU with 64bit extensions. But only some K7 CPUs (Athlon-MP) were actually certified for SMP support, so this warning pertains to that, and of course isn’t a problem with KVM. Still, Andre tells us that a new “safe64” CPU type using family 15 instead of 6 will be appearing sometime soon. As a last ditch solution to make the harmless warning disappear, one can also specify “-cpu host” to pass the host CPU capabilities directly to guests, losing many migration capabilities (to other CPUs) in the process, as Andre notes.

Binutils. The kernel relies upon tools such as GCC (compiler) and GNU binutils (linker) during build. Recent builds of binutils generate broken kernel builds, but Alan Modra points out that this is because the kernel linker script fails to account for all generated sections, leading GNU ld to need to guess the location in the generated memory map for the kernel image. In the specific case in question, the kernel build breaks because a section named .data_nosave immediately follows an alignment directive, but the linker inserts another “orphan” section between these, breaking the alignment. Since the bug isn’t technically in GNU ld, the kernel linker script will probably need to be changed in any case, to avoid similar kinds of breakage occurring.

Group scheduler fairness. Bharata B Rao posted to let everyone know that the CFS group scheduler is currently not behaving (“completely”) fairly in tests and that this can be traced to a load-balancing consistency patch recently added that came from Ken Chen. The tests show a marked difference beginning with post-2.6.29-rc1 kernels, so this regression has existed for a while.

MMC. A new MMC maintainer is being sought, as we have covered previously. Andrew Morton noted that he became the de facto maintainer as he is for “about 1000 other identifiable subsystems”, while others have offered to help. This gets more interesting though because Stephen Rothwell’s suggestion of replacing the (currently quite empty) “mmc” linux-next sub-tree with one containing some of the various pending patches was met (from Andrew Morton) with the (tangential) assertion that he could just jiggle his mm tree around so that he no longer pulled linux-next as part of it, and Stephen could then pull the core mm bits directly. Quite a tangent indeed.

Ring buffer (lockless). Lai Jiangshan wasn’t too happy with Steven Rostedt’s existing lockless ring buffer implementation, deeming it too complex (he actually said: “And there are no more than 5 guys in the world understand it, I bet”). His posted patch shifts the complication to the read case, though he implies that it remains lockless in the process of his changes. Steven said he’d take a detailed look at the patches, which he did. He noted that they don’t work as posted (failing stress tests), that he has documented the existing design, and that any new design should be similarly explained. As Steven summarized, “I’m all for simplifying the code, but it must still work at the end”. Steven also posted some unrelated tracing fixes.

Serial. Alan Stern wondered aloud about the correct handling for removable USB serial dongle devices, given a change in behavior between 2.6.30 and 2.6.31-rc. Previously, if a process (task) had a USB device open as it was removed, the device minor number would become immediately available. But this was changed so that it could not be re-used until all references to it via open files were closed. Alan objects to this, but understands the (potential) security implications of a timing-related/DoS attack. Still, he would like to weigh up the options and consider allowing the minor number to become available immediately upon a device being removed.

In today’s miscellaneous items: a rebased to 2.6.31-rc4 patch from Seto Hidetoshi causing a forced panic on oops when using magic sysrq-c triggered crash dumps, to preserve consistency between dumps triggered on the command line and those triggered using the keyboard combo, a “uid” mount option for ext2/ext3 filesystems that behaves similarly to the same option on e.g. vfat – allowing those with removable ext2/ext3 to deal with differing uids more easily than might have been the system dependent case before (Ludwig Nussel), a fix to MODULE_SYMBOL_PREFIX handling (Rusty Russell – who also cleaned up the comments in kthread_stop since we can handle signalling dead threads now), version 4 of the “flexible array implementation” patch (Dave Hansen), some jfs updates from Dave Kleikamp (whose email client never gives his name), some device-mapper fixes from Alasdair Kergon, some hardware breakpoint fixes (Thomas Gleixner), a number of lockdep patches (Peter Zijlstra), and version 4 of the asynchronous device actions patchset (Rui Zhang) that is designed to speed up individual device suspend/resume dynamic power management actions.

Finally today, Roel Kluin points out a read buffer overflow exploit in the smbfs filesystem code, and Valdis Kletnieks gets way too excited about keeping a piece of legacy USB trackball hardware that is failing – attempting to debug the USB stack shows some severe attachment to an outdated and irreplaceable piece of hardware but it’s probably time to let it give up the ghost now.

In today’s announcements: Git 1.6.4-rc2 and the Git User’s Survey. Junio C Hamano posted to let everyone know that a new RC release of git 1.6.4 is now available, along with draft release notes. Separately, Jakub Narebski posted to let everyone know that a 2009 “Git User’s Survey” is now available at git.or.cz/gitwiki/GitSurvey2009 (which was typo’d in the announce mail). And don’t forget that the LinuxConf AU (LCA) 2010 Call For Papers was extended by another week (apparently not due to lack of submissions) until July 31.

The latest kernel release is 2.6.31-rc4, which was released by Linus on Wednesday.

Stephen Rothwell posted a linux-next tree for July 23rd. Since Wednesday, the tree still fails to build in an allyesconfig build configuration on powerpc, the v4l-dvb and sound trees gained conflicts against the davinci tree, and Stephen remerged an updated Linus tree at the end to get a build fix. The current subtree count remains at 134 following yesterday’s addition.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.


2009/07/22 Linux Kernel Podcast

July 27th, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090722.mp3

This Podcast is brought to you in association with the humble coffee bean, providing constant uptime since sometime yesterday.

For Wednesday the 22nd of July, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Checkpatch, checkpoint and restart, High Resolution Timers and tasklets, Linux 2.6.27.27, Tux, Userspace MMU notifications, VFAT, and multiple hardware watchdogs.

Checkpatch. Several people have been discussing the checkpatch script and its whining about extraneous whitespace in kernel patches. Pavel Machek asked whether it would be possible to add autofixing to checkpatch, to which Bartlomiej Zolnierkiewicz pointed out that there are two other scripts already in the kernel tree that do this – scripts/cleanfile, and scripts/cleanpatch. He notes that it might be worth checkpatch pointing this out on whining.

Checkpoint and restart. Oren Laadan surfaced with another round of checkpoint patches that enable the kernel to preserve the current state of running applications so that they can be resumed later from that point. This is similar to how processes (tasks) are suspended and resumed upon hibernate but allows one (in theory anyway) to resume tasks on an entirely new system, following a reboot, or in various other contexts. The latest version introduces a new system call entitled “clone_with_pids()” to preset the pid(s) of created processes (if they are available with the current namespace). The patch includes an FAQ that points out that this already has some limited real world use cases that could actually work, a guide to actually using the patch, and a number of hints at the sorts of features yet to come. Checkpointing is a hard Computer Science topic that none of the existing attempts have ever gotten quite right, but one has to give Oren credit for his persistence.

High Resolution Timers and tasklets. Peter Zijlstra posted a patch implementing a generic tasklet_hrtimer infrastructure (and accompanying tasklet_hrtimer_init exported symbol) that allows one to register a combination of a hrtimer and a tasklet that will be scheduled the moment that the timer expires. Thomas Gleixner posted this patch in the (shared) tip tree and requested that Linus pull it into 2.6.31 since this is being requested by the network folks, who are waiting for it. Thomas noted that Peter has cleaned this patch up since the last posting, moving it to softirq.c and adding documentation and comments to aid others who might use it later. Separately, Thomas posted a bunch of other timer fixes for 2.6.31 in the form of a pull request to Linus.

Linux 2.6.27.27. Following on from earlier discussions concerning broken GCC code generation when specifying -fno-strict-overflow, the problem was tracked down to incorrect code generation in edid_checksum() in drivers/video/fbmon.c. Linus was genuinely impressed at the ability for Troy Moure to seemingly stare at tens of thousands of lines of disassembly diffs and work out where the code generation had failed, then concluded Troy might have instead simply looked for the section with the greatest change. But Troy instead replied that he had simply gotten lucky after looking at the boot output and perusing the radeonfb code on a hunch. As Troy says, “obviously I got a bit lucky that problem was actually basically where I started looking for it. But I figured even if I didn’t find it, I’d learn something about the radeonfb code. And who would pass up an opportunity to learn about that?”. In any case, Linus posted a patch that he thought might make the difference, which Krzysztof Oledzki noted did indeed fix his system, which could now boot, although there was still likely some broken code generation present because the console geometry was incorrect after running the affected radeonfb loop.

Tux. Some years ago now, Ingo Molnar (famous for inventing nearly everything, or at least re-writing it in an impossibly short period of time) wrote a static in-kernel web server named “tux”. This was in a time before dynamic web content (or even wide adoption of sendfile()) when slow machines struggled to push through large amounts of static content – and as Dave Jones points out (in referencing a humorous blog entry), especially to a certain high bandwidth internet user demographic interested in graphics and motion video – and so in-kernel optimization of such essential content delivery made a lot of sense. But times have moved on (and most Internet users desire dynamic content generation and streaming web video – especially the demographic mentioned before). Today, apache is more than good enough even for sites like youtube, and the mirrors on kernel.org (as Peter Anvin confirms in the same thread). So when Luis Rodriguez inquired as to the fate of the earlier khttpd that had inspired tux, he was told largely the same things – that there’s little incentive for an upstream in-kernel HTTP accelerator at this time. The whole thing served as a nice reminder of how far we have come since those days. And also gave us an opportunity to reference Dave’s humorous analysis once again.

Userspace MMU notifications. Roland Dreier posted an RFC patch series implementing userspace support for MMU notifications. This enables (for example) libraries in userspace using RDMA to track precisely when application code changes memory mappings via calls to functions like free(), munmap(), etc. It replaces the previous malloc hooks and other tricks with a robust implementation inspired by (but nevertheless not identical to) previous discussions (that were referenced). The patch series implements a new character device entitled ummunot that can be opened and upon which address range mapping change notifications can be requested of the kernel using ioctls. A fast path zero page offset mapping of the device allows quick checks for notification changes without (e)poll()ing, select()ing, or other syscalls.

VFAT. Something non-Microsoft related. Jorg Schummer noted that standard FAT implementations (especially in Linux) cannot store any of the FAT root directory’s timestamps. But there is a hackish way to implement this via timestamps on the FAT volume label, as supported by Mac OS X. Jorg posted a patch that implements the same on Linux, via a “rootts” mount option.
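
The trick works because a volume label is stored as an ordinary 32-byte directory entry, and so has timestamp fields in the standard FAT packed format. Here is a minimal sketch of decoding the last-write time and date from such an entry. Which of the entry's timestamp fields Jorg's patch or Mac OS X actually uses is not stated above, so reading the last-write fields is an assumption, and the sample entry is made up:

```python
import struct
from datetime import datetime

def decode_fat_timestamp(entry: bytes) -> datetime:
    """Decode the last-write time and date of a 32-byte FAT directory entry."""
    # Offsets 22 and 24 hold the write time and write date, little-endian.
    fat_time, fat_date = struct.unpack_from("<HH", entry, 22)
    return datetime(
        year=1980 + (fat_date >> 9),
        month=(fat_date >> 5) & 0x0F,
        day=fat_date & 0x1F,
        hour=fat_time >> 11,
        minute=(fat_time >> 5) & 0x3F,
        second=(fat_time & 0x1F) * 2,   # stored with 2-second granularity
    )

# Build a made-up volume label entry dated 2009-07-22 12:30:10.
entry = bytearray(32)
entry[0:11] = b"MYVOLUME   "   # 11-byte space-padded label (hypothetical)
entry[11] = 0x08               # attribute byte: volume label
struct.pack_into("<HH", entry, 22,
                 (12 << 11) | (30 << 5) | (10 // 2),
                 ((2009 - 1980) << 9) | (7 << 5) | 22)
print(decode_fat_timestamp(bytes(entry)))  # 2009-07-22 12:30:10
```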

Watchdog (multiple thereof). Simon Braunschmidt noted that he has an emlix board with two hardware watchdogs (woof! woof!) and would like to use both of them. The current watchdog implementation really only supports one, as Alan Cox confirmed in his comments – suggesting that previous attempts to add multiple watchdog support hadn’t really gone anywhere. Simon had posted some code and may well become the poor schmuck who has to implement it officially.

In today’s miscellaneous items: support for named (and also empty) cgroup hierarchies, version 3 of Dave Hansen’s “flexible array implementation”, some perf counters patches (Peter Zijlstra), some wireless fixes (John Linville), some NFS bugfixes (Trond Myklebust), some lockdep patches (Peter Zijlstra), some IRQ and scheduler fixes (Thomas Gleixner), some block fixes (Tejun Heo), some networking and IDE fixes (David Miller), and flashing LED support using GFS2 tracepoints in GFS2 (Steven Whitehouse). Valdis Kletnieks pointed out a lockdep “whinge” in the ext3/quota code as part of other unrelated bisection efforts.

Finally today, someone still cares about Amstrad’s E3 E-Mailer Videophone. I don’t think you can buy these any more (although they were very cheap, and I physically have three at home that I need to get set up – an update on current development is appreciated) but Janusz Krzysztofik posted some patches for the audio chipset nonetheless.

In today’s announcements: Linux 2.6.31-rc4.

The latest kernel release is 2.6.31-rc4, which was released by Linus on Wednesday evening (at 19:44 PDT). In his release announcement, Linus reminds us that this has been a “fun week”, filled with an exciting triple whammy of tools bugs to screw with our minds, and kernel builds. As Linus says, “and that was just the bugs that were outside the kernel”. Aside from the tools problems, things have been progressing reasonably, with lesser churn.

Stephen Rothwell posted a linux-next tree for July 22nd. Since Tuesday, there is a new “ext3” tree (also containing some jbd changes – typoed in the announce mail), the tree still fails to build in an allyesconfig build configuration on powerpc, and the net tree gained 2 conflicts against the wireless-current tree maintained by John Linville. The total sub-tree count is now up to 134 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

LKML Podcast Update

July 23rd, 2009 jcm No comments

This week saw the first day with over 1,300 downloads of a single episode. That’s pretty exciting – it means that a number of people are interested in what’s happening with the kernel on a day-to-day basis. Typical listener figures are still more like 250-300 every day and then 500ish for a given episode over the course of a week, which I’m still fairly happy with even though it’s not quite as exciting. Around 1/5th of listeners are using iTunes, whereas most are now using Linux Podcasting software. Which is nice.

Anyway. Since you find this useful, I will keep doing it. But I might substitute catching up individual episodes for larger single “catchup” episodes when I’m travelling or too busy to do the semi-daily Podcast. Catching up the last two weeks took 5 hours on Monday evening at a popular coffee shop, which might not always be possible. It probably would have been better to do a bumper show and move on from there.

I’m going to try experimenting with a “Week in Review” section on weekends. I don’t know whether I’ll get to start this weekend, or whether it will be sustainable to do that, but I think some highlights from the week in a longer format would be useful to those who just like to listen on Monday mornings (of which there are already a large number – seems many use the Podcast instead of reading the list over weekends).

I’ve played with compression plugins, audio volume levels, and many other tweaks. But let me know if there’s something I can do to make the Podcast better. One thing I cannot do, however, is guarantee typo-free Podcasts or insert links to each item covered. I’m a guy doing this in my spare time, I’m not The New York Times Company (and let’s face it, their daily corrections are getting disturbing enough anyway) and I don’t have the time and resources to offer a full news service. Sorry about that :)

Jon.

Categories: Uncategorized Tags:

2009/07/21 Linux Kernel Podcast

July 23rd, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090721.mp3

For Tuesday, July 21st, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Block devices, flexible array implementation, I/O bandwidth controller, Linux 2.6.27.27 problems, Microsoft, modules, per-cpu, performance counters, taskstats, and VFAT.

Block devices. Alan Jenkins posted a patch implementing bdcopy(), which is a function that allows one to create a copy of an existing reference to a block_device, rather than relying upon spinlock protected access to bdget(). This is particularly useful in the case of the hibernation code, which needs to create a copy of a reference to the active swap device but doesn’t want to call bdget() directly because it might sleep. bdcopy() is safe to call from any context and corrects the PM/hibernate regressions that some have been seeing of late. Thanks to Alan for tracking that particular fix down.

Flexible array implementation. Dave Hansen posted an RFC proposal for a flexible array implementation. This allows the kernel to create and manage “flexible arrays”, which are formed by single pages containing pointers to second level array objects. The idea is to make it easier to create and manage dynamic arrays in-kernel, reducing the need for large contiguous memory allocations (or calls to vmalloc for situations in which kernel virtual memory can be used, and where there is vmalloc room to make the mapping(s)). The patch introduces 4 functions: alloc_flex_array(), free_flex_array(), and the accessors flex_array_put() and flex_array_get(). As Andrew Morton points out in his reply, that yields 2MB worth of objects on 64-bit platforms using a 4K page size, which he hopes is enough for likely callers.
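
The 2MB figure can be reproduced with some quick arithmetic. This is only a rough sketch in Python, assuming a single 4K base page filled entirely with 8-byte pointers, each pointing at one 4K second level page (it ignores any bookkeeping fields the base page may also hold). The function names are made up for illustration and are not the kernel's.

```python
PAGE_SIZE = 4096      # assumed 4K page size
POINTER_SIZE = 8      # pointer size on a 64-bit platform


def flex_array_capacity_bytes(page_size=PAGE_SIZE, pointer_size=POINTER_SIZE):
    """Total bytes reachable through one base page of pointers."""
    pointers_per_page = page_size // pointer_size   # 512 with the defaults
    return pointers_per_page * page_size            # one data page per pointer


def locate(index, element_size, page_size=PAGE_SIZE):
    """Map an element index to (second level page number, byte offset in it)."""
    elements_per_page = page_size // element_size
    return index // elements_per_page, (index % elements_per_page) * element_size
```

With these assumptions the capacity works out to 512 pages of 4096 bytes each, and a lookup never needs more than one page-sized allocation at a time.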

I/O bandwidth controller and BIO tracking. Ryo Tsuruta posted to let everyone know about the latest version of his patches implementing dm-ioband and blkio-cgroup, which implement I/O bandwidth limiting at the device-mapper level (per partition, per user, per process, per virtual machine, etc.) and I/O tracking using cgroups to identify the owners of any type of I/O. There is also even tracing support and documentation. This patch series is just one of several (three, I believe) alternative I/O bandwidth limiting patchsets around, with Vivek Goyal’s work being one of the obvious alternatives. It remains unclear whether the different groups are actually going to meet at Kernel Summit or some other event to reconcile a common solution.

Linux 2.6.27.27. There are some concerns about 2.6.27.27 and the introduction of a patch aimed at avoiding the use of a GCC compiler option named ‘-fwrapv’ that is reportedly buggy in gcc-4.1.x. Apparently, with this fix applied, some systems are failing to boot (gcc-4.2.4). The problem is that both -fwrapv and -fno-strict-overflow exhibit bugs depending upon which version of the compiler one has chosen. Linus Torvalds noted that it might be best to simply restrict -fwrapv to GCC 4.2.x and newer. But the problems get more complex as others have chimed in with differing issues (including an inability for Marc Dionne to build upstream kernels using ccache on rawhide), leading Linus to believe there are currently three different tools issues hurting people (the last one being a Debian/sid binutils package failure). Later, Linus began analyzing the assembly (or rather disassembly) of different kernel builds in an effort to determine why the compiler options were generating broken code.

Microsoft. It’s skiing season in hell this week, and season passes are available. Microsoft followed up to their initial patch postings with a confirmation that they plan to continue posting regular Hyper-V driver updates to Greg’s staging tree, and then on into the mainline kernel proper. The current work can be found in the linux-next tree (/drivers/staging/hv). All joking aside, this is great news. It publicly endorses the GPL and perhaps warms relations a little, although a patent promise would be far more useful. And for those thinking it’s simply April come early, Slashdot reports Microsoft also posted userspace GPL code this week.

Modules. Li Zefan posted a patch implementing tracepoints for module_load, module_free, module_get, module_put and module_request. He included sample output and received favorable feedback from Steven Rostedt (who is keen for Rusty Russell to comment as owner of the in-kernel module loader). Also today, Reinhard Tartler wondered if anyone really uses scripts/checkkconfigsymbols.sh to reconcile symbol requirements with config options (sometimes it is possible that a particular configuration requirement will be missed, and necessary module dependencies will not exist in the build, which obviously causes runtime problems). He pointed out that there is a growing tendency in the kernel for these dependencies to be missing, and even several typos of configuration requirements (e.g. CONFIG_CPUMASK_OFFSTACK).

Percpu. Tejun Heo posted a (not signed-off-by yet) RFC patch removing the legacy per-cpu allocator on IA64 systems (making it dynamic), and then another obvious subsequent RFC that removes the legacy per-cpu allocator functions completely. These are not signed off because Tejun hasn’t been able to test this on actual hardware, but only on the simulator (which had a few problems), and even then only in one particular build configuration. He is awaiting further testing from those with IA64 systems before proceeding. Separately, Tejun (who has obviously been busy) posted a patchset entitled “implement and use sparse embedding first chunk allocator” that enables the per-cpu allocator to use bootmem allocated memory directly, even on NUMA.

Performance counters. Jason Baron posted a perf utility patch building upon Peter Zijlstra’s initial support for tracepoints in the performance counters tools. Jason’s patch adds a ‘perf list’ and ‘perf stat’ command, and makes use of debugfs to obtain this data. The use of debugfs is complicated by the fact that there are a variety of possible mount points for it on target systems (most kernel documentation was recently updated to refer to /sys/kernel/debug as the standard location but many – including this author – still steadfastly use /debugfs for the mountpoint out of sheer stubborn debugfs originalism), which necessitates Jason poking around in /proc/mounts. This is something Ray Lee thinks should be optimized so that perf will only do this in cases where the /sys/kernel/debug location is not the correct mountpoint since some systems have a very large number of mountpoints, making this expensive. On a related note, Arjan van de Ven posted a patch (that he noted elsewhere was a really tricky issue to track down) entitled “avoid structure size confusion by using a fixed size”, correcting a compiler issue in which struct perf_header would vary in size from one compiled file to the next.
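
To make Ray Lee's suggestion concrete, here is a rough sketch in Python of the "try the standard place first, scan only on a miss" idea. It is not taken from the perf patch: the function names are invented for illustration, and the two callbacks (`is_debugfs` and `read_mounts`) are hypothetical hooks so that the logic can be exercised without touching a real system. The only thing relied upon is the usual /proc/mounts line layout of device, mount point, filesystem type, options, and two numeric fields.

```python
def scan_mounts(mounts_text):
    """Return the first debugfs mount point listed in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] == "debugfs":
            return fields[1]
    return None


def locate_debugfs(is_debugfs, read_mounts, preferred="/sys/kernel/debug"):
    """Try the conventional location first; scan the mount table only on a miss."""
    if is_debugfs(preferred):
        return preferred
    return scan_mounts(read_mounts())
```

In practice `read_mounts` might simply be `lambda: open("/proc/mounts").read()`; how best to implement `is_debugfs` is left open here. The point is only that a system with thousands of mounts pays for the scan when the cheap check fails.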

Taskstats. Nikanth Karthikesan posted an RFC patch series implementing a netlink based notification mechanism on fork (referred to within the kernel as clone), allowing one to track the creation of new tasks without having to constantly poll and walk /proc. Nikanth points out that this can also be used by utilities such as iotop, which gains a performance improvement. As he points out, the existing polling process won’t scale.

VFAT. Andrew Tridgell followed up to Pavel Machek’s rather terse comments about his previous math with a broken down summary of the combinatorial likelihood of crashing a Windows system when presenting it with his modified VFAT patch. According to Andrew’s figures, the likelihood of a single collision in a maximally full directory containing 32767 files is about 0.0052 or 0.5% when using an exponential birthday approximation (for those who are not math inclined, refer to Wikipedia for a summary of the “Birthday Problem”, “Pigeon Hole Principle”, approximations for collisions, and related topics of interest to Computer Scientists). Even this doesn’t necessarily result in a bluescreen on Windows, since that only occurs when Windows “fastfat” driver attempts to access two colliding files in quick succession. Andrew encourages others to check his math and makes a number of other comments surrounding VFAT that I won’t go into because I personally believe it best not to comment at all.
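
For anyone who does want to check the math, the exponential birthday approximation says that with n items drawn at random from N possible values, the chance of at least one collision is roughly 1 - exp(-n(n-1)/(2N)). The summary above does not say how many distinct values the patch can generate, so the Python sketch below takes that as a parameter rather than guessing, and also shows how to back-solve the size of the space from a quoted probability. Both function names are invented for this illustration.

```python
import math


def collision_probability(n_items, n_values):
    """Approximate P(at least one collision): 1 - exp(-n(n-1)/(2N))."""
    exponent = n_items * (n_items - 1) / (2.0 * n_values)
    return -math.expm1(-exponent)        # same as 1 - exp(-exponent), more precise


def implied_space(n_items, probability):
    """Back-solve N from a quoted collision probability for n items."""
    return -n_items * (n_items - 1) / (2.0 * math.log1p(-probability))
```

Calling `implied_space(32767, 0.0052)` gives the number of equally likely values that would be consistent with the quoted figure under this approximation; feeding that result back into `collision_probability` returns 0.0052 again, which is a handy sanity check.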

In today’s miscellaneous items: a request to pull from the notification tree (Eric Paris) to handle some fallout from the fsnotify conversion (a generic framework which replaces the existing backends to both dnotify and inotify with a single universal notification mechanism), an RFC patch series containing clocksource cleanups (Martin Schwidefsky – including use of the expensive stop_machine context for clocksource switches), an optimisation hack (also from Martin Schwidefsky) that caches the next timer interrupt on CPU sleep when running on NOHZ systems, an ALSA update (Takashi Iwai), some HID fixes (Jiri Kosina), a new regulator_get_exclusive() API (Mark Brown), some /proc/kcore cleanups (Kamezawa Hiroyuki), a patch switching i8042 to dev_pm_ops from Dmitry Torokhov (aside: dev_pm_ops was covered at last week’s Linux Symposium in the PCI suspend and resume presentation), some input and driver core updates (Dmitry Torokhov – the latter making pm operations a const pointer since it shouldn’t be changed by module users), a patch from Joe Perches implementing separate sections for printk format strings, some informative comments from Thomas Gleixner concerning the correct way to handle threaded interrupts for hardware level triggered devices in which the interrupt generation cannot be easily disabled, and a suggestion from Nick Piggin that it might be nice to remove PG_reserved (the Page Table Entry bit) and replace it with a more useful PG_arch_2 bit that could be used for e.g. pfn_is_ram.

The latest kernel release is 2.6.31-rc3, which was released by Linus last week (a more recent -rc4 release exists as of this recording however).

Stephen Rothwell posted a linux-next tree for July 21st. Since Monday, a new ecryptfs tree has been added and the tree still fails to build in an allyesconfig build configuration on powerpc. The total sub-tree count rises to a new total of 133 trees in the latest compose, with the addition of ecryptfs.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/07/20 Linux Kernel Podcast

July 21st, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090720.mp3

For Monday, July 20th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: BIOS, closed source graphics, hardware breakpoints, LTTng, Microsoft releases GPL Hyper-V drivers, UIO, and VFAT (redux).

BIOS. Siarhei Liakh enthusiastically posted a patch modifying the BIOS support to be in-line with the “BIOS32” specification. That specification says, for example, that at most two pages of memory per BIOS32 service should be set executable and that the other protections of physical memory on PC systems between 640k and 1MB are not needed. This is all well and good, but as Peter Anvin pointed out, there is a wild disconnect between what BIOS specifications say and what BIOS writers actually do in the field. Siarhei admitted that this patch had only had minimal testing, mostly under virtualization.

Closed source graphics. Thomas Hellstrom started a thread entitled “DRM drivers with closed source user-space” in which he suggested that a lot of politicking over open source graphics drivers that have closed source userspace clients is only hurting progress. He argues that one should not reject a driver posted for inclusion solely because it has a closed source userspace client, but that one should instead request enough information from those submitting such drivers so as to understand the uses and any risks. He argues that open documentation and a security analysis of the driver itself should be sufficient for considering such drivers for inclusion.

Hardware breakpoints. Frederic Weisbecker posted an RFC patch series implementing a generic API for hw-breakpoints and adding support for perfcounters into the mix. There are some issues with only part of the breakpoint events being recorded with the perf tools and he is going to develop the patches more to correct that, but requests comments.

LTTng. Mathieu Desnoyers posted to let everyone know that version 0.149 of LTTng includes an experimental ASCII output module. He gave an example of the kind of output that can be generated, and included some documentation updates.

Microsoft Hyper-V drivers for Linux (Hell finally froze over). “It’s getting cold in here” was how Greg Kroah-Hartman summarized the situation in his blog posting that followed his public announcement on LKML that Microsoft was releasing Hyper-V drivers for Linux. Apparently, many months of discussions have led to this 54 part patch posting that is initially targeting the staging tree, and is obviously released under the GPLv2. Expect to see a lot of people playing up the significance of Microsoft releasing GPL code, and for Hyper-V at that, and likely an LWN article on the topic (one would expect). On a side note, the LKML filters initially managed to eat the Microsoft posting – perhaps they are smarter than anyone really knew.

UIO. Michael S. Tsirkin posted version 5 of a Userspace IO driver for PCI 2.3 devices. This generic driver allows userspace tasks to bind to a hardware PCI device without using a kernel driver, so long as the hardware supports the PCI 2.3 specification that includes a generic Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register for each device. The first user of this driver will be KVM and other virtualization projects since they can now easily give guest OS access to PCI 2.3 devices.
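
The two bits mentioned above are what make a generic driver possible: one tells you whether the device is asserting its interrupt, the other lets you silence it without knowing anything device specific. As a small illustration in Python (not code from the posted driver), the masks below use the values I know from the PCI 2.3 register layout, bit 10 of the command register and bit 3 of the status register, with constant names borrowed from the kernel's pci_regs.h. The helper functions are hypothetical and operate on plain 16-bit register values.

```python
PCI_COMMAND_INTX_DISABLE = 0x0400   # command register bit 10: Interrupt Disable
PCI_STATUS_INTERRUPT = 0x0008       # status register bit 3: Interrupt Status


def interrupt_pending(status):
    """True if the device reports that it is asserting its interrupt."""
    return bool(status & PCI_STATUS_INTERRUPT)


def mask_intx(command):
    """Return the command register value with the interrupt disabled."""
    return command | PCI_COMMAND_INTX_DISABLE


def unmask_intx(command):
    """Return the command register value with the interrupt re-enabled."""
    return command & ~PCI_COMMAND_INTX_DISABLE & 0xFFFF
```

A plausible flow, assuming the driver works the way the description implies: on an interrupt the kernel checks the status bit, masks the interrupt, and wakes userspace, which services the device and then unmasks it again.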

VFAT. On another semi-Microsoft related note today, and as if we didn’t need reminding that they are not our friends, Andrew Tridgell posted another version of his VFAT patches. These seem to have a number of cleanups, apparently work with Windows 98 (albeit with some ugliness in the 8.3 filenames displayed) and work with most of the devices available to Jan Engelhardt and himself. His posting has more detail on the legal topics that I won’t cover here.

In today’s miscellaneous items: Catalin Marinas continued to find and point out potential memory leaks detected using Catalin’s kmemleak detection tool, a new version of the “enable x2APIC without interrupt remapping under KVM” patch (Gleb Natapov), a request from Lai Jiangshan that the “simplify sysrq-c handler” patch be reverted as it breaks tools like kdump, a fix replacing use of for_each_zone in the hibernation code with for_each_populated_zone (Gerald Schaefer) so that unpopulated ZONE_MOVABLEs don’t cause a BUG_ON on resume, some kbuild fixes (Sam Ravnborg), some tracing fixes (Steven Rostedt), some ponderings from Eric Paris about the need for a second mmap_min_addr_lsm, and Jason Wessel posted some earlyprintk reliability improvements for debugging using a USB device connected to an EHCI debug controller.

The latest kernel release is: 2.6.31-rc3, which was released by Linus over a week ago now.

Greg Kroah-Hartman released Linux 2.6.27.27 and 2.6.30.2 just before going off on some vacation. These were based upon the review patches posted previously.

Willy Tarreau released Linux 2.4.37.3. This contained a number of fixes, including one (CVE-2009-1389) that affected the r8169 driver that is used by many cheaper motherboards. Apparently it took Willy days to backport the appropriate 2.6 bits to the older kernel, so kudos to him for doing that.

Stephen Rothwell posted a linux-next tree for July 20th. Since Friday, the tree still fails to build in an allyesconfig build configuration on powerpc and several other conflicts were removed. The total sub-tree count in the latest compose is steady at 132 trees.

That’s a summary of today’s Linux Kernel Mailing List traffic, for further information visit www.kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/07/19 Linux Kernel Podcast

July 21st, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090719.mp3

For the weekend of July 19th 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Fast and fine-grained clock sources, kmemleak, KSM, large disk arrays, and NX kernel data page protections.

Fast and fine-grained clock sources. John Stultz posted an RFC patch discussing the need for CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks. These provide fine-grained timestamps, but only as of the last tick, and therefore save the (possibly very expensive) accessing of hardware. This is especially useful on Real Time systems when used in combination with the non-syscall heavy VDSO. The only real problem I can see is that this likely isn’t covered by any of the POSIX.4 specification(s), and yet exists in the same namespace as the other POSIX-defined clocks.

Kmemleak. Catalin Marinas posted an RFC patch implementing scanning for all kernel thread stacks. This will enhance the existing ability for kmemleak to locate memory leaks by looking for leaking pointers on the kernel stack(s). Catalin also posted a patch protecting kmemleak_seq start/next/stop with a call to rcu_read_lock(), since they may otherwise end up holding a reference to a freed object.

KSM (Kernel Shared Memory). Izik Eidus reposted the KSM patches for consideration. KSM – Kernel Shared Memory – is a patch that aims to reduce the wasted (duplicated) memory footprint of virtual machines by scanning physical RAM pages for identical copies of the same data and replacing them with a single COW page instead, along with the accompanying resource tracking. Andrea Arcangeli gave an excellent presentation on KSM at the 2009 Linux Symposium, the proceedings of which are now available. Those interested are encouraged to refer to the paper, and to the postings on the LKML also.
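
The basic idea of merging identical pages can be shown with a toy model. This Python sketch is emphatically not how the KSM patches work internally: it uses a dictionary keyed by a SHA-1 digest as a stand-in for whatever lookup structure the kernel uses, treats "pages" as plain byte strings, and says nothing about breaking the sharing again when a page is written (the COW part). All names are invented for illustration.

```python
import hashlib


def merge_identical_pages(pages):
    """Deduplicate pages; return (mapping, shared) where mapping[i] indexes shared."""
    shared = []       # one copy of each distinct page content
    by_digest = {}    # digest -> indices into shared with that digest
    mapping = []      # for each input page, which shared copy backs it
    for page in pages:
        digest = hashlib.sha1(page).digest()
        for index in by_digest.get(digest, []):
            if shared[index] == page:     # full compare guards against hash collisions
                mapping.append(index)
                break
        else:
            # No identical page seen yet: keep this one as a new shared copy.
            shared.append(page)
            by_digest.setdefault(digest, []).append(len(shared) - 1)
            mapping.append(len(shared) - 1)
    return mapping, shared
```

Feeding it three 4K pages of which two are all zeroes leaves two shared copies, which is the whole saving in miniature: many guests booting the same OS image tend to hold a lot of identical pages.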

Large disk arrays. Neil Brown posted to point out that 32-bit Linux doesn’t handle devices larger than 16TB particularly well (the page cache is limited to a pgoff_t number of pages for example) and due to the prevalence of large disk devices now, one shouldn’t just assume that only 64-bit systems will have large block devices. Instead, it might be necessary to institute a policy – for example refusing to create devices larger than 16TB on 32-bit Linux.
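
The 16TB figure falls straight out of the page cache index size. A quick check in Python, assuming pgoff_t is a 32-bit unsigned quantity on 32-bit systems and that pages are 4K (the function name is just for this example):

```python
def max_page_cache_bytes(pgoff_bits=32, page_size=4096):
    """Largest device size addressable with a pgoff_t page index."""
    return (1 << pgoff_bits) * page_size


TIB = 1 << 40   # one tebibyte in bytes
```

Here `max_page_cache_bytes() // TIB` comes to 16, while a 64-bit index pushes the limit far beyond any device one is likely to meet.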

NX kernel data page protection. Siarhei Liakh posted a patch expanding the functionality of CONFIG_DEBUG_RODATA to add protection for the main (static) kernel data area. The patch modifies several kernel linker scripts such that the kernel .text, .rodata, and .data sections always end on page boundaries.
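
The reason for ending sections on page boundaries is that protections such as read-only or no-execute apply to whole pages, so two sections that need different permissions cannot share one. A minimal Python sketch of the rounding involved, assuming 4K pages and using made-up addresses and helper names (the real work happens in the linker scripts, not in code like this):

```python
PAGE_SIZE = 4096  # assumed 4K pages


def align_up(address, alignment=PAGE_SIZE):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def shares_page(end_of_first, start_of_second, page_size=PAGE_SIZE):
    """True if the last byte of one section and the first byte of the next share a page."""
    return (end_of_first - 1) // page_size == start_of_second // page_size
```

If a section ends at 0x1800 and the next starts right there, the two share a page; padding the end out to `align_up(0x1800)`, which is 0x2000, removes the overlap.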

In today’s miscellaneous items: a tracing fix (Frederic Weisbecker), a post from Barry Song removing the check for IRQ_DISABLED in interrupt thread functions (this appears to be a bad idea, and I’m waiting to see what Thomas says about it – I think it’s better to reconsider the existing code comment), a tracepoint for the timer event (Xiao Guangrong), a fix to load average accounting and some tracing fixes (Thomas Gleixner), some lguest and virtio fixes (Rusty Russell), xfs tracing support (Christoph Hellwig), some sound fixes (Takashi Iwai), a PCI fix (Jesse Barnes), some ide-tape fixes (Borislav Petkov), improved rfkill support for hp-wmi (Alan Jenkins), and a question from Per Forlin as to the intended usage of the kernel DMA engine code, and how it might be extended to do the kinds of things that Per would like to do.

In today’s announcements: trace-cmd. A command line reader for ftrace. Steven Rostedt posted to announce trace-cmd, which is a command line reader for ftrace designed to be an alternative to the existing pseudo-text files. It doesn’t replace the existing interface, but complements it, and as a userspace utility is hosted in a git repository on git.kernel.org.

The latest kernel release is still 2.6.31-rc3, which was released by Linus last weekend. The biggest issue at the moment appears to be some lingering TTY issues related to ongoing cleanups, and several problem reports are ongoing.

Greg Kroah-Hartman posted a series of review patches for the 2.6.30.2 and 2.6.27.27 stable series. The former had 24 patches, while the latter had only 8. As usual, there wasn’t much time for public review – this case exacerbated by the fact that there were public security issues covered by the patches, and the fact that Greg was due to go on vacation.

Stephen Rothwell posted a linux-next tree for July 17th. Since Thursday, the tree still fails to build in an allyesconfig build configuration on powerpc, and several other build failures appeared in the tree. The total sub-tree count remains steady at 132 trees in the latest linux-next tree compose.

That’s a summary of today’s LKML traffic. For further information visit kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/07/16 Linux Kernel Podcast

July 21st, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090716.mp3

For Thursday, July 16th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Git, Moorestown, scheduler, and Xinterface.

Git. Yesterday’s announcement of a new git release (1.6.4.rc1) triggered some discussion (from Jeff Garzik) concerning the correct ways to create new git repositories, especially “bare” repositories as are often hosted on kernel.org, and the correct gitconfig entries involved in so doing. On a related note, this author is hopeful that someone will document the best practices for git tree management on kernel.org, including whether the use of “shared” repositories on git.kernel.org is actually to be recommended.

Moorestown. Jacob Jun Pan posted (obviously from Intel) with a series of ten patches implementing x86 support for the new Intel Moorestown MID platform. Moorestown is not a standard x86 system, does not include non-MMAPed IO, doesn’t feature ACPI, and so forth. It does include Lincroft (a north complex with CPU, memory controller, and graphics unit), and Langwell (IO hub, system controller unit, etc.). The posted patchset includes a number of features.

Scheduler. Frederic Weisbecker posted a series of patches, including an update to one scheduler patch that drops a looping call to need_resched() in cond_resched(), since the check has already been performed elsewhere. This improves overall performance and was mentioned previously. Separately, Peter Zijlstra posted a series of other scheduler fixes.

Xinterface. Gregory Haskins posted a series of patches implementing “xinterface”. This is the successor to the original vbus code, which was tightly integrated with KVM and which Avi Kivity had suggested be split out. That led in part to what is now irqfd/ioeventfd, but one of the remaining pieces (pointer-translation) had been outstanding. Greg’s latest patches aim to implement a generic means for tracking of guest memory slots and notifying upon changes to them, in a generic fashion that can be used by things like virtio-net, and other modules, at some point. This allows kernel modules that are external to KVM to interface with a running guest, as Greg explains.

In today’s miscellaneous items: some powerpc fixes (Ben Herrenschmidt), version 4 of the Zero Page patches (Kamezawa Hiroyuki), a suggestion from Catalin Marinas that it would be worthwhile creating a tree containing fixes to leaks found by the kmemleak detector (presumably intended for linux-next inclusion), an RFC patch from Luming Yu stating that the current hotplug memory code is making inappropriate assumptions about the meaning of the Hot Pluggable bit of the Memory Affinity Structure (the SRAT table in the ACPI spec), some timer fixes (again) from Thomas Gleixner, some perfcounter fixes (Anton Blanchard), version two of the VGA arbitration patches (Tiago Vignatti), a libata patch (Matthew Garrett) exposing information about port hotplug capabilities to userspace, some Blackfin fixes (Mike Frysinger), and a request for clarification from Ted Ts’o (of Alan Cox) as to whether there is still an outstanding PTY bug in 2.6.31 or whether that has been fixed by now (he had been experiencing a weird PTY failure after a number of hours).

The latest kernel release was 2.6.31-rc3, which was released by Linus over the weekend. Caleb Cushing posted to say that he has seen a 20-50% drop in packets since switching from 2.6.30 to 2.6.30.1, which if true, needs investigation.

Andrew Morton posted another mm-of-the-moment for 2009-07-16-14-32.

Stephen Rothwell posted a linux-next tree for July 16th. Since Wednesday, the tree still fails to build in an allyesconfig build configuration on powerpc, and several trees lost their build failures. The total number of sub-trees remains steady at 132 sub-trees in the latest compose.

That’s a summary of today’s LKML traffic. For further information visit kernel.org. I’m Jon Masters.

Categories: episodes Tags:

2009/07/15 Linux Kernel Podcast

July 21st, 2009 jcm No comments

Audio: http://media.libsyn.com/media/jcm/linux_kernel_podcast_20090715.mp3

For Wednesday, July 15th, 2009, I’m Jon Masters with a summary of today’s LKML traffic.

In today’s issue: Ceph, headers, MM, and unused symbols.

Ceph. Sage Weil posted version 0.10 of the “Ceph” distributed filesystem client. Apparently, this latest version fixes a number of bugs since the previous 0.9 posting. Sage asks “what [people would] like to see for this to be merged into fs/?”.

Headers. Michal Simek and Arnd Bergmann had a discussion concerning asm-generic pgtable.h and the belief (of Michal Simek) that this could be simplified as there are a lot of functions shared by all architectures. Later, Arnd followed up, suggesting that he had previously missed the commonality of the pgtable.h implementations, and provided a patch with a common version.

MM. Ben Herrenschmidt posted concerning new 64-bit “BookE” powerpc systems. These “embedded-like” systems use software-assisted page table management like their 32-bit cousins (this author worked heavily on 32-bit PowerPC 4xx TLB management code back in the day) but also have a capability for a special form of multi-level PTE in which the RPN of an individual PTE is actually an array of PTEs from which a TLB can automatically create entries. This implementation necessitates the presence of the virtual address in the TLB freeing code, so Ben’s patch updates all architectures in the process of adding this additional information to all forms of these functions.

Separately but also on an MM note, Mel Gorman posted a series of MM patches. These included warning when a page is freed but has PG_mlocked set, ensuring that OOM killed tasks set TIF_MEMDIE and thus exit the page allocator, and a resend of a patch covered previously in this podcast, which will suppress warnings about order >= MAX_ORDER page allocations where the caller knows how to handle these and has set __GFP_NOWARN.

Unused symbols. Robert P. J. Day posted asking if there was now any value in retaining the EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL macros, intended to annotate exported symbols that should not be used by modules in the longer term, and scheduled previously for feature removal from the kernel way back when in the 2.6.19 timeframe. At this point, there is only one actual in-kernel user of these macros (libfs – simple_prepare_write).

In today’s miscellaneous items: The ext4 memory leak reported on Wednesday by kmemleak on a system under the control of Alexey Fisher seems to have been fixed by a patch from Aneesh Kumar, some ide-tape fixes (Borislav Petkov), a patch to the connector code (Mike Frysinger) such that it actually explicitly uses struct cn_msg everywhere (as is documented) rather than void pointers, asynchronous device actions (asynchronous power management) patches (Zhang Rui), ptrace regsets support for S+Core (Liqin Chen), a second version of the patch implementing gmtime and localtime from yesterday, following feedback from Andrew Morton (Zhao Lei), some DRM fixes from Dave Airlie, and an SMI workaround in pit_expect_msb for certain systems vulnerable to ill-timed SMIs triggering an incorrect pit calibration and CPU MHz value (Wei Chong Tan).

Finally today, Jeremy Fitzhardinge wondered aloud about the plumbing of splice into the Linux network stack. He is specifically interested in carrying pages granted by one Xen domain through the Linux network stack without copying, and asked Jens for his opinions on using splice to implement this. One complication he saw was that he does not easily have a struct page available for the management of the splicing, but thinks this can be solved.

In today’s announcements: Jesper Dangaard Brouer posted to announce that he has achieved 10Gbit/s bidirectional routing on standard hardware running Linux, using pre-release versions of Intel’s 82599 chip. Summing the totals for ingress and egress across several interfaces, Jesper is actually handling a total of around 38 Gbit/s. He plans to give a talk at LinuxCon on the subject of 10Gbit/s routing on Linux systems. Also in today’s announcements, version 2.16 of util-linux-ng from Karel Zak, and version 1.6.4.rc1 of the Git SCM was posted by Junio C Hamano.

The latest kernel release was 2.6.31-rc3, which was released by Linus over the weekend. Current reports suggest stability is improving over previous RCs.

Andrew Morton posted an mm-of-the-moment for 2009-07-15-20-57, which contains a number of patches against 2.6.31-rc3.

Stephen Rothwell posted a linux-next tree for July 15th. Since Tuesday, the usb and its dependent staging tree were undropped, the tree still fails to build in an allyesconfig build configuration for powerpc, and two additional build failures necessitated older versions of the pci and dwmw2-iommu trees. The total sub-tree count in the current linux-next compose remains steady at a total of 132 trees.

That’s a summary of today’s LKML traffic. For further information visit kernel.org. I’m Jon Masters.

Categories: episodes Tags: