Sunday, July 15, 2012

The Linux Kernel Versioning and Development Cycle

The myriad versions of the Linux kernel released every now and then can be quite confusing for a newbie looking at them for the first time. In this post, I will try to give an overview of how the various kernels are numbered, and a rough idea of how the Linux development cycle works.

The Old Order:

Prior to the 2.6 kernel release, there were two types of kernels - development kernels and stable kernels. These were differentiated by their minor release numbers being odd or even (odd for development kernels, even for stable ones). So, for example, kernel 2.4 was a stable kernel ('2' being the major release number, '4' being the minor release), while 2.5 was a development kernel. A development kernel often used to be a chaotic mess, with the code undergoing massive changes to introduce new features, completely overhaul code paths etc. Once the development kernel started showing some semblance of stability, Linus would fork out a stable kernel release - so the 2.5 kernel series would stabilize into the 2.6 series. However, this type of versioning (odd minor release representing a development kernel) has been discontinued since 2004, and the 2.6 kernel series marks a breakaway from such versioning semantics.
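The odd/even rule above is simple enough to write down in a few lines of C. This is only a sketch of the old convention: the function name old_series_kind is made up for this post, and it only makes sense for series before 2.6.

```c
#include <stdio.h>

/* Pre-2.6 convention only: an odd minor number marked a development
 * series (2.3, 2.5), an even one a stable series (2.2, 2.4).
 * Returns 1 for development, 0 for stable, -1 if the string does not
 * start with "major.minor". The name is hypothetical. */
static int old_series_kind(const char *version)
{
    unsigned int major, minor;

    if (sscanf(version, "%u.%u", &major, &minor) != 2)
        return -1;
    return (minor % 2) ? 1 : 0;
}
```

So old_series_kind("2.4") reports a stable series and old_series_kind("2.5") a development one. For 2.6 and later the result carries no meaning.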

The New Order:

Starting with the 2.6 kernel series, the notion of having separate development and stable kernel trees has been dropped. To quote Greg K-H (a leading Linux developer) - "Every release is a preferred release". Kernel development now proceeds through a series of 'rc' releases (rc for release candidate) leading up to each new 'revision'. So, after a stable release, say 2.6.17 ('17' is the revision), a new branch 2.6.18-rc1 is forked out to accept new features etc., initially for approximately two weeks. The subsystem maintainers send multitudes of patches to be merged into the -rc1 branch. If someone misses the -rc1 branch, they can send their patches for merging into the -rc2 branch. Thereafter, it is mostly fixes for bugs and regressions introduced in -rc1 and -rc2 that are tracked and merged in the subsequent rc releases. Once a -rc(n) branch stabilizes, the next kernel revision, in this case 2.6.18, is released. A typical release cycle (say, from 2.6.17 to 2.6.18) takes around 2-3 months.

An additional thing here is that, since every kernel is a 'preferred' release, a stable kernel branch is also maintained by Greg K-H and Chris Wright. Typically, it is the bug-fixes and security updates already present in an upstream tree, i.e. one of the -rc releases, that go into the stable tree. So, if 2.6.18-rc1 includes an important bug-fix, it may be back-ported to 2.6.17 and the new 'stable' kernel released as 2.6.17.1. If there are more bug-fixes in, say, 2.6.18-rc2 or 2.6.18-rc3, then they may again be back-ported to 2.6.17.1 and a new kernel released as 2.6.17.2. However, no new features are accepted when releasing a 'stable version' of a revision. It is almost always just the fixes and the security updates. The following diagram sums it all up.



Another point worth noting here is that the stable branch (of Greg K-H and Chris Wright) isn't maintained forever for a particular kernel revision. So, for example, once the 2.6.18 kernel is out, and we already have a stable kernel 2.6.17.4 (as in the figure), then there would perhaps be at most one more stable release, 2.6.17.5, to back-port fixes from the 2.6.18 kernel (or not even that). Beyond that, the 2.6.17.y stable series is dropped and the next stable release would be 2.6.18.1, and so on. That said, exceptions do exist. For example, in the case of Long Term Support (LTS) releases such as 2.6.32, kernel vendors like SUSE and Red Hat might want to continue back-porting fixes from upstream branches even though a newer kernel release is out, to maintain their support commitments for the LTS release (Greg K-H was till recently an employee of SUSE, and Chris Wright is an employee of Red Hat, so this is entirely possible).

This model of development has been in place for quite some time now, the only visible change being that the release which would have been 2.6.40 was named 3.0 instead. Once 2.6.39 was out, instead of numbering the next release 2.6.40, Linus Torvalds called it 3.0, in order to do away with a rather inconvenient numbering system and to mark the 20th anniversary of Linux. See the original post on LKML here.
So, the only thing that has changed is that a stable release is now numbered 3.x.y rather than 2.6.x.y. As mentioned in the link above, there have been _no_ major changes in moving from the 2.6 to the 3.0 series.

As of this writing, the latest stable release is 3.4.4 (based on the 3.4 kernel), while the latest development release is 3.5-rc7. So, perhaps a stable kernel 3.4.5 can be expected soon, back-porting fixes from 3.5-rc7 and subsequent releases.
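To make the numbering concrete, here is a rough C sketch that sorts a version string into mainline, release candidate or stable update, using only the rules described in this post. The enum and function names are made up for illustration, and the code only counts dots and looks for "-rc" - it does not check that the components are really numbers.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum release_kind { KIND_UNKNOWN, KIND_MAINLINE, KIND_RC, KIND_STABLE };

static enum release_kind classify_release(const char *v)
{
    /* A 2.6.x release has three base components, a 3.x release two. */
    int base = (strncmp(v, "2.6.", 4) == 0) ? 3 : 2;
    int components = 1;
    const char *p;

    /* Anything carrying an -rc suffix is a release candidate. */
    if (strstr(v, "-rc") != NULL)
        return KIND_RC;

    /* Count the dot-separated components. */
    for (p = v; *p != '\0'; p++)
        if (*p == '.')
            components++;

    if (components == base)
        return KIND_MAINLINE;   /* e.g. 2.6.17 or 3.4 */
    if (components == base + 1)
        return KIND_STABLE;     /* e.g. 2.6.17.2 or 3.4.4 */
    return KIND_UNKNOWN;
}
```

With this, "2.6.17" and "3.4" come out as mainline releases, "2.6.17.2" and "3.4.4" as stable updates, and "2.6.18-rc1" and "3.5-rc7" as release candidates.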

Postscript:
It should be clear by now that the mainline tree maintained by Linus (linux/kernel/git/torvalds/linux-2.6.git) won't contain any stable-branch releases, i.e. it won't have a 2.6.32.1 release, just the 2.6.32 release. The 2.6.32.1 release can be obtained from Greg's and Chris' tree (linux/kernel/git/stable/linux-stable.git). Similarly, if you are looking for the 3.4.4 stable release, look at the stable tree. The 3.4 release, however, can be found in either of the trees.

References:
1. Greg K-H on Linux (http://www.youtube.com/watch?v=L2SED6sewRw)
2. http://www.kroah.com/lkn/
3. Linux Kernel Development - Robert Love

Thursday, April 26, 2012

OS: It all depends on how you say it

This is the best one-line definition of an Operating System I have come across so far.
Source: http://tldp.org/LDP/khg/HyperNews/get/devices/whatis.html

"An operating system is essentially a privileged, general, shareable library of low-level hardware and memory and process control functions and routines."

It kind of takes away the halo that surrounds the words 'kernel' and 'operating system', doesn't it?

Sunday, April 22, 2012

Linux: Why 'to recurse' is NOT divine in the kernel

In CS folklore, it's often stated - "To iterate is human, to recurse divine". However, someone involved with kernel programming is likely to reject this belief outright, the reason being the limited size of the 'kernel stack' allocated to each process.

Although this detail is often swept under the carpet, the fact is that whenever a process (task) executing in user-space makes a system call, the kernel code runs on a separate kernel stack to support the function calls made while executing the kernel code. Now, this per-process kernel stack happens to be very small - on 32-bit x86, generally two 4 KB pages (8 KB), or a single 4 KB page if the kernel is configured that way. And this size is fixed: the kernel stack can't dynamically expand like the user-space stack, and therefore we don't have the same kind of liberty to define large local variables or make recursive calls in kernel-land as in user-land. Also, for the x86 architecture, the data structure that defines each process (task), task_struct, is stored at the end of the kernel stack (end as in the lowest memory address, for stacks that grow down towards lower memory addresses), thus effectively reducing the size of the usable kernel stack. So, if the stack keeps growing (i.e. the stack pointer keeps moving down) as a result of, say, deeply nested or recursive function calls, the kernel stack may ultimately encroach upon the 'task_struct' object for that process, corrupting it, and thus leading to a kernel crash. However, in practice, the fixed kernel-stack size of 8 KB (or 4 KB) has been found to be good enough.

The rationale for keeping task_struct at the bottom of the kernel stack is that the x86 architecture has few processor registers, and this approach allows us to find the task_struct of a process through the stack pointer itself (stored in %esp). If the kernel stack is 8 KB, then masking off the lower 13 bits of the stack pointer gives us the address of the task_struct object. If the kernel stack is 4 KB, just mask off the lower 12 bits of the stack pointer, and we have the address of task_struct.
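The masking trick is easy to try out in user-space. The following is a minimal sketch and not the kernel's own code: the function name stack_base and the sample addresses are invented, and it assumes the stack is aligned to its own size (a power of two), which is what makes the trick work.

```c
#include <stdint.h>

/* Round a stack-pointer value down to the base (lowest address) of the
 * stack it points into. stack_size must be a power of two, e.g. 4096
 * or 8192 as in the post. Hypothetical helper, user-space only. */
static uintptr_t stack_base(uintptr_t sp, uintptr_t stack_size)
{
    /* stack_size - 1 has exactly the low 12 (or 13) bits set;
     * inverting it clears those bits in sp. */
    return sp & ~(stack_size - 1);
}
```

For a made-up stack pointer of 0xc1235f40, an 8 KB stack gives a base of 0xc1234000 (lower 13 bits cleared), while a 4 KB stack gives 0xc1235000 (lower 12 bits cleared).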

* task_struct was used prior to 2.6 kernel release; in later releases, thread_info struct is used, which has a pointer to the task_struct. However, thread_info also resides at the end of stack in x86 architectures.

Reference:
Linux Kernel Development, Robert Love (3rd Edition)

Saturday, April 14, 2012

Address Alignment of a struct

Consider the following struct on a 64-bit Linux machine:

struct example {
    unsigned int x;
    unsigned int *ptr1;
    unsigned int *ptr2;
};

Now the general rule that is followed in the alignment of fundamental data types (int, short etc.) is that they are "self-aligned" - which means that a variable gets placed at a z-byte aligned address, where 'z' is the size of the variable. So, an unsigned int variable on a 64-bit machine would be placed at a 4-byte aligned address (unsigned int is 4 bytes on a 64-bit machine that follows the LP64 standard - which Linux does) and any pointer at an 8-byte aligned address.

However, it turns out that this pretty intuitive "self-aligned" rule applies only to the fundamental data types. For user-defined structs, as in the example above, the rule is a bit different. Instead of the struct getting aligned according to the alignment requirement of its first data member (this is what I expected initially), it gets aligned according to the member with the strictest alignment requirement, which here is also the largest member. So, the 'struct example' above would start at an 8-byte aligned address (because of the 8-byte pointers), even though its first data member has a looser requirement of a 4-byte aligned address.
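You don't have to take my word for it: the compiler can be asked directly. The sketch below uses the C11 _Alignof operator and offsetof; the three helper function names are made up for this post. On a 64-bit Linux machine (x86-64, LP64) I would expect an alignment of 8, ptr1 at offset 8 (so 4 bytes of padding after x) and a total size of 24 bytes, but other platforms may differ.

```c
#include <stddef.h>

struct example {
    unsigned int x;
    unsigned int *ptr1;
    unsigned int *ptr2;
};

/* Alignment the compiler picked for the struct as a whole. */
static size_t example_alignment(void)
{
    return _Alignof(struct example);
}

/* Offset of the first pointer; anything beyond sizeof(x) is padding. */
static size_t ptr1_offset(void)
{
    return offsetof(struct example, ptr1);
}

/* Total size, including any internal and trailing padding. */
static size_t example_size(void)
{
    return sizeof(struct example);
}
```

The struct's alignment matches that of its pointer members rather than that of x, which is exactly the rule described above.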

Things like these become extremely important when you are unlucky enough to be reading a raw dump of kernel memory with just a struct definition in hand. Even a slight mistake like the one outlined here can throw all your analysis out of the window.

References:

2. Also, see this for a nice table showing the sizes of fundamental data types on 32 bit and 64 bit machines following various standards (including LP64 - the one followed by Linux).




Monday, April 2, 2012

SLEEP: There is more to it than meets the eye

Today, I learnt a new thing. We all remember from our OS class that a Process (referred to as 'task' in Linux) can be in 'Running', 'Ready', 'Blocked'/ 'Sleeping' state etc. etc. However, in Linux, the 'Blocked' state is really represented by two states: TASK_INTERRUPTIBLE & TASK_UNINTERRUPTIBLE.

The following are the five states defined for a process in Linux kernel 0.01:
Source File: include/linux/sched.h

#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_ZOMBIE 3
#define TASK_STOPPED 4

TASK_INTERRUPTIBLE: As the name suggests, a task (process) in this state can be interrupted by a signal. Thus, a process in this state has two ways to be woken up from its slumber:
1- Notification of the event for which this particular process went to sleep in the first place.
2- A signal arrives (say, SIGKILL, SIGTERM etc.): the process is woken up early so that the signal can be acted upon, whether by running a signal handler or by terminating the process.

TASK_UNINTERRUPTIBLE: As the name suggests again, a task in this state can't be interrupted; not even by the old faithful `kill -9` command. It so happens that tasks are sometimes at such a point in their execution that they must not be interrupted by any signal, and they wake up from sleep only on getting a notification of the event for which they went to sleep. One reason to do this, that I know of, is to avoid corruption of kernel data structures.
A typical example of a task in this state is a task that is waiting for IO to complete on an underlying hardware device. That is why you can't kill a task waiting on completion of IO, even with `kill -9`. Another example is a task trying to access NFS-mounted files. And if by chance your NFS server is down, then you can pretty much expect the task to sleep indefinitely.
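The second wake-up path of TASK_INTERRUPTIBLE can be observed from user-space. Here is a small sketch in C (the function names are my own): it asks for a 5 second nanosleep() but arranges for SIGALRM to arrive after 100 ms, and the sleep returns early with errno set to EINTR. The uninterruptible case can't really be reproduced from an ordinary program, so it is not shown.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction() and setitimer() under strict -std= modes */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: its only job is to make SIGALRM interrupt the sleep
 * instead of terminating the process. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Returns 1 if the sleep was cut short by the signal, 0 if it ran to
 * completion, -1 if the setup failed. */
static int sleep_interrupted_by_signal(void)
{
    struct sigaction sa;
    struct itimerval timer;
    struct timespec req = { .tv_sec = 5, .tv_nsec = 0 };

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;    /* one-shot timer, 100 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    /* The task sleeps interruptibly here; the signal wakes it early. */
    if (nanosleep(&req, NULL) == -1 && errno == EINTR)
        return 1;
    return 0;
}
```

On a normal Linux system the function should come back after roughly 100 ms with a return value of 1, instead of sleeping for the full 5 seconds.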

References:

Saturday, March 31, 2012

Setting Picasa Photo Viewer as the Default Image viewer in Linux Mint (12)

There are some things in life you just can't compromise on. An image viewer is one of them, and Picasa Photo Viewer happens to be my favorite amongst all that I have used till date. So, when I moved to Linux Mint (Lisa), I was itching to use the same on Linux too. As it turned out, some gentleman (caiacoa) had designed a wrapper (it actually still runs through wine) for Picasa Photo Viewer, and to open a particular image, one just had to type

/usr/bin/PicasaPhotoViewer <Image-File-Name>


However, it is rather cumbersome to go to the terminal every time to open an image(!), and if you wanted to set the default image viewer through 'right-click->Properties', Picasa Photo Viewer just won't show up in the list. In Ubuntu, we do have a field (a text box) to specify the path of the app to open a particular file; but in Linux Mint, they don't have even that.

The solution that I found is as follows:

The problem basically was to get 'PicasaPhotoViewer' listed in the list of applications when one does a right-click->Properties->Open With. And this link got me going (see last point: TroubleShooting).

This was just a minor variation of what is mentioned in the link above. Instead of PicasaPhotoViewer.desktop being located at /usr/share/applications, it was located at /usr/local/share/applications/PicasaPhotoViewer.desktop. As also mentioned on the linked page, adding a %U to the Exec line did the trick,
i.e.
< Exec=PicasaPhotoViewer # Goes out
> Exec=PicasaPhotoViewer %U # Comes in

Picasa Photo Viewer will now show up as an app to open images, and the rest is easy.

P.S: Btw, I also found a small bug in the /usr/bin/PicasaPhotoViewer script. I will post about it once it is confirmed by the developer, Irakli Gozalishvili.

/etc/motd

Ever wondered what 'motd', as in /etc/motd - the ubiquitous file that contains the 'welcome message' to greet any user who logs in - stands for? Well, it stands for "Message of the day". Pretty guessable, I guess.

Thursday, March 29, 2012

File Name pattern matching versus Regular Expressions

"Wherever women are concerned, the unexpected always happens!" - so goes the saying. However, I would say, add 'unix' to it too.

So, it happened that I was using 'find' command to find a certain set of files:

  • find . -name "filename*"
Now, having been brought up on a diet of regular expressions, I naively assumed that "filename*" must also be acting as a regular expression. However, as it turned out, instead of being a regular expression, it is what they call a 'shell pattern'. To cut a long story short, the semantics of special characters in a shell pattern differ from those in regular expressions.
For example, the asterisk (*) matches zero or more of any character in a shell pattern, while in a regular expression it means zero or more occurrences of the previous character, which could change the meaning entirely depending on context.
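The difference shows up nicely if the same string "filename*" is fed to the two POSIX matching functions, fnmatch() for shell patterns and regcomp()/regexec() for regular expressions. This is just a sketch: the two wrapper function names are my own, the regex is treated as a POSIX basic regular expression, and I am only assuming that `find -name` behaves like fnmatch() for this simple pattern.

```c
#include <fnmatch.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Shell-pattern match: '*' stands for any run of characters. */
static bool glob_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, 0) == 0;
}

/* POSIX basic regular expression match: '*' repeats the previous
 * character. The caller adds ^ and $ if the whole name must match. */
static bool regex_matches(const char *pattern, const char *name)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return false;               /* pattern did not compile */
    matched = (regexec(&re, name, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

As a shell pattern, "filename*" matches "filename_old.txt" but not "filenam". As an anchored regular expression, "^filename*$" does the opposite: it matches "filenam" (zero 'e's at the end) and "filenameee", but not "filename_old.txt".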

Find more on this at:


hello world

This blog is going to be a dump of all the interesting (annoying?) things that I come to know about Unix, Networks and all things computers, in daily life. Basically, it is to serve as a persistent record of all the good things in this world, so that I don't have to run back and search for that elusive URL in my browser's history again and again. That others might find it useful, is an added bonus and a minor motivation.