Sunday, June 28, 2009

Build your kernel faster

Normally when building custom kernels for our laptops or desktops, we tend to use the kernel config file shipped by the particular distro. However, distro config files tend to be huge, with loads of modules turned on, even those that might not be needed on our particular laptop or desktop. This is because distro kernels need to cater to a large variety of system configurations, and it leads to the kernel taking ages to compile! If you want to build your kernel fast, turning off all those modules/drivers which are not needed on your system (while at the same time ensuring that your kernel has all that is necessary), the streamline_config.pl script by Steven Rostedt is what you need. Here is the thread where Steven explains how this script can be used. In brief,

Run the script with your architecture's Kconfig file as the argument and save the output
  • # ./streamline_config.pl arch/x86/Kconfig > config_stream
Copy config_stream to .config and run 'make oldconfig', or 'make menuconfig' if you want to continue configuring the kernel. Your build will now take much less time!

Note: If you already have a .config file with some of your custom config options set and you want to streamline it, no worries: streamline_config.pl will work on that .config itself (provided it's present in the kernel source dir). You might still want to take a backup of your .config [;-)]

Sunday, June 21, 2009

Using large pages

Linux has had support for large pages (also called huge pages) for a long time now. The size of a large page depends on the platform; on x86, for example, it has mostly been 2MB. Large pages offer the advantage of covering more memory with fewer TLB entries, and thus fewer TLB misses. However, they can lead to more memory wastage and fragmentation. Many applications use large pages for specific, designated purposes. For example, if large pages are supported and the required number is available, the JVM can back its heap with them.

An application can request large pages using the shmget API:

#include <sys/ipc.h>

#include <sys/shm.h>

int shmget(key_t key, size_t size, int shmflg);

Passing the SHM_HUGETLB flag in the shmflg argument requests that the segment be backed by large pages.

The Linux kernel provides a /proc interface through which large pages can be reserved:

#echo 1000 > /proc/sys/vm/nr_hugepages

The above causes the kernel to reserve a pool of 1000 large pages. More information on large pages can be obtained from the /proc fs:

#cat /proc/meminfo

MemTotal: 8114308 kB
MemFree: 5867312 kB
Buffers: 8412 kB
Cached: 107304 kB
SwapCached: 0 kB
Active: 48000 kB
Inactive: 87592 kB
Active(anon): 22704 kB
Inactive(anon): 0 kB
Active(file): 25296 kB
Inactive(file): 87592 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4883752 kB
SwapFree: 4883752 kB
Dirty: 48 kB
Writeback: 36 kB
AnonPages: 20212 kB
Mapped: 10948 kB
Slab: 25988 kB
SReclaimable: 12916 kB
SUnreclaim: 13072 kB
PageTables: 2400 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 7916904 kB
Committed_AS: 46040 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 43496 kB
VmallocChunk: 34359693843 kB
HugePages_Total: 1000
HugePages_Free: 1000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 3824 kB
DirectMap2M: 8384512 kB

On a NUMA system, the kernel splits large page allocations equally across the different nodes. For example, if the system has 2 nodes, a request of 1000 large pages would get split into 500 pages from each node. Per node large page information can be obtained from the /sys interface:

# cat /sys/devices/system/node/node1/meminfo

Node 1 MemTotal: 4194304 kB
Node 1 MemFree: 40004 kB
Node 1 MemUsed: 4154300 kB
Node 1 Active: 2166524 kB
Node 1 Inactive: 810704 kB
Node 1 Active(anon): 2127084 kB
Node 1 Inactive(anon): 8360 kB
Node 1 Active(file): 39440 kB
Node 1 Inactive(file): 802344 kB
Node 1 Unevictable: 0 kB
Node 1 Mlocked: 0 kB
Node 1 Dirty: 0 kB
Node 1 Writeback: 0 kB
Node 1 FilePages: 841792 kB
Node 1 Mapped: 11008 kB
Node 1 AnonPages: 2135884 kB
Node 1 PageTables: 5136 kB
Node 1 NFS_Unstable: 0 kB
Node 1 Bounce: 0 kB
Node 1 WritebackTmp: 0 kB
Node 1 Slab: 33704 kB
Node 1 SReclaimable: 30708 kB
Node 1 SUnreclaim: 2996 kB
Node 1 HugePages_Total: 500
Node 1 HugePages_Free: 498
Node 1 HugePages_Surp: 0

Recently, in one of the benchmarks (a Java benchmark) I was running, I was seeing a performance degradation of about 6-8%. After some debugging, the issue turned out to be that the application was not able to utilize the large pages allocated (thanks to some weird environment I had ;-) ). To find out the number of large pages being utilized by the app, besides the above meminfo output, you can also use numa_maps. For example,

# cat /proc/<process pid>/numa_maps

00001000 default anon=1 dirty=1 N0=1
00400000 default file=<....library file info..> mapped=10 mapmax=3 N0=10
0050b000 default file=<....library file info..> anon=1 dirty=1 N0=1
0050c000 default heap anon=213 dirty=213 N0=213
00600000 default file=/SYSV00000000\040(deleted) huge dirty=472 N0=472
40600000 default
40601000 default anon=2 dirty=2 N0=2
40641000 default
40642000 default anon=4 dirty=4 N0=4
40682000 default
40683000 default anon=2 dirty=2 N0=2
4090f000 default
40910000 default anon=3 dirty=3 N0=3
40a68000 default
40a69000 default anon=4 dirty=4 N0=4
40a70000 default
40a71000 default anon=2 dirty=2 N0=2
40ab1000 default
40ab2000 default anon=2 dirty=2 N0=2
.....

41fc9000 default anon=10 dirty=10 N0=10
427c9000 default anon=535 dirty=535 N0=535
2aaaaac00000 default file=/SYSV00000000\040(deleted) huge dirty=1 N0=1
7f6024000000 default anon=5578 dirty=5578 N0=5578
7f6027398000 default
7f602a402000 default anon=821 dirty=821 N0=821




Friday, June 19, 2009

Useful staps to track task movement across CPUs

Quite some time back, I was faced with a situation where I needed to track instances when a particular task was being migrated away from a CPU. It was in the context of a real-time system, where a real-time task was facing huge context switch delays. The obvious suspect being the scheduler, I used SystemTap, besides other debugging, to infer a few things:
  • To find if the task was being migrated away to some other CPU, I used the following trivial stap script:
/* Filename: migrate.stp
 * Author: Ankita Garg <ankita@in.ibm.com>
 * Description: Captures information on the migration of threads
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * © Copyright IBM Corp. 2009. All Rights Reserved.
 */

probe kernel.function("__migrate_task")
{
    /* $1 is the pid of the task to watch; match on the migrated task's
     * pid, since __migrate_task may run in another thread's context */
    if (($1 != 0) && ($p->pid == $1)) {
        printf("thread %d (%s) is migrating from %d to %d\n", $p->pid,
               kernel_string($p->comm), $src_cpu, $dest_cpu);
    }
}


  • Below is a script that tracks all the CPUs that a particular task ran on. Please note it does not track the context switches.
/* Filename: chng_cpu.stp
 * Author: Ankita Garg <ankita@in.ibm.com>
 * Description: Captures information on the number of times a java thread
 * switches cpu
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * © Copyright IBM Corp. 2009. All Rights Reserved.
 */

global threads

probe kernel.function("finish_task_switch")
{
    if ((threads[tid()] != cpu()) && (tid() != 0) && (execname() == @1)) {
        printf("thread %d (%s) context switched on %d\n",
               tid(), execname(), cpu());
        printf("state: %d\n", task_state(task_current()))
        print_stack(backtrace())
    }
    threads[tid()] = cpu();
}


These techniques are a bit dated, as there is now a tracepoint infrastructure which can do these things. But on older kernels, the above are useful. Expect more posts on kernel RAS features in due time.

Thursday, June 18, 2009

Importing .ics into Lotus Notes 8

A number of times I get calendar invites for meetings on my non-Lotus Notes email IDs. The calendar invites are normally in the .ics format. One can easily import them into Lotus Notes. Here is how:
  1. Compose a mail inside Notes
  2. Attach the .ics file to it
  3. Right click the attachment, and click on "View"
  4. The calendar view will open, with the meeting details. Now accept/decline the invite, save and exit
Voila, the entry gets saved to your calendar. Of course, there might be other ways to achieve this in Notes :-)

Free up that memory

Recently I came across this cool interface in the Linux kernel. Typically, memory is over-provisioned on a system. Instead of letting it sit idle, the kernel uses a lot of it for the page cache, dentry cache and inode cache. These caches speed up I/O operations and improve performance. However, there are cases when a large amount of memory is actually needed by the apps. While most of the cache pages can be reclaimed easily, there is obviously some overhead involved (the pages could be dirty and might have to be written back to the disk, thus incurring disk write latency). Linux has a neat kernel... so while it uses its smarts to utilize memory well, it also provides a method for people to tell it not to use those smarts ;-)

To free memory, just do the following:

# echo 1 > /proc/sys/vm/drop_caches

(the above frees only page cache)

# echo 2 > /proc/sys/vm/drop_caches

(for freeing dentry caches and inodes)

# echo 3 > /proc/sys/vm/drop_caches

(for freeing all of the above)

It would be advisable to first do a 'sync' before dropping the caches, so that dirty pages get written back and can then be freed; drop_caches itself only releases clean, unreferenced cache entries.