7. Building a QoS Ready Kernel

In order to make use of these splendid traffic control features, you need to build your kernel with appropriate support. If you are interested in L7 Filter, make sure you patch your kernel accordingly.

7.1. Kernel Options for Traffic Control Support

The selections for traffic control for a 2.6 series kernel are listed under Device Drivers -> Networking support -> Networking options -> QoS and/or fair queueing. At a minimum you will want to enable the options selected below. Unselected options have been pruned.


[*] QoS and/or fair queueing
<M>   HTB packet scheduler
<M>   The simplest PRIO pseudoscheduler
<M>   RED queue
<M>   SFQ queue
<M>   TBF queue
<M>   Ingress Qdisc
[*]   QoS support
[*]     Rate estimator
[*]   Packet classifier API
<M>     Firewall based classifier
<M>     U32 classifier
[*]     Traffic policing (needed for in/egress)
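
Once you have made these selections, the corresponding symbols should appear in your .config. A quick way to double-check your configuration is to look for them there. The symbol names below are taken from the 2.6 Kconfig; your exact set may differ slightly between 2.6 releases.

CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_QOS=y
CONFIG_NET_ESTIMATOR=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_POLICE=y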

For 2.6.9 and later kernels, you have the additional option of selecting the packet scheduler clock source directly during kernel configuration. The option is available from within the QoS and/or fair queueing configuration section above.


Packet scheduler clock source (Timer interrupt)  --->
 ( ) Timer interrupt
 ( ) gettimeofday
 (X) CPU cycle counter

For modern x86 machines you can select CPU cycle counter without incident. The scheduler clock source selection above replaces the need to hand edit the include/net/pkt_sched.h file as described below.
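
In a 2.6.9 or later .config, this selection shows up as one of the CONFIG_NET_SCH_CLK_* symbols. Choosing the CPU cycle counter should yield something like the following:

# CONFIG_NET_SCH_CLK_JIFFIES is not set
# CONFIG_NET_SCH_CLK_GETTIMEOFDAY is not set
CONFIG_NET_SCH_CCPU=y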

7.2. Kernel Options for Netfilter Support

The selections for Netfilter for a 2.6 series kernel are listed under Device Drivers -> Networking support -> Networking options -> Network packet filtering (replaces ipchains) -> IP: Netfilter Configuration. You will want to enable at least the options selected below to use Netfilter effectively to classify traffic flows. Include anything else you use for firewalling, too. Unselected options have been pruned.


<M> Connection tracking (required for masq/NAT + layer7)
<M>   FTP protocol support
<M>   IRC protocol support
<M> IP tables support (required for filtering/masq/NAT)
<M>   limit match support
<M>   IP range match support
<M>   Layer 7 match support (EXPERIMENTAL)
[ ]     Layer 7 debugging output
(2048)  Buffer size for application layer data
<M>   Packet type match support
<M>   netfilter MARK match support
<M>   Multiple port match support
<M>   TOS match support
<M>   LENGTH match support
<M>   Helper match support
<M>   Connection state match support
<M>   Connection tracking match support
<M>   Packet filtering
<M>     REJECT target support
<M>   Full NAT
<M> MASQUERADE target support
<M> Packet mangling
<M>   TOS target support
<M>   MARK target support
<M>   CLASSIFY target support
<M> LOG target support
<M> ULOG target support

7.3. Kernel Source Changes

To get the most mileage out of your kernel, there are a few options in the kernel source you will want to change. You can find all these files under your kernel source tree. Paths are specified from that root.

7.3.1. PSCHED_CPU

If you are running a kernel prior to 2.6.9 on a Pentium class CPU or better, open include/net/pkt_sched.h, change PSCHED_CLOCK_SOURCE to PSCHED_CPU, and save the file. PSCHED_CPU requires a CPU that supports the tsc flag; check /proc/cpuinfo to confirm yours does. You can skip this modification entirely on 2.6.9 and later kernels, where the clock source is chosen during configuration as described above.


# cat /proc/cpuinfo
...
flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov \
  pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow

While in days of old the default was PSCHED_GETTIMEOFDAY, today the default is PSCHED_JIFFIES, and it isn't terribly bad. PSCHED_CPU can't hurt, though.
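If you prefer not to edit the header by hand, the change is a one-line sed substitution. The sketch below performs the substitution on a scratch copy of the line in question; in a real pre-2.6.9 tree the target file is include/net/pkt_sched.h, and the filename pkt_sched_demo.h here is purely illustrative.

```shell
# Scratch copy standing in for include/net/pkt_sched.h
# (PSCHED_JIFFIES is the current default mentioned above).
printf '#define PSCHED_CLOCK_SOURCE\tPSCHED_JIFFIES\n' > pkt_sched_demo.h

# Switch the clock source to PSCHED_CPU, as the hand edit would.
sed -i 's/PSCHED_JIFFIES/PSCHED_CPU/' pkt_sched_demo.h

cat pkt_sched_demo.h
```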

7.3.2. HTB_HYSTERESIS

When working with peak bandwidth rates of less than 1Mbps, the HTB_HYSTERESIS option works to your detriment. It trades accuracy for faster calculations, a tradeoff that is unnecessary on really slow network links. Open up net/sched/sch_htb.c and change HTB_HYSTERESIS to 0. This setting also affects bursts.


#define HTB_HYSTERESIS 0 /* whether to use mode hysteresis for speedup */

7.3.3. SFQ_DEPTH

When dealing with smaller bandwidth quantities, the default queue length of 128 is too long. Flows that demand low latency can suffer if sfq begins to fill up its queue. You can edit SFQ_DEPTH in net/sched/sch_sfq.c and shorten the queue to your liking. A popular depth is 10.


#define SFQ_DEPTH               128
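
Once the module is rebuilt with a shorter depth, sfq is attached as usual. For example, to put sfq at the root of a device (eth0 is an assumption; use your own device):

# tc qdisc add dev eth0 root sfq perturb 10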