8. A Traffic Control Journey: Real World Scenarios

Having read the previous sections and familiarized yourself with traffic control concepts and the tools available under GNU/Linux to deploy QoS, you should be ready to put them to work. Now, let us examine some real world scenarios and effective resolutions.

Below I cover two popular scenarios: guaranteeing a specific rate and guaranteeing flow priority. The first involves a basic Web server, the second a consumer broadband Internet connection. First, though, let us examine a few strategies for dealing with situations that arise in many environments where traffic control is employed.

8.1. Common Traffic Control Situations

Whether you're trying to guarantee a specific rate or priority for flows, you need to handle situations where TOS flags are improperly set (especially in the case of the prio qdisc), handle TCP handshake packets, and classify network-resource-intensive p2p traffic flows. What follows are Netfilter-based solutions, although Netfilter need not be employed for actual classification. It is often easier to classify with Netfilter if you are already using it for stateful packet inspection.

8.1.1. Handling TOS Flags

Some applications, specifically OpenSSH, provide incorrect type of service (TOS) information, which can result in misclassification of tunnels and bulk data transfers. With reliable remote shell connectivity typically being a must for servers, this can be a problem. What's more, p2p applications like to mask bulk data packets as TCP acknowledgment packets. Erik Hensema has an excellent two-pronged Netfilter-based solution for this situation.


iptables -t mangle -N tosfix
iptables -t mangle -A tosfix -p tcp -m length --length 0:512 -j RETURN
iptables -t mangle -A tosfix -m limit --limit 2/s --limit-burst 10 -j RETURN
iptables -t mangle -A tosfix -j TOS --set-tos Maximize-Throughput
iptables -t mangle -A tosfix -j RETURN
...
iptables -t mangle -A POSTROUTING -p tcp -m tos --tos Minimize-Delay -j tosfix

First, a new chain is created to examine Minimize-Delay packets. Packets of 512 bytes or less are left alone, and larger packets are tolerated as long as they stay within a short burst allowance. When a packet fails both of these sanity checks, that is, it is larger than 512 bytes and exceeds the burst allowance, its TOS is reclassified from Minimize-Delay to Maximize-Throughput. The underlying assumption is that packets that need Minimize-Delay priority are small and only exceed 512 bytes for short bursts. Traffic flows from OpenSSH mesh well with this rule. Without it, using OpenSSH tunneling and copying files with scp or sftp can render your OpenSSH session rather useless for the duration if your packets are queuing.


iptables -t mangle -N ack
iptables -t mangle -A ack -m tos ! --tos Normal-Service -j RETURN
iptables -t mangle -A ack -p tcp -m length --length 0:128 \
  -j TOS --set-tos Minimize-Delay
# Start at 129 so a 128 byte packet is not re-flagged by this rule as well
iptables -t mangle -A ack -p tcp -m length --length 129: \
  -j TOS --set-tos Maximize-Throughput
iptables -t mangle -A ack -j RETURN
...
iptables -t mangle -A POSTROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK -j ack

Last, a new chain is created specifically for modifying the TOS bits if they are not sane. TCP packets with the ACK flag set that already have TOS assigned are ignored. If the TCP packet is no larger than 128 bytes, it is considered a candidate for Minimize-Delay and elevated accordingly. Strange TCP packets with the ACK flag set, like those generated by p2p applications, generally fall into the category of being larger than 128 bytes and are flagged Maximize-Throughput accordingly. The chain is only applied to TCP packets with the ACK flag set.

8.1.2. Prioritizing TCP Handshake Packets

To prevent establishing and breaking down connections from encountering potentially lengthy delays, it's useful to assign these packets a higher priority. It's not strictly necessary to elevate these packets, as they will be properly classified for any specific flows you classify and treated the same as unclassified traffic otherwise. Reclassifying these packets is more a matter of personal taste.


iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp -m tcp ! --tcp-flags SYN,RST,ACK ACK \
        -j CLASSIFY --set-class 4:1
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK \
        -m length --length :128 -m tos --tos Minimize-Delay \
        -j CLASSIFY --set-class 4:1

The first rule matches TCP SYN and RST packets and classifies them using the CLASSIFY Netfilter target discussed earlier. The second rule builds on the TOS reclassification chain discussion above and again the CLASSIFY target is used on TCP packets with the ACK flag set that don't exceed 128 bytes and have a TOS flag of Minimize-Delay.

8.1.3. Handling Pervasive p2p Traffic

p2p traffic can very easily saturate a network's entire upstream bandwidth. Fortunately, with L7 Filter it is now rather easy to classify these flows and grant them priority below that of all other traffic. p2p applications are always evolving, so L7 Filter is no magic bullet. It can help pin down p2p traffic, however.


iptables -t mangle -A POSTROUTING -m layer7 --l7proto edonkey -j CLASSIFY --set-class 4:5
iptables -t mangle -A POSTROUTING -m layer7 --l7proto fasttrack -j CLASSIFY --set-class 4:5
iptables -t mangle -A POSTROUTING -m layer7 --l7proto gnutella -j CLASSIFY --set-class 4:5
iptables -t mangle -A POSTROUTING -m layer7 --l7proto audiogalaxy -j CLASSIFY --set-class 4:5
iptables -t mangle -A POSTROUTING -m layer7 --l7proto bittorrent -j CLASSIFY --set-class 4:5

There is no single pattern match for all known p2p applications, so you will need to specify a rule for each that's present now or you believe may be in the future. You may have to create your own patterns for p2p traffic that does not yet have an L7 Filter pattern match. Packet analysis is beyond the scope of this document.
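Since each protocol needs its own rule, the repetition can be generated with a loop. What follows is a minimal sketch rather than part of the original configuration: the function name l7_rules is hypothetical, and it only prints the iptables commands so they can be reviewed before being applied. The protocol names and the class 4:5 are the ones used above.

```shell
#!/bin/sh
# l7_rules CLASS PROTO... : print one CLASSIFY rule per L7 Filter protocol.
# Hypothetical helper; it prints the commands and does not run them.
l7_rules() {
    class=$1
    shift
    for proto in "$@"; do
        echo "iptables -t mangle -A POSTROUTING -m layer7 --l7proto $proto -j CLASSIFY --set-class $class"
    done
}

# Review the output first; pipe it to a root shell to apply it.
l7_rules 4:5 edonkey fasttrack gnutella audiogalaxy bittorrent
```

Adding a protocol later then means appending one name to the list instead of copying a whole rule.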

8.2. Guaranteeing Rate

When guaranteeing a minimum bandwidth rate is necessary, the classful htb qdisc is your friend. In this scenario our objective is to guarantee a specific rate for HTTP traffic while sharing the link with SMTP, POP3, and OpenSSH traffic. All other traffic is assigned to the default class.

8.2.1. Designing the Classful qdisc Hierarchy

A Web server networked via Ethernet has 8Mbps of bandwidth available to it. Web traffic is most important. Other traffic is secondary. Accordingly, the class hierarchy created below allocates 6000Kbps for HTTP traffic. The remaining bandwidth is split into three more classes. SMTP and POP3 are allocated 1000Kbps. OpenSSH gets 500Kbps as does the default class. All classes except the default class can use excess bandwidth up to the full speed of the line. The careful reader will note that all the rates add up to the overall rate specified in the first htb parent class, as they always should.


#!/bin/bash

RATE=8000

# With the argument "stop", remove the root qdisc and do not rebuild it
if [ "x$1" = 'xstop' ]
then
        tc qdisc del dev eth0 root >/dev/null 2>&1
        exit 0
fi

tc qdisc add dev eth0 root handle 1: htb default 90
tc class add dev eth0 parent 1: classid 1:1 htb rate ${RATE}kbit ceil ${RATE}kbit

tc class add dev eth0 parent 1:1 classid 1:10 htb rate 6000kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1000kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:50 htb rate 500kbit ceil ${RATE}kbit
tc class add dev eth0 parent 1:1 classid 1:90 htb rate 500kbit ceil 500kbit

tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:50 handle 50: sfq perturb 10
tc qdisc add dev eth0 parent 1:90 handle 90: sfq perturb 10

The above shell script will create the class structure described above. It is rather simplistic and no deep nesting occurs. The parent class has only immediate children and no further descendants. For fairness, the sfq scheduling qdisc is attached to each leaf htb class.
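The rule that the child rates should add up to the parent rate is easy to get wrong when a hierarchy is edited later. The following sketch checks it with plain shell arithmetic. The function name check_rates is hypothetical, and all rates are assumed to be whole numbers in kbit.

```shell
#!/bin/sh
# check_rates PARENT CHILD... : succeed only if the child rates (in kbit)
# add up exactly to the parent rate. Hypothetical helper for illustration.
check_rates() {
    parent=$1
    shift
    sum=0
    for rate in "$@"; do
        sum=$((sum + rate))
    done
    if [ "$sum" -eq "$parent" ]; then
        echo "ok: children sum to ${sum}kbit"
    else
        echo "mismatch: children sum to ${sum}kbit, parent is ${parent}kbit"
        return 1
    fi
}

# The four leaf classes 1:10, 1:20, 1:50 and 1:90 from the script above
check_rates 8000 6000 1000 500 500
```

For the hierarchy above this reports that the children sum to 8000kbit, which matches the rate of class 1:1.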

8.2.2. Classifying Flows

Classification of flows is done using tc's filter combined with the u32 selector, discussed earlier.


tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 80 0xffff classid 1:10
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 25 0xffff classid 1:20
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 110 0xffff classid 1:20
tc filter add dev eth0 parent 1:0 protocol ip u32 match ip sport 22 0xffff classid 1:50

The tc commands above classify flows using the u32 selector based on TCP source port number. HTTP, SSH, SMTP, and POP3 are classified based on their traditional source ports. Any unclassified traffic is assigned to classid 1:90 as specified earlier when the htb class hierarchy was created.
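The 0xffff that follows each port number is a bit mask: the selector compares the masked packet field with the masked pattern. The sketch below models that comparison in shell arithmetic to show why the full 16 bit mask matches exactly one port. The function name u32_match is hypothetical, and the real selector operates on raw packet bytes inside the kernel.

```shell
#!/bin/sh
# u32_match FIELD PATTERN MASK : succeed if FIELD equals PATTERN on every
# bit that is set in MASK. Hypothetical model of the u32 comparison.
u32_match() {
    [ $(( $1 & $3 )) -eq $(( $2 & $3 )) ]
}

# With the full 16 bit mask only port 80 itself matches.
if u32_match 80 80 0xffff; then echo "80 matches"; fi
if ! u32_match 8080 80 0xffff; then echo "8080 does not match"; fi

# A shorter mask ignores the low bits, so a range of ports matches.
if u32_match 81 80 0xfff0; then echo "81 matches under mask 0xfff0"; fi
```

A mask shorter than 0xffff is therefore a compact way to match an aligned block of ports with a single filter.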

8.2.3. Observations

The classful htb qdisc is excellent at accurately guaranteeing rates for classified flows. Each htb class can dequeue at its assigned rate and, if allowed, borrow unused bandwidth from its parent up to its own ceil. It's especially useful for guaranteeing particular rates for specific services or entire ranges of network traffic.

8.3. Guaranteeing Priority

When guaranteeing flow priority is necessary, the classful prio qdisc is your friend. In this scenario our objective is to guarantee interactive applications have priority over bulk transfers and p2p applications.

8.3.1. Designing the Classful qdisc Hierarchy

The prio qdisc only knows about bands, where each band corresponds to a level of priority. While band numbering starts at zero, each band is described by major:band+1. To ensure that the priority classifications stick, the classful shaping qdisc tbf must be employed in conjunction with the prio qdisc. tbf will ensure that if link speed is exceeded a queue fills locally, where it is still controllable. Such a configuration is possible using tbf qdisc with Linux 2.6.1 and beyond.
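The major:band+1 convention can be written down as a one line helper. This is only an illustrative sketch: the function name band_classid is hypothetical. The minor number is printed in hexadecimal because tc reads class IDs as hexadecimal, which matters once a qdisc has ten or more bands.

```shell
#!/bin/sh
# band_classid MAJOR BAND : print the class ID for a zero-based prio band.
# Hypothetical helper; the minor number is formatted as hexadecimal.
band_classid() {
    printf '%s:%x\n' "$1" "$(($2 + 1))"
}

band_classid 2 0   # highest priority band of qdisc 2:, prints 2:1
band_classid 2 3   # fourth band, prints 2:4
```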

Structurally, the class hierarchy uses a tbf qdisc as a container for the prio qdisc, ensuring any packet queue remains local. The prio qdisc is then assigned to the only tbf class, with an extra band added. As described earlier, the prio qdisc automatically creates a class structure for as many bands as you create, with the default being three. Finally we assign the sfq scheduling qdisc as the leaf for three of the four new prio qdisc classes. The fourth, which is for p2p traffic, is assigned a tbf shaping qdisc, with a pfifo qdisc attached to the tbf class.

It's important to note that the prio qdisc is merely a scheduler. As such, it cannot perform any shaping. Therefore, if one or more higher priority bands consume the link, lower priority bands will never have an opportunity to dequeue packets. In other words, starvation occurs. To combat this, careful planning is necessary. If starvation cannot be tolerated, you should look instead at guaranteeing rates, described above.

The proposed configuration is effective for residential consumer broadband, in the form of ADSL or Cable Internet services, where one must often suffer an asymmetrical connection. The example below assumes a usable rate of 160Kbps on a residential ADSL connection with an advertised rate of 256Kbps. The tricky part is guessing what your actual bandwidth rate is. With overhead, it's usually 60% to 90% of your rated connection.


tc qdisc add dev eth0 root handle 1: tbf rate 160kbit burst 1600 limit 1
tc qdisc add dev eth0 parent 1:1 handle 2: prio bands 4
tc qdisc add dev eth0 parent 2:1 handle 10: sfq perturb 20
tc qdisc add dev eth0 parent 2:2 handle 20: sfq perturb 20
tc qdisc add dev eth0 parent 2:3 handle 30: sfq perturb 20
tc qdisc add dev eth0 parent 2:4 handle 40: tbf rate 144kbit burst 1600 limit 3000
tc qdisc add dev eth0 parent 40:1 handle 41: pfifo limit 10

The above commands will create the class structure described above. The actual hierarchy is more complex than immediately obvious, due to the prio qdisc automatically creating a class for each band it manages.
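The 60% to 90% rule of thumb for the usable rate gives a range to start experimenting from. The sketch below computes that range with integer shell arithmetic. The function name usable_rate is hypothetical, and the result is only a starting estimate to be refined by measurement.

```shell
#!/bin/sh
# usable_rate ADVERTISED_KBIT PERCENT : integer estimate of the usable rate.
# Hypothetical helper; the result is rounded down.
usable_rate() {
    echo $(( $1 * $2 / 100 ))
}

usable_rate 256 60   # pessimistic estimate, prints 153
usable_rate 256 90   # optimistic estimate, prints 230
```

The 160kbit used for the root tbf above sits near the low end of this range, which is the safer side: a rate set too high lets the queue build up in the modem, where it is no longer controllable.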

8.3.2. Classifying Flows

Now we can use Netfilter and its CLASSIFY target to classify traffic. We handle packets with type of service set as described earlier. TCP packets with the ACK flag are also handled as described above. As you may recall, the prio qdisc uses the TOS flags to classify packets by default. Most importantly, Minimize-Delay is assigned priority level zero, Normal-Service priority level one, and Maximize-Throughput priority level two. Ensuring packets have a proper TOS flag is obviously of paramount importance.


# Is our TOS broken? Fix it for TCP ACK and OpenSSH.

iptables -t mangle -A POSTROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK -j ack
iptables -t mangle -A POSTROUTING -p tcp -m tos --tos Minimize-Delay -j tosfix

# Here we deal with ACK, SYN, and RST packets

# Match SYN and RST packets
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp -m tcp ! --tcp-flags SYN,RST,ACK ACK \
        -j CLASSIFY --set-class 2:1

# Match ACK packets
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp -m tcp --tcp-flags SYN,RST,ACK ACK \
        -m length --length :128 -m tos --tos Minimize-Delay \
        -j CLASSIFY --set-class 2:1

# Match packets with TOS Minimize-Delay
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp -m tos --tos Minimize-Delay \
        -j CLASSIFY --set-class 2:1

The first packets classified are those that can delay flows if not handled expediently. All TCP flows are handled the same in that packets with handshake flags set are promoted. Later, some of these flows will be entirely demoted. The last matching rule wins: each time a rule matches, the packet is reassigned to the associated traffic control class. Classification with Netfilter should therefore proceed from the least to the most specific rule.


### Actual traffic shaping classifications with CLASSIFY

# ICMP (ping)

iptables -t mangle -A POSTROUTING -o $LOCALIF -p icmp -j CLASSIFY --set-class 2:1

# Outbound client requests for HTTP, IRC and AIM (dport matches)

iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp --dport 80 -j CLASSIFY --set-class 2:2
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp --dport 6667 -j CLASSIFY --set-class 2:2
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp --dport 5190 -j CLASSIFY --set-class 2:2

# Enemy Territory (UDP, realtime gaming packets)

iptables -t mangle -A POSTROUTING -o $LOCALIF -p udp --dport 27960:27970 \
        -j CLASSIFY --set-class 2:2

After the earlier magic, classification of most flows is generally as easy and straightforward as using iptables matching rules. Above we assign ICMP traffic, which includes things like the packets sent in association with the ping command, to the class described by 2:1. We assign all other interactive traffic to the class described by 2:2. Notice we have classified both ICMP and UDP flows in addition to more common TCP flows.


# SSH

# The last matching rule wins, so list specific rules _LAST_

iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp --dport 22 -j CLASSIFY --set-class 2:2
iptables -t mangle -A POSTROUTING -o $LOCALIF -p tcp --sport 22 -j CLASSIFY --set-class 2:2

iptables -t mangle -A POSTROUTING -p tcp -m tos --tos Maximize-Throughput \
        --sport ssh -j CLASSIFY --set-class 2:3
iptables -t mangle -A POSTROUTING -p tcp -m tos --tos Maximize-Throughput \
        --dport ssh -j CLASSIFY --set-class 2:3

# Matches for Edonkey and Overnet

iptables -t mangle -A POSTROUTING -m layer7 --l7proto edonkey -j CLASSIFY --set-class 2:4

Finally, we handle flows generated by OpenSSH and a p2p application. The former is assigned to the interactive class for sessions originating both within the local network and destined for the local network from the Internet. (Or perhaps from another segment on a WAN.) Earlier, packets with the TOS flag Minimize-Delay larger than 512 bytes had their TOS altered to a more reasonable Maximize-Throughput. That is taken advantage of now implicitly in the second pair of rules relating to OpenSSH. Tunnels and transfers using scp and sftp will now correctly be assigned to the class described by 2:3. The final rule uses L7-Filter to match packets sent by the p2p application eMule by applying a regular expression against each packet and matching the protocol at the application layer. The traffic is then assigned to the class represented by 2:4, the p2p class.
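Because the CLASSIFY target does not end rule traversal, a packet ends up in the class set by the final rule it matches. The sketch below is a deliberately simplified shell model of that behaviour, covering only three of the rules above. The function name classify is hypothetical, and its single port argument stands for either the source or the destination port.

```shell
#!/bin/sh
# classify PROTO PORT TOS : model of rule ordering for a single packet.
# Every matching rule overwrites the class, so the last match sticks.
# Hypothetical model covering three rules only.
classify() {
    proto=$1
    port=$2
    tos=$3
    class="priomap"   # no rule matched; prio falls back to its priomap

    # General rule: anything flagged Minimize-Delay is promoted.
    if [ "$proto" = tcp ] && [ "$tos" = Minimize-Delay ]; then
        class="2:1"
    fi
    # More specific: OpenSSH traffic goes to the interactive class.
    if [ "$proto" = tcp ] && [ "$port" = 22 ]; then
        class="2:2"
    fi
    # Most specific: OpenSSH traffic flagged as bulk is demoted.
    if [ "$proto" = tcp ] && [ "$port" = 22 ] && [ "$tos" = Maximize-Throughput ]; then
        class="2:3"
    fi
    echo "$class"
}

classify tcp 22 Minimize-Delay        # interactive session, prints 2:2
classify tcp 22 Maximize-Throughput   # scp or sftp transfer, prints 2:3
```

Swapping the last two tests in the function would send every OpenSSH packet to 2:2, which is why the specific rules have to come last.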

8.3.3. Observations

The classful prio qdisc paired with the classful tbf qdisc is an excellent way of guaranteeing priority for flows in situations where you can live with one or more bands dominating lower priority bands, possibly starving them entirely at times.