Common ISP Mistakes
This page lists some common mistakes made by ISPs (or other network
administrators) which cause other people grief. Some of these
apply to backbone providers, some apply to non-backbone providers,
and some apply to enterprise/organization network administrators.
You may want to use this as a guide towards "best practice".
Some of the sections discuss things which you can do wrong and
some suggest things which you can do right. The two are often
not clearly separated; I trust people will be able to distinguish
the two. A little more consistency would be nice.
Advertising routes to 0.0.0.0/0 via BGP.
This causes all (or much of) the traffic on the internet to be routed
to you, denying service to you and everyone else. It is typically caused
by specifying a default route on a machine running BGP.
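A simple export sanity check can catch this class of mistake before an
announcement leaves your network. Here is a minimal sketch in Python;
the authorized prefix list and the function interface are illustrative
assumptions, not any particular router's API:

    import ipaddress

    # Prefixes this network is authorized to originate (assumed example values).
    AUTHORIZED = [ipaddress.ip_network("192.0.2.0/24"),
                  ipaddress.ip_network("198.51.100.0/24")]

    def safe_to_announce(prefix: str) -> bool:
        """Return True only for prefixes we are authorized to originate."""
        net = ipaddress.ip_network(prefix)
        if net == ipaddress.ip_network("0.0.0.0/0"):
            return False  # never announce a default route to peers
        return any(net.subnet_of(auth) for auth in AUTHORIZED)

    assert not safe_to_announce("0.0.0.0/0")
    assert safe_to_announce("192.0.2.0/25")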
Blocking ICMP
Many ISPs and enterprise networks have blocked some or all ICMP traffic
in an attempt to provide permanent or temporary protection against DoS
attacks. Blocking ICMP "too big" messages will break PMTU discovery.
If you must block ICMP "echo" and "echo reply" as a temporary stopgap
measure, block only those message types (and preferably from
as small a subset of the internet as possible). While you are blocking
these types, it is a mistake to claim that the network is "up"; the
network is operating in a degraded mode.
Breaking ping is a bad thing, only to be done in emergencies.
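If your filtering software lets you match on raw ICMP types, the
distinction looks something like the following sketch (the policy
function is illustrative, not a real firewall interface):

    # ICMP types (RFC 792): 0 = echo reply, 3 = destination unreachable,
    # 8 = echo request.  Type 3 code 4 is "fragmentation needed and DF set",
    # the message PMTU discovery depends on.

    def icmp_allowed(icmp_type: int, icmp_code: int, emergency: bool) -> bool:
        """Illustrative policy: never drop "too big"; drop echo only in emergencies."""
        if icmp_type == 3:          # destination unreachable, including code 4
            return True             # blocking this breaks PMTU discovery
        if icmp_type in (0, 8):     # echo reply / echo request ("ping")
            return not emergency    # temporary stopgap only
        return True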
Breaking PMTU discovery means that two hosts whose own MTUs are larger
than the smallest MTU of any hop on the path between them will not be
able to communicate. Packets are dropped consistently enough to prevent
communication, even though some small part of each flow (generally the
same part) frequently gets through. This is a pattern-sensitive
(packet-size-sensitive) packet loss, not a random one, so TCP
retransmission does not recover. For SMTP, the HELO, MAIL, RCPT dialog
will happen and then the connection will hang on the message DATA (unless
the message is very short), tying up the servers at both ends until they
time out. Normally all attempts to resend the message will also fail.
For HTTP, the browser will probably be able to do a "HEAD" (short
response) but a "GET" will fail. The symptom to the user is that
they are consistently unable to get through to certain web sites,
with the connection stalling at the same place each time. On very rare
occasions, I have seen a connection actually succeed to one of these
unreachable servers when it was sufficiently loaded that it transmitted
the data in smaller chunks.
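You can often confirm a PMTU black hole from the outside by probing with
DF-flagged pings of increasing size: small probes get replies while
large ones silently vanish. A rough sketch, assuming the Linux iputils
ping and its -M do (set DF) and -s (payload size) options:

    import subprocess

    def df_ping_ok(host: str, payload: int) -> bool:
        """Send one DF-flagged ping with the given payload size; True on a reply."""
        result = subprocess.run(
            ["ping", "-c", "1", "-w", "3", "-M", "do", "-s", str(payload), host],
            capture_output=True)
        return result.returncode == 0

    # 1472 bytes of payload + 28 bytes of headers = a full 1500-byte Ethernet frame.
    for size in (64, 576, 1200, 1472):
        print(size, "ok" if df_ping_ok("www.example.com", size) else "LOST")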
If this happens to you, a workaround is to set the maximum MTU to 576 on each
of your clients (to fix outbound connections) and servers (to fix inbound
ones). Setting the MTU on your router does not seem to work: it might
help in one direction of data flow, by sending ICMP "too big" messages to
your inside hosts at a threshold lower than that of the lost "too bigs",
but not in the other direction, in which both sets of "too bigs" get
dropped. This does put a higher packet load on the backbone (more and
smaller packets have to be routed). The cause is usually a router
somewhere that drops packets which are too large but have the DF (don't
fragment) bit set, without generating the required ICMP "too big" message
(there is some defective hardware out there) or, more likely, some
clueless network operator who filtered all ICMP traffic, usually as a
naive attempt to protect against real or anticipated DoS attacks.
This is made much worse by filtering software which does not allow
you to filter specific ICMP types and (unfragmented) packet sizes.
A much better way to handle most of the DoS attacks (except smurf)
is to force fragment reassembly on a router which is not susceptible
before forwarding the packets; this puts a significant load on the
router, so it is best done on a router/firewall close to the
systems being protected (which is also desirable because it
gives more protection).
Note that PMTU discovery can be broken in at least three different ways,
if you are suffering from it: failure to generate ICMP "too big",
filtering ICMP "too big", and configuring router interfaces with
private addresses on equipment that handles traffic which may come
from the public internet.
Marc Slemko has a page on
Broken PMTU Discovery.
Using defective hardware that drops oversized packets without
sending ICMP "too big" messages
This breaks PMTU discovery, among other things.
Some "repeaters" which bridge incompatible (from an MTU perspective)
media types (like fddi to/from ethernet) are guilty of this offense.
A repeater is generally a brain dead device which simply cannot
generate ICMP messages. These devices should be avoided. They
might be usable in certain carefully controlled situations (where
every single device on the larger MTU side can be, and is, programmed with
the smaller MTU.
Filtering Broadcast addresses incorrectly
Filtering on 0.0.0.255 mask 0.0.0.255 is irresponsible.
The lame argument that anyone who has a host in that range is
asking for trouble is specious; just because they may be adversely
affected by some clueless individual somewhere does not justify
your being clueless as well. Yes, personally, I would avoid putting
an (externally accessible) host at .255 because of the general clue
deficit.
Failure to filter traffic with forged IP source addresses
This is the single most important thing you can do to protect against
net abuse originating within your network.
On some Cisco routers, you may find the "ip verify unicast reverse-path"
command helpful. On almost any box which has IP filtering capability, you
can filter out packets arriving on an interface whose source addresses do
not match the IP address range assigned to that interface.
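The check itself is simple. A minimal sketch of the idea in Python
(interface names and prefixes are assumed example values):

    import ipaddress

    # Prefixes legitimately reachable behind each customer-facing interface
    # (assumed example values).
    INTERFACE_PREFIXES = {
        "eth1": [ipaddress.ip_network("192.0.2.0/24")],
        "eth2": [ipaddress.ip_network("198.51.100.0/25")],
    }

    def source_ok(interface: str, src: str) -> bool:
        """Drop packets whose source address does not belong on that interface."""
        addr = ipaddress.ip_address(src)
        return any(addr in net for net in INTERFACE_PREFIXES.get(interface, []))

    assert source_ok("eth1", "192.0.2.77")
    assert not source_ok("eth1", "10.1.2.3")   # forged/leaked source: drop it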
Blocking traceroute
Traceroute typically uses UDP ports 33435 to 33524 for the first 30 hops
(for additional hops beyond that, add 3 ports per hop). You need to
allow these through firewalls or packet filters. Do not allow any
vulnerable servers to use this port range inside your net.
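If you need to derive the range for a different hop count, the arithmetic
is as follows; this sketch assumes the classic traceroute defaults (base
port 33434, three probes per hop), and implementations do vary:

    BASE_PORT = 33434   # classic traceroute default; implementations vary
    PROBES_PER_HOP = 3

    def port_range(max_hops: int):
        """UDP destination ports used for a traceroute of max_hops hops."""
        return BASE_PORT + 1, BASE_PORT + PROBES_PER_HOP * max_hops

    print(port_range(30))   # (33435, 33524), matching the range above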
Don't allow your downstream networks to be used as smurf amplifiers
On Cisco routers, you can use "no ip directed-broadcast". On almost any box
with IP filtering, you can deny any packets which originate
on an outside interface and have an address matching your inside
networks' broadcast addresses. You should do a smurf scan
on your downstreams and block traffic to those broadcast addresses
which are found to result in smurf amplification (if a ping to
that address results in multiple replies ("DUP!"), you should block
traffic to that specific address until your downstream can fix it themselves).
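A crude smurf scan can be scripted around ordinary ping by counting how
many replies a single broadcast ping draws. A rough sketch, assuming the
Linux iputils ping (its -b flag permits pinging a broadcast address) and
assumed example target addresses:

    import subprocess

    def amplification(broadcast_addr: str) -> int:
        """Ping a broadcast address once and count the replies that come back."""
        result = subprocess.run(
            ["ping", "-c", "1", "-w", "3", "-b", broadcast_addr],
            capture_output=True, text=True)
        return result.stdout.count("bytes from")   # >1 means amplification

    for addr in ["192.0.2.255", "198.51.100.127"]:   # assumed example targets
        n = amplification(addr)
        if n > 1:
            print(addr, "amplifies x%d - block it" % n)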
Configuring router interfaces with Private IP Addresses
This breaks things because many networks legitimately prevent
any packets with a source address in one of the private blocks
from entering or crossing their networks. Any ICMP responses
from the router (such as "TTL exceeded" or "too big") will
carry the illegal private address. This can break traceroute,
PMTU discovery, and more.
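One quick way to spot this from the outside is to look for private
addresses among the hops a traceroute reports; a small sketch (the hop
list is assumed to have been parsed from traceroute output):

    import ipaddress

    def private_hops(hops):
        """Return the hops that answered from private (RFC 1918 etc.) addresses."""
        return [h for h in hops if ipaddress.ip_address(h).is_private]

    # Example: the second hop here would break PMTU discovery and traceroute
    # for networks that (legitimately) filter private source addresses.
    print(private_hops(["203.0.113.1", "10.0.0.1", "198.51.100.7"]))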
Reassembling packet fragments
There are two reasons one might want to do this: improving performance
in certain (unusual) situations (while putting a huge load on the routers)
and security.
RFC-1812 specifically forbids a router from doing this because
the fragments may take different routes and therefore not all be
available for reassembly, causing packet loss which may be
pattern-sensitive rather than random and therefore not fixed
by retransmission. However, RFC-1812 was clearly written with
only the dubious performance reasons for reassembly in mind.
The security reasons are far more compelling, and I think RFC-1812
will need to be rewritten in the future to permit reassembly
in certain limited situations, namely: A) where the router performing
reassembly is on the only possible route into or out of the
network being protected, or B) where steps have been
taken to force all fragments of any given packet to converge at a
common point for reassembly (be careful that you don't inadvertently
cause fragmented packets to appear to have originated from inside
your protected network in the process).
Failure to read RFC-1812
RFC-1812 is the current Router Requirements RFC.
Failure to document any deviations from RFCs
If you absolutely must deviate from any RFCs or from best practice,
this should be clearly documented on your web site in an
easily found location.
Failure to properly delegate reverse address mappings
Some ISPs do not know how to properly delegate reverse address
mappings to a downstream customer's domain name server for network
blocks smaller than /24. The O'Reilly and Associates DNS book
devotes several pages to how to do this. This failure interferes with
the downstream network's administration of its address space.
Failure to prevent SPAM originating within or relaying through your net
Disable forwarding of messages through your SMTP servers unless they come
from, or are destined to, your network or downstream networks. Spammers
misuse other people's SMTP servers to handle delivery for them, allowing
them to send large amounts of spam from even a slow link and making
the spam harder to trace and block.
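You can test your own servers the way spammers probe them, by checking
whether an outside-to-outside delivery is accepted. A minimal sketch
using Python's smtplib (the host names are hypothetical; only probe
servers you operate):

    import smtplib

    def is_open_relay(server: str) -> bool:
        """Try an outside-to-outside RCPT; acceptance suggests an open relay."""
        try:
            smtp = smtplib.SMTP(server, 25, timeout=10)
            smtp.helo("relay-test.example.org")
            smtp.mail("probe@outside-a.example.org")
            code, _ = smtp.rcpt("victim@outside-b.example.org")
            smtp.quit()
            return 200 <= code < 300   # relay accepted a foreign recipient
        except (smtplib.SMTPException, OSError):
            return False

    print(is_open_relay("mail.your.net"))   # hypothetical server name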
Enforce your acceptable use policy.
Failure to have, and enforce, an Acceptable Use Policy
Make sure your Acceptable Use Policy gives you the power to terminate
accounts which engage in net abuse. Some spam prevention
web sites have links to examples of AUP which are used
by other ISPs to prevent abuse.
Require proper identification from all prospective network
customers (both dedicated circuit and dialup). Require
all prospective customers to sign a statement that they will
abide by the AUP and that they have never had access
terminated before for any type of network abuse.
At a minimum, run all credit cards through with name/address verification
before granting access. Check all credit cards, names, addresses, and
phone numbers against known abusers. Verify that the prospective
customer can actually be reached at what they claim is their phone
number. Publish the name, address,
phone number, and the MD5 hash of the credit card number used by
any spammer to the net-abuse newsgroup.
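Publishing a hash rather than the number lets other providers match a
known abuser's card without your revealing it. A small sketch; note that
you must canonicalize the number first or hashes from different sites
will not match:

    import hashlib

    def card_fingerprint(card_number: str) -> str:
        """MD5 of the digits only, so spaces and dashes don't change the hash."""
        digits = "".join(ch for ch in card_number if ch.isdigit())
        return hashlib.md5(digits.encode("ascii")).hexdigest()

    # The same card written two ways yields one publishable fingerprint.
    assert card_fingerprint("4000 1234 5678 9010") == card_fingerprint("4000-1234-5678-9010")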
Although AOL does enforce its AUP after the fact (at least by terminating
accounts, if not taking perpetrators to court), it is their
promiscuity in signing up customers that leads to so much spam
from their network.
If you do not want the hassle of suing perpetrators
to recover damages, at least assign your rights to collect damages
to a law firm which will sue on its own behalf. Write your
AUP to make lawsuits lucrative by establishing minimum damages.
Your AUP should also allow you to monitor suspected net-abuse.
If someone sends 1000 copies of an email message, you should be
able to read it to see if it is SPAM (if they are sending
1000 copies, it probably isn't that private anyway).
Your AUP should differentiate between unauthorized security attacks
and authorized ones carried out by reputable security consultants
or by the target network's owner.
Transparently proxying traffic without consent
It is completely unethical, and probably illegal as well, to transparently
proxy traffic that transits your network without the consent of
at least one of the two parties to that traffic, except to provide
informative rejection messages in the case of traffic blocked because
of net-abuse. Digex was,
until recently, considered a reputable backbone provider but in June
of 1998 it became public knowledge that they were transparently
caching web traffic; now their name is mud.
Internet packets are not fungible; you can't treat them like lumps of coal.
Transparent proxying or caching can legitimately be used to:
- Serve as a web accelerator for downstream customers who
have explicitly requested it.
- Improve performance for downstream customers, provided
you have notified them of the transparent caching and
provided a means (such as a web form) by which they
can turn it on or off at will for their accounts.
- Provide an informative error message for
connections to which you would otherwise deny service silently.
- Force caching of your downstream customers' outgoing web
requests with no option to disable it, but only if you are
explicitly selling substandard service.
Anyone using transparent caching should also beware that it only works
correctly if all TCP packets for a connection travel through one point;
otherwise, you have the same kinds of problems as with fragment reassembly.
Making changes without notifying affected parties
It is your responsibility to notify your downstream contacts if
you do any of the following:
- Filter traffic to/from their network in any way.
- Deviate from any RFC on any of your routers, particularly RFC-1812.
- Change the point of attachment for their connection.
- Change any of the above
Failure to adequately monitor for net abuse
- SPAM
Any host or account on your net which sends excessive mail
(total number of recipients for all messages) during a
given period of time should trigger an alarm. In response
to that alarm, you can investigate whether it is spam or a legitimate
mailing list (see the sketch after this list).
- Ping of Death and other similar DoS
- Smurf
Monitor for inbound attacks, outbound attacks, and attacks
which are using your network as an amplifier.
- Any forged source address packets
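As an example of the alarm described in the first item above, here is a
sketch that tallies recipients per sender; the simplified log format is
an assumption, as real MTA logs differ:

    from collections import Counter

    THRESHOLD = 500   # recipients per period before a human looks at it (tunable)

    def spam_alarms(log_lines):
        """Tally total recipients per sender; flag senders over the threshold."""
        totals = Counter()
        for line in log_lines:
            # Assumed simplified format: "<sender> <number-of-recipients>"
            sender, nrcpt = line.split()
            totals[sender] += int(nrcpt)
        return [s for s, n in totals.items() if n > THRESHOLD]

    log = ["user1@your.net 3", "bulk@your.net 400", "bulk@your.net 200"]
    print(spam_alarms(log))   # ['bulk@your.net'] - investigate, don't assume spam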
Failure to document any deviations from RFCs
If you deviate from any RFCs or filter traffic in any way, you
should document that on an easily located page on your web site
dedicated to troubleshooting network problems. This page should
have links to/from your downtime page, if they are not the same,
and should be linked directly from your organization's main web page.
Failure to consider who is affected by your countermeasures
Countermeasures to various types of attacks should be designed
to minimize the effect on innocent people and maximize the
effect on the perpetrators, or on those who have helped the perpetrators
through contributory negligence. If you are smurfed, blocking all
traffic from the "smurf amplifier" sites is better than blocking
a whole type of traffic (such as ICMP) from the entire net.
If you have to block some traffic, it is also a good idea to consider
redirecting HTTP (and possibly other common protocols like SMTP and FTP)
to protocol-specific servers which return suitable error messages
referring people to "http://accessdenied.your.net/". This minimizes
the time wasted by people who may not be at fault, cuts down on
calls to your tech support lines, and directs hostility towards
those who are at least partially responsible.
Allow all HTTP traffic to the minimalist web server "accessdenied.your.net",
which does not necessarily need to be inside your own network;
even "/bin/cat /path/accessdenied" (where the file contains HTTP headers
plus HTML text) might be sufficient if it is invoked from an inetd
substitute with process fork limits. Similar programs may suffice for FTP
and SMTP error servers. Linux boxes (and probably *BSD boxes
as well) can redirect blocked traffic to a specified port on the
firewall, where you can run a protocol-specific error server. Here is a
sample error message file, assuming 554 is a suitable error number for
the protocol and that the protocol uses SMTP/FTP-style messages (HTTP and
IMAP need a little change):
554- Access has been blocked at your.net to/from certain networks
554- because of net abuse or security violations originating at
554- or aided by the blocked networks.
554- For further information, including where to direct your complaints,
554 visit http://accessdenied.your.net
Any decent client will display these error messages to the user.
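A minimal error server along these lines takes only a few lines. This
sketch speaks the SMTP/FTP multiline style shown above and assumes your
firewall redirects blocked connections to its port:

    import socketserver

    BANNER = (b"554- Access has been blocked at your.net to/from certain networks\r\n"
              b"554- because of net abuse or security violations originating at\r\n"
              b"554- or aided by the blocked networks.\r\n"
              b"554- For further information, including where to direct your complaints,\r\n"
              b"554 visit http://accessdenied.your.net\r\n")

    class ErrorServer(socketserver.StreamRequestHandler):
        def handle(self):
            # Emit the rejection banner and hang up; no input is read or trusted.
            self.wfile.write(BANNER)

    if __name__ == "__main__":
        # Port 2554 is an arbitrary example; point your firewall redirect at it.
        socketserver.ThreadingTCPServer(("", 2554), ErrorServer).serve_forever()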
Failure to adequately control your upstream links
You should have an upstream provider with a responsive 24-hour NOC
which can and will reprogram the routers for your link as needed.
Alternatively, you can install your own router at your upstream
provider's end of your upstream links. In some cases, this
will require that you have a direct router-to-router connection
so your upstream can protect against any shenanigans (promiscuous
mode, masquerading, etc.) from the router under your control.
These measures allow you to (possibly automatically) stop hostile
traffic (such as smurf echo replies) before it clogs your upstream
link.
Another possibility is to get your upstream provider to
implement a secure HTTP-based server which
allows you, and only you, to program certain characteristics,
and only those characteristics, of your link, such as filter rules
(except those which prevent you from sending or forwarding traffic
with spoofed source addresses).
Check suspected abuse against known good guys
You should have a list of downstream connections or accounts which are
used by reputable security consultants. Your policy should require
your NOC to check against this list before disabling an account or line
because of abuse alarms. In such cases, you should check whether
the attack is authorized before taking any action. It also makes
sense to compare the name and address on the credit card used on
a dialup account with the known administrators of the target network
(particularly those listed in whois).
Block spammers' web sites
Redirect requests for the web sites of companies which advertise via spam
to a page that indicates that access has been denied for unethical conduct.
Explain on that page that if they are trying to get past the block
(for example, to collect info for retaliation) they can go through the
anonymizer.
Try to verify that the owners of the web site were the ones who
really sent the spam (i.e., that they weren't framed). Posing as an
interested customer is usually helpful in this case.
Subscribe to Vixie's validated spam-blocking BGP feed (the MAPS RBL)
Don't shoot the messenger
When someone points out a security problem in your network or systems,
do not blame or take action against the messenger. They are friend, not foe.
When the famous physicist Richard Feynman went around breaking into safes used
in the Manhattan Project and then reported his successes to security,
the morons handling security sent out memos telling people to change their
combinations if Feynman had been in their offices recently, rather
than telling them not to use their birthdays, or other stupid choices,
as their safe combinations. Don't fix the blame; fix the problem.
Security isn't important here
Many organizations neglect security on the grounds that it isn't
important for their organization. This is wrong. Academic
organizations often suffer from this attitude.
- Your site, once compromised, may be used to attack other
sites. You may even be liable if your site is used to harm others
because of your negligence. Your connectivity to all or parts of
the internet may be terminated if you cannot control access
and stop abuse.
- How would your organization be affected if all the computers
were unavailable or malfunctioned simultaneously? Particularly
at a crucial time? How would you feel if you couldn't pay your
rent because someone crashed all the machines in payroll? If you
couldn't meet your own deadlines because someone crashed your
machines?
- How many people use secure services from within your net?
How many people order books online with a credit card, for example,
from your campus computers?
- You may have much more confidential information than you realize.
How would your customers react if their credit card numbers were
stolen from your online store? How would your students react
if a list of the self-help books they checked out
from the campus library were made public?
- Do visitors to your campus log in to their home systems from
your network?
Failure to engineer networks with a sniffer-resistant topology
Use switches, bridges, or routers instead of hubs, at least in strategic
locations but preferably everywhere. Use good devices which
can have learning disabled and be hard-configured, for more protection.
No host in your web farm, for example, should be able to eavesdrop
on traffic to/from any other host (particularly those owned, operated,
or controlled by a different organization).
Consider setting up a trace host
Particularly if your NOC is not responsive 24x7,
you may want to operate a tracing host which allows authorized
downstream providers to trace traffic which is addressed to
their own networks only. Tcpdump may be used for this purpose
with restrictions imposed on the filters used (or pipe one
tcpdump into another with user-specified parameters; no shell escapes,
-w, or buffer overflows should be permitted). Some skill
is required to do this safely.
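The dangerous part is letting users supply filter text. Constrain them
to choosing a destination network you have verified is theirs, and build
the tcpdump argument list yourself with no shell involved. A rough
sketch (the authorization table is an assumed example):

    import ipaddress
    import subprocess

    # Networks each authorized downstream may trace (assumed example values).
    AUTHORIZED = {"customer-a": [ipaddress.ip_network("192.0.2.0/24")]}

    def trace(user: str, dest_net: str):
        """Run tcpdump restricted to traffic addressed to the user's own network."""
        net = ipaddress.ip_network(dest_net)          # raises on malformed input
        if net not in AUTHORIZED.get(user, []):
            raise PermissionError("not your network")
        # Fixed argument list, no shell, no -w: output is text on stdout only.
        # (Needs capture privileges on the tracing host.)
        subprocess.run(["tcpdump", "-n", "-c", "100", "dst", "net", str(net)])

    trace("customer-a", "192.0.2.0/24")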
Or, consider uploading a report from your abuse monitors to
your web server every 5 minutes. This report should include
only clear abuses, to protect privacy.
24x7 NOC or staff on call
Networks of any significant size need a 24x7 NOC, or
at least knowledgeable people with portable/home computers
on call 24 hours a day. If you don't have a 24x7 NOC, you should
consider contracting an outside firm to provide one.
A 24x7 answering service is not a 24x7 NOC.
Rapid response is important for tracking down and stopping many types
of net abuse.
Educate your network customers
Make information available to your downstream networks' administrators,
who may be less experienced. You might want to link to this page.
Make sure your dialup customers understand they must not do business
with spammers. Make sure they understand that spammers often
try to make untargeted unsolicited email look like targeted
mail; just because you actually are interested in the products being offered
doesn't mean they didn't send the advertisement to 65 million other
people who weren't.
Cooperation
Cooperation with other network providers is important in preventing
abuse and diagnosing problems.
NOCs need to be responsive to queries from other NOCs.
The preferred solutions to many problems require a significant
development effort. Network providers can cooperate by sharing
internally developed tools with others or by organizing
with others to fund development of tools which meet a common need.
Tapping the network infrastructure fund might be appropriate
for some of these tools.
Net abuse is one of the areas where the old maxim "If you aren't part of the
solution then you are part of the problem" applies.
Failure to support required or recommended email addresses
abuse@yourdomain.net, postmaster@yourdomain.net, usenet@yourdomain.net,
domain@yourdomain.net, noc@yourdomain.net, support@yourdomain.net,
etc. See
RFC2142.txt.
Web host screwups
- Failure to have enough disk space available. You should have enough
disk space to satisfy the amount you promised every customer. If you
do oversell your disk space, it is your obligation to monitor disk
usage and add drives (or move accounts) long before the disks fill up.
- Make sure you put "UserDir disabled" inside each "<VirtualHost>" section
in httpd.conf if you have defined "UserDir" in the main section.
Otherwise, you will screw up other people's namespaces and also create
a security hole.
Failure to outsource or hire qualified people
Outsource those services which your network is not big enough to
handle, or which your staff is not qualified for or too overworked to
deal with.
- Engineering consultants.
- Security.
- NOC, at least for 24-hour service.
An answering service doesn't cut it.
- News server administration or news service.
Another thing to outsource is roaming service, or join a cooperative
roaming exchange network like iPass.
Three very common reasons someone might abandon their local ISP
Ok. This particular section is, theoretically, a bit self-serving
(because you might outsource some work to me). But failure to
outsource where appropriate
is one of the primary reasons that most small networks are likely
to go the way of the dodo bird, driven out of business by
large vertical monopolies with better economies of scale.
This is something I don't want to see happen.
You do, of course, need to use secure communication channels when
you allow contract personnel to remotely administer some of
your systems.