Tomahawk
About Tomahawk

Abstract
Although network-based Intrusion Prevention Systems (NIPS) have recently entered the commercial marketplace as an important component of a defensive security strategy, methodologies for evaluating these systems are still quite primitive. Few commercial tools are available for evaluating the security functionality of NIPS, and TippingPoint believes that current commercial tools for testing the network performance of these devices do not effectively test and report how well a network is protected. As part of its internal testing and quality assurance programs, TippingPoint has developed a suite of tools for testing our NIPS over the past three years.

Introduction
Network-based Intrusion Prevention Systems (NIPS) are inline network devices that detect and block network-based attacks. A NIPS is deployed in a network much like a switch or a router. The NIPS inspects every packet that passes through it, scanning for attacks designed to infect, disable, or take over another computer system. When the NIPS detects an attack, it takes an action, typically blocking the corresponding packet stream and generating a notification.

As an element of the network infrastructure, a NIPS must perform like switches or routers in latency and throughput. It must also identify attacks without blocking legitimate traffic and detect attacks despite the use of evasion techniques [1]. As the guardian of the network, the NIPS itself must be extremely resilient against attacks lest it become a target for Denial of Service attacks. Although NIPS are an important component of a defensive security strategy, methodologies for evaluating these systems are still quite primitive. Customers and other evaluators typically rely on test suites based on some combination of commercial, open source, and "home grown" tools. These suites are often aggregated from what is easily available, without regard to their effectiveness.

Over the past three years, TippingPoint has developed a powerful set of tools for evaluating the network and security performance of NIPS. Our methodology for testing NIPS involves two types of tests: Network Performance and Security Performance.

Network and Application Performance Testing

Network Performance testing checks the performance of the NIPS as an element of the network infrastructure by measuring network latency and throughput under a variety of traffic loads. Ideally, the latency and throughput of the NIPS should be on par with the other elements of the network in which it is installed. For example, if the NIPS is installed in a LAN, the one-way latency should be less than 200 microseconds and the throughput 1 Gbps or more. If the NIPS is installed in a campus area network, the one-way latency can rise to 500 microseconds. At the perimeter (Internet connection), higher latencies and lower throughput may be acceptable.

Application Performance testing investigates the effect of the NIPS on the performance of network intensive applications, such as NFS file copy and Web file retrieval. Performance is measured by calculating the time to complete an application workload (e.g., copying a large file) with and without the NIPS in place. The effect of the NIPS can then be measured as a percentage slowdown. Ideally, the effect of the NIPS on application performance should be negligible.
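
The percentage slowdown described above can be computed directly from the two timings. The following is a minimal sketch; the function name and the sample timings are illustrative assumptions, not part of the Tomahawk tools.

```python
def percent_slowdown(baseline_seconds: float, with_nips_seconds: float) -> float:
    """Return the slowdown caused by the NIPS as a percentage of the baseline time."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (with_nips_seconds - baseline_seconds) / baseline_seconds


# Hypothetical timings for copying the same large file without and with the NIPS.
print(percent_slowdown(40.0, 42.0))  # 5.0
```

A result close to zero corresponds to the "negligible" effect the test is looking for.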

Security Performance

Security Performance testing evaluates the security functions of the NIPS. This includes its ability to detect and block attacks, its resistance to false positives, and its resistance to evasion. An important component of this testing is repeatability testing: given the same attack repeated many times, the test checks that the NIPS consistently detects and blocks the attack.

The Tomahawk Test Tool and Jig
The Tomahawk Test Tool

Our test jig leverages the Tomahawk test tool, software developed by TippingPoint for testing both the network and security performance of NIPS.

The following diagram shows a single Tomahawk server connected to a NIPS. The Tomahawk server is run on a Linux machine with three network interface cards (NICs): one for management and two for testing. The two test NICs are eth0 and eth1; the management NIC is eth2. The two test NICs are connected, typically through a switch, crossover cable, or IPS. The network connecting the two test NICs must be a layer-2 network.

Tomahawk is used to replay one or more PCAP files through a NIPS. PCAP files are packet traces captured in tcpdump format by a network sniffer. When Tomahawk replays a PCAP file, the packets in the PCAP arrive on the NIPS interfaces in the same order that the NIPS would have seen them had it been installed inline when the PCAP was captured.

For example, when an attacker creates a TCP connection to a victim, the two complete a three-way handshake consisting of three packets: SYN, SYN-ACK, and ACK [2]. Suppose this interaction had been captured in a PCAP file. When this PCAP is replayed using Tomahawk, Tomahawk will first transmit the SYN packet on eth0 and wait for the packet to arrive on eth1. When this happens, Tomahawk transmits the SYN-ACK packet on eth1. When the SYN-ACK packet arrives on eth0, Tomahawk transmits the ACK packet on eth0.

If any packet is lost in transit, Tomahawk retransmits it after a timeout. If the packet does not arrive after a user-specified number of retransmissions, Tomahawk reports that the PCAP has "timed out" on the management console. If no packets time out, Tomahawk reports that the PCAP has completed.
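
The send, wait, and retransmit behavior described above can be sketched as a small simulation. This is not Tomahawk's implementation: the `replay` function, the `deliver` callback (which stands in for "transmit the packet and wait up to the timeout for it to arrive on the other test NIC"), and the default retransmission count are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Packet = Tuple[str, str]  # (transmit interface, label), e.g. ("eth0", "SYN")


def replay(packets: Sequence[Packet],
           deliver: Callable[[Packet], bool],
           max_retransmits: int = 3) -> str:
    """Send packets one at a time, waiting for each to arrive before the next.

    `deliver` returns True if the packet arrived on the opposite interface
    before the timeout expired.
    """
    for packet in packets:
        attempts = 1 + max_retransmits  # first transmission plus retransmissions
        if not any(deliver(packet) for _ in range(attempts)):
            return "timed out"
    return "completed"


handshake = [("eth0", "SYN"), ("eth1", "SYN-ACK"), ("eth0", "ACK")]

# A device that forwards everything lets the trace complete.
print(replay(handshake, lambda p: True))               # completed
# A device that drops every SYN-ACK causes the trace to time out.
print(replay(handshake, lambda p: p[1] != "SYN-ACK"))  # timed out
```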

The throughput of the system just described is limited by the latency of the NIPS. For example, if the latency is one second, then the three-way handshake will complete at one packet per second. To overcome this limitation, Tomahawk uses two techniques: pipelining and parallel replay.

Pipelining improves performance by transmitting as many packets at once as possible. For example, Tomahawk may send 10 packets in a burst. With the one-second latency from the example above, this would increase the throughput to ten packets per second. The degree of pipelining is limited by the constraint that the traffic arrival order at the NIPS interfaces must be consistent with the packet order in the PCAP. For example, it would be inconsistent to send the SYN and SYN-ACK in a burst, because in a real network the receipt of the SYN by the server triggers the transmission of the SYN-ACK. Sending them in a burst could result in the SYN-ACK being received at the NIPS interface before the SYN had been received, a situation that could not happen in real networks.
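
One simple way to respect the ordering constraint is to burst only consecutive packets that travel in the same direction. The sketch below illustrates that idea under a strong simplifying assumption (the PCAP is treated as a single conversation); it is not taken from Tomahawk's source, and `plan_bursts` and `max_burst` are hypothetical names.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Packet = Tuple[str, str]  # (transmit interface, label)


def plan_bursts(packets: Sequence[Packet], max_burst: int = 10) -> List[List[Packet]]:
    """Group consecutive packets sent from the same interface into bursts.

    A change of direction always starts a new burst, so a reply is never
    transmitted together with the packet that triggered it.
    """
    bursts: List[List[Packet]] = []
    for _interface, run in groupby(packets, key=lambda p: p[0]):
        same_direction = list(run)
        for start in range(0, len(same_direction), max_burst):
            bursts.append(same_direction[start:start + max_burst])
    return bursts


trace = [("eth0", "SYN"), ("eth1", "SYN-ACK"),
         ("eth0", "ACK"), ("eth0", "DATA-1"), ("eth0", "DATA-2")]
print([len(b) for b in plan_bursts(trace)])  # [1, 1, 3]
```

In this sketch the SYN and SYN-ACK each travel alone, while the ACK and the data packets that follow it in the same direction share one burst.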

Parallel replay further improves performance by playing multiple copies of each traffic trace. Each copy is given its own unique range of IP addresses. For example, two copies of a three-way handshake can be played in parallel by giving the first copy one pair of addresses and the second copy a different pair. To the NIPS, the traffic appears to be generated by four machines: two servers and two clients. Tomahawk can play thousands of copies of streams in parallel.
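
The address assignment for parallel copies can be illustrated with Python's standard `ipaddress` module. This is a sketch only: the base addresses and the "add the copy index" scheme are assumptions for illustration, and the way Tomahawk actually rewrites addresses may differ.

```python
import ipaddress
from typing import Iterator, Tuple


def address_pairs(client_base: str, server_base: str,
                  copies: int) -> Iterator[Tuple[str, str]]:
    """Yield one unique (client, server) address pair per replayed copy."""
    client = ipaddress.IPv4Address(client_base)
    server = ipaddress.IPv4Address(server_base)
    for index in range(copies):
        # Offsetting both addresses by the copy index keeps every pair distinct.
        yield str(client + index), str(server + index)


# Two copies of the handshake look like four machines to the NIPS.
for pair in address_pairs("10.0.0.1", "10.1.0.1", copies=2):
    print(pair)
# ('10.0.0.1', '10.1.0.1')
# ('10.0.0.2', '10.1.0.2')
```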

Tomahawk can also be used to generate load by replaying clean, background traffic. Traffic can be captured from a customer network and replayed using Tomahawk to evaluate how well the NIPS will perform with that traffic mix. The combination of pipelining and parallel replay allows Tomahawk to simulate a high-speed network with tens of thousands of individual hosts. This traffic will have the same protocol and payload mix as the traffic trace that is replayed. A typical Tomahawk traffic server can generate up to 400 Mbps of load, so by using a few traffic servers and switches, network loads of several gigabits can be created.
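
As a rough sizing aid, the number of traffic servers needed for a target load follows from the per-server figure quoted above. The function name is hypothetical, and the 400 Mbps default is the upper bound stated in the text, so real deployments may need more servers than this estimate.

```python
import math


def servers_needed(target_mbps: float, per_server_mbps: float = 400.0) -> int:
    """Return the minimum number of traffic servers for a target aggregate load."""
    if target_mbps < 0 or per_server_mbps <= 0:
        raise ValueError("loads must be non-negative and per-server load positive")
    return math.ceil(target_mbps / per_server_mbps)


print(servers_needed(2000))  # 5 servers for 2 Gbps
print(servers_needed(1000))  # 3 servers for 1 Gbps
```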

When Tomahawk replays a PCAP containing an attack, it can be used to test NIPS security functions as a "black box", i.e., without parsing the NIPS alert logs. Any PCAP that contains an attack that the NIPS purportedly blocks should time out. If it does not time out, then the NIPS failed to block the attack (a false negative), regardless of what any log file indicates. Conversely, if a PCAP does not contain an attack and the NIPS blocks it, then the NIPS has blocked legitimate traffic (a false positive).
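
The black-box decision rule above reduces to a two-by-two table. The sketch below encodes it; the function name and result labels are illustrative, and it assumes that a timeout is caused by the NIPS blocking the stream rather than by unrelated packet loss.

```python
def classify(contains_attack: bool, timed_out: bool) -> str:
    """Interpret a replay result without consulting the NIPS logs."""
    if contains_attack:
        # An attack trace should be blocked, which shows up as a timeout.
        return "attack blocked" if timed_out else "false negative"
    # A clean trace should pass through and complete.
    return "false positive" if timed_out else "clean traffic passed"


print(classify(contains_attack=True, timed_out=False))  # false negative
print(classify(contains_attack=False, timed_out=True))  # false positive
```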

Because it can replay the same attack thousands of times, each with a unique IP address, Tomahawk can be used for repeatability testing. This test ensures that an attack is consistently blocked by the NIPS.
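
A repeatability result can be summarized as the fraction of replays that were blocked. This is a minimal sketch: the outcome strings mirror the "timed out" and "completed" states described earlier, but the function name and the sample data are assumptions.

```python
from typing import Sequence


def block_rate(outcomes: Sequence[str]) -> float:
    """Return the fraction of attack replays that timed out (were blocked)."""
    if not outcomes:
        raise ValueError("at least one replay outcome is required")
    blocked = sum(1 for outcome in outcomes if outcome == "timed out")
    return blocked / len(outcomes)


# Hypothetical run: 1000 copies of one attack, one of which slipped through.
outcomes = ["timed out"] * 999 + ["completed"]
print(block_rate(outcomes))  # 0.999
```

Any value below 1.0 means the attack was not blocked consistently.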

The Tomahawk Jig
By aggregating the traffic from multiple Tomahawk traffic servers, several gigabits/sec of network traffic can be generated and other tests, such as network latency and application performance, can be executed. The jig to perform this testing is called the Tomahawk jig, and is shown in figure 2. For clarity, the management network (eth2) is not shown.

On each traffic server, eth0 interfaces are assigned IP addresses out of the 192.168.150.0/24 subnet; eth1 interfaces are assigned IP addresses out of the 192.168.151.0/24 subnet. The management interfaces (eth2) are assigned IP addresses out of the 192.168.0.0/24 subnet. In the jigs we use in-house, we refer to these traffic servers using the prefix av followed by the last octet of the IP address. For example, the traffic server whose IP address is 192.168.150.54 is av54.
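
The naming convention can be expressed in a few lines. The helper name is hypothetical; only the "av" prefix and the last-octet rule come from the text above.

```python
import ipaddress


def server_name(ip_address: str, prefix: str = "av") -> str:
    """Derive a traffic server name from the last octet of its IP address."""
    last_octet = ipaddress.IPv4Address(ip_address).packed[-1]
    return f"{prefix}{last_octet}"


print(server_name("192.168.150.54"))  # av54
```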

Each traffic server runs a copy of Tomahawk. Traffic from multiple servers is aggregated through a switch before passing through the NIPS. Other test equipment, such as a SmartBits, Agilent, or WebAvalanche, can similarly be aggregated through the switches as shown in the figure.

The Ethernet interfaces of the traffic servers are connected to the switches in alternating pairs. For instance, eth0 on av53 is connected to switch1, but eth0 on av54 is connected to switch2. The interfaces continue to alternate in this fashion for all servers in the jig.

This setup allows tests that evaluate the performance impact of the NIPS without rewiring the Tomahawk Jig. For example, if a program on av54 contacts a server at 192.168.151.51, traffic will be routed through eth1 on av54, switch1, the NIPS, switch2, and eth1 on av53. If the same program on av59 contacts a server at 192.168.151.51, traffic will be routed through eth1 on av59, switch2, and eth1 on av53. The impact of the NIPS on network performance can be evaluated by measuring the difference in the amount of time needed to complete a task such as copying a file in the two configurations.

The Tomahawk Jig can also be used without modification to run attacks through evasion tools such as fragroute [6].

Aggregate throughput is measured by gathering statistics over the management network. Each traffic server runs a small program, called qatcld.tcl, which is used to gather statistics from the traffic servers. These statistics include the number of bytes and packets sent and received on eth0 and eth1. A master controller computes and displays the total amount of traffic passing through the system, not including traffic generated by the "other equipment" in the jig.

The Tomahawk Jig is quite flexible in its use. For example, different servers can perform different tasks at the same time: some traffic servers can replay attacks using Tomahawk, others can generate background load using Tomahawk, and still others can check application latency (for example, by timing how long it takes to copy a large file or fetch a Web page through the NIPS). The SmartBits can take an accurate measurement of network latency while all of these activities are running on the servers.
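
The aggregate throughput computation performed by the master controller can be approximated from two samples of each server's byte counters. The sketch below is an assumption about how such a calculation might look: the counter names (`eth0_tx_bytes`, `eth1_tx_bytes`) and the data layout are hypothetical and do not reflect the actual output of qatcld.tcl.

```python
from typing import Mapping

Counters = Mapping[str, int]  # cumulative byte counters for one traffic server


def aggregate_mbps(before: Mapping[str, Counters],
                   after: Mapping[str, Counters],
                   interval_seconds: float) -> float:
    """Sum the bytes transmitted on both test NICs of every server, in Mbps."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    total_bytes = 0
    for server, start in before.items():
        end = after[server]
        for key in ("eth0_tx_bytes", "eth1_tx_bytes"):
            total_bytes += end[key] - start[key]
    return total_bytes * 8 / interval_seconds / 1e6


# Hypothetical samples taken one second apart on two traffic servers.
before = {name: {"eth0_tx_bytes": 0, "eth1_tx_bytes": 0} for name in ("av53", "av54")}
after = {name: {"eth0_tx_bytes": 25_000_000, "eth1_tx_bytes": 25_000_000}
         for name in ("av53", "av54")}
print(aggregate_mbps(before, after, interval_seconds=1.0))  # 800.0
```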


[1] Thomas H. Ptacek, Timothy N. Newsham, "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection," January 1998, Secure Networks Inc., http://www.insecure.org/stf/secnet_ids/secnet_ids.html

[2] Richard Stevens, "TCP/IP Illustrated, Volume 1: The Protocols," Addison-Wesley, 1994, ISBN 0-201-63346-9

[6] http://monkey.org/~dugsong/fragroute

Copyright © 2004. All rights reserved.