Tuesday, November 30, 2010

The Botnet Battle

It wasn't until I did some research on Botnets that I realized what a powerful thing a distributed army of bots can be. Botnets typically consist of thousands of hosts: PCs and servers that have been compromised by malware. Each host in a Botnet takes orders from a Command & Control server, with inter-bot communication often carried over encrypted IRC channels. Beyond that, the malware may carry out other malicious tasks, e.g., intercepting keystrokes or infecting other computers.

Why do Botnets exist? The way I understand it, they are a source of power in the "dark computing underworld." Suppose Joe is the administrator of a Botnet with 10,000 computers, and somebody wants an evil deed done (a DDoS attack, stealing credit card numbers, etc.). They would pay Joe to use his Botnet to accomplish the deed.

These complex systems pose a significant concern for network security: Botnets are often hard to identify, and even harder to stop. Botnets often accomplish their purposes by sending spam, with email links pointing to malware or to websites targeted for DoS attacks. Addressing the Botnet spam problem, Microsoft Research published a paper discussing how to recognize spam sent from Botnets. Their publication provides a measurement study that captures important characteristics of a Botnet, along with an architecture that discovers these characteristics in spam.
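
To give a flavor of that kind of detection, here is a toy heuristic in the same spirit, not the paper's actual algorithm: spam campaigns sent by a Botnet tend to originate from senders scattered across many unrelated networks, while a single rogue mail server does not. The data, URL names, and threshold below are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical spam log: (sender IP, URL advertised in the message body).
spam_log = [
    ("203.0.113.7",  "http://example-pharma.test/buy"),
    ("198.51.100.9", "http://example-pharma.test/buy"),
    ("192.0.2.44",   "http://example-pharma.test/buy"),
    ("203.0.113.8",  "http://lonely-scam.test/win"),
]

# Group messages into campaigns by the URL they push, then count how many
# distinct /24 networks each campaign's senders come from.
campaigns = defaultdict(set)
for sender, url in spam_log:
    subnet = ".".join(sender.split(".")[:3])   # crude /24 aggregation
    campaigns[url].add(subnet)

BOTNET_SUBNET_THRESHOLD = 3   # made-up cutoff for this toy example
for url, subnets in campaigns.items():
    verdict = "likely Botnet campaign" if len(subnets) >= BOTNET_SUBNET_THRESHOLD else "unclear"
    print(f"{url}: senders from {len(subnets)} /24 networks -> {verdict}")
```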

The way I see it, there is a perpetual battle between Botnets and email systems, and any defensive measures on the part of email systems will be met with additional offensive measures from the Botnets. In my opinion, the best way to approach the Botnet battle is to expose as much measurement information about Botnets as possible while hiding information about email systems.

Wireless Mesh Networking Summary

As our class has concluded the section on Wireless Mesh Networks, I will summarize what has been discussed and point out a potential area for research. WMNs have attracted significant attention from research groups because of 1) the challenging nature of networking over a wireless medium and 2) the advantages offered by ad-hoc networks.

Wireless networking poses only a few significant problems, but those problems adversely affect every layer of the network, and many solutions have been suggested at each layer. First, because nodes must share the medium, those communicating have to take turns for traffic to flow. Second, because of the hidden node problem, there will inevitably be loss due to interference from nodes outside carrier-sensing range. Third, the exposed node problem causes nodes to withhold transmissions unnecessarily.

At the link layer, we have discussed techniques such as Lagrangian relaxation to give each node a fair share of bandwidth within the clique of nodes with which it shares the medium. We have looked at how network coding can be used to make up for lost packets, such that the receiver is indifferent to which particular packets it didn't receive (a toy example of that property is sketched below). We have also looked at making use of corrupted (partially correct) packets in some cases.
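
Here is a minimal sketch of that "indifferent to which packet was lost" property, using a single XOR repair packet over a batch. This is far simpler than the random linear or opportunistic coding schemes discussed in class, and the payloads are made up.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

# Sender: transmit a batch of equal-length packets plus one XOR repair packet.
batch = [b"packet-0", b"packet-1", b"packet-2", b"packet-3"]
repair = reduce(xor_bytes, batch)

# Receiver: any single missing packet can be rebuilt from the repair packet
# and the packets that did arrive -- it does not matter which one was lost.
lost_index = 2
received = [p for i, p in enumerate(batch) if i != lost_index]
rebuilt = reduce(xor_bytes, received, repair)
assert rebuilt == batch[lost_index]
```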

At higher layers, we have discussed how opportunistic routing can be used as a more reliable way to deliver packets, since routes under traditional routing often break in a WMN. At the transport layer, adaptive pacing can be used to let a sender stay silent while its neighbors relay the message.

In short, the nature of wireless networking requires design adaptations from traditional (wired) networking. Many of the aforementioned techniques address the problems caused by 802.11 MAC unfairness and by interference due to the hidden node problem. I'm curious whether any work has been done to measure the severity of the loss (in the sense of forgone transmission opportunities) due to the exposed node problem, and, if this loss is significant, what could be done to remedy it.

Friday, November 19, 2010

EEC: A Brilliant Approach

In a recent SIGCOMM publication on improving wireless communication, someone had a brilliant idea. Usually, when a packet is received by a wireless device, the packet is checked for errors. If the packet has errors that can't be recovered, it is discarded. The whole packet. The paper gives the following graph showing the fraction of bits that are typically corrupted in a wireless mesh network setting:

[Figure from the paper: distribution of the fraction of corrupted bits per packet in a wireless mesh setting]

It shows that 50% of packets have some corruption, with between 0 and 2% of their bits corrupted. They claim that corrupted packets still have value if the fraction of corrupted bits can be estimated. For example, error-tolerant applications, like video streaming, could leverage corrupted packets to increase throughput when the error rate is below a threshold. Other applications could measure the error rate to guide decisions such as route selection or WiFi rate adaptation.

As a result, they propose a concept known as error estimation codes (EEC). The idea is to add a small amount of extra data to each packet that is used purely for error estimation. This is similar to error correction codes, which also introduce redundancy, but use that redundancy to correct errors. The problem with error correction is that correcting a more heavily corrupted packet requires much more redundancy. EEC's extra redundancy guarantees a given estimation precision and stays the same regardless of the error rate.
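
To make the estimation idea concrete, here is a minimal sketch, not the paper's actual construction: the sender appends a handful of parity bits, each computed over a random group of data bits chosen from a shared seed, and the receiver estimates the bit error rate from the fraction of parity checks that fail. It assumes independent bit errors and that the parity bits themselves arrive intact; the packet size, group size, and parity count below are arbitrary.

```python
import random

def parity_positions(seed, num_parity, group_size, packet_bits):
    """Choose, from a shared seed, which data-bit positions feed each parity bit."""
    rng = random.Random(seed)
    return [rng.sample(range(packet_bits), group_size) for _ in range(num_parity)]

def make_parities(bits, groups):
    """Sender side: one parity bit (XOR) per sampled group of data bits."""
    return [sum(bits[i] for i in group) % 2 for group in groups]

def estimate_ber(received_bits, received_parities, groups, group_size):
    """Receiver side: estimate the bit error rate from the fraction of parity
    groups that fail.  For independent bit errors with rate p, a parity over k
    bits mismatches with probability (1 - (1 - 2p)^k) / 2, which we invert."""
    mismatches = sum(
        (sum(received_bits[i] for i in group) % 2) != par
        for group, par in zip(groups, received_parities)
    )
    q = min(mismatches / len(groups), 0.499)   # keep the inversion well-defined
    return (1 - (1 - 2 * q) ** (1.0 / group_size)) / 2

# Toy demo: corrupt about 1% of bits and see what the estimator reports.
packet_bits, group_size, num_parity = 8000, 32, 128
groups = parity_positions(seed=42, num_parity=num_parity,
                          group_size=group_size, packet_bits=packet_bits)
data = [random.getrandbits(1) for _ in range(packet_bits)]
parities = make_parities(data, groups)
noisy = [b ^ (random.random() < 0.01) for b in data]          # ~1% bit errors
print(estimate_ber(noisy, parities, groups, group_size))      # roughly 0.01
```

The key point mirrors the paper's: the number of extra bits needed for a given estimation precision does not grow with the error rate.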

Friday, November 12, 2010

Network Coding

Recently I read the paper "XORs in The Air: Practical Wireless Network Coding", and was impressed with the use of network coding. The idea behind network coding is simple. First, there is the principle that if A xor B = C, then B = A xor C and A = B xor C. Applied to a wireless network setting, suppose there are 2 hosts and a router: host 1 has A but wants B, and host 2 has B but wants A. The router can satisfy the hosts in 2 transmissions by forwarding each host the packet it needs. Or, the router can satisfy both hosts with a single transmission by broadcasting C = A xor B; each host XORs C with the packet it already holds to recover the one it wants.
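
A few lines of Python make that exchange concrete (a toy illustration with equal-length payloads, not COPE's actual packet format):

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

# Host 1 holds A and wants B; host 2 holds B and wants A.
A = b"payload from host 1"
B = b"payload from host 2"

# Without coding, the router forwards A and B separately: two transmissions.
# With coding, it broadcasts the single coded packet C = A xor B.
C = xor_bytes(A, B)

# Each host XORs C with the packet it already has to recover the missing one.
assert xor_bytes(C, A) == B   # host 1 recovers B
assert xor_bytes(C, B) == A   # host 2 recovers A
```

Counting the hosts' own uplink transmissions as well, the whole exchange drops from four transmissions to three.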

Network coding has large potential benefits for wireless communication, because the wireless medium itself is often the bottleneck. By reducing the number of transmissions a node must make, more data can be sent. The paper I mentioned names several topologies and classifies their coding gain, the factor by which network coding reduces the number of transmissions needed. Some topologies provide a coding gain between 1x and 2x, and a special "wheel-like" topology provides an unbounded coding gain. In a wheel topology, N hosts surround one router; every host can hear a given sender except the one directly across from it. They mention that classifying the coding gain of an arbitrary topology is a hard problem, though. I wonder what has been researched in that area.

Wednesday, November 10, 2010

Evaluating Wireless Mesh Networks

Wireless Mesh Networks (WMNs) have been the topic of much research over the past few years. Challenges inherent in transmitting data over a wireless medium have caused researchers to rethink many commonly held assumptions about how to implement a network. For example, the Hidden and Exposed Terminal problems account for much of the collision loss and wasted capacity in a wireless network. Also, the design of the 802.11 MAC is not optimal for multiple nodes sending concurrently. At the network layer, route selection is different because it depends on the signal quality between any two nodes and on the ability of nodes to back off while others transmit. And because most of the loss in WMNs does not come from queue overflow, important assumptions behind TCP break down.

One paper evaluates Roofnet, a testbed in Cambridge, MA. It shows that, unlike in wired networks, throughput often increases as wireless nodes become denser. It also shows that a small number of nodes are responsible for much of the network's overall performance. Another paper uses simulations of a wireless backhaul network to show that there is significant unfairness when multi-hop flows compete against single-hop flows. One problem is that the 802.11 MAC's backoff mechanism makes it difficult for a node to gain access to the medium when another node is already transmitting. Another is that the hidden terminal problem causes a lot of collision loss.

In short, WMNs are a new, interesting, and still largely unsolved research topic. I'm interested to see what conclusions will be drawn after several more years of research on Wireless Mesh Networks.

Tuesday, November 9, 2010

Simulating TCP with OMNeT++

For the past several years, ns-2 has been the predominant packet-level simulator used by academics in the US. Because TCP was also developed in the West by academics, ns-2 has become a reputable simulator of TCP; in fact, it has become a sort of paragon for how TCP should behave. A SIGCOMM paper written by Kevin Fall and Sally Floyd in 1996 uses ns-2 to demonstrate the behavior of four variants of TCP when packets are dropped. They use graphs of the simulator's output to show how Tahoe, Reno, New-Reno and SACK respond to one, two, three, and four induced packet drops. Fall and Floyd's paper not only proves helpful in showing how TCP behaves under packet loss, but can also serve as a benchmark for a correct TCP implementation.
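
As a reminder of the headline behavioral difference being compared, here is a deliberately minimal sketch of how the two oldest variants react to a single loss detected by duplicate ACKs. It is nowhere near a full implementation; NewReno and SACK differ mainly in how they recover from multiple drops within one window.

```python
def react_to_loss(flavor, cwnd):
    """Simplified reaction to a loss detected via duplicate ACKs.
    Windows are in segments; real stacks track recovery points, timers, etc."""
    ssthresh = max(cwnd // 2, 2)      # both variants halve the threshold
    if flavor == "tahoe":
        return 1, ssthresh            # Tahoe drops back to slow start from cwnd = 1
    if flavor == "reno":
        return ssthresh, ssthresh     # Reno fast-recovers around half the old window
    raise ValueError(f"unknown flavor: {flavor}")

print(react_to_loss("tahoe", cwnd=16))  # (1, 8)
print(react_to_loss("reno",  cwnd=16))  # (8, 8)
```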

We decided to try to replicate the results of Fall and Floyd using OMNeT++, an open-source simulator developed primarily in Europe and the East. OMNeT++ is a growing simulator, and it is not restricted to network simulations. Since OMNeT++ is less common in the West, we were interested in how its TCP implementations compare with those of ns-2. We used the network modules from the INET package and ran simulations of Tahoe, Reno, New-Reno and SACK TCP with one, two, three and four drops. A thorough description of our simulations and findings is given in this paper.

We learned that INET's Reno behaves identically to that of ns-2, that INET's NewReno and SACK behave similarly to ns-2's, but that INET's Tahoe implementation has some oddities: with two or more packet drops, INET's Tahoe tends to time out rather than resume a typical slow start as it should. As for OMNeT++'s ability to scale, we were able to run 1,000 simultaneous TCP transfers at a simulated-time to wall-clock-time ratio of approximately 1:170. We conclude that OMNeT++ can simulate TCP reasonably in most cases and can scale reasonably, though INET's Tahoe implementation should be improved.

Tuesday, November 2, 2010

Summary of Network-Layer Discussion

The last few weeks have been instructive as our 660 class has discussed a dozen or so significant publications dealing with the network layer of the Internet's architecture. From BGP replacements to efficient queueing to network neutrality to multicast, we have covered a large set of the current problems in the network layer.

One theme that seemed to recur is that many people and political interests influence the policies that are implemented in the Internet. For example, the network neutrality debate involves a number of parties: ISPs, customers, content providers, etc. Multicast does not get deployed at the network layer simply because it works well; router designers must be convinced it is worth their effort to modify their product, ISPs must be willing to deploy it, and there must be demand from end users.

Another theme I noticed is that ideas often are not deployed immediately because of the risk and consequences involved in switching to an entirely new architecture. HLP is a protocol designed to overcome many problems inherent in BGP, but there is considerable risk in switching inter-domain routing protocols: any problems would affect the whole Internet. It is for that reason that researchers look for accurate ways to model and simulate the performance of a protocol. The IPv6 upgrade that always seems to be just over the horizon will carry negative consequences for those who cannot upgrade. Because of the difficulties involved in adopting new architectures and protocols, many of these designs serve as building blocks and ideas for future protocols.

While it is not easy to get a network protocol deployed in the Internet today, there is considerable insight in the ideas suggested in the papers we have read. I was particularly interested in the new routing methodologies proposed. Routing table size is growing rapidly, so research is looking at routing designs that can scale indefinitely. I was impressed with the ideas that came from ROFL and Compact Routing.