BitTorrent is a file-sharing protocol in which clients download a file by exchanging pieces of it with each other. Because the peers themselves supply most of the upload capacity, the system scales even when demand for a file is high, instead of overloading a central server. BitTorrent has been popular among users since its invention because it delivers the files they want, and as such it comprises a significant portion of Internet traffic.
BitTorrent over the Internet is an extremely complex system, and people have tried to understand its performance for several reasons. First, quality of service matters to users: they want to get a file in a reasonable amount of time, preferably without having to upload too much of it. Second, BitTorrent's performance varies from client to client depending on network conditions. Third, the initial seeder must upload a large amount of the file, which may deter people from seeding in the first place. Finally, clients can be engineered to download a file without contributing any upload. For these reasons, people have tried to improve it.
ISPs have been concerned about BitTorrent's bandwidth consumption. Because a BitTorrent client opens several TCP connections in parallel, and TCP shares bandwidth per connection, it tends to take bandwidth away from applications that use only one. As a result, ISPs have throttled BitTorrent traffic. Work has been done to improve spatial locality by matching peers to other peers within the same ISP. This has improved download speed for users, and it benefits ISPs by sparing them from relaying external BitTorrent traffic.
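The locality idea can be sketched in a few lines. This is only an illustration, not how any particular tracker or client does it: the `Peer` type, the `isp` label, and `select_peers` are hypothetical names, and a real system would have to infer the ISP somehow (for example from address prefixes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    address: str
    isp: str  # hypothetical label; a real client would have to infer this


def select_peers(candidates, local_isp, limit):
    """Return up to `limit` peers, listing peers in our own ISP first."""
    local = [p for p in candidates if p.isp == local_isp]
    remote = [p for p in candidates if p.isp != local_isp]
    # External peers are only used to fill whatever slots remain.
    return (local + remote)[:limit]
```

Filling the remaining slots with external peers keeps the swarm connected when few local peers hold the file.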
The more reliable BitTorrent becomes, the more likely it is to be used for things other than file sharing. For example, research has been done on how to use BitTorrent for video streaming. Since video streaming does not tolerate delay and jitter well, improvements to BitTorrent can be valuable. It makes me wonder what BitTorrent will be used for as it becomes more and more reliable.
Saturday, September 25, 2010
Friday, September 24, 2010
Towards Optimal Internet Video
Streaming multimedia over the Internet has been a work in progress for more than a decade. By 2001, audio streaming had been well investigated. But with the advent of high-speed technology such as ATM and Ethernet, interest turned toward streaming video. With network speeds barely sufficient at the time, people considered how to adapt video for application-layer streaming.
One survey addresses solutions to the challenges on three different levels: encoding, application streaming, and operating system support. It discusses how the data can be compressed at different levels, and how applications can choose the quality level at which to stream the data. It does not assume that video will be streamed over TCP, but shows how UDP can be used with error correction schemes. It even shows how an operating system could be changed to better accommodate a video application.
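To make the idea of error correction over UDP concrete, here is a minimal sketch of one simple scheme, XOR parity. I am not claiming this is the scheme the survey describes. It assumes packets in a group all have the same length, and the function names are my own.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one parity packet for a group of equal-length packets."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one lost packet")
    present = [p for p in received if p is not None]
    # XORing the parity with every surviving packet leaves the lost one.
    rebuilt = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired
```

The sender transmits the parity packet along with the group, so a single loss is repaired without a retransmission, at the cost of one extra packet per group.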
While many of the techniques discussed in the survey are still widely used, video streaming has changed and improved since then. Due to improvements in Internet architecture and the increasing demand for Internet video (and, I would add, multi-core processors), a variety of other strategies are now used to stream video effectively.
Because of high demand for video from certain sites, P2P technologies are used to improve scalability. P2P methods such as those used by BitTorrent are being used to offload the stress on servers to clients and other network hosts, allowing a larger number of clients to receive video.
Most Internet video today is streamed through TCP. This is because network capacity is sufficient to transfer normal-quality video in a timely manner in spite of the overhead required to ensure ordered, loss-free delivery. TCP is also used because it provides congestion control; in other words, it is friendly to other data streams in the network.
But video is loss tolerant to a certain extent, and that overhead could be saved when streaming high-density video. Would a transport-layer protocol for video be helpful if it omitted the overhead required for data integrity but kept the network-friendliness of congestion control?
Saturday, September 18, 2010
Thoughts on Chord
A peer-to-peer system departs from a client-server system in that each peer has equal privileges and similar responsibilities. Most such systems lack a centralized source of control. Peer-to-peer systems have been built to accomplish a variety of tasks: job partitioning, content distribution, content lookup, file systems, and media streaming.
In 2001, a significant peer-to-peer system known as Chord was invented for the purpose of scalable file lookup. With Chord, a number of peers are organized in a large ring. Each peer stores a number of files of interest. A peer does not choose which files it stores; they are determined by the peer's position in the ring and by the names of the files.
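The placement rule can be sketched as consistent hashing: hash the file name onto the ring and give the file to the first peer at or after that point. This is a simplified sketch. `M`, `ring_id`, and `successor` are names I chose, and the 16-bit ring is an assumption to keep the numbers small (Chord itself uses much larger identifiers).

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; kept small here for readability


def ring_id(name: str) -> int:
    """Hash a file name (or peer address) onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, key_id):
    """Return the first node id at or after key_id, wrapping around the ring.

    node_ids must be sorted in ascending order.
    """
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]
```

A file named `name` would then live on peer `successor(node_ids, ring_id(name))`, which is why a peer has no say in which files it stores.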
When someone wants to access a particular file, they use a lookup service through Chord that identifies where the file is. Because of Chord's design, this lookup takes a number of steps logarithmic in the number of peers in the ring. This was a great improvement on contemporary file lookup services, which usually required querying a significant fraction of the group. With Chord, the group of peers could easily scale.
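The logarithmic lookup comes from each peer keeping a small "finger table" of peers at power-of-two distances around the ring. Below is a rough single-process simulation of that idea, assuming every finger table is accurate and no peer joins or leaves. The 6-bit ring and all function names are my own choices.

```python
M = 6            # identifier bits, so the ring has 64 positions
RING = 2 ** M


def between(x, a, b, inclusive_right):
    """True if x lies in the ring interval (a, b), or (a, b] if inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def successor(nodes, ident):
    """First node at or after ident on the ring (nodes sorted ascending)."""
    ident %= RING
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]


def build_fingers(nodes):
    """Finger i of node n is the successor of n + 2**i."""
    return {n: [successor(nodes, n + 2 ** i) for i in range(M)] for n in nodes}


def lookup(fingers, start, key):
    """Return (owner of key, number of times the query was forwarded)."""
    current, hops = start, 0
    while True:
        succ = fingers[current][0]
        if between(key, current, succ, inclusive_right=True):
            return succ, hops
        # Forward to the farthest finger that still precedes the key.
        for finger in reversed(fingers[current]):
            if between(finger, current, key, inclusive_right=False):
                current = finger
                hops += 1
                break
        else:
            return succ, hops  # no closer peer known; fall back
```

With peers at 1, 8, 14, 21, 32, 38, 42, 48, 51 and 56, a lookup for key 54 starting at peer 8 is forwarded to 42 and then 51, which reports peer 56 as the owner.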
Chord's architecture is well designed, but there are several things that Chord does not address. First, one peer in the ring may have fewer resources (e.g., bandwidth, disk space) than another; that peer will slow down transfers for some of the files, or may not be capable of storing a file at all. While Chord distributes files evenly among peers, it would still be preferable to place most of the load on the most capable peers. It is not likely that a simple change to Chord could accomplish this.
Also, peers may want to minimize the shuffling of files between each other. That is, a peer may want to maximize the number of its own files that it stores, and to minimize the number of files it stores that it did not previously have. To make this possible, Chord could be modified to store references to files instead of the files themselves. With this modification, it remains to figure out how peers announce which files they have to begin with, and what to do when all peers that have a particular file leave the group.
Thursday, September 16, 2010
Metrics and Measurements
In 1997, Vern Paxson published a seminal paper on Internet measurement. His publication, titled "End-to-End Internet Packet Dynamics", attempted to characterize the Internet's behavior in terms of concise metrics: out-of-order delivery, packet corruption, bottleneck bandwidth, and packet loss.
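As a toy illustration of two of these metrics, the sketch below computes packet loss and out-of-order delivery from a list of arrival sequence numbers. It is my own simplification, not Paxson's methodology: it assumes packets are numbered 0 to `sent_count - 1`, and it counts a packet as out of order if it arrives after a packet with a higher number.

```python
def packet_metrics(sent_count, arrivals):
    """Summarize loss and reordering for packets numbered 0..sent_count-1.

    arrivals lists sequence numbers in the order the receiver saw them.
    """
    seen = set()
    highest = -1
    reordered = 0
    for seq in arrivals:
        if seq in seen:
            continue  # ignore duplicate deliveries in this sketch
        seen.add(seq)
        if seq < highest:
            reordered += 1  # arrived after a later-numbered packet
        else:
            highest = seq
    lost = sent_count - len(seen)
    return {
        "lost": lost,
        "loss_rate": lost / sent_count,
        "reordered": reordered,
    }
```

For example, if six packets are sent and the receiver sees 0, 1, 3, 2, 5, then one packet (4) is lost and one (2) is out of order.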
Internet measurement effectively "bridges the gap" between theory and practice. Paxson's measurements demonstrated how common assumptions in hardware and protocols were often violated. That gave protocol and hardware designers the information needed to optimize existing protocols and hardware. As the Internet is a dynamically changing system, it must be continually measured in order to evaluate whether its architecture is still sufficient for its load and topology.
Saturday, September 11, 2010
Thoughts on an End-Middle-End Architecture
Many see good reasons to allow functionality to take place in the core of the Internet. Some have suggested modifying the Internet to be more friendly toward middleboxes. In their SIGCOMM '07 publication "An End-Middle-End Approach to Connection Establishment," Saikat Guha and Paul Francis at Cornell University argue that middleboxes are not only an unavoidable part of the Internet but also a desirable one.
Middleboxes are useful for several reasons. First, they provide a way to block unwanted packets before they reach an endpoint, and this provides defense against DoS attacks. Second, by adding functionality to the core, middleboxes allow third parties (ISPs, corporate organizations) to get partial control of and information from connections that pass through their networks. Also, by adding functionality to middleboxes, firewalls can be more accurate in which packets are blocked and which are not.
Francis and Guha in their paper identify five transport services that should be provided on a connection. Most of these services are already provided to an extent with the current Internet:
- User-friendly host naming
- Network-level identification of all hosts, and best-effort delivery
- A way for a host to know which packet should be delivered to which applications
- Blocking unwanted packets
- Negotiation of middlebox usage between endpoints and networks in between
NUTSS, the architecture the paper proposes, splits a connection into two phases. In the first, access control is negotiated and the location of the end-hosts is determined. The second is the actual transfer of the data. Negotiation takes place through policy boxes (known as P-boxes). P-boxes provide authentication for actual data flows.
In a data flow, the end-hosts need to present the aforementioned authentication to middleboxes (M-boxes) along the way. To enhance M-boxes' awareness of data, a fifth element is added to TCP: the service identifier, a globally unique identifier of the type of application that is used to communicate between the hosts.
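To picture what an M-box could do with this information, here is a small sketch. It is not taken from the paper: the `Flow` fields, the string-valued service identifier, and `mbox_allows` are all hypothetical, and the set of authorized flows stands in for whatever authentication the P-boxes actually hand out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    service: str  # hypothetical service identifier


def mbox_allows(flow, allowed_services, authorized_flows):
    """Admit a flow only if its service is permitted and it was authorized.

    authorized_flows stands in for the authentication granted during the
    negotiation phase; a real M-box would verify a credential instead.
    """
    return flow.service in allowed_services and flow in authorized_flows
```

In this sketch the service identifier only tells the M-box which application a flow belongs to; it says nothing about what the packets contain.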
Based on what I understand about the NUTSS architecture, I am impressed with how it separates address from naming more so than DNS does. Addresses are provided by P-boxes with the assumption that they may change over time. Mobile IP could easily be implemented in this architecture.
I agree that firewalls should be more intelligent about how they block packets. However, I don't understand how adding a service id to a connection would help a firewall filter packets, beyond telling it which application a given packet maps to. For a firewall to block packets more accurately, it may need to assemble an entire response, which would be difficult.
A third impression I have from reading about this architecture is that there are going to be a lot of attacks made on P-boxes. If a policy box is compromised, then it could be made to grant connection access to hosts it otherwise would deny. If this architecture is to be incrementally deployed, more could be said on how to secure P-boxes.
Friday, September 10, 2010
Should Middleboxes be Allowed?
One of the major design aspects of Internet architecture is the end-to-end principle: complexity should be kept at the endpoints, and the core should remain simple. The basis of this principle is that a simple core can devote itself to moving data as fast as possible. The endpoints (servers, clients, etc.) are then responsible for ensuring that all data arrives and is put in the right order.
While the end-to-end principle has been in force since the Internet's beginnings, it has been violated increasingly by middleboxes. A middlebox is any host that sits between two communicating endpoints; i.e., somewhere in the core. What kinds of middleboxes are prevalent in the Internet? NATs, firewalls, proxies, web caches, traffic shapers, and protocol translators are all examples.
Many have looked down on the use of middleboxes in the Internet for a variety of reasons. According to one RFC, NATs cause the following problems:
- They create a single point where fate-sharing does not work
- They make multi-homing difficult
- They inhibit the use of IPSec
- They enable casual use of private addresses, causing name space collisions.
- They facilitate concatenating existing private name spaces with the public DNS.
Why then are middleboxes used? A NAT, or Network Address Translator, is used to mitigate the address shortage in IPv4. Firewalls and proxies block unwanted traffic. Caches improve the locality of data content, potentially reducing load on the core of the Internet. Traffic shaping improves service for certain classes of content, and protocol translators are necessary for the incremental deployment of IPv6. While the Internet could perhaps be redesigned to make them unnecessary, middleboxes are needed given today's architecture.
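A minimal sketch of what a NAT does may help show both why it eases the address shortage and why it strains the end-to-end principle. The `Nat` class, its method names, and the starting port are assumptions for illustration; real NATs also track protocols, remote endpoints, and timeouts.

```python
class Nat:
    """Toy port-translating NAT sharing one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet, creating state if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, public_port):
        """Map an incoming packet back to a private host, or None to drop it."""
        return self.back.get(public_port)
```

Many private hosts share one public address, but a packet arriving for a port with no existing mapping has nowhere to go, and all of that state lives in the middle of the path rather than at the endpoints.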
Saturday, September 4, 2010
DONA's Implications
A recent publication proposed a new architecture called DONA (Data-Oriented Network Architecture). The publication argues that users have several expectations that the Internet does not efficiently fulfill.
First, users expect the naming of resources to be persistent. They don't like the disappointing 404 that appears when a resource has been moved or its content has been re-hosted elsewhere. HTTP's redirect functionality is one remedy, but it may not be the most efficient one.
Users also rely on the availability of data: their data must be there, and be quickly available. A single server is not always adequate to supply a resource when that resource is extremely popular. For that reason, CDNs and self-scaling file-sharing protocols like BitTorrent have been created. The Internet was designed to associate data with a location, and this is not consistent with such systems.
Third, users want their data to be authentic: unmodified, and from a reliable source. Today this is done by securing a channel, allowing two hosts to communicate securely.
DONA improves how the Internet achieves these goals. It introduces what it calls "flat, self-certifying names". This means that names for resources are no longer tied to addresses. Instead, names are tied to principals, which have public/private key pairs and so allow for data authentication. Using these names allows a resource to be queried independently of its location. This allows content to be moved, and to be located in multiple places. In that way, data is persistent, available, and authentic.
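As I understand the proposal, a name has the form P:L, where P is a hash of the principal's public key and L is a label the principal chooses. The sketch below shows only the "self-certifying" link between a name and a key. SHA-256, the hex encoding, and the function names are my assumptions, and the signature check on the data itself is left out because it needs a public-key library.

```python
import hashlib


def make_name(public_key: bytes, label: str) -> str:
    """Build a flat name P:L from a principal's public key and a label."""
    return hashlib.sha256(public_key).hexdigest() + ":" + label


def key_matches_name(name: str, public_key: bytes) -> bool:
    """Check that a presented public key is the one the name commits to."""
    principal, _, _label = name.partition(":")
    return hashlib.sha256(public_key).hexdigest() == principal
```

A receiver who gets data along with a public key can first confirm that the key matches the name, and then use that key to verify a signature over the data. Nothing in the name says where the data came from, which is what lets it be served from any location.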
Requests for data are routed by name, rather than by host. To facilitate this, there are a number of resolution handlers (RHs), entities that keep track of where data should be routed. This system obviates the need for DNS, speeding up the initial communication process.
DONA leaves me with several questions. The first concerns scalability. The Internet has a large number of resource names; I imagine that the RHs will fill up and take a long time to look up where to route a request for a resource. Secondly, there may be downsides to decoupling a resource from a provider. Today, a resource name need only be unique within its provider's list of resources, which reduces the number of names that the Internet as a whole must account for; flat names give up that advantage. Third, if DONA is to replace the current system, can it be deployed incrementally and seamlessly without affecting the rest of the Internet?
Internet Design Principles: Anything Left Out?
In a graduate networking course at BYU, we review the design philosophy of the Internet. By the "Internet", I refer to the architecture and protocols under which today's applications and services (web browsing, file sharing, media streaming, communication, etc.) are run. The success of the Internet is evaluated in terms of how well its design goals are met.
So what are its design goals? The answer has depended on what people have said they should be over the course of the Internet's history. The Internet has been around since the '70s, and its goals have changed over time with advances in hardware and with the invention of new applications that the Internet of the day could support.
Many of the design principles still in the Internet today were identified by David Clark in 1988. The primary goal he mentioned was to allow communication between different types of networks. Other goals included fault tolerance, support for multiple types of service, accommodation for a variety of networks, distributed resource management, cost effectiveness, ease for computers to attach to the Internet, and accountability for resources.
An additional design consideration since then has been security. The Internet has been made available for worldwide use, opening up many vulnerabilities relating to privacy and authenticity. Much security research has resulted from this need, and security continues to improve today.
So we have Clark's design goals and security. Are there additional design goals that the Internet has today? By and large, the above goals are still goals for today's Internet. Perhaps we should examine some of the recent applications. Television and video streaming are becoming increasingly important. Perhaps the Internet could make quality of service a bigger priority.