THE GNUTELLA PROTOCOL

Goals: To understand the Gnutella protocol and to create a Java applet simulating it. Results: See applet.

P2P

Peer-to-peer (P2P) is a type of transient Internet network that allows a group of computer users with the same networking program to connect with each other and directly access files from one another's hard drives.

Gnutella is an example of peer-to-peer software in which individuals can directly exchange files over the Internet. Gnutella is fully decentralized so that a user connected to the network may access the files of other users on the network. Users serve as both clients and servers connected in a daisy-chain fashion. Since there are no central servers, there may be a high bandwidth requirement.

(The P2P and Gnutella definitions above are from http://whatis.techtarget.com)

Gnutella Protocol Descriptors

Ping

Ping is used to find hosts on a given network.

Pong

Pong is the response to a Ping.

Query

This is the process used to search the network.

QueryHit

A servent will respond to a Query with a QueryHit if it has the desired information.

Push

A process that allows a servent behind a firewall to contribute data to the network.

Gnutella Protocol At A Glance

1. Obtain IP address of another peer connected to the network.
2. Transmit handshake message.
3. Send Ping to peer.
4. Peer responds with a Pong, which is routed back along the path of the Ping. The peer also forwards your Ping to the additional Gnutella peers it knows about, after decrementing the TTL by 1.
5. As Pongs arrive, your hostcatcher collects the IP addresses of available peers. All are at most seven degrees of separation from you. The network of peers known to you is called your radius. A typical radius includes 2,000 to 10,000 other peers, with 500,000 to 1 million files.
6. To find a specific file, you enter a search term into the Gnutella interface. Your peer sends a query to every known peer on the network. Each peer searches its local files for matches to your query. If a match is not found, there is no reply. This prevents your computer from being bombarded with ‘no results’ messages.
7. When a match is made (i.e. one or more files are located), a query results message is routed to your peer, containing the IP address of the sender and the names of the matching files. Gnutella does not notify the user when the process is complete; it is assumed that peers which have not responded are either still searching or have not found any matches. Newer versions allow the user to set a timeout for the search.
8. When you select a query result to download, your peer creates a standard HTTP request from the IP address and filename in the results message. It sends this request directly to the peer, which returns the file via HTTP.
9. If the file you want is behind a firewall, your peer will issue a push request. A push request is a message that is routed back through the network, along the path the query result took, until it gets to the recipient. The recipient responds by connecting to your peer and transmitting the file. It is estimated that 50 percent of Gnutella traffic is across firewalls.
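
The radius figures in step 5 can be sanity-checked with a little arithmetic. The sketch below is only an upper bound under two assumptions that are not part of the protocol: every peer keeps the same number of connections, and no two paths ever reach the same peer. The function name max_radius and the choice of 4 connections per peer are illustrative.

```python
def max_radius(connections: int, ttl: int) -> int:
    """Upper bound on the peers reachable by a flooded descriptor.

    Assumes every peer has `connections` neighbours and that the
    network has no cycles, so no peer is counted twice.
    """
    if connections < 1 or ttl < 1:
        return 0
    reached = 0
    frontier = connections  # peers reached on the first hop
    for _ in range(ttl):
        reached += frontier
        # each newly reached peer forwards to every neighbour except the sender
        frontier *= connections - 1
    return reached


if __name__ == "__main__":
    # 4 connections per peer (an assumption) and the default TTL of 7
    print(max_radius(4, 7))
```

With 4 connections per peer and a TTL of 7 the bound is 4,372 peers, which falls inside the 2,000 to 10,000 range quoted in step 5. Real networks contain cycles, so the actual count would be lower.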

PROBLEMS:

1. Peers on low bandwidth networks will miss or drop messages, causing descriptors to be lost. The result is that a very large section of your radius can ‘go dark’, becoming unreachable.
2. As stated above, when you select one of the query results for downloading, your peer creates a standard HTTP request from the IP address and filename in the results message. It sends this request directly to the peer, which returns the file via HTTP. This is partly why Gnutella is difficult to shut down: file transfers look like ordinary Web traffic.
3. As pings are forwarded on the network, each peer that receives that packet will decrement the TTL by 1 in order to control overflow. Gnutella relies on fat bandwidth to overcome this inefficiency. High TTLs are adjusted before being forwarded to control congestion.

DESCRIPTORS:

Descriptor Header

Descriptor ID (0-15) Payload Descriptor (16) TTL (17) Hops (18) Payload Length (19-22)

Descriptor ID

A 16-byte string uniquely identifying the descriptor on the network.

Payload Descriptor

0x00 = Ping; 0x01 = Pong; 0x40 = Push; 0x80 = Query; 0x81 = QueryHit.

TTL

Time To Live. The number of times the descriptor will be forwarded by Gnutella servents before it is removed from the network. Each servent will decrement the TTL before passing it on to another servent. When the TTL reaches 0, the descriptor will no longer be forwarded. The TTL is the only way to remove descriptors from the network and if it is left unmonitored, high network traffic and poor performance will likely result.

Hops

The number of times the descriptor has been forwarded. The TTL and Hops field must satisfy the following condition as the descriptor is passed from servent to servent:

TTL (0) = TTL (i) + Hops (i),

Where TTL (i) and Hops (i) are the value of the TTL and Hops fields on the header at the descriptor’s i-th hop, for i>=0.
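
As a small illustration of the condition above, the following sketch walks a header through successive hops and checks that TTL(0) = TTL(i) + Hops(i) holds at each one. The helper names trace_header and invariant_holds are made up for this example.

```python
from typing import Iterator, Tuple


def trace_header(initial_ttl: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, ttl, hops) for hop i = 0, 1, ... until the TTL reaches 0."""
    ttl, hops, i = initial_ttl, 0, 0
    while ttl >= 0:
        yield i, ttl, hops
        # each forwarding servent decrements TTL and increments Hops
        ttl, hops, i = ttl - 1, hops + 1, i + 1


def invariant_holds(initial_ttl: int, ttl: int, hops: int) -> bool:
    """Check TTL(0) = TTL(i) + Hops(i)."""
    return initial_ttl == ttl + hops


if __name__ == "__main__":
    for i, ttl, hops in trace_header(7):
        print(i, ttl, hops, invariant_holds(7, ttl, hops))
```

A servent could use the same check to spot a header whose TTL and Hops fields were altered inconsistently along the way.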

Payload Length

The length of the descriptor immediately following this header. The next descriptor header is located exactly Payload_Length bytes from the end of this header. In other words, there are no gaps in the Gnutella data stream. The Payload Length field is the only way for a servent to find the beginning of the next descriptor in the input stream. This field should be monitored so that the servent remains in synch with its input stream. The connection is dropped if and when the servent becomes out of synch with its input stream.
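
The header layout and the role of Payload Length can be sketched with Python's struct module. This is a minimal illustration, not a reference implementation: the little-endian byte order of the length field is an assumption (the text above does not state a byte order), and the names HEADER, pack_descriptor and split_stream are invented for the example.

```python
import struct

# 16-byte Descriptor ID, 1-byte Payload Descriptor, 1-byte TTL,
# 1-byte Hops, 4-byte Payload Length (little-endian is an assumption).
HEADER = struct.Struct("<16sBBBI")  # 23 bytes, matching byte offsets 0-22


def pack_descriptor(descriptor_id: bytes, payload_type: int, ttl: int,
                    hops: int, payload: bytes = b"") -> bytes:
    """Build one descriptor: header followed directly by its payload."""
    if len(descriptor_id) != 16:
        raise ValueError("Descriptor ID must be exactly 16 bytes")
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, len(payload)) + payload


def split_stream(data: bytes):
    """Yield (header_fields, payload) for each complete descriptor in data.

    The Payload Length field is the only clue to where the next header
    starts, so the offset is advanced by exactly that many bytes.
    """
    offset = 0
    while offset + HEADER.size <= len(data):
        fields = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + fields[4]  # fields[4] is Payload Length
        if end > len(data):
            break  # payload not fully received yet
        yield fields, data[start:end]
        offset = end


if __name__ == "__main__":
    ping = pack_descriptor(b"A" * 16, 0x00, 7, 0)  # Ping has no payload
    query = pack_descriptor(b"B" * 16, 0x80, 7, 0, b"\x00\x00song\x00")
    for fields, payload in split_stream(ping + query):
        print(fields, payload)
```

Because there are no gaps or delimiters in the stream, a single wrong length would shift every later header, which is why a servent that loses synch has little choice but to drop the connection.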

Ping (0x00)

The purpose of a Ping request is to announce the servent’s presence on the network, or more precisely, to actively probe the network for other servents. It includes a TTL count, which determines how many times the request can be forwarded to other computers. TTL is 7 by default. Ping descriptors have no payload and are of zero length.

Pong (0x01)

Port (0-1) IP Address (2-5) Number of Files Shared (6-9) Number of KiloBytes Shared (10-13)

Port

The port number on which the responding host can accept incoming connections.

IP Address

The IP address of the responding host.

Number of Files Shared

The number of files that the servent with the given IP address and port is sharing on the network.

Number of Kilobytes Shared

The number of kilobytes of data that the servent with the given IP address and port number is sharing on the network.

A Pong descriptor is only sent in response to an incoming Ping descriptor. More than one Pong may be sent in response to one Ping, which enables the host caches to send cached servent address information.
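
A Pong payload with the 14-byte layout above might be packed and unpacked as follows. Treat this as a sketch under assumptions: little-endian integers and an IP address stored as four bytes in network order are not stated in the text above, and the function names are hypothetical.

```python
import socket
import struct

# Port (2 bytes), IP Address (4 bytes), Number of Files Shared (4 bytes),
# Number of Kilobytes Shared (4 bytes). Byte order is an assumption.
PONG = struct.Struct("<H4sII")  # 14 bytes, matching byte offsets 0-13


def pack_pong(port: int, ip: str, files: int, kilobytes: int) -> bytes:
    """Build a Pong payload from a dotted-quad IP and the share counts."""
    return PONG.pack(port, socket.inet_aton(ip), files, kilobytes)


def unpack_pong(payload: bytes):
    """Return (port, ip, files, kilobytes) from a 14-byte Pong payload."""
    port, raw_ip, files, kilobytes = PONG.unpack(payload)
    return port, socket.inet_ntoa(raw_ip), files, kilobytes


if __name__ == "__main__":
    payload = pack_pong(6346, "192.168.1.10", 120, 450000)
    print(len(payload), unpack_pong(payload))
```

A hostcatcher would typically keep the (IP address, port) pairs from such Pongs as candidates for future connections.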

Query (0x80)

Minimum Speed (0-1) Search Criteria (2-...)

Minimum Speed

The minimum speed, in kilobits per second, of servents that can respond to this message. A servent receiving a Query descriptor with a minimum Speed field of n kb/s should only respond with a QueryHit if it is able to communicate at a speed greater than or equal to n kb/s.

Search Criteria

A null (i.e. 0x00) terminated search string. The maximum length of this string is bounded by the Payload_Length field of the descriptor header.
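
The Query payload can be sketched in the same way. The little-endian speed field and ASCII search strings are assumptions made for this example, and pack_query, unpack_query and should_answer are hypothetical names.

```python
import struct


def pack_query(min_speed_kbps: int, criteria: str) -> bytes:
    """Minimum Speed (2 bytes) followed by a null-terminated search string."""
    return struct.pack("<H", min_speed_kbps) + criteria.encode("ascii") + b"\x00"


def unpack_query(payload: bytes):
    """Return (min_speed_kbps, criteria) from a Query payload."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    # everything after the 2-byte speed, up to the first null byte
    criteria = payload[2:].split(b"\x00", 1)[0].decode("ascii")
    return min_speed, criteria


def should_answer(own_speed_kbps: int, min_speed_kbps: int) -> bool:
    """A servent only replies if it meets the requested minimum speed."""
    return own_speed_kbps >= min_speed_kbps


if __name__ == "__main__":
    payload = pack_query(56, "gnutella spec")
    print(unpack_query(payload), should_answer(128, 56))
```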

QueryHit (0x81)

Number of Hits (0) Port (1-2) IP Address (3-6) Speed (7-10) Result Set (11-...) Servent Identifier (n-n+16)

Number of Hits

The number of query hits in the Result Set.

Port

The port number on which the responding host can accept incoming connections.

IP Address

The IP address of the responding host.

Speed

The speed, in kb/s, of the responding host.

Result Set

A set of responses to the corresponding Query. This set contains Number_of_Hits elements, each with the following structure:

File Index (0-3) File Size (4-7) File Name (8-...)

File Index

A number assigned by the responding host that uniquely identifies the file.

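File Size

The size, in bytes, of the file whose index is File_Index.

File Name

The name of the file whose index is File_Index, terminated by a double null (i.e. 0x0000).

The size of the result set is bounded by the size of the Payload_Length field in the Descriptor Header.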

Servent Identifier

A 16-byte string uniquely identifying the responding servent on the network. This is typically some function of the servent’s network address. The Servent Identifier is instrumental in the operation of the Push Descriptor.

QueryHit descriptors are only sent in response to an incoming Query descriptor. A servent should only reply to a Query with a QueryHit if it contains data that strictly meets the Query Search Criteria. The Descriptor_ID field in the Descriptor Header of the QueryHit should contain the same value as that of the associated Query descriptor. This allows a servent to identify the QueryHit descriptors associated with Query descriptors it generated.
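
To show how the variable-length Result Set sits between the fixed fields and the trailing Servent Identifier, here is a hedged sketch of a QueryHit encoder and decoder. Several details are assumptions rather than facts taken from the text above: little-endian integers, an IP address in network byte order, ASCII file names, and a double-null terminator after each file name (as in the Gnutella 0.4 specification). All function and constant names are invented for illustration.

```python
import socket
import struct

HIT_HEAD = struct.Struct("<BH4sI")  # hits, port, IP, speed: bytes 0-10
RESULT_HEAD = struct.Struct("<II")  # File Index, File Size: 8 bytes


def pack_query_hit(port, ip, speed, results, servent_id):
    """results is a list of (file_index, file_size, file_name) tuples."""
    body = HIT_HEAD.pack(len(results), port, socket.inet_aton(ip), speed)
    for index, size, name in results:
        # assumed: each file name ends with a double null
        body += RESULT_HEAD.pack(index, size) + name.encode("ascii") + b"\x00\x00"
    return body + servent_id  # 16-byte Servent Identifier comes last


def unpack_query_hit(payload):
    """Return (port, ip, speed, results, servent_id)."""
    hits, port, raw_ip, speed = HIT_HEAD.unpack_from(payload, 0)
    offset = HIT_HEAD.size
    results = []
    for _ in range(hits):
        index, size = RESULT_HEAD.unpack_from(payload, offset)
        offset += RESULT_HEAD.size
        end = payload.index(b"\x00\x00", offset)  # end of the file name
        results.append((index, size, payload[offset:end].decode("ascii")))
        offset = end + 2
    return port, socket.inet_ntoa(raw_ip), speed, results, payload[-16:]


if __name__ == "__main__":
    sid = bytes(range(16))
    hit = pack_query_hit(6346, "10.0.0.5", 128,
                         [(1, 4356789, "song.mp3"), (2, 1024, "notes.txt")], sid)
    print(unpack_query_hit(hit))
```

The decoder relies on Number of Hits to know how many results to read, since nothing else marks the end of the Result Set.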

Push (0x40)

Servent Identifier (0-15) File Index (16-19) IP Address (20-23) Port (24-25)

Servent Identifier

The 16-byte string uniquely identifying the servent on the network that is being requested to push the file with index File_Index. The servent initiating the push request should set this field to the Servent_Identifier returned in the corresponding QueryHit descriptor. This allows the recipient of a push request to determine whether or not it is the target of that request.

File Index

The index uniquely identifying the file to be pushed from the target servent. The servent initiating the push request should set this field to the value of one of the File_Index fields from the Result Set in the corresponding QueryHit descriptor.

IP Address, Port

The IP address & port of the host to which the file with File_Index should be pushed.

A servent may send a Push descriptor if it receives a QueryHit descriptor from a servent that does not support incoming connections. This might occur when the servent sending the QueryHit descriptor is behind a firewall. When a servent receives a Push descriptor, it may act upon the push request if and only if the Servent_Identifier field contains the value of its servent identifier. The Descriptor_ID field in the Descriptor Header of the Push descriptor should not contain the same value as that of the associated QueryHit descriptor, but should contain a new value generated by the servent’s Descriptor_ID generation algorithm.
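
The 26-byte Push payload and the "act only if the Servent_Identifier matches" rule can be sketched like this. Byte order is again an assumption (little-endian integers, IP in network order), and the names PUSH, pack_push, unpack_push and is_push_target are hypothetical.

```python
import socket
import struct

# Servent Identifier (16), File Index (4), IP Address (4), Port (2)
PUSH = struct.Struct("<16sI4sH")  # 26 bytes, matching byte offsets 0-25


def pack_push(servent_id: bytes, file_index: int, ip: str, port: int) -> bytes:
    """Build a Push payload asking servent_id to push file_index to ip:port."""
    return PUSH.pack(servent_id, file_index, socket.inet_aton(ip), port)


def unpack_push(payload: bytes):
    """Return (servent_id, file_index, ip, port)."""
    servent_id, file_index, raw_ip, port = PUSH.unpack(payload)
    return servent_id, file_index, socket.inet_ntoa(raw_ip), port


def is_push_target(payload: bytes, my_servent_id: bytes) -> bool:
    """A servent acts on a Push only if the identifier is its own."""
    return payload[:16] == my_servent_id


if __name__ == "__main__":
    sid = bytes(range(16))
    push = pack_push(sid, 42, "10.0.0.5", 6346)
    print(unpack_push(push), is_push_target(push, sid))
```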

Gnutella Protocol Detail

CONNECTING

To connect to the Gnutella network, a servent must first obtain the IP address of another servent that is currently on the network. Once the address is obtained, a TCP/IP connection to that servent is created and the Gnutella connection request string is sent: the handshake message “GNUTELLA CONNECT/0.4\n\n” goes to the other peer, which responds with “GNUTELLA OK\n\n” if it accepts the connection. Connections may be rejected, for example, because the versions are not compatible or because that particular servent already has too many connections.
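
The handshake exchange can be sketched with ordinary sockets. The example uses socket.socketpair() so that both ends run in one process; a real servent would use a TCP connection to the other peer's address. Closing the socket to signal a rejection, the have_room flag, and all function names are assumptions made for this illustration.

```python
import socket

CONNECT = b"GNUTELLA CONNECT/0.4\n\n"
OK = b"GNUTELLA OK\n\n"


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read up to count bytes, stopping early if the peer closes."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            break
        data += chunk
    return data


def request_connection(sock: socket.socket) -> bool:
    """Connecting side: send the request string and wait for the OK."""
    sock.sendall(CONNECT)
    return recv_exact(sock, len(OK)) == OK


def accept_connection(sock: socket.socket, have_room: bool = True) -> bool:
    """Accepting side: answer OK, or close to reject (an assumed behaviour)."""
    if recv_exact(sock, len(CONNECT)) != CONNECT or not have_room:
        sock.close()
        return False
    sock.sendall(OK)
    return True


if __name__ == "__main__":
    ours, theirs = socket.socketpair()
    ours.sendall(CONNECT)
    print(accept_connection(theirs), recv_exact(ours, len(OK)))
```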

DOWNLOADS

Once a servent receives a QueryHit descriptor, it may initiate the direct download of one of the files described by the descriptor’s Result Set. Files are downloaded out-of-network (i.e. a direct connection between the source and target servent is established in order to perform the data transfer). File data is never transferred over the Gnutella network. The file download protocol is HTTP. The servent initiating the download sends a request string of the following form to the target server:

GET /get/<FILE INDEX>/<FILENAME>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n

The server receiving this download request responds with HTTP 1.0 compliant headers such as

HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type: application/binary\r\n
Content-length: 4356789\r\n
\r\n

The file data then follows and should be read up to and including the number of bytes specified in the Content-length provided in the server’s HTTP response. The HTTP Range parameter is used so that interrupted downloads may be resumed at the point where they terminated.
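
The download exchange above can be sketched as two small helpers: one that builds the request string (with a byte offset so that an interrupted download can resume) and one that splits a response into headers and file data. The space before HTTP/1.0 and the open-ended range form bytes=<offset>- are assumed here, and the helper names are hypothetical.

```python
def build_download_request(file_index: int, file_name: str, offset: int = 0) -> bytes:
    """Build the GET request; offset > 0 resumes an interrupted download."""
    lines = [
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0",
        "Connection: Keep-Alive",
        f"Range: bytes={offset}-",
        "User-Agent: Gnutella",
        "",
        "",  # the two empty entries produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(data: bytes):
    """Return (status_line, headers, body) from a complete response."""
    head, _, body = data.partition(b"\r\n\r\n")
    status, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # read exactly as many bytes as Content-length announces
    length = int(headers["content-length"])
    return status, headers, body[:length]


if __name__ == "__main__":
    print(build_download_request(42, "song.mp3", offset=1000).decode("ascii"))
    reply = (b"HTTP 200 OK\r\nServer: Gnutella\r\n"
             b"Content-type: application/binary\r\nContent-length: 5\r\n\r\nhello")
    print(parse_response(reply))
```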

FIREWALLS:

If a direct connection to download from a servent cannot be established due to the presence of a firewall, the servent attempting the download may request a file push. It routes a Push request back to the servent that holds the desired file. Upon receipt of this Push descriptor, the servent holding the file should establish a new TCP/IP connection to the requesting servent. If both parties are behind a firewall, the connection cannot be established and the file transfer cannot take place. If a direct connection can be established, the servent behind the firewall sends the following message:

GIV <File Index>:<Servent Identifier>/<File Name>\n\n

Where <File Index> and <Servent Identifier> are the values of the File Index and Servent Identifier fields respectively from the Push request received, and <File Name> is the name of the file in the local file table whose file index number is <File Index>. The servent receiving the GIV request header (i.e. the Push requester) should extract the <File Index> and <File Name> fields from the header and construct an HTTP GET request of the following form:

GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n

Then the file is transferred just like in a regular download.
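
The push requester's side of this exchange might look like the sketch below, assuming the GIV line has the form GIV <File Index>:<Servent Identifier>/<File Name> followed by two newlines. Treating the Servent Identifier as a printable string, and the helper names parse_giv and get_request_after_giv, are assumptions for illustration only.

```python
def parse_giv(message: bytes):
    """Return (file_index, servent_id, file_name) from a GIV header."""
    text = message.decode("ascii")
    if not text.startswith("GIV ") or not text.endswith("\n\n"):
        raise ValueError("not a GIV header")
    body = text[4:-2]  # strip "GIV " and the trailing "\n\n"
    index_part, _, rest = body.partition(":")
    servent_id, _, file_name = rest.partition("/")
    return int(index_part), servent_id, file_name


def get_request_after_giv(message: bytes) -> bytes:
    """Build the GET request the push requester sends back."""
    file_index, _, file_name = parse_giv(message)
    return (
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0\r\n"
        "Connection: Keep-Alive\r\n"
        "Range: bytes=0-\r\n"
        "User-Agent: Gnutella\r\n"
        "\r\n"
    ).encode("ascii")


if __name__ == "__main__":
    giv = b"GIV 42:0123abcd/song.mp3\n\n"
    print(parse_giv(giv))
    print(get_request_after_giv(giv).decode("ascii"))
```

Note that the GET request travels over the connection that the firewalled servent opened, which is what lets the transfer succeed even though the requester could not connect inward.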

Descriptor Routing: How is network traffic routed?

Pong descriptors are only routed along the path of the incoming Ping descriptor. This ensures that only those servents that routed the Ping descriptor receive a Pong descriptor in response. A servent that receives a Pong descriptor with descriptor ID = n but has not seen a Ping descriptor with descriptor ID = n should remove the Pong descriptor from the network.
QueryHit descriptors may only be sent along the path that carried the incoming Query descriptor. This ensures that only those servents that routed the Query descriptor will see the QueryHit descriptor in response. A servent that receives a QueryHit descriptor with descriptor ID = n but has not seen a Query descriptor with descriptor ID = n should remove the QueryHit descriptor from the network.
Push descriptors may only be sent along the same path that carried the incoming QueryHit descriptor. This ensures that only those servents that routed the QueryHit descriptor receive a Push descriptor in response. A servent that receives a Push descriptor with Servent Identifier = n but has not seen a QueryHit descriptor with Servent Identifier = n should remove the Push descriptor from the network. Push descriptors are routed by Servent_Identifier, not by Descriptor_ID.
A servent will forward incoming Ping and Query descriptors to all of its directly connected servents, except the one that delivered the incoming Ping or Query.
A servent will decrement a descriptor header’s TTL field, and increment its Hops field, before it forwards the descriptor to any directly connected servent. When the TTL field is found to be zero, the descriptor is no longer forwarded along any connections.
A servent receiving a descriptor with the same Payload descriptor and descriptor ID as one it has received before should not forward the descriptor to any connected servents. The intended recipients have already received such a descriptor and sending it again is a waste of network bandwidth.
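
The forwarding rules for Ping and Query descriptors (forward to every neighbour except the sender, decrement TTL, increment Hops, stop at TTL zero, ignore duplicates) can be simulated on a small graph. This is a simplified sketch: the network is a hypothetical dictionary of neighbour lists, messages are delivered in breadth-first order, and the function name flood is invented.

```python
from collections import deque


def flood(graph: dict, origin: str, ttl: int) -> dict:
    """Return {peer: hops} for every peer that receives the descriptor."""
    received = {}
    # the origin sends the descriptor to each of its direct neighbours
    queue = deque((peer, origin, ttl, 0) for peer in graph[origin])
    while queue:
        peer, sender, ttl_in, hops_in = queue.popleft()
        if peer == origin or peer in received:
            continue  # duplicate descriptor: do not forward it again
        ttl_out, hops_out = ttl_in - 1, hops_in + 1
        received[peer] = hops_out
        if ttl_out == 0:
            continue  # TTL exhausted: not forwarded any further
        for neighbour in graph[peer]:
            if neighbour != sender:  # never send back to the deliverer
                queue.append((neighbour, peer, ttl_out, hops_out))
    return received


if __name__ == "__main__":
    line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(flood(line, "A", 2))  # D is three hops away, beyond a TTL of 2
```

In the line network above, a Ping sent by A with a TTL of 2 reaches B and C but not D, which illustrates how the TTL bounds the radius.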

Routing tables

In order to properly route replies on the network, nodes keep a routing table. This table will keep track of recent traffic. Each table entry should have a message ID, descriptor ID and connection ID. In this way the node can see where specific descriptors came from. For example, a node will get a Ping from Node X with ID = 5. It will reply with a Pong and also forward that Ping on to its directly connected neighbors. When those neighbors reply, the node will have no way of knowing that those Pongs go back to Node X unless it has stored the information. When the node sees a Pong with ID = 5 it knows that this is only in response to a Ping with the same ID. It looks the ID up in the table and routes the Pong to the source of the original Ping. The same is true for Queries and QueryHits. This reduces erroneous packets on the network. If a node is misbehaving and sending out Pongs for no reason, other nodes will know to discard those packets because they did not see the corresponding Ping at any time.
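
A routing table of this kind can be sketched as a dictionary keyed by request type and descriptor ID. The class and method names are hypothetical, connection IDs are plain strings for the sake of the example, and expiry of old entries (which a long-running node would need) is left out.

```python
PING, PONG, QUERY, QUERY_HIT = 0x00, 0x01, 0x80, 0x81
REQUEST_FOR = {PONG: PING, QUERY_HIT: QUERY}  # which request a reply answers


class RoutingTable:
    def __init__(self):
        self._routes = {}  # (request type, descriptor ID) -> connection ID

    def record_request(self, payload_type, descriptor_id, connection_id) -> bool:
        """Remember where a Ping or Query came from.

        Returns False for a duplicate, which should not be forwarded.
        """
        key = (payload_type, descriptor_id)
        if key in self._routes:
            return False
        self._routes[key] = connection_id
        return True

    def route_reply(self, payload_type, descriptor_id):
        """Return the connection a Pong or QueryHit goes back on.

        None means the matching request was never seen, so the reply
        should be discarded.
        """
        return self._routes.get((REQUEST_FOR[payload_type], descriptor_id))


if __name__ == "__main__":
    table = RoutingTable()
    table.record_request(PING, 5, "connection-to-node-X")
    print(table.route_reply(PONG, 5))   # goes back toward Node X
    print(table.route_reply(PONG, 99))  # no matching Ping was seen
```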


APPLET

This applet demonstrates how the Gnutella Protocol works over a network. You can simulate its operation as well as what happens when an error occurs.

