THE GNUTELLA PROTOCOL

Goals:  To understand the Gnutella protocol and to create a Java applet simulating it.
Results: See applet.

P2P

Peer-to-peer (P2P) is a type of transient Internet network that allows a group of computer users with the same networking program to connect with each other and directly access files from one another's hard drives.

Gnutella is an example of peer-to-peer software in which individuals can directly exchange files over the Internet. Gnutella is fully decentralized, so a user connected to the network may access the files of other users on the network. Users serve as both clients and servers, connected in a daisy-chain fashion. Since there are no central servers, there may be a high bandwidth requirement.

(The P2P and Gnutella definitions above are from http://whatis.techtarget.com)

Gnutella Protocol Descriptors

Ping

Ping is used to find hosts on a given network.

Pong

Pong is the response to a Ping.

Query

This is the process used to search the network.

QueryHit

A servent will respond to a Query with a QueryHit if it has the desired information.

Push

A process that allows a servent behind a firewall to contribute data to the network.

Gnutella Protocol At A Glance

PROBLEMS:

DESCRIPTORS:

Descriptor Header

Descriptor ID (0-15) Payload Descriptor (16) TTL (17) Hops (18) Payload Length (19-22)

Descriptor ID

A 16-byte string uniquely identifying the descriptor on the network.

Payload Descriptor

0x00 = Ping; 0x01 = Pong; 0x40 = Push; 0x80 = Query; 0x81 = QueryHit.

TTL

Time To Live. The number of times the descriptor will be forwarded by Gnutella servents before it is removed from the network. Each servent will decrement the TTL before passing it on to another servent. When the TTL reaches 0, the descriptor will no longer be forwarded. The TTL is the only way to remove descriptors from the network and if it is left unmonitored, high network traffic and poor performance will likely result.

Hops

The number of times the descriptor has been forwarded. The TTL and Hops field must satisfy the following condition as the descriptor is passed from servent to servent:

TTL (0) = TTL (i) + Hops (i),

where TTL (i) and Hops (i) are the values of the TTL and Hops fields in the header at the descriptor’s i-th hop, for i>=0.
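The forwarding rule and the TTL/Hops condition can be checked with a few lines of Python. This is only a sketch: the function name `forward` is made up for illustration, and it assumes that a servent increments Hops each time it decrements the TTL and does not forward a descriptor whose TTL would become 0.

```python
from typing import Optional, Tuple


def forward(ttl: int, hops: int) -> Optional[Tuple[int, int]]:
    """Return the (TTL, Hops) pair to send onward, or None to stop forwarding."""
    if ttl <= 1:
        # Decrementing would bring the TTL to 0, so the descriptor stops here.
        return None
    return ttl - 1, hops + 1


# Follow one descriptor that starts out with TTL 7 and Hops 0.
initial_ttl = 7
state = (initial_ttl, 0)
trace = [state]
while True:
    nxt = forward(*state)
    if nxt is None:
        break
    state = nxt
    trace.append(state)

# TTL(0) = TTL(i) + Hops(i) holds at every hop i.
assert all(ttl + hops == initial_ttl for ttl, hops in trace)
```

Because every forward moves exactly one unit from TTL to Hops, the sum never changes, which is what the condition above states.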

Payload Length

The length of the descriptor immediately following this header. The next descriptor header is located exactly Payload_Length bytes from the end of this header. In other words, there are no gaps in the Gnutella data stream. The Payload Length field is the only way for a servent to find the beginning of the next descriptor in the input stream. This field should be monitored so that the servent remains in synch with its input stream. The connection is dropped if and when the servent becomes out of synch with its input stream.
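As a rough illustration of the 23-byte header layout and of how Payload Length is used to step through the stream, here is a small Python sketch. The helper names `pack_header` and `read_descriptors` are hypothetical, and little-endian byte order for the Payload Length field is an assumption taken from the Gnutella v0.4 specification, not from the text above.

```python
import struct

# 16-byte Descriptor ID, 1-byte Payload Descriptor, 1-byte TTL, 1-byte Hops,
# 4-byte Payload Length. "<" means little-endian with no padding (assumption).
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, payload_type, ttl, hops, payload_length):
    """Build the 23-byte descriptor header."""
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_length)


def read_descriptors(stream):
    """Yield (header fields, payload) for each complete descriptor in stream."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        fields = HEADER.unpack_from(stream, offset)
        payload_length = fields[4]
        start = offset + HEADER.size
        end = start + payload_length
        if end > len(stream):
            break  # the payload has not fully arrived yet
        yield fields, stream[start:end]
        # The next header begins exactly Payload_Length bytes after this one.
        offset = end


# A Ping (no payload) followed directly by a Query with a 6-byte payload.
ping_id = bytes(range(16))
query_id = bytes(range(16, 32))
query_payload = b"\x00\x00mp3\x00"
stream = (pack_header(ping_id, 0x00, 7, 0, 0)
          + pack_header(query_id, 0x80, 7, 0, len(query_payload))
          + query_payload)
descriptors = list(read_descriptors(stream))
```

If a servent miscounts even one byte here, every later header is read from the wrong offset, which is why an out-of-synch connection has to be dropped.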

Ping (0x00)

The purpose of a Ping request is to announce the servent’s presence on the network, or more precisely, to actively probe the network for other servents. It includes a TTL count, which determines how many times the request can be forwarded to other computers. TTL is 7 by default. Ping descriptors have no payload and are of zero length.

Pong (0x01)

 
Port (0-1) IP Address (2-5) Number of Files Shared (6-9) Number of KiloBytes Shared (10-13)

Port

The port number on which the responding host can accept incoming connections.

IP Address

The IP address of the responding host.

Number of Files Shared

The number of files that the servent with the given IP address and port is sharing on the network.

Number of Kilobytes Shared

The number of kilobytes of data that the servent with the given IP address and port number is sharing on the network.

A Pong descriptor is only sent in response to an incoming Ping descriptor. More than one Pong may be sent in response to one Ping, which enables the host caches to send cached servent address information.
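The 14-byte Pong payload can be sketched in Python as follows. The names `pack_pong` and `unpack_pong` are illustrative only; the byte orders (little-endian port and counters, big-endian IP address) are assumptions based on the Gnutella v0.4 specification.

```python
import socket
import struct

# Port (2 bytes), IP Address (4 bytes), Number of Files Shared (4 bytes),
# Number of Kilobytes Shared (4 bytes): 14 bytes in total.
PONG = struct.Struct("<H4sII")


def pack_pong(port, ip, files_shared, kilobytes_shared):
    """Build a Pong payload; inet_aton yields the address in big-endian order."""
    return PONG.pack(port, socket.inet_aton(ip), files_shared, kilobytes_shared)


def unpack_pong(payload):
    """Return (port, dotted IP string, files shared, kilobytes shared)."""
    port, raw_ip, files_shared, kilobytes_shared = PONG.unpack(payload)
    return port, socket.inet_ntoa(raw_ip), files_shared, kilobytes_shared


# Example values are made up for the demonstration.
payload = pack_pong(6346, "192.168.0.10", 120, 524288)
decoded = unpack_pong(payload)
```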

Query (0x80)

Minimum Speed (0-1) Search Criteria (2-...)

Minimum Speed

The minimum speed, in kilobits per second, of servents that can respond to this message. A servent receiving a Query descriptor with a Minimum Speed field of n kb/s should only respond with a QueryHit if it is able to communicate at a speed greater than or equal to n kb/s.

Search Criteria

A null (i.e. 0x00) terminated search string. The maximum length of this string is bounded by the Payload_Length field of the descriptor header.
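A Query payload is short enough to build and parse by hand. The sketch below assumes a little-endian Minimum Speed field and an ASCII search string; both are assumptions for illustration, and the helper names are hypothetical.

```python
import struct


def pack_query(min_speed_kbps, criteria):
    """Minimum Speed (2 bytes) followed by the null-terminated search string."""
    return struct.pack("<H", min_speed_kbps) + criteria.encode("ascii") + b"\x00"


def unpack_query(payload):
    """Return (minimum speed, search string) from a Query payload."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    criteria, _, _ = payload[2:].partition(b"\x00")
    return min_speed, criteria.decode("ascii")


def should_answer(own_speed_kbps, min_speed_kbps, has_match):
    """Reply only with matching data and a speed of at least the minimum."""
    return has_match and own_speed_kbps >= min_speed_kbps


payload = pack_query(56, "foobar mp3")
min_speed, criteria = unpack_query(payload)
```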

QueryHit (0x81)

 
Number of Hits (0) Port (1-2) IP Address (3-6) Speed (7-10) Result Set (11-...) Servent Identifier (n-n+16)

Number of Hits

The number of query hits in the Result Set.

Port

The port number on which the responding host can accept incoming connections.

IP Address

The IP address of the responding host.

Speed

The speed, in kb/s, of the responding host.

Result Set

A set of responses to the corresponding Query. This set contains Number_of_Hits elements, each with the following structure:

File Index (0-3) File Size (4-7) File Name (8-...)

File Index

A number assigned by the responding host that uniquely identifies the file.

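File Size

The size, in bytes, of the file whose index is File_Index.

File Name

The name of the file whose index is File_Index, terminated by a double null (i.e. 0x0000).

The size of the result set is bounded by the size of the Payload_Length field in the Descriptor Header.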

Servent Identifier

A 16-byte string uniquely identifying the responding servent on the network. This is typically some function of the servent’s network address. The Servent Identifier is instrumental in the operation of the Push Descriptor.

QueryHit descriptors are only sent in response to an incoming Query descriptor. A servent should only reply to a Query with a QueryHit if it contains data that strictly meets the Query Search Criteria. The Descriptor_ID field in the Descriptor Header of the QueryHit should contain the same value as that of the associated Query descriptor. This allows a servent to identify the QueryHit descriptors associated with Query descriptors it generated.
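The QueryHit payload is the only descriptor with a variable-length middle section, so a parsing sketch may help. The code below is a minimal illustration under several assumptions drawn from the Gnutella v0.4 specification rather than from the text above: numeric fields are little-endian, the IP address is big-endian, each file name ends with a double null, and the Servent Identifier occupies the last 16 bytes. The function names are hypothetical.

```python
import socket
import struct

FIXED = struct.Struct("<BH4sI")  # Number of Hits, Port, IP Address, Speed
ENTRY = struct.Struct("<II")     # File Index, File Size


def pack_query_hit(port, ip, speed, results, servent_id):
    """results is a list of (file index, file size, file name) tuples."""
    body = FIXED.pack(len(results), port, socket.inet_aton(ip), speed)
    for index, size, name in results:
        # Assumption: names are ASCII and end with a double null.
        body += ENTRY.pack(index, size) + name.encode("ascii") + b"\x00\x00"
    return body + servent_id


def unpack_query_hit(payload):
    """Return (port, ip, speed, results, servent identifier)."""
    hits, port, raw_ip, speed = FIXED.unpack_from(payload, 0)
    offset = FIXED.size
    results = []
    for _ in range(hits):
        index, size = ENTRY.unpack_from(payload, offset)
        offset += ENTRY.size
        end = payload.index(b"\x00\x00", offset)  # end of this file name
        results.append((index, size, payload[offset:end].decode("ascii")))
        offset = end + 2
    return port, socket.inet_ntoa(raw_ip), speed, results, payload[-16:]


# Example values are made up for the demonstration.
servent_id = bytes(range(1, 17))
hit = pack_query_hit(6346, "10.0.0.5", 56,
                     [(2468, 4356789, "Foobar.mp3"), (7, 1024, "notes.txt")],
                     servent_id)
parsed = unpack_query_hit(hit)
```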

Push (0x40)

Servent Identifier (0-15) File Index (16-19) IP Address (20-23) Port (24-25)

Servent Identifier

The 16-byte string uniquely identifying the servent on the network that is being requested to push the file with index File_Index. The servent initiating the push request should set this field to the Servent_Identifier returned in the corresponding QueryHit descriptor. This allows the recipient of a push request to determine whether or not it is the target of that request.

File Index

The index uniquely identifying the file to be pushed from the target servent. The servent initiating the push request should set this field to the value of one of the File_Index fields from the Result Set in the corresponding QueryHit descriptor.

IP Address, Port

The IP address & port of the host to which the file with File_Index should be pushed.

A servent may send a Push descriptor if it receives a QueryHit descriptor from a servent that does not support incoming connections. This might occur when the servent sending the QueryHit descriptor is behind a firewall. When a servent receives a Push descriptor, it may act upon the push request if and only if the Servent_Identifier field contains the value of its servent identifier. The Descriptor_ID field in the Descriptor Header of the Push descriptor should not contain the same value as that of the associated QueryHit descriptor, but should contain a new value generated by the servent’s Descriptor_ID generation algorithm.
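The 26-byte Push payload and the "act only if I am the target" rule can be sketched as follows. Helper names are hypothetical, and the byte orders (little-endian File Index and Port, big-endian IP address) are assumptions based on the Gnutella v0.4 specification.

```python
import socket
import struct

# Servent Identifier (16), File Index (4), IP Address (4), Port (2) = 26 bytes.
PUSH = struct.Struct("<16sI4sH")


def pack_push(servent_id, file_index, ip, port):
    """Build a Push payload addressed to the servent with the given identifier."""
    return PUSH.pack(servent_id, file_index, socket.inet_aton(ip), port)


def unpack_push(payload):
    """Return (servent identifier, file index, dotted IP string, port)."""
    servent_id, file_index, raw_ip, port = PUSH.unpack(payload)
    return servent_id, file_index, socket.inet_ntoa(raw_ip), port


def is_push_target(payload, my_servent_id):
    """A servent acts on a Push only if the identifier is its own."""
    return payload[:16] == my_servent_id


# Example values are made up for the demonstration.
target_id = bytes(range(1, 17))
push = pack_push(target_id, 2468, "10.0.0.9", 6346)
```

A servent for which `is_push_target` is false would not open a connection itself; it would only pass the descriptor along according to its routing rules.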

Gnutella Protocol Detail

CONNECTING

To join the network, a servent must first obtain the IP address of another servent that is currently on the network. Once the address is obtained, a TCP/IP connection to that servent is created and the Gnutella connection request string is sent. The handshake message “GNUTELLA CONNECT/0.4\n\n” is sent to the other peer, which then responds with “GNUTELLA OK\n\n” if it accepts the connection. Connections may be rejected, for example, because the versions are not compatible or because that particular servent already has too many connections.
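The handshake is simple enough to sketch over any connected stream socket. In the Python below, the function name `handshake` is hypothetical, and treating any reply other than the exact OK string (including a closed connection) as a rejection is an assumption made for illustration. A local socket pair stands in for the TCP connection to a remote servent.

```python
import socket

CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
ACCEPT_RESPONSE = b"GNUTELLA OK\n\n"


def handshake(sock):
    """Send the connection request and report whether the peer accepted."""
    sock.sendall(CONNECT_REQUEST)
    reply = b""
    # Read no more than the expected reply so later descriptors stay unread.
    while len(reply) < len(ACCEPT_RESPONSE):
        chunk = sock.recv(len(ACCEPT_RESPONSE) - len(reply))
        if not chunk:
            break  # the peer closed the connection
        reply += chunk
    return reply == ACCEPT_RESPONSE


# Demonstration with a local socket pair instead of a real remote servent.
local, remote = socket.socketpair()
remote.sendall(ACCEPT_RESPONSE)   # the "remote servent" accepts
accepted = handshake(local)
received_by_remote = remote.recv(64)
local.close()
remote.close()
```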

DOWNLOADS

Once a servent receives a QueryHit descriptor, it may initiate the direct download of one of the files described by the descriptor’s Result Set. Files are downloaded out-of-network (i.e. a direct connection between the source and target servent is established in order to perform the data transfer). File data is never transferred over the Gnutella network. The file download protocol is HTTP. The servent initiating the download sends a request string of the following form to the target server:

GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n

The server receiving this download request responds with HTTP 1.0 compliant headers such as

HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type: application/binary\r\n
Content-length: 4356789\r\n
\r\n

The file data then follows and should be read up to and including the number of bytes specified in the Content-length provided in the server’s HTTP response. The HTTP Range parameter is used so that interrupted downloads may be resumed at the point where they terminated. 
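To make the download exchange concrete, here is a small Python sketch that builds the request and reads the Content-length from a response. The request layout follows the Gnutella v0.4 specification as far as we know it and should be treated as an assumption; the helper names and the example file are made up.

```python
def build_download_request(file_index, file_name, start=0):
    """Build the HTTP GET request; a non-zero start resumes an interrupted download."""
    return (
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0\r\n"
        "Connection: Keep-Alive\r\n"
        f"Range: bytes={start}-\r\n"
        "User-Agent: Gnutella\r\n"
        "\r\n"
    ).encode("ascii")


def content_length(response_head):
    """Return the number of file bytes announced in the response headers."""
    for line in response_head.decode("ascii").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            return int(value.strip())
    raise ValueError("no Content-length header in response")


request = build_download_request(2468, "Foobar.mp3")
resumed = build_download_request(2468, "Foobar.mp3", start=1000000)
example_response = (b"HTTP 200 OK\r\n"
                    b"Server: Gnutella\r\n"
                    b"Content-type: application/binary\r\n"
                    b"Content-length: 4356789\r\n"
                    b"\r\n")
size = content_length(example_response)
```

The downloader would then read exactly `size` bytes of file data after the blank line that ends the headers.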

FIREWALLS:

If a direct connection to download from a servent cannot be established due to the presence of a firewall, the servent attempting the download may request a file push. The servent requesting the file routes a Push request back to the servent with the desired file. Upon receipt of this Push descriptor, the servent with the file should establish a new TCP/IP connection to the requesting servent. If both parties are behind a firewall, the connection cannot be established and the file transfer cannot take place. If a direct connection can be established, the servent behind the firewall sends the following message:

GIV <File Index>:<Servent Identifier>/<File Name>\n\n

where <File Index> and <Servent Identifier> are the values of the File Index and Servent Identifier fields respectively from the Push request received, and <File Name> is the name of the file in the local file table whose file index number is <File Index>. The servent receiving the GIV request header (i.e. the Push requester) should extract the <File Index> and <File Name> fields from the header and construct an HTTP GET request of the following form:

GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n

Then the file is transferred just like in a regular download.
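The GIV exchange can be sketched as a pair of helper functions. The message layout used here ("GIV", then the file index, a colon, the servent identifier, a slash and the file name, ended by two newlines) follows the Gnutella v0.4 specification as we understand it; writing the 16-byte identifier as hexadecimal text is an additional assumption, and the function names are hypothetical.

```python
def build_giv(file_index, servent_id, file_name):
    """Message sent by the firewalled servent once it has connected out."""
    # Assumption: the 16-byte identifier is written as 32 hex characters.
    return f"GIV {file_index}:{servent_id.hex()}/{file_name}\n\n".encode("ascii")


def parse_giv(message):
    """Return (file index, servent identifier, file name) from a GIV message."""
    text = message.decode("ascii")
    if not (text.startswith("GIV ") and text.endswith("\n\n")):
        raise ValueError("not a GIV message")
    index_part, _, rest = text[4:-2].partition(":")
    servent_hex, _, file_name = rest.partition("/")
    return int(index_part), bytes.fromhex(servent_hex), file_name


# Example values are made up for the demonstration.
servent_id = bytes(range(16))
giv = build_giv(2468, servent_id, "Foobar.mp3")
file_index, sender_id, file_name = parse_giv(giv)
# The Push requester now issues its HTTP GET for this index and name.
get_line = f"GET /get/{file_index}/{file_name}/ HTTP/1.0\r\n"
```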

Descriptor Routing: How is network traffic routed?

Routing tables

In order to properly route replies on the network, nodes keep a routing table. This table keeps track of recent traffic. Each entry should hold a message ID, a descriptor ID and a connection ID. In this way the node can see where specific descriptors came from. For example, a node will get a Ping from Node X with ID = 5. It will reply with a Pong and also forward that Ping on to its directly connected neighbors. When those neighbors reply, the node would have no way of knowing that their Pongs go back to Node X unless it had stored that information. When the node sees a Pong with ID = 5, it knows that this can only be a response to a Ping with the same ID. It looks the ID up in the table and routes the Pong to the source of the original Ping. The same is true for Queries and QueryHits. This reduces erroneous packets on the network. If a node is misbehaving and sending out Pongs for no reason, other nodes will know to discard those packets because they did not see the corresponding Ping at any time.
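The routing table described above can be sketched as a dictionary keyed by descriptor ID. The class and method names are hypothetical, connections are represented by plain strings, and the sketch leaves out the expiry of old entries that a real servent would need.

```python
class RoutingTable:
    """Remembers which connection each Ping or Query arrived on."""

    def __init__(self):
        self._origin = {}  # descriptor ID -> connection the request came from

    def record_request(self, descriptor_id, connection):
        """Store the origin of a Ping or Query; return False for a duplicate."""
        if descriptor_id in self._origin:
            return False  # already seen, so it should not be forwarded again
        self._origin[descriptor_id] = connection
        return True

    def route_reply(self, descriptor_id):
        """Return the connection for a Pong or QueryHit, or None to discard it."""
        return self._origin.get(descriptor_id)


table = RoutingTable()
table.record_request(5, "Node X")      # Ping with ID = 5 arrives from Node X
pong_destination = table.route_reply(5)  # a Pong with ID = 5 goes back to Node X
stray_destination = table.route_reply(9) # no Ping with ID = 9 was ever seen
```

A reply whose ID was never recorded gets `None`, which corresponds to discarding a Pong from a misbehaving node.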


APPLET

This applet demonstrates how the Gnutella Protocol works over a network.  You can simulate its operation as well as what happens when an error occurs.

