FreePastry release notes
Release 1.3.2, 2/26/04.
FreePastry is a modular, open source implementation of the Pastry p2p routing
and location substrate.
Contributors
Peter Druschel , Eric
Engineer, Romer Gil , Jeff Hoye , Y. Charlie Hu , Sitaram Iyer , Andrew Ladd , Alan Mislove , Animesh Nandi , Ansley Post, Charlie Reis, Atul Singh , and RongMei Zhang
contributed to the FreePastry code. The code is based on algorithms and
protocols described in the following papers:
- A. Rowstron and P.
Druschel, Pastry: Scalable, distributed object location and routing
for large-scale peer-to-peer systems. IFIP/ACM International
Conference on Distributed Systems Platforms (Middleware), Heidelberg,
Germany, pages 329-350, November, 2001. [
pdf.zip |
ps.zip | pdf
| ps ]
- M. Castro, P. Druschel,
Y. C. Hu, A. Rowstron, Exploiting network proximity in peer-to-peer
overlay networks. Submitted for publication. [
pdf.zip |
ps.zip | pdf
|
ps ]
- M. Castro, P. Druschel,
A.-M. Kermarrec and A. Rowstron, "SCRIBE: A large-scale and
decentralised application-level multicast infrastructure", IEEE
Journal on Selected Areas in Communications (JSAC) (Special issue on
Network Support for Multicast Communications). 2002, to appear. [
pdf.zip |
ps.zip | pdf
| ps
]
- A.
Rowstron and P. Druschel, "Storage management and caching in PAST, a
large-scale, persistent peer-to-peer storage utility", 18th
ACM SOSP'01, Lake Louise, Alberta, Canada, October 2001. [ pdf.zip
| ps.zip
| pdf
| ps
] (Corrected - erratum for original version: ps)
-
F. Dabek,
P. Druschel, B. Zhao, J. Kubiatowicz, and I. Stoica, "Towards a
Common API for Structured Peer-to-Peer Overlays", 2nd IPTP'03,
Berkeley, CA, February, 2003.
[ pdf
]
-
Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, Animesh Nandi, Antony Rowstron and Atul Singh,
SplitStream: High-bandwidth multicast in a cooperative environment.
In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP'03). Lake George, New York, October 2003.
[ pdf ]
Requirements
The software requires a Java runtime, version 1.4. The software was
developed using Sun's SDK, version 1.4.1.
Changes since release 1.3.1
- Overhaul of the wire package. The new version is higher
performance and has eliminated some known synchronization issues that
can cause deadlock. Furthermore the package produces the more sensible
NodeIsDeadException if a message is attempted to be sent after the node
has simulated being killed. Killing of nodes remains for testing, and is not supported on all platforms.
- New version of Replication Manager in rice.p2p.replication. Based on commonapi instead of pastry. There is an alternate/simpler interface to rm in the rice.p2p.replication.manager package.
- Past has been migrated to the new rm (p2p.replication).
Changes since release 1.3
- New version of Scribe implemented on the common API. New
version
is re-designed to increase the performance, reliability and ease of
use. The previous version of Scribe is still include in the release.
- New version of PAST provides a method to obtain handles to all
replicas, and various bug fixes. The old version of PAST that was built
on top of the Pastry API is now deprecated. The version built upon the
commonApi should now be used.
- The discovery protocol for automatic location of nearby node given
any bootstrap node is now implemented. This is described in the Pastry
proximity paper.
- A prototype implementation of SplitStream, a high bandwidth
multicast system,is now released. This implementation does not yet
implement all of the optimizations described in the paper; therefore,
overheads maybe higher than those reported in the paper. See below
for more details about using SplitStream.
Changes since release 1.2
- FreePastry now supports the common API, as described in the
IPTPS'03 paper listed above. Newly developed applications should
use this API, and only import the p2p.commonapi package. The previous,
native FreePastry API continues to be supported for backward
compatibility.
- A more general implementation of the PAST
archival storage system was added in this release. The release adds
support for replication and caching of data. The implementation
provides a generic distributed hash table (DHT) facility, and allows
control over the semantics of tuple insertion for a given,
application-specific value type. The previous version of PAST has been
marked as deprecated and may not be included in future releases.
Applications that use Past should migrate to the new version.
- A version of the replication manager, which provides
application-independent management of replicas, is included.
Application that need to replicate data on the set of n nodes
closest to a given key can use the replication manager in order to
perform this task.
Changes since release 1.1
- A simple implementation of the PAST
archival storage system was added in this release. The implementation
does not currently perform the storage balancing algorithms described in
the SOSP paper, nor does it perform data replication or caching. Support
for replication and caching will be included in the next release.
- An anycast primitive was added to the implementation of Scribe, a
group communication infrastructure. Also, several methods and new
interfaces and a new interface were added to provide apps more control
over the construction and maintenance of Scribe trees.
- Some initial performance work was done. As a result, large
simulations run about 50% faster, and use a lot less memory.
Notes
Release 1.3.2 has the following limitations.
- More performance tuning needs to be done.
- Three "transport protocols" are provided with this release,
"Direct", "RMI", and "Wire".
- "Direct" emulates a network, allowing many Pastry nodes to
execute in one Java VM without a real network. This is very useful for
application development and testing.
- "RMI" is simple transport protocol based on Java RMI. Several
"virtual" Pastry nodes can be started in each Java VM. RMI is used for
communication across physical nodes.
- "Wire" uses an event-based implementation based on sockets, and
uses the non-blocking NIO support in Java 1.4. It uses UDP as transport
by default, switching dynamically to TCP for large messages or in the
event that a stream of traffic is sent to a given node. The wire
protocol is till in beta testing, in part because the implementations of
the Java NIO do not yet work properly and efficiently in several Java
VMs on some platforms (for instance, Sun's JDK 1.4.1 RC1 and earlier on
Windows platforms).
Support for simulated killing of nodes is for testing purposes only. There is a potential to lose any messages that are queued to be sent when you simulate killing a node. Any such dropped messages will print the error:
Potentially lost the message: [message.toString()]
There is a known issue in the BSD (FreeBSD, OS X) implementations of Java NIO that bubbles up the error: "IOException: Bad file descriptor", whenever a node simulates being killed. We catch the error and print the message to the command line. Simulated killing of nodes causes instability in some JVMs prior to JRE 1.4.2.
On Unix systems, Java's socket implementation uses File Descriptors. In this implementation, the File Descriptors can be used up if too many nodes are running in a single process. If you require running more than one node inside a single process, consider increasing the number of File Descriptors per process (bash: ulimit -n), or lowering the available sockets per node (rice.pastry.wire.SocketManager.MAX_OPEN_SOCKETS).
Future transport protocols will also be based on an open
standard to ensure interoperability among different implementations.
- Security support does not exist in this release. Therefore,
the software should only be run in trusted environments. Future
releases will include security.
(Background: To start a Pastry node, the IP address (and
port number, unless the default port is used) of a "bootstrap" or
"contact" node must be provided. If no such node is provided, and no
other Pastry node runs on the local machine, then FreePastry creates a
new overlay network with itself as the only node. Any node that is
already part of the Pastry node can serve as the bootstrap node.)
- The Scribe implementation included in this release does not yet
support the tree optimization techniques describe in Sections IV, E-F of
the
Scribe paper.
Installation
To use the binary distribution, download the pastry jar file and set
the Java classpath to include the path of the jar file. This can be done
using the "-cp" command line argument, or by setting the CLASSPATH
variable in your shell environment.
To compile the source distribution, you will need to have GNU make
installed (available from ftp://ftp.gnu.org/pub/gnu/make
, or as part of cygwin ) on your
system. Expand the archive (FreePastry-1.3.tgz or FreePastry-1.3.zip)
into a directory. Set the environment variables mentioned in setpath.csh
to values appropriate for your system. Execute "make" in the top level
directory (you may have to run "make" twice the first time), then change
to the "classes" directory to run FreePastry.
You may have to provide a Java security policy file with sufficient
permissions to allow FreePastry to contact other nodes. The simplest way
to do this is to install a ".java.policy" file with the following
content into your home directory:
grant {
permission java.security.AllPermission;
};
Running FreePastry
1. To run a HelloWorld example:
java [-cp pastry.jar] rice.pastry.testing.DistHelloWorld
[-msgs m] [-nodes n] [-port p] [-bootstrap bshost[:bsport]] [-protocol [wire,rmi]]
[-verbose|-silent|-verbosity v] [-help]
Without -bootstrap bshost[:bsport], only localhost:p is used for bootstrap.
Default verbosity is 5, -verbose is 10, and -silent is -1 (error msgs only).
(replace "pastry.jar" by "FreePastry-<version>.jar", of course)
Some interesting configurations:
a. java rice.pastry.testing.DistHelloWorld
Starts a standalone Pastry network, and sends two messages
essentially to itself. Waits for anyone to connect to it,
so terminate with ^C.
b. java rice.pastry.testing.DistHelloWorld -nodes 2
One node starts a Pastry network, and sends two messages to
random destination addresses. At some point another node
joins in, synchronizes their leaf sets and route sets, and
sends two messages to random destinations. These may be
delivered to either node with equal probability. Note how
the sender node gets an "enroute" upcall from Pastry before
forwarding the message.
c. java rice.pastry.testing.DistHelloWorld -nodes 2 -verbose
Also prints some interesting transport-level messages.
d. pokey$ java rice.pastry.testing.DistHelloWorld
gamma$ java rice.pastry.testing.DistHelloWorld -bootstrap pokey
Two machines coordinate to form a Pastry network.
e. pokey$ java rice.pastry.testing.DistHelloWorld
gamma$ java rice.pastry.testing.DistHelloWorld -bootstrap pokey
wait a few seconds, and interrupt with <ctrl-C>
gamma$ java rice.pastry.testing.DistHelloWorld -bootstrap pokey
The second client restarts with a new NodeID, and joins the
Pastry network. One of them sends messages to the now-dead
node, finds it down, and may or may not remove it
from the leaf sets. (repeat a few times to observe both
possibilities, i.e., leaf sets of size 3 or 5). If the
latter, then leaf set maintenance kicks in within a minute
on one of the nodes, and removes the stale entries.
f. pokey$ java rice.pastry.testing.DistHelloWorld
gamma$ java rice.pastry.testing.DistHelloWorld -bootstrap pokey -nodes 2
The client on gamma instantiates two virtual nodes, which
are independent in identity and functionality. Note how the
second virtual node bootstraps from the first (rather than
from pokey). Try starting say 10 or 30 virtual nodes, killing
with a <ctrl-C>, starting another bunch, etc.
2. To run the same HelloWorld application on an emulated network:
java [-cp pastry.jar] rice.pastry.testing.HelloWorld [-msgs m] [-nodes n] [-verbose|-silent|-verbosity v] [-simultaneous_joins] [-simultaneous_msgs] [-help]
Some interesting configurations:
a. java rice.pastry.testing.HelloWorld
Creates three nodes, and sends total three messages from
randomly chosen nodes to random destinations addresses
(which are delivered to the node with the numerically
closest address).
b. java rice.pastry.testing.HelloWorld -simultaneous_joins -simultaneous_msgs
Join all three nodes at once, then issue three messages,
then go about delivering them.
3. To run a regression test that constructs 500 nodes connected by an
emulated network:
java [-cp pastry.jar] rice.pastry.testing.DirectPastryRegrTest
4. To run a simple performance test based on an emulated network with
successively larger numbers of nodes:
java [-cp pastry.jar] rice.pastry.testing.DirectPastryPingTest
Writing applications on top of FreePastry
Applications that wish to use the native Pastry API must extend the
class rice.pastry.client.PastryAppl. This class implements the Pastry
API. Each application consists minimally of an application class that
extends rice.pastry.client.PastryAppl, and a driver class that
implements main(), creates and initializes one of more nodes, etc.
Example applications and drivers can be found in rice.pastry.testing;
the Hello World suite (HelloWorldApp.java, HelloWorld.java,
DistHelloWorld.java) may be a good starting point.
Another sample Pastry application is rice.scribe.
Application writers are stringly encouraged to base newly written
applications on the new common API. Such applications should import the
package rice.p2p.commonapi.
Running Scribe
1. To run a simple distributed test:
java [-cp pastry.jar] rice.p2p.scribe.testing.ScribeRegrTest [-nodes n] [-port p] [-bootstrap bshost[:bsport]] [-protocol (direct|wire|rmi)] [-help]
Ports p and bsport refer to contact port numbers (default = 5009).
Without -bootstrap bshost[:bsport], only localhost:p is used for bootstrap.
(replace "pastry.jar" by "FreePastry-<version>.jar", of course)
Running PAST
1. To run a simple distributed test:
java [-cp pastry.jar] rice.p2p.past.testing.PastRegrTest [-nodes n] [-protocol (direct|wire|rmi)]
This creates a network of n nodes (10 by default), and then
runs the Past regression test over these nodes.
Running SplitStream
The FreePastry implementation of SplitStream implements the system
described in the SOSP '03 paper.
SplitStream.java class provides an interface that can be used by
applications to create SplitStream instances. Each SplitStream forest
is represented by a channel object (Channel.java), where a channel
object encapsulates multiple stripe trees. Each stripe tree for a
SplitStream forest is represented by a class (Stripe.java), which
handles the data reception and subscription failures.
Applications can configure the maximum capacity each channel can
accomodate in terms of number of children it is willing to accept. Applications can control total
outgoing capacity they are willing to provide by changing the value in ScribeSplitStreamPolicy.java.
1. To run a simple distributed test:
java [-cp pastry.jar] rice.p2p.splitstream.testing.SplitStreamRegrTest [-nodes n] [-protocol (direct|wire|rmi)]
This creates a network of n nodes (10 by default), and then
runs the SplitStream regression test over these nodes.