Welcome to the PRISMon web page
PRISMon is part of the PRISM research project of the Laboratory for Advanced Systems Research at the University of Texas at Austin.
PRISM is a scalable monitoring service that makes imprecision a first-class abstraction in its DHT-based aggregation. Exposing imprecision is essential both for correctness in the face of network and node failures and for scalability to large systems. Specifically, PRISM introduces the notion of conditioned consistency, which quantifies imprecision along a three-dimensional vector:
- Arithmetic imprecision (AI) bounds numeric inaccuracy (e.g., +/- 10%).
- Temporal imprecision (TI) bounds update delays (e.g., at most 15 seconds staleness).
- Network imprecision (NI) quantifies uncertainty due to network and node failures (e.g., 90% of nodes are alive).
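The three dimensions above can be pictured as a small record attached to each reported aggregate. The following is only a minimal sketch under stated assumptions: the `Imprecision` class, its field names, and the `satisfies` and `value_range` helpers are hypothetical and are not PRISM's actual interface. It also assumes that AI is expressed as a relative error around the reported value and that NI is expressed as the fraction of nodes known to be alive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Imprecision:
    """Hypothetical container for the three imprecision dimensions."""

    ai: float  # arithmetic imprecision: relative error bound (0.10 means +/- 10%)
    ti: float  # temporal imprecision: maximum staleness in seconds
    ni: float  # network imprecision: fraction of nodes known to be alive (0.0 to 1.0)

    def satisfies(self, required: "Imprecision") -> bool:
        """True if this report is at least as precise as `required` in every dimension."""
        return (
            self.ai <= required.ai      # no more numeric error than allowed
            and self.ti <= required.ti  # no staler than allowed
            and self.ni >= required.ni  # at least as many nodes accounted for
        )


def value_range(reported: float, imprecision: Imprecision) -> Tuple[float, float]:
    """Interval implied by the arithmetic bound around the reported value (assumed relative)."""
    slack = abs(reported) * imprecision.ai
    return (reported - slack, reported + slack)
```

For instance, a report tagged `Imprecision(ai=0.05, ti=10.0, ni=0.95)` would satisfy a requirement of `Imprecision(ai=0.10, ti=15.0, ni=0.90)`, while one that is 20 seconds stale would not.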
PRISMon builds on PRISM's scalable monitoring abstraction to provide a new distributed "24/7 monitoring" service for the PlanetLab global testbed that is considerably more scalable and time-responsive than existing services. Compared to the current centralized CoMon service, which updates every 5 minutes, PRISMon is a distributed monitoring service for PlanetLab that
- Scales to tens of thousands of nodes and millions of attributes.
- Reduces monitoring overhead by an order of magnitude!
- Is time-responsive: it updates every 15 seconds.
PRISMon provides three key monitoring functionalities:
(1) An aggregation backplane for tracking node-centric and slice-centric resource usage across PlanetLab. It computes the maximum, average, and total resource usage statistics with bounded numeric precision (e.g., +/- 10%) in real time (e.g., at most 10 seconds latency) for each individual slice running on PlanetLab machines.
(2) A traffic watchdog service that analyzes both incoming and outgoing traffic at each PlanetLab node to detect compromised nodes and slices, port scans, and worm attacks, and to answer queries of the form:
(a) Is PlanetLab being used as a platform for mounting DDoS attacks?
(b) Which top-k ports have been heavily scanned in the recent past, indicating port-scanning activity or even a potential worm outbreak, which commonly spreads by exploiting vulnerabilities in services listening on those ports?
(3) A SQL query interface for finding "lightly-loaded" nodes and "resource-hogging" slices when deploying new experiments.
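To make the first two functionalities more concrete, the per-slice statistics and the top-k port query can be sketched as follows. This is an illustrative, single-process sketch only: PRISMon computes such aggregates in a distributed fashion, and the function names, the `(node, slice_name, usage)` sample layout, and the flat list of scanned ports are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def slice_usage_stats(
    samples: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """Compute max, average, and total usage per slice.

    Each sample is (node, slice_name, usage); this layout is hypothetical.
    """
    per_slice: Dict[str, List[float]] = defaultdict(list)
    for _node, slice_name, usage in samples:
        per_slice[slice_name].append(usage)
    return {
        name: {
            "max": max(values),
            "avg": sum(values) / len(values),
            "total": sum(values),
        }
        for name, values in per_slice.items()
    }


def top_k_ports(scanned_ports: Iterable[int], k: int) -> List[Tuple[int, int]]:
    """Return the k most frequently scanned ports together with their scan counts."""
    return Counter(scanned_ports).most_common(k)
```

A call such as `slice_usage_stats([("n1", "s1", 10.0), ("n2", "s1", 30.0)])` yields one entry per slice with its `max`, `avg`, and `total`, and `top_k_ports(ports, 2)` lists the two most-scanned ports. A real deployment would additionally attach the imprecision bounds described earlier to each result.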
See the publications page for more technical details on PRISMon.
Check out our cool demo!