Skip to Content

The University of Texas at Austin

CS Technical Abstracts (2005)

TR-05-01

Dimitrov, Nedialko B. and C. Greg Plaxton. "Optimal Cover Time for a Graph-Based Coupon Collector Process". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-01 (technical report). January 28, 2005.

In this paper we study the following covering process defined over an arbitrary directed graph. Each node is initially uncovered and is assigned a random integer rank drawn from a suitable range. The process then proceeds in rounds. In each round, a uniformly random node is selected and its lowest-ranked uncovered outgoing neighbor, if any, is covered. We prove that if each node has in-degree $\Theta(d)$ and out-degree $O(d)$, then with high probability, every node is covered within $O(n\cdot\max(1,(\log n)/d))$ rounds, matching a lower bound due to Alon. Alon has also shown that, for a certain class of $d$-regular expander graphs, the upper bound holds no matter what method is used to choose the uncovered neighbor. Is the same true of all graphs? We answer the latter question in the negative.

DOWNLOAD tr05-01.pdf

TR-05-02

Bunescu, Razvan C. "Learning for Collective Information Extraction". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-02 (technical report). February 1, 2005. 45 pages.

An Information Extraction (IE) system analyses a set of documents with the aim of identifying certain types of entities and relations between them. Most IE systems treat separate potential extractions as independent. However, in many cases, considering influences between different candidate extractions could improve overall accuracy. For example, phrase repetitions inside a document are usually associated with the same entity type, the same being true for acronyms and their corresponding long form. One of our goals in this thesis is to show how these and potentially other types of correlations can be captured by a particular type of undirected probabilistic graphical model. Inference and learning using this graphical model allows for ``collective information extraction'' in a way that exploits the mutual influence between possible extractions. Preliminary experiments on learning to extract named entities from biomedical and newspaper text demonstrate the advantages of our approach. The benefit of doing collective classification comes however at a cost: in the general case, exact inference in the resulting graphical model has an exponential time complexity. The standard solution, which is also the one that we used in our initial work, is to resort to approximate inference. In this proposal we show that by considering only a selected subset of mutual influences between candidate extractions, exact inference can be done in linear time. Consequently, a short term goal is to run comparative experiments that would help us choose between the two approaches: exact inference with a restricted subset of mutual influences or approximate inference with the full set of influences. The set of issues that we intend to investigate in future work is two fold. One direction refers to applying the already developed framework to other natural language tasks that may benefit from the same types of influences, such as word sense disambiguation and part-of-speech tagging. ‰‰‰Another direction concerns the design of a sufficiently general framework that would allow a seamless integration of cues from a variety of knowledge sources. We contemplate using generic sources such as external dictionaries, or web statistics on discriminative textual patterns. We also intend to alleviate the modeling problems due to the intrinsic local nature of entity features by exploiting syntactic information. All these generic features will be input to a feature selection algorithm, so that in the end we obtain a model which is both compact and accurate.

DOWNLOAD tr05-02.pdf

TR-05-03

Choi, Young-Ri. and Mohamed G. Gouda. "Disciplined Flood Protocols in Sensor Networks". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-03 (technical report). February 22, 2005. 10 pages.

Flood is a communication primitive that can be initiated by the base station of a sensor network to send a copy of some message to every sensor in the network. When a flood of some message is initiated, the message is forwarded by every sensor that receives the message until the sensors decide not to forward the message any more. This uncontroled flood can cause the forwarded messages to collide with one another, with the result that many sensors in the network do not receive any copy of the flooded message. In this paper, we present a family of flood protocols, called the disciplined flood protocols, that aim to reduce or prevent most message collisions that occur in a regular flood protocol. We show by simulation that whereas a regular flood protocol can cause a flooded message to reach ˆˆbetween 60% and 80% of all sensors in the network, a disciplined flood ˆprotocol can cause a flooded message to reach between 88% and 99% of all sensors in the network.

DOWNLOAD tr05-03.pdf

TR-05-04

Napper, Jeff. and Lorenzo. Alvisi. "Lock-Free Serializable Transactions". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-04 (technical report). February 22, 2005. 29 pages.

NO ABSTRACT

DOWNLOAD tr05-04.pdf

TR-05-05

Hanson, Richard J. and David R. Kincaid. "Notes on GMRES Algorithm Organization". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-05 (technical report). March 5, 2005. 10 pages.

The Generalized Minimum Residual (GMRES) iterative method and variations of it are frequently used for solving systems of linear equations of the form Ax=b, where A is a large sparse nonsingular nonsymmetric matrix. We discuss ways to reorganize the algorithm to improve its efficiency.

DOWNLOAD tr05-05.pdf

TR-05-06

Ramakrishnan, Smriti R., Rui Mao, Aleksey A. Nakorchevskiy, John T. Prince, Willard S. Willard, Weijia Xu, Edward M. Marcotte, and Daniel P. Miranker. "A fast coarse filtering method for protein identification by mass spectrometry". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-06 (technical report). March 9, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-07

Willard, Willard S., Wenguo Liu, Rui Mao, Shulin Ni, Weijia Xu, and Daniel P. Miranker. "mSQL: SQL Extensions and Database Mechanisms forManaging Biosequences". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-07 (technical report). March 9, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-08

Mao, Rui, Ving I. Lei, Smriti Ramakrishnan, Weijia Xu, and Daniel P. Miranker. "On Metric-Space Indexing and Real Workloads". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-08 (technical report). March 9, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-09

Grechanik, Mark, William R. Cook, and Karl J. Lieberherr. "Static Checking of Object-Oriented Polylingual Systems". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-09 (technical report). March 15, 2005.

Significant complexity of large-scale polylingual systems, which consist of multiple interoperating programs written in different languages and running on different platforms, makes their design, evolution, and maintenance difficult. During these stages of software development programmers introduce various hard-to-find defects when reasoning about how programs interoperate with one another. We present Foreign Object REification Language (FOREL), an extension for object-oriented languages which enables dynamic creation and destruction of connectors between interoperating programs in polylingual systems. FOREL enables programmers to navigate and manipulate foreign objects as first-class entities of programs. FOREL type checking coupled with static analysis enables developers to reason more effectively about polylingual systems. We describe our implementation, prove the soundness of the FOREL type system, and give our type checking algorithm. We used our solution for a real-world commercial polylingual system and discovered bugs that were not detected during its design and testing. Our experience suggests that FOREL is practical, and its type checking algorithm is efficient.

DOWNLOAD NOT AVAILABLE

TR-05-10

Martin, Jean-Philippe, Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement, Michael Dahlin, and Carl Porth. "BAR Tolerance for Cooperative Services". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-10 (technical report). December 20, 2005. 27 pages.

This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both rational behaviors when nodes arbitrarily deviate from the protocol for their local benefit and Byzantine behaviors when a broken, misconfigured, or malicious node arbitrarily deviates from the protocol for any other reason. The paper examines this problem in the context of a cooperative backup system and makes three contributions. First, it introduces the BAR (Byzantine, Altruistic, Rational) model, which provides the foundation for reasoning about the properties of this class of services. Second, it presents a general three-tier architecture aimed at reducing the complexity of building services developed in the BAR model. Our realization of this architecture includes an asynchronous replicated state machine that provides the normal safety and liveness guarantees as long as at most than (n-2)/3 nodes are Byzantine; the rest of the nodes can be rational. The paper's third contribution is to describe an implementation of PIB, the first cooperative backup service to tolerate both Byzantine users and an unbounded number of rational users. We show that, under the BAR model, PIB provides provable safety and liveness guarantees. We also show that our approach is practical: our prototype of a BART state machine executes 20 requests per second and our PIB prototype can back up a gigabyte of data in 20 minutes.

DOWNLOAD tr05-10.pdf

TR-05-11

Mudigonda, Jayaram. The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-11 (technical report). March 18, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-12

Boyer, Robert S., Warren A. Hunt Jr., and Serita M. Nelesen. "A compressed format for collections of phylogenetic trees and improved consensus performance". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-12 (technical report). March 21, 2005. 34 pages.

NO ABSTRACT

DOWNLOAD tr05-12.pdf

TR-05-13

Gouda, Mohamed G. and Eunjin (EJ) Jung. "Vulnerability Analysis of Certificate Graphs". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-13 (technical report). April 4, 2005. 20 pages.

A certificate system can be represented by a directed graph, called a certificate graph, where each node represents a user that has a public key and a private key and each edge (u; v) represents a certificate that is signed by the private key of u and contains the public key of v. Two types of damage can be done in a certificate graph when the private key of a node u in the graph is revealed to an adversary: explicit and implicit. The explicit damage is that the adversary can impersonate node u to other nodes in the graph (until it is known to other nodes that the private key of u is revealed). The implicit damage is that the adversary can impersonate nodes other than u to other nodes in the graph. In this paper, we define a metric called vulnerability that measures the scope of explicit and implicit damage that may occur in a certificate graph when the private key of a node in the graph is revealed to an adversary. Using this metric, we analyze the vulnerability of different classes of certificate graphs. We present three algorithms that compute the vulnerability of an arbitrary certificate graph, and use these algorithms to show that certificate dispersal and stricter acceptance criteria reduce the vulnerability of certificate graphs.

DOWNLOAD tr05-13.pdf

TR-05-14

Batory, Don. "Feature Models, Grammars, and Propositional Formulas". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-14 (technical report). April 11, 2005. 13 pages.

Feature models are used to specify members of a product-line. Despite years of progress, contemporary tools provide limited support for feature constraints and offer little or no support for debugging feature models. We integrate prior results to connect feature models, grammars, and propositional formulas. This connection allows arbitrary propositional constraints to be defined among features and enables off-the-shelf satisfiability solvers to debug feature models. We also show how our ideas can generalize recent results on the staged configuration of feature models.

DOWNLOAD tr05-14.pdf

TR-05-15

Bientinesi, Paolo, Kazushige Goto, Tze Meng Low, Enrique Quintana-Orti, Robert van de Geijn, and Field Van Zee. "FLAME 2005 Prospectus: Towards the Final Generation of Dense Linear Algebra Libraries FLAME Working Note #16". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-15 (technical report). April 20, 2005. 13 pages.

What if one set out to develop the final dense linear algebra library? Such a library would not necessarily have to be backward compatible to existing libraries (although this would be preferred), but it would have to be forward compatible to future architectures, languages, and functionality. Invariably such a final generation library would have to be able to generate routines from specification rather than taking the form of the static libraries that have evolved from EISPACK and LINPACK. In other words, we believe that the software architecture of such a final generation library would be very different from LAPACK and ScaLAPACK. In this talk we discuss results from our FLAME project that suggest that mechanical derivation of algorithms from mathematical specification is achievable, as is the mechanical analysis (cost and stability), and mechanical code generation. These results suggest that the input to such a system would be the mathematical specifications of operations to be included in the library, rewrite rules for translating algorithms to code, and models of target architectures. From this, a full-blown version of the system should then be able to mechanically generate algorithms and implementations, packaged as libraries tuned on mechanically generated performance analyses, with mechanically generated stability analyses.

DOWNLOAD tr05-15.pdf

TR-05-16

Lopez-Herrejon, Roberto E., Don Batory, and William Cook. "Evaluating Support for Features in Advanced Modularization Technologies. Extended Report". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-16 (technical report). April, 2005. 37 pages.

A software product-line is a family of related programs. Each program is defined by a unique combination of features, where a feature is an increment in program functionality. Modularizing features is difficult, as feature-specific code often cuts across class boundaries. New modularization technologies have been proposed in recent years, but their support for feature modules has not been thoroughly examined. In this paper, we propose a variant of the expression problem as a canonical problem in product-line design. The problem reveals a set of technology-independent properties that feature modules should exhibit. We use these properties to evaluate five technologies: AspectJ, Hyper/J, Jiazzi, Scala, and AHEAD. The results suggest an abstract model of feature composition that is technology-independent and that relates compositional reasoning with algebraic reasoning.

DOWNLOAD tr05-16.pdf

TR-05-18

Mark, William R. and Donald Fussell. "Real-Time Rendering Systems in 2010". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-18 (technical report). May 2, 2005. 8 pages.

We present a case for future real-time rendering systems that support non-physically-correct global illumination techniques by using ray tracing visibility algorithms, by integrating scene management with rendering, and by executing on general-purpose single-chip parallel hardware (CMP's). We explain why this system design is desireable and why it is feasible. We also discuss some of the research questions that must be addressed before such a system can become practical.

DOWNLOAD tr05-18.pdf

TR-05-19

McDonald, Robert, Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, and Stephen W. Keckler. "TRIPS Instruction Set Architecture (ISA) Manual". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-19 (technical report). May 13, 2005. 194 pages.

This document specifies the Instruction Set Architecture (ISA) for the TRIPS architecture, a novel, scalable, and low power architecture for future technologies.

DOWNLOAD tr05-19.pdf

TR-05-20

Smith, Aaron, Jon Gibson, Jim Burrill, Robert McDonald, Doug Burger, Stephen W. Keckler, and Kathryn S. McKinley. "TRIPS Intermediate Language (TIL) Manual". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-20 (technical report). May 13, 2005.

This document specifies the TRIPS Intermediate Language (TIL) for the TRIPS architecture, a novel, scalable, and low power architecture for future technologies. TIL is a RISC-like intermediate representation for use by humans and compilers that want to write low-level TRIPS code.

DOWNLOAD tr05-20.pdf

TR-05-21

Yoder, Bill, Jon Gibson, Jim Burrill, Robert McDonald, Doug Burger, Stephen W. Keckler, and Kathryn S. McKinley. "TRIPS Assembly Language (TASL) Manual". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-21 (technical report). May 13, 2005. 30 pages.

This document specifies the TRIPS Assembly Language (TASL) for the TRIPS architecture, a novel, scalable, and low power architecture for future technologies.

DOWNLOAD tr05-21.pdf

TR-05-22

Smith, Aaron, Jim Burrill, Ramadass Nagarajan, Robert McDonald, Karthikeyan Sankaralingam, Nick Nethercote, Bill Yoder, Doug Burger, Stephen W. Keckler, and Kathryn S. McKinley. "TRIPS Application Binary Interface (ABI) Manual". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-22 (technical report). May 13, 2005. 16 pages.

This document specifies the TRIPS Application Binary Interface (ABI) for the TRIPS architecture, a novel, scalable, and low power architecture for future technologies.

DOWNLOAD tr05-22.pdf

TR-05-23

Djeu, Peter, Jim Burrill, and William R. Mark. "Measuring Memory Performance in Commercial Shadow Volume Rendering". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-23 (technical report). May 13, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-24

Li, Harry C., Lorenzo Alvisi, and Allen Clement. "The Game of Paxos". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-24 (technical report). May 16, 2005. 12 pages.

We describe two abstractions that show how Lamport's Paxos and Castro and Liskov's PBFT are essentially the same consensus protocol, but for different failure models. The first abstraction is a regular register that captures how processes in both protocols propose and decide values. The second abstraction is tokens that capture how these protocols guarantee agreement despite partial failures. Together, the register and tokens provide the abstraction of a write-once regular register, which we claim is an intuitive way to conceptualize Paxos and PBFT. We also point out how details specific to Paxos and PBFT manifest themselves in the implementation of our abstractions.

DOWNLOAD tr05-24.pdf

TR-05-25

Kim, Min Sik, Taekhyun Kim, YongJune Shin, Simon S. Lam, and Edward J. Powers. "Scalable Clustering of Internet Paths by Shared Congestion". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-25 (technical report). May 23, 2005. 10 pages.

Identifying Internet paths that share the same bottleneck enables efficient and fair resource allocation among flows along those paths. In particular, the topology of an overlay network can be improved by identifying bottlenecks shared by multiple overlay connections. Such bottlenecks can be identified using several shared congestion detection techniques. However, all of these techniques, except DCW (Delay Correlation with Wavelet denoising), require that the paths under consideration share a common end point. As a result, their applicability to paths between different sources and different destinations in an overlay network is limited. Moreover, all the techniques (including DCW) have been designed to detect shared congestion between a pair of paths. To cluster N paths, a straightforward approach of using pairwise tests would require O(N^2) time complexity. In this paper, we propose a scalable approach to cluster Internet paths using multidimensional indexing. By using DCW as the underlying shared congestion detection technique, our approach does not require a common end point for the paths being considered. By storing per-path data in an indexed multidimensional space, we can reduce the computational complexity of clustering to O(N log N). The indexing overhead can be further improved by reducing dimensionality of the space through the wavelet transform. Computation cost is kept low by using the same wavelet transform for both denoising in DCW and dimensionality reduction. Our approach is evaluated using simulations and found to be effective for large N. The tradeoff between indexing overhead and clustering accuracy is shown empirically.

DOWNLOAD tr05-25.pdf

TR-05-26

Li, Nancy. "Parameterized Firewall Templates: A New Approach to Firewall Design". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-26 (honors thesis). May 23, 2005.

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-27

Lopez-Herrejon, Roberto E. and Don Batory. "Taming Aspect Composition: A Functional Approach". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-27 (technical report). June 8, 2005. 10 pages.

Aspect Oriented Programing is a promising paradigm that challenges traditional notions of program modularity. Despite its increasing acceptance, aspects have been documented to suffer limited reuse, unpredictable behavior, and difficult modular reasoning. We develop an algebraic model that treats aspects as program transformations and uncovers aspect composition as the source of the problems mentioned. We propose an alternative model of composition that eliminates these problems, preserves the power of aspects, and lays out an algebraic foundation on which to build and understand emerging modularization technologies and tools.

DOWNLOAD tr05-27.pdf

TR-05-28

Hwang, Irvin and Peter Stone. "Discovering Conditions for Intermediate Reinforcement with Causal Models". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-28 (honors thesis). May 18, 2005. 18 pages.

Learning to perform a task in an environment with sparse feedback is a difficult problem. While several approaches for increasing feedback during learning have been taken, these methods suffer from the dependency on human knowledge and engineering to find good solutions. We propose using causal models to increase the amount of feedback that will improve learning. This approach does not require domain-specific human engineering because causal models can be constructed directly from the environment using empirical data. Preliminary experiments and results show causal models can be used to automatically discover conditions for applying intermediate feedback that accelerate learning.

DOWNLOAD tr05-28.pdf

TR-05-29

Jain, Navendu, Mike Dahlin, and Renu Tewari. "Using Bloom Filters to Refine Web Search Results". The University of Texas at Austin, Department of Computer Sciences and IBM Almaden Research Center, San Jose, CA. Report# TR-05-29 (technical report). May 27, 2005. 9 pages.

Search engines have primarily focused on presenting the most relevant pages to the user quickly. A less well explored aspect of improving the search experience is to remove or group all near-duplicate documents in the results presented to the user. In this paper, we apply a Bloom filter based similarity detection technique to address this issue by refining the search results presented to the user. First, we present and analyze our technique for finding similar documents using content-defined chunking and Bloom filters, and demonstrate its effectiveness in compactly representing and quickly matching pages for similarity testing. Later, we demonstrate how a number of results of popular and random search queries retrieved from different search engines, Google, Yahoo, MSN, are similar and can be eliminated or re-organized. Finally, we apply our near-duplicate detection technique to show how to effectively remove similar search results and improve user experience.

DOWNLOAD tr05-29.pdf

TR-05-30

Gupta, Prateek and Vitaly Shmatikov. "Towards Computationally Sound Symbolic Analysis of Key Exchange Protocols". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-30 (technical report). June 13, 2005. 29 pages.

We present a cryptographically sound formal method for proving correctness of key exchange protocols. Our main tool is a fragment of a symbolic protocol logic. We demonstrate that proofs of key agreement and key secrecy in this logic imply simulatability in Shoup's secure multi-party framework for key exchange. As part of the logic, we present cryptographically sound abstractions of CMA-secure digital signatures and Diffie-Hellman exponentiation, which is a technical result of independent interest. We illustrate our method by constructing a proof of security for a simple authenticated Diffie-Hellman protocol.

DOWNLOAD tr05-30.pdf

TR-05-31

Dhillon, Inderjit S. and Suvrit Sra. "Generalized Nonnegative Matrix Approximations with Bregman Divergences". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-31 (technical report). June 1, 2005. 14 pages.

Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts based, sparse nonnegative representation of the nonnegative input data. Due to these advantages, NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively deficient. This paper makes algorithmic progress by modeling and \emph{solving} (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its low-rank approximation. The multiplicative update formulae in the pioneering work by \citet{lee00} arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions for incorporating constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of ``link'' functions for modeling non-linear relationships are also discussed.

DOWNLOAD tr05-31.pdf

TR-05-32

Liu, Huaiyu. "Designing a Resilient Routing Infrastructure for Peer-to-Peer Networks Protocols". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-32 (ph.d. dissertation). June 28, 2005. 231 pages.

Peer-to-Peer (P2P) networks have enabled a new generation of large scale distributed applications. Unlike the traditional client-server model, in a P2P network, all peers in the network both contribute to and receive services from the network. Due to their decentralized and self-organizing nature, P2P networks enable tens of thousands (potentially millions) of Internet machines to form virtual communities and share the vast resources aggregated from the participating machines. In this dissertation, we address a fundamental problem in designing P2P networks: How to construct and maintain a resilient infrastructure to provide reliable, scalable, and efficient routing service for millions of Internet nodes without central service and administration? The absence of central administration, the large number of nodes involved, and the high rate of node dynamics pose great challenges to the design of a resilient routing infrastructure for P2P networks. Our work tackles the above challenges and has successfully addressed the following problems: (1) How to design protocols to maintain ``consistency" of routing tables to ensure successful routing? (2) How to reason about correctness of these protocols? (3) How to evaluate the system's ability to sustain high rates of node dynamics and how to improve this ability? In particular, we have designed a suite of protocols that construct and maintain a resilient routing infrastructure. To base the protocol design on a sound foundation, we have introduced a theoretical foundation, called C-set trees, to guide protocol design and correctness reasoning. Based on the theoretical foundation, we have designed a join protocol and developed rigorous correctness proofs for the protocol. We have also designed an efficient failure recovery protocol, which has been demonstrated by extensive simulations to perform perfect recovery even when 50% of network nodes fail. Both the join protocol and the failure recovery protocol have been integrated into a single framework following a module composition approach. Furthermore, we have conducted extensive simulation experiments to study behaviors of the designed system under different rates of node dynamics (churn experiments). We find our system to be effective, efficient, and provide reliable and scalable routing service for an average node lifetime as low as 8.3 minutes (the median lifetime measured for two deployed P2P networks, Napster and Gnutella, was 60 minutes). Based on our system design, we have implemented a prototype system, named Silk, as the routing component of a shared infrastructure for P2P networks and other large-scale distributed applications.

DOWNLOAD tr05-32.pdf

TR-05-33

Zhang, Xincheng. "Protocol Design for Scalable and Reliable Group Rekeying Divergences". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-33 (technical report-dissertation). June 29, 2005. 183 pages.

In secure group communications, group users share a symmetric key, called group key. The group key is used for encrypting data traffic among group users or restricting access to resources intended for group users only. A key server needs to change the group key after users join and leave (called group rekeying), by composing a rekey message that consists of encrypted new keys (encryptions, in short) and delivering it to all users. When group size is large, it becomes infeasible to rekey per user join or leave because of its high processing and bandwidth overheads. Instead, the key server changes the group key per rekey interval, the length of which indicates how tight the group access control is. It is desired to reduce the overhead of group rekeying as much as possible in order to allow frequent rekeying. To address the scalability issue of the key server, Wong, Gouda, and Lam proposed the key tree approach in 1998. The same idea also appears in RFC 2627. The scalability of rekey transport, however, was not addressed. The objective of this dissertation is to design a scalable and reliable rekey transport protocol and evaluate its performance. Rekey transport differs from data transport because rekey messages require scalable, reliable, and real-time delivery. Furthermore, each user needs only a small subset of all of the encryptions in a rekey message. We have proposed a scalable and reliable rekey transport protocol; its efficiency benefits from the special properties of rekey transport. The protocol runs in two steps: a multicast step followed by a unicast recovery step. Proactive forward error correction (FEC) is used in multicast to reduce delivery latency and limit the number of users who need unicast recovery. The unicast recovery step provides eventual reliability; it also reduces the worst-case delivery latency as well as user bandwidth overhead. In the protocol design, various technical issues are addressed. First, a key identification scheme is proposed for each user to identify the subset of new keys that it needs. The communication cost of this scheme is only several bytes per packet. Second, we investigate how to space the sending times of packets to make proactive FEC resilient to burst loss. Lastly, an adaptive FEC scheme is proposed to make the number of users who need unicast recovery controlled around a small target value under dynamic network conditions. This scheme also makes FEC bandwidth overhead and rekey interval close to the the feasible minima. Application-layer multicast (ALM) offers new opportunities to do naming and routing. In the dissertation, we have studied how to use ALM to support concurrent rekey and data transport in secure group communications. Rekey traffic is bursty and requires fast delivery. It is desired to reduce rekey bandwidth overhead as much as possible since it competes for available bandwidth with data traffic. Towards this goal, we propose a multicast scheme that exploits proximity in the underlying network. We further propose a rekey message splitting scheme to significantly reduce rekey bandwidth overhead at each user access link and network link. We formulate and prove correctness properties for the multicast scheme and rekey message splitting scheme. Simulation results show that our approach can reduce rekey bandwidth overhead from several thousand encryptions to less than ten encryptions for more than 90% of users in a group of 1024 users.

DOWNLOAD tr05-33.pdf

TR-05-34

Castrillon-Candas, Julio E., Chandrajit Bajaj, and Vinay Siddavanahalli. "An Adaptive Irregularly Spaced Fourier Method for Protein-Protein Docking". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-34 (regular technical report). July 11, 2005. 28 pages.

Abstract: In this paper we introduce a grid free irregularly sampled Fourier approach for accurately predicting rigid body protein-protein docking sites. Of the many docking approaches, grid based Fast Fourier Transform (FFT) approaches have been shown to produce by far the fastest, correlation profiles of complex protein-protein interactions over the six dimensional search space. However, these uniform sampling methods still possess high time complexity, and in particular, are highly memory intensive for predicting large protein-protein docking sites. By taking advantage of an irregularly and adaptively sampled, smooth particle representation of molecular shape, in combination with the use of irregularly spaced FFT transforms, we eliminate an explicit uniform grid. We are able to produce efficiently, highly compressed, but accurate, docking correlation profiles.

DOWNLOAD tr05-34.pdf

TR-05-35

Sumners, Rob and Sandip Ray. "Proving Invariants via Rewriting and Abstraction". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-35 (regular tech report). July 7, 2005. 13 pages.

We present a deductive method for proving invariants of reactive systems. Our approach uses term rewriting to reduce invariant proofs to reachability analysis on a finite graph. This substantially automates invariant proofs by obviating the need to define inductive invariants while still benefitting from the expressiveness of deductive methods. We implement a procedure supporting this approach which interfaces with the ACL2 theorem prover. The interface affords sound extension of our procedure with rewrite rules based on proven theorems. We demonstrate the method in the verification of cache coherence protocols.

DOWNLOAD tr05-35.ps

TR-05-36

Kim, Min Sik. "Buildling and Maintaining Overlay Networks for Bandwidth-Demanding Applications". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-36 (dissertation). July 11, 2005. 171 pages.

The demands of Internet applications have grown significantly in terms of required resources and types of services. Overlay networks have emerged to accommodate such applications by implementing more services on top of IP (Internet Protocol). However, while overlay networks are successful in circumventing limitations of IP, the task of building and maintaining an overlay network is still challenging. In an overlay network, participating hosts are virtually fully-connected through the underlying Internet. However, since the quality of overlay connections varies, the performance of the overlay network is dependent on which connections are chosen to be utilized. Therefore, maintaining a ``good'' overlay network topology is crucial in achieving high performance. To demonstrate how much performance gain can be achieved through topology changes, a distributed algorithm to build an overlay multicast tree is proposed for streaming media distribution. The algorithm finds an optimal tree such that the average bandwidth of receivers is maximized under an abstract network model. However, increasing bandwidth does not necessarily lead to a better overlay topology; in overlay networks, interference between overlay connections should be taken into account. Since such interference occurs when different overlay connections pass through a congested link simultaneously, detecting congestion shared by multiple overlay connections is necessary to avoid bottlenecks. For shared congestion detection, a novel technique called DCW (Delay Correlation with Wavelet denoising) is proposed. Previous techniques to detect shared congestion have limitations in applying to overlay networks; they assume a common source or destination node, drop-tail queueing, or a single point of congestion. However, DCW is applicable to any pair of paths on the Internet without such limitations. It employs a signal processing method, wavelet denoising, to separate queueing delay caused by network congestion from various other delay variations. The proposed technique is evaluated through both simulations and Internet experiments. They show that for paths with a common synchronization point, DCW provides faster convergence and higher accuracy while using fewer packets than previous techniques. Furthermore, DCW is robust and accurate without a synchronization point; more specifically, it can tolerate a synchronization offset of up to one second between two packet flows. Because DCW is designed to detect shared congestion between a pair of paths, there is a concern about scalability when it is used in a large-scale overlay network. To cluster N paths, a straightforward approach of using pairwise tests would require O(N^2) time complexity. To address this issue, a scalable approach to cluster Internet paths using multidimensional indexing is presented. By storing per-path data in a multidimensional space indexed using a tree-like structure, the computational complexity of clustering is reducible to O(N log N). The indexing overhead can be further improved by reducing dimensionality of the space through the wavelet transform. Computation cost is kept low by using the same wavelet transform for both denoising in DCW and dimensionality reduction. The proposed approach is evaluated using simulations and found to be effective for large N. The tradeoff between indexing overhead and clustering accuracy is shown empirically. As a case study, an algorithm that improves overlay multicast topology is designed. Because overlay multicast forwards data without support from routers, data may be delivered multiple times over the same physical link, causing a bottleneck. This problem is more serious for applications demanding high bandwidth such as multimedia distribution. Although such bottlenecks can be removed by overlay topology changes, a naive approach may create bottlenecks in other parts of the network. The proposed algorithm removes all bottlenecks caused by the redundant data delivery of overlay multicast, detecting such bottlenecks using DCW. In a case where the source rate is constant and the available bandwidth of each link is not less than the rate, the algorithm guarantees that every node receives at the full source rate. Simulation results show that even in a network with a dense receiver population, the algorithm finds a tree that satisfies all the receiving nodes while other heuristic-based approaches often fail. A similar approach to finding bottlenecks and removing them through topology changes is applicable to other types of overlay networks. This research will enable bandwidth-demanding applications to build more efficient overlay networks to achieve higher throughput.

DOWNLOAD tr05-36.pdf

TR-05-37

Fidelman, Peggy, Thayne Coffman, Risto Miikkulainen, and Peter Stone. "Detecting Motion in the World with a Moving Quadruped Robot". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-37 (regular report). August 19, 2005. 12 pages.

For a robot in a dynamic environment, the ability to detect motion is crucial. Although legs are arguably the most versatile means oflocomotion for a robot, and thus the best suited to an unknown or changing domain, existing methods for motion detection either require that the robot have wheels or that its walking be extremely slow and tightly constrained. This paper presents a method for detecting motion from a quadruped robot walking at its top speed. The method is based on a neural network that learns to predict optic flow at each timestep, thus allowing anomalies to be detected. The system is demonstrated to be capable of detecting motion in the robot's surroundings.

DOWNLOAD tr05-37.pdf

TR-05-38

Zhang, Yin, Zihui Ge, Matthew Roughan, and Albert Greenberg. "Network Anomography". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-38 (regular technical report). 16 pages.

Anomaly detection is a first and important step needed to respond to unexpected problems and to assure high performance and security in IP networks. We introduce a framework and a powerful class of algorithms for {\em network anomography,} the problem of inferring network-level anomalies from widely available data aggregates. The framework contains novel algorithms, as well as a recently published approach based on Principal Component Analysis (PCA). Moreover, owing to its clear separation of inference and anomaly detection, the framework opens the door to the creation of whole families of new algorithms. We introduce several such algorithms here, based on ARIMA modeling, the Fourier transform, Wavelets, and Principal Component Analysis. We introduce a new {\em dynamic anomography} algorithm, which effectively tracks routing and traffic change, so as to alert with high fidelity on intrinsic changes in network-level traffic, yet not on internal routing changes. An additional benefit of dynamic anomography is that it is robust to missing data, an important operational reality. To the best of our knowledge, this is the first anomography algorithm that can handle routing changes and missing data. To evaluate these algorithms, we used several months of traffic data collected from the Abilene network and from a large Tier-1 ISP network. To compare performance, we use the methodology put forward earlier for the Abilene data set. The findings are encouraging. Among the new algorithms introduced here, we see: high accuracy in detection (few false negatives and few false positives), and high robustness (little performance degradation in the presence of measurement noise, missing data and routing changes).

DOWNLOAD tr05-38.pdf

TR-05-39

Grechanik, Mark, Kathryn S. McKinley, and Dewayne E. Perry. "Automating and Validating Program Annotations". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-39 (technical report). September 7, 2005. 38 pages.

Program annotations help to catch errors, improve program understanding, and specify invariants. Adding annotations, however, is often a manual, laborious, tedious, and error prone process especially when programs are large. We offer a novel approach for automating a part of this process. Developers first specify an initial set of annotations for a few variables and types. Our LEarning ANnnotations (Lean) system combines these annotations with run-time monitoring, program analysis, and machine-learning approaches to discover and validate annotations on unannotated variables. We evaluate our prototype implementation on open-source software projects and our results suggest that Lean can generalize from a small set of annotated variables to annotate many other variables. Our experiments show that after users annotate approximately 6% of the program variables and types, Lean correctly annotates an additional 69% of variables in the best case, 47% on the average, and 12% in the worst case.

DOWNLOAD tr05-39.pdf

TR-05-40

Hall, Brandon. "Slot Scheduling: General-Purpose Multiprocessor Scheduling for Heterogeneous Workloads". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-40 (masters thesis). September 6, 2005. 61 pages.

This thesis presents slot scheduling, a approach to general-purpose CPU scheduling for multiprocessor systems. The chief virtues of slot scheduling are versatility (the ability to support a broad range of classes of schedulers) and intelligibility (allowing system designers to easily reason about interactions among different kinds of schedulers). In particular, slot scheduling is well-suited for meeting the needs of both real-time and gang-scheduled applications. These characteristics distinguish slot scheduling from existing scheduler proposals which tend to suffer from overspecialization and complexity. Slot scheduling achieves its goals by employing an explicit ``bulletin board'' representation of CPU allocation that decentralizes the task of scheduling from the kernel and permits scheduling logic to be carried out by applications themselves. We show that this approach is both feasible and efficient.

DOWNLOAD tr05-40.pdf

TR-05-41

Liu, Alex X., Chin-Tser Huang, and Mohamed G. Gouda. "Anti-Spam: An Integrated Approach". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-41 (regular report).

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-42

Jain, Navendu, Mike Dahlin, and Renu Tewari. "TAPER: Tiered Approach for Eliminating Redundancy in Replica Synchronization". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-42 (regular technical report).

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-43

Chowdhury, Rezaul Alam. "Experimental Evaluation of a Cache-Oblivious LCS Algorithm". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-43 (regular technical report). October 5, 2005. 12 pages.

We present the results of an extensive computational study of an I/O-optimal cache-oblivious LCS (longest common subsequence) algorithm developed by Chowdhury and Ramachandran. Three variants of the algorithm were implemented (CO denoting the fastest variant) along with the widely used linear-space LCS algorithm by Dan Hirschberg (denoted Hi). Both algorithms were tested on both random and real-world (CFTR DNA) sequences consisting upto 2 million symbols each, and timing and caching data were obtained on three state-of-the-art architectures (Intel Xeon, AMD Opteron and SUN UltraSparc-III+). In our experiments: CO ran a factor of 2 to 6 times faster than Hi. Hi incurred upto 4,000 times more L1 cache misses and upto 30,000 times more L2 cache misses than CO when run on pairs of sequences consisting of 1 million symbols each. CO executed 40%-50% fewer machines instructions than Hi. Unlike Hi, CO was able to conceal the effects of caches on its running time; its actual running time could be predicted quite accurately from its theoretical time complexity. CO was less sensitive to alphabet size than Hi. These results suggest that CO and the algorithmic technique employed by CO can be of practical use in many fields including sequence alignment in computational biology.

DOWNLOAD tr05-43.ps

TR-05-44

Wang, Feng, Lili Qiu, and Simon Lam. "Probabilistic Region-based Localization for Wireless Networks". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-44 (regular report). October 13, 2005. 14 pages.

Determining the physical location of wireless nodes is important to a wide variety of applications. In this paper, we propose a series of probabilistic region-based localization algorithms, including using static grids, segments of grids, and dynamic meshes. These algorithms provide a wide range of trade-off between accuracy and cost, making them suitable for different types of networks, such as sensor networks and mesh networks. Furthermore, we propose several techniques to extract and leverage additional information on location constraints, which is shown to significantly improve the localization accuracy and can be applied to other localization schemes. Finally we develop techniques to enhance robustness of localization, and show that the enhanced scheme can achieve high accuracy even in the presence of significant measurement errors.

DOWNLOAD tr05-44.pdf

TR-05-45

Monroy, German A., Kenneth O. Stanley, and Risto Miikkulainen. "Coevolution of Neural Networks Using a Layered Pareto Archive". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-45 (technical report).

Report number TR-05-45 has been moved to the 2006 Artificial Intelligence (AI) Technical Report index and is now listed as Technical Report AI-06-333.

DOWNLOAD NOT AVAILABLE

TR-05-46

Gouda, Mohamed G. and Young-ri Choi. "A State-based Model of Sensor Protocols". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-46 (regular technical report).

NO ABSTRACT

DOWNLOAD NOT AVAILABLE

TR-05-47

Guyer, Samuel Z., Kathryn S. McKinley, and Daniel Frampton. "A Static Analysis for Automatic Individual Object Reclamation in Java". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-47 (regular report). December 6, 2005. 10 pages.

Automatic garbage collection has proven software engineering benefits: fewer memory-related errors and less programmer effort. However, to attain performance competitive with explicit memory management, garbage collection needs much more memory. This key tradeoff exists because the collector only reclaims memory when it is invoked: invoking it more frequently reclaims memory quickly, but incurs a significant cost, while invoking it less frequently fills memory with dead objects. This work comes closer to the best of both worlds by adding novel runtime and compiler support for compiler-inserted frees to a garbage-collected system. The compiler analysis automatically identifies when objects become unreachable and inserts calls to free. The analysis combines a simple interprocedural pointer analysis with liveness information. The free() implementation depends on the allocator, and we demonstrate variations for \emph{free-list} and \emph{bump-pointer} allocators. Explicitly freeing objects reduces memory requirements by reclaiming memory quickly, reduces garbage collection load, and improves performance, often substantially for mark-sweep collectors. Compared to the baseline, free-me cuts total time by 22\% on average, collector time by 50\% to 70\% on average, and allows programs to run in 17\% less memory on average. Our approach differs from stack and region allocation in two crucial ways. First, it frees objects incrementally exactly when they become unreachable, instead of based on program scope. Second, our system does not require allocation-site lifetime homogeneity, and thus can free objects on some program paths and not on others. Our system also handles common design patterns, such as freeing inside loops and objects created by factory methods.

DOWNLOAD tr05-47.pdf

TR-05-48

Grechanik, Mark. "Viola: A Verifier for Interoperating Components". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-48 (regular technical report). December 20, 2005. 31 pages.

Two or more components (e.g., objects, modules, or programs) interoperate when they exchange data, such as XML data. Currently, there is no approach that can detect a situation at compile time when one component modifies XML data so that it becomes incompatible for use by other components, delaying discovery of errors to runtime. Our solution, a Verifier for Interoperating cOmponents for finding Logic fAults (Viola) builds abstract programs from the source code of components that exchange XML data. Viola symbolically executes these abstract programs thereby obtaining approximate specifications of the data that would be output by these components. The computed and expected specifications are analyzed to find errors in XML data exchanges between components. We describe our approach, implementation, and give our error checking algorithm. We used Viola on open source and commercial systems and discovered errors that were not detected during their design and testing.

DOWNLOAD tr05-48.pdf

TR-05-49

Melville, Prem. "Creating Diverse Ensemble Classifiers to Reduce Supervision". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-49 (dissertation). December 13, 2005. 157 pages.

Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise. DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting respectively, to build the committees. For efficient active learning it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce a new algorithm, Active-DECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that Active-DECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, in many applications, producing good class probability estimates is also important, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on Active-DECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach. Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.

DOWNLOAD tr05-49.pdf

TR-05-50

Lei, Ving Ian. "Optimizing Data Page Placement for Similarity Join in Metric Spaces". The University of Texas at Austin, Department of Computer Sciences. Report# TR-05-50 (regular report). December 14, 2005. 14 pages.

In this paper, we address the similarity join problem in metric space. We describe how page locality may be improved by reorganizing disk pages using an algorithm borrowed from numerical methods concerning the reorganization of the rows and columns of matrices to form banded matrices. The model is further exploited as a method to schedule the operations of similarity join for finding all pairs of neighbors in a metric space. Experimental results demonstrate significant performance improvements over the symmetric clustering join algorithm proposed in [15] and the cost-based clustering algorithm presented in [7].

DOWNLOAD tr05-50.pdf

For help please contact trcenter@cs.utexas.edu